Perl DynaLoader hack using .bs files

I needed to install a perl extension from a third-party. It’s an interface to a shared library, a .so file, that they also supply.

Normally I’d add the directory containing the shared library to the LD_LIBRARY_PATH environment variable. Then when the extension is loaded the system dynamic loader can find it.

For various reasons I didn’t want to do that in this case.

An alternative approach is to pre-load the shared library with a flag to make it’s symbols globally available for later linking. (The flag is called RTLD_GLOBAL on Linux, Solaris and other systems that use dlopen(). This hack may not work on other systems.)

But how to pre-load the shared library, only when needed, and without changing any existing perl code? This is where the pesky little .bs files that get installed with perl extensions come in handy.

They’re known as ‘bootstrap’ files. If the .bs file for an extension not empty then DynaLoader (and XSLoader) will execute the contents just before loading the shared object for the extension.

So I put code like this into the .bs file for the extension:

    use DynaLoader;
    DynaLoader::dl_load_file("$ENV{...}/...so", 1)
	    or die DynaLoader::dl_error();

Not a recommended approach, but neat and handy for me in this case.

NYTProf 2.09 – now handles modules using AutoLoader, like POSIX and Storable

I’ve uploaded Devel::NYTProf 2.09 to CPAN.

If you’re using VMS the big news is that Peter (Stig) Edwards has contributed patches that enable NYTProf to work on VMS. Yeah! Thanks Peter.

For the rest of us there’s only one significant new feature in this release: NYTProf now includes a heuristic (that’s geek for “it’ll be wrong sometimes”) to handle modules using AutoLoader. The most common of which are Storable and POSIX. You may have encountered a warning like this when running nytprofhtml:

Unable to open '/../../lib/Storable.pm' for reading: No such file or directory

It’s a symptom of a deeper problem caused by AutoSplit, the companion to AutoLoader. The details of the cause, effect, and fix aren’t worth going into now. If you’re interested you can read my summary to the mailing list.

The upshot is that NYTProf now reports times for autoloaded subs as if the sub was part of the parent module. Just what you want. Time spent in the AUTOLOAD’er sub is also reported, naturally.

There is a small chance that the heuristic will pick the wrong ‘parent’ module file for the autoloaded subroutine file. That can happen if there are other modules that use the same name for the last portion of the package name (e.g., Bar::Foo and Baz::Foo). If it ever happens to you, please let me know. Ideally with a small test case.

The list of changes can be found here (or here if you’re a detail fanatic).

One other particularly notable item is that the savesrc option wasn’t working reliably. Thanks to Andy Grundman providing a good test case, that’s now been fixed.

Enjoy!

NYTProf screencast from the 2008 London Perl Workshop

I’ve uploaded the screencast of my NYTProf talk at the London Perl Workshop in November 2008.

It’s based on a not-quite-2.08 version and includes some coverage of an early draft of the ‘timings per rolled-up package name’ feature I discussed previously.

It also shows how and why anonymous subroutines, defined at the same line of different executions of the same string eval, get ‘merged’.

The demos use perlcritic and Moose code. It also includes a nice demonstration showing NYTProf highlighting a performance problem with File::Spec::Unix when called using Path::Class::Dir objects.

It’s 36 minutes long, including a good Q&A session at the end (wherein a market rate for performance improvements is established). Enjoy.

NYTProf 2.08 – better, faster, more cuddly

I’ve just released NYTProf 2.08 to CPAN, some three and a half months after 2.07.

If you’ve been profiling large applications then the first difference you’ll notice is that the new version generates reports much faster.

NYTProf 2.08 timings.pngThe next thing you may notice is that statement timings are now nicely formatted with units. Gisle Aas contributed the formatting code for 2.07 but I had to do some refactoring to get it working for the statement timings.

Another nice refinement is that hovering over a time will show a tool-tip with the time expressed as a percentage of the overall runtime.

Almost all the tables are now sortable. I used jQuery and the tablesorter plugin for that. I’ve not added any fancy buttons, just click on a table heading to sort by that column. You’ll see a little black arrow to show the column is sorted. (You can hold the shift key down to add second and third columns to the sort order.)

A profiler isn’t much use if it’s not accurate. NYTProf now has tests for correct handling of times for string evals within string evals. In fact the handling of string evals got a big overhaul for this version as part of ongoing improvements in the underlying data model. I’m working towards being able to show annotated performance reports for the contents of string evals. It’s not there yet, but definitely getting closer.

A related feature is the new savesrc=1 option. When enabled, with a recent version of perl, the source code for each source file is written into the profile data file. That makes the profile self-contained and, significantly, means that accurate reports can be generated even after the original source files have been modified.

Another new option is optimize=0. You can use it to disable the perl optimizer. That can be worth doing if the statement timings, or counts, for some chunk of code seem odd and you suspect that the perl optimizer has rewritten it.

The final new feature noted in the NYTProf 2.08 Changes file is that it’s now possible to generate multiple profile data files from a single application. Since v2.0 you could call DB::disable_profile() and DB::enable_profile() to control profiling at runtime. Now you can pass an optional filename to enable_profile to make it close the previous profile and open a new one. I imagine this would be most useful in long running applications where you’d leave profiling disabled (using the start=none option) and then call enable_profile and disable_profile around some specific code in specific situations – like certain requests to a mod_perl app.

There’s one more new feature that I’ve just realised I’d forgotten to add to the Changes file before the release: Timings per rolled-up package name. What’s that? Well, it’s probably easiest to show you…

These images are taken from a profile of perlcritic. Each shows the time spent exclusively in subroutines belonging to a certain package and any packages below it. Hovering over a time gives the percentage, so I can see that the 57.3s spent in the 36 PPI packages accounted for 42% of the runtime.

NYTProf 2.08 pkg1.png

This gives you a quick overview for large (wide) codebases that would be hard to get in any other way.

Tables are generated for upto five levels of package name hierarchy, so you can drill-down to finer levels of detail.

NYTProf 2.08 pkg2.png

 

NYTProf 2.08 pkg3.png

I can visualize a much better UI for this data than the series of tables nytprofhtml currently produces, but my limited free time and jQuery skills prevent me doing more. Patches welcome, naturally.

Enjoy!

p.s. I’ve a screencast from my NYTProf talk at the London Perl Workshop in November I hope to (finally) upload soon. It includes a demo of the package roll-up timing tables.

Can you reproduce this NYTProf failure?

I’ve a new release of NYTProf ready to upload but I’m stuck.

The CPAN Testers service is reporting a failure on a number of systems but I can’t reproduce it locally or work out the cause.

Can you reproduce the failure with Devel::NYTProf 2.07_94? If so, could you give me remote access via ssh? (Or spare some time to investigate yourself – I’ll happily clue you in if you can reproduce the problem.)

Update: No one could reproduce it. It seems that the failures was not what it appeared to be. A clue was that only one tester was affected. Devel-NYTProf-2.07_94.tar.gz unpacked itself into a directory called Devel-NYTProf-2.07. It seems that when using CPANPLUS if the user had already got an old Devel-NYTProf-2.07 directory its contents got merged and tests would fail. I’m not convinced that’s the whole story, but Devel-NYTProf-2.07_95.tar.gz unpacked into a Devel-NYTProf-2.07_95 directory and didn’t run into the problem.

Update: More usefully, Andreas made my wish come true by pointing out the –solve parameter to the ctgetreports utility in his CPAN::Testers::ParseReports distribution. It “tries to identify the best contenders for a blame using Statistics::Regression. […] The function prints the […] top 3 candidates according to R^2 with their regression analysis.” Cool.

NYTProf 2.07 – Even better, and now for Windows too

NYTProf development rolls on, with 2.07 being another significant release.

Here’s a summary of highlights from the many improvements.

Windows, at last

Windows wizard Jan Dubois has contributed patches that enable NYTProf to run on Windows. Many thanks Jan!

Corrected Subroutine Total Time

A significant error in time reported as spent in a subroutine has been fixed. It was showing the sum of the inclusive and exclusive time instead of just the inclusive time.

String Evals

One of NYTProf’s innovative features is tracking of subroutine calls by calling location. In earlier versions you couldn’t see calls made from within string evals. Now you can. Even for evals within evals:

NYTProf string eval.png

That’s a big step forward for profiling code that makes heavy use of evals.

(The use of the µ micro symbol is also new – part of some helpful interface polishing contributed by Gisle Aas. They’re not used on the statement timings yet, but will be at some point.)

Recursion

Previously the ‘inclusive time’ reported for a subroutine wasn’t useful for subroutines called recursively. The overall inclusive time included the inclusive time of nested calls, typically inflating it beyond usefulness.

Now the inclusive time only measures the inclusive time of the outermost calls. NYTProf now also tracks and reports the maximum call depth for each calling location.

Disabling the Statement Profiler

NYTProf started life as a statement profiler. I added a subroutine profiler later and was delighted with how well that worked out. The subroutine profiler is novel in a few ways and also extremely robust and efficient. Part of the efficiency comes from not streaming data to disk while the program is running. It accumulates the counts and timings in memory.

When profiling large and/or performance sensitive applications the costs of NYTProf’s statement profiler, in performance impact and disk space, can be prohibitive. So now I’ve added an option to disable the statement profiler. Setting the stmts=0 option turns NYTProf into a very lightweight subroutine profiler.

Caller Information for XSubs

NYTProf reports time spent in subroutines broken down by calling location. It’s one of its key features. The caller details are embedded as comment annotation into the report for each perl source code file.

That has worked very well for perl subs, but there’s no natural home for XS subroutines because there was no source code file available to annotate.

There’s a partial solution included in NYTProf 2.07: XS subs that are in the same package as profiled perl subs, get annotated subroutine ‘stubs’ appended to the report for the corresponding perl source file.

It works quite well, in particular the common UNIVERSAL methods get annotated:

NYTProf xsub callers.png

Also, now these subs have a place in the reports, any references to them are hyper links, naturally.

Sadly it doesn’t work so well for packages that don’t contain any perl subs, like DBI::st. I plan to add per-package report pages for them in a future release.

Clock Docs

I’ve greatly expanded the description of the various clocks available for profiling: POSIX real-time clocks, gettimeofday(), and Time::HiRes.

I’ve also added a caveats section that discusses issues with SMP systems, processor affinity, and virtual machines.

Feedback Wanted

How has NYTProf helped you optimize your applications? Let me know.

Perhaps as blog posts of your own, linking back here, or comments here, or CPAN ratings, or email to the mailing list or just to me. I’d love to hear from you.

Thanks!