NYTProf v5 – Flaming Precision

As soon as I saw a Flame Graph visualization I knew it would make a great addition to NYTProf. So I’m delighted that the new Devel::NYTProf version 5.00, just released, has a Flame Graph as the main feature of the index page.

nytprof-v5-flamegraph.png

In this post I’ll explain the Flame Graph visualization, the new ‘subroutine calls event stream’ that makes the Flame Graph possible, and other recent changes, including improved precision in the subroutine profiler. Continue reading

Introducing Devel::SizeMe – Visualizing Perl Memory Use

For a long time I’ve wanted to create a module that would shed light on how perl uses memory. This year I decided to do something about it.

My research and development didn’t yield much fruit in time for OSCON in July, where my talk ended up being about my research and plans. (I also tried to explain that RSS isn’t a useful measurement for this, and that malloc buffering means even total process size isn’t a very useful measurement.) I was invited to speak at YAPC::Asia in Tokyo in September and really wanted to have something worthwhile to demonstrate there.

I’m delighted to say that some frantic hacking (aka Conference Driven Development) yielded a working demo just in time and, after a little more polish, I’ve now uploaded Devel::SizeMe to CPAN.

In this post I want to introduce you to Devel::SizeMe, show some screenshots, a screencast of the talk and demo, and outline current issues and plans for future development. Continue reading

NYTProf v4 – Now with string-eval x-ray vision!

I released Devel::NYTProf v3 on Christmas Eve 2009. Over the next couple of months a few more features were added. The v3 work had involved a complete rewrite of the subroutine profiler and heavy work on much else besides. At that point I felt I’d done enough with NYTProf for now and it was time to focus on other more pressing projects.

Over those months I’d also started working on enhancements for PostgreSQL PL/Perl. That project turned into something of an epic adventure with more than its fair share of highs and lows and twists and turns. The dust is only just settling now. I would have blogged about it but security issues arose that led the PostgreSQL team to consider removing the plperl language entirely. Fortunately I was able to help avoid that by removing Safe.pm entirely! At some point I hope to write a blog post worthy of the journey. Meanwhile, if you’re using PostgreSQL, you really do want to upgrade to the latest point-release.

One of the my goals in enhancing PostgreSQL PL/Perl was improve the integration with NYTProf. I wanted to be able to profile PL/Perl code embedded in the database server. With PostgreSQL 8.4 I could get the profiler to run, with some hackery, but in the report the subroutines were all __ANON__ and you couldn’t see the source code, so there were no statement timings. It was useless.

The key problem was that Devel::NYTProf couldn’t see into string evals properly. To fix that I had to go back spelunking deep in the NYTProf guts again; mostly in the data model and report generation code. With NYTProf v4, string evals are now treated as files, mostly, and a whole new level of insight is opened up!

In the rest of this post I’ll be describing this and other new features.

Continue reading

NYTProf 2.09 – now handles modules using AutoLoader, like POSIX and Storable

I’ve uploaded Devel::NYTProf 2.09 to CPAN.

If you’re using VMS the big news is that Peter (Stig) Edwards has contributed patches that enable NYTProf to work on VMS. Yeah! Thanks Peter.

For the rest of us there’s only one significant new feature in this release: NYTProf now includes a heuristic (that’s geek for “it’ll be wrong sometimes”) to handle modules using AutoLoader. The most common of which are Storable and POSIX. You may have encountered a warning like this when running nytprofhtml:

Unable to open '/../../lib/Storable.pm' for reading: No such file or directory

It’s a symptom of a deeper problem caused by AutoSplit, the companion to AutoLoader. The details of the cause, effect, and fix aren’t worth going into now. If you’re interested you can read my summary to the mailing list.

The upshot is that NYTProf now reports times for autoloaded subs as if the sub was part of the parent module. Just what you want. Time spent in the AUTOLOAD’er sub is also reported, naturally.

There is a small chance that the heuristic will pick the wrong ‘parent’ module file for the autoloaded subroutine file. That can happen if there are other modules that use the same name for the last portion of the package name (e.g., Bar::Foo and Baz::Foo). If it ever happens to you, please let me know. Ideally with a small test case.

The list of changes can be found here (or here if you’re a detail fanatic).

One other particularly notable item is that the savesrc option wasn’t working reliably. Thanks to Andy Grundman providing a good test case, that’s now been fixed.

Enjoy!

NYTProf screencast from the 2008 London Perl Workshop

I’ve uploaded the screencast of my NYTProf talk at the London Perl Workshop in November 2008.

It’s based on a not-quite-2.08 version and includes some coverage of an early draft of the ‘timings per rolled-up package name’ feature I discussed previously.

It also shows how and why anonymous subroutines, defined at the same line of different executions of the same string eval, get ‘merged’.

The demos use perlcritic and Moose code. It also includes a nice demonstration showing NYTProf highlighting a performance problem with File::Spec::Unix when called using Path::Class::Dir objects.

It’s 36 minutes long, including a good Q&A session at the end (wherein a market rate for performance improvements is established). Enjoy.

NYTProf 2.08 – better, faster, more cuddly

I’ve just released NYTProf 2.08 to CPAN, some three and a half months after 2.07.

If you’ve been profiling large applications then the first difference you’ll notice is that the new version generates reports much faster.

NYTProf 2.08 timings.pngThe next thing you may notice is that statement timings are now nicely formatted with units. Gisle Aas contributed the formatting code for 2.07 but I had to do some refactoring to get it working for the statement timings.

Another nice refinement is that hovering over a time will show a tool-tip with the time expressed as a percentage of the overall runtime.

Almost all the tables are now sortable. I used jQuery and the tablesorter plugin for that. I’ve not added any fancy buttons, just click on a table heading to sort by that column. You’ll see a little black arrow to show the column is sorted. (You can hold the shift key down to add second and third columns to the sort order.)

A profiler isn’t much use if it’s not accurate. NYTProf now has tests for correct handling of times for string evals within string evals. In fact the handling of string evals got a big overhaul for this version as part of ongoing improvements in the underlying data model. I’m working towards being able to show annotated performance reports for the contents of string evals. It’s not there yet, but definitely getting closer.

A related feature is the new savesrc=1 option. When enabled, with a recent version of perl, the source code for each source file is written into the profile data file. That makes the profile self-contained and, significantly, means that accurate reports can be generated even after the original source files have been modified.

Another new option is optimize=0. You can use it to disable the perl optimizer. That can be worth doing if the statement timings, or counts, for some chunk of code seem odd and you suspect that the perl optimizer has rewritten it.

The final new feature noted in the NYTProf 2.08 Changes file is that it’s now possible to generate multiple profile data files from a single application. Since v2.0 you could call DB::disable_profile() and DB::enable_profile() to control profiling at runtime. Now you can pass an optional filename to enable_profile to make it close the previous profile and open a new one. I imagine this would be most useful in long running applications where you’d leave profiling disabled (using the start=none option) and then call enable_profile and disable_profile around some specific code in specific situations – like certain requests to a mod_perl app.

There’s one more new feature that I’ve just realised I’d forgotten to add to the Changes file before the release: Timings per rolled-up package name. What’s that? Well, it’s probably easiest to show you…

These images are taken from a profile of perlcritic. Each shows the time spent exclusively in subroutines belonging to a certain package and any packages below it. Hovering over a time gives the percentage, so I can see that the 57.3s spent in the 36 PPI packages accounted for 42% of the runtime.

NYTProf 2.08 pkg1.png

This gives you a quick overview for large (wide) codebases that would be hard to get in any other way.

Tables are generated for upto five levels of package name hierarchy, so you can drill-down to finer levels of detail.

NYTProf 2.08 pkg2.png

 

NYTProf 2.08 pkg3.png

I can visualize a much better UI for this data than the series of tables nytprofhtml currently produces, but my limited free time and jQuery skills prevent me doing more. Patches welcome, naturally.

Enjoy!

p.s. I’ve a screencast from my NYTProf talk at the London Perl Workshop in November I hope to (finally) upload soon. It includes a demo of the package roll-up timing tables.