At OSCON this year I gave a talk on my new pet project Devel::NYTProf v2 to a packed room. Turned out to be a lot of fun.
“The first thing I need to do is talk about Devel::DProf because it needs to be taken out and shot.”
I made a screencast of the 40 minute talk which you can watch on blip.tv here. Worth watching for the background on profilers, the demo of NYTProf, and the questions, not to mention the teasing I get along the way.
One of the final questions was about the size of the profile data file that NYTProf produces. One of the major drawbacks of statement-level profiling is the volume of data it generates while profiling your code. For every statement executed the profiler streams out the file id, the line number, and the time spent. For every statement! When trying to profile a full application doing real work the volume of data generated quickly becomes impractical to deal with. Multi-gigabyte files are common.
This was the major problem with Devel::SmallProf, which generated text files while profiling. Salvador Fandiño García addressed that in Devel::FastProf by writing the data in a compact binary form. A vast improvement that contributed to Devel::FastProf (on which Devel::NYTProf is based) being the first statement-level profiler worth using on large applications. Even so, the volume of data generated was still a problem when profiling all but short running applications.
NYTProf 2.03 was producing profile data at the rate of about 13MB per million statements executed. That might not sound too bad until you realise that on modern systems with cpu intensive code, perl can execute millions of statements every few seconds.
I could see a way to approximately halve the data volume by changing the format to optimize of the common case of consecutive statements being in the same file, but that wasn’t going to be enough. The best way forward would be to add zip compression. It would be easy enough to pipe the output stream through a separate zip process, but that approach has a problem: the zip process will be soaking up cpu time asynchronously from the app being profiled. That would affect the realtime measurements in an unpredictable way.
I realized back in June that a better approach would be to embed zip compression into NYTProf itself. Around the end of July Nicholas Clark, current Perl Pumpkin, got involved and was motivated to implement the internal zipping because he was “generating over 4Gb of profile data trying to profile the harness in the Perl 5 core running tests in parallel“.
He did a great job. The zlib library is automatically detected at build time and, if available, the code to dynamically route i/o through the zip library gets compiled in. The output stream starts in normal mode, so you can easily see and read the plain text headers in the data file, then switches to zip compression for the profile data. How well did it work out? This graph tells the story:
(The data relates to profiling perlcritic running on a portion of its own source code on my MacBook Pre 2GHz laptop. I only took one sample at each compression level so there may be some noise in the results.)
The data file size (red) plummets even at the lowest compression level. Also note the corresponding drop in system time (yellow) due to the reduction in context switches and file i/o.
I’ve set the default compression level to 6. I doubt you’ll want to change it, but you can by adding
compression=N to the
NYTPROF environment variable.
Here are the change notes for the 2.04 release:
Fixed rare divide-by-zero error in reporting code.
Fixed rare core dump in reporting code.
Fixed detection of #line directives to be more picky.
Fixed some compiler warnings thanks to Richard Foley.
Added on-the-fly ~90% zip compression thanks to Nicholas Clark.
Reduces data file size per million statements executed
from approx ~13MB to ~1MB (depends on code being profiled).
Added extra table of all subs sorted by inclusive time.
No longer warns about '/loader/0x800d8c/...' synthetic file
names perl assigns reading code from a CODE ref in @INC