NYTProf v4 – Now with string-eval x-ray vision!

I released Devel::NYTProf v3 on Christmas Eve 2009. Over the next couple of months a few more features were added. The v3 work had involved a complete rewrite of the subroutine profiler and heavy work on much else besides. At that point I felt I’d done enough with NYTProf for now and it was time to focus on other more pressing projects.

Over those months I’d also started working on enhancements for PostgreSQL PL/Perl. That project turned into something of an epic adventure with more than its fair share of highs and lows and twists and turns. The dust is only just settling now. I would have blogged about it but security issues arose that led the PostgreSQL team to consider removing the plperl language entirely. Fortunately I was able to help avoid that by removing Safe.pm entirely! At some point I hope to write a blog post worthy of the journey. Meanwhile, if you’re using PostgreSQL, you really do want to upgrade to the latest point-release.

One of the my goals in enhancing PostgreSQL PL/Perl was improve the integration with NYTProf. I wanted to be able to profile PL/Perl code embedded in the database server. With PostgreSQL 8.4 I could get the profiler to run, with some hackery, but in the report the subroutines were all __ANON__ and you couldn’t see the source code, so there were no statement timings. It was useless.

The key problem was that Devel::NYTProf couldn’t see into string evals properly. To fix that I had to go back spelunking deep in the NYTProf guts again; mostly in the data model and report generation code. With NYTProf v4, string evals are now treated as files, mostly, and a whole new level of insight is opened up!

In the rest of this post I’ll be describing this and other new features.

Seeing Into String Evals

Let’s start by taking a look at a small example:

perl -d:NYTProf -e 'eval("sub { $_ }")->() for 1,2,2'

That executes three string evals, each of which defines an anonymous subroutine which is then executed. Two of the three evals have identical source code.

With NYTProf 3.11 the report for the “-e file” looked like this:

NYTProf eval-sub example1 311.png

Two key things to note: Firstly, there’s no link to drill-down to see the actual source code executed by the eval. (In this example we can see the source, but that’s rare in practice.) Secondly, the three anonymous subroutines have been merged. You can’t see individual details like call counts, callers, or timings.

(In case you’re wondering, the main::BEGIN subroutine is defined as a side effect of loading NYTProf, and the main::RUNTIME subroutine is a dummy created by NYTProf to act as the ‘root caller’. You’ll see it appear as the caller of the anonymous subs in a later screenshot.)

With NYTProf 4.00 the same report looks like this:

NYTProf eval-sub example1 400.png

Now you can see much more detail right there. The two evals with identical source code have been merged, as have the identical anonymous subroutines defined by them. The eval and anonymous sub with different source code have been kept separate. What you can’t easily see from the image is that the “string eval” texts in the grey annotation are links. This is where it’s gets more interesting…

Clicking on the “2 string evals (merged)” link takes us to a typical NYTProf report page showing the performance-annotated source code executed by the eval:

NYTProf eval-sub example1 eval.png

What you’re looking at here is source code that never existed as a file. (That second line containing a semicolon was added by perl as part of the implementation of eval.)

In the table at the top, you’ll see “Eval Invoked At” with a link that’ll take you to the eval statement that executed this source code. You’ll also see a “Sibling evals” row. That’s added in cases where an eval was executed multiple times and not all were merged into a single report. Finally, because this particular eval includes data merged from others, the report includes a clear banner alerting you to how many evals were merged to produce this report page.

Update: You’ll need a recent version of perl (5.8.9+, 5.10.1+, or 5.12+) to see the eval source code.

So why does NYTProf go to all the effort of merging evals and anonymous subs? Here’s a real-world example using a profile of perlcritic:

NYTProf eval-sub perlcritic1.png

Without merging, those 4196 evals would have produced 4196 report pages! Their sheer volume making them almost useless. Now it’s clear from the report that only a few distinct source code strings are used and the reports for each are far more useful.

In that example perlcritic is compiling lots of tiny snippets of code. Many applications use string eval to compile large quantities of code. The ORLite module is one example. It dynamically generates and compiles a large chunk of code with many subroutines that implement a customized interface for a specific SQLite database file.

With NYTProf 3.11 you couldn’t see the hundreds of lines of source code, or the per-caller subroutine performance, or the individual statement performance. All you could see was a list of subs calls and the overall time spent in each:

NYTProf eval-sub cpangraph 311.png

That was better than nothing, but now, with NYTProf v4, you can see everything!

Aside…

I see a SQLite in the Distance

Getting all this working correctly, especially the data model manipulations required for merging evals and anonymous subroutines, was far more painful than I’d anticipated. I both blessed and cursed the test suite on numerous occasions!

I think it would be wise for NYTProf reporting and data model code to read the data from, and manipulate the data in, an SQLite database. That would yield simpler more maintainable code. It would also be enable nytprofhtml to be used for presenting performance data from other sources, including perl6.

If you’re interested in working on this, starting with a utility to load an nytprof.out file into a SQLite database, please contact the mailing list.

String Eval Timings

There’s a slight caveat worth noting about the timings shown for string evals that define subroutines.

Timings for string evals are taken from the statement profiler (the subroutine profiler doesn’t pay attention to evals). So the “time spent executing the eval” is the sum of the time spent executing any statements within the eval.

That’s fine for evals that don’t define subroutines. For those that do, the time for the eval includes not only the time spent executing the eval itself but also time spent executing statements in subroutines defined within the eval but called later from outside it.

Hence the careful wording you can see in the example from perlcritic shown previously:

# spent  2.65s executing statements in 1369 string evals (merged)
# includes 1.83s spent executing 122015 calls to 1 sub defined therein.

I could have automatically subtracted the subroutine time from the eval time but I was wary about doing that (for some reason that currently escapes me). Maybe that’ll change in the near future. Further out, a future version of NYTProf might use the subroutine profiler logic to time evals. That’s a deeper change that would give a more natural view of the timings.

Other Changes in v4

Subroutines that couldn’t be associated with a perl source file, such as xsubs in packages with no perl source, used to not appear in reports at all. So associated caller and timing information couldn’t be seen. Now it can.
Similarly, subroutine calls that couldn’t be associated with a specific line, such as calls made by perl to END blocks, are now shown in reports. They appear as as calls from line 0.
NYTProf v3 added renaming of BEGIN subs, so a BEGIN (or use) on line 3 would be called BEGIN@3 and so kept distinct from others in the same package. NYTProf v4 takes that further by detecting the rare cases where the modified name isn’t unique and adding a sequence number to it, like BEGIN@3.59.
NYTProf v2 added the savesrc option to enable storing a copy of the profiled perl source code into the profile data file itself. This makes report generation immune from later changes to the source files. NYTProf v4 now enables that option by default.
The report generator used to only generate report lines up to the maximum number of source lines present. If there was no source code available, for whatever reason, you’d get an empty report for that file, even though there was useful information to report. Now the report generates enough lines to ensure all available profile information gets included. This is especially useful for old perl versions where source code is more likely to be unavailable. Also, the report generator now collapses groups of three or more blank lines.
Nicholas Clark contributed changes to refine the timing of the beginning and end of profiling. Now END blocks defined at runtime are included in the profile and compilation-only checks (e.g., perl -c) can also be profiled.
You may be aware that the POSIX::_exit function exits the process immediately, without flushing stdio buffers and without giving perl a chance to clean up. That means NYTProf didn’t get a chance to finish and the profile wasn’t usable. NYTProf v4 now intercepts calls to POSIX::_exit and cleans up properly.
Finally, tired of waiting for nytprofhtml to produce a report from a long profile run? The new --minimal (-m) option makes nytprofhtml skip building reports for the rarely used ‘blocks’ and ‘subs’ levels of detail and skips generating the graphviz .dot files. That saves a lot of time.

Enjoy!

Translations: Belorussian translation thanks to Bohdan Zograf.

10 thoughts on “NYTProf v4 – Now with string-eval x-ray vision!”

Vyacheslav Matjukhin

June 9, 2010 at 9:11 pm

Hi!
Thanks for your great work, this is a really exciting feature which i wanted for a very long time.

But it strangely doesn’t work for me, generated report contains this comment:
> No source code available for string eval (eval 50)[lib/PPB/Join/Abstract.pm:151].
> See savesrc option in documentation.
And after that, many lines with times and “spent XXX ms making XXX calls” comments, but all code lines are empty.

Am I missing something?
- TimBunce
  
  June 9, 2010 at 9:31 pm
  
  I’d guess that your perl version is too old. Did you read the savesrc section in the documentation and try using the use_db_sub=1 option that it recommends?
Vyacheslav Matjukhin

June 9, 2010 at 9:43 pm

Oops, i didn’t noticed note about perl version in documentation. I’m using 5.8.8 (system perl from ubuntu hardy).

But use_db_sub=1 doesn’t seem to have any effect…
- Vyacheslav Matjukhin
  
  June 9, 2010 at 10:12 pm
  
  I reported this issue with simplest example on rt.perl.org (or maybe it is still my mistake, i’m not sure).
  Anyway, on perl 5.10 it works perfectly.
  - TimBunce
    
    June 10, 2010 at 9:35 am
    
    You’ll need a recent version of perl (5.8.9+, 5.10.1+, or 5.12+) to see the eval source code. I wasn’t clear on that myself. The confusion arose because use_db_sub=1 does allow old versions of perl to record the “perl -e …” and “perl -” source code. I’ve added an update to the blog post and I’ll make a 4.01 release soon. Thanks for prompting me to clarify this!
Alex Kapranoff

June 14, 2010 at 5:52 pm

It would be very interesting to know what you managed to replace Safe.pm with (and why).
- TimBunce
  
  June 14, 2010 at 8:38 pm
  
  And thereby hangs a tale! It’ll have to wait for a blog post though. Most of the discussion took place on the (unarchived) plsql-security list. The final patch is here if you’d like to dig that deep.
Fulko Hew

July 10, 2010 at 12:44 am

I think NYTProf is a wonder tool. I looked at my MIB/ASN.2 compiler I wrote in Perl, and the results are interesting. But most of the code I want to look at are CGI’ish stuff. How would I get NYTProf to profile their execution?
Pingback: java2perl6api – Java to Perl 6 API tranalation – What, Why, and Whereto « Not this…
Pingback: NYTProf v4 – Now with string-eval x-ray vision! | Blog Irakli Nishnianidze

Comments are closed.

	A Comparison of Auto… on A Comparison of Automatic Spee…
	A Comparison of Auto… on A Comparison of Automatic Spee…
	TimBunce on A Comparison of Automatic Spee…
	alexmelman on A Comparison of Automatic Spee…
	TimBunce on A Comparison of Automatic Spee…