NYTProf v2 – A major advance in perl profilers

After much hacking, and just in time for OSCON, I’m delighted to announce the release of Devel::NYTProf version 2. A powerful, efficient, feature-rich perl source code profiler.

“If NYTProf 1 is a Toyota, then 2.0 is a Cadillac”
— Adam Kaplan, author of NYTProf 1.

The Short Story

(I’ve written up the long story, for the record, in another post.)

Adam forked Devel::FastProf (the best statement-level profiler at the time), added a test harness and tests, made some internal improvements, and grafted on an html report derived from Devel::Cover. The resulting Devel::NYTProf v1 was a big hit.

Meanwhile I’d been working on Devel::FastProf, contributing some interesting new profiling features, but it had no test harness and no html reports. When Adam released NYTProf I switched. Attracted not so much by the html report as by the test harness. (A lesson to anyone wanting to attract developers to an open source project.)

I started out by adding in the same new features I’d been adding to FastProf, albeit with more polish and tests. And then I got carried away…

“Holy shit! That is amazing.”
— Andy Lester, after using a recent development version.

Example Screen Shots

As an example I’ve used NYTProf to profile perlcritic 1.088 running on its own source code.

$ cd Perl-Critic-1.088
$ perl -d:NYTProf -S perlcritic .
$ nytprofhtml
$ open nytprof/index.html

The first image is the main index page, showing the top few subroutines and the start of the list of all source files.

NYTProf perlcritic index.png

Each source file has links to separate line-level, block-level, and sub-level reports (though I hope to merge them in future). Clicking on a subroutine name takes you to the line-level report for the file it's defined in and positions you at the subroutine definition.

(The color coding is based on the Median Absolute Deviation of all the values in the column, same as in NYTProf v1.)

Here's the all_perl_files() subroutine, for example:

NYTProf perlcritic all_perl_files.png

The colored numbers show the number of statements executed, the total time taken, and the average. Statement times are always exclusive: the time actually spent on that statement, its expressions, and any built-in functions it uses. They don't include time spent executing statements elsewhere in subroutines called from that statement. In NYTProf, subroutine timings are inclusive and statement timings are exclusive.
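For example, in a contrived sketch like this (lookup() and process() are hypothetical subs, not code from perlcritic), the comments describe how NYTProf would attribute the time:

    sub lookup {
        my ($key) = @_;
        return length($key) * 42;    # exclusive statement time is charged to this line
    }

    sub process {
        my $total = 0;
        $total += lookup($_) for 1 .. 1000;   # this line's exclusive statement time excludes
                                              # the time spent inside lookup(); that shows up
                                              # on lookup()'s own lines and in lookup()'s
                                              # inclusive subroutine timing (1000 calls from here)
    }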

Where did you come from and where are you going?

Notice the grey text.

On lines that define a subroutine NYTProf now adds 'comments' giving the total number of times the sub was called, the inclusive time spent in that sub, and the average. Then it adds a break-down of the same details for every location that called the subroutine. Here's a better example of that:

NYTProf sub-callers.png

On lines that call subroutines NYTProf now adds 'comments' giving the name of the actual subroutines called (resolving code refs to subroutine names, including the class that handled a method call). It also tells you how many calls were made and how much time was spent in that subroutine for calls made from that line. Here's an example:

NYTProf subs-called.png

When you mouse-over the grey text it turns black and you can click on embedded links to take you to the callers or callees. So with a few clicks you can run up and down the call stack, exploring where the time is being spent and where the hot spots are being called from. The ability to explore the code so easily, guided by these performance signposts, is incredibly useful.

Rolling up for a higher level view

Sometimes per-statement timing can be overwhelming. In large subroutines it becomes "hard to see the wood for the trees". So, for the first time in any Perl profiler, NYTProf now provides a block-level view of the timing data:

NYTProf perlcritic all_perl_files block level.png

What's happening here is that NYTProf is taking the same time measurements per statement, but instead of accumulating the time against the line the statement is on, it accumulates it against the line of the first statement in the enclosing block. (The actual line it accumulates it against isn't ideal in some cases. I'm hoping to improve that soon.)
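For instance, in a hypothetical loop like this (not code from perlcritic), the block-level view would roll the time for every statement in the loop body up to the line of the first statement in the block:

    my @files = glob('*.pm');                  # hypothetical file list
    for my $file (@files) {
        open my $fh, '<', $file or next;       # in the block-level view, the time for every
        my $source = do { local $/; <$fh> };   # statement in this loop body is accumulated
        my $count  = () = $source =~ /\w+/g;   # against the line of the first one (the open)
    }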

This report is a little more tricky to read but can be very useful, especially in large subroutines. (I hope to improve the readability with some css :hover magic in future.)

The subroutine-level report is very similar except that all the timings are accumulated against the line of the first statement in the subroutine.

Have a Look

Back in June I gave a talk at the Irish Open Source Technology conference where I showed the first version of the annotated html report (which I'd been hacking on till 3am the night before while struggling with a cold - presentations are great motivators). You can see the 15 minute video here or here.

Explore for yourself

I've uploaded the full set of reports for you to explore here. Take a look. Let me know what you think.

Devel::DProf – broken by the passage of time

Measure twice, cut once.

To measure the performance of your Perl code many developers instinctively reach for Devel::DProf. The venerable perl profiler dates back to 1995.

When profiling you have two broad choices: cpu-time vs wallclock time, and subroutine-level profiling vs line-level profiling. DProf is a subroutine-level profiler. (I’ll talk more about subroutine-level profiling vs line-level profiling in an upcoming post about the soon-to-be-released NYTProf 2.0.)

Measuring cpu-time should, in theory, give you a clear picture of the time spent by perl executing your code, independent of the load on the machine or delays waiting for i/o. Sadly life isn’t so simple.

The big problem is that on most systems cpu-time measurement has a resolution of 0.01 seconds (a HZ value of 100). That may not sound like much time, but you can execute a lot of perl code, and a lot of subroutine calls, in that time.
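As a rough illustration (this counts wallclock iterations of a trivial loop rather than cpu-time, and the number will vary enormously between machines), try seeing how much work fits into a single 0.01 second tick:

$ perl -MTime::HiRes=time -e '$t = time; $n++ while time() - $t < 0.01; print "$n iterations in 0.01s\n"'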

So any profiler using cpu-time with such a coarse granularity is going to give ‘noisy’ results. Some subroutines may appear to be slow because the cpu-time ‘tick’ happened to occur more often while those subs were running. Other subroutines may appear to be fast because the cpu-time ‘tick’ rarely happened while they were running.

Try playing around with the venerable Devel::DProf like this:

$ perl -we 'print "sub s$_ { sqrt(42) for 1..100 }; s$_({});\n" for 1..1000' > x.pl
$ perl -d:DProf x.pl
$ dprofpp
Total Elapsed Time = 0.047999 Seconds
  User+System Time = 0.047999 Seconds
Exclusive Times
%Time ExclSec CumulS #Calls sec/call Csec/c  Name
 20.8   0.010  0.010      1   0.0100 0.0100  main::s986
 20.8   0.010  0.010      1   0.0100 0.0100  main::s29
 20.8   0.010  0.010      1   0.0100 0.0100  main::s321
 0.00       - -0.000      1        -      -  main::s322
 0.00       - -0.000      1        -      -  main::s323
 0.00       - -0.000      1        -      -  main::s329
 0.00       - -0.000      1        -      -  main::s330

Clearly nonsense. Out of 1000 subroutines that would all take about the same length of time, a cpu-time 'tick' happened to occur in those three. The rest of the subroutines appear to be 'free'. (If you play with this yourself you'll probably need to adjust the 1..100 to see the effect on your system.)

Using dprofpp -r (to "display elapsed real times" instead of cpu-times) gives a very similar granular result. That's because Devel::DProf artificially reduces the resolution of real time to match the granular cpu-time! It's also odd that dprofpp doesn't provide an option to sort by elapsed time, but there'd be little value anyway given the low resolution.

Many people have reported strange results from Devel::DProf over the years. Design decisions that made sense back in 1995 are now causing problems as CPUs have got so much faster. Several have implemented alternative profilers.

Using cpu-time is effectively a kind of statistical sampling. That's fine as long as you remember it and interpret the results accordingly. The work-around for the coarse granularity is to execute the code for long enough for the effects to be averaged out. But that can be inconvenient, and how do you know how much is enough?

Personally I'd recommend using wallclock time for profiling. I want to know how much time was spent waiting for network, waiting for disk, waiting for cpu. Those are often significant factors in performance issues. Too many network round trips, or too many individual disk requests, for example, can easily kill performance. If you only profile cpu time you may not spot these kinds of problems. You may just be chasing the random ghosts of occasional cpu clock ticks.

 

So which profiler would I recommend? Stay tuned for an announcement in a day or so. (Update: easy! NYTProf 2.0.)

Concurrency and Erlang, and more

Just found the excellent Concurrency and Erlang page by André Pang. (I’m not sure how I got there, but I started from a post by Pedro Melo.)

The page has great links to quality articles and resources with commentary and context for each. It also includes sections specific to C, Objective-C, C++, Java, Python, JavaScript, and Haskell.

What, no Perl? Well, using threads in Perl 5 is rather painful. I’ve never had to use threads with Perl 5 (beyond making DBI thread safe a few years ago) and I’d be happy to never have to.

On the other hand, I believe people are using threads successfully, though I’ve no handy links for you beyond pointing out that CPAN offers a number of solid Thread:: modules.

All this reminded me that I’d never got around to reading Parrot’s Concurrency design document. So I did. I liked it as a statement of direction, though it’s a little thin on the interaction between schedulers.

I couldn’t find many interesting links discussing both Parrot and Erlang. An O’Reilly Radar post called Parrot and Multi-threading from September 2007 was hopeful.

I’m still wondering if Parrot could act as a virtual machine for Erlang. I think that would be a valuable test case for the quality and scalability of the concurrency design.

Irish Open Source Technology Conference – June 18th-20th

I’ll be speaking at the Irish Open Source Technology Conference this year. It’s on at Dublin’s CineWorld Complex, from June 18th for three days. They’re running a 2-for-1 offer on tickets at the moment.

I’ll be speaking about something Perl’ish, naturally.

The “Perl Myths” presentation I gave at IWTC earlier this year turned out to be a hit. (At least, it was after the event. There were fewer than ten people in the room at the time, including me! Perl clearly isn’t a hot topic among Irish web developers.)

My blog post, with embedded slides and video, has topped 7400 hits, plus another 3000 or so views on slideshare.

I’m upgrading my methods for this next talk. I’ve bought a great wireless USB microphone and the amazing ScreenFlow screencasting software to capture everything in detail.

So I’m going all high-tech. No more “camcorder perched on a chair at the back” for me!

It’ll be a good trial run for OSCON where I’m speaking in July.

Finding the cause of inexplicable warnings in XS code

Occasionally you may run across an odd warning like this:

   Use of uninitialized value in subroutine entry at X line Y

where the code at that line is a call to an XS subroutine (let's call it xsub()) and you're certain that the arguments you're passing are not undefined.

Somewhere, deep in the XS/C code, an undefined value is being used. But where? And why is perl reporting that line?

Perl is reporting the last line of perl code that was executed at the same or higher level in the stack. So other perl code, such as a callback, may have been executed between entering xsub() and the warning being generated, but that perl code must have returned before the warning was triggered.

Assuming the XS/C code is large and complex, like mod_perl, how can you locate the code that's triggering the warning?

Here's a trick I've used a few times over the years:

    # abort with a core dump when the warning we care about is generated
    $SIG{__WARN__} = sub {
        CORE::dump if $_[0] =~ /uninitialized value in subroutine entry/;
        warn @_;
    };

That makes the program abort and generate a core dump file at the point the warning is generated. You can then use a debugger, or Devel::CoreStack, to report the C call stack at the time. It's a savage but effective technique.

If the XS/C code was compiled with options to keep debug info (i.e., -g) then that'll show you exactly where in the XS/C code the undefined value is being used. If not, then it'll at least show you the name of the XS/C function and the overall call stack.
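For example, with gdb the core file can be inspected like this (the perl binary and core file paths are just placeholders; use whatever your system actually produced), where bt prints the C call stack:

$ gdb /usr/bin/perl ./core
(gdb) bt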

(The dump function is a curious vestige of old ways. You could instead send the process a core-dumping signal, such as kill('ABRT', $$); note that SIGKILL (signal 9) terminates the process without producing a core dump. I'm not sure about the portability of either, for this purpose, beyond unix-like systems.)

I suggested the technique to Graham Barr recently and it proved effective in tracking down the source of that warning in a very large mod_perl application. The warning pointed the finger at a $r->internal_redirect($uri) call. The actual cause was a PerlInitHandler returning undef. (The handler was an old version of DashProfiler::start_sample_period_all_profiles.)

Anyway, it dawned on me this morning that I should update the technique. It doesn't have to be so savage. On modern systems you don't need to shoot the process dead to get a C stack trace.

A few approaches came to mind:

  • spawn a "gcore $$" command (or similar) to get a core file from the running process
  • spawn a "pstack $$" command (or similar) to directly dump the stack trace from the running process
  • spawn a "gdb $$ &" (to attach to the running process) followed immediately by kill(17, $$) to send a SIGSTOP to the process to give time for the debugger to attach and for you to investigate the state of the live process.

I think the second of those would be most useful most of the time.
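Here's a minimal sketch of that second approach, assuming a pstack (or equivalent) command is available on the system; the warning pattern is the same one used above:

    $SIG{__WARN__} = sub {
        if ($_[0] =~ /uninitialized value in subroutine entry/) {
            # dump the C call stack of this live process without killing it
            system("pstack $$") == 0
                or warn "pstack failed (is pstack installed?): $?";
        }
        warn @_;
    };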

Hopefully this will be useful to someone.

TIOBE or not TIOBE – “Lies, damned lies, and statistics”

[I couldn't resist the title, sorry.]

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
- Mark Twain

I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).

I do want to add a few thoughts though…

The TIOBE Programming Community Index is built on two assumptions:

  • that the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
  • that the proportionality is the same for different languages.

It’s not hard to pick holes in both of those assumptions.

They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim. I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.

Even if you can accept their assumptions as valid, can you trust their maths? Back in Jan 2008, when I was researching views of perl, TIOBE was mentioned. So I took a look at it.

At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.

I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:

    Take perl and python, for example:

    I get 923,000 hits from google for +”python programming” and 3,030,000 for +”perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.) As reported by the “X-Y of approx Z results” at the top of the search results page.

    Using google blog search I get 139,887 for +”python programming” and 491,267 for +”perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.)

    So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.

    So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.

    Am I misunderstanding something?

I didn’t get a reply.

I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?

Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.

Update: the story continues.


Comparative Language Job Trend Graphs

I researched these comparative job trend graphs for my Keynote at the 2007 London Perl Workshop, and then added a few more for this blog post.

The graphs are from indeed.com, a job data aggregator and search engine. They’re all live, so every time you visit this page they’ll be updated with the current trend data (though it seems the underlying data isn’t updated often). My notes between the graphs relate to how they looked when I wrote this post in February 2008 (and the graphs were all Feb 2005 thru Dec 2008).

First up, all jobs that even mention perl, python or ruby anywhere in the description:

The most amazing thing to me about this graph is that it indicates that 1% of all jobs mention perl. Wow.

(Perhaps the profile of the jobs on indeed.com is a little skewed towards technical jobs. If it is then I’m assuming it’s equally skewed for each of the programming languages. Note: An addendum below shows that ruby is getting a ~17% boost through false positive matches from other jobs, like Ruby Tuesday restaurants. That applies to the graphs here that don’t qualify the search with an extra term like ‘software engineer’.)

Here’s a slightly more focussed version that compares languages mentioned in jobs for “software engineer” or “software developer” roles:

'software engineer' and 'software developer' roles mentioning perl or python or ruby

A similar pattern. The narrowing of the gap between Perl and the other languages looks like good evidence of Perl’s broad appeal as a general purpose tool beyond the pure “software engineering/development” roles.

I wanted to focus on jobs where developing software using a particular language was the principal focus of the job. So then I looked for “foo developer” jobs:

perl developer vs python developer vs ruby developer

That increases the gap between Perl and the others. Perhaps a reflection of Perl’s maturity – that it’s more entrenched so more likely to be used in the name of the role.

But do people use “foo developer” or “foo programmer” for job titles? Let’s take a look:

So “foo developer” is the most popular, but “foo programmer” is still significant, especially for Perl. (It’s a pity there’s no easy way to combine the pairs of trend lines. That would raise Perl even further.)

To keep us dynamic language folk in our place, it’s worth comparing the trends above with those of more static languages:

same as above but with C, c# and c++

C++ and C# dwarf the dynamic languages. C and cobol are still alive and well, just.

Then, to give the C++ and C# folk some perspective, let’s add Java to the mix:

same as above but with java

C++ and C# may dwarf the dynamic languages, but even they are dwarfed by Java.

Let’s take a slight detour now to look at web related work. (It’s a detour because this post isn’t about web related work, it’s about the jobs market for the three main general purpose dynamic languages. People doing web work can tend to assume that everything is about web work.)

We’ll start by adding in two more specialist languages, PHP and JavaScript:

php and javascript developer

I’m not surprised by the growth of PHP, though I’m sad that so many people are being introduced to ‘programming’ through it. I’m more surprised by the lack of height and growth in JavaScript. I presume that’s because it’s still rare for someone to be primarily a “JavaScript developer”. (That’ll change.) Let’s check that:

perl, python, ruby, php, javascript, web-developer

That’s much closer to what I’d expected. PHP is a popular skill, but is mentioned in fewer than half as many jobs as Perl is. JavaScript, on the other hand, is in great and growing demand.

Let’s look at the “web developer” role specifically and see which of the languages we’re interested in are mentioned most frequently:

I think this graph captures the essence of why people think Perl is stagnant. It’s because Perl hasn’t been growing much in the ‘web developer’ world. People in that world are the ones most likely to be blogging about it and, I’ve noticed, tend to generalize their perceptions.

(If you’re interested in PHP, Java, ASP and JavaScript and look here you’ll see that they all roughly follow the PHP line at about twice the height. JavaScript is at the top with accelerating growth.)

Finally, just to show I’m not completely biased about Perl, here are the relative trends:

relative trends

This kind of graph reminds me of small companies that grow by a small absolute amount, say two employees growing to four, and then put out a press release saying they’re the “fastest growing company” in the area, or whatever. Dilbert recognises the issue. The graph looks striking now (Q1 2008) but means little. If it looks much like that in two years’ time, then it’ll be more impressive.

Similarly, the fact that Perl is still growing its massive installed base over this period is impressive. (Seen most clearly by the second graph.) Perl 5 has been around for 14 years, and Perl itself for 21.

The Perl community hasn’t been great at generating “Buzz” that’s visible outside the community. It’s just quietly getting on with the job. Lots of jobs. That lack of buzz helps create the impression that the Perl community lacks vitality relative to other similar languages. Hopefully this post, and others, go some small way towards correcting that.

p.s. For an alternative, more geographic view, take a look at the Dynamic Language Jobs Map (about).

Addendum:

It turns out that approximately 14% of “ruby” jobs relate to restaurants – mostly the Ruby Tuesday chain. So I investigated how false positives affected the single-keyword searches I’ve used in some of the graphs. (I’m going to assume that “foo developer” is sufficiently immune from false positives.)

I searched for Perl and then added negative keywords (-foo -bar …) until I’d removed almost all of the likely software related jobs. I ended up with this list (which shows that indeed.com don’t use stemming, which is sad and dumb of them):

    perl -developer -developers -engineer -software -programmer -programmers -programming -development -java -database -sql -oracle -sybase -scripting -scripter -coder -linux -unix -protocol -C -C++ -javascript -computing

Then I did the same search but with python or ruby instead of perl. Here are the results:

    language   all matches   filtered matches   inappropriate (false) matches
    perl       29987         6                  0.02%
    python     7794          20                 0.2%
    ruby       4624          794                17%

Ruby is well below python (and far below perl) in the first graph, yet that includes this 17% boost from inappropriate matches. You have to marvel at Ruby’s ability to gain mind-share, if not market-share.