Relaxed Perl Myths in Ann Arbor

Speaking of belated screencasts, I also haven’t blogged about my visit to the Ann Arbor Perl Mongers in Michigan.

The Ann Arbor Perl Mongers group was being restarted (after a 10 year gap) by the TigerLead tech team. I’m working for TigerLead and was going to be in Ann Arbor for a meeting so they asked me to give a couple of talks: Devel::NYTProf and Perl Myths.

I like giving talks at events like these because there’s no set time limit and the audience is more relaxed (the free pizza probably helped).

I’ve uploaded a screencast of the Perl Myths talk. As usual it covers the Perl jobs market, CPAN, best practices, power tools, community and perl6. At almost 1 hour 20 minutes it’s significantly longer than my usual, more rushed, 40 minute version given at conferences and includes 15 minutes of Q & A at the end.

Customer Relationship Management (CRM) systems in Perl

I’m looking for a CRM system implemented in Perl. As it turns out, so are the Perl Foundation.

So I thought I’d summarize my interpretation of the comments on that thread, as much for my own benefit as yours, and see if this post flushes out any further information.

We’ll start with the smaller/personal projects and work up from there…
Continue reading

Has NYTProf helped you? Tell me how…

At OSCON this year1 I’m giving a “State-of-the-art Profiling with Devel::NYTProf” talk. It’ll be an update of the one I gave last year, including coverage of new features added since then (including, hopefully, two significant new features that are in development).

This year I’d like to spend some time talking about how interpret the raw information and using it to guide code changes. Approaches like common sub-expression elimination and moving invariant code out of loops are straight-forward. They’re ‘low hanging fruit’ with no API changes involved. Good for a first-pass through the code.

Moving loops down into lower-level code is an example of a deeper change I’ve found useful. There are many more. I’d like to collect them to add to the talk and the NYTProf documentation.

So here’s a question for you: after looking at the NYTProf report, how did you identify what you needed to do to fix the problems?

I’m interested in your experiences. How you used NYTProf, how you interpreted the raw information NYTProf presented, and then, critically, how you decided what code changes to make to improve performance. What worked, what didn’t. The practice, not the theory.

Could you to take a moment to think back over the times you’ve used NYTProf, the testing strategy you’ve used, and the code changes you’ve made as a result? Ideally go back and review the diffs and commit comments.

Then send me an email — tell me your story!

The more detail the better! Ideally with actual code (or pseudo-code) snippets2.

  1. OSCON is in San Jose this year, July 20-24th. You can use the code ‘os09fos’ to get a 20% discount.
  2. Annotated diff’s would be greatly appreciated. I’ll give credit for any examples used, naturally, and I’ll happily anonymize any code snippets that aren’t open source.

Examples of Modern Perl

In the spirit of re-tweeting, this is a short post to highlight some great examples of “modern perl”. (I’m using the term modern perl very loosely, not referring specifically to any one book, website, or module or organization.)

Firstly I’d like to highlight a couple of recent posts by Jonathan Rockway:

* Unshortening URLs with Modern Perl (also available here). An interesting example application built with modern perl modules like MooseX::Declare, MooseX::Getopt, HTTP::Engine, AnyEvent::HTTP, TryCatch, and KiokuDB.

* Multimethods. Another great example from Jonathan highlighting the combined power of MooseX::Types, MooseX::Declare, and MooseX:: MultiMethods.

Then, from his work at the BBC, Curtis “Ovid” Poe has given us a great series of thoughtful posts on the benefits of replacing multiple inheritance with roles in a complex production code-base. The slides of his Inheritance vs Roles talk is a good place to start. Then dive in to the blog posts back here and work your way forward.

I ♥ modern perl!

Generate Treemaps for HTML from Perl, please.

Seeing this video of treemap for perlcritic memory usage reminded me of something…

I’d really like to be able to use treemaps in NYTProf reports to visualize the time spent in different parts, and depths, of the package namespace hierarchy. Currently that information is reported in a series of tables.

A much better interface could be provided by treemaps. Ideally allowing the user to drill-down into deeper levels of the package namespace hierarchy. (It needn’t be this flashy, just 2D would be fine :-)

In case you’re not familiar with them, treemaps are a great way to visualise hierarchical data. Here’s an example treemap of the disk space used by the files in a directory tree (from schoschie) 82244D56-FA2C-4044-AE46-EE53B63861BE.jpg

Perl already has a Treemap module, which can generate treemap images via the Imager module. Treemap is designed to support more output formats by sub-classing.

I guess it wouldn’t be hard to write a sub-class to generate HTML client-side image map data along with the image, so clicks on the image could be used to drill-down into treemaps that have more detail of the specific area that was clicked on.

More interesting, and more flexible, would be a sub-class to generate the treemap as a Scalable Vector Graphics diagram using the SVG module (and others).

I’m not going to be able to work on either of those ideas anytime soon.

Any volunteers?

NYTProf screencast from the 2008 London Perl Workshop

I’ve uploaded the screencast of my NYTProf talk at the London Perl Workshop in November 2008.

It’s based on a not-quite-2.08 version and includes some coverage of an early draft of the ‘timings per rolled-up package name’ feature I discussed previously.

It also shows how and why anonymous subroutines, defined at the same line of different executions of the same string eval, get ‘merged’.

The demos use perlcritic and Moose code. It also includes a nice demonstration showing NYTProf highlighting a performance problem with File::Spec::Unix when called using Path::Class::Dir objects.

It’s 36 minutes long, including a good Q&A session at the end (wherein a market rate for performance improvements is established). Enjoy.

NYTProf v2 – the background story

This is the back story to the development of NYTProf v2.

Earlier this year (2008) I needed to do some performance profiling of the source code of a large application. Like many perl developers, my first instinct was to try using Devel::DProf (actually the Apache::DProf wrapper as it was a mod_perl application). It was not a great experience.


DProf seems easily confused by unusual control flow, spewing “… has unstacked calls” warnings. Also, the subroutines it said were taking the most time didn’t make sense to me. Eventually I worked out that Devel::DProf is effectively broken.

The application I was trying to profile has quite a few large subroutines, so knowing just the time spent in the subroutine as a whole didn’t help much for those. I wanted to know where in the subroutine the time was being spent.


That led me to look at line-based profilers. At the time there as only one worth looking at, Devel::FastProf by Salvador Fandiño García (which was based on Devel::SmallProf by Ted Ashton).

Line profilers spit out a stream of “file id, line number, time spent” records to a file. The time between starting one statement and starting the next is measured and associated with the line number of the statement.

Devel::FastProf was great. Fast and effective. It gave me far more accurate timings, and when I made changes in the code it highlighted I could see an immediate effect on performance.

Devel::FastProf was great, but I wanted more. The lack of subroutine level timing was frustrating. When you have a ~100,000 lines of code, knowing the time spent on each, and how many times it was executed, is less useful than you may think – there’s just too much detail. Especially when looking for structural problems in the code, or for good places to add caching, or pass extra information down a call chain to save expensive calls deeper in the code. There’s a need for both subroutine and line level timings when profiling.


I’d had an idea: The line number output in the FastProf profile need not be the line number of the statement. It could output the line number of the subroutine containing the statement. Then you’d automatically get subroutine level timings! Simple.
Then I wondered if it was possible to find the line number of the block the statement was in. That would give block level timing! A first for any perl profiler.

My perl internals knowledge was getting rusty as it was a few years since I’d been Pumpkin for the 5.4.x release, so I asked the wizards on perl5-porters. They gave me hope and enough clues to get going.

Salvador kindly moved the Devel::FastProf code to a public svn, so I could contribute more easily, and I started hacking. I added code to find the nearest enclosing block or sub and it proved very useful.

When I’m optimizing I start by identifying “locally inefficient code”. That is, code you can optimize without significant structural changes. Without making changes outside the subroutine. Moving code outside loops is a common example. Subroutine timings identify the hot subs, then block and line timings pinpoint the hot spots in the code.

That’s the low hanging fruit. Easy pickings, and often very effective. But there’s a limit to how far that’ll take you.

Callers and Callees

There are two ways to optimize a hot piece of code: make it faster, or execute it less often. The former tends to get much more attention than the latter. It’s important to remember to keep stepping back. To keep looking for the bigger picture.

When I’m optimizing I often use a well defined chunk of work, like 10 requests to the same URL, so I can see of the number of times a subroutine is called “feels right”. That often shows subs being called “too often”. But what then? You need to know why the sub is being called too often, so you need to know where it’s being called from.

In Devel::FastProf I added counting of subroutine callers two-levels up the call stack. So I could see that foo() was called 10 times by bar() and that of those 10 calls from bar() 7 has come from baz() and 3 from boo(). It was very simplistic, slow (implemented in perl) and only had counts, not timings, but proved very useful.

Using these additions to Devel::FastProf I reduced the CPU usage of the application by over 40%. Not bad. (I could see another 10% or so to be gained fairly easily but had to draw the line somewhere.)


Meanwhile Adam Kaplan had forked Devel::FastProf, added a test harness and tests, made some internal improvements, grafted on an html report derived from Devel::Cover, and released it as Devel::NYTProf.

When I saw NYTProf I switched to working on that. Again, Adam was kind enough to move the code to a public svn repository. I was attracted not so much by the html report as by the test harness. A lesson to anyone wanting to attract developers to an open source project!

Testing profilers is hard and Adam had come up with a good basic testing framework which was easy to extend. NYTProf now has a strong test suite that profiles 19 different perl scripts with four different combinations of profiler options. The test suite has proven invaluable in identifying regressions during development, and for identifying portability issues between perl versions.

I re-implemented the block/sub level profiling and the subroutine caller tracking from FastProf in NYTProf, but with more care, more attention to performance, and now tests.

I was particularly pleased with the subroutine caller tracking. It intercepts the entersub opcode and uses the save stack for storage and to trigger a ‘destructor’ call to end the timing when the subroutine is exited by any means. The end result is an extremely fast and robust subroutine call profiler. I plan to add an option to disable the other profiler so you can just get subroutine profile details when you don’t need statement level details. It currently lacks the ability to give exclusive times but I think I’ve an efficient solution for that. (Update: Implemented in r340 and r343 so will be in the 2.02 release.)


Another key innovation was to fix a fundamental problem inherent in all statement profilers. Consider a statement that calls a subroutine and then performs some other work that doesn’t execute new statements, for example:

    foo(...) || mkdir(...);

In all other statement profilers the time spent in remainder of the expression (mkdir in the example) will be recorded as having been spent on the last statement executed in foo()! Here’s another example:

    while (<>) {

After the first time around the loop, any further time spent evaluating the condition (waiting for input in this example) would be be recorded as having been spent on the last statement executed in the loop!

I fixed this in NYTProf by intercepting all the opcodes which indicate that control is returning into some previous statement and adjusting the profile accordingly.


As much effort, if not more, went into the reporting side of the code. And there’s a lot more to be done there. My goal is to keep growing the data model classes to the point where any reporting code can get the information it needs easily enough that there’s no longer a need for the rather limiting Reporter class.

I’d like to see a single ‘nytprof’ command line tool that loads a class to generate the report. That would replace nytprofhtml and nytprofcsv. That would make it easy for other developers to release ‘nytprof reporting modules’ to CPAN.

For example, one very useful report that FastProf has but NTProf currently lacks is a list of most expensive lines (or blocks, or subs) output in the format used by compiler error messages. The format is important because most editors have a special mode for reading such files that means you can hop from one ‘most expensive line’ to the next with a single key stroke. (For vim that’s called quickfix mode.) That’s a wonderful way to browse the hotspots and make edits on the spot.


There are many, many, ways NYTProf can be enhanced further. As I’ve worked on it I’ve dumped ideas, issues and random notes into the HACKING file.


I’d like to end by expressing my thanks to Salvador Fandiño García and especially Adam Kaplan for allowing me to contribute to the modules they created and tolerating my strong ideas with understanding. Thank you both. It’s been quite a ride.