At OSCON this year I’m giving a “State-of-the-art Profiling with Devel::NYTProf” talk. It’ll be an update of the one I gave last year, including coverage of new features added since then (including, hopefully, two significant new features that are in development).
This year I’d like to spend some time talking about how to interpret the raw information and use it to guide code changes. Approaches like common sub-expression elimination and moving invariant code out of loops are straightforward. They’re low-hanging fruit with no API changes involved, good for a first pass through the code.
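As a sketch of what I mean (hypothetical subroutine names, not from any real report), moving invariant code out of a loop looks like this:

```perl
use strict;
use warnings;

# Before: uc($prefix) is loop-invariant but gets recomputed on every pass.
# NYTProf's per-statement times make this line stand out for large @items.
sub tag_slow {
    my ($prefix, @items) = @_;
    my @out;
    for my $item (@items) {
        push @out, uc($prefix) . ":$item";
    }
    return @out;
}

# After: compute the invariant expression once, before the loop.
sub tag_fast {
    my ($prefix, @items) = @_;
    my $tag = uc($prefix);    # hoisted out of the loop
    my @out;
    for my $item (@items) {
        push @out, "$tag:$item";
    }
    return @out;
}
```

Both produce identical results; the only difference is how often the invariant expression runs, which is exactly the kind of thing the per-statement timings highlight.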
Moving loops down into lower-level code is an example of a deeper change I’ve found useful. There are many more. I’d like to collect them to add to the talk and the NYTProf documentation.
So here’s a question for you: after looking at the NYTProf report, how did you identify what you needed to do to fix the problems?
I’m interested in your experiences. How you used NYTProf, how you interpreted the raw information NYTProf presented, and then, critically, how you decided what code changes to make to improve performance. What worked, what didn’t. The practice, not the theory.
Could you take a moment to think back over the times you’ve used NYTProf, the testing strategy you’ve used, and the code changes you’ve made as a result? Ideally go back and review the diffs and commit comments.
Then send me an email — tell me your story!
The more detail the better! Ideally with actual code (or pseudo-code) snippets.
11 thoughts on “Has NYTProf helped you? Tell me how…”
Pingback: Using NYTProf to understand why not to do something. « Elliot Loves Perl
Several months back I managed to get nytprof running on one of our development web servers at Craigslist to figure out why the code that renders our search result pages used so much CPU time. It just didn’t add up in my head.
After a few runs and some data analysis, I found that the CPAN module we used to help with time conversion (to display date/time in the user’s local time) was terribly inefficient and was getting called literally thousands of times per second. I found a better way to do this that used all built-in Perl features and it radically reduced the load on those boxes.
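(The module and the replacement aren’t shown here; as a hedged illustration only, formatting an epoch timestamp using nothing but core Perl — rather than constructing a heavyweight date/time object per call — might look like this hypothetical helper:)

```perl
use strict;
use warnings;
use POSIX qw(strftime);    # strftime is in the core POSIX module

# Format an epoch timestamp as a human-readable local time string.
# localtime() + strftime() avoid per-call object construction, which
# matters when a page render calls this thousands of times.
# (Illustrative sketch, not the actual Craigslist code.)
sub local_time_string {
    my ($epoch) = @_;
    return strftime('%Y-%m-%d %H:%M:%S', localtime($epoch));
}
```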
Ever since then I’ve been a big fan. nytprof presented that data in a way that made it *very* easy for me to understand the code flow in code I was still fairly new to at the time.
If memory serves, this shaved about 20ms (or 20%) off our typical response time in search, not to mention increasing overall capacity.
Hi Jeremy! (long time no chat)
Any chance you could email me some diffs?
You make a good point re ‘understanding the code flow’. I used it that way when I started at Shopzilla and found it invaluable for understanding the flow of the code base I was working on. I’ll add a note about that to the talk. Thanks.
Yeah, let me see if I can capture the essence of the change in a diff. It was across a few files, IIRC. But maybe it’s easy…
I used it recently to optimize one of our batch processes. With the clear and “actionable” information I had it going ~15-50% faster (depending on the dataset) within a few hours. Since it’s a process that can take days (or more!) to run, the speedup was significant.
One thing I wish it had was an easy way to see all the timestamps as “inclusive”. For my use at least I’m more often than not looking at those timestamps and right now I have to “hunt” for them manually (unless there’s an option I missed!).
Hi Ask! That’s great. Any chance you could email me some diffs?
Re inclusive times… The annotations injected into the source as comments show inclusive times. When looking at the report for a source file you can click on the columns of the subroutine summary table at the top to sort by inclusive time. Same goes for the ‘subroutine index’ page. Is that what you’re looking for? (It’s not obvious that they’re clickable.)
You probably remember when Jesse kept bringing up issues with profiling load time. I was the one he was giving profile output to, and using his app’s code I was able to shave about 30% off of Moose’s startup time and the startup time of Moose based code if memory serves me right.
Devel::NYTProf has two big advantages over all the other profilers as far as I’m concerned:
1. It seldom breaks. Many code bases break the other profilers, so an app big enough to need profiling is usually big enough to be unprofilable with other tools (though I have seen some strange things happen: http://github.com/nothingmuch/kiokudb/commit/0cc86555aebc8be8b17c26fc32af36dc07260e0b )
2. The output is very clear and very useful. The clean layout makes it easy to focus on the important data.
Hi Yuval. “30% off of Moose’s startup time” is a nice sound-bite :)
NYTProf is getting a lot of love, but I’m getting few offers of actual real-world diffs.
Could you send me some diffs of the changes you made? Or perhaps just a range of revision numbers so I can get the diffs.
I took a look at the “use constant” change you linked to but I couldn’t reproduce any problems.
(Though I don’t know what problem you had as the comment just says “use constant broke NYTProf”)
If you can reproduce a problem with the current NYTProf let me know and I’ll fix it ASAP.
It was pretty intricate, I didn’t have time to investigate deeply.
Other instances of use constant worked fine, but that constant invoked as a method on the class was mysteriously no longer callable when running under -d:NYTProf.
I’ll try to reproduce it stably later today.
As for the actual changes — most of them were applied to Class::MOP on August 14th 2008. They are pretty involved (code generation, order-sensitive operations during the Class::MOP bootstrap). Basically it came down to many of the meta objects not being immutable early enough, or not having their bootstrap constructors overwritten, so constructing meta objects was slower than it needed to be.
Devel::NYTProf has helped me tremendously. By iteratively profiling a long-running app of mine, I detected and fixed:
* wrong data structure choices that compromised scalability
* very hot, very short and very simple subroutines that were rewritten in C using Inline::C
* targets for memoization, using the Memoize module
* excessive type checking
* small issues like checking existence in scalar instead of void context and minor algorithm improvements.
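(The poster’s diffs aren’t shown here; a minimal sketch of the memoization item, with a hypothetical stand-in function, looks like this:)

```perl
use strict;
use warnings;
use Memoize;    # core module

my $calls = 0;

# A pure function whose result depends only on its arguments is a good
# memoization target when NYTProf shows it called repeatedly with the
# same inputs.
sub slow_lookup {
    my ($key) = @_;
    $calls++;                    # count real computations
    return length($key) * 2;     # stand-in for expensive work
}

memoize('slow_lookup');          # cache results keyed by argument

slow_lookup('abc') for 1 .. 1000;    # only the first call does the work
print "computed $calls time(s)\n";   # prints "computed 1 time(s)"
```

After the change, NYTProf shows the exclusive time of the memoized sub collapse, since most calls never reach the body.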
Last but not least, it put the last nail in the “Moose is slow” coffin, since no Moose:: or Class::MOP:: subroutine ever appeared at the top of the profiling reports. It was as if I had coded the thing in barebones OO Perl.
I’ve mailed you a more detailed explanation of this, as well as links to the most important diffs that made speed improvements. Hope it helps!