NYTProf 2.05 – assorted fixes

Hot on the heels of NYTProf 2.04, I’ve just released NYTProf 2.05. It has a few small but significant bug fixes. From the Changes file:

  Fixed alteration of $! during profiling, thanks to Slaven Rezic.
  Fixed clock_gettime() clock selection to fall back to CLOCK_REALTIME
    if CLOCK_MONOTONIC is not available, e.g., on Linux 2.4.
  Fixed error when application ends after DB::disable_profile().
  Added some docs to Devel::NYTProf::Apache.
  Added clock=N option to enable user to select the clock.

All the bugs were long-standing ones that happened to come to light, and get fixed, soon after 2.04 was released.

The first was the cause of a couple of odd bugs where applications would behave differently when profiled. The problem was that in certain situations the profiler would alter the value of $! between statements, so code that tested $! would think something had gone wrong.

The second caused weird timing values on some platforms: the statement times would be zero and the subroutine times would be garbage. The problem was that some systems, such as Linux 2.4, define the CLOCK_MONOTONIC macro but don’t implement that clock in the kernel. NYTProf wasn’t testing that the clock actually worked. Now it does: if the clock_gettime() call fails with the selected clock, it falls back to CLOCK_REALTIME (and if that also fails, NYTProf aborts).

As a bonus, I’ve added the ability to select which clock you want to use by setting clock=N in the NYTPROF environment variable, where N is the integer id of the clock (you can usually find the CLOCK_* macros in /usr/include/time.h). This only works if your system supports clock_gettime(), which most modern Unix-like systems do, with the notable exception of OS X.

Why would you want to use a different clock?

Well, the default, CLOCK_MONOTONIC, is relatively expensive to call, and NYTProf calls it twice per statement, so it’s the dominant cost of the profiler.

Some systems, like Linux 2.6, offer other clocks, such as CLOCK_MONOTONIC_FAST, which returns the most recently cached value of the clock. That’s typically just 1/10th the cost of CLOCK_MONOTONIC, so your code runs much faster under the profiler.

Naturally there’s a trade-off: the accuracy of the timings drops significantly. They’re still far better than Devel::DProf’s, but far worse than normal, and you’re dependent on how often the cached time value gets updated by your kernel. I saw a figure of “600 times per second” quoted for one system, but I can’t find the link now. My advice is to steer clear of this feature unless you’re comfortable with issues like these.


An “Indian Summer” day of domesticity

Today was a great day.

An ordinary domestic weekend kind of a day, but blessed with great weather and no engagements for any of us.

house last afternoon.jpg

A chance for me to catch up on many of the long neglected house and garden chores, and for our family to enjoy some chilled-out time around the house.

We’ve been burning more wood on the fire recently and the chimney was overdue a sweep. So I blocked up the fireplace with a bin bag and some foam, then climbed on the roof and swept down from the chimney pot. It was my first time sweeping a chimney and this method made it very simple. I just left the fireplace sealed up for a few hours for the dust to settle before opening it up and shovelling out the soot along with the old ash.

fireplace sealed.jpg  soot in fireplace.jpg

My wife had seen a recipe for “Hedgerow Jelly” by Elisabeth Luard in a magazine, so she, and our two daughters, walked out down the local lanes to find the ingredients: 1kg crab apples or Bramleys; 250g blackberries, 250g sloes or bullaces or damsons, 250g rowan berries or elderberries, 125g hawthorn berries (haws), 125g rosehips (wild or cultivated), plus about 2 litres of water and about 2kg of granulated sugar.

chopped fruit in caserole.jpg

(Our eldest daughter decided to take photographs of the process, which ultimately led to this blog post; I rarely post about home life.)

fruit into muslin.jpg fruit in muslin.jpg hanging muslin.jpg
ladling fruit juice.jpg sieves in sink.jpg hedgerow jelly jars.jpg

Meanwhile, after clearing gutters and sweeping the drive, I was working down by the lake front thinning out some trees. A few months ago we had a group of big old trees in one corner of our garden cut down, mostly pine with some silver birch and hazel. (You can see in the photo the large pile of wood I still need to chop, chop, chop.) That gave us a view of the lake again, something we’d lost over the years since we moved in as the trees grew up and bushed out.

Having got that narrow view of the lake I wanted to extend it through the trees to the right by thinning out their lower branches.

view to lake.jpg

The summer weather in Ireland has been pretty poor the last few years. We seem to get a few lovely weeks in spring, then summer is mostly a washout with grey skies and regular showers. Then, just after the kids go back to school, we get a few bright weeks again—the Irish “Indian Summer”—around September/October. Fabulous days like today.

There’s a reason Ireland is so green. It rains. A lot. Especially in the west, where we are. Despite the bright sunshine today the grass was squelching under my feet. The logs need to be stored or they’ll rot over the winter. I shifted a few wheelbarrow loads today and started clearing a space off the grass to store them.

A friend and I wielded axes a few weeks ago so we’ve a reasonable stockpile of split logs now, which I finished stacking today. I’ve been told the wood gets harder to split the longer it’s left, and I don’t fancy trying to split logs on dark wet winter nights, so I guess I need to keep at it.

log pile.jpg

Back indoors the domesticity continued with our eldest daughter making the topping for an Apple Crumble.

making crumble.jpg

Our eating apple tree is still recovering from the pruning I gave it last winter (its first in many years) but our cooking apple tree seems very happy this year.

cooking apple tree.jpg


That was the glossy edited version of the day. The uncut version includes me falling off the ladder into the lake, our youngest daughter having a tantrum and throwing handfuls of apple crumble onto the floor, and me pouring scalding hot Hedgerow Jelly over my hand. Ahh, domestic bliss!

NYTProf 2.04 gives you 90% smaller data files

At OSCON this year I gave a talk on my new pet project Devel::NYTProf v2 to a packed room. Turned out to be a lot of fun.

“The first thing I need to do is talk about Devel::DProf because it needs to be taken out and shot.”

I made a screencast of the 40-minute talk, which you can watch on blip.tv. It’s worth watching for the background on profilers, the demo of NYTProf, and the questions, not to mention the teasing I get along the way.

One of the final questions was about the size of the profile data file that NYTProf produces. One of the major drawbacks of statement-level profiling is the volume of data it generates while profiling your code. For every statement executed the profiler streams out the file id, the line number, and the time spent. For every statement! When trying to profile a full application doing real work the volume of data generated quickly becomes impractical to deal with. Multi-gigabyte files are common.

This was the major problem with Devel::SmallProf, which generated text files while profiling. Salvador Fandiño García addressed that in Devel::FastProf by writing the data in a compact binary form. A vast improvement that contributed to Devel::FastProf (on which Devel::NYTProf is based) being the first statement-level profiler worth using on large applications. Even so, the volume of data generated was still a problem when profiling all but short-running applications.

NYTProf 2.03 was producing profile data at the rate of about 13MB per million statements executed. That might not sound too bad until you realise that on modern systems with CPU-intensive code, perl can execute millions of statements every few seconds.

I could see a way to approximately halve the data volume by changing the format to optimize for the common case of consecutive statements being in the same file, but that wasn’t going to be enough. The best way forward would be to add zip compression. It would be easy enough to pipe the output stream through a separate zip process, but that approach has a problem: the zip process would be soaking up CPU time asynchronously from the app being profiled. That would affect the realtime measurements in an unpredictable way.

I realized back in June that a better approach would be to embed zip compression into NYTProf itself. Around the end of July Nicholas Clark, the current Perl Pumpking, got involved and was motivated to implement the internal zipping because he was “generating over 4Gb of profile data trying to profile the harness in the Perl 5 core running tests in parallel”.

He did a great job. The zlib library is automatically detected at build time and, if available, the code to dynamically route i/o through the zip library gets compiled in. The output stream starts in normal mode, so you can easily see and read the plain text headers in the data file, then switches to zip compression for the profile data. How well did it work out? This graph tells the story:

NYTProf 2.04 compression.png

(The data relates to profiling perlcritic running on a portion of its own source code on my MacBook Pro 2GHz laptop. I only took one sample at each compression level so there may be some noise in the results.)

The data file size (red) plummets even at the lowest compression level. Also note the corresponding drop in system time (yellow) due to the reduction in context switches and file i/o.

I’ve set the default compression level to 6. I doubt you’ll want to change it, but you can by adding compression=N to the NYTPROF environment variable.

Here are the change notes for the 2.04 release:

  Fixed rare divide-by-zero error in reporting code.
  Fixed rare core dump in reporting code.
  Fixed detection of #line directives to be more picky.
  Fixed some compiler warnings thanks to Richard Foley.
  Added on-the-fly ~90% zip compression thanks to Nicholas Clark.
    Reduces data file size per million statements executed
    from approx ~13MB to ~1MB (depends on code being profiled).
  Added extra table of all subs sorted by inclusive time.
  No longer warns about '/loader/0x800d8c/...' synthetic file
    names perl assigns when reading code from a CODE ref in @INC.