The Voyage

We saw Johnny Duhan at a very small, intimate concert in Ennis last year. Last weekend we saw Christy Moore in concert in Limerick. This song, written by Johnny Duhan and sung by Christy Moore, has always struck a chord with me.

I am a sailor, you’re my first mate
We signed on together, we coupled our fate
Hauled up our anchor, determined not to fail
For the heart’s treasure, together we set sail

With no maps to guide us we steered our own course
Rode out the storms when the winds were gale force
Sat out the doldrums in patience and hope
Working together we learned how to cope

Chorus:
Life is an ocean and love is a boat
In troubled water that keeps us afloat
When we started the voyage, there was just me and you
Now gathered round us, we have our own crew

Together we’re in this relationship
We built it with care to last the whole trip
Our true destination’s not marked on any charts
We’re navigating to the shores of the heart

Chorus 2x

– Johnny Duhan

Here’s a video of Christy Moore and Johnny Duhan talking about the song and singing it together.

NYTProf v3 – a sneak peek

I’ve had a great week at OSCON. The talks are excellent but the real value is in the relationships formed and renewed in the “hallway track”. I’m honoured and humbled to be able to call many great people my friends.

My talk on Devel::NYTProf seemed to go well. This year I not only covered NYTProf and the new features in v3 (not yet released) but also added a section on how to use NYTProf to optimize your perl code.
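For anyone who hasn’t tried the profiler yet, the basic workflow is very simple (the script name here is just a placeholder):

    perl -d:NYTProf yourscript.pl   # writes ./nytprof.out
    nytprofhtml                     # generates an HTML report in ./nytprof/

Then open nytprof/index.html in a browser and drill down from the subroutine and statement timings.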

Here’s a quick summary, with links to the slides and screencast, and an outline of what’s still to be done before v3 gets released (getting closer by the day).

NYTProf 2.04 gives you 90% smaller data files

At OSCON this year I gave a talk on my new pet project Devel::NYTProf v2 to a packed room. Turned out to be a lot of fun.

“The first thing I need to do is talk about Devel::DProf because it needs to be taken out and shot.”

I made a screencast of the 40-minute talk which you can watch on blip.tv here. Worth watching for the background on profilers, the demo of NYTProf, and the questions, not to mention the teasing I get along the way.

One of the final questions was about the size of the profile data file that NYTProf produces. One of the major drawbacks of statement-level profiling is the volume of data it generates while profiling your code. For every statement executed the profiler streams out the file id, the line number, and the time spent. For every statement! When trying to profile a full application doing real work the volume of data generated quickly becomes impractical to deal with. Multi-gigabyte files are common.

This was the major problem with Devel::SmallProf, which generated text files while profiling. Salvador Fandiño García addressed that in Devel::FastProf by writing the data in a compact binary form. A vast improvement that contributed to Devel::FastProf (on which Devel::NYTProf is based) being the first statement-level profiler worth using on large applications. Even so, the volume of data generated was still a problem when profiling all but short running applications.

NYTProf 2.03 was producing profile data at the rate of about 13MB per million statements executed. That might not sound too bad until you realise that on modern systems with cpu intensive code, perl can execute millions of statements every few seconds.
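To put that in perspective, here’s a back-of-envelope calculation (the statements-per-second rate is my assumption, purely illustrative):

    use strict;
    use warnings;

    my $mb_per_million_stmts = 13;        # NYTProf 2.03 data rate, as above
    my $stmts_per_sec        = 3_000_000; # assumed rate for cpu intensive code
    my $mb_per_sec = $mb_per_million_stmts * $stmts_per_sec / 1_000_000;
    printf "~%d MB/s of profile data, ~%.1f GB per minute\n",
        $mb_per_sec, $mb_per_sec * 60 / 1024;

At that rate you’d fill a multi-gigabyte file within a couple of minutes of profiling.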

I could see a way to approximately halve the data volume by changing the format to optimize for the common case of consecutive statements being in the same file, but that wasn’t going to be enough. The best way forward would be to add zip compression. It would be easy enough to pipe the output stream through a separate zip process, but that approach has a problem: the zip process would be soaking up cpu time asynchronously from the app being profiled. That would affect the realtime measurements in an unpredictable way.
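As an aside, that format change would look something like this sketch (my illustration, not NYTProf’s actual wire format): emit the file id only when it differs from the previous statement’s file.

    use constant { TOKEN_SAME_FILE => 0, TOKEN_NEW_FILE => 1 };

    my $last_fid = -1;

    # Write one statement record; omit the file id when it matches the
    # previous statement's file - the common case in straight-line code.
    sub write_record {
        my ($fh, $fid, $line, $ticks) = @_;
        if ($fid == $last_fid) {
            print {$fh} pack 'CVV', TOKEN_SAME_FILE, $line, $ticks;
        }
        else {
            print {$fh} pack 'CVVV', TOKEN_NEW_FILE, $fid, $line, $ticks;
            $last_fid = $fid;
        }
    }

Useful, but on its own it would only delay the point at which the files become unmanageable.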

I realized back in June that a better approach would be to embed zip compression into NYTProf itself. Around the end of July Nicholas Clark, current Perl Pumpking, got involved and was motivated to implement the internal zipping because he was “generating over 4Gb of profile data trying to profile the harness in the Perl 5 core running tests in parallel”.

He did a great job. The zlib library is automatically detected at build time and, if available, the code to dynamically route i/o through the zip library gets compiled in. The output stream starts in normal mode, so you can easily see and read the plain text headers in the data file, then switches to zip compression for the profile data. How well did it work out? This graph tells the story:

NYTProf 2.04 compression.png

(The data relates to profiling perlcritic running on a portion of its own source code on my 2GHz MacBook Pro laptop. I only took one sample at each compression level so there may be some noise in the results.)

The data file size (red) plummets even at the lowest compression level. Also note the corresponding drop in system time (yellow) due to the reduction in context switches and file i/o.
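NYTProf does the mode switch inside its C output layer, but the stream layout described above is easy to picture. Here’s a minimal Perl sketch of the same idea (plain text header first, then deflated records) using Compress::Raw::Zlib; the record strings are made up for the demo:

    use strict;
    use warnings;
    use Compress::Raw::Zlib;

    open my $fh, '>:raw', 'demo.out' or die "open: $!";
    print {$fh} "plain text header, readable with less\n";  # normal mode

    # switch: everything from here on goes through the deflater
    my ($deflater, $status) = Compress::Raw::Zlib::Deflate->new( -Level => 6 );
    die "deflate init failed: $status" unless $deflater;

    for my $record ("fid=1 line=42 ticks=7\n", "fid=1 line=43 ticks=3\n") {
        my ($data, $zipped) = ($record, '');
        $deflater->deflate($data, $zipped) == Z_OK or die "deflate failed";
        print {$fh} $zipped;
    }
    my $tail = '';
    $deflater->flush($tail);  # Z_FINISH by default
    print {$fh} $tail;
    close $fh;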

I’ve set the default compression level to 6. I doubt you’ll want to change it, but you can by adding compression=N to the NYTPROF environment variable.
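For example, to profile with a lighter compression level (using the option name above; the script name is a placeholder):

    NYTPROF=compression=1 perl -d:NYTProf yourscript.pl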

Here are the change notes for the 2.04 release:

  Fixed rare divide-by-zero error in reporting code.
  Fixed rare core dump in reporting code.
  Fixed detection of #line directives to be more picky.
  Fixed some compiler warnings thanks to Richard Foley.
  Added on-the-fly ~90% zip compression thanks to Nicholas Clark.
    Reduces data file size per million statements executed
    from approx ~13MB to ~1MB (depends on code being profiled).
  Added extra table of all subs sorted by inclusive time.
  No longer warns about '/loader/0x800d8c/...' synthetic file
    names perl assigns when reading code from a CODE ref in @INC

Enjoy!

Arithmetic, Population, and Energy

Dr. Albert Bartlett is emeritus Professor of Physics at the University of Colorado at Boulder, USA. He has given this lecture on “Arithmetic, Population, and Energy” over 1,500 times.

“The greatest shortcoming of the human race is our inability to understand the exponential function.”

A challenging statement. Having now seen the lecture, I can understand why it has been so popular, and why it is so important. I came across it recently and felt it was worth sharing.
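To give a taste of the arithmetic involved (my example, not Bartlett’s): steady percentage growth has a fixed doubling time of roughly 70 divided by the growth rate, because ln(2) ≈ 0.7.

    use strict;
    use warnings;

    # the "rule of 70": p% steady annual growth doubles in ~70/p years
    for my $pct (1, 3.5, 7) {
        printf "%.1f%% annual growth => doubles in ~%.0f years\n",
            $pct, 70 / $pct;
    }

So a “modest” 7% annual growth rate doubles whatever is growing every decade.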

You can watch it on YouTube here as a series of bite-size 9-minute clips.

(After that you might like this unrelated take on applying risk management.)

The Italian Perl Workshop

Pisa Cathedral Wall.jpg

I spent a very pleasant few days in Pisa, Italy, last week. I’d been invited to speak at the Fourth Italian Perl Workshop. The workshop was a great success. In fact, calling it a “workshop” is selling it short. It’s more of a mini-conference:

“2 days of conference, 2 simultaneous tracks, more than 30 talks, 120 attendees, 20 sponsors and many international guests”

The whole event ran very smoothly thanks to a great team led by Gianni Ceccarelli, Francesco Nitido, and Enrico Sorcinelli. I’ll echo the compliments of one attendee: “Complimenti sinceri agli organizzatori! Bravissimi! Tutto perfetto!” (“Sincere compliments to the organisers! Superb! Everything was perfect!”)

I gave a short talk on Gofer on the Thursday, and then two 40-minute talks on Friday: Perl Myths, and Devel::NYTProf. I hope to upload screencasts and slides next week. The talks were all recorded on video so I imagine they’ll be uploaded at some point. I’ll add links to them here when they are.

The majority of the sessions were in Italian so, since my Italian is practically non-existent, I had plenty of time to work.

Or at least try to. The one disappointment of the trip for me was the apparent poor quality of the Italian internet. Using mtr I’d regularly see over 20% packet loss within telecomitalia.it and interbusiness.it from my hotel room. Occasionally over 50%. It got much better at night, so I’d do more work then. At the conference venue the Italian academic network (garr.net) also regularly had over 20% packet loss at its link to the internet. All this was, of course, outside the control of the organisers.

The “corridor track” at perl conferences is always good. I had a chance to talk to Rafael Garcia-Suarez (and meet his lovely wife and new baby son), Matt S Trout, Hakim Cassimally, Michel Rodriguez, Marcus Ramberg, and many others.

I had opted to take a very early flight so I’d have a day being a tourist in Pisa before the conference started. The weather was beautiful and I had a lovely time strolling through the streets of this ancient city.

Pisa Knights Square.jpg

I didn’t take my camera with me, but I did take my iPhone so I was able to capture a few snaps as I strolled around and climbed the tower. (Yes, it really does lean in a disconcerting “it must be about to fall down” way. All the more dramatic when you’re up close and can appreciate the massive scale of the tower.)

Pisa View over Cathedral.jpg

Hey, my own TV channel!

It felt strange when I first set up this blog. What would I write about? Who would care?

For several years now I’ve been giving talks at conferences and workshops. I’d generally upload a PDF of the slides somewhere, or at least email them to anyone that asked. I’ve now added a special page on the blog where I can list all the talks I’ve given. That now acts as a single location to find all my talks, with links to slides and any related materials. (It’s currently a work in progress. I’ll be filling it in from time to time. Any major updates will be accompanied by a blog post.)

Slides, no matter how good, miss much of the real event. No ad-libs, no questions and answers. When writing slides I’m always caught between the desire to write little, so the audience can pay attention to what I’m saying, and to write lots, so people reading the slides later still get a reasonably full picture.

There’s also the problem of notes. I often use ‘presenter notes’ on the slides to give extra information: both to myself, in case I need it while presenting, and also for links to data sources and credits for images used. I’ve uploaded some talks to slideshare.net but I have to include a separate version with notes (which is useful for download and print, but almost unreadable in their viewer).

I tried making a video of a talk on a camcorder. The results weren’t great. Grainy, noisy, hard to read, and massive video files.

Then I decided to try using screencasting software. I bought a great wireless USB microphone and the amazing ScreenFlow screencasting software. Now I can capture everything in fine detail and edit it easily afterwards.

Great. Now what? I needed somewhere to host the (very large) videos. I looked around and tried a few, like Vimeo, but wasn’t happy with the results. Vimeo, for example, transcodes to quite a low resolution and doesn’t let viewers download the original.

Eventually I found the wonder that is blip.tv. A whole laundry list of great features. If you produce videos of any kind, give them a look.

So, now I have my own TV channel.

Strange world!

Irish Open Source Technology Conference – June 18th-20th

I’ll be speaking at the Irish Open Source Technology Conference this year. It’s on at Dublin’s CineWorld Complex, from June 18th for three days. They’re running a 2-for-1 offer on tickets at the moment.

I’ll be speaking about something Perl’ish, naturally.

The “Perl Myths” presentation I gave at IWTC earlier this year turned out to be a hit. (At least, it was after the event. There were fewer than ten people in the room at the time, including me! Perl clearly isn’t a hot topic among Irish web developers.)

My blog post, with embedded slides and video, has topped 7400 hits, plus another 3000 or so views on slideshare.

I’m upgrading my methods for this next talk. I’ve bought a great wireless USB microphone and the amazing ScreenFlow screencasting software to capture everything in detail.

So I’m going all high-tech. No more “camcorder perched on a chair at the back” for me!

It’ll be a good trial run for OSCON where I’m speaking in July.

Pay no attention to that callback behind the curtain!

So you’ve got some perl code that connects to a particular database via a particular DBI driver. You want it to connect to a different database or driver. But you can’t change that part of the code. What can you do?

I ran into this problem recently. A large application is using an old version of DBIx::HA which doesn’t support DBD::Gofer. DBIx::HA can’t be upgraded (long story, don’t ask) but I wanted to use DBD::Gofer to provide client-side caching via Cache::FastMmap. (I’ll save more details of that, and the 40% reduction in database requests it gave, for another post.)

I needed a way for DBIx::HA to think that it was connecting to a particular driver and database, but for it to actually connect to another. Using $ENV{DBI_AUTOPROXY} wasn’t an option because that has global effect, whereas I needed fine control over which connections were affected. It’s also a fairly blunt instrument in other ways.

It seemed like I was stuck. Then I remembered the DBI callback mechanism – it would provide an elegant solution to this. I added it to DBI 1.49 back in November 2005 and enhanced it further in 1.55. I’d never documented it though. I think I was never quite sure it had sufficient functionality to be really useful. Now I’m sure it has.

The DBI callback mechanism lets you intercept, and optionally replace, any method call on a DBI handle. At the extreme, it lets you become a puppet master, deceiving the application in any way you want.
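Before the real code, here’s the basic shape of a callback on an ordinary handle (a minimal sketch; DBD::SQLite is just a convenient stand-in driver, not part of the original setup):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect("dbi:SQLite:dbname=:memory:", "", "",
                           { RaiseError => 1 });

    $dbh->{Callbacks} = {
        ping => sub {
            my $h = shift;  # the handle the method was called on
            warn "ping() intercepted on $h\n";
            return;         # plain return: DBI goes on to call the real ping()
        },
    };

    $dbh->ping;  # fires the callback, then the real method

The connect() interception below uses the same mechanism, installed on the driver handle, plus the undef $_ trick to suppress the original method.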

Here’s how the code looked (with a few irrelevant details changed):

    # The following section of code uses the DBI Callback mechanism to
    # intercept connect() calls to DBD::Sybase and, where appropriate, 
    # reroute them to DBD::Gofer.
    our $in_callback;

    # get Gofer $drh and make it pretend to be named Sybase
    # to keep DBIx::HA 0.62 happy
    my $gofer_drh  = DBI->install_driver("Gofer");
    $gofer_drh->{Name} = "Sybase";

    # get the Sybase drh and install a callback to intercept connect()s
    my $sybase_drh = DBI->install_driver("Sybase");
    $sybase_drh->{Callbacks} = {
        connect => sub {
            # protect against recursion when gofer itself makes a connection
            return if $in_callback; local $in_callback = 1;

            my $drh = shift;
            my ($dsn, $u, $p, $attr) = @_;
            warn "connect via callback $drh $dsn\n" if $DEBUG;

            # we're only interested in connections to particular databases
            return unless $dsn =~ /some pattern/;

            # rewrite the DSN to connect to the same DSN via Gofer
            # using the null transport so we can use Gofer caching
            $dsn = "transport=null;dsn=dbi:Sybase(ReadOnly=1):$dsn";

            my $dbh = $gofer_drh->connect($dsn, $u, $p, $attr);

            if (not $dbh) { # gofer connection failed for some reason
                warn "connect via gofer failed: $DBI::errstr\n"
                    unless our $connect_via_gofer_err++; # warn once
                return; # DBI will now call original connect method
            }

            undef $_;    # tell DBI not to call original connect method
            return $dbh; # tell DBI to return this $dbh instead
        },
    };

So the application, via DBIx::HA, executed

  $dbh = DBI->connect("dbi:Sybase:foo",...)

but what it got back was a DBD::Gofer dbh, as if the application had executed

  $dbh = DBI->connect("dbi:Gofer:transport=null;dsn=dbi:Sybase(ReadOnly=1):foo",...).

I guess I should document the callback mechanism now. Meanwhile the closest thing to documentation is the test file.

I’ve always enjoyed this kind of “plumbing”. If you come up with any interesting uses of DBI callbacks, do let me know.

Perl Myths

Update: several more recent versions of my Perl Myths talk are available. These have significant updates. Slides can be found on slideshare.net and screencasts can be found on my blip.tv channel.

I’ve uploaded my Perl Myths presentation to slideshare.net and google video:

“Perl has its share of myths. This presentation debunks a few popular ones with hard facts. Surprise yourself with the realities.”

While I agree with Andy Lester that Good Perl code is the best form of evangelism, I wanted to put together a presentation that others could refer to when they encounter misinformation about Perl. I cover these myths that I’ve heard recently:

  • Perl is dead
  • Perl is hard to read / test / maintain
  • Perl 6 is killing Perl 5

and pull in a wealth of up-to-date information, some of it quite surprising even to those familiar with Perl and its community. There are two versions, plus a video. I recommend the one with notes (which have useful extra detail and context for the slides), which is best viewed as a PDF. There’s also one without notes, which I’ve embedded here:

I videoed an extended version of this presentation at IWTC in Dublin in February. The first 40 minutes or so correspond with the slides above. In the remaining 30 minutes or so I talk about Parrot and Perl 6. I’ve embedded the video below, but WordPress forces me to use a small size so you’ll probably prefer to view it at video.google.com: