Has NYTProf helped you? Tell me how…

At OSCON this year[1] I’m giving a “State-of-the-art Profiling with Devel::NYTProf” talk. It’ll be an update of the one I gave last year, covering the features added since then (including, hopefully, two significant new features that are in development).

This year I’d like to spend some time talking about how to interpret the raw information and use it to guide code changes. Approaches like common sub-expression elimination and moving invariant code out of loops are straightforward. They’re ‘low hanging fruit’ with no API changes involved. Good for a first pass through the code.

Moving loops down into lower-level code is an example of a deeper change I’ve found useful. There are many more. I’d like to collect them to add to the talk and the NYTProf documentation.

So here’s a question for you: after looking at the NYTProf report, how did you identify what you needed to do to fix the problems?

I’m interested in your experiences. How you used NYTProf, how you interpreted the raw information NYTProf presented, and then, critically, how you decided what code changes to make to improve performance. What worked, what didn’t. The practice, not the theory.

Could you take a moment to think back over the times you’ve used NYTProf, the testing strategy you’ve used, and the code changes you’ve made as a result? Ideally go back and review the diffs and commit comments.

Then send me an email — tell me your story!

The more detail the better! Ideally with actual code (or pseudo-code) snippets[2].

  1. OSCON is in San Jose this year, July 20-24th. You can use the code ‘os09fos’ to get a 20% discount.
  2. Annotated diffs would be greatly appreciated. I’ll give credit for any examples used, naturally, and I’ll happily anonymize any code snippets that aren’t open source.

Fixing the POD synopsis in OSX – take 2 (perldoc, nroff and UTF-8)

Ever copied and pasted a chunk from perldoc output and found you were getting mysterious errors from perl? I have.

I’ve learnt to rewrite the ‘-’ characters because although they look like ASCII ‘-’ characters they’re really a Unicode HYPHEN: U+2010. Some other chars get mangled too, but that’s the most frequent problem for me.
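That manual rewrite can be scripted. What follows is a minimal sketch rather than a tested tool: `fix_hyphens` is a hypothetical helper name, and it only handles U+2010 (UTF-8 bytes E2 80 90), not the other characters that get mangled.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and replace every
# U+2010 HYPHEN (UTF-8 bytes E2 80 90) with an ASCII hyphen-minus.
fix_hyphens() {
    # Build the three-byte sequence from octal escapes so this
    # script itself stays pure ASCII.
    hyphen=$(printf '\342\200\220')
    # LC_ALL=C makes sed work on raw bytes, whatever the locale.
    LC_ALL=C sed "s/$hyphen/-/g"
}

# A line as it might look when pasted from perldoc output:
printf 'perl \342\200\220d:NYTProf script.pl\n' | fix_hyphens
# prints: perl -d:NYTProf script.pl
```

On a Mac, something like `pbpaste | fix_hyphens | pbcopy` should clean up whatever is currently on the clipboard.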

So I was delighted to see a blog post by Marcus Ramberg called Fixing the POD synopsis in OSX wherein he fingers nroff as being the problem and gives a simple solution:

alias perldoc='perldoc -t'

Trouble is, using perldoc -t means you lose the nice bold text that nroff gives you. So I went digging…

It seems the problem only affects people using UTF-8 and that nroff has a -T option that lets you specify the output device, and so the encoding, to use. So if perldoc ran ‘nroff -Tascii’ instead of plain ‘nroff’ that would avoid the hyphen problem and let us keep the bold text.

It turns out that perldoc has an option to specify the nroff command to use, so the solution is simple:

alias perldoc="perldoc -n 'nroff -Tascii'"

This is, of course, still a hack. My main worry is that pod docs using non-ASCII characters may get mangled. A much better fix would be to arrange for the ASCII characters to not get mapped to Unicode at all. So I went digging again…

nroff calls groff -m tty-char ... and tty-char refers to a tty-char.tmac file that defines the character mappings. The groff man pages point me to groff_tmac man pages which tell me I can get groff to look for .tmac files elsewhere by passing a -Mdir option or setting the GROFF_TMAC_PATH environment variable.

I looked at the default file, /usr/share/groff/1.19.2/tmac/tty-char.tmac on my Mac, and… decided it was time to go to sleep! The formatting is probably simple enough but I’m out of tuits.

So, what’s needed is for someone to determine what change is needed to the tty-char.tmac file, or the files it refers to, to avoid unwanted conversions to Unicode. Then put a modified file into a directory and either add a -Mdir option to the nroff alias above, or set the GROFF_TMAC_PATH environment variable. Setting the env var has the benefit of ‘fixing’ all the man pages.
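The plumbing for that, at least, can be sketched now. This is only the directory and environment setup; it does not contain the actual tty-char.tmac change, which is still the open question. The helper name `use_private_tmac` and the `~/.groff-tmac` directory in the example call are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only: set up a private tmac directory and point groff at it.
# The edit to tty-char.tmac itself still has to be worked out by hand.

# Hypothetical helper.
#   $1 = private directory to hold the modified file (any name will do)
#   $2 = the system tty-char.tmac to copy as a starting point
use_private_tmac() {
    tmac_dir=$1
    default_tmac=$2
    mkdir -p "$tmac_dir" || return 1
    # Copy the default file only once, so later hand edits survive.
    if [ -f "$default_tmac" ] && [ ! -f "$tmac_dir/tty-char.tmac" ]; then
        cp "$default_tmac" "$tmac_dir/tty-char.tmac" || return 1
    fi
    # groff searches GROFF_TMAC_PATH before its built-in tmac directories.
    GROFF_TMAC_PATH=$tmac_dir
    export GROFF_TMAC_PATH
}

# Example call, using the default path quoted above (adjust for your groff):
#   use_private_tmac "$HOME/.groff-tmac" /usr/share/groff/1.19.2/tmac/tty-char.tmac
```

Calling this from a shell startup file would cover man pages as well as perldoc. The alternative is to leave the environment alone and pass the same directory via -M in the nroff alias.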

So, anyone want to dig deeper? (For all I know the solution can already be found on Google…)

TIOBE Index is being gamed

It is sad, but inevitable, that the TIOBE index of programming language “popularity” (sic) would be gamed.

Once you start measuring something, and advertising the results, people with an interest in particular outcomes naturally start to look for ways to influence those results. (It’s the Observer Effect writ large.)

The fact that TIOBE’s methodology, which I’ve discussed previously here and here, is simplistic makes it particularly open to gaming. Anyone, or any community, with access to many web pages can simply add the magic phrase “foo programming”, where foo is their language of choice, to get counted.
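To make that weakness concrete, here is a toy sketch of phrase counting over a directory of saved pages. It is only an analogy: TIOBE counts search engine hits, not local files, and the `count_pages` helper and the three sample pages are made up for illustration.

```shell
#!/bin/sh
# Count the files under directory $1 that contain the literal
# phrase $2, ignoring case.
count_pages() {
    grep -rliF -- "$2" "$1" | wc -l | tr -d ' '
}

# Three made-up "web pages": two are about Delphi, but only one
# happens to contain the magic phrase.
pages=$(mktemp -d)
printf 'Notes on Delphi: using TStringList\n' > "$pages/a.html"
printf 'Delphi programming tips\n'            > "$pages/b.html"
printf 'A Perl blog\n'                        > "$pages/c.html"

count_pages "$pages" 'delphi'               # prints 2
count_pages "$pages" 'delphi programming'   # prints 1

rm -rf "$pages"
```

Adding the phrase to the template of the first page would raise the second count without changing anything about how much the language is actually used.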

And it seems that’s exactly what the Delphi community did at the end of 2008[1]. They made a concerted effort and it seems to have paid off. (I’d be very interested in hearing about similar behaviour in other language communities.)

Is that behaviour gaming? The author of the post who exhorted his readers to “Update your Delphi related blog or site to say Delphi programming on every page in visible text (update the template). Stand up and be counted. You can make a difference!” doesn’t seem to think so, as he also said “I am not suggesting we game the system, just that we help TCPI get an accurate count.”

An accurate count of what, exactly? That’s always been the fundamental question with TIOBE. It should be obvious that most web pages that talk about “delphi programming” wouldn’t actually contain the phrase “delphi programming”. The same applies to every other language. That’s the paradox at the heart of the TIOBE Index. And yet, somehow, TIOBE seem to think that counting pages containing the phrase “delphi programming” lets them claim that:

The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.

Eh? How can they possibly defend that claim? Certainly their documented definition doesn’t support it, or even mention it.

I presume they’re thinking that CVs, job postings, and adverts are most likely to contain the magic phrase. It should be obvious, again, that the number of CVs, job postings, and adverts referring to a given programming language would naturally only be a small fraction of the total web pages referring to the language. (And only distantly related to the “popularity” of a language.) Yet that “small fraction” is what TIOBE measure and make bold claims about.

The fact that TIOBE is making a comparison based on a small fraction makes it even more troubling that TIOBE CEO Paul Jansen appears to support language communities changing their pages to include the magic “foo programming” phrase. In an email quoted on delphi.org he says:

For your information, I think your action has already some effect. Tonight’s run shows that Delphi is #8 at this moment. There is a realistic chance that Delphi will become “TIOBE’s Language of the Year 2008”

He’s endorsing the artificial insertion of the magic phrase. Clearly this distorts the TIOBE index in favour of language communities that infect as many pages as possible with the magic phrase.

That sure seems like an invitation to game the system! It’s likely to lead to other language communities doing the same, and so to further devaluation of the TIOBE Index.

(For alternatives to TIOBE you could look at sites like http://www.langpop.com/, James Robson’s Language Usage Indicators, or my popular comparison of job trends blog post with ‘live’ graphs.)

I have, on a couple of occasions, used the phrase “perl programming” in blog posts for my own amusement, and linked it to my original TIOBE or not TIOBE – “Lies, damned lies, and statistics” post. I haven’t suggested that others do the same. TIOBE’s endorsement of artificial insertion changes that. Now it seems like we’re going to get a dumb “race to the bottom” to see which language community controls the most web pages.

If, as a result, the TIOBE Index is affected significantly, then I simply hope they’ll drop their pretentious claims and state clearly exactly what they’re counting, how they’re doing it, and what it means: not much.

1. Many thanks to Barry Walsh for his blog post that alerted me to this.

Thanks, Iron Man, for the good excuse to perl blog

I’ve been thinking that I haven’t blogged much lately. Assorted half-baked ideas would cross my mind and then evaporate before I’d find the time, or motivation, to actually start writing.

The folks at the Enlightened Perl Organisation have solved the motivation problem by announcing the Iron Man Blogging Challenge: in short, “maintain a rolling frequency of 4 posts every 32 days, with no more than 10 days between posts”.

So about one post a week. I can aim for that!

Can you? “The rules are very simple: you blog, about Perl. Any aspect of Perl you like. Long, short, funny, serious, advanced, basic: it doesn’t matter. It doesn’t have to be in English, either, if that’s not your native language.” Why not try? Help yourself and help perl at the same time.

I’ll try to capture the half-baked ideas for perl blog posts as they cross my mind, then build on them as time and mood allow. Hopefully about one a week will mature into an actual blog post.


I remembered that almost exactly a year ago Schwern blogged that he was “horrified at the junk which shows up when you search for perl blog on Google”. It seems the situation hasn’t improved much since. The planet.perl.org site is top, but the rest are a bit of a mishmash.

The problem is partly that “perl blog” isn’t a great search term. Google naturally gives preference to words that appear in URLs and titles (all else being equal), but blogs rarely explicitly call themselves blogs on their pages or URLs. I suspect many on the first page of results are there because ‘blog’ appears in the URL. (To help out I’ve included “perl blog” in the title of this post :)

Searching for “perl blogs” (plural) works better because it finds pages talking about perl blogs, which is useful when searching for perl blogs.

One entry in the “perl blogs” results was the Perl Foundation’s wiki page listing perl blogs. That was new to me. This blog wasn’t on it so I’ve added it. Got a perl blog, or know someone who has, that’s not listed on that page? Go and add it, now! It’ll only take a moment.

Along similar lines, I’ve added the phrases “perl blog” and “perl programming” to the sidebar of my blog pages. The first is to help people searching for “perl blog”. The second is mostly for my own amusement.

iPhoto – Removing redundant originals

I recently came across this article on slimming down an iPhoto library by removing the ‘Original’ photo where a ‘Modified’ one existed.

(That’s one part of what the free iPhoto Diet app does, but it seems that’s not being maintained and doesn’t support recent versions of iPhoto.)

Inspired by the basic three-line shell script in the article I worked up this somewhat more advanced version:

#!/bin/sh -ex
cd ~/Pictures/
du -sh 'iPhoto Library'
dest="$HOME/Pictures/iPhoto-Redundant-Originals-$(date "+%Y-%m-%d-%H%M")"
# for all files in iPhoto Library/Modified/...,
# move the corresponding iPhoto Library/Originals/...
# files into a zip archive file
# ($dest is quoted in case $HOME contains spaces)
find 'iPhoto Library/Modified' -type f -print \
    | perl -pe 's{iPhoto Library/Modified/}{iPhoto Library/Originals/}' \
    | zip -9 -T -m "$dest" -@ 2>&1 \
    | grep -v 'zip warning: name not matched'
du -sh 'iPhoto Library'

It has some advantages over the original: the photos are moved into a unique zip file, rather than the trash, and the file hierarchy is preserved, so files can be restored easily.

It could be modified to only operate on a subset of files, such as those older than a certain age.
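One way to restrict it to older files is find’s -mtime test. A sketch, assuming age is judged by the modification time of the file under Modified, and with `older_than` as an illustrative helper name:

```shell
#!/bin/sh
# List regular files under directory $1 whose modification time
# is more than $2 days in the past.
older_than() {
    find "$1" -type f -mtime +"$2" -print
}

# In the script above, the first stage of the pipeline would
# become, for files untouched for more than a year:
#   find 'iPhoto Library/Modified' -type f -mtime +365 -print \
```

The rest of the pipeline stays the same, since it only cares about the list of paths it is fed.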

Extra notes:

  1. After running this you’ll find that the photo will appear black in the ‘edit view’. That’s not a problem. The ‘edit view’ (which you enter by double clicking on a photo, for example) reconstructs the final image by taking the original and reapplying the edits-so-far. Since the original file has been removed you’ll just see a black image. Don’t worry. In all the other views, and for printing etc., your final modified picture will appear perfectly.
  2. iPhoto handles automatic rotation of images from cameras that record their orientation by performing a rotation ‘edit’ for you when you import the image. That rotation creates a modified copy, and that’s a common cause of bloat in your iPhoto library. It’s also why you may be surprised to see some originals being archived even though you haven’t edited them. It would be nice if iPhoto had an option to handle rotations destructively.

Usual caveats: this worked for me, your mileage may vary, cross your fingers, read the referenced article (and follow the links it contains), read the comments on them, quit iPhoto, take a backup, wear a tinfoil hat.

Examples of Modern Perl

In the spirit of re-tweeting, this is a short post to highlight some great examples of “modern perl”. (I’m using the term modern perl very loosely, not referring specifically to any one book, website, module, or organization.)

Firstly I’d like to highlight a couple of recent posts by Jonathan Rockway:

* Unshortening URLs with Modern Perl (also available here). An interesting example application built with modern perl modules like MooseX::Declare, MooseX::Getopt, HTTP::Engine, AnyEvent::HTTP, TryCatch, and KiokuDB.

* Multimethods. Another great example from Jonathan highlighting the combined power of MooseX::Types, MooseX::Declare, and MooseX::MultiMethods.

Then, from his work at the BBC, Curtis “Ovid” Poe has given us a great series of thoughtful posts on the benefits of replacing multiple inheritance with roles in a complex production code-base. The slides of his Inheritance vs Roles talk are a good place to start. Then dive in to the blog posts back here and work your way forward.

I ♥ modern perl!

Generate Treemaps for HTML from Perl, please.

Seeing this video of treemap for perlcritic memory usage reminded me of something…

I’d really like to be able to use treemaps in NYTProf reports to visualize the time spent in different parts, and depths, of the package namespace hierarchy. Currently that information is reported in a series of tables.

A much better interface could be provided by treemaps. Ideally allowing the user to drill-down into deeper levels of the package namespace hierarchy. (It needn’t be this flashy, just 2D would be fine :-)

In case you’re not familiar with them, treemaps are a great way to visualise hierarchical data. Here’s an example treemap of the disk space used by the files in a directory tree (from schoschie): [image: treemap of disk usage]

Perl already has a Treemap module, which can generate treemap images via the Imager module. Treemap is designed to support more output formats by sub-classing.

I guess it wouldn’t be hard to write a sub-class to generate HTML client-side image map data along with the image, so clicks on the image could be used to drill-down into treemaps that have more detail of the specific area that was clicked on.

More interesting, and more flexible, would be a sub-class to generate the treemap as a Scalable Vector Graphics diagram using the SVG module (and others).

I’m not going to be able to work on either of those ideas anytime soon.

Any volunteers?