Perl Myths and Mongers in Dublin

Last weekend I went up to Dublin to speak at OSSBarcamp. I took the train from Limerick on Friday so I’d already be in Dublin the following morning, without having to get up at the crack of dawn.

Dublin.pm

Aidan Kehoe and I had a very small but interesting Dublin.pm meeting that night. Their first since 2004! Our wide-ranging discussion included my trying to understand what led Dublin.pm to flounder instead of flourish. I think a key factor was the (implicit?) expectation that members should give technical presentations.

Living in the west of Ireland, I haven’t found enough local Perl users (so far) to form a viable Perl Mongers group. So I set up the Limerick Open Source meetup instead.

Here’s what worked for us: we sit around in a quiet, comfy hotel bar and chat. Naturally the chat tends towards the technical, and laptops are produced and turned around to illustrate a point or show the results of a search, a chunk of video, etc. There’s no set agenda, no declared topics, and no presentations. And yet, I think it’s fair to say that everyone who’s come along has learnt interesting (albeit random) stuff.

I’d like to hear from Perl Mongers, in groups of all sizes, about what kind of balance between the social and technical aspects of meetings works (or doesn’t work) for you.

OSSBarcamp

At OSSBarcamp I gave a ~15 minute ‘lightning talk’ on Devel::NYTProf in the morning, and a ~50 minute talk on Perl Myths in the afternoon.

The Perl Myths talk was a major update to my previous version, now over 18 months old, incorporating lots of updated graphs and other fresh information.

There is so much happy, vibrant, productive life in the Perl community that updating the presentation has been a lovely experience. I keep having to revise the numbers on the slides upwards. There are lots of great graphs and they’re all going upwards too! (Many thanks to Barbie for the great new graphs of CPAN stats.)

I’ve put a PDF of the slides, with notes, on slideshare. Best viewed full-screen or downloaded.

I made a screencast but I think I’ll hang on to that until after I give the same talk, updated again, at the Italian Perl Workshop (IPW09) in Pisa in October — I’m really looking forward to that! I’ll make another screencast there and decide then which to upload.

After OSSBarcamp last week, and before IPW09 in late October, I’ll be flying to Moscow, visa permitting, to give a talk at the HighLoad++ (translated) conference. I’ve never been to Russia before so that’s going to be an amazing experience!

Is your Perl community visible?

As I mentioned recently, I’m working on an update to my Perl Myths talk. (Which is really a review of the state of the art, state of the community, resources, and best practices. You could even call it marketing.)

In recent months, and especially while researching for this update, it’s become clear to me that the Perl community is both functioning well and growing more conscious of its own role and value.

But are the various components of “the community” sufficiently visible?

NYTProf 2.04 gives you 90% smaller data files

At OSCON this year I gave a talk on my new pet project Devel::NYTProf v2 to a packed room. Turned out to be a lot of fun.

“The first thing I need to do is talk about Devel::DProf because it needs to be taken out and shot.”

I made a screencast of the 40 minute talk which you can watch on blip.tv here. Worth watching for the background on profilers, the demo of NYTProf, and the questions, not to mention the teasing I get along the way.

One of the final questions was about the size of the profile data file that NYTProf produces. One of the major drawbacks of statement-level profiling is the volume of data it generates while profiling your code. For every statement executed the profiler streams out the file id, the line number, and the time spent. For every statement! When trying to profile a full application doing real work the volume of data generated quickly becomes impractical to deal with. Multi-gigabyte files are common.

This was the major problem with Devel::SmallProf, which generated text files while profiling. Salvador Fandiño García addressed that in Devel::FastProf by writing the data in a compact binary form. A vast improvement that contributed to Devel::FastProf (on which Devel::NYTProf is based) being the first statement-level profiler worth using on large applications. Even so, the volume of data generated was still a problem when profiling all but short running applications.

NYTProf 2.03 was producing profile data at the rate of about 13MB per million statements executed. That might not sound too bad until you realise that on modern systems with cpu intensive code, perl can execute millions of statements every few seconds.
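To put “13MB per million statements” in perspective, here’s a quick back-of-envelope calculation (the statements-per-second figure is my own assumption for CPU-bound code, not a measured number):

```python
# Rough data-rate estimate for statement-level profiling, assuming
# ~13 bytes per statement record and a CPU-bound Perl program
# executing ~2 million statements per second (both are estimates).
bytes_per_statement = 13
statements_per_second = 2_000_000
mb_per_minute = bytes_per_statement * statements_per_second * 60 / 1e6
print(f"~{mb_per_minute:.0f} MB of profile data per minute")
```

At that rate a profiling run of just a few minutes produces multi-gigabyte files, which matches the experience described above.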

I could see a way to approximately halve the data volume by changing the format to optimize for the common case of consecutive statements being in the same file, but that wasn’t going to be enough. The best way forward would be to add zip compression. It would be easy enough to pipe the output stream through a separate zip process, but that approach has a problem: the zip process will be soaking up cpu time asynchronously from the app being profiled. That would affect the realtime measurements in an unpredictable way.
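The “consecutive statements in the same file” optimization can be sketched like this (a simplified illustration only; the record layout here is invented and is not NYTProf’s actual on-disk format):

```python
import struct

def encode(records):
    """Encode (file_id, line, ticks) records, omitting the file id when it
    matches the previous record -- the common case of consecutive statements
    in the same file. A tag byte distinguishes the two record shapes:
    0 = same file as before, 1 = file id follows."""
    out = bytearray()
    last_fid = None
    for fid, line, ticks in records:
        if fid == last_fid:
            out += struct.pack("<BII", 0, line, ticks)       # 9 bytes
        else:
            out += struct.pack("<BIII", 1, fid, line, ticks)  # 13 bytes
            last_fid = fid
    return bytes(out)
```

Since most statements execute in the same file as the one before, most records take the short form, which is where the saving comes from.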

I realized back in June that a better approach would be to embed zip compression into NYTProf itself. Around the end of July Nicholas Clark, the current Perl 5 pumpking, got involved and was motivated to implement the internal zipping because he was “generating over 4Gb of profile data trying to profile the harness in the Perl 5 core running tests in parallel”.

He did a great job. The zlib library is automatically detected at build time and, if available, the code to dynamically route i/o through the zip library gets compiled in. The output stream starts in normal mode, so you can easily see and read the plain text headers in the data file, then switches to zip compression for the profile data. How well did it work out? This graph tells the story:

NYTProf 2.04 compression.png

(The data relates to profiling perlcritic running on a portion of its own source code on my MacBook Pro 2GHz laptop. I only took one sample at each compression level so there may be some noise in the results.)

The data file size (red) plummets even at the lowest compression level. Also note the corresponding drop in system time (yellow) due to the reduction in context switches and file i/o.

I’ve set the default compression level to 6. I doubt you’ll want to change it, but you can by adding compression=N to the NYTPROF environment variable.
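For example, to profile with a lower compression level (the script name is a placeholder):

```shell
# Run the profiler with compression level 1 (faster, larger data file).
# Options are passed as comma-separated key=value pairs in NYTPROF.
NYTPROF=compression=1 perl -d:NYTProf myscript.pl

# Then generate the HTML report from the resulting profile data file.
nytprofhtml
```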

Here are the change notes for the 2.04 release:

  Fixed rare divide-by-zero error in reporting code.
  Fixed rare core dump in reporting code.
  Fixed detection of #line directives to be more picky.
  Fixed some compiler warnings thanks to Richard Foley.
  Added on-the-fly ~90% zip compression thanks to Nicholas Clark.
    Reduces data file size per million statements executed
    from approx ~13MB to ~1MB (depends on code being profiled).
  Added extra table of all subs sorted by inclusive time.
  No longer warns about '/loader/0x800d8c/...' synthetic file
    names perl assigns when reading code from a CODE ref in @INC.

Enjoy!

Perl Myths – OSCON 2008

I gave an updated version of my earlier Perl Myths talk at OSCON this year. It includes updated numbers, updated job trend graphs (showing good growth in perl jobs) and slides for the perl6 portion that were missing from the upload of the previous talk.

Two versions of the slides are available: one with just the slides on a landscape page, and another with slides and notes on a portrait page.

I also have a screencast of the presentation which I hope to edit and upload before long. (I’ll update this page and post a new note when I do.)

Lies, damn lies, and search engine rankings

I started a related recent post with a quote that seems just as apt here:

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
– Mark Twain

If you regularly use just one search engine, as I tend to do, it’s very easy to be lulled into a false sense of security about the quality and relevance of the results.

I was recently reminded of the significant differences that can occur in the results of different search engines. That, in turn, reminded me of tools I’d come across previously to highlight those differences. In particular one that gives a very clear picture of the differences in ranking. After a little digging I found it at langreiter.com (via a list of tools at http://www.seocompany.ca).

As a demonstration, here’s a comparison of the top results for +”perl programming” at Google (top) and Yahoo (bottom):

google vs yahoo rankings for perl programming via langreiter.png

and here’s the same for +”python programming”:

google vs yahoo rankings for python programming via langreiter.png

Each dot represents a result url, with the top ranked results on the left. Where a url appears in the top 100 results on both Google and Yahoo then a line is drawn between them to highlight the different rankings. On the site you can hover over the dots to see the corresponding url.
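The comparison the tool visualises boils down to a simple overlap count between two ranked lists. A small sketch (the URLs here are toy values, not real results):

```python
def ranking_overlap(a, b, top=20, pool=100):
    """Count how many of list a's top-`top` URLs appear anywhere in
    list b's top-`pool` results -- the shared urls the chart draws
    connecting lines for."""
    pool_set = set(b[:pool])
    return sum(1 for url in a[:top] if url in pool_set)

google = ["u1", "u2", "u3", "u4"]
yahoo = ["u3", "u9", "u1", "u8"]
print(ranking_overlap(google, yahoo, top=4, pool=4))  # 2 shared urls
```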

I remember being very surprised when I first saw these kinds of results a few years ago. I’m no less surprised now. In fact, more so, as I’d (naïvely) expected Yahoo and Google to have converged somewhat in their concept of relevancy. At least for top results.

The particular queries I used above are not exceptional. I couldn’t find any query that didn’t have significant differences in rankings. Don’t believe me? Go try it yourself at http://www.langreiter.com/exec/yahoo-vs-google.html.

That so many of the top 20 from one search engine don’t even appear in the top 100 of the other is… is… well, I’m not quite sure what to make of it. At first sight it seems like a bad thing, but I also have to admit that it’s a good thing. At least in some ways. Diversity is important in any ecosystem.

If you only use one major search engine then you have to accept that you’re getting just one view of the internet. Most of the time you may be happy with that. It’s worth keeping it in mind, though, for those times when you’re struggling to find good results.

One way to avoid the issue is to use a meta search engine that’ll query multiple search engines for you and merge the results. There are lots of them.

Comparative Language Job Trend Graphs

I researched these comparative job trend graphs for my Keynote at the 2007 London Perl Workshop, and then added a few more for this blog post.

The graphs are from indeed.com, a job data aggregator and search engine. They’re all live, so every time you visit this page they’ll be updated with the current trend data (though it seems the underlying data isn’t updated often). My notes between the graphs relate to how they looked when I wrote this post in February 2008 (and the graphs were all Feb 2005 thru Dec 2008).

First up, all jobs that even mention perl, python or ruby anywhere in the description:

The most amazing thing to me about this graph is that it indicates that 1% of all jobs mention perl. Wow.

(Perhaps the profile of the jobs on indeed.com is a little skewed towards technical jobs. If it is, then I’m assuming it’s equally skewed for each of the programming languages. Note: an addendum below shows that ruby is getting a ~17% boost through false-positive matches from other jobs, like Ruby Tuesday restaurants. That applies to the graphs here that don’t qualify the search with an extra term like ‘software engineer’.)

Here’s a slightly more focussed version that compares languages mentioned in jobs for “software engineer” or “software developer” roles:

'software engineer' and 'software developer' roles mentioning perl or python or ruby

A similar pattern. The narrowing of the gap between Perl and the other languages looks like good evidence of Perl’s broad appeal as a general purpose tool beyond the pure “software engineering/development” roles.

I wanted to focus on jobs where developing software using a particular language was the principal focus of the job. So then I looked for “foo developer” jobs:

perl developer vs python developer vs ruby developer

That increases the gap between Perl and the others. Perhaps a reflection of Perl’s maturity – that it’s more entrenched so more likely to be used in the name of the role.

But do people use “foo developer” or “foo programmer” for job titles? Let’s take a look:

So “foo developer” is the most popular, but “foo programmer” is still significant, especially for Perl. (It’s a pity there’s no easy way to combine the pairs of trend lines. That would raise Perl even further.)

To keep us dynamic language folk in our place, it’s worth comparing the trends above with those of more static languages:

same as above but with C, c# and c++

C++ and C# dwarf the dynamic languages. C and COBOL are still alive and well, just.

Then, to give the C++ and C# folk some perspective, let’s add Java to the mix:

same as above but with java

C++ and C# may dwarf the dynamic languages, but even they are dwarfed by Java.

Let’s take a slight detour now to look at web related work. (It’s a detour because this post isn’t about web related work, it’s about the jobs market for the three main general purpose dynamic languages. People doing web work can tend to assume that everything is about web work.)

We’ll start by adding in two more specialist languages, PHP and JavaScript:

php and javascript developer

I’m not surprised by the growth of PHP, though I’m sad that so many people are being introduced to ‘programming’ through it. I’m more surprised by the lack of height and growth in JavaScript. I presume that’s because it’s still rare for someone to be primarily a “JavaScript developer”. (That’ll change.) Let’s check that:

perl, python, ruby, php, javascript, web-developer

That’s much closer to what I’d expected. PHP is a popular skill, but it’s mentioned in fewer than half as many jobs as Perl. JavaScript, on the other hand, is in great and growing demand.

Let’s look at the “web developer” role specifically and see which of the languages we’re interested in are mentioned most frequently:

I think this graph captures the essence of why people think Perl is stagnant. It’s because Perl hasn’t been growing much in the ‘web developer’ world. People in that world are the ones most likely to be blogging about it and, I’ve noticed, tend to generalize their perceptions.

(If you’re interested in PHP, Java, ASP and JavaScript and look here you’ll see that they all roughly follow the PHP line at about twice the height. JavaScript is at the top with accelerating growth.)

Finally, just to show I’m not completely biased about Perl, here are the relative trends:

relative trends

This kind of graph reminds me of small companies that grow by a small absolute amount, say two employees growing to four, and then put out a press release saying they’re the “fastest growing company” in the area, or whatever. Dilbert recognises the issue. The graph looks striking now (Q1 2008) but means little. If it looks much like that in two years’ time, then it’ll be more impressive.

Similarly, the fact that Perl is still growing its massive installed base over this period is impressive. (Seen most clearly by the second graph.) Perl 5 has been around for 14 years, and Perl itself for 21.

The Perl community hasn’t been great at generating “Buzz” that’s visible outside the community. It’s just quietly getting on with the job. Lots of jobs. That lack of buzz helps create the impression that the Perl community lacks vitality relative to other similar languages. Hopefully this post, and others, go some small way towards correcting that.

p.s. For an alternative, more geographic view, take a look at the Dynamic Language Jobs Map (about).

Addendum:

It turns out that approximately 14% of “ruby” jobs relate to restaurants – mostly the Ruby Tuesday chain. So I investigated how false positives affected the single-keyword searches I’ve used in some of the graphs. (I’m going to assume that “foo developer” is sufficiently immune from false positives.)

I searched for Perl and then added negative keywords (-foo -bar …) until I’d removed almost all of the likely software related jobs. I ended up with this list (which shows that indeed.com don’t use stemming, which is sad and dumb of them):

perl -developer -developers -engineer -software -programmer -programmers -programming -development -java -database -sql -oracle -sybase -scripting -scripter -coder -linux -unix -protocol -C -C++ -javascript -computing

Then I did the same search but with python or ruby instead of perl. Here are the results:

language   all matches   filtered matches   inappropriate matches
perl       29987         6                  0.02%
python     7794          20                 0.2%
ruby       4624          794                17%
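The percentages follow directly from the raw counts, as a quick check confirms (“filtered matches” here means jobs still matching after all the negative software-related keywords were applied):

```python
# Recompute the inappropriate-match rates from the raw counts above:
# (total matches, matches surviving the negative-keyword filter).
counts = {"perl": (29987, 6), "python": (7794, 20), "ruby": (4624, 794)}
for lang, (total, nonsoftware) in counts.items():
    print(f"{lang}: {100 * nonsoftware / total:.2f}% inappropriate")
```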

Ruby is well below python (and far below perl) in the first graph, yet that includes this 17% boost from inappropriate matches. You have to marvel at Ruby’s ability to gain mind-share, if not market-share.