TIOBE or not TIOBE – “Lies, damned lies, and statistics”

[I couldn't resist the title, sorry.]

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
– Mark Twain

I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).

I do want to add a few thoughts though…

The TIOBE Programming Community Index is built on two assumptions:

  • that the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
  • that the proportionality is the same for different languages.

It’s not hard to pick holes in both of those assumptions.

    They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim.
    I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.

    Even if you can accept their assumptions as valid, can you trust their maths? Back in January 2008, when I was researching views of Perl, TIOBE was mentioned. So I took a look at it.

    At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.

    I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:

    Take perl and python, for example:

    I get 923,000 hits from google for +”python programming” and 3,030,000 for +”perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.) As reported by the “X-Y of approx Z results” at the top of the search results page.

    Using google blog search I get 139,887 for +”python programming” and 491,267 for +”perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.)

    So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.

    So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.

    Am I misunderstanding something?

    I didn’t get a reply.
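
The ratios in that e-mail are easy to check. Here is a minimal Python sketch that uses only the hit counts quoted above; the dictionary layout and the function name are my own, chosen for illustration.

```python
# Hit counts quoted in the e-mail above (January 2008).
hits = {
    "google web search": {"perl": 3_030_000, "python": 923_000},
    "google blog search": {"perl": 491_267, "python": 139_887},
}


def perl_to_python_ratio(counts):
    """Return the number of perl hits per python hit for one source."""
    return counts["perl"] / counts["python"]


ratios = {source: perl_to_python_ratio(counts) for source, counts in hits.items()}

for source, ratio in ratios.items():
    print(f"{source}: {ratio:.2f} to 1 in perl's favour")
```

Both sources come out a little above 3 to 1, which is where the "roughly 3-to-1" figure comes from.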

    I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?

    Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.

    Update: the story continues.


    Comparative Language Job Trend Graphs

    I researched these comparative job trend graphs for my Keynote at the 2007 London Perl Workshop, and then added a few more for this blog post.

    The graphs are from indeed.com, a job data aggregator and search engine. They’re all live, so every time you visit this page they’ll be updated with the current trend data (though it seems the underlying data isn’t updated often). My notes between the graphs relate to how they looked when I wrote this post in February 2008 (and the graphs were all Feb 2005 thru Dec 2008).

    First up, all jobs that even mention perl, python or ruby anywhere in the description:

    The most amazing thing to me about this graph is that it indicates that 1% of all jobs mention perl. Wow.

    (Perhaps the profile of the jobs on indeed.com is a little skewed towards technical jobs. If it is, then I’m assuming it’s equally skewed for each of the programming languages. Note: An addendum below shows that ruby is getting a ~17% boost through false-positive matches from other jobs, like those at Ruby Tuesday restaurants. That applies to the graphs here that don’t qualify the search with an extra term like ‘software engineer’.)

    Here’s a slightly more focussed version that compares languages mentioned in jobs for “software engineer” or “software developer” roles:

    'software engineer' and 'software developer' roles mentioning perl or python or ruby

    A similar pattern. The narrowing of the gap between Perl and the other languages looks like good evidence of Perl’s broad appeal as a general purpose tool beyond the pure “software engineering/development” roles.

    I wanted to focus on jobs where developing software using a particular language was the principal focus of the job. So then I looked for “foo developer” jobs:

    perl developer vs python developer vs ruby developer

    That increases the gap between Perl and the others. Perhaps a reflection of Perl’s maturity – that it’s more entrenched so more likely to be used in the name of the role.

    But do people use “foo developer” or “foo programmer” for job titles? Let’s take a look:

    So “foo developer” is the most popular, but “foo programmer” is still significant, especially for Perl. (It’s a pity there’s no easy way to combine the pairs of trend lines. That would raise Perl even further.)

    To keep us dynamic language folk in our place, it’s worth comparing the trends above with those of more static languages:

    same as above but with C, c# and c++

    C++ and C# dwarf the dynamic languages. C and cobol are still alive and well, just.

    Then, to give the C++ and C# folk some perspective, let’s add Java to the mix:

    same as above but with java

    C++ and C# may dwarf the dynamic languages, but even they are dwarfed by Java.

    Let’s take a slight detour now to look at web related work. (It’s a detour because this post isn’t about web related work, it’s about the jobs market for the three main general purpose dynamic languages. People doing web work tend to assume that everything is about web work.)

    We’ll start by adding in two more specialist languages, PHP and JavaScript:

    php and javascript developer

    I’m not surprised by the growth of PHP, though I’m sad that so many people are being introduced to ‘programming’ through it. I’m more surprised by the lack of height and growth in JavaScript. I presume that’s because it’s still rare for someone to be primarily a “JavaScript developer”. (That’ll change.) Let’s check that:

    perl, python, ruby, php, javascript, web-developer

    That’s much closer to what I’d expected. PHP is a popular skill, but is mentioned in fewer than half as many jobs as Perl. JavaScript, on the other hand, is in great and growing demand.

    Let’s look at the “web developer” role specifically and see which of the languages we’re interested in are mentioned most frequently:

    I think this graph captures the essence of why people think Perl is stagnant. It’s because Perl hasn’t been growing much in the ‘web developer’ world. People in that world are the ones most likely to be blogging about it and, I’ve noticed, tend to generalize their perceptions.

    (If you’re interested in PHP, Java, ASP and JavaScript and look here you’ll see that they all roughly follow the PHP line at about twice the height. JavaScript is at the top with accelerating growth.)

    Finally, just to show I’m not completely biased about Perl, here are the relative trends:

    relative trends

    This kind of graph reminds me of small companies that grow by a small absolute amount, say two employees growing to four, and then put out a press release saying they’re the “fastest growing company” in the area, or whatever. Dilbert recognises the issue. The graph looks striking now (Q1 2008) but means little. If it looks much like that in two years time, then it’ll be more impressive.
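
To put numbers on that point, here is a small Python sketch comparing absolute and relative growth. The two-to-four company is the example from the paragraph above; the larger company and its figures are hypothetical, invented only for contrast.

```python
def growth(before, after):
    """Return (absolute change, relative change as a percentage)."""
    absolute = after - before
    relative = 100.0 * absolute / before
    return absolute, relative


# The small company from the text: two employees growing to four.
small = growth(2, 4)

# A hypothetical large company, with figures invented purely for contrast.
large = growth(1000, 1100)

print(f"small: +{small[0]} staff, {small[1]:.0f}% growth")
print(f"large: +{large[0]} staff, {large[1]:.0f}% growth")
```

In this sketch the small company "wins" on relative growth by a factor of ten while adding a fiftieth of the staff. A relative-trend graph can flatter a small base in the same way.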

    Similarly, the fact that Perl is still growing its massive installed base over this period is impressive. (It’s seen most clearly in the second graph.) Perl 5 has been around for 14 years, and Perl itself for 21.

    The Perl community hasn’t been great at generating “Buzz” that’s visible outside the community. It’s just quietly getting on with the job. Lots of jobs. That lack of buzz helps create the impression that the Perl community lacks vitality relative to other similar languages. Hopefully this post, and others, go some small way towards correcting that.

    p.s. For an alternative, more geographic view, take a look at the Dynamic Language Jobs Map (about).

    Addendum:

    It turns out that approximately 14% of “ruby” jobs relate to restaurants – mostly the Ruby Tuesday chain. So I investigated how false positives affected the single-keyword searches I’ve used in some of the graphs. (I’m going to assume that “foo developer” is sufficiently immune from false positives.)

    I searched for Perl and then added negative keywords (-foo -bar …) until I’d removed almost all of the likely software related jobs. I ended up with this list (which shows that indeed.com don’t use stemming, which is sad and dumb of them):

    perl -developer -developers -engineer -software -programmer -programmers -programming -development -java -database -sql -oracle -sybase -scripting -scripter -coder -linux -unix -protocol -C -C++ -javascript -computing
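
A query like that is tedious to assemble by hand. Here is a minimal Python sketch that builds it from a keyword list. The function and variable names are hypothetical, and the sketch only constructs the query string; it does not talk to indeed.com. Because there is no stemming, singular and plural forms are listed separately, exactly as in the query above.

```python
# Negative keywords taken verbatim from the query above.
EXCLUDED = [
    "developer", "developers", "engineer", "software", "programmer",
    "programmers", "programming", "development", "java", "database",
    "sql", "oracle", "sybase", "scripting", "scripter", "coder",
    "linux", "unix", "protocol", "C", "C++", "javascript", "computing",
]


def build_query(term, excluded):
    """Join a search term with negated keywords, e.g. 'perl -developer -java'."""
    return " ".join([term] + [f"-{word}" for word in excluded])


query = build_query("perl", EXCLUDED)
print(query)
```

The search term is a parameter, so the same keyword list can be reused with a different first word.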

    Then I did the same search but with python or ruby instead of perl. Here are the results:

    language   all matches   filtered matches   inappropriate matches
    perl       29987         6                  0.02% false
    python     7794          20                 0.2% false
    ruby       4624          794                17% false
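
The percentages can be reproduced from the raw counts. This Python sketch assumes my reading of the columns: "filtered matches" are the jobs still matching after the negative keywords are applied, so they count as inappropriate matches. The names are illustrative only.

```python
# (all matches, matches left after filtering) from the table above.
counts = {
    "perl": (29987, 6),
    "python": (7794, 20),
    "ruby": (4624, 794),
}


def false_positive_rate(all_matches, filtered_matches):
    """Percentage of all matches that survive the negative-keyword filter."""
    return 100.0 * filtered_matches / all_matches


rates = {lang: false_positive_rate(*pair) for lang, pair in counts.items()}

for lang, rate in rates.items():
    all_matches, filtered = counts[lang]
    remaining = all_matches - filtered
    print(f"{lang}: {rate:.2f}% inappropriate, {remaining} plausible software jobs")
```

Computed this way, perl and ruby agree with the table (0.02% and about 17%). The python figure works out nearer 0.26%, which the table gives as 0.2%.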

    Ruby is well below python (and far below perl) in the first graph, yet that includes this 17% boost from inappropriate matches. You have to marvel at Ruby’s ability to gain mind-share, if not market-share.