[I couldn’t resist the title, sorry.]
“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
– Mark Twain
I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).
I do want to add a few thoughts though…
The TIOBE Programming Community Index is built on two assumptions:
1. That the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
2. That the proportionality is the same for different languages.
It’s not hard to pick holes in both of those assumptions.
They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim.
I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.
Even if you can accept their assumptions as valid, can you trust their maths? Back in Jan 2008, when I was researching views of Perl, TIOBE was mentioned, so I took a look at it.
At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.
I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:
Take perl and python, for example:
I get 923,000 hits from google for +"python programming" and 3,030,000 for +"perl programming". (The hits for Jython, IronPython, and pypy programming are tiny.) As reported by the “X-Y of approx Z results” at the top of the search results page.
Using google blog search I get 139,887 for +"python programming" and 491,267 for +"perl programming". (The hits for Jython, IronPython, and pypy programming are tiny.)
So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.
So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.
Am I misunderstanding something?
I didn’t get a reply.
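For anyone who wants to check the arithmetic in that e-mail, here's a minimal Python sketch. It only divides the hit counts quoted above; the names `hits` and `perl_to_python_ratio` are mine, made up for illustration, and this is not a reconstruction of how TIOBE computes its ratings.

```python
# Hit counts as quoted in the e-mail above (January 2008).
# The dictionary layout and names are illustrative assumptions.
hits = {
    "google web search": {"perl": 3_030_000, "python": 923_000},
    "google blog search": {"perl": 491_267, "python": 139_887},
}


def perl_to_python_ratio(counts):
    """Return perl hits divided by python hits for one source."""
    return counts["perl"] / counts["python"]


# One ratio per search source.
ratios = {source: perl_to_python_ratio(counts) for source, counts in hits.items()}

for source, ratio in ratios.items():
    print(f"{source}: {ratio:.2f} to 1 in perl's favor")
```

Both sources come out a little above 3-to-1 (about 3.3 for web search and about 3.5 for blog search), which is where the "roughly 3-to-1" in the e-mail comes from.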
I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?
Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.
Update: the story continues.