TIOBE or not TIOBE - “Lies, damned lies, and statistics”
[I couldn't resist the title, sorry.]
“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
- Mark Twain
I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).
I do want to add a few thoughts though…
The TIOBE Programming Community Index is built on two assumptions:
It’s not hard to pick holes in both of those assumptions.
They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors”, but I can’t see anything in their methodology that supports that claim.
I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.
Even if you can accept their assumptions as valid, can you trust their maths? Back in Jan 2008, when I was researching views of perl, TIOBE was mentioned, so I took a look at it.
At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit that ranking.
I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:
Take perl and python, for example:
I get 923,000 hits from google for +"python programming" and 3,030,000 for +"perl programming", as reported by the “X-Y of approx Z results” line at the top of the search results page. (The hits for Jython, IronPython, and pypy programming are tiny.)
Using google blog search, I get 139,887 for +"python programming" and 491,267 for +"perl programming". (The hits for Jython, IronPython, and pypy programming are tiny.)
So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.
So 3:1 for perl, yet python ranks higher than perl. That certainly seems odd.
Am I misunderstanding something?
I didn’t get a reply.
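As a quick sanity check of the arithmetic in that e-mail, here is a small Python sketch. It recomputes the perl-to-python ratios from the hit counts quoted above. It then assumes, purely for illustration, that an index turns each source's hit counts into percentages by dividing every language by the same per-source total. The `total` values are made up, and this is not a claim about TIOBE's actual formula. Under that assumption, no choice of total can put python ahead of perl when perl has about three times the hits in every source.

```python
# Hit counts quoted in the e-mail above (Google web search and
# Google blog search, early 2008) for +"<language> programming".
HITS = {
    "google web": {"perl": 3_030_000, "python": 923_000},
    "google blog": {"perl": 491_267, "python": 139_887},
}


def perl_to_python_ratio(counts):
    """Return how many perl hits there are per python hit."""
    return counts["perl"] / counts["python"]


def normalized_share(counts, language, total):
    """Percentage of `total` hits that belong to one language.

    `total` stands in for the (unknown) combined hits of every
    language counted for that source. It is a hypothetical value.
    """
    return 100.0 * counts[language] / total


for source, counts in HITS.items():
    ratio = perl_to_python_ratio(counts)
    print(f"{source}: {ratio:.2f} perl hits per python hit")
# google web: 3.28 perl hits per python hit
# google blog: 3.51 perl hits per python hit

# Dividing both languages by the same per-source total preserves
# their ordering, whatever that (hypothetical) total is.
for source, counts in HITS.items():
    for total in (10_000_000, 50_000_000):  # made-up totals
        perl = normalized_share(counts, "perl", total)
        python = normalized_share(counts, "python", total)
        assert perl > python
```

Averaging those per-source percentages across several search engines wouldn't reverse the ordering either, unless some other source produced a very different ratio, which is exactly what the e-mail was questioning.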
I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?
Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.
Thanks for the post. Your analysis is spot on.
Comment by Andrew Sterling Hanenkamp — April 12, 2008 @ 3:51 pm
The TIOBE numbers are why I wrote “Don’t compare percentages”. People use percentages to lie, and I think TIOBE is purposely trying to distort the picture. They’d be more interesting if they supplied raw numbers.
The most damning thing about TIOBE and any other hit counting is that it relies on a third party to decide what to index. Most of the stuff that comes back from Google Blogsearch, for instance, is reposted crap for link attractors and other search engine optimization shenanigans. Counting the same original content in multiple places distorts the data and makes the hit counting just about worthless. Note that TIOBE had to change its index in April 2004 just for this reason (see the FAQ at the bottom of http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html).
Comment by brian d foy — April 12, 2008 @ 7:43 pm
[...] TIOBE or not TIOBE - “Lies, damned lies, and statistics” « Not this… I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz). (tags: blog.timbunce.org 2008 mes3 dia13 at_home TIOBE programming trends estatísticas) [...]
Pingback by rascunho » Blog Archive » links for 2008-04-14 — April 14, 2008 @ 8:39 pm
[...] TIOBE or not TIOBE - An Update Filed under: software — TimBunce @ 9:43 pm Tags: language, perl, python, search, trends This is an update to my earlier post TIOBE or not TIOBE - “Lies, damned lies, and statistics” [...]
Pingback by TIOBE or not TIOBE - An Update « Not this… — April 20, 2008 @ 10:02 pm
Do you know of any information about this subject in other languages?
Comment by Bayrak — April 22, 2008 @ 12:36 am
Bayrak, I’m not sure what you mean. Perhaps you could contact me by email.
Comment by TimBunce — April 22, 2008 @ 8:22 am
[...] Filed under: tech — TimBunce @ 10:56 am Tags: graphs, search I started a related recent post with a quote that seems just as apt here: “Figures often beguile me, particularly when I have [...]
Pingback by Lies, damn lies, and search engine rankings « Not this… — April 25, 2008 @ 11:05 am
[...] I’m only spelling it out like this because, by saying “perl blog“, I help to keep Mr. Schwern happy, and I’m saying “perl programming” purely for my own amusement. [...]
Pingback by This is a perl blog, too. At least partly. « Not this… — April 28, 2008 @ 8:27 pm