Not this…

April 12, 2008

TIOBE or not TIOBE – “Lies, damned lies, and statistics”

Filed under: software — TimBunce @ 12:58 am

[I couldn't resist the title, sorry.]

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
- Mark Twain

I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).

I do want to add a few thoughts though…

The TIOBE Programming Community Index is built on two assumptions:

  • that the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
  • that the proportionality is the same for different languages.

It’s not hard to pick holes in both of those assumptions.
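
To make the two assumptions concrete, here is a minimal Python sketch. It is not TIOBE's actual formula: the function name `popularity_share`, the languages "foo" and "bar", the hit counts, and the per-language constants `k` are all made up for illustration. It assumes hits = k × popularity, and suggests how the ranking can flip when k is not the same for every language.

```python
def popularity_share(hits, k):
    """Estimate each language's share of "popularity" from hit counts.

    Assumes hits[lang] = k[lang] * popularity[lang] (assumption 1),
    so popularity[lang] = hits[lang] / k[lang].
    """
    popularity = {lang: hits[lang] / k[lang] for lang in hits}
    total = sum(popularity.values())
    return {lang: value / total for lang, value in popularity.items()}


# Hypothetical hit counts for two made-up languages.
hits = {"foo": 3000, "bar": 1000}

# Assumption 2 holds: the same constant applies to both languages.
same_k = popularity_share(hits, {"foo": 1.0, "bar": 1.0})

# Assumption 2 fails: pages about "foo" use the search phrase
# four times as often per unit of popularity as pages about "bar".
different_k = popularity_share(hits, {"foo": 4.0, "bar": 1.0})

print(same_k)
print(different_k)
```

With equal constants "foo" takes three quarters of the total. With unequal constants the same hit counts put "bar" ahead, so the raw counts alone cannot settle the ranking.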

    They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim.
    I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.

    Even if you can accept their assumptions as valid, can you trust their maths? Back in January 2008, when I was researching views of Perl, TIOBE was mentioned, so I took a look at it.

    At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.

    I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:

    Take perl and python, for example:

    I get 923,000 hits from Google for +“python programming” and 3,030,000 for +“perl programming”, as reported by the “X-Y of approx Z results” line at the top of the search results page. (The hits for Jython, IronPython, and pypy programming are tiny.)

    Using Google blog search I get 139,887 for +“python programming” and 491,267 for +“perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.)

    So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.

    So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.

    Am I misunderstanding something?

    I didn’t get a reply.
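
The ratios in that e-mail are easy to check. Here is a quick Python sketch using only the hit counts quoted above; the variable and function names are mine, chosen for illustration.

```python
# Hit counts quoted in the e-mail (Google web search and Google blog search).
google_web = {"perl": 3_030_000, "python": 923_000}
google_blogs = {"perl": 491_267, "python": 139_887}


def perl_to_python(hits):
    """Return the ratio of Perl hits to Python hits for one source."""
    return hits["perl"] / hits["python"]


web_ratio = perl_to_python(google_web)
blog_ratio = perl_to_python(google_blogs)

print(f"web:   {web_ratio:.2f} to 1")
print(f"blogs: {blog_ratio:.2f} to 1")
```

Both sources come out a little above 3 to 1 in Perl's favor, which matches the rough figure given in the e-mail.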

    I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?

    Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.

    21 Comments

    1. Thanks for the post. Your analysis is spot on.

      Comment by Andrew Sterling Hanenkamp — April 12, 2008 @ 3:51 pm

    2. The TIOBE numbers are why I wrote “Don’t compare percentages”. People use percentages to lie, and I think TIOBE is purposely trying to distort the picture. They’d be more interesting if they supplied raw numbers.

      The most damning thing about TIOBE and any other hit counting is that it relies on a third party to decide what to index. Most of the stuff that comes back from Google Blogsearch, for instance, is reposted crap for link attractors and other search engine optimization shenanigans. Counting the same original content in multiple places distorts the data and makes the hit counting just about worthless. Note that TIOBE had to change its index in April 2004 just for this reason (see the FAQ at the bottom of http://www.tiobe.com/index.php/content/paperinfo/tpci/index.html).

      Comment by brian d foy — April 12, 2008 @ 7:43 pm

    3. [...] TIOBE or not TIOBE – “Lies, damned lies, and statistics” « Not this… I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz). (tags: blog.timbunce.org 2008 mes3 dia13 at_home TIOBE programming trends estatísticas) [...]

      Pingback by rascunho » Blog Archive » links for 2008-04-14 — April 14, 2008 @ 8:39 pm

    4. [...] TIOBE or not TIOBE – An Update Filed under: software — TimBunce @ 9:43 pm Tags: language, perl, python, search, trends This is an update to my earlier post TIOBE or not TIOBE – “Lies, damned lies, and statistics” [...]

      Pingback by TIOBE or not TIOBE - An Update « Not this… — April 20, 2008 @ 10:02 pm

    5. do you know any information about this subject in other languages?

      Comment by Bayrak — April 22, 2008 @ 12:36 am

    6. Bayrak, I’m not sure what you mean. Perhaps you could contact me by email.

      Comment by TimBunce — April 22, 2008 @ 8:22 am

    7. [...] Filed under: tech — TimBunce @ 10:56 am Tags: graphs, search I started a related recent post with a quote that seems just as apt here: “Figures often beguile me, particularly when I have [...]

      Pingback by Lies, damn lies, and search engine rankings « Not this… — April 25, 2008 @ 11:05 am

    8. [...] I’m only spelling it out like this because, by saying “perl blog“, I help to keep Mr. Schwern happy, and I’m saying “perl programming” purely for my own amusement. [...]

      Pingback by This is a perl blog, too. At least partly. « Not this… — April 28, 2008 @ 8:27 pm

    9. Interesting analysis. I hadn’t paid much attention to Tiobe until this May. On the misinformation of an outsider, they removed ColdFusion from the list claiming it wasn’t a programming language. Much to their chagrin they were immediately corrected, but on the realization that the technically “correct” way to refer to ColdFusion-the-language was “CFML” (which almost nobody actually calls it) they changed their search terms and it promptly fell into oblivion.

      It would kind of be like Googling for “bathroom tissue” (technically correct) instead of “Kleenex” (most common vernacular) and wondering why you got so many fewer results.

      Comment by Brad Wood — June 19, 2008 @ 7:28 am

    10. Sure. The measurement has flaws. But do you have a better idea? Tiobe is the best I know of. Can you beat it? Or are you just whining?

      Comment by Arthur Grifith — July 18, 2008 @ 6:54 pm

    11. “Sure. The measurement has flaws. But do you have a better idea? Tiobe is the best I know of. Can you beat it? Or are you just whining?”

      When the flaws are so egregious, being the best is not good enough. Come to think of it, what does it mean to be the best at doing something wrong, anyway? Bringing to our attention that Tiobe is unreliable at every step of their methodology (underlying assumptions, search aggregation, even the math) is useful by itself, without trying to “beat it”.

      Comment by Bernardo Rechea — July 23, 2008 @ 2:23 pm

    12. Interesting analysis. Do you have any constructive suggestion to how they should change their methodology?

      Comment by Kebabbert — July 29, 2008 @ 4:13 pm

    13. Personally, I think that the TIOBE index is very skewed by the beginner effect. I believe that most of the hits for the top languages are in fact just student/hobbyist programmers asking for help etc. I’m sure there are a lot more people posting questions and answers for Visual Basic and Delphi than there are posting about Cobol or AS/400 CL. Then again, when you think about it, finding the amount of current chatter about a language does seem to be a fairly good indication of how many people are working with or learning it.

      Aside from that, who cares? Why get upset when somebody says Java is more “popular” than Python? If you’re making money, that’s the most important thing. If you really want to blow off TIOBE’s findings, then make the argument that really, for any given architecture, there is only *one* language – everything else is just a kind of pre-processor.

      Ha Ha

      Comment by iDentity Crysis — August 22, 2008 @ 2:25 am

    14. And on top of that, searching Google or any other engine doesn’t necessarily prove popularity of anything.

      Maybe the reason Java has so many results is that it is a pain in the XXX so there are tons of people out there looking for help. ;)

      Comment by Dave — December 9, 2008 @ 3:47 pm

    15. [...] pages. The first is to help people searching for “perl blog”. The second is mostly for my own amusement. Comments [...]

      Pingback by Thanks, Iron Man, for the good excuse to perl blog « Not this… — May 10, 2009 @ 10:24 pm

    16. [...] course there are additional motives behind all this. Now whether you’re concerned about Perl’s position in TIOBE results or not it [...]

      Pingback by Perl Blogs « transfixed but not dead! — May 17, 2009 @ 3:45 pm

    17. [...] fact that TIOBE’s methodology, which I’ve discussed previously here and here, is simplistic makes it particularly open to gaming. Any one, or any community, with [...]

      Pingback by TIOBE Index is being gamed « Not this… — May 17, 2009 @ 11:35 pm

    18. [...] TIOBE or not TIOBE – “Lies, damned lies, and statistics” [...]

      Pingback by CPAN Testers supports Perl programming — June 1, 2009 @ 2:23 am

    19. [...] TIOBE or not TIOBE – “Lies, damned lies, and statistics” [...]

      Pingback by CPAN Testers supports Perl programming | rapid-DEV.net — June 15, 2009 @ 6:28 am

    20. [...] TIOBE or not TIOBE – “Lies, damned lies, and statistics” [...]

      Pingback by CPAN Testers supports Perl programmingdagolden | rapid-DEV.net — June 15, 2009 @ 6:30 am

