TIOBE or not TIOBE – An Update

This is an update to my

I emailed Paul Jansen again, with a link to Andrew Sterling Hanenkamp’s blog post and mine. Here’s Paul’s reply:


Hi Tim,

Sorry for not answering yet. I am heading a successful company, which means loads of work to do;-)… Your chances increase in case you ask me the same question more than once.

Here are the results for the TIOBE index run for today. The query that has been applied is +"<language> programming". Here is an overview of the number of hits:

1. Perl     – Google: 966,000
1. Python – Google: 584,000

2. Perl     – Yahoo: 2,570,000
2. Python – Yahoo: 2,170,000

3. Perl     – Google Blogs: 164,518
3. Python – Google Blogs:   90,393

4. Perl     – MSN: 1,210,000
4. Python – MSN:   965,000

5. Perl     – YouTube (marginal influence): 8
5. Python – YouTube (marginal influence): 52

So Python is closer to Perl for Yahoo and MSN if compared to the Google and Google Blogs hits. You only calculated Google and Google Blogs, so this might explain our different conclusions. It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.

BTW. All this results in a score for Perl of 5.930% and for Python of 4.595%. I hope this answers your question!

Regards,
Paul


The difference in hit count between Google and Yahoo is remarkable. Perhaps Yahoo crawls deeper. Perhaps Google is smarter about discarding duplicate content. I suspect the latter. (I’ll talk more about the differences between search engines in another post.)

The difference also highlights a problem with the TIOBE methodology. They’re combining the absolute hits from each search engine before normalizing. That will bias the TIOBE results towards search engines that give high hit counts. Are the results from Yahoo really twice as significant as the result from Google?

You can argue it either way, but I think TIOBE should normalize the results from each search engine separately and then combine them. That would give the Yahoo view of programming language popularity equal weight to the Google and MSN views.

(Update 2008-04-21: Turns out that they do normalize each search engine separately, so I’ve struck out those two paragraphs. However, normalizing each search engine separately then raises the issue of how the normalized results are combined. My reading of the current definition is that they all get equal weight. So the question shifts from “Are the results from Yahoo really twice as significant as the result from Google?” to “Are the ~60 results from YouTube really just as significant as the ~1,500,000 results from Google?”)
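To make the weighting question concrete, here is a rough sketch that applies both schemes to the hit counts from Paul's email. It is not TIOBE's actual formula: the real index normalizes across many languages, while this only computes a two-language share between Perl and Python, and the function names `pooled_share` and `per_engine_share` are my own.

```python
# Hit counts copied from Paul's email above.
HITS = {
    "Google":       {"Perl": 966_000,   "Python": 584_000},
    "Yahoo":        {"Perl": 2_570_000, "Python": 2_170_000},
    "Google Blogs": {"Perl": 164_518,   "Python": 90_393},
    "MSN":          {"Perl": 1_210_000, "Python": 965_000},
    "YouTube":      {"Perl": 8,         "Python": 52},
}


def pooled_share(hits, language):
    """Add up raw hits across all engines first, then normalize once.

    Engines that report large hit counts dominate the result.
    """
    language_total = sum(counts[language] for counts in hits.values())
    grand_total = sum(sum(counts.values()) for counts in hits.values())
    return language_total / grand_total


def per_engine_share(hits, language):
    """Normalize within each engine, then average with equal weight.

    Every engine counts the same, however few hits it returned.
    """
    shares = [counts[language] / sum(counts.values()) for counts in hits.values()]
    return sum(shares) / len(shares)


# Same data with the site-specific YouTube search left out.
without_youtube = {engine: counts for engine, counts in HITS.items() if engine != "YouTube"}

for language in ("Perl", "Python"):
    print(
        f"{language:<7}"
        f" pooled: {pooled_share(HITS, language):.1%}"
        f"  per-engine: {per_engine_share(HITS, language):.1%}"
        f"  per-engine, no YouTube: {per_engine_share(without_youtube, language):.1%}"
    )
```

With these numbers, Perl's share of the two-language total is about 56% when the raw hits are pooled, but only about 50% when each engine is normalized separately and weighted equally. Almost all of that drop comes from the 60 YouTube hits: leave YouTube out and the equal-weight figure rises to about 59%.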

I’d either drop Google Blogs search or add in another blog search to balance it. I’d also drop site-specific searches, like YouTube, as the hit counts are too low to be useful and the general-purpose engines cover those sites anyway.

I’m also puzzled by the month-to-month volatility:

“It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.”

It seems unlikely that enough new pages containing “foo language” could appear in one month to cause a significant spike in the results. And to then disappear the next month is even more unlikely.

It seems more likely to me that these spikes are ‘noise’ caused by the search engine index update processes. Even if they’re genuine spikes, actual programming language “popularity” doesn’t change significantly month to month. Pretending it does isn’t helpful and devalues the information they’re trying to provide.

I’d like to see TIOBE focus on something like a 3 month moving average.
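As a minimal sketch of what I mean, here is a trailing moving average in Python. The monthly ratings are made-up numbers chosen to include a one-month spike, not real TIOBE data, and `moving_average` is just an illustrative helper.

```python
def moving_average(values, window=3):
    """Return the trailing moving average of `values`.

    Each output point is the mean of the current value and the
    `window - 1` values before it, so the result is `window - 1`
    items shorter than the input.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly ratings (percent) with a spike in the third month.
monthly_rating = [4.6, 4.5, 6.0, 4.6, 4.7]

smoothed = moving_average(monthly_rating, window=3)
print([round(value, 2) for value in smoothed])
```

A single-month spike still shows up in the smoothed series, but it is spread over three points and much flatter, which is closer to how language popularity actually behaves.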

Update: It seems that the TIOBE Index is being gamed.


6 thoughts on “TIOBE or not TIOBE – An Update”

  1. Pingback: ask » Blog Archive » TIOBE or not TIOBE - An Update

  2. This is all arguing about nothing. What does the number of hits for “X programming” have to do with anything?

    It is absurd for him to draw the conclusion that Perl is less popular as a programming language because there are fewer hits for “perl programming” in any given search engine. It is a leap of logic so huge that I’m astonished that anyone gives a damn.

  3. Andy, you’re right of course that people read much more into it than it’s worth. That’s why I wanted to dig into it a little to see what merit, if any, there was.

    They make two big claims: “The TIOBE Programming Community index gives an indication of the popularity of programming languages” and “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.”

    The first of those I could accept, just, if popularity was in quotes and with a caveat added like “based on the popularity of a simple search term”.

    The second just seems plain wrong. The most charitable explanation is that those are the kinds of things most likely to generate hits, but the statement is certainly misleading.

  4. Pingback: Lies, damn lies, and search engine rankings « Not this…

  5. Pingback: TIOBE Index is being gamed « Not this…

  6. Pingback: TIOBE or not TIOBE – “Lies, damned lies, and statistics” « Not this…
