This is an update to my
I emailed Paul Jansen again, with links to Andrew Sterling Hanenkamp’s blog post and mine. Here’s Paul’s reply:
Sorry for not answering yet. I am heading a successful company, which means loads of work to do;-)… Your chances increase in case you ask me the same question more than once.
Here are the results for the TIOBE index run for today. The query that has been applied is +”<language> programming”. Here is an overview of the number of hits:
1. Perl – Google: 966,000
1. Python – Google: 584,000
2. Perl – Yahoo: 2,570,000
2. Python – Yahoo: 2,170,000
3. Perl – Google Blogs: 164,518
3. Python – Google Blogs: 90,393
4. Perl – MSN: 1,210,000
4. Python – MSN: 965,000
5. Perl – YouTube (marginal influence): 8
5. Python – YouTube (marginal influence): 52
So Python is closer to Perl for Yahoo and MSN if compared to the Google and Google Blogs hits. You only calculated Google and Google Blogs, so this might explain our different conclusions. It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.
BTW. All this results in a score for Perl of 5.930% and for Python of 4.595%. I hope this answers your question!
The difference in hit count between Google and Yahoo is remarkable. Perhaps Yahoo crawls deeper. Perhaps Google is smarter about discarding duplicate content. I suspect the latter. (I’ll talk more about the differences between search engines in another post.)
The difference also highlights a problem with the TIOBE methodology. They’re combining the absolute hits from each search engine before normalizing. That will bias the TIOBE results towards search engines that give high hit counts. Are the results from Yahoo really twice as significant as the result from Google? You can argue it either way, but I think TIOBE should normalize the results from each search engine separately and then combine them. That would give the Yahoo view of programming language popularity equal weight to the Google and MSN views.
(Update 2008-04-21: Turns out that they do normalize each search engine separately, so I’ve struck out those two paragraphs. However, normalizing each search engine separately then raises the issue of how the normalized results are combined. My reading of the current definition is that they all get equal weight. So the question shifts from “Are the results from Yahoo really twice as significant as the result from Google?” to “Are the ~60 results from YouTube really just as significant as the ~1,500,000 results from Google?”)
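To make the weighting question concrete, here is a minimal sketch in Python using the hit counts from Paul’s email. It is not TIOBE’s actual formula: it only compares Perl against Python (two languages, not the full set TIOBE tracks), and the names `HITS`, `shares`, `pooled` and `equal_weight` are my own. It contrasts adding up raw hits before normalizing with normalizing each engine separately and averaging with equal weight.

```python
# Hit counts copied from Paul's email (Perl vs Python only).
# Assumption: a two-language comparison, not TIOBE's real calculation.
HITS = {
    "Google": {"Perl": 966_000, "Python": 584_000},
    "Yahoo": {"Perl": 2_570_000, "Python": 2_170_000},
    "Google Blogs": {"Perl": 164_518, "Python": 90_393},
    "MSN": {"Perl": 1_210_000, "Python": 965_000},
    "YouTube": {"Perl": 8, "Python": 52},
}


def shares(counts):
    """Normalize a {language: hits} mapping so the values sum to 1."""
    total = sum(counts.values())
    return {lang: n / total for lang, n in counts.items()}


def pooled(hits):
    """Add up raw hits across all engines first, then normalize once."""
    totals = {}
    for counts in hits.values():
        for lang, n in counts.items():
            totals[lang] = totals.get(lang, 0) + n
    return shares(totals)


def equal_weight(hits):
    """Normalize each engine separately, then take a plain average."""
    per_engine = [shares(counts) for counts in hits.values()]
    langs = per_engine[0].keys()
    return {
        lang: sum(s[lang] for s in per_engine) / len(per_engine)
        for lang in langs
    }


if __name__ == "__main__":
    for label, result in (("pooled", pooled(HITS)),
                          ("equal weight", equal_weight(HITS))):
        print(label, {lang: round(100 * v, 1) for lang, v in result.items()})
```

On these numbers, pooling puts Perl at roughly 56% of the two-language total, while equal weighting pulls it down to roughly 50%, almost entirely because of the 60 YouTube hits.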
I’d either drop Google Blogs search or add in another blog search to balance it. I’d also drop site-specific searches, like YouTube, as the hit counts are too low to be useful and the other engines cover those sites anyway.
I’m also puzzled by the month-to-month volatility:
“It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.”
It seems unlikely that enough new pages containing “foo programming” could appear in one month to cause a significant spike in the results. For those pages to then disappear the next month is even less likely.
It seems more likely to me that these spikes are ‘noise’ caused by the search engine index update processes. Even if they’re genuine spikes, actual programming language “popularity” doesn’t change significantly month to month. Pretending it does isn’t helpful and devalues the information they’re trying to provide.
I’d like to see TIOBE focus on something like a three-month moving average.
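As a rough illustration of what that smoothing would do, here is a small Python sketch of a trailing moving average. The function name `moving_average` and the monthly ratings are hypothetical, made up to show a one-month spike; they are not real TIOBE figures.

```python
def moving_average(values, window=3):
    """Trailing moving average: one output per full window of inputs."""
    if window < 1:
        raise ValueError("window must be at least 1")
    return [
        sum(values[i - window + 1 : i + 1]) / window
        for i in range(window - 1, len(values))
    ]


# Hypothetical monthly ratings (percent) with a spike in the third month.
# These are invented for illustration, not taken from the TIOBE index.
ratings = [4.6, 4.5, 5.4, 4.6, 4.7]
smoothed = moving_average(ratings)

if __name__ == "__main__":
    print([round(x, 2) for x in smoothed])
```

The spike still shows up in the smoothed series, but it is spread over three months and much flatter, which is closer to how popularity plausibly behaves.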
Update: It seems that the TIOBE Index is being gamed.