TIOBE Index is being gamed

It is sad, but inevitable, that the TIOBE index of programming language “popularity” (sic) would be gamed.

Once you start measuring something, and advertising the results, people with an interest in particular outcomes naturally start to look for ways to influence those results. (It’s the Observer Effect writ large.)

The fact that TIOBE’s methodology, which I’ve discussed previously here and here, is simplistic makes it particularly open to gaming. Anyone, or any community, with access to many web pages can simply add the magic phrase “foo programming”, where foo is their language of choice, to get counted.

And it seems that’s exactly what the Delphi community did at the end of 2008 [1]. They made a concerted effort and it seems to have paid off. (I’d be very interested in hearing about similar behaviour in other language communities.)

Is that behaviour gaming? The author of the post who exhorted his readers to “Update your Delphi related blog or site to say Delphi programming on every page in visible text (update the template). Stand up and be counted. You can make a difference!” doesn’t seem to think so, as he also said “I am not suggesting we game the system, just that we help TCPI get an accurate count.”

An accurate count of what, exactly? That’s always been the fundamental question with TIOBE. It should be obvious that most web pages that discuss programming in Delphi wouldn’t actually contain the literal phrase “delphi programming”. The same applies to every other language. That’s the paradox at the heart of the TIOBE Index. And yet, somehow, TIOBE seem to think that counting pages containing the phrase “delphi programming” lets them claim that:

The ratings are based on the number of skilled engineers world-wide, courses and third party vendors.

Eh? How can they possibly defend that claim? Certainly their documented definition doesn’t support it, or even mention it.

I presume they’re thinking that CVs, job postings, and adverts are most likely to contain the magic phrase. It should be obvious, again, that the number of CVs, job postings, and adverts referring to a given programming language would naturally only be a small fraction of the total web pages referring to the language. (And only distantly related to the “popularity” of a language.) Yet that “small fraction” is what TIOBE measure and make bold claims about.

The fact that TIOBE is making a comparison based on a small fraction makes it even more troubling that TIOBE CEO Paul Jansen appears to support language communities changing their pages to include the magic “foo programming” phrase. In an email quoted on delphi.org he says:

For your information, I think your action has already some effect. Tonight’s run shows that Delphi is #8 at this moment. There is a realistic chance that Delphi will become “TIOBE’s Language of the Year 2008”.

He’s endorsing the artificial insertion of the magic phrase. Clearly this distorts the TIOBE index in favour of language communities that infect as many pages as possible with the magic phrase.

That sure seems like an invitation to game the system! It’s likely to lead to other language communities doing the same, and so to further devaluation of the TIOBE Index.

(For alternatives to TIOBE you could look at sites like http://www.langpop.com/, James Robson’s Language Usage Indicators, or my popular comparison of job trends blog post with ‘live’ graphs.)

I have, on a couple of occasions, used the phrase “perl programming” in blog posts for my own amusement, and linked it to my original TIOBE or not TIOBE – “Lies, damned lies, and statistics” post. I haven’t suggested that others do the same. TIOBE’s endorsement of artificial insertion changes that. Now it seems like we’re going to get a dumb “race to the bottom” to see which language community controls the most web pages.

If, as a result, the TIOBE Index is affected significantly, then I simply hope they’ll drop their pretentious claims and state clearly exactly what they’re counting, how they’re doing it, and what it means: not much.

[1] Many thanks to Barry Walsh for his blog post that alerted me to this.

Thanks, Iron Man, for the good excuse to perl blog

I’ve been thinking that I haven’t blogged much lately. Assorted half-baked ideas would cross my mind and then evaporate before I’d find the time, or motivation, to actually start writing.

The folks at the Enlightened Perl Organisation have solved the motivation problem by announcing the Iron Man Blogging Challenge: in short, “maintain a rolling frequency of 4 posts every 32 days, with no more than 10 days between posts”.

So about one post a week. I can aim for that!

Can you? “The rules are very simple: you blog, about Perl. Any aspect of Perl you like. Long, short, funny, serious, advanced, basic: it doesn’t matter. It doesn’t have to be in English, either, if that’s not your native language.” Why not try? Help yourself and help perl at the same time.

I’ll try to capture the half-baked ideas for perl blog posts as they cross my mind, then build on them as time and mood allow. Hopefully about one a week will mature into an actual blog post.


I remembered that almost exactly a year ago schwern blogged that he was “horrified at the junk which shows up when you search for perl blog on Google”. It seems the situation hasn’t improved much since. The planet.perl.org site is top, but the rest are a bit of a mishmash.

The problem is partly that “perl blog” isn’t a great search term. Google naturally gives preference to words that appear in urls and titles (all else being equal), but blogs rarely explicitly call themselves blogs on their pages or urls. I suspect many on the first page of results are there because ‘blog’ appears in the url. (To help out I’ve included “perl blog” in the title of this post :)

Searching for “perl blogs” (plural) works better because it finds pages that talk about perl blogs, rather than pages that merely have ‘blog’ somewhere in the url.

One entry in the “perl blogs” results was the Perl Foundation’s wiki page listing perl blogs. That was new to me. This blog wasn’t on it so I’ve added it. Got a perl blog, or know someone who has, that’s not listed on that page? Go and add it, now! It’ll only take a moment.

Along similar lines, I’ve added the phrases “perl blog” and “perl programming” to the sidebar of my blog pages. The first is to help people searching for “perl blog”. The second is mostly for my own amusement.

Examples of Modern Perl

In the spirit of re-tweeting, this is a short post to highlight some great examples of “modern perl”. (I’m using the term modern perl very loosely, not referring specifically to any one book, website, module, or organization.)

Firstly I’d like to highlight a couple of recent posts by Jonathan Rockway:

* Unshortening URLs with Modern Perl (also available here). An interesting example application built with modern perl modules like MooseX::Declare, MooseX::Getopt, HTTP::Engine, AnyEvent::HTTP, TryCatch, and KiokuDB.

* Multimethods. Another great example from Jonathan highlighting the combined power of MooseX::Types, MooseX::Declare, and MooseX::MultiMethods.

Then, from his work at the BBC, Curtis “Ovid” Poe has given us a great series of thoughtful posts on the benefits of replacing multiple inheritance with roles in a complex production code-base. The slides of his Inheritance vs Roles talk are a good place to start. Then dive into the blog posts back here and work your way forward.

I ♥ modern perl!

Generate Treemaps for HTML from Perl, please.

Seeing this video of treemap for perlcritic memory usage reminded me of something…

I’d really like to be able to use treemaps in NYTProf reports to visualize the time spent in different parts, and depths, of the package namespace hierarchy. Currently that information is reported in a series of tables.

A much better interface could be provided by treemaps. Ideally allowing the user to drill-down into deeper levels of the package namespace hierarchy. (It needn’t be this flashy, just 2D would be fine :-)

In case you’re not familiar with them, treemaps are a great way to visualise hierarchical data. Here’s an example treemap of the disk space used by the files in a directory tree (from schoschie).

Perl already has a Treemap module, which can generate treemap images via the Imager module. Treemap is designed to support more output formats by sub-classing.

I guess it wouldn’t be hard to write a sub-class to generate HTML client-side image map data along with the image, so clicks on the image could be used to drill-down into treemaps that have more detail of the specific area that was clicked on.

More interesting, and more flexible, would be a sub-class to generate the treemap as a Scalable Vector Graphics diagram using the SVG module (and others).
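To make the idea concrete, here’s a minimal hand-rolled sketch of a slice-and-dice treemap layout emitted as SVG. Note this is illustrative only: it does not use the CPAN Treemap module or its subclassing API, and the node structure (`name`/`size`/`children`) is my own invention for the example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Compute the total size of a node: a leaf's size, or the sum of its children.
sub node_size {
    my ($n) = @_;
    return $n->{size} unless $n->{children};
    my $sum = 0;
    $sum += node_size($_) for @{ $n->{children} };
    return $sum;
}

# Recursively lay out rectangles, alternating horizontal/vertical splits,
# each child getting an area proportional to its size.
sub layout {
    my ($n, $x, $y, $w, $h, $horiz, $out) = @_;
    push @$out, sprintf(
        '<rect x="%.1f" y="%.1f" width="%.1f" height="%.1f" '
        . 'fill="none" stroke="black"><title>%s</title></rect>',
        $x, $y, $w, $h, $n->{name});
    return unless $n->{children};
    my $total = node_size($n);
    for my $child (@{ $n->{children} }) {
        my $frac = node_size($child) / $total;
        if ($horiz) {
            my $cw = $w * $frac;
            layout($child, $x, $y, $cw, $h, !$horiz, $out);
            $x += $cw;
        }
        else {
            my $ch = $h * $frac;
            layout($child, $x, $y, $w, $ch, !$horiz, $out);
            $y += $ch;
        }
    }
}

# A toy package-size hierarchy (numbers invented for the example).
my $tree = {
    name     => 'PPI',
    children => [
        { name => 'PPI::Token',    size => 30 },
        { name => 'PPI::Node',     size => 20 },
        { name => 'PPI::Document', size => 10 },
    ],
};

my @rects;
layout($tree, 0, 0, 400, 300, 1, \@rects);
print qq{<svg xmlns="http://www.w3.org/2000/svg" width="400" height="300">\n};
print "$_\n" for @rects;
print "</svg>\n";
```

The `<title>` child of each `<rect>` gives you hover tool-tips for free in most browsers, and each rect is an obvious place to hang an onclick handler for drill-down.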

I’m not going to be able to work on either of those ideas anytime soon.

Any volunteers?

Perl DynaLoader hack using .bs files

I needed to install a perl extension from a third-party. It’s an interface to a shared library, a .so file, that they also supply.

Normally I’d add the directory containing the shared library to the LD_LIBRARY_PATH environment variable. Then when the extension is loaded the system dynamic loader can find it.

For various reasons I didn’t want to do that in this case.

An alternative approach is to pre-load the shared library with a flag to make its symbols globally available for later linking. (The flag is called RTLD_GLOBAL on Linux, Solaris and other systems that use dlopen(). This hack may not work on other systems.)

But how to pre-load the shared library, only when needed, and without changing any existing perl code? This is where the pesky little .bs files that get installed with perl extensions come in handy.

They’re known as ‘bootstrap’ files. If the .bs file for an extension is not empty then DynaLoader (and XSLoader) will execute its contents just before loading the shared object for the extension.

So I put code like this into the .bs file for the extension:

    use DynaLoader;
    DynaLoader::dl_load_file("$ENV{...}/...so", 1)
        or die DynaLoader::dl_error();

Not a recommended approach, but neat and handy for me in this case.

NYTProf 2.09 – now handles modules using AutoLoader, like POSIX and Storable

I’ve uploaded Devel::NYTProf 2.09 to CPAN.

If you’re using VMS the big news is that Peter (Stig) Edwards has contributed patches that enable NYTProf to work on VMS. Yeah! Thanks Peter.

For the rest of us there’s only one significant new feature in this release: NYTProf now includes a heuristic (that’s geek for “it’ll be wrong sometimes”) to handle modules using AutoLoader. The most common of these are Storable and POSIX. You may have encountered a warning like this when running nytprofhtml:

Unable to open '/../../lib/Storable.pm' for reading: No such file or directory

It’s a symptom of a deeper problem caused by AutoSplit, the companion to AutoLoader. The details of the cause, effect, and fix aren’t worth going into now. If you’re interested you can read my summary to the mailing list.

The upshot is that NYTProf now reports times for autoloaded subs as if the sub was part of the parent module. Just what you want. Time spent in the AUTOLOAD’er sub is also reported, naturally.

There is a small chance that the heuristic will pick the wrong ‘parent’ module file for the autoloaded subroutine file. That can happen if there are other modules that use the same name for the last portion of the package name (e.g., Bar::Foo and Baz::Foo). If it ever happens to you, please let me know. Ideally with a small test case.
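As background, the mechanism the heuristic has to cope with is Perl’s AUTOLOAD hook, which AutoLoader builds on. Here’s a minimal standalone sketch: it defines the missing sub inline rather than loading it from a split-out auto/*.al file the way AutoLoader does, but the dispatch is the same.

```perl
use strict;
use warnings;

package Demo;

# When an undefined sub in this package is called, perl invokes
# AUTOLOAD instead, with the fully-qualified name in $AUTOLOAD.
# AutoLoader's version loads the body from auto/Demo/<subname>.al;
# here we just conjure one up inline for illustration.
our $AUTOLOAD;

sub AUTOLOAD {
    my $name = $AUTOLOAD;
    $name =~ s/.*:://;                  # strip the package qualifier
    return if $name eq 'DESTROY';       # don't autoload the destructor
    no strict 'refs';
    *{"Demo::$name"} = sub { "loaded $name on demand" };
    goto &{"Demo::$name"};              # re-dispatch to the now-defined sub
}

package main;

# greet() doesn't exist until the AUTOLOAD above fabricates it.
print Demo::greet(), "\n";
```

The profiler’s problem is that the sub’s *code* lives in one file (the .al file) while it logically belongs to another (the parent module), which is what the new heuristic stitches back together.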

The list of changes can be found here (or here if you’re a detail fanatic).

One other particularly notable item is that the savesrc option wasn’t working reliably. Thanks to Andy Grundman for providing a good test case, that’s now been fixed.


NYTProf screencast from the 2008 London Perl Workshop

I’ve uploaded the screencast of my NYTProf talk at the London Perl Workshop in November 2008.

It’s based on a not-quite-2.08 version and includes some coverage of an early draft of the ‘timings per rolled-up package name’ feature I discussed previously.

It also shows how and why anonymous subroutines, defined at the same line of different executions of the same string eval, get ‘merged’.

The demos use perlcritic and Moose code. It also includes a nice demonstration showing NYTProf highlighting a performance problem with File::Spec::Unix when called using Path::Class::Dir objects.

It’s 36 minutes long, including a good Q&A session at the end (wherein a market rate for performance improvements is established). Enjoy.

NYTProf 2.08 – better, faster, more cuddly

I’ve just released NYTProf 2.08 to CPAN, some three and a half months after 2.07.

If you’ve been profiling large applications then the first difference you’ll notice is that the new version generates reports much faster.

NYTProf 2.08 timings.png

The next thing you may notice is that statement timings are now nicely formatted with units. Gisle Aas contributed the formatting code for 2.07 but I had to do some refactoring to get it working for the statement timings.

Another nice refinement is that hovering over a time will show a tool-tip with the time expressed as a percentage of the overall runtime.

Almost all the tables are now sortable. I used jQuery and the tablesorter plugin for that. I’ve not added any fancy buttons, just click on a table heading to sort by that column. You’ll see a little black arrow to show the column is sorted. (You can hold the shift key down to add second and third columns to the sort order.)

A profiler isn’t much use if it’s not accurate. NYTProf now has tests for correct handling of times for string evals within string evals. In fact the handling of string evals got a big overhaul for this version as part of ongoing improvements in the underlying data model. I’m working towards being able to show annotated performance reports for the contents of string evals. It’s not there yet, but definitely getting closer.

A related feature is the new savesrc=1 option. When enabled, with a recent version of perl, the source code for each source file is written into the profile data file. That makes the profile self-contained and, significantly, means that accurate reports can be generated even after the original source files have been modified.

Another new option is optimize=0. You can use it to disable the perl optimizer. That can be worth doing if the statement timings, or counts, for some chunk of code seem odd and you suspect that the perl optimizer has rewritten it.

The final new feature noted in the NYTProf 2.08 Changes file is that it’s now possible to generate multiple profile data files from a single application. Since v2.0 you could call DB::disable_profile() and DB::enable_profile() to control profiling at runtime. Now you can pass an optional filename to enable_profile to make it close the previous profile and open a new one. I imagine this would be most useful in long running applications where you’d leave profiling disabled (using the start=none option) and then call enable_profile and disable_profile around some specific code in specific situations – like certain requests to a mod_perl app.
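As a sketch of that mod_perl-style usage (the `handle_request` and `process` subs here are hypothetical placeholders, and it assumes the app was started under the profiler with the start=none option, e.g. `NYTPROF=start=none perl -d:NYTProf app.pl`):

```perl
use strict;
use warnings;

# Hypothetical request handler for the example.
sub process { return "handled $_[0]{path}" }

sub handle_request {
    my ($req) = @_;
    if ($req->{path} eq '/slow-endpoint') {
        # Close any previous profile and start a new one for this request.
        # (Only callable when running under Devel::NYTProf.)
        DB::enable_profile("nytprof-$$.out");
        my $result = process($req);
        DB::disable_profile();
        return $result;
    }
    return process($req);    # everything else runs unprofiled
}

print handle_request({ path => '/fast' }), "\n";
```

Each profiled request then lands in its own data file, which you can run nytprofhtml over individually.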

There’s one more new feature that I’ve just realised I’d forgotten to add to the Changes file before the release: Timings per rolled-up package name. What’s that? Well, it’s probably easiest to show you…

These images are taken from a profile of perlcritic. Each shows the time spent exclusively in subroutines belonging to a certain package and any packages below it. Hovering over a time gives the percentage, so I can see that the 57.3s spent in the 36 PPI packages accounted for 42% of the runtime.

NYTProf 2.08 pkg1.png

This gives you a quick overview for large (wide) codebases that would be hard to get in any other way.

Tables are generated for up to five levels of package name hierarchy, so you can drill-down to finer levels of detail.

NYTProf 2.08 pkg2.png


NYTProf 2.08 pkg3.png

I can visualize a much better UI for this data than the series of tables nytprofhtml currently produces, but my limited free time and jQuery skills prevent me doing more. Patches welcome, naturally.


p.s. I’ve a screencast from my NYTProf talk at the London Perl Workshop in November that I hope to (finally) upload soon. It includes a demo of the package roll-up timing tables.

Can you reproduce this NYTProf failure?

I’ve a new release of NYTProf ready to upload but I’m stuck.

The CPAN Testers service is reporting a failure on a number of systems but I can’t reproduce it locally or work out the cause.

Can you reproduce the failure with Devel::NYTProf 2.07_94? If so, could you give me remote access via ssh? (Or spare some time to investigate yourself – I’ll happily clue you in if you can reproduce the problem.)

Update: No one could reproduce it. It seems that the failure was not what it appeared to be. A clue was that only one tester was affected. Devel-NYTProf-2.07_94.tar.gz unpacked itself into a directory called Devel-NYTProf-2.07. It seems that when using CPANPLUS, if the user already had an old Devel-NYTProf-2.07 directory, its contents got merged with the new ones and tests would fail. I’m not convinced that’s the whole story, but Devel-NYTProf-2.07_95.tar.gz unpacked into a Devel-NYTProf-2.07_95 directory and didn’t run into the problem.

Update: More usefully, Andreas made my wish come true by pointing out the --solve parameter to the ctgetreports utility in his CPAN::Testers::ParseReport distribution. It “tries to identify the best contenders for a blame using Statistics::Regression. [...] The function prints the [...] top 3 candidates according to R^2 with their regression analysis.” Cool.