Pay no attention to that callback behind the curtain!

So you’ve got some perl code that connects to a particular database via a particular DBI driver. You want it to connect to a different database or driver. But you can’t change that part of the code. What can you do?

I ran into this problem recently. A large application is using an old version of DBIx::HA which doesn’t support DBD::Gofer. DBIx::HA can’t be upgraded (long story, don’t ask) but I wanted to use DBD::Gofer to provide client-side caching via Cache::FastMmap. (I’ll save more details of that, and the 40% reduction in database requests it gave, for another post.)

I needed a way for DBIx::HA to think that it was connecting to a particular driver and database, but for it to actually connect to another. Using $ENV{DBI_AUTOPROXY} wasn’t an option because that has global effect whereas I needed fine control over which connections were affected. It’s also a fairly blunt instrument in other ways.

It seemed like I was stuck. Then I remembered the DBI callback mechanism – it would provide an elegant solution to this. I added it to DBI 1.49 back in November 2005 and enhanced it further in 1.55. I’d never documented it though. I think I was never quite sure it had sufficient functionality to be really useful. Now I’m sure it has.

The DBI callback mechanism lets you intercept, and optionally replace, any method call on a DBI handle. At the extreme, it lets you become a puppet master, deceiving the application in any way you want.

Here’s how the code looked (with a few irrelevant details changed):

    # The following section of code uses the DBI Callback mechanism to
    # intercept connect() calls to DBD::Sybase and, where appropriate, 
    # reroute them to DBD::Gofer.
    our $in_callback;

    # get Gofer $drh and make it pretend to be named Sybase
    # to keep DBIx::HA 0.62 happy
    my $gofer_drh  = DBI->install_driver("Gofer");
    $gofer_drh->{Name} = "Sybase";

    # get the Sybase drh and install a callback to intercept connect()s
    my $sybase_drh = DBI->install_driver("Sybase");
    $sybase_drh->{Callbacks} = {
        connect => sub {
            # protect against recursion when gofer itself makes a connection
            return if $in_callback; local $in_callback = 1;

            my $drh = shift;
            my ($dsn, $u, $p, $attr) = @_;
            warn "connect via callback $drh $dsn\n" if $DEBUG;

            # we're only interested in connections to particular databases
            return unless $dsn =~ /some pattern/;

            # rewrite the DSN to connect to the same DSN via Gofer
            # using the null transport so we can use Gofer caching
            $dsn = "transport=null;dsn=dbi:Sybase(ReadOnly=1):$dsn";

            my $dbh = $gofer_drh->connect($dsn, $u, $p, $attr);

            if (not $dbh) { # gofer connection failed for some reason
                warn "connect via gofer failed: $DBI::errstr\n"
                    unless our $connect_via_gofer_err++; # warn once
                return; # DBI will now call original connect method
            }

            undef $_;    # tell DBI not to call original connect method
            return $dbh; # tell DBI to return this $dbh instead
        },
    };

So the application, via DBIx::HA, executed

  $dbh = DBI->connect("dbi:Sybase:foo",...)

but what it got back was a DBD::Gofer dbh, as if the application had executed

  $dbh = DBI->connect("dbi:Gofer:transport=null;dsn=dbi:Sybase(ReadOnly=1):foo",...).
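For anyone who wants to try the mechanism in isolation, here’s a minimal sketch. It assumes the DBD::ExampleP test driver that ships with DBI, so no real database is needed, and it intercepts ping() only because that’s a harmless method to play with. The counter and the "42 bells" return value are purely illustrative.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBI is installed; DBD::ExampleP is bundled with it.
my $dbh = DBI->connect("dbi:ExampleP:", undef, undef, { RaiseError => 1 });

my $pings_seen = 0;

# First callback: observe the call, then let DBI run the real method.
$dbh->{Callbacks} = {
    ping => sub {
        my ($h) = @_;
        $pings_seen++;
        return;             # $_ is left alone, so DBI calls the original ping()
    },
};
my $real_result = $dbh->ping;

# Second callback: replace the method entirely.
$dbh->{Callbacks} = {
    ping => sub {
        undef $_;           # tell DBI not to call the original ping()
        return "42 bells";  # DBI hands this back to the caller instead
    },
};
my $fake_result = $dbh->ping;

print "pings seen by first callback: $pings_seen\n";
print "replaced ping returned: $fake_result\n";

$dbh->disconnect;
```

The same two moves appear in the connect callback above: a plain return hands control back to the original method, while undef $_ followed by a return value replaces it.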

I guess I should document the callback mechanism now. Meanwhile the closest thing to documentation is the test file.

I’ve always enjoyed this kind of “plumbing”. If you come up with any interesting uses of DBI callbacks, do let me know.

This is a perl blog, too. At least partly.

Just in case it wasn’t clear, perl programming is one of my interests, so this is, at least partly, a perl blog.

I’m only spelling it out like this because, by saying “perl blog“, I help to keep Mr. Schwern happy, and I’m saying “perl programming” purely for my own amusement.

While I’m here I might as well spread some Link Love to other notable perl blogs and the like: use.perl.org, perlmonks.org, news.perlfoundation.org and of course planet.perl.org super blog which lists many more.

TIOBE or not TIOBE – An Update

This is an update to my earlier post about the TIOBE Index.

I emailed Paul Jansen again, with a link to Andrew Sterling Hanenkamp’s blog post and mine. Here’s Paul’s reply:


Hi Tim,

Sorry for not answering yet. I am heading a successful company, which means loads of work to do;-)… Your chances increase in case you ask me the same question more than once.

Here are the results for the TIOBE index run for today. The query that has been applied is +”<language> programming”. Here is an overview of the number of hits:

1. Perl     – Google: 966,000
1. Python – Google: 584,000

2. Perl     – Yahoo: 2,570,000
2. Python – Yahoo: 2,170,000

3. Perl     – Google Blogs: 164,518
3. Python – Google Blogs:   90,393

4. Perl     – MSN: 1,210,000
4. Python – MSN:   965,000

5. Perl     – YouTube (marginal influence): 8
5. Python – YouTube (marginal influence): 52

So Python is closer to Perl for Yahoo and MSN if compared to the Google and Google Blogs hits. You only calculated Google and Google Blogs, so this might explain our different conclusions. It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.

BTW. All this results in a score for Perl of 5.930% and for Python of 4.595%. I hope this answers your question!

Regards,
Paul


The difference in hit count between Google and Yahoo is remarkable. Perhaps Yahoo crawls deeper. Perhaps Google is smarter about discarding duplicate content. I suspect the latter. (I’ll talk more about the differences between search engines in another post.)

The difference also highlights a problem with the TIOBE methodology. They’re combining the absolute hits from each search engine before normalizing. That will bias the TIOBE results towards search engines that give high hit counts. Are the results from Yahoo really twice as significant as the result from Google?

You can argue it either way, but I think TIOBE should normalize the results from each search engine separately and then combine them. That would give the Yahoo view of programming language popularity equal weight to the Google and MSN views.

(Update 2008-04-21: Turns out that they do normalize each search engine separately, so I’ve struck out those two paragraphs. However, normalizing each search engine separately then raises the issue of how the normalized results are combined. My reading of the current definition is that they all get equal weight. So the question shifts from “Are the results from Yahoo really twice as significant as the result from Google?” to “Are the ~60 results from YouTube really just as significant as the ~1,500,000 results from Google?”)
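To make the weighting question concrete, here’s a rough sketch that applies both approaches to the hit counts in Paul’s email. It only looks at Perl and Python, so the shares are relative to each other rather than to all languages, and the two subs are my own simplification, not TIOBE’s actual formula.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hit counts for +"<language> programming", copied from the email above.
my %hits = (
    'Google'       => { perl => 966_000,   python => 584_000   },
    'Yahoo'        => { perl => 2_570_000, python => 2_170_000 },
    'Google Blogs' => { perl => 164_518,   python => 90_393    },
    'MSN'          => { perl => 1_210_000, python => 965_000   },
    'YouTube'      => { perl => 8,         python => 52        },
);

# Method 1: pool the raw hits from every engine, then normalize once.
# Engines that report big numbers dominate the result.
sub pooled_share {
    my ($lang) = @_;
    my $lang_total = sum map { $_->{$lang} } values %hits;
    my $all_total  = sum map { sum(values %$_) } values %hits;
    return $lang_total / $all_total;
}

# Method 2: normalize within each engine, then average with equal weight.
# YouTube's 60 hits count for as much as Google's 1,550,000.
sub equal_weight_share {
    my ($lang) = @_;
    my @shares = map { $_->{$lang} / sum(values %$_) } values %hits;
    return sum(@shares) / @shares;
}

for my $lang (qw(perl python)) {
    printf "%-6s pooled: %.1f%%  equal weight: %.1f%%\n",
        $lang, 100 * pooled_share($lang), 100 * equal_weight_share($lang);
}
```

Neither method is obviously right. The point is just that the choice of weighting moves the answer a long way with exactly the same input data.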

I’d either drop Google Blogs search or add in another blog search to balance it. I’d also drop site specific searches, like YouTube, as the hits are too low to be useful and the other engines cover it anyway.

I’m also puzzled by the month-to-month volatility:

“It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.”

It seems unlikely that enough new pages containing “foo language” could appear in one month to cause a significant spike in the results. And to then disappear the next month is even more unlikely.

It seems more likely to me that these spikes are ‘noise’ caused by the search engine index update processes. Even if they’re genuine spikes, actual programming language “popularity” doesn’t change significantly month to month. Pretending it does isn’t helpful and devalues the information they’re trying to provide.

I’d like to see TIOBE focus on something like a 3 month moving average.
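A trailing moving average is simple to compute. The sketch below uses made-up monthly ratings with a one-month spike (they are not TIOBE figures) just to show how a 3 month window flattens that kind of noise.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Return the trailing moving average of @$values over $window points.
# The first ($window - 1) positions have no full window, so they are skipped.
sub moving_average {
    my ($window, $values) = @_;
    my @avg;
    for my $i ($window - 1 .. $#$values) {
        push @avg, sum(@{$values}[$i - $window + 1 .. $i]) / $window;
    }
    return @avg;
}

# Hypothetical monthly ratings (percent) with a spike in the third month.
my @monthly  = (5.0, 5.1, 6.5, 5.0, 5.2, 5.1);
my @smoothed = moving_average(3, \@monthly);

printf "%.2f\n", $_ for @smoothed;
```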

Update: It seems that the TIOBE Index is being gamed.


TIOBE or not TIOBE – “Lies, damned lies, and statistics”

[I couldn’t resist the title, sorry.]

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
– Mark Twain

I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).

I do want to add a few thoughts though…

The TIOBE Programming Community Index is built on two assumptions:

  • that the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
  • that the proportionality is the same for different languages.

It’s not hard to pick holes in both of those assumptions.

    They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim.
    I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.

    Even if you can accept their assumptions as valid, can you trust their maths? Back in Jan 2008, when I was researching views of perl, TIOBE was mentioned. So I took a look at it.

    At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.

    I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:

    Take perl and python, for example:

    I get 923,000 hits from google for +”python programming” and 3,030,000 for +”perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.) As reported by the “X-Y of approx Z results” at the top of the search results page.

    Using google blog search I get 139,887 for +”python programming” and 491,267 for +”perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.)

    So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.

    So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.

    Am I misunderstanding something?

    I didn’t get a reply.

    I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?

    Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.

    Update: the story continues.


    Code duplication, cheap but not free

    I’m working on a large old codebase at the moment where Repeat Yourself seems to have been standard practice. Here’s a typical example:

        if (exists(Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{ $graph_id })
            && Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{ $graph_id }->{ as_of_rev } <= $current_revision) {
            $self->{cid} = Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{ $graph_id }->{ new_node_id };
        }
        elsif (exists(Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{''})
            && Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{''}->{ as_of_rev } <= $current_revision) {
            $self->{cid} = Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{ '' }->{ new_node_id };
        }

    The codebase has many, many, examples of this style written by a variety of developers.

    What puzzles me is why this kind of code didn’t raise red flags for the developers at the time. It’s harder to read, harder to maintain, and slower than a simpler approach. Perhaps the elsif part was a copy-n-paste job, but that still doesn’t explain the three instances of Foo::StaticData::NodeRemap->get_data()->{ $args{cid} }->{ $graph_id } in the first part. Perhaps even those were copy-n-pasted.

    I’d always have an urge to factor out the common expression into a temporary variable for efficiency and clarity. I wonder if my sensitivity to code duplication is partly due to needing to pay close attention to efficiency for most of my career. (I started Perl programming about 15 years ago, when cpu performance was measured in MHz.)

    I also wonder how soon tools like Perl::Critic can help detect duplicate code fragments: common sub-expressions that may be candidates for elimination.

    I’m working on optimizing the codebase at the moment. That chunk showed up as a performance issue so I rewrote it as

        my $NodeRemap = Foo::StaticData::NodeRemap->get_data();
        for my $id ( $graph_id, '' ) {
            my $x = $NodeRemap->{ $args{cid} }->{ $id };
            next unless $x and $x->{as_of_rev} > $current_revision;
            $self->{cid} = $x->{new_node_id};
            last;
        }

    Spot my mistake?
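One way to check a rewrite like this is to run the old and new logic against the same data. The sketch below does that with made-up values. The %remap hash is a hypothetical stand-in for whatever Foo::StaticData::NodeRemap->get_data() returns, and the sub names are illustrative. It’s a spoiler, so skip it if you’d rather spot the problem yourself.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Foo::StaticData::NodeRemap->get_data().
my %remap = (
    c1 => {
        g1 => { as_of_rev => 10, new_node_id => 'n-g1' },
        ''  => { as_of_rev => 5,  new_node_id => 'n-default' },
    },
);

# Same test as the original if/elsif: the remap applies once the
# current revision has reached as_of_rev.
sub original_lookup {
    my ($cid, $graph_id, $current_revision) = @_;
    my $by_graph = $remap{$cid} || {};
    for my $id ($graph_id, '') {
        my $x = $by_graph->{$id};
        next unless $x and $x->{as_of_rev} <= $current_revision;
        return $x->{new_node_id};
    }
    return;
}

# The rewrite as posted: the comparison points the other way.
sub rewritten_lookup {
    my ($cid, $graph_id, $current_revision) = @_;
    my $by_graph = $remap{$cid} || {};
    for my $id ($graph_id, '') {
        my $x = $by_graph->{$id};
        next unless $x and $x->{as_of_rev} > $current_revision;
        return $x->{new_node_id};
    }
    return;
}

for my $rev (7, 12) {
    printf "rev %2d: original=%s rewritten=%s\n", $rev,
        map { defined $_ ? $_ : 'undef' }
            scalar original_lookup('c1', 'g1', $rev),
            scalar rewritten_lookup('c1', 'g1', $rev);
}
```

With these sample values the two versions disagree at both revisions, which is exactly the kind of thing a small comparison harness catches before a "performance fix" ships.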

    Sidebar: This post is also an experiment in posting code to my blog. I’m trying out MarsEdit. It’s good but I’d like to see a “Paste Preformatted” mechanism that would also html escape the contents of the paste buffer. It’s scriptable so I guess I could implement it myself in my copious spare time…

    Loaded Perl: A history in 530,000 emails

    MarkMail is a free service for searching mailing list archives. They’ve just loaded 530,000 emails from 75 perl-related mailing lists into their index.

    They’ve got a home page for searching these lists at http://perl.markmail.org/.

    Of course the first thing people often do with new search engines is search for themselves. I’m no exception. Where MarkMail shines is the ability to drill-down into the results in many ways with a single click (bugs, announcements, attachments etc). Worth a look.

    The graph of messages per month is not just cute, you can click and drag over a range of bars to narrow the search to a specific period. It clearly shows my activity rising sharply in 2001 and then dropping to a lower level after 2004.

    I’m particularly pleased that they’ve indexed the dbi-users, dbi-dev, and dbi-announce lists.

    Perl Myths

    Update: several more recent versions of my Perl Myths talk are available. These have significant updates. Slides can be found on slideshare.net and screencasts can be found on my blip.tv channel.

    I’ve uploaded my Perl Myths presentation to slideshare.net and google video:

    “Perl has its share of myths. This presentation debunks a few popular ones with hard facts. Surprise yourself with the realities.”

    While I agree with Andy Lester that Good Perl code is the best form of evangelism, I wanted to put together a presentation that others could refer to when they encounter misinformation about Perl. I cover these myths that I’ve heard recently:

    • Perl is dead
    • Perl is hard to read / test / maintain
    • Perl 6 is killing Perl 5

    and pull in a wealth of up-to-date information, some of it quite surprising even to those familiar with Perl and its community. There are two versions, plus a video. I recommend the one with notes (which have useful extra detail and context for the slides) which is best viewed as a PDF. There’s also one without notes which I’ve embedded here:

    I videoed an extended version of this presentation at IWTC in Dublin in February. The first 40 minutes or so correspond with the slides above. In the remaining 30 minutes or so I talk about Parrot and Perl 6. I’ve embedded the video below, but wordpress forces me to use a small size so you’ll probably prefer to view it at video.google.com:

    Perceptions of Perl – views from the edge

    To help research for a talk I’m giving soon I asked the OPEN mailing list:

    Please spare a few moments to jot down your thoughts about the Perl language, CPAN, the community. Even if you don’t use it. In fact, especially if you don’t use it. How would you choose a language to develop a new web project? How would Perl rate, and why?

    The OPEN mailing list is “a community-based mailing list for the discussion of general Web, Internet and related technologies” in Ireland. The participants have an eclectic mix of web related jobs and interests. There’s certainly no bias towards Perl.

    I’ve tried to distill the key points and group them into topics. In the process I’ve made some slight edits.

    Just in case it’s not completely clear, these are not my views, they are a summary of the assorted views of others. I offer them here to give some insight into how Perl is viewed outside its core community.

    General:

    • Perl is seen as a sys-admin language. Still used for back-end jobs.
    • PHP is easier to deploy at front end stuff.
    • Perl is still more powerful and easier to deploy backend.
    • Too complex compared to PHP. Code overly obtuse.
    • For web development Perl is losing ground to the more mainstream languages such as Java, PHP, .NET etc.
    • Perl is over complex and out of date, a sort of Cobol of the scripting world.
    • Perl has aged well. It’s no longer the only choice for tasks but it is still very useful.
    • Perl 6 is killing Perl (slowly). But Perl is still popular, see http://www.tiobe.com/tpci.htm [I plan to do a post about apparent significant errors in the TIOBE results — Tim]
    • Perl community has a great reputation.
    • If scripting languages were spanners:
      • PHP would be plastic spanners (too brittle)
      • Python would be Gold (too soft)
      • Perl is pre 1989 Ironmongery (not the cheap west german crap)
    • I know what type of spanners I want in my toolbox.

    Skills Availability:

    • Perl developers are now hard to find and expensive to employ when compared with the general over abundance of say (let’s face it, mediocre) VB.Net developers
    • Whilst some might say that clients are just jumping on the bandwagon by demanding things be built using .Net framework etc. This may be true to an extent, but there is the logic of accessibility, maintainability and affordability being applied here.
    • There’s no point in having something built by an individual or small team, to find that 2-3 years down the line the original team are nowhere to be found and the system requires major re-working/upgrading etc. and there’s only a tiny handful of “experts” available to take over (at somewhat large expense).

    CPAN:

    • Perl has some nice features. The best is that when you install a piece of Perl software on Linux, it works. It is very easy to locate and download the modules you need to get it going. Python, in contrast, tends to have modules all over the place.
    • CPAN is a bit of a mixed blessing. Using it tends to fall into sorcerer’s apprentice mode and the associated dependency hell;

    Maintenance and Maturing Coding Styles:

    • I found that I couldn’t read scripts six months after writing them. At that stage I switched over to Python.
    • My style in Ruby programming is so different it’s hard to be sure it’s the language (when maintaining or extending Perl and PHP code I now backport the Ruby way of working to those languages where possible) and not the fact that I got the language and the ‘Ruby Way’ simultaneously.
    • I trust the code I write in Perl, PHP and Ruby because it’s not opaque (unlike J2EE for example, which is an appalling hellhole of a world of hurt).
    • My personal reason for choosing Ruby every time is that It Makes Writing Really Good Code Really Good Fun. Or as Why so poignantly put it “You’ll be writing such beautiful code it’ll make you cry”.

    Hosting and Delivery:

    • Many cheap hosting packages don’t work well out of the box with perl.
    • I’d love to do all my dev in perl (and still so when other languages let me down), but PHP is so much more entrenched in the web hosting arena and totally painless in most ways, that perl takes longer and is harder to support.
    • I have developed Perl into standalone items using PDK and it runs & runs at clients sites without any hassle.

    Frameworks:

    • Use of development frameworks is leveling the playing field between languages so it’s a personal choice and familiarity rather than any language offering that huge advantage.
    • I don’t see Perl as being big in the web-framework world. It is just not on the Radar. PHP, Python and Ruby is all that I see. I am familiar with the Zope stack so I would use that for a new project. It is just too much of a learning curve to switch.
    • Ruby/PHP/Python and a couple of others have market and mindshare; These guys also have really good ‘frameworks’ for the click-and-drool brigade (eg Rails).
    • Perl is not getting nearly as much attention as Ruby and Rails, despite the rapid development of the excellent Catalyst.

    Many thanks to those who responded: Diarmaid Mac Aonghusa, Paul Grant, Dave Wilson, Paul Mc Auley, Kevin Gill, Lee Hosty, Tony Byrne, Brian Greene and Fergal J Byrne.

    Comparative Language Job Trend Graphs

    I researched these comparative job trend graphs for my Keynote at the 2007 London Perl Workshop, and then added a few more for this blog post.

    The graphs are from indeed.com, a job data aggregator and search engine. They’re all live, so every time you visit this page they’ll be updated with the current trend data (though it seems the underlying data isn’t updated often). My notes between the graphs relate to how they looked when I wrote this post in February 2008 (and the graphs were all Feb 2005 thru Dec 2008).

    Update: the graphs have all changed significantly since I wrote the post originally, and generally not in Perl’s favour. I saved a copy of the post as a PDF so you can see the graphs as they looked in early 2008.

    First up, all jobs that even mention perl, python or ruby anywhere in the description:

    The most amazing thing to me about this graph is that it indicates that 1% of all jobs mention perl. Wow.

    (Perhaps the profile of the jobs on indeed.com is a little skewed towards technical jobs. If it is then I’m assuming it’s equally skewed for each of the programming languages. Note: An addendum below shows that ruby is getting a ~17% boost through false positive matches from other jobs, like Ruby Tuesday restaurants. That applies to the graphs here that don’t qualify the search with an extra term like ‘software engineer’.)

    Here’s a slightly more focussed version that compares languages mentioned in jobs for “software engineer” or “software developer” roles:

    'software engineer' and 'software developer' roles mentioning perl or python or ruby

    A similar pattern. The narrowing of the gap between Perl and the other languages looks like good evidence of Perl’s broad appeal as a general purpose tool beyond the pure “software engineering/development” roles.

    I wanted to focus on jobs where developing software using a particular language was the principal focus of the job. So then I looked for “foo developer” jobs:

    perl developer vs python developer vs ruby developer

    That increases the gap between Perl and the others. Perhaps a reflection of Perl’s maturity – that it’s more entrenched so more likely to be used in the name of the role.

    But do people use “foo developer” or “foo programmer” for job titles? Let’s take a look:

    So “foo developer” is the most popular, but “foo programmer” is still significant, especially for Perl. (It’s a pity there’s no easy way to combine the pairs of trend lines. That would raise Perl even further.)

    To keep us dynamic language folk in our place, it’s worth comparing the trends above with those of more static languages:

    same as above but with C, c# and c++

    C++ and C# dwarf the dynamic languages. C and cobol are still alive and well, just.

    Then, to give the C++ and C# folk some perspective, let’s add Java to the mix:

    same as above but with java

    C++ and C# may dwarf the dynamic languages, but even they are dwarfed by Java.

    Let’s take a slight detour now to look at web related work. (It’s a detour because this post isn’t about web related work, it’s about the jobs market for the three main general purpose dynamic languages. People doing web work can tend to assume that everything is about web work.)

    We’ll start by adding in two more specialist languages, PHP and JavaScript:

    php and javascript developer

    I’m not surprised by the growth of PHP, though I’m sad that so many people are being introduced to ‘programming’ through it. I’m more surprised by the lack of height and growth in JavaScript. I presume that’s because it’s still rare for someone to be primarily a “JavaScript developer”. (That’ll change.) Let’s check that:

    perl, python, ruby, php, javascript, web-developer

    That’s much closer to what I’d expected. PHP is a popular skill, but is mentioned in fewer than half as many jobs as Perl. JavaScript, on the other hand, is in great and growing demand.

    Let’s look at the “web developer” role specifically and see which of the languages we’re interested in are mentioned most frequently:

    I think this graph captures the essence of why people think Perl is stagnant. It’s because Perl hasn’t been growing much in the ‘web developer’ world. People in that world are the ones most likely to be blogging about it and, I’ve noticed, tend to generalize their perceptions.

    (If you’re interested in PHP, Java, ASP and JavaScript and look here you’ll see that they all roughly follow the PHP line at about twice the height. JavaScript is at the top with accelerating growth.)

    Finally, just to show I’m not completely biased about Perl, here are the relative trends:

    relative trends

    This kind of graph reminds me of small companies that grow by a small absolute amount, say two employees growing to four, and then put out a press release saying they’re the “fastest growing company” in the area, or whatever. Dilbert recognises the issue. The graph looks striking now (Q1 2008) but means little. If it looks much like that in two years’ time, then it’ll be more impressive.

    Similarly, the fact that Perl is still growing its massive installed base over this period is impressive. (Seen most clearly by the second graph.) Perl 5 has been around for 14 years, and Perl itself for 21.

    The Perl community hasn’t been great at generating “Buzz” that’s visible outside the community. It’s just quietly getting on with the job. Lots of jobs. That lack of buzz helps create the impression that the Perl community lacks vitality relative to other similar languages. Hopefully this post, and others, go some small way towards correcting that.

    p.s. For an alternative, more geographic view, take a look at the Dynamic Language Jobs Map (about).

    Addendum:

    It turns out that approximately 14% of “ruby” jobs relate to restaurants – mostly the Ruby Tuesday chain. So I investigated how false positives affected the single-keyword searches I’ve used in some of the graphs. (I’m going to assume that “foo developer” is sufficiently immune from false positives.)

    I searched for Perl and then added negative keywords (-foo -bar …) until I’d removed almost all of the likely software related jobs. I ended up with this list (which shows that indeed.com don’t use stemming, which is sad and dumb of them):

    perl -developer -developers -engineer -software -programmer -programmers -programming -development -java -database -sql -oracle -sybase -scripting -scripter -coder -linux -unix -protocol -C -C++ -javascript -computing

    Then I did the same search but with python or ruby instead of perl. Here are the results:

    language   all matches   filtered matches   inappropriate matches
    perl             29987                  6   0.02% false
    python            7794                 20   0.2% false
    ruby              4624                794   17% false
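The percentages are just the filtered matches divided by all matches. Here’s a small sketch of that arithmetic using the counts above; the data layout and sub name are illustrative only.

```perl
use strict;
use warnings;

# [all matches, matches left after the negative keywords], from the table above.
my %matches = (
    perl   => [ 29_987, 6   ],
    python => [ 7_794,  20  ],
    ruby   => [ 4_624,  794 ],
);

# Percentage of a language's matches that survive the filter, which is
# taken here as the share of likely false positives.
sub false_positive_rate {
    my ($lang) = @_;
    my ($all, $filtered) = @{ $matches{$lang} };
    return 100 * $filtered / $all;
}

printf "%-6s %.2f%%\n", $_, false_positive_rate($_) for qw(perl python ruby);
```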

    Ruby is well below python (and far below perl) in the first graph, yet that includes this 17% boost from inappropriate matches. You have to marvel at Ruby’s ability to gain mind-share, if not market-share.