Not this…

May 8, 2008

Finding the cause of inexplicable warnings in XS code

Filed under: perl — TimBunce @ 10:28 am

Occasionally you may run across an odd warning like this:

   Use of uninitialized value in subroutine entry at X line Y

where the code at that line is a call to an XS subroutine (let’s call it xsub()) and you’re certain that the arguments you’re passing are not undefined.

Somewhere, deep in the XS/C code, an undefined value is being used. But where? And why is perl reporting that line?

Perl is reporting the last line of perl code that was executed at the same or higher level in the stack. So other perl code, such as a callback, may have been executed between entering xsub() and the warning being generated, but that perl code must have returned before the warning was triggered.

Assuming the XS/C code is large and complex, like mod_perl, how can you locate the code that’s triggering the warning?

Here’s a trick I’ve used a few times over the years:

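    $SIG{__WARN__} = sub {
        # abort with a core dump when the mystery warning fires
        CORE::dump if $_[0] =~ /uninitialized value in subroutine entry/;
        warn @_;   # otherwise pass the warning through as usual
    };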

That makes the program abort and generate a core dump file at the point the warning is generated. You can then use a debugger, or Devel::CoreStack, to report the C call stack at the time. It’s a savage but effective technique.

If the XS/C code was compiled with options to keep debug info (i.e., -g) then that’ll show you exactly where in the XS/C code the undefined value is being used. If not, then it’ll at least show you the name of the XS/C function and the overall call stack.

(The dump function is a curious vestige of old ways. You could use kill('ABRT', $$) instead; note that SIGKILL, signal 9, kills the process without leaving a core file. I’m not sure about the portability of either, for this purpose, beyond unix-like systems.)
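Signal numbers vary between platforms, so one way to sidestep part of the portability question is to work with signal names. The sketch below is only an illustration: it reads the name-to-number mapping from perl's own Config module and lists a few signals. The choice of signals to print is mine, not something from the original technique.

```perl
use strict;
use warnings;
use Config;

# Pair up the signal names and numbers recorded when this perl was built.
my @names = split ' ', $Config{sig_name};
my @nums  = split ' ', $Config{sig_num};
my %signo;
@signo{@names} = @nums;

# QUIT, ABRT and SEGV terminate with a core dump by default on unix-like
# systems; KILL is listed only to show that it is a different signal.
for my $name (qw(QUIT ABRT SEGV KILL)) {
    next unless exists $signo{$name};
    printf "SIG%-5s is number %d here\n", $name, $signo{$name};
}

# kill() accepts names as well as numbers, which avoids hard-coding them:
# kill 'ABRT', $$;   # uncomment to abort this process (core limits permitting)
```

Whether a core file actually appears also depends on the system's core size limit (for example `ulimit -c` in the shell).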

I suggested the technique to Graham Barr recently and it proved effective in tracking down the source of that warning in a very large mod_perl application. The warning pointed the finger at a $r->internal_redirect($uri) call. The actual cause was a PerlInitHandler returning undef. (The handler was an old version of DashProfiler::start_sample_period_all_profiles.)

Anyway, it dawned on me this morning that I should update the technique. It doesn’t have to be so savage. On modern systems you don’t need to shoot the process dead to get a C stack trace.

A few approaches came to mind:

  • spawn a “gcore $$” command (or similar) to get a core file from the running process
  • spawn a “pstack $$” command (or similar) to directly dump the stack trace from the running process
  • spawn a “gdb -p $$ &” (to attach to the running process) followed immediately by kill('STOP', $$) to send a SIGSTOP to the process (the signal number varies between systems, so the name is safer) to give time for the debugger to attach and for you to investigate the state of the live process.

I think the second of those would be most useful most of the time.
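A minimal sketch of that second approach might look like the following. It assumes a `pstack`-style tool is installed and on the PATH; the tool name differs between systems, so it is read from a hypothetical STACK_DUMP_CMD environment variable, and `stack_dump_command` is a helper name invented for this example.

```perl
use strict;
use warnings;

# Assumption: a pstack-like command exists on this system. Override the
# name via the (hypothetical) STACK_DUMP_CMD environment variable.
our $STACK_DUMP_CMD = $ENV{STACK_DUMP_CMD} || 'pstack';

# Hypothetical helper: the command and argument list for a given pid.
sub stack_dump_command {
    my ($pid) = @_;
    return ($STACK_DUMP_CMD, $pid);
}

$SIG{__WARN__} = sub {
    if ($_[0] =~ /uninitialized value in subroutine entry/) {
        # Dump the C stack of this still-running process; the tool's
        # output goes to our own STDOUT/STDERR.
        system(stack_dump_command($$)) == 0
            or print STDERR "stack dump command '$STACK_DUMP_CMD' failed (status $?)\n";
    }
    warn @_;   # then report the warning as usual
};
```

The process carries on after the stack has been printed, so the same run can report several occurrences of the warning.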

Hopefully this will be useful to someone.

April 12, 2008

TIOBE or not TIOBE - “Lies, damned lies, and statistics”

Filed under: software — TimBunce @ 12:58 am

[I couldn't resist the title, sorry.]

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
- Mark Twain

I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).

I do want to add a few thoughts though…

The TIOBE Programming Community Index is built on two assumptions:

  • that the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
  • that the proportionality is the same for different languages.

It’s not hard to pick holes in both of those assumptions.

    They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim.
    I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.

    Even if you can accept their assumptions as valid, can you trust their maths? Back in Jan 2008, when I was researching views of perl, TIOBE was mentioned. So I took a look at it.

    At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.

    I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:

    Take perl and python, for example:

    I get 923,000 hits from google for +“python programming” and 3,030,000 for +“perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.) As reported by the “X-Y of approx Z results” at the top of the search results page.

    Using google blog search I get 139,887 for +“python programming” and 491,267 for +“perl programming”. (The hits for Jython, IronPython, and pypy programming are tiny.)

    So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.

    So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.

    Am I misunderstanding something?

    I didn’t get a reply.
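The ratios quoted in that e-mail are easy to check. This is just a sketch of the arithmetic using the hit counts given above; `perl_to_python_ratio` is a helper name made up for the example.

```perl
use strict;
use warnings;

# Hit counts quoted in the e-mail (January 2008).
my %hits = (
    'google web search'  => { perl => 3_030_000, python => 923_000 },
    'google blog search' => { perl => 491_267,   python => 139_887 },
);

# Hypothetical helper: perl-to-python ratio for one source.
sub perl_to_python_ratio {
    my ($counts) = @_;
    return $counts->{perl} / $counts->{python};
}

for my $source (sort keys %hits) {
    printf "%-18s %.1f to 1 in perl's favor\n",
        $source, perl_to_python_ratio($hits{$source});
}
```

Both sources come out a little above 3-to-1, which is where the “roughly 3-to-1” figure comes from.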

    I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?

    Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.

    February 12, 2008

    Comparative Language Job Trend Graphs

    Filed under: software — TimBunce @ 12:46 am

    I researched these comparative job trend graphs for my Keynote at the 2007 London Perl Workshop, and then added a few more for this blog post.

    The graphs are from indeed.com, a job data aggregator and search engine. They’re all live, so every time you visit this page they’ll be updated with the current trend data (though it seems the underlying data isn’t updated often). My notes between the graphs relate to how they looked when I wrote this post in February 2008 (and the graphs were all Feb 2005 thru Dec 2008).

    First up, all jobs that even mention perl, python or ruby anywhere in the description:

    The most amazing thing to me about this graph is that it indicates that 1% of all jobs mention perl. Wow.

    (Perhaps the profile of the jobs on indeed.com is a little skewed towards technical jobs. If it is then I’m assuming it’s equally skewed for each of the programming languages. Note: An addendum below shows that ruby is getting a ~17% boost through false positive matches from other jobs, like Ruby Tuesday restaurants. That applies to the graphs here that don’t qualify the search with an extra term like ‘software engineer’.)

    Here’s a slightly more focussed version that compares languages mentioned in jobs for “software engineer” or “software developer” roles:

    'software engineer' and 'software developer' roles mentioning perl or python or ruby

    A similar pattern. The narrowing of the gap between Perl and the other languages looks like good evidence of Perl’s broad appeal as a general purpose tool beyond the pure “software engineering/development” roles.

    I wanted to focus on jobs where developing software using a particular language was the principal focus of the job. So then I looked for “foo developer” jobs:

    perl developer vs python developer vs ruby developer

    That increases the gap between Perl and the others. Perhaps a reflection of Perl’s maturity - that it’s more entrenched so more likely to be used in the name of the role.

    But do people use “foo developer” or “foo programmer” for job titles? Let’s take a look:

    So “foo developer” is the most popular, but “foo programmer” is still significant, especially for Perl. (It’s a pity there’s no easy way to combine the pairs of trend lines. That would raise Perl even further.)

    To keep us dynamic language folk in our place, it’s worth comparing the trends above with those of more static languages:

    same as above but with C, c# and c++

    C++ and C# dwarf the dynamic languages. C and cobol are still alive and well, just.

    Then, to give the C++ and C# folk some perspective, let’s add Java to the mix:

    same as above but with java

    C++ and C# may dwarf the dynamic languages, but even they are dwarfed by Java.

    Let’s take a slight detour now to look at web related work. (It’s a detour because this post isn’t about web related work, it’s about the jobs market for the three main general purpose dynamic languages. People doing web work can tend to assume that everything is about web work.)

    We’ll start by adding in two more specialist languages, PHP and JavaScript:

    php and javascript developer

    I’m not surprised by the growth of PHP, though I’m sad that so many people are being introduced to ‘programming’ through it. I’m more surprised by the lack of height and growth in JavaScript. I presume that’s because it’s still rare for someone to be primarily a “JavaScript developer”. (That’ll change.) Let’s check that:

    perl, python, ruby, php, javascript, web-developer

    That’s much closer to what I’d expected. PHP is a popular skill, but is mentioned in fewer than half as many jobs as Perl is. JavaScript, on the other hand, is in great and growing demand.

    Let’s look at the “web developer” role specifically and see which of the languages we’re interested in are mentioned most frequently:

    I think this graph captures the essence of why people think Perl is stagnant. It’s because Perl hasn’t been growing much in the ‘web developer’ world. People in that world are the ones most likely to be blogging about it and, I’ve noticed, tend to generalize their perceptions.

    (If you’re interested in PHP, Java, ASP and JavaScript and look here you’ll see that they all roughly follow the PHP line at about twice the height. JavaScript is at the top with accelerating growth.)

    Finally, just to show I’m not completely biased about Perl, here are the relative trends:

    relative trends

    This kind of graph always reminds me of small companies that grow by a small absolute amount, say two employees growing to four, and then put out a press release saying they’re the “fastest growing company” in the area, or whatever. Dilbert recognises the issue. The graph looks striking now (Q1 2008) but means little. If it looks much like that in two years’ time, then it’ll be impressive.

    Similarly, the fact that Perl is still growing its massive installed base over this period is impressive. (Seen most clearly by the second graph.) Perl 5 has been around for 14 years, and Perl itself for 21.

    The Perl community isn’t great at generating “Buzz” that’s visible outside the community, it’s just quietly getting on with the job. Lots of jobs. That lack of buzz helps create the impression that the Perl community lacks vitality relative to other similar languages. Hopefully this post, and others, go some small way towards correcting that.

    p.s. For an alternative, more geographic view, take a look at the Dynamic Language Jobs Map (about).

    Addendum:

    It turns out that approximately 17% of “ruby” jobs relate to restaurants - mostly the Ruby Tuesday chain. So I investigated how false positives affected the single-keyword searches I’ve used in some of the graphs. (I’m going to assume that “foo developer” is sufficiently immune from false positives.)

    I searched for Perl and then added negative keywords (-foo -bar …) until I’d removed almost all of the likely software related jobs. I ended up with this list (which shows that indeed.com don’t use stemming, which is sad and dumb of them):

    perl -developer -developers -engineer -software -programmer -programmers -programming -development -java -database -sql -oracle -sybase -scripting -scripter -coder -linux -unix -protocol -C -C++ -javascript -computing
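Since the same list of negative keywords gets reused for each language, the query can be assembled rather than typed out each time. A small sketch, using the keywords from the list above; `filtered_query` is a name invented for this example and nothing here talks to indeed.com.

```perl
use strict;
use warnings;

# Negative keywords from the list above.
my @exclude = qw(
    developer developers engineer software programmer programmers
    programming development java database sql oracle sybase scripting
    scripter coder linux unix protocol C C++ javascript computing
);

# Hypothetical helper: the language name followed by each keyword negated.
sub filtered_query {
    my ($language) = @_;
    return join ' ', $language, map { "-$_" } @exclude;
}

print filtered_query($_), "\n" for qw(perl python ruby);
```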

    Then I did the same search but with python or ruby instead of perl. Here are the results:

    language   all matches   filtered matches   inappropriate matches
    perl       29987         6                  0.02% false
    python     7794          20                 0.2% false
    ruby       4624          794                17% false

    Ruby is well below python (and far below perl) in the first graph, yet that includes this 17% boost from inappropriate matches. You have to marvel at Ruby’s ability to gain mind-share, if not market-share.
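The percentages can be recomputed from the two match counts for each language. This is only a sketch of that arithmetic, with `inappropriate_percent` as a made-up helper name; computed this way the python figure comes out at about 0.26%, a little above the rounded value quoted.

```perl
use strict;
use warnings;

# [all matches, matches left after the negative keywords] for each language.
my %matches = (
    perl   => [29_987, 6],
    python => [7_794,  20],
    ruby   => [4_624,  794],
);

# Hypothetical helper: filtered matches as a percentage of all matches.
sub inappropriate_percent {
    my ($all, $filtered) = @_;
    return 100 * $filtered / $all;
}

for my $language (sort keys %matches) {
    printf "%-7s %.2f%%\n",
        $language, inappropriate_percent(@{ $matches{$language} });
}
```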

    February 8, 2008

    Pedant at the London Perl Workshop

    Filed under: life, software — TimBunce @ 11:03 pm

    My arm referring to a presentation slide

    I came across this striking image of my left arm again recently and thought I’d add it here.
    It was taken by cowfish at the London Perl Workshop where I gave a keynote and a couple of presentations, including this one about my Gofer software.
