Announcing Apache2::AuthPAM

I’ve been working on a mod_perl2 based admin interface for a pool of Gofer servers recently.

I needed to secure it and PAM plus SSL was the simplest way to go.

The Authen::PAM module is old (2005), doesn’t have good reviews, and the test results are poor, but it worked for us.

That just left the question of how to interface with Apache 2. There’s an Apache::AuthPAM module but that’s even older (2002), has more failing test reports than passes, and was mod_perl 1 only.

I emailed the author but got no reply, so I’ve gone ahead and hacked it into a new module for mod_perl2: Apache2::AuthPAM. The source code is hosted by Google.

Realistically I’m unlikely to touch it again, unless we find problems, so I’d be delighted if anyone wants to take it over, or even just co-maintain it.

Finding the cause of inexplicable warnings in XS code

Occasionally you may run across an odd warning like this:

   Use of uninitialized value in subroutine entry at X line Y

where the code at that line is a call to an XS subroutine (let’s call it xsub()) and you’re certain that the arguments you’re passing are not undefined.

Somewhere, deep in the XS/C code, an undefined value is being used. But where? And why is perl reporting that line?

Perl is reporting the last line of perl code that was executed at the same or higher level in the stack. So other perl code, such as a callback, may have been executed between entering xsub() and the warning being generated, but that perl code must have returned before the warning was triggered.

Assuming the XS/C code is large and complex, like mod_perl, how can you locate the code that’s triggering the warning?

Here’s a trick I’ve used a few times over the years:

    $SIG{__WARN__} = sub {
        # abort with a core dump at the moment the warning is raised
        CORE::dump if $_[0] =~ /uninitialized value in subroutine entry/;
        warn @_;
    };

That makes the program abort and generate a core dump file at the point the warning is generated. You can then use a debugger, or Devel::CoreStack, to report the C call stack at the time. It’s a savage but effective technique.

If the XS/C code was compiled with options to keep debug info (i.e., -g) then that’ll show you exactly where in the XS/C code the undefined value is being used. If not, then it’ll at least show you the name of the XS/C function and the overall call stack.

(The dump function is a curious vestige of old ways. You could use kill('ABRT', $$) instead; a plain kill(9, $$) won’t do, because SIGKILL doesn’t produce a core file. I’m not sure about the portability of either, for this purpose, beyond unix-like systems.)

I suggested the technique to Graham Barr recently and it proved effective in tracking down the source of that warning in a very large mod_perl application. The warning pointed the finger at a $r->internal_redirect($uri) call. The actual cause was a PerlInitHandler returning undef. (The handler was an old version of DashProfiler::start_sample_period_all_profiles.)

Anyway, it dawned on me this morning that I should update the technique. It doesn’t have to be so savage. On modern systems you don’t need to shoot the process dead to get a C stack trace.

A few approaches came to mind:

  • spawn a “gcore $$” command (or similar) to get a core file from the running process
  • spawn a “pstack $$” command (or similar) to directly dump the stack trace from the running process
  • spawn a “gdb $$ &” (to attach to the running process) followed immediately by kill('STOP', $$) to send a SIGSTOP to the process (the signal number varies between platforms, so use the name). That gives the debugger time to attach and you time to investigate the state of the live process.

I think the second of those would be most useful most of the time.
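As a minimal sketch of that second approach, the handler below shells out to a stack-dumping tool instead of calling CORE::dump. It assumes a pstack command (gstack on some Linux systems) is installed and on the PATH, and the helper names stack_dump_command and make_warn_handler are mine, purely for illustration.

```perl
use strict;
use warnings;

# Build the shell command that dumps the C stack of process $pid.
# $tool is assumed to be something like "pstack" or "gstack".
sub stack_dump_command {
    my ($tool, $pid) = @_;
    return "$tool $pid 1>&2";    # send the tool's output to STDERR
}

# Return a __WARN__ handler that dumps the C stack when a warning
# matches $pattern, then passes the warning on as usual.
sub make_warn_handler {
    my ($pattern, $tool) = @_;
    return sub {
        system(stack_dump_command($tool, $$)) if $_[0] =~ $pattern;
        warn @_;
    };
}

$SIG{__WARN__} = make_warn_handler(
    qr/uninitialized value in subroutine entry/, "pstack");
```

The process carries on running after the trace is printed, so one run can catch several occurrences of the warning.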

Hopefully this will be useful to someone.

Pay no attention to that callback behind the curtain!

So you’ve got some perl code that connects to a particular database via a particular DBI driver. You want it to connect to a different database or driver. But you can’t change that part of the code. What can you do?

I ran into this problem recently. A large application is using an old version of DBIx::HA which doesn’t support DBD::Gofer. DBIx::HA can’t be upgraded (long story, don’t ask) but I wanted to use DBD::Gofer to provide client-side caching via Cache::FastMmap. (I’ll save more details of that, and the 40% reduction in database requests it gave, for another post.)

I needed a way for DBIx::HA to think that it was connecting to a particular driver and database, but for it to actually connect to another. Using $ENV{DBI_AUTOPROXY} wasn’t an option because that has global effect whereas I needed fine control over which connections were affected. It’s also a fairly blunt instrument in other ways.

It seemed like I was stuck. Then I remembered the DBI callback mechanism – it would provide an elegant solution to this. I added it to DBI 1.49 back in November 2005 and enhanced it further in 1.55. I’d never documented it though. I think I was never quite sure it had sufficient functionality to be really useful. Now I’m sure it has.

The DBI callback mechanism lets you intercept, and optionally replace, any method call on a DBI handle. At the extreme, it lets you become a puppet master, deceiving the application in any way you want.

Here’s how the code looked (with a few irrelevant details changed):

    # The following section of code uses the DBI Callback mechanism to
    # intercept connect() calls to DBD::Sybase and, where appropriate, 
    # reroute them to DBD::Gofer.
    use DBI;
    our ($in_callback, $DEBUG); # $DEBUG: set true to trace connects

    # get Gofer $drh and make it pretend to be named Sybase
    # to keep DBIx::HA 0.62 happy
    my $gofer_drh  = DBI->install_driver("Gofer");
    $gofer_drh->{Name} = "Sybase";

    # get the Sybase drh and install a callback to intercept connect()s
    my $sybase_drh = DBI->install_driver("Sybase");
    $sybase_drh->{Callbacks} = {
        connect => sub {
            # protect against recursion when gofer itself makes a connection
            return if $in_callback; local $in_callback = 1;

            my $drh = shift;
            my ($dsn, $u, $p, $attr) = @_;
            warn "connect via callback $drh $dsn\n" if $DEBUG;

            # we're only interested in connections to particular databases
            return unless $dsn =~ /some pattern/;

            # rewrite the DSN to connect to the same DSN via Gofer
            # using the null transport so we can use Gofer caching
            $dsn = "transport=null;dsn=dbi:Sybase(ReadOnly=1):$dsn";

            my $dbh = $gofer_drh->connect($dsn, $u, $p, $attr);

            if (not $dbh) { # gofer connection failed for some reason
                warn "connect via gofer failed: $DBI::errstr\n"
                    unless our $connect_via_gofer_err++; # warn once
                return; # DBI will now call original connect method
            }

            undef $_;    # tell DBI not to call original connect method
            return $dbh; # tell DBI to return this $dbh instead
        },
    };

So the application, via DBIx::HA, executed

  $dbh = DBI->connect("dbi:Sybase:foo",...)

but what it got back was a DBD::Gofer dbh, as if the application had executed

  $dbh = DBI->connect("dbi:Gofer:transport=null;dsn=dbi:Sybase(ReadOnly=1):foo",...)

I guess I should document the callback mechanism now. Meanwhile the closest thing to documentation is the test file.
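In the meantime, here’s a much smaller sketch of the same mechanism: a callback that just records each statement passed to prepare() and then lets the real method run. It uses DBD::ExampleP, the toy driver that ships with DBI, only so that it runs without a real database; the @seen array is just for illustration.

```perl
use strict;
use warnings;
use DBI;

my @seen;    # SQL statements observed by the callback

# DBD::ExampleP is bundled with DBI, so no database server is needed
my $dbh = DBI->connect("dbi:ExampleP:", undef, undef, { RaiseError => 1 });

$dbh->{Callbacks} = {
    prepare => sub {
        my ($h, $sql) = @_;
        push @seen, $sql;
        return;    # $_ is left alone, so DBI goes on to call the real prepare
    },
};

my $sth = $dbh->prepare("select name from .");
print "intercepted: $_\n" for @seen;
```

Because the callback returns without undefining $_, the application still gets a normal statement handle back and has no idea it was observed.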

I’ve always enjoyed this kind of “plumbing”. If you come up with any interesting uses of DBI callbacks, do let me know.

This is a perl blog, too. At least partly.

Just in case it wasn’t clear, perl programming is one of my interests, so this is, at least partly, a perl blog.

I’m only spelling it out like this because, by saying “perl blog”, I help to keep Mr. Schwern happy, and I’m saying “perl programming” purely for my own amusement.

While I’m here I might as well spread some Link Love to other notable perl blogs and the like: use.perl.org, perlmonks.org, news.perlfoundation.org and of course the planet.perl.org super blog, which lists many more.

Lies, damn lies, and search engine rankings

I started a related recent post with a quote that seems just as apt here:

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
– Mark Twain

If you regularly use just one search engine, as I tend to do, it’s very easy to be lulled into a false sense of security about the quality and relevance of the results.

I was recently reminded of the significant differences that can occur in the results of different search engines. That, in turn, reminded me of tools I’d come across previously to highlight those differences. In particular, one that gives a very clear picture of the differences in ranking. After a little digging I found it at langreiter.com (via a list of tools at http://www.seocompany.ca).

As a demonstration, here’s a comparison of the top results for +"perl programming" at Google (top) and Yahoo (bottom):

[Image: Google vs Yahoo rankings for "perl programming", via langreiter.com]

and here’s the same for +"python programming":

[Image: Google vs Yahoo rankings for "python programming", via langreiter.com]

Each dot represents a result url, with the top ranked results on the left. Where a url appears in the top 100 results on both Google and Yahoo then a line is drawn between them to highlight the different rankings. On the site you can hover over the dots to see the corresponding url.

I remember being very surprised when I first saw these kinds of results a few years ago. I’m no less surprised now. In fact more so, as I’d (naïvely) expected Yahoo and Google to have converged somewhat in their concept of relevancy. At least for top results.

The particular queries I used above are not exceptional. I couldn’t find any query that didn’t have significant differences in rankings. Don’t believe me? Go try it yourself at http://www.langreiter.com/exec/yahoo-vs-google.html.

That so many of the top 20 from one search engine don’t even appear in the top 100 of the other is… is… well, I’m not quite sure what to make of it. At first sight it seems like a bad thing, but I also have to admit that it’s a good thing. At least in some ways. Diversity is important in any ecosystem.
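If you’d rather put a number on that than eyeball the dots, a rough sketch like the following would do it. The two result lists here are placeholders I made up, not real search results, and missing_from_other is just a name I picked.

```perl
use strict;
use warnings;

# Return the entries of the first $top_n of @$ranked_a that do not
# appear anywhere in the first $depth entries of @$ranked_b.
sub missing_from_other {
    my ($ranked_a, $ranked_b, $top_n, $depth) = @_;
    my $last_b = $depth < @$ranked_b ? $depth - 1 : $#$ranked_b;
    my %in_b   = map { $_ => 1 } @{$ranked_b}[0 .. $last_b];
    my $last_a = $top_n < @$ranked_a ? $top_n - 1 : $#$ranked_a;
    return grep { !$in_b{$_} } @{$ranked_a}[0 .. $last_a];
}

# Placeholder result lists, best ranked first (not real search results)
my @engine_a = qw(a.example b.example c.example d.example);
my @engine_b = qw(c.example x.example a.example y.example);

my @missing = missing_from_other(\@engine_a, \@engine_b, 3, 100);
print "in A's top 3 but not in B's top 100: @missing\n";
```

Feed it the top 20 from one engine and the top 100 from the other and the length of the returned list is the figure I was gawping at above.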

If you only use one major search engine then you have to accept that you’re getting just one view of the internet. Most of the time you may be happy with that. It’s worth keeping it in mind, though, for those times when you’re struggling to find good results.

One way to avoid the issue is to use a meta search engine that’ll query multiple search engines for you and merge the results. There are lots of them.

TIOBE or not TIOBE – An Update

This is an update to my earlier post, TIOBE or not TIOBE – “Lies, damned lies, and statistics”.

I emailed Paul Jansen again, with a link to Andrew Sterling Hanenkamp’s blog post and mine. Here’s Paul’s reply:


Hi Tim,

Sorry for not answering yet. I am heading a successful company, which means loads of work to do;-)… Your chances increase in case you ask me the same question more than once.

Here are the results for the TIOBE index run for today. The query that has been applied is +"<language> programming". Here is an overview of the number of hits:

1. Perl     – Google: 966,000
1. Python – Google: 584,000

2. Perl     – Yahoo: 2,570,000
2. Python – Yahoo: 2,170,000

3. Perl     – Google Blogs: 164,518
3. Python – Google Blogs:   90,393

4. Perl     – MSN: 1,210,000
4. Python – MSN:   965,000

5. Perl     – YouTube (marginal influence): 8
5. Python – YouTube (marginal influence): 52

So Python is closer to Perl for Yahoo and MSN if compared to the Google and Google Blogs hits. You only calculated Google and Google Blogs, so this might explain our different conclusions. It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.

BTW. All this results in a score for Perl of 5.930% and for Python of 4.595%. I hope this answers your question!

Regards,
Paul


The difference in hit count between Google and Yahoo is remarkable. Perhaps Yahoo crawls deeper. Perhaps Google is smarter about discarding duplicate content. I suspect the latter. (I’ll talk more about the differences between search engines in another post.)

The difference also highlights a problem with the TIOBE methodology. They’re combining the absolute hits from each search engine before normalizing. That will bias the TIOBE results towards search engines that give high hit counts. Are the results from Yahoo really twice as significant as the result from Google?

You can argue it either way, but I think TIOBE should normalize the results from each search engine separately and then combine them. That would give the Yahoo view of programming language popularity equal weight to the Google and MSN views.

(Update 2008-04-21: Turns out that they do normalize each search engine separately, so I’ve struck out those two paragraphs. However, normalizing each search engine separately then raises the issue of how the normalized results are combined. My reading of the current definition is that they all get equal weight. So the question shifts from “Are the results from Yahoo really twice as significant as the result from Google?” to “Are the ~60 results from YouTube really just as significant as the ~1,500,000 results from Google?”)
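To make the difference between the two ways of combining concrete, here’s a rough sketch using only the Perl and Python hit counts from Paul’s email. It is not TIOBE’s actual formula: their normalization covers many languages, whereas I’m assuming a two-language world just to show how the weighting behaves. The function names are my own.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hit counts from Paul's email: engine => [perl, python]
my %hits = (
    'Google'       => [   966_000,   584_000 ],
    'Yahoo'        => [ 2_570_000, 2_170_000 ],
    'Google Blogs' => [   164_518,    90_393 ],
    'MSN'          => [ 1_210_000,   965_000 ],
    'YouTube'      => [         8,        52 ],
);

# Approach 1: pool the raw hits from every engine, then normalize.
sub pooled_share {
    my ($hits) = @_;
    my $perl  = sum(map { $_->[0] } values %$hits);
    my $total = sum(map { $_->[0] + $_->[1] } values %$hits);
    return $perl / $total;
}

# Approach 2: normalize within each engine, then average with equal weight.
sub equal_weight_share {
    my ($hits) = @_;
    my @shares = map { $_->[0] / ($_->[0] + $_->[1]) } values %$hits;
    return sum(@shares) / @shares;
}

printf "Perl share, pooled hits:       %.1f%%\n", 100 * pooled_share(\%hits);
printf "Perl share, equal-weight mean: %.1f%%\n", 100 * equal_weight_share(\%hits);
```

With these numbers the pooled figure comes out at roughly 56% for Perl, while the equal-weight mean is close to 50%, because the 8-versus-52 YouTube result counts for as much as everything from Google.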

I’d either drop Google Blogs search or add in another blog search to balance it. I’d also drop site specific searches, like YouTube, as the hits are too low to be useful and the other engines cover it anyway.

I’m also puzzled by the month-to-month volatility:

“It is possible that there was a temporary peak for Python in January. This occurs often when a language is in the spot light for some reason.”

It seems unlikely that enough new pages containing “foo language” could appear in one month to cause a significant spike in the results. And to then disappear the next month is even more unlikely.

It seems more likely to me that these spikes are ‘noise’ caused by the search engine index update processes. Even if they’re genuine spikes, actual programming language “popularity” doesn’t change significantly month to month. Pretending it does isn’t helpful and devalues the information they’re trying to provide.

I’d like to see TIOBE focus on something like a 3 month moving average.
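Something along these lines is all I have in mind. The monthly ratings below are invented to show a one-month spike; they are not TIOBE figures.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Trailing moving average over a window of $n values.
# Returns one value per position where a full window is available.
sub moving_average {
    my ($n, @values) = @_;
    return map { sum(@values[$_ - $n + 1 .. $_]) / $n } $n - 1 .. $#values;
}

# Made-up monthly ratings with a one-month spike (not TIOBE data)
my @ratings  = (5.0, 5.1, 7.0, 5.0, 4.9);
my @smoothed = moving_average(3, @ratings);
printf "%.2f ", $_ for @smoothed;
print "\n";
```

A spike still shows up in the smoothed series, but spread over three months and at about a third of its height, which seems closer to how popularity really moves.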

Update: It seems that the TIOBE Index is being gamed.


TIOBE or not TIOBE – “Lies, damned lies, and statistics”

[I couldn’t resist the title, sorry.]

“Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force: ‘There are three kinds of lies: lies, damned lies, and statistics.’”
– Mark Twain

I’ve been meaning to write a post about the suspect methodology of the TIOBE Index but Andrew Sterling Hanenkamp beat me to it (via Perl Buzz).

I do want to add a few thoughts though…

The TIOBE Programming Community Index is built on two assumptions:

  • that the number of search engine hits for the phrase “foo programming” is proportional to the “popularity” of that language.
  • that the proportionality is the same for different languages.

It’s not hard to pick holes in both of those assumptions.

    They also claim that “The ratings are based on the number of skilled engineers world-wide, courses and third party vendors” but I can’t see anything in their methodology that supports that claim.
    I presume they’re just pointing out the kinds of sites that are more likely to contain the “foo programming” phrase.

    Even if you can accept their assumptions as valid, can you trust their maths? Back in Jan 2008, when I was researching views of perl, TIOBE was mentioned. So I took a look at it.

    At the time Python had just risen above Perl, prompting TIOBE to declare Python the “programming language of the year”. When I did a manual search, using the method they described, the results didn’t fit.

    I wrote an e-mail to Paul Jansen, the Managing Director and author of the TIOBE Index. Here’s most of it:

    Take perl and python, for example:

    I get 923,000 hits from google for +"python programming" and 3,030,000 for +"perl programming". (The hits for Jython, IronPython, and pypy programming are tiny.) As reported by the “X-Y of approx Z results” at the top of the search results page.

    Using google blog search I get 139,887 for +"python programming" and 491,267 for +"perl programming". (The hits for Jython, IronPython, and pypy programming are tiny.)

    So roughly 3-to-1 in perl’s favor from those two sources. It’s hard to imagine that “MSN, Yahoo!, and YouTube” would yield very different ratios.

    So 3:1 for perl, yet python ranks higher than perl. Certainly seems odd.

    Am I misunderstanding something?

    I didn’t get a reply.

    I did note that many languages had dipped sharply around that time and have risen sharply since. Is that level of month-to-month volatility realistic?

    Meanwhile, James Robson has implemented an alternative, and open source, set of Language Usage Indicators. I’m hoping he’ll add trend graphs soon.

    Update: the story continues.


    Boundaries of Discourse

    Back in my first blog post, entitled “This is not me…” I said:

    So I have a blog, yet I know not what I’ll use it for, nor what parts of my self I’ll choose to log.

    You’re welcome to join me on this meandering journey. Though the map is not the territory.

    Until recently the journey hadn’t meandered far from technical topics. Some chocolate, a mention of Cubs and Toastmasters. All safe topics. All likely to be expanded on in future, especially the chocolate!

    In my previous post however, Introversion, I stretched the envelope further by sharing some more personal insight into my self. That was an interesting experience. Feeling my way up to the boundary of what was comfortable for me to blog about at this time.

    If you have a blog you must make choices about what to say, and what not to say. Just as in real-life conversations. Only with a blog you don’t know who the audience are. How do you, bloggers, make those choices? Where do you draw the line?

    I guess the answers must relate to the bigger question of Why Blog? I don’t have an answer to that question yet. I think I mainly blog to share. To give insight into my life, thoughts and experiences in the hope that it may be useful to others. I also blog to log. To create a record to look back on.

    After some further thought I added a postscript to that Introversion post:


    Postscript: I paused a day or so before posting the above, wondering if it was wise. Wondering, especially, if it was likely to be misunderstood. Now, after a couple more days, I think it’s worth adding a little postscript.

    I approach my self and my life (mental, physical, emotional, and spiritual) with the same curiosity and interest with which I approach my work. The engineer part of me wants to know how it works. How I work. How the pieces of my life fit together.

    “… to let no day pass without discussing goodness and all the other subjects about which you hear me talking and examining both myself and others,
    [this] is really the very best thing that a man (or woman) can do, and that life without this sort of examination is not worth living …”
    Socrates.


    Introversion

    I recently came across a thoughtful piece by Joe Kissell on Instant Messaging for Introverts.

    A common misconception about the word “introvert” is that it means someone who’s shy, withdrawn, afraid of crowds, or lacking in social skills.

    If you’re an introvert, like myself, I think you’ll find it interesting and helpful. I certainly did.

    If you’re not an introvert then I’d still recommend it. It gives some valuable insight that may improve your understanding of, and communication with, the introverts in your life.

    I took the quick Keirsey Temperament Sorter test at keirsey.com that Joe links to. I’m always a little skeptical of these kinds of tests that depend on answers to difficult-to-answer questions. Anyway, it labeled me an “Artisan, Composer”, for what it’s worth. The description seemed a good fit, mostly.

    That’s an ISFP Myers-Briggs type. My wife thinks I’m probably an ISTP. Being my wife, and a psychotherapist, she might be right.

    I find the Enneagram more interesting as a personal personality type indicator, partly because it acknowledges a range of personality development, from healthy to unhealthy. As well as gifts and aptitudes, we all have some unhelpful thought patterns, areas we’d like to improve, issues we struggle with. Most personality measures gloss over these.

    “If my devils are to leave me, I am afraid my angels will take flight as well.”
    – Rainer Maria Rilke

    The Enneagram certainly has its critics, but I recognize my self in the description of my type. My fears and my desires, my angels and my devils.

    My type may, at best …

    Level 1: Become visionaries, broadly comprehending the world while penetrating it profoundly. Open-minded, take things in whole, in their true context. Make pioneering discoveries and find entirely new ways of doing and perceiving things.

    or, at worst …

    Level 9: Seeking oblivion, they may commit suicide or have a psychotic break with reality. Deranged, explosively self-destructive, with schizophrenic overtones. Generally corresponds to the Schizoid Avoidant and Schizotypal personality disorders.

    I’m not close to either of those extremes. But I recognize both of them.

    “By enhancing one’s self awareness with the help of the Enneagram, one can exercise more choice about one’s functioning rather than engaging in patterns of thought, emotion, and behavior in an automatic, habitual, unconscious way”

    Go find your self!

    Postscript: I paused a day or so before posting the above, wondering if it was wise. Wondering, especially, if it was likely to be misunderstood. Now, after a couple more days, I think it’s worth adding a little postscript.

    I approach my self and my life (mental, physical, emotional, and spiritual) with the same curiosity and interest with which I approach my work. The engineer part of me wants to know how it works. How I work. How the pieces of my life fit together.

    “… to let no day pass without discussing goodness and all the other subjects about which you hear me talking and examining both myself and others, [this] is really the very best thing that a man (or woman) can do, and that life without this sort of examination is not worth living …”
    Socrates.