Upgrading from Perl 5.8

Imagine…

  1. You have a production system, with many different kinds of application services running on many servers, all using the perl 5.8.8 supplied by the system.
  2. You want to upgrade to use perl 5.14.1.
  3. You don’t want to change the system perl.
  4. You’re using CPAN modules that are slightly out of date but you can’t upgrade them because newer versions have dependencies that require perl 5.10.
  5. The perl application codebase is large and has poor test coverage.
  6. You want developers to be able to easily test their code with different versions of perl.
  7. You don’t want a risky all-at-once “big bang” upgrade. Individual production installations should be able to use different perl versions, even if only for a few days, and to switch back and forth easily.
  8. You want to simplify future perl upgrades.

I imagine there are lots of people in similar situations.

In this post I want to explore how I’m tackling a similar problem, both for my own benefit and in the hope it’ll be useful to others.

Incremental Upgrades

Perl now has an explicit deprecation policy that requires a mandatory warning for at least one major perl version before a feature is removed. So a feature that’s removed in perl 5.14 will generate a mandatory warning, at compile time if possible, in perl 5.12.

This means we should not jump straight from perl 5.8.8 to 5.14.1. It’s important to test our code with the latest 5.10.x and 5.12.x releases along the way. That way if we do hit a problem it’ll be easier to determine the cause.

This also fits in with our desire to simplify future upgrades. Effectively we’re not doing one perl version upgrade but three, although we may only do one or two actual upgrades on production machines.
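As a rough illustration of stepping through the intermediate versions, here is a minimal shell sketch. It assumes the perls live in a perlbrew-style layout under a root directory; the PERLS_ROOT variable and the step_through_perls function are names I've made up for this example, not part of perlbrew.

```shell
# Assumed layout: one directory per perl, e.g. $PERLS_ROOT/perl-5.12.4/bin/perl.
# PERLS_ROOT and step_through_perls are hypothetical names for this sketch.
PERLS_ROOT="${PERLS_ROOT:-$HOME/perl5/perlbrew/perls}"

# step_through_perls "VERSIONS" COMMAND [ARGS...]
# Runs COMMAND once per version, in the order given, with that perl's bin
# directory first in PATH. Stops at the first version where COMMAND fails,
# so the earliest problematic perl is the one reported.
step_through_perls() {
    versions="$1"
    shift
    for v in $versions; do
        # Subshell so the PATH change does not leak into the caller.
        if ! ( PATH="$PERLS_ROOT/perl-$v/bin:$PATH"; export PATH; "$@" ); then
            echo "FAILED under perl $v"
            return 1
        fi
    done
    return 0
}
```

Something like `step_through_perls "5.10.1 5.12.4 5.14.1" prove -lr t` would then run the test suite against each perl in turn (the version numbers here are just examples).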

Multiple Perls

We want the developers to be able to easily test their code with different versions of perl, so we need to allow multiple versions to be installed at the same time. Fortunately perlbrew makes that easy.

We’ll probably have the systems team install ready-built and read-only perlbrew perls on all the machines via scp. We’ll use perlbrew to get a set of perls installed, but we’ll handle the actual selection of a perl (via PATH etc.) ourselves.
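Selecting a perl via PATH might look something like the sketch below. It assumes the read-only perls sit in directories named perl-VERSION under one root; PERLS_ROOT and path_with_perl are hypothetical names, and a real setup may well differ.

```shell
# Assumed root of the installed perls (hypothetical default).
PERLS_ROOT="${PERLS_ROOT:-$HOME/perl5/perlbrew/perls}"

# path_with_perl VERSION [PATH_STRING]
# Prints a PATH with the chosen perl's bin directory first and any other
# perl from the same root removed, so switching back and forth does not
# leave stale entries behind. The body runs in a subshell, so the IFS and
# globbing changes stay local to the function.
path_with_perl() (
    version="$1"
    current="${2:-$PATH}"
    cleaned=""
    IFS=":"
    set -f                                  # no filename globbing while splitting
    for dir in $current; do
        case "$dir" in
            "$PERLS_ROOT"/perl-*/bin) ;;    # drop a previously selected perl
            *) cleaned="${cleaned:+$cleaned:}$dir" ;;
        esac
    done
    echo "$PERLS_ROOT/perl-$version/bin${cleaned:+:$cleaned}"
)
```

A developer could then run `PATH=$(path_with_perl 5.12.4)` to switch, and the same call with another version to switch back.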

Multiple CPAN Install Trees

Major versions of perl aren’t binary compatible with each other. This means extension modules, like DBI, which were installed for one major version of perl can’t be reused with another.

We keep all the code installed from CPAN in a repository, separate from the perl installation. Perl finds the modules via the PERL5LIB env var, and the installers install there because we use the PERL_MB_OPT and PERL_MM_OPT env vars to set that directory as the ‘install_base’.

Since we want developers to switch easily between perl versions, this means we need multiple CPAN installation directories, one per major perl version. We’ll rebuild and reinstall the extension modules into each immediately after building and installing the corresponding perl version.
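Here is a minimal sketch of how the per-version environment could be derived. The CPAN_TREES root, the perl-5.NN directory naming, and both function names are assumptions for illustration. The lib/perl5 subdirectory is where an install_base install normally puts modules.

```shell
# Hypothetical root holding one CPAN install tree per major perl version.
CPAN_TREES="${CPAN_TREES:-/opt/cpan}"

# Reduce a full version such as 5.14.1 to its major series, 5.14.
perl_major() {
    echo "$1" | cut -d. -f1,2
}

# cpan_env_for VERSION
# Prints export statements that point perl and both installers
# (ExtUtils::MakeMaker and Module::Build) at the tree for that major version.
# Assumes the tree path contains no single quotes.
cpan_env_for() {
    base="$CPAN_TREES/perl-$(perl_major "$1")"
    echo "export PERL5LIB='$base/lib/perl5'"
    echo "export PERL_MM_OPT='INSTALL_BASE=$base'"
    echo "export PERL_MB_OPT='--install_base $base'"
}
```

Running `eval "$(cpan_env_for 5.14.1)"` would set all three variables for the 5.14 tree. Because the tree is keyed on the major version only, 5.12.3 and 5.12.4 would share one tree.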

Since we have to rebuild and reinstall the extension modules anyway, we may as well rebuild and reinstall all our CPAN modules. That way we get to rerun all their test suites against each version of perl plus the specific versions of their prerequisite modules that we’re using.

Reinstalling CPAN Distributions

This is where it gets tricky.

Identifying what CPAN distributions we have installed is fairly easy. You can use tools like CPAN.pm or whatdists.pl to generate a list. But there’s a catch. They’ll only tell you what current distributions you need to install to get the same set of modules. That’s not what we need.

We need a list of the specific distribution versions that are currently installed. It turns out that that information isn’t recorded in the installation and it’s amazingly difficult to recreate reliably. (The perllocal.pod file ought to have this information but isn’t updated by the Module::Build installer and doesn’t record the actual distribution name.)

In an extension of his MyCPAN work, brian d foy is trying to tackle this problem by creating MD5 hashes for the millions of files on BackPAN (the CPAN archive) but there’s still much hard work ahead.

Why do we need the specific versions, why not simply upgrade everything to the latest version first as a separate project? Two reasons.

First, we’re caught by the fact that the latest versions of some distributions, either directly or indirectly, require a later version of perl. (David Cantrell’s cpxxxan project offers an interesting approach to this problem. E.g., use http://cp5.8.8an.barnyard.co.uk/ as the CPAN mirror to get a “latest that works on 5.8.8” view. [Thanks to ribasushi++ for the reminder.])

Second, having a complete list of exactly what we have installed also gives us easy reproducibility. Future installs will always yield exactly the same set of files, without risk of silent changes due to new releases on CPAN. The cpxxxan indices for older perls are much less likely to change, but still may. Also, if we upgraded everything to the latest using cp5.8.8an we’d need an extra testing cycle to check for problems with that upgrade before we even start on the perl upgrade.

After contemplating the large, ambitious, and incomplete MyCPAN project, I decided I’d try a distinctly hackish solution to this problem by extending the whatdists.pl script with a perllocal.pod parser and some heuristics. It seems to have worked out well. I’m going to check it by installing the distributions into a different directory and diff’ing that against the original.
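The check could be sketched roughly as below. This version compares only the lists of installed file names, not their contents. That is my simplification, on the assumption that compiled objects and bookkeeping files may differ byte for byte between two otherwise equivalent installs. The function names are made up.

```shell
# List every file under a tree as a sorted set of relative paths.
list_files() {
    ( cd "$1" && find . -type f | LC_ALL=C sort )
}

# compare_trees ORIGINAL REBUILT
# Prints the file names present in only one of the two trees (in diff
# format) and returns diff's exit status: 0 when the lists match.
compare_trees() {
    left=$(mktemp) || return 2
    right=$(mktemp) || { rm -f "$left"; return 2; }
    list_files "$1" >"$left"
    list_files "$2" >"$right"
    diff "$left" "$right"
    status=$?
    rm -f "$left" "$right"
    return $status
}
```

A stricter check could use `diff -r` on the two trees directly, at the cost of more noise to wade through.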

If that works out I’ll release the code and write up a blog post about it.

Installing Only Specific CPAN Distributions

Normally when you install a distribution from CPAN you’re happy for the installer to fetch and install the latest version of any prerequisite modules it might need. In our situation we want to install only a specific version of each.

In theory we could arrange that by ordering the list such that the prerequisite modules are installed first. The CPANDB module combined with a topological sort of the requires, test_requires, and build_requires dependencies via the Graph module should do the trick. [Hat tip to ribasushi++ for the CPANDB suggestion.] But there’s a simpler approach…
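For what it’s worth, the ordering idea can be sketched without CPANDB or the Graph module by feeding dependency pairs to the standard tsort utility. That is a swapped-in technique, not what I described above. The install_order name and the idea of writing the dependencies out as “prerequisite dependent” pairs are assumptions for this example.

```shell
# install_order
# Reads "prerequisite dependent" pairs on stdin, one pair per line, and
# prints the distributions in an order where every prerequisite appears
# before the things that need it. tsort does the topological sort.
install_order() {
    tsort
}
```

For example, `printf '%s\n' 'ExtUtils-MakeMaker DBI' 'DBI DBD-SQLite' | install_order` lists ExtUtils-MakeMaker, then DBI, then DBD-SQLite. If the pairs contain a cycle, tsort complains about a loop instead of producing a clean order, which is at least a useful early warning.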

I’ll probably simply duck that issue by using CPAN::Mini::Inject to create a miniature CPAN that contains only the specific versions of the specific distributions we’re using. Then we can use the cpanm --mirror and --mirror-only options to install from that mini CPAN.
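Installing from the mini CPAN might then look like this sketch. The MINI_CPAN path and the wrapper function are hypothetical. The --mirror and --mirror-only options are the real cpanm ones, and the CPANM variable exists only so the command can be swapped out for a dry run.

```shell
# Hypothetical location of the mini CPAN built with CPAN::Mini::Inject.
MINI_CPAN="${MINI_CPAN:-/opt/minicpan}"

# install_from_mini_cpan MODULE...
# Installs the named modules using only the local mini CPAN as the mirror,
# so cpanm resolves prerequisites from its index and can only pick the
# distribution versions that were put there.
install_from_mini_cpan() {
    ${CPANM:-cpanm} --mirror "file://$MINI_CPAN" --mirror-only "$@"
}
```

Setting `CPANM=echo` before calling the function prints the command line instead of running it, which is handy for checking what would be installed from where.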

Extending Test Coverage

All the above will give developers the ability to switch perl versions with ease, while keeping exactly the same set of CPAN modules. So now we can turn our attention to testing.

Our test coverage could charitably be described as spotty. Getting it up to a good level across all our code is simply not viable in the short term.

So for now I’m setting a very low goal: simply get all the perl modules and scripts compiled. You could say I’m aiming for 100% “compilation coverage” :-)

This will get all the developers aware of the basic mechanics of testing, like Test::Most and prove, and it gives us a good baseline to increase coverage from. More importantly, in the short term it lets us detect any compile-time deprecation warnings as we test with perl 5.10 and 5.12.

To ensure 100% (compilation) coverage I’ll use Devel::Cover to do coverage analysis and write a utility, probably using Devel::Cover::Covered, to find all our perl scripts and modules and check that they have all at least been compiled.
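As a simpler first pass than the Devel::Cover approach, a plain `perl -c` sweep can be sketched in shell. This is a stand-in rather than the utility described above. It assumes code is identifiable by a .pm, .pl or .t extension and that modules live under a lib directory, neither of which may hold for a real codebase. Note that `perl -c` still runs BEGIN blocks.

```shell
# compile_check DIR
# Runs "perl -c" on every .pm, .pl and .t file under DIR. Prints
# "FAILED: file" when compilation fails and "WARNINGS: file" when it
# succeeds but perl printed anything besides "<file> syntax OK"
# (for instance a deprecation warning). Returns 0 only if every file is clean.
# Set PERL to choose which perl binary is used.
compile_check() {
    root="$1"
    problems=0
    list=$(mktemp) || return 2
    find "$root" -type f \( -name '*.pm' -o -name '*.pl' -o -name '*.t' \) \
        | LC_ALL=C sort >"$list"
    while IFS= read -r file; do
        if output=$(${PERL:-perl} -I"$root/lib" -c "$file" 2>&1); then
            extra=$(printf '%s\n' "$output" | grep -v ' syntax OK$' || true)
            if [ -n "$extra" ]; then
                echo "WARNINGS: $file"
                problems=$((problems + 1))
            fi
        else
            echo "FAILED: $file"
            problems=$((problems + 1))
        fi
    done <"$list"
    rm -f "$list"
    [ "$problems" -eq 0 ]
}
```

Scripts without a file extension would be missed by this sweep, which is one reason a coverage-based check is still worth having.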

Summary

  • Multiple perl versions, via perlbrew.
  • Multiple identical CPAN install trees, one per major perl version.
  • Proven 100% compilation coverage as a minimum.

So, that’s the plan.

10 thoughts on “Upgrading from Perl 5.8”

  1. I think it’s also a good idea to mention Miyagawa’s carton project (https://metacpan.org/module/carton). This would let you explicitly list the dependencies of your project even down to the versions you want and have them installed in a project specific directory. Obviously won’t help with completely legacy code, but once you get the list of what you have installed and their versions, it’s a nice approach so that you never have to do that again.

  2. I didn’t realize that the rule was that you could share installed CPAN modules according to perl _major_ version…
    That’s an interesting tip.
    I’ve always been wary of that.
    I wonder if there are any dists that don’t follow that rule.

    • It’s not a rule. The dists don’t have a choice. The issue is binary compatibility of compiled extensions. When an extension is built against perl 5.X it generates a binary shared library object (typically a .so file on unix). That object file embeds knowledge of the internals of the perl 5.X it was built against. Since perl 5.10.0 it’s guaranteed that minor versions of perl 5.X will remain binary compatible, but the next major release, perl 5.Y, almost certainly won’t be.

  3. Another approach to discovering exactly what versions of what distributions are installed is BackPAN::Version::Discover. I’m trying it out now.

    (On my initial run it had 39 distributions in dists_not_matched, 106 modules in no_dist_found, and seemed to be confused by the archlib, so there were 600+ modules in skipped_modules->bad_mod_info like “x86_64-linux-thread-multi::Moose”.)

  4. One problem still unmentioned: Some modules cause cyclic dependencies. Test::More and Test::Harness are a very icky combination. They require each other. If both are recent enough, you won’t ever see a problem, but trying to update those two on a perl where both are too old will make you want to tear your hair out. I bet there are other cyclic problems too, but this one is at the base of the toolchain.

  5. Pingback: What’s actually installed in that perl library? « Not this…

  6. Thanks Tim for the nice write-up. In fact, we are still using Perl 5.8.5 on CentOS 4.6 in our production environment. We also use some very old CPAN modules; some of them are more than 10 years old and have not been updated since. Now we really need to upgrade to a newer Perl with CentOS 6.x or so. I am sure that your article will help me in the upgrade process. Thanks

  7. Just went through something not too dissimilar to this at the BBC. Luckily, we could get the specific versions of modules off live as they’d all been cpan2rpm-ed before installing. Then came a trawl through BackPAN using Schwern’s BackPAN::Index module (which requires a recent DBIC, unlike what our system had; ribasushi helped us find out why we needed the old version), and the system was largely recreated.

    We looked at carton but it didn’t really scratch the particular itch we had.
