Wishlist of PL/Perl Enhancements for PostgreSQL 8.5

I’m working with PostgreSQL for my day job, and liking it.

We’re fairly heavy users of stored procedures implemented in PL/Perl, with ~10,000 lines in ~100 functions (some of which have bloated to painful proportions). This creates some interesting issues and challenges for us.

There’s a window of opportunity now to make improvements to PL/Perl for PostgreSQL 8.5. I’m planning to work with Andrew Dunstan to agree on a set of changes and develop the patches.

As a first step along that road I want to map out here the changes I’m thinking of and to ask for comments and suggestions.

Goals:

  • Enable modular programming by pre-loading user libraries.
  • Soften the hard choice between plperl and plperlu, so there’s less reason to “give up” and use plperlu.
  • Improve performance.
  • Improve flexibility for future changes.
  • Enable use of tracing/debugging tools.

Specific Proposals:

  1. Enable configuration of perl at initialization
  2. Add ability to specify in postgresql.conf some code to be run when a perl interpreter is initialized. For example:

    plperl.at_init_do = 'use lib qw(/path/to/mylib); use MyPlPerlUtils; use List::Util qw(sum);'

  3. Configure extra items to be shared with the Safe compartment
  4. The Safe compartment used for plperl functions can’t access any namespace outside the compartment. So, by default, any subroutines defined by libraries loaded via plperl.at_init_do won’t be callable from plperl functions.

    Some mechanism is needed to specify which extra subroutines, and/or variables, should be shared with the Safe compartment. For example:

    plperl.safe_share = '$foo, myfunc, sum'

  5. Permit some more opcodes in the Safe compartment
  6. I’d like to add the following opcodes to the set of opcodes permitted in the Safe compartment: caller, dbstate, tms.

  7. Execute END blocks at process end
  8. Currently PostgreSQL doesn’t execute END blocks when the backend postgres process exits (oddly, it actually executes them immediately after initializing the interpreter). Fixing that would greatly simplify use of tools like NYTProf that need to know when the interpreter is exiting. Updated: used to say “at server shutdown” which was wrong.

  9. Name PL/Perl functions
  10. Currently PL/Perl functions are compiled as anonymous subroutines. Applying the same technique as the Sub::Name module would allow them have ‘more useful’ names than the current ‘__ANON__’.

    For a PL/Perl function called “foo”, a minimal implementation would use a name like “foo__id54321” where 54321 is the oid of the function. This avoids having to deal with polymorphic functions (where multiple functions have the same name but different arguments).

    The names won’t enable inter-function calling and may not even be installed in the symbol table. They’re just to improve error messages and to enable use of tools like Devel::NYTProf:: PgPLPerl (as yet unreleased).

  11. Miscellaneous updates to the PL/Perl documentation
  12. To document the new functionality and expand/update the related text.

  13. Improve Performance
  14. It seems likely that there’s room for improvement. Some code profiling is needed first, though, so I’ll leave this one vague for now.

Any comments on the above?
Anything you’d like to add?

If so, speak up, time is short!


Footnote

For completeness I’ll mention that I was thinking of adding a way to permit extra opcodes (plperl.safe_permit=’caller’) and a way to use a subclass of the Safe module (plperl.safe_class=’MySafe’). I dropped them because I felt the risks of subtle security issues outweighed the benefits. Any requirements for which these proposals seem like a good fit can also be met via plperl.at_init_do and plperl.safe_share.

14 thoughts on “Wishlist of PL/Perl Enhancements for PostgreSQL 8.5

  1. Currently PostgreSQL doesn’t execute END blocks at server shutdown

    You do realize that this would be impossible, right? That interpreters are created per-connection and no interpreter state can be preserved past the end of the session (each session is a separate process)?

    Most of the rest of your points would be better addresses at the perl level (Safe is a joke compared to, for example, TCL’s safe interpreters).

    • Thank you, yes, I mean at the end of the process. I’m not sure what you mean by “Most of the rest of your points would be better addressed at the perl level”.

      • Points 1-3 are primarily attempts to work around the lack of features that should have been in Safe.pm in the first place.

        For comparison, in TCL if you create a safe interpreter using ::safe::interpCreate, then the safe interpreter can load modules etc. in the normal way, subject to (a) the specified mapping between host filenames and the virtual filenames seen inside the safe interpreter, and (b) the fact that the loaded modules are still running inside the safe interpreter and have no more access than it does.

        A proper system of external interfaces for Safe would obviate the need for many of the features you are requesting.

        Incidentally, is the sort{} + threads + Safe bug (#60374) fixed yet? (and if it is, why is it still open?)

        • It’s not fixed in 5.10.1. I’ve asked for an update on the status. Meanwhile I’ve added a comment on the ticket with a simple workaround that would be effective for plperl and that I’ll include in my changes.

  2. I don’t have anything smart to write here – but I wanted to express my support, especially for the point 1 above. It was a long time ago that I learned about PL/Perl and I was initially thrilled about the possibilities – but after I learned about the constraints for loading modules I realized that it’s application is very limited.

  3. Suggested change: rename plperl.at_init_do to plperl.at_perl_init_do and add plperl.at_safe_init_do which would be similar but specify code to run when the Safe compartment is initialized. It could thus be set by per-user/per-role GUC.

  4. Item 1 could be controversial. It’s undoubtedly useful, but it creates an action at a distance that effectively invites users to create mutually incompatible PL/Perl installation. Ideally, you would want users to create libraries of reusable PL/Perl functions, but when those only work with certain at_init_do settings, then you create a big mess. Look at PHP; they have done something quite similar with their php.ini.

    • I see your point. What’s needed is some equivalent of a “use” statement that can be put at the start of plperl functions. I have a few ideas – I’ll give it some thought. Thanks.

  5. Did you consider support of pg’s internal type in perl ? I imagine a lot of people will benefit writing support functions for new operators, instead of using C.

  6. Pingback: NYTProf v4 – Now with string-eval x-ray vision! « Not this…

  7. Do you know whether there are any plperl performance improvements in Postgres 9.x?

    I’m running an experiment where I store json data in a big (~5KB) text column in Postgres. I wrote functions using plperlu to extract values out of the json using jsonpath expressions (making use of existing json and jsonpath libraries for Perl). I then created functional indexes (using these functions) to avoid having to actually run the functions at query-time (as long as the query has a matching expression). It all works, but performance is not where I’d like it to be. In my test setup it takes ~2 ms to insert the large record. If I add one functional index, it takes ~11 ms. If I add two indexes, it takes ~19 ms. In other words, it takes about 8 ms for each execution of the plperl function. If migrating to 9.x would solve this overhead, awesome. If not, I wonder: are any standard ways to tune or analyze plperl functions?

    thanks!

    –S

    • Yes, there are some plperl performance improvements in Postgres 9.x. You’ll also get more performance by using a Perl that was configured without threads (you may have to build and install a separate perl for that and then configure a build PostgreSQL to use that). Neither will “solve” the overhead, but they’ll reduce it.

      You should also ask for help on the pgsql-performance mailing list (after searching the archives http://archives.postgresql.org/pgsql-performance/)

      Good luck.

      • Thanks Tim!

        I dug in deeper and discovered that the majority of the overhead was attributed to a single line of Perl code used to deserialize the json string into an object. I then discovered the JSON::XS library which reduced the cost dramatically. It appears that in my case the overhead of calling plperl is insignificant compared to the cost of the actual work I’m doing in the plperl function.

Comments are closed.