Perl code reuse in Pl/Perl thru pg_proc and anonymous code blocks
PostgreSQL allows you to write executable code, e.g.,FUNCTION
s and PROCEDURE
s in Perl thru its extension language Pl/Perl (plperl
and plperlu
).
But sometimes there is the need to use the same Perl block over and over again across different code blocks and functions.
There are different approaches, most notably the module one: abstract your behavior into a module and load it whenever you need. Yeah, this means using
plperlu
, but it is a fair tradeoff.
However, keeping in mind how PostgreSQL stores procedures and their code, it is possible to use a more fancy approach. In this post I show you a couple of simple examples as proofs of concept, clearly in order to push this into production there is the need for a more sophisticated approach.
An easy function in Pl/Perl
Let’s start simple: a Pl/Perl function to say prime numbers.CREATE OR REPLACE FUNCTION
fluca.is_prime( int )
RETURNS bool
AS $CODE$
return 1 if $_[0] <= 2;
for my $i ( 2 .. $_[0] - 1 ) {
return 0 if $_[0] % $i == 0;
}
return 1;
$CODE$
LANGUAGE plperl;
Quite simple, uh? Now imagine that we need to prepare another function that needs to generate prime numbers, and thus needs to know if a given number is prime or not.
One approach could be to call the above
fluca.is_prime()
function, but this will slow down the whole process. But after all, this is the building block logic on functions!
Another approach could be to take apart the above bunch of Perl code, create a module, and use it wherever needed. But it is not the approach followed here.
Again, another approach could be to store a block reference into the
%_SHARED
global hash.
Last, why not querying the catalog
pg_proc
to extract the source code of the above function and wrap it into another Perl anonymous code block? It goes like this:
CREATE OR REPLACE FUNCTION
fluca.generate_primes_up_to( int )
RETURNS SETOF int
AS $CODE$
my $query = "select prosrc from pg_proc where proname = 'is_prime' and pronamespace = ( select oid from pg_namespace where nspname = 'fluca' );";
my $code = spi_exec_query( $query, 1 )->{ rows }[ 0 ]->{ prosrc };
my $is_prime = eval( "sub { $code }; " );
for my $n ( 1 .. $_[0] ) {
return_next( $n ) if $is_prime->( $n );
}
return undef;
$CODE$
LANGUAGE plperl;
The
$query
statement selects the pg_proc.prosrc
text field that contains the source code, whatever you have written between $CODE$
separators. That’s because plperl
is a pl language, therefore its source code is stored in the system catalog.
Having stated that, the
$code
string contains the block of code, so it was like the variable was declared as follows:
my $code = " return 1 if $_[0] <= 2;
for my $i ( 2 .. $_[0] - 1 ) {
return 0 if $_[0] % $i == 0;
}
return 1;";
Seems like a Perl sub, but it is not (yet). There is the need to wrap the bunch of code into a
sub
declaration, and this is the easy part, and then we need to compile it. That’s the task of eval( "sub { $code };" )
, that creates an anonymous subroutine with the source code extracted from the other function.
Such code is stored into a scalar
$is_prime
that is then used as a standard anonymous subroutine via ->
.
And that’s all!
Advantages
The main advantage of the above approach is that whenever a change is done ntofluca.is_prime()
, the same change is immediatly reflected into fluca.generate_primes_up_to()
, because the source code of the former is always queried at the time the latter starts its execution.
Drawbacks
Time!Extracting the code and compiling it every time requires time and resources, so it can be a pitfall for big Perl code blocks. There are different modules that can help in this scenario, e.g.,
[Perl::Parse](https://metacpan.org/pod/Parse::Perl){:target="_blank"}
.
An hidden drawback is that the two functions are not explicitly coupled, so if
fluca.is_prime
is accidentaly deleted, the other function will no more be able to run at all!
The SETOF
problem
Reusing a piece of code that returns a scalar is simple, but what about functions that return sets?
Assume there is the need for a function that returns all the even numbers up to a limit, and does that efficiently, that is returning one value at a time.
CREATE OR REPLACE FUNCTION
fluca.generate_evens( int )
RETURNS SETOF int
AS $CODE$
for ( 1 .. $_[0] ) {
return_next( $_ ) if $_ % 2 == 0;
}
return undef;
$CODE$
LANGUAGE plperl;
While the function is really simple, the problem of the sets arises immediatly: Pl/Perl provides particular ways of interacting with PostgreSQL, and
return_next
is one of such ways. Long story short: return_next
yelds the function adding a new element to the current result set.
Since (regular) Perl does not have a
return_next
operator nor a function, how to translate such code? A very inefficient approach is to put the result set into an array and return the whole array. It is not the same as return_next
, because there is no yelding, but it can work. Therefore, a function that wants to use the previous code could inject an array on the function prologue and substitute return_next
with a regular array returning.
Imagine we want to build a function that computes odd numbers on top of the even ones; the code looks like the following snippet.
CREATE OR REPLACE FUNCTION
fluca.generate_odds( int )
RETURNS SETOF int
AS $CODE$
my $query = "select prosrc from pg_proc where proname = 'generate_evens' and pronamespace = ( select oid from pg_namespace where nspname = 'fluca' );";
my $code = spi_exec_query( $query, 1 )->{ rows }[ 0 ]->{ prosrc };
$code = "my \@return_values;\n" . $code ;
$code =~ s/return_next\s*\(/push( \@return_values,/g;
$code =~ s/return\s+undef\.*;/return \@return_values;/g;
my $generate_evens = eval( "sub { $code }; " );
my @odds = map( { $_ + 1 } $generate_evens->( 10 ) );
for ( @odds ) {
return_next( $_ );
}
return undef;
$CODE$
LANGUAGE plperl;
The base idea is the same as in the previous case: query
pg_proc
to get the source of the function and store it into $code
.
Then, add the declaration for an array, named
@return_values
(a better and unique name should be chosen), and substitue with a regular expression all return_next
with a push
into the above array, removing also any return undef
(that in PlPerl is the way to end the result set).
Yeah, I hear you screaming! This is surely something not to do in production, but it is a simple and dirty way to make Perl do what you want.
As in the previous case, store the result of
eval
uating the so rewritten $code
into a scalar named $generate_evens
and use it as you prefer.
Danger Will Robinson!
The regex substitution is awful because it will go beyond the scope where
return_next
applies. Imagine to apply the same technique recursively to fluca.generate_odds()
: there is a return_next
level inside the code extracted from pg_proc
and an outer scope with return_next
used within the function itself. The regular expression is not able to find out the scope, so both return_next
will appear similar and will get substituted in the very same manner. And that’s why you should not use such approach in production! Again, there are Perl modules to get rid of these details and get things done in the right way.
Conclusions
Perl is great. It allows you to build dynamic code in a very dynamic way.PostgreSQL is great. It allows you to inspect every single part of the system, including executable code.
The Perl power to push some code out of a PostgreSQL table (
pg_proc
) into a scalar, so to use it later on, allows for code sharing among Pl/Perl functions and routines.
It is up to you to decide to shoot yourself in the foot or hit something valuable!