Estimating row count from explain output…in Perl!
At the end of his blog post, Laurenz Albe shows how to use a quick and dirty function to estimate the number of rows returned by an arbitrary query.While I don’t believe it is often a good idea to judge the size of a query by the optimizer guesses, the approach is interesting. Laurenz shows how to exploit the
JSON
format and query facilities to extract data from the EXPLAIN
output, why not using Perl to crunch the textual data?
So here it is a simple implementation to extract the estimate within Perl:
CREATE OR REPLACE FUNCTION plperl_row_estimate( query text )
RETURNS BIGINT
AS $PERL$
my ( $query ) = @_;
return 0 if ( ! $query );
$query = sprintf "EXPLAIN (FORMAT YAML) %s", $query;
elog( DEBUG, "Estimating from [$query]" );
my @estimated_rows = map { s/Plan Rows:\s+(\d+)$/$1/; $_ }
grep { $_ =~ /Plan Rows:/ }
split( "\n", spi_exec_query( $query )->{ rows }[ 0 ]->{ "QUERY PLAN" } );
return 0 if ( ! @estimated_rows );
return $estimated_rows[ 0 ];
$PERL$
LANGUAGE plperl;
testdb=> select plperl_row_estimate( 'SELECT p.* FROM persona p JOIN persona k on k.pk = p.pk WHERE k.eta = 40' );
plperl_row_estimate
---------------------
69500
my @estimated_rows = map { s/Plan Rows:\s+(\d+)$/$1/; $_ }
grep { $_ =~ /Plan Rows:/ }
split( "\n", spi_exec_query( $query )->{ rows }[ 0 ]->{ "QUERY PLAN" } );
spi_exec_query
an EXPLAIN
is executed and its format, in YAML
is split into an array of strings, one entry per line. Such array, is then passed to grep
to exclude all rows that do not contain information about the row estimation. Last, map
extracts the numeric value from such lines.
After that, therefore, there is an array of
@estimated_rows
entries where each one contains the rows estimatation of each plan node, with the outer node in the begin of the array. Such single position is therefore returned by the function and all the others are dropped away.
As a final note, please consider that such function accepts an arbitrary piece of text and tries to execute it as a query, therefore it must be used carefully to avoid SQL-injection and problems alike.