Pentagon numbers
Since a few weeks, I tend to implement some of the tasks for the Perl Weekly Challenge into PostgreSQL specific code. One interesting problem has been this week task 2 of the Challenge 147: finding out a couple of pentagon numbers that have simultaneously a sum and a diff that is another pentagon number.In this post, I discuss two possible solutions to the task.
What is a Pentagon Number?
A pentagon number is defined as the value of the expressionn * ( 3 * n - 1 ) / 2
, therefore the pentagon number corresponding to 3
is 12
.
The task required to find out a couple of pentagon numbers so that:
P(n1) + P(n2) = P(x)
P(n1) - P(n2) = P(y)
It does not matter what
x
and y
are, but n1
and n2
must be pentagon numbers and both their sum and diff must be pentagon numbers too.
The first approach: a record based function
The first solution I came with was inspired by the solution I provided in Raku, and is quite frankly a kind of record-based approach.Firs of all, I define an
IMMUTABLE
function named f_pentagon
that computes the pentagon number value starting from a given number, so that f_pentagon( 3 )
returns 12
. Why do I need a function? Because I want to implement a table with a stored virtual column to keep track of numbers and their pentagon values.
For that reason, I created a
pentagons
table with a generic n
column that represents the starting value and the p
column that represents the computed pentagon value.
CREATE OR REPLACE FUNCTION
f_pentagon( n bigint )
RETURNS bigint
AS
$CODE$
SELECT ( n * ( 3 * n - 1 ) / 2 );
$CODE$
LANGUAGE sql
IMMUTABLE;
DROP TABLE IF EXISTS pentagons;
CREATE TABLE pentagons
(
n bigint
, p bigint GENERATED ALWAYS AS ( f_pentagon( n ) ) STORED
);
INSERT INTO pentagons( n )
SELECT generate_series( 1, 5000 );
I inserted into the table
5000
records because I know, from the Raku solution, that what I’m looking for is within such range of values. It is, of course, possible to increase that limit to find out other values.
The table content looks therefore like the following:
testdb=> select * from pentagons limit 10;
n | p
----+-----
1 | 1
2 | 5
3 | 12
4 | 22
5 | 35
6 | 51
7 | 70
8 | 92
9 | 117
10 | 145
Now it is possible to implement a function, named
f_pentagon_pairs
that seeks the above table searching for the required values. The table returns a TABLE
, even if only one row will be returned, but since I want to output multiple values, I decided to implement it as a row level returning function. In particular, the returned information is:
n1
is the first number;n2
is the second number;s
is the sum of the pentagons, that isP(n1) + P(n2)
;d
is the difference of the pentagons, that isabs( P(n1) - P(n2) )
;ps
is the number which pentagon corresponds to the sum of the two pentagons, that isP(ps) = P(n1) + P(n2)
;pd
is the number which pentagon corresponds to the difference of the two pentagons, that isP(pd) = abs( P(n1) - P(n2) )
;
The function is the following one:
CREATE OR REPLACE FUNCTION
f_pentagons_pairs()
RETURNS TABLE ( n1 bigint, n2 bigint, s bigint, d bigint, ps bigint, pd bigint )
AS $CODE$
DECLARE
current_tuple pentagons%rowtype;
other_tuple pentagons%rowtype;
fnd int := 0;
BEGIN
FOR current_tuple IN SELECT * FROM pentagons ORDER BY n LOOP
SELECT *
INTO other_tuple
FROM pentagons pp
WHERE EXISTS(
SELECT *
FROM pentagons ps
WHERE ps.p = current_tuple.p + pp.p
)
AND EXISTS (
SELECT *
FROM pentagons ps
WHERE ps.p = abs( current_tuple.p - pp.p )
);
IF FOUND THEN
SELECT current_tuple.n
, other_tuple.n
, current_tuple.p + other_tuple.p
, abs( current_tuple.p - other_tuple.p )
, p1.n
, p2.n
INTO n1, n2, s, d, ps, pd
FROM pentagons p1, pentagons p2
WHERE p1.p = current_tuple.p + other_tuple.p
AND p2.p = abs( current_tuple.p - other_tuple.p );
RAISE INFO 'P(%) + P(%) = P(%) = %',
n1, n2, ps, s;
RAISE INFO 'P(%) - P(%) = P(%) = %',
n1, n2, pd, d;
fnd := fnd + 1;
RETURN NEXT;
RETURN;
END IF;
END LOOP;
RETURN;
END
$CODE$
LANGUAGE plpgsql;
It is quite simple to understand:
- it performs a one-record-at-time loop placing every row of
pentagons
intocurrent_tuple
; - it searches for an
other_tuple
inpentagons
so that the sum and the differenceEXISTS
inpentagons
at the very same time. This means that theother_tuple
andcurrent_tuple
lead to a sum and a difference that is still another pentagon number; - when such tuple is
FOUND
, the output tuple is built.
In order to get the reverse values that lead to the sum and difference, I do another double join with
pentagons
to get out the result.
The
RAISE
instructions are placed only to provide a textual representation of the expressions.
Launching the function on a very little virtual machine, busy in doing other stuff, results in:
testdb=> SELECT * FROM f_pentagons_pairs();
INFO: P(1020) + P(2167) = P(2395) = 8602840
INFO: P(1020) - P(2167) = P(1912) = 5482660
n1 | n2 | s | d | ps | pd
------+------+---------+---------+------+------
1020 | 2167 | 8602840 | 5482660 | 2395 | 1912
(1 row)
Time: 3346,886 ms (00:03,347)
A CTE Approach
Is there the need for thepentagons
table? Uhm…it is possible to materialize the same set of data with a recursive CTE.
And, therefore, it is possible to move the query at the outer level so that there is no need to perform a record-by-record scan.
After all, SQL is a set based language!
WITH RECURSIVE pentagons( n, p )
AS
(
SELECT 1 AS n
, f_pentagon( 1 ) AS p
UNION
SELECT p.n + 1
, f_pentagon( p.n + 1 )
FROM pentagons p
WHERE p.n < 5000
)
SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
SELECT *
FROM pentagons ps
WHERE ps.p = l.p + r.p
)
AND EXISTS (
SELECT *
FROM pentagons ps
WHERE ps.p = abs( l.p - r.p )
)
;
The query executes in a little less time than the approach using the table and the record-based function:
pentagon_pairs
----------------
1020, 2167
2167, 1020
(2 rows)
Time: 5820,066 ms (00:05,820)
There are some details that are harder to tune with the CTE approach, most notably the reverse lookup of the resulting base numbers and the exclusion of the duplicated row. However, it is possible to tune it to your needs.
Why is the CTE approach require more time than the function approach? Well, even if the times are similar, the function terminates as soon as it finds a solution, while the CTE does not, and therefore scans the whole dataset.
Plans!
Timing is not as much difference as it could seem at glance, and effectively the two approaches are comparable with regard to performances. The execution plans are, of course, a lot more different since the function approach works as a black box:testdb=> EXPLAIN ANALYZE SELECT * FROM f_pentagons_pairs();
INFO: P(1020) + P(2167) = P(2395) = 8602840
INFO: P(1020) - P(2167) = P(1912) = 5482660
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------
Function Scan on f_pentagons_pairs (cost=0.25..10.25 rows=1000 width=48) (actual time=4754.081..4754.082 rows=1 loops=1)
Planning Time: 0.047 ms
Execution Time: 4856.988 ms
(3 rows)
Time: 4909,165 ms (00:04,909)
testdb=> EXPLAIN ANALYZE WITH RECURSIVE pentagons( n, p )
AS
(
SELECT 1 AS n
, f_pentagon( 1 ) AS p
UNION
SELECT p.n + 1
, f_pentagon( p.n + 1 )
FROM pentagons p
WHERE p.n < 5000
)
SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
SELECT *
FROM pentagons ps
WHERE ps.p = l.p + r.p
)
AND EXISTS (
SELECT *
FROM pentagons ps
WHERE ps.p = abs( l.p - r.p )
)
;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Hash Semi Join (cost=5.57..30.98 rows=23 width=32) (actual time=2659.801..10415.225 rows=2 loops=1)
Hash Cond: (abs((l.p - r.p)) = ps_1.p)
CTE pentagons
-> Recursive Union (cost=0.00..3.56 rows=31 width=12) (actual time=0.002..21.821 rows=5000 loops=1)
-> Result (cost=0.00..0.01 rows=1 width=12) (actual time=0.001..0.001 rows=1 loops=1)
-> WorkTable Scan on pentagons p (cost=0.00..0.29 rows=3 width=12) (actual time=0.000..0.000 rows=1 loops=5000)
Filter: (n < 5000)
Rows Removed by Filter: 0
-> Hash Semi Join (cost=1.01..25.63 rows=149 width=24) (actual time=361.447..10156.895 rows=5341 loops=1)
Hash Cond: ((l.p + r.p) = ps.p)
-> Nested Loop (cost=0.00..20.15 rows=961 width=24) (actual time=0.005..6323.391 rows=25000000 loops=1)
-> CTE Scan on pentagons l (cost=0.00..0.62 rows=31 width=12) (actual time=0.002..0.671 rows=5000 loops=1)
-> CTE Scan on pentagons r (cost=0.00..0.62 rows=31 width=12) (actual time=0.000..0.546 rows=5000 loops=5000)
-> Hash (cost=0.62..0.62 rows=31 width=8) (actual time=354.101..354.101 rows=5000 loops=1)
Buckets: 8192 (originally 1024) Batches: 1 (originally 1) Memory Usage: 260kB
-> CTE Scan on pentagons ps (cost=0.00..0.62 rows=31 width=8) (actual time=0.001..0.405 rows=5000 loops=1)
-> Hash (cost=0.62..0.62 rows=31 width=8) (actual time=255.920..255.920 rows=5000 loops=1)
Buckets: 8192 (originally 1024) Batches: 1 (originally 1) Memory Usage: 260kB
-> CTE Scan on pentagons ps_1 (cost=0.00..0.62 rows=31 width=8) (actual time=0.004..49.709 rows=5000 loops=1)
Planning Time: 0.195 ms
Execution Time: 10431.113 ms
(21 rows)
Time: 10509,619 ms (00:10,510)
Changing the CTE between
MATERIALIZED
and NOT MATERIALIZED
does not produce any sensible change, of course.
Creating an index on
pentagons(p)
makes the function approach a little faster, but not very much faster since it is used only in the final part of the function.
A query only approach
Having thepentagons
table in place, it is possible to use it as the materialization of the CTE, thus pushing the query out of the function and not within the CTE:
testdb=> SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
SELECT *
FROM pentagons ps
WHERE ps.p = l.p + r.p
)
AND EXISTS (
SELECT *
FROM pentagons ps
WHERE ps.p = abs( l.p - r.p )
)
;
pentagon_pairs
----------------
1020, 2167
2167, 1020
(2 rows)
Time: 5024,468 ms (00:05,024)
It is possible to push some
LIMIT 1
into the subqueries, so to force them to terminate as soon as a match is found, and this slightly improves the speed of the whole query:
testdb=> SELECT format( '%s, %s', l.n, r.n ) AS pentagon_pairs
FROM pentagons l, pentagons r
WHERE EXISTS(
SELECT *
FROM pentagons ps
WHERE ps.p = l.p + r.p LIMIT 1
)
AND EXISTS (
SELECT *
FROM pentagons ps
WHERE ps.p = abs( l.p - r.p ) LIMIT 1
)
;
pentagon_pairs
----------------
1020, 2167
2167, 1020
(2 rows)
Time: 4328,603 ms (00:04,329)
The number of rows on the table is, however, too small for triggering the usage of the index, even forcing an
ORDER BY
.
Even an including index, that could cover all the columns, will not help in this case.
Conclusiops
There is more than one way to do it!No, sorry, this is not Perl, but PostgreSQL! However, given a specific problem, PostgreSQL provides a lot of fun and tools to solve a task.