Ordinality in fuction queries
The PostgreSQLSELECT
statement allows you to query function that return result set (either a SET OF
or TABLE
), that are used as source of tuple for the query itself.
There is nothing surprising about that!
However, the
SELECT
statement, when invoked against a function that provides a result set, allows an extra clause to appear: [WITH ORDINALITY](https://www.postgresql.org/docs/14/sql-select.html){:target="_blank"}
. This clause adds a column to the result set with a numerator (of type bigint
) representing the number of the tuple as got from the function.
Why is this important? Because you don’t need your function to provide by itself a kind of tuple numerator.
WITH ORDINALITY
in action
Let’s take a simple example to understand how it works. Let’s create a function that returns a table:
CREATE OR REPLACE FUNCTION animals( l int DEFAULT 5,
animal text DEFAULT 'cat',
owner text DEFAULT 'nobody' )
RETURNS TABLE( pk int, description text, mood text )
AS $CODE$
DECLARE
i int := 0;
j int := 0;
BEGIN
FOR i IN 1 .. l LOOP
pk := i;
description := format( '%s #%s owned by %s', animal, i, owner );
j := random() * 100;
IF j % 2 = 0 THEN
mood := 'good';
ELSE
mood := 'bad';
END IF;
RAISE DEBUG 'Generating % # % with mood %', animal, i, mood;
RETURN NEXT;
END LOOP;
RETURN;
END
$CODE$
LANGUAGE plpgsql;
The above function
animals()
produced an output with a simple name of the animal (numerated), the index of the generated tuple (i.e., a numerator) and a randomly select mood.
It is clearly easy to test it out:
testdb=> SELECT * FROM animals();
pk | description | mood
----+------------------------+------
1 | cat #1 owned by nobody | bad
2 | cat #2 owned by nobody | good
3 | cat #3 owned by nobody | bad
4 | cat #4 owned by nobody | bad
5 | cat #5 owned by nobody | bad
(5 rows)
The
pk
column contains the numerator of the generated tuple, so we know that the cat #1
tuple has been generated first, the cat #2
as second and so on.
Let’s kick
WITH ORDINALITY
in:
testdb=> SELECT * FROM animals() WITH ORDINALITY;
pk | description | mood | ordinality
----+------------------------+------+------------
1 | cat #1 owned by nobody | good | 1
2 | cat #2 owned by nobody | bad | 2
3 | cat #3 owned by nobody | good | 3
4 | cat #4 owned by nobody | good | 4
5 | cat #5 owned by nobody | good | 5
The
WITH ORDINALITY
clause must follow the function it will be apply onto. Such clause appends a new column to the result set, by default named ordinality
with a progressive numerator. Note how pk
and ordinality
contain the very same value: WITH ORDINALITY
is keeping track for you of the tuple produced by the result set stream (the function), so you don’t need to compute by yourself.
Clearly, this works also with a reordering of the tuples, because the clause does not numerate the appearance of the tuples, rather the instant (or better, the sequence) a tuple has been added to the result set:
testdb=> SELECT * FROM animals() WITH ORDINALITY
ORDER BY random();
pk | description | mood | ordinality
----+------------------------+------+------------
4 | cat #4 owned by nobody | good | 4
2 | cat #2 owned by nobody | good | 2
3 | cat #3 owned by nobody | good | 3
5 | cat #5 owned by nobody | good | 5
1 | cat #1 owned by nobody | good | 1
(5 rows)
It is also possible to rename the
ordinality
column with an alias, like the following:
testdb=> SELECT * FROM animals() WITH ORDINALITY
AS cat(i, name, mood, n)
ORDER BY random();
i | name | mood | n
---+------------------------+------+---
4 | cat #4 owned by nobody | good | 4
1 | cat #1 owned by nobody | good | 1
2 | cat #2 owned by nobody | bad | 2
5 | cat #5 owned by nobody | bad | 5
3 | cat #3 owned by nobody | good | 3
(5 rows)
Clearly, you have to alias the whole result set, not a single column!
WITH ORDINALITY
as a filtering condition
Having the automatically named ordinality
column, or a custom chosen named column, it is possible to add such column to the WHERE
clause of a query:
testdb=> SELECT * FROM animals() WITH ORDINALITY AS cat(i, name, mood, n)
WHERE n % 2 = 0
ORDER BY random();
i | name | mood | n
---+------------------------+------+---
4 | cat #4 owned by nobody | bad | 4
2 | cat #2 owned by nobody | bad | 2
(2 rows)
as you can see, the above query filters on the
n
column to get only even tuples.
WITH ORDINALITY
vs row_number()
You may think that the window function [row_number()](https://www.postgresql.org/docs/14/functions-window.html){:target="_blank"}
does the same job as WITH ORDINALITY
, at least in the function call scenario.
However, the row_number()
window function is a different beast, and can work on a window defined against the result set ordinality. In short, window functions cover a diferent set of problems!
Therefore, even if the following seems to produce the very same result:
testdb=> SELECT *, row_number() OVER () FROM animals() WITH ORDINALITY;
pk | description | mood | ordinality | row_number
----+------------------------+------+------------+------------
1 | cat #1 owned by nobody | good | 1 | 1
2 | cat #2 owned by nobody | bad | 2 | 2
3 | cat #3 owned by nobody | bad | 3 | 3
4 | cat #4 owned by nobody | good | 4 | 4
5 | cat #5 owned by nobody | bad | 5 | 5
(5 rows)
as soon as you define your partition to number in a more specialized way you see different results:
testdb=> SELECT *, row_number() OVER ( order by pk desc ) FROM animals() WITH ORDINALITY;
pk | description | mood | ordinality | row_number
----+------------------------+------+------------+------------
5 | cat #5 owned by nobody | bad | 5 | 1
4 | cat #4 owned by nobody | bad | 4 | 2
3 | cat #3 owned by nobody | bad | 3 | 3
2 | cat #2 owned by nobody | good | 2 | 4
1 | cat #1 owned by nobody | good | 1 | 5
In the above, you can see that the last row produced by the function (this with
ordinality
set to 5
) is the first row encountered by row_number()
.
Another example of different results can be quickly obtained when joining:
testdb=> SELECT *, row_number() OVER ()
FROM animals() WITH ORDINALITY,
generate_series(1, 3) WITH ORDINALITY as x(gs, counter);
pk | description | mood | ordinality | gs | counter | row_number
----+------------------------+------+------------+----+---------+------------
1 | cat #1 owned by nobody | bad | 1 | 1 | 1 | 1
2 | cat #2 owned by nobody | good | 2 | 1 | 1 | 2
3 | cat #3 owned by nobody | good | 3 | 1 | 1 | 3
4 | cat #4 owned by nobody | good | 4 | 1 | 1 | 4
5 | cat #5 owned by nobody | good | 5 | 1 | 1 | 5
1 | cat #1 owned by nobody | bad | 1 | 2 | 2 | 6
2 | cat #2 owned by nobody | good | 2 | 2 | 2 | 7
3 | cat #3 owned by nobody | good | 3 | 2 | 2 | 8
4 | cat #4 owned by nobody | good | 4 | 2 | 2 | 9
5 | cat #5 owned by nobody | good | 5 | 2 | 2 | 10
1 | cat #1 owned by nobody | bad | 1 | 3 | 3 | 11
2 | cat #2 owned by nobody | good | 2 | 3 | 3 | 12
3 | cat #3 owned by nobody | good | 3 | 3 | 3 | 13
4 | cat #4 owned by nobody | good | 4 | 3 | 3 | 14
5 | cat #5 owned by nobody | good | 5 | 3 | 3 | 15
(15 rows)
For every
generate_series()
tuple (column counter
) there are five animals()
tuples (column ordinality
), each one progressively tracked by row_number()
.
Conclusions
Why is this ordinality thing important?It may happen that you are tempted to include into your function result sets some extra information that will ease the post-processing of the result set itself. This practice should be avoided when the “external world” (i.e., the query using the function) is able to add such extra information by itself. You will not waste resources, but also keep your code cleaner and more readable.