A simple example of LATERAL use

A few days ago I found a question by a user on Facebook: how to select events from a table where they are no more than 10 minutes one from another?
My first answer was related to LATERAL, and this post I try to represent with an example how I understood and could solve the above question.

First of all, let’s build an events table, where each row has a timestamp.

testdb=> CREATE TABLE events(
  pk int generated always as identity
  , event text
  , ts timestamp default CURRENT_TIMESTAMP
  , PRIMARY KEY( pk )
);

CREATE TABLE


Now, let’s populate the table with “random” data so that there are events spread in a range of 2 minutes each:

testdb=> insert into events( event, ts )
select 'event #' || v, current_timestamp - ( ( v * 2 ) || ' minutes' )::interval
from generate_series( 1, 100 ) v;

INSERT 0 100


Having setup the table and the data, how can we relate every tuple with other events that are not outside a ten minutes window? LATERAL comes to the rescue.

testdb=> SELECT e1.pk, e1.event, e2.*  
         FROM events e1, 
         LATERAL ( SELECT pk, event, ts - e1.ts as time_elapsed 
                   FROM events 
                   WHERE pk <> e1.pk AND e1.ts - ts <= '10 minutes'::interval ) e2 
        ORDER BY e1.pk LIMIT 20;
        
 pk  |  event   | pk  |  event   | time_elapsed 
-----+----------+-----+----------+--------------
 501 | event #1 | 502 | event #2 | -00:02:00
 501 | event #1 | 503 | event #3 | -00:04:00
 501 | event #1 | 504 | event #4 | -00:06:00
 501 | event #1 | 505 | event #5 | -00:08:00
 501 | event #1 | 506 | event #6 | -00:10:00
 502 | event #2 | 501 | event #1 | 00:02:00
 502 | event #2 | 503 | event #3 | -00:02:00
 502 | event #2 | 504 | event #4 | -00:04:00
 502 | event #2 | 505 | event #5 | -00:06:00
 502 | event #2 | 506 | event #6 | -00:08:00
 502 | event #2 | 507 | event #7 | -00:10:00
 503 | event #3 | 501 | event #1 | 00:04:00
 503 | event #3 | 502 | event #2 | 00:02:00
 503 | event #3 | 504 | event #4 | -00:02:00
 503 | event #3 | 505 | event #5 | -00:04:00
 503 | event #3 | 506 | event #6 | -00:06:00
 503 | event #3 | 507 | event #7 | -00:08:00
 503 | event #3 | 508 | event #8 | -00:10:00
 504 | event #4 | 501 | event #1 | 00:06:00
 504 | event #4 | 502 | event #2 | 00:04:00
(20 rows)



Let’s disassemble the query and see how it works. The subquery selects all the tuples that are within a 10 minutes range and that are different from the query the system is currently evaluating (i.e., e1.pk). But usually a subquery is evaluated once for the outer query, but note that the subquery is prefixed with LATERAL that, in simple words, means evaluate the subquery for every row of the outer result set. This means that the LATERAL subquery can access the outer query row, and can “reason” about its own result set.
An important thing to keep in mind while dealing with LATERAL is that the subquery must be referenced with an alias, in my case e2. Please note that within the LATERAL subquery I do compute the time difference between the timestamp of the outer tuple and the one of the inner result set, and as you can see from the output column time_elapsed every row differs by 2 minutes, that is how we generated the rows.

What happens if you don’t use LATERAL? Well, you cannot reference the e1 outer tuple, that is there is no way for a subquery to cross-reference something outside of its scope:

testdb=> SELECT e1.pk, e1.event, e2.*  
         FROM events e1,  
         ( SELECT pk, event, ts - e1.ts as time_elapsed 
           FROM events WHERE pk <> e1.pk AND e1.ts - ts <= '10 minutes'::interval ) e2 
         ORDER BY e1.pk LIMIT 20;
ERROR:  invalid reference to FROM-clause entry for table "e1"
LINE 1: ..., e2.*  FROM events e1,  ( SELECT pk, event, ts - e1.ts as t...
                                                             ^
HINT:  There is an entry for table "e1", but it cannot be referenced from this part of the query.



As you can see, PostgreSQL clearly states that you cannot refer to e1 (the outer tuple) from within the scope of the subquery.

LATERAL Joins

It is, of course, possible to use LATERAL in a join, and in this case the above query can be rewritten as:

testdb=> SELECT e1.pk, e1.event, e2.*  
         FROM events e1 JOIN  LATERAL 
            ( SELECT pk, event, ts - e1.ts as time_elapsed 
              FROM events WHERE pk <> e1.pk AND e1.ts - ts <= '10 minutes'::interval ) e2 
        ON true  
        ORDER BY e1.pk LIMIT 20;
 pk  |  event   | pk  |  event   | time_elapsed 
-----+----------+-----+----------+--------------
 501 | event #1 | 502 | event #2 | -00:02:00
 501 | event #1 | 503 | event #3 | -00:04:00
 501 | event #1 | 504 | event #4 | -00:06:00
 501 | event #1 | 505 | event #5 | -00:08:00
 501 | event #1 | 506 | event #6 | -00:10:00
 502 | event #2 | 501 | event #1 | 00:02:00
 502 | event #2 | 503 | event #3 | -00:02:00
 502 | event #2 | 504 | event #4 | -00:04:00
 502 | event #2 | 505 | event #5 | -00:06:00
 502 | event #2 | 506 | event #6 | -00:08:00
 502 | event #2 | 507 | event #7 | -00:10:00
 503 | event #3 | 501 | event #1 | 00:04:00
 503 | event #3 | 502 | event #2 | 00:02:00
 503 | event #3 | 504 | event #4 | -00:02:00
 503 | event #3 | 505 | event #5 | -00:04:00
 503 | event #3 | 506 | event #6 | -00:06:00
 503 | event #3 | 507 | event #7 | -00:08:00
 503 | event #3 | 508 | event #8 | -00:10:00
 504 | event #4 | 501 | event #1 | 00:06:00
 504 | event #4 | 502 | event #2 | 00:04:00
(20 rows)


Conclusions

LATERAL is a very powerful SQL operator in PostgreSQL, and can help solving problems you would normally solve by means of cursors and iterations.

The article A simple example of LATERAL use has been posted by Luca Ferrari on August 7, 2021