Using psql Variables to Introspect Your Script
psql is by far my favourite SQL text client: it has features that even the most expensive database tools do not provide. One very interesting property of psql is its support for internal variables, pretty much like the variables you can find in a shell. Since I often find myself running queries to get information about a transaction, in terms of time and quantity of data manipulated, and then doing the math manually, I decided to let psql do this for me by means of variables.
The Use Case: Quantitative Data About a Transaction
I want to run a long transaction that does some data manipulation and transformation, and I want to get an idea about how much such a transaction is going to cost me, so that I can estimate how to apply the same transformation in production. Usually, I begin the transaction by having a look at the current time and WAL position, and I do the same at the end of the transaction. Computing the difference between the values gives me a hint about the wall clock time and the amount of data (assuming no other activity is going on in the database). As an example:
testdb=> BEGIN;
BEGIN
testdb=*> SELECT clock_timestamp() AS begin_clock
testdb-*>      , pg_current_wal_lsn() AS begin_lsn;
         begin_clock          | begin_lsn
------------------------------+------------
 2023-09-29 10:32:05.51654+02 | 2/A39CC3C0
(1 row)

testdb=*> INSERT INTO t( t )
testdb-*> SELECT 'Dummy ' || v
testdb-*> FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000
testdb=*> SELECT clock_timestamp() AS end_clock
testdb-*>      , pg_current_wal_lsn() AS end_lsn;
           end_clock           |  end_lsn
-------------------------------+------------
 2023-09-29 10:32:48.511892+02 | 2/A81AC000
(1 row)

testdb=*> COMMIT;
Now that I have the timestamps and WAL LSN positions, I can manually compute the cost of this transaction by copying and pasting the results:
testdb=> SELECT '2023-09-29 10:32:48.511892+02'::timestamp
testdb->        - '2023-09-29 10:32:05.51654+02'::timestamp AS wall_clock
testdb->      , pg_size_pretty( pg_wal_lsn_diff( '2/A81AC000', '2/A39CC3C0' ) ) AS size;
   wall_clock    | size
-----------------+-------
 00:00:42.995352 | 72 MB
So the transaction took 42 seconds and produced around 72 MB of data (in the WALs). Note that I had to manually copy and paste every single value in order for the query to compute the difference I want.
Using psql Variables to Obtain the Computation Automatically
If I store the begin and end values into psql variables, I can use an immutable query to compute the same results, without having to copy and paste the single values. This trick is made possible by the special command \gset, which allows for the declaration and definition of variables out of a query result.
testdb=> BEGIN;
BEGIN
testdb=*> SELECT clock_timestamp() AS clock
testdb-*>      , pg_current_wal_lsn() AS lsn \gset begin_
testdb=*> INSERT INTO t( t )
testdb-*> SELECT 'Dummy ' || v
testdb-*> FROM generate_series( 1, 1000000 ) v;
INSERT 0 1000000
testdb=*> SELECT clock_timestamp() AS clock
testdb-*>      , pg_current_wal_lsn() AS lsn \gset end_
testdb=*> SELECT :'end_clock'::timestamp - :'begin_clock'::timestamp AS wall_clock
testdb-*>      , pg_size_pretty( pg_wal_lsn_diff( :'end_lsn', :'begin_lsn' ) ) AS size;
   wall_clock    | size
-----------------+-------
 00:00:11.400421 | 72 MB

testdb=*> COMMIT;
COMMIT
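Since \gset works silently, it can be handy to check what it actually stored. As a quick sanity check (this snippet is my own addition, not part of the session above), psql can print the variables back with \echo:

```sql
testdb=> \echo begin: :begin_clock :begin_lsn
testdb=> \echo end:   :end_clock   :end_lsn
```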
The two queries that get the timing and WAL LSN information are similar, and exploit a \gset begin_ and a \gset end_ command respectively. The first command takes the output of the query and, for each column, creates a variable named after the given prefix (begin_) and the column name, therefore begin_clock and begin_lsn. The second query does the very same with the prefix end_, therefore creating the end_clock and end_lsn variables. The interesting part is the last query, which by now is totally automated and computes the differences between the begin_ and end_ values (please note the quoting and casting).
Thanks to this little trick, I can now place such queries at the boundaries of my scripts and get as output the result I want or need to monitor the transaction. Clearly, this approach can be extended, so you can have variables to track the number of tuples, the number of tables created or deleted, and so on. The key idea is to have a kind of catch-all set of queries that depend on variables you will define systematically in your scripts.
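As a sketch of this catch-all idea (the file names and the tuple-counting column are my own invention, not from the sessions above), the boundary queries can live in two small files that every script includes via \i. Note that the counters in pg_stat_user_tables are updated asynchronously by the statistics subsystem, so the tuple difference is only approximate:

```sql
-- begin_stats.sql (hypothetical): snapshot taken at the start of the script
SELECT clock_timestamp() AS clock
     , pg_current_wal_lsn() AS lsn
     , ( SELECT coalesce( sum( n_tup_ins + n_tup_upd + n_tup_del ), 0 )
         FROM pg_stat_user_tables ) AS tuples
\gset begin_

-- end_stats.sql (hypothetical): snapshot taken at the end, plus the report
SELECT clock_timestamp() AS clock
     , pg_current_wal_lsn() AS lsn
     , ( SELECT coalesce( sum( n_tup_ins + n_tup_upd + n_tup_del ), 0 )
         FROM pg_stat_user_tables ) AS tuples
\gset end_

SELECT :'end_clock'::timestamp - :'begin_clock'::timestamp AS wall_clock
     , pg_size_pretty( pg_wal_lsn_diff( :'end_lsn', :'begin_lsn' ) ) AS size
     , :end_tuples - :begin_tuples AS tuples_touched;
```

A script then reduces to \i begin_stats.sql, the transaction body, and \i end_stats.sql, with the final report produced automatically.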
Why is the second transaction faster than the first one?
In the above example I showed two identical transactions, but the first one is slower, in terms of execution time, than the second one. The answer is simple: in the first transaction I was literally typing in the SQL statements, while in the second I was recalling them from the psql history. It is only a matter of typing the statements!
Conclusions
When I do professional training and present the psql command line client, I see disappointment on my trainees' faces. However, the more I go on explaining how flexible and powerful psql is, the more the classroom likes it. Thanks to the capability of automagically setting variables from a query output, psql allows you to automate some tasks, including your own script introspection.