A first look at pg_repackI got time to have a look at
pg_repack, an interesting extension that helps removing bloat from tables, indexes and databases with the promise of minimal locking.
Since this is a first look, I could be wrong on some aspects, so please apologize me.
The main idea behind
pg_repackis to perform an on-line copy of a source (bloated) table, then switching the original table with the new one. In short, something like:
BEGIN; CREATE TABLE not_bloated AS SELECT * FROM my_table; ALTER TABLE my_table RENAME TO old_bloated; ALTER TABLE not_bloated RENAME TO my_table; DROP TABLE old_bloated; COMMIT;
Of course things are a lot more complex than the above description, but I think that could be a good summary of what happens.
Why such a workflow would remove bloating?
Well, the idea is that the copy of tuples from the bloated table will, of course, copy only visible tuples (i.e., those that would left after a
VACUUM). In other words, dead tuples are not going to hit the new table and therefore the last will not be bloated.
What about locking?
Since the copy is done on-line, the source table (the bloated one) can be used as usual, that is DML queries can be executed against such table. This of course creates a kind of race-condition, since changes are not propagated automatically to the new table.
To solve the problem,
pg_repackinstalls a trigger that will fire for every DML statement and will log changes to a
repack.logtable, so that
pg_repackwill be able to replay changes at the end of the copy, that is just before switching the tables.
This is important, according to me, because this means that running
pg_repackis not the same as running
VACUUMsince the new table could have a small fraction of bloating. Why? Well, if during the copy the original table is subjected to a workload that can cause bloating (i.e.,
DELETE), such bloating will be propagated to the new table as well. <br/<
What about disk space?
Doing a copy of the original table,
pg_repackis going to require at least double the size of the original table on disk.
pg_repackis an extension, and therefore can be installed via
pgxn(as well as manually, of course):
% sudo pgxn install pg_repack
Then of course, you need to install the extension into the database you are going to use (or repack):
% psql -U postgres -c "CREATE EXTENSION pg_repack;" testdb CREATE EXTENSION
pg_repackmust be invoked from the command line as an external utility. The command accepts pretty much all the usual arguments from libpq:
% pg_repack -t "luca.wa" -U postgres testdb INFO: repacking table "luca.wa"
The above will repack a single table, but it is possible to repack all tables in a schema, all tables in a database and so on.
repackschema in the database where the extension lives. In such schema there are different tables, mainly temporary for repacking objects. An interesting table is
repack.tablesthat contains all the details for every table that can be repacked. Querying such tables you can see some tricks used in the workflow of
testdb=# select create_log, create_trigger, lock_table from repack.tables where relname = 'luca.wa'; ... create_log | CREATE TABLE repack.log_16553 (id bigserial PRIMARY KEY, pk repack.pk_16553, row luca.wa) create_trigger | CREATE TRIGGER repack_trigger AFTER INSERT OR DELETE OR UPDATE ON luca.wa FOR EACH ROW EXECUTE PROCEDURE repack.repack_trigger('INSERT INTO repack.log_16553(pk, row) VALUES( CASE WHEN $1 IS NULL THEN NULL ELSE (ROW($1.pk)::repack.pk_16553) END, $2)') lock_table | LOCK TABLE luca.wa IN ACCESS EXCLUSIVE MODE
As you can see, there are SQL instructions to create a
log_xxxtable where changed tuples will be logged, as well as the definition of the trigger to attach to the table.
repack_triggeris a C function that accepts an SQL string (as you can see) and that will execute an insert into the
log_xxxtable so that:
- in case of an
INSERTthe new tuple will be inserted as
- in case of an
UPDATEboth the new and old tuples will be inserted as
- in case of a
DELETEthe old tuple only will be inserted as
lock_tableis used to lock the table during the initial and final steps, that is at the time the trigger is attached and when the tables are swapped.
pg_repackis surely an interesting extension to keep into the bag. In the future I’m going to spend some time using this extension to see how it performs, but I already know there are happy people using it, so I’m expecting positive results!