A glance at pg_cron
to automatically schedule database tasks
pg_cron
is an interesting PostgreSQL extension by Citus Data: it does include a background worker (i.e., a PostgreSQL managed process) to execute database tasks on the server side. This is something I’ve done for years, I mean managing automated tasks using operating system wide cron(1)
and schedulers alike, but having the scheduler within the database sounds really cool, since I can keep it tied to the data itself.
An example scenario
I’ve one server pulling data regularly out of another server, via a foreign data wrapper. No matter how this design choice sounds to you, it works for me!In order to constantly pull data, I have set up a
cron(1)
task in my user crontab to execute a function that does all the business logic I need.
Therefore my crontab file looks like:
$ crontab -l
10,20,30,40,50,0 * * * * \
/usr/bin/psql -U postgres -h 127.0.0.1 \
-c "SELECT f_pull('crontab import');" mydb
f_pull
on database mydb
specifying a label crontab import
.
Let’s see how this can be done using pg_cron
too.
Installing pg_cron
While there are some packages for major Linux distributions, I find it quite easily to install it from the official repository with the following short commands:
$ git clone https://github.com/citusdata/pg_cron.git
$ cd pg_cron
$ export PATH=/usr/pgsql-11/bin:$PATH
$ make
$ sudo PATH=$PATH make install
pg_cron
working it must be loaded as a shared library, so you have to adjust the PostgreSQL configuration (usually ~postgresql.conf~) as follows:
shared_preload_libraries = 'pg_cron'
cron.database_name = 'mydb'
mydb
as the database on which store the pg_cron
data. In fact, pg_cron
will create a cron
schema with a table job
in there that will do the same as your crontab file on any Unix machine.
Unluckily, you need to restart the PostgreSQL cluster in order to apply changes.
$ sudo service postgresql-11 restart
postgres
superuser. This could be not the optimal choice, so choose the user that fits the need for you. In my case I was already using cron(1)
with postgres
user, so it sounded to me the right and faster way to migrate from regular cron to pg_cron
. Why does it matter choosing the user in advance? Because pg_cron
requires such user to be able to connect to the database without providing any password, so you either should adjust the pg_hba.conf
properly or add a .pgpass
in the home the user. Yes, even if a background worker is used to implement the pg_cron
features, the connection happens thru libpq
, so the need for the user to be granted to connect withou providing a password. Therefore I changed the pg_hba.conf
as follows:
host all all 127.0.0.1/32 md5
host mydb postgres localhost trust
postgres
(line ordering does not matter) and leaving all other connections requiring a password.
Now, you can issue a reload and test your user connectivity. Once this is done, you can configure pg_cron
.
Configuring a job
Configuringpg_cron
is really simple: all jobs are kept in the cron.job
table and you can either edit such table with standard SQL or use the cron.schedule()
function to get an initial entry to work later on.
Since I was migrating a cron(1)
entry, things were as simple as copy and paste the cron(1)
entry line with dollar quoting:
mydb=# SELECT cron.schedule('5,15,25,35,45,55 * * * *',
$CRON$ SELECT f_pull('pg_cron import'); $CRON$
);
pg_cron
replies with the identifier of the job, in my case 1
because it is the very first job inserted in the scheduler. I can inspect it with an ordinary SELECT
against the cron.job
table.
mydb=# SELECT * FROM cron.job;
-[ RECORD 1 ]------------------------------------------------------------
jobid | 1
schedule | 5,15,25,35,45,55 * * * *
command | SELECT public.f_pull('pg_cron import');
nodename | localhost
nodeport | 5432
database | mydb
username | postgres
active | t
schedule
field reports the string in the exact same format of cron(1)
; this is due to the fact the pg_cron
uses the very same parser as cron(1)
, making migration from cron(1)
to pg_cron
really easy. By feault cron.schedule()
uses the current PostgreSQL instance parameters and the current username, but you can than adjust them to something else. While I haven’t tested it, this means you could execute cron task from one PostgreSQL into a remote one.
And that’s all!
Now you can sit down and check your cron jobs.
pg_cron
logging
Things never works the first time! In the case you need inspection, consider that pg_cron
logs at the LOG
level and provides a statement for job begin and end. A succesfully executed job prints log statements as
LOG: cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG: cron job 1 completed: 1 row
LOG: cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG: cron job 1 connection failed
cron(1)
job, so the user was already granted to do its job.
Conclusions
pg_cron
is an awesome tool to keep in your toolbag because it makes really easy to migrate from cron(1)
to pg_cron
(and back!). Moreover, being an extension, it makes all schedule configuration available within the database, and since cron.job
is added to the backup from the extension installation instruction, this means you will get scheduler backups for free!