A glance at pg_cron to automatically schedule database tasks
pg_cron is an interesting PostgreSQL extension by Citus Data: it includes a background worker (i.e., a PostgreSQL-managed process) to execute database tasks on the server side. This is something I’ve done for years, I mean managing automated tasks with the operating system’s cron(1) and similar schedulers, but having the scheduler within the database sounds really cool, since I can keep it tied to the data itself.
An example scenario
I have one server pulling data regularly out of another server, via a foreign data wrapper. No matter how this design choice sounds to you, it works for me! In order to constantly pull data, I have set up a cron(1) task in my user crontab to execute a function that does all the business logic I need.
Therefore my crontab file looks like:
$ crontab -l
10,20,30,40,50,0 * * * * \
/usr/bin/psql -U postgres -h 127.0.0.1 \
-c "SELECT f_pull('crontab import');" mydb
Every ten minutes the task invokes the function f_pull on database mydb, specifying the label crontab import.
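The function itself is not the point of this post, but just to give an idea, here is a minimal hypothetical sketch of it, assuming a foreign table remote_events (reachable through the foreign data wrapper) and a local table local_events; the real f_pull contains my actual business logic:

CREATE OR REPLACE FUNCTION f_pull( source text )
RETURNS void
AS $BODY$
BEGIN
   -- copy over rows that are not already present locally,
   -- remembering which scheduler pulled them (the "source" label)
   INSERT INTO local_events( id, payload, pulled_by )
   SELECT r.id, r.payload, source
   FROM   remote_events r
   WHERE  NOT EXISTS ( SELECT 1
                       FROM  local_events l
                       WHERE l.id = r.id );
END
$BODY$
LANGUAGE plpgsql;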
Let’s see how this can be done using pg_cron too.
Installing pg_cron
While there are packages for the major Linux distributions, I find it quite easy to install it from the official repository with the following short commands:
$ git clone https://github.com/citusdata/pg_cron.git
$ cd pg_cron
$ export PATH=/usr/pgsql-11/bin:$PATH
$ make
$ sudo PATH=$PATH make install
In order to get pg_cron working, it must be loaded as a shared library, so you have to adjust the PostgreSQL configuration (usually postgresql.conf) as follows:
shared_preload_libraries = 'pg_cron'
cron.database_name = 'mydb'
The cron.database_name setting defines mydb as the database on which to store the pg_cron data. In fact, pg_cron will create a cron schema with a job table in there that plays the same role as your crontab file on any Unix machine.
Unfortunately, you need to restart the PostgreSQL cluster in order to apply the changes.
$ sudo service postgresql-11 restart
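Once the cluster is back up, a quick way to double check that the library has been loaded is to ask the server itself; the output should list pg_cron among the preloaded libraries:

mydb=# SHOW shared_preload_libraries;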
pg_cron runs its jobs by opening regular connections to the database, in my case as the postgres superuser. This might not be the optimal choice, so pick the user that best fits your needs. In my case I was already running the cron(1) task as the postgres user, so it seemed the quickest way to migrate from regular cron to pg_cron. Why does choosing the user in advance matter? Because pg_cron requires that user to be able to connect to the database without providing any password, so you either have to adjust pg_hba.conf properly or add a .pgpass file in the user’s home directory. Yes, even if a background worker implements the pg_cron features, the connections happen through libpq, hence the need for the user to be able to connect without providing a password. Therefore I changed pg_hba.conf as follows:
host all all 127.0.0.1/32 md5
host mydb postgres localhost trust
that is, allowing local trusted connections to mydb for postgres and leaving all other connections requiring a password (keep in mind that pg_hba.conf picks the first matching line, so place the rules accordingly).
Now, you can issue a reload and test your user connectivity. Once this is done, you can configure pg_cron.
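Before doing so, it is worth verifying that the passwordless connection really works. A minimal sketch, using the same service name and connection parameters shown above (adjust them to your installation):

$ sudo service postgresql-11 reload
$ psql -U postgres -h localhost -c 'SELECT 1;' mydb

If psql connects without asking for a password, the pg_cron background worker will be able to connect in the very same way.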
Configuring a job
Configuring pg_cron is really simple: all jobs are kept in the cron.job table, and you can either edit such table with standard SQL or use the cron.schedule() function to get an initial entry to work on later.
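Keep in mind that the cron schema, along with the cron.job table and the cron.schedule() function, becomes available only once the extension has been created in the database configured via cron.database_name:

mydb=# CREATE EXTENSION pg_cron;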
Since I was migrating a cron(1) entry, things were as simple as copying and pasting the cron(1) entry line, with dollar quoting around the command:
mydb=# SELECT cron.schedule('5,15,25,35,45,55 * * * *',
$CRON$ SELECT f_pull('pg_cron import'); $CRON$
);
pg_cron replies with the identifier of the job, in my case 1 because it is the very first job inserted in the scheduler. I can inspect it with an ordinary SELECT against the cron.job table.
mydb=# SELECT * FROM cron.job;
-[ RECORD 1 ]------------------------------------------------------------
jobid | 1
schedule | 5,15,25,35,45,55 * * * *
command | SELECT public.f_pull('pg_cron import');
nodename | localhost
nodeport | 5432
database | mydb
username | postgres
active | t
The schedule field reports the string in the exact same format as cron(1); this is due to the fact that pg_cron uses the very same parser as cron(1), making the migration from cron(1) to pg_cron really easy. By default cron.schedule() uses the current PostgreSQL instance parameters and the current username, but you can then adjust them to something else. While I haven’t tested it, this means you could execute a cron task from one PostgreSQL instance against a remote one.
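As a purely hypothetical sketch of the latter (untested as well, and the remote cluster must obviously accept the connection), pointing the job to another host could be a matter of an ordinary UPDATE against cron.job:

UPDATE cron.job
SET    nodename = 'remote.example.com'   -- hypothetical remote host
     , nodeport = 5432
WHERE  jobid = 1;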
And that’s all!
Now you can sit down and check your cron jobs.
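Should you later want to remove the job, for example to migrate back to plain cron(1), cron.unschedule() does the opposite of cron.schedule(), taking the job identifier as its argument:

mydb=# SELECT cron.unschedule( 1 );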
pg_cron logging
Things never work the first time! In case you need to inspect what is going on, consider that pg_cron logs at the LOG level and provides a statement for job begin and end. A successfully executed job prints log statements such as:
LOG: cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG: cron job 1 completed: 1 row
while a job that cannot connect to the database logs something like:
LOG: cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG: cron job 1 connection failed
Luckily, in my case the very same function was already running as a cron(1) job, so the user was already granted everything needed to do its job.
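If you want to quickly skim through the scheduler activity, grepping the server log is enough; the log location below is just an assumption based on a standard PGDG PostgreSQL 11 installation, adjust it to yours:

$ sudo grep 'cron job' /var/lib/pgsql/11/data/log/postgresql-*.log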
Conclusions
pg_cron is an awesome tool to keep in your toolbag because it makes it really easy to migrate from cron(1) to pg_cron (and back!). Moreover, being an extension, it makes all the scheduling configuration available within the database, and since cron.job is added to backups by the extension installation script, you will get scheduler backups for free!