A glance at pg_cron to automatically schedule database tasks

pg_cron is an interesting PostgreSQL extension by Citus Data: it includes a background worker (i.e., a PostgreSQL managed process) to execute database tasks on the server side. This is something I have done for years, managing automated tasks with the operating-system-wide cron(1) and similar schedulers, but having the scheduler within the database sounds really cool, since I can keep it tied to the data itself.
An example scenario

I have one server pulling data regularly out of another server, via a foreign data wrapper. No matter how this design choice sounds to you, it works for me!
In order to constantly pull data, I have set up a cron(1) task in my user crontab to execute a function that does all the business logic I need. Therefore my crontab file looks like:

$ crontab -l
10,20,30,40,50,0 * * * * \
  /usr/bin/psql -U postgres -h 127.0.0.1 \
  -c "SELECT f_pull('crontab import');" mydb

So I am executing the function f_pull against the database mydb, specifying the label crontab import. Let's see how this can be done using pg_cron.
While there are packages for the major Linux distributions, I find it quite easy to install it from the official repository with the following short commands:

$ git clone https://github.com/citusdata/pg_cron.git
$ cd pg_cron
$ export PATH=/usr/pgsql-11/bin:$PATH
$ make
$ sudo PATH=$PATH make install

and, in case it matters, I am using a CentOS 7 Linux here. Now, in order to make pg_cron work, it must be loaded as a shared library, so you have to adjust the PostgreSQL configuration (usually postgresql.conf) as follows:
shared_preload_libraries = 'pg_cron'
cron.database_name = 'mydb'

Here I use mydb as the database in which to store the pg_cron data. In fact, pg_cron will create a cron schema with a job table in there, which serves the same purpose as your crontab file on any Unix machine. Unluckily, you need to restart the PostgreSQL cluster in order to apply the changes.
$ sudo service postgresql-11 restart

It is now time to decide who will execute the cron jobs, and in my case it is the
postgres superuser. This might not be the optimal choice, so choose the user that best fits your needs. In my case I was already using the postgres user, so it seemed the right and fastest way to migrate from regular cron to pg_cron. Why does it matter to choose the user in advance? Because pg_cron requires that user to be able to connect to the database without providing any password, so you should either adjust pg_hba.conf properly or add a .pgpass file in the user's home directory. Yes, even if a background worker is used to implement the pg_cron features, the connection happens through libpq, hence the need for the user to be allowed to connect without providing a password. Therefore I changed my pg_hba.conf, adding a specific line for postgres before the general rule (in pg_hba.conf the first matching line wins) and leaving all other connections requiring a password:

host mydb postgres localhost    trust
host all  all      127.0.0.1/32 md5

Now you can issue a reload and test your user connectivity. Once this is done, you can configure your first job.
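One detail worth double checking: depending on the pg_cron version and packaging, the extension objects (the cron schema and its tables) may need to be created explicitly in the configured database before you can schedule anything. A minimal sketch, assuming the mydb database from above:

```sql
-- run once, as a superuser, in the database named by cron.database_name;
-- this creates the cron schema and the cron.job table
CREATE EXTENSION pg_cron;
```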
Configuring a job

Configuring pg_cron is really simple: all jobs are kept in the cron.job table, and you can either edit that table with standard SQL or use the cron.schedule() function to get an initial entry to work on later. Since I was migrating a cron(1) entry, things were as simple as copying and pasting the cron(1) entry line with dollar quoting:
mydb=# SELECT cron.schedule('5,15,25,35,45,55 * * * *', $CRON$ SELECT f_pull('pg_cron import'); $CRON$ );
pg_cron replies with the identifier of the job, in my case 1 because it is the very first job inserted into the scheduler. I can inspect it with an ordinary SELECT:

mydb=# SELECT * FROM cron.job;
-[ RECORD 1 ]------------------------------------------------------------
jobid    | 1
schedule | 5,15,25,35,45,55 * * * *
command  | SELECT public.f_pull('pg_cron import');
nodename | localhost
nodeport | 5432
database | mydb
username | postgres
active   | t

All the fields are perfectly understandable; please note that the schedule field reports the string in the exact same format as cron(1). This is due to the fact that pg_cron uses the very same parser as cron(1), making the migration to pg_cron really easy. By default, cron.schedule() uses the current PostgreSQL instance parameters and the current username, but you can then adjust them to something else. While I have not tested it, this means you could execute a cron task from one PostgreSQL instance against a remote one.
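For symmetry with cron.schedule(), a job can also be suspended or removed later on, either by editing the cron.job table with standard SQL or through the cron.unschedule() function. A quick sketch, assuming the job identifier 1 from above:

```sql
-- temporarily disable job 1 without deleting its definition
UPDATE cron.job SET active = false WHERE jobid = 1;

-- or drop it from the scheduler entirely
SELECT cron.unschedule(1);
```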
And that's all!

Now you can sit down and check your cron jobs.

Things never work the first time! In case you need to inspect what is going on, consider that pg_cron logs at the LOG level and emits a statement for job begin and end. A successfully executed job prints log lines such as:

LOG: cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG: cron job 1 completed: 1 row

while a failing job prints lines such as:

LOG: cron job 1 starting: SELECT public.f_pull('pg_cron import');
LOG: cron job 1 connection failed

Often problems arise from connection permissions or grants, so double check that your cron user is really able to do what you are expecting it to do. In my case it was really simple, because I was already using the same user as the regular cron(1) job, so it was already granted everything it needed.
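As a side note, more recent pg_cron releases can also record every execution in a dedicated audit table, which is often handier than scanning the server log. A hedged sketch, assuming a version that ships the cron.job_run_details table:

```sql
-- last few runs of job 1: outcome, message and timing
SELECT status, return_message, start_time, end_time
  FROM cron.job_run_details
 WHERE jobid = 1
 ORDER BY start_time DESC
 LIMIT 5;
```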
pg_cron is an awesome tool to keep in your toolbag, because it makes it really easy to migrate from cron(1) to pg_cron (and back!). Moreover, being an extension, it keeps all the schedule configuration available within the database, and since cron.job is registered for backup by the extension installation script, you get scheduler backups for free!