yum upgrade postgresql11 panic!How hard could it be to upgrade PostgreSQL within minor versions?
Usually it is very simple, and it is very simple but not when you don’t know your tools!
And in this case that’s my fault.
However, I’m writing this short note in order to avoid other people experience the same problem I had.
The current setupThe machine is a CentOS 7 running PostgreSQL 11.1 installed by packages provided by the PostgreSQL Global Development Group.
Preparing to upgradeOf course, I took a full backup before proceeding, just in case. The cluster I’m talking about is a low traffice cluster with roughly ~12 GB~ of data, that is the backup and restore are not a zero downtime (and no, I’m not in the position of having a WAL based backup, but that’s another story).
Having a backup helps keeping the amount of panic at a fair level.
Performing the upgradeI do like
yum(8)and its transactional approach. Doing the upgrade was a matter of:
and all dependencies are, of course, calculated and applied. Then I confirmed, waited a couple of minutes for the upgrade to apply, and I started keeping my breath:
% sudo yum upgrade postgresql11
psql: could not connect to server: Connection refused Is the server running on host "xxx" (192.168.222.123) and accepting TCP/IP connections on port 5432?
Inspecting and solving the problemApparently PostgreSQL has not been restarted after the upgrade, but what is worst is that is not going to restart again:
What the hell! (I’m allowed to spell it loud because my colleague was on vacation and I was alone in my office).
10:33:25 lnx168 systemd: Starting PostgreSQL 11 database server... 10:33:25 lnx168 postgresql-11-check-db-dir: "/var/lib/pgsql/11/data/" is missing or empty. 10:33:25 lnx168 postgresql-11-check-db-dir: Use "/usr/pgsql-11/bin/postgresql-11-setup initdb" to initialize the database cluster. 10:33:25 lnx168 postgresql-11-check-db-dir: See /usr/share/doc/postgresql11-11.4/README.rpm-dist for more information. 10:33:25 lnx168 systemd: postgresql-11.service: control process exited, code=exited status=1 10:33:25 lnx168 systemd: Failed to start PostgreSQL 11 database server.
First of all, do not run
initdbas suggested because chances are you will destroy all your data. But that’s a good hint about the problem: systemd was trying to launch PostgreSQL with an empty PGDATA.
Of course, the
PGDATAwas not empty and was still in place, but
systemdconfiguration for PostgreSQL to the CentOS default, therefore my file
/usr/lib/systemd/system/postgresql-11.servicewas overriden without any advice!
And in fact, to confirm the above, I was able to start the server manually using
pg_ctl, and at least I had the server running.
Now that the server is running, I have more time to inspect
/usr/lib/systemd/system/postgresql-11.serviceand adjust the
PGDATAparameter to the right value:
I also double checked that the
% sudo grep PGDATA /usr/lib/systemd/system/postgresql-11.service Environment=PGDATA=/data/pgdata
systemdstartup script correctly links to the edited file:
Seems fine, right?
$ ls -l /etc/systemd/system/multi-user.target.wants/postgresql-11.service lrwxrwxrwx 1 root root 45 20 dic 2018 /etc/systemd/system/multi-user.target.wants/postgresql-11.service -> /usr/lib/systemd/system/postgresql-11.service
Nested problemsNo matter how fine the setup was,
systemdstill refused to restart the cluster:
For a reason I don’t really know, it seems that
$ sudo service postgresql-11 restart Redirecting to /bin/systemctl restart postgresql-11.service Job for postgresql-11.service failed because the control process exited with error code. See "systemctl status postgresql-11.service" and "journalctl -xe" for details.
systemdkeeps track that it hasn’t started the service, and that the latter is in failed mode. The solution was to manually stop the cluster via
pg_ctland that asks
systemdto start it again, and this time it gets running.
Fixing the problem with `systemd**: the right approach
updated on 2019-07-22
As pointed out by Andrew Gierth in a comment, editing the
systemdunit service file is not the right approach to configure services. Here it is the right approach, so that my changes do not get overwritten by
systemd: 1) run
systemctl edit postgresql-11; 2) add a line with
3) inspect the service with
systemctl status postgresql-11, that will show the following:
The important part in the above is the Drop-In line that points to a freshly created directory
$ systemctl status postgresql-11 ● postgresql-11.service - PostgreSQL 11 database server Loaded: loaded (/usr/lib/systemd/system/postgresql-11.service; enabled; vendor preset: disabled) Drop-In: /etc/systemd/system/postgresql-11.service.d └─override.conf Active: active (running) since lun 2019-07-22 15:43:50 CEST; 31s ago Docs: https://www.postgresql.org/docs/11/static/ Main PID: 16114 (postmaster) CGroup: /system.slice/postgresql-11.service ├─16114 /usr/pgsql-11/bin/postmaster -D /postgres/data ├─16116 postgres: logger ├─16118 postgres: checkpointer ├─16119 postgres: background writer ├─16120 postgres: walwriter ├─16121 postgres: autovacuum launcher ├─16122 postgres: stats collector ├─16123 postgres: pg_cron scheduler └─16124 postgres: logical replication launcher
/etc/systemd/system/postgresql-11.service.dwith a single file,
override.confthat contains the new
PGDATAdefinition. In other words,
systemdkeeps the service units under its own control, and you have to create an
override.conffile to place other variable values.
ConclusionsNot knowing your tools,
systemdin this case, can lead to panic when they do not behave as you expect. Unluckily, there are too many little details to know about every different system, and I wish
systemdbecomes a little less rude and at least warns the user that his files are going to be overriden.
While the unit file states, in its beginning, to not modify the file, it is not clear what is the best approach to use to re-define variables (include or override file?).