pgbackrest async behavior
pgbackrest is an amazing backup tool: it is rock-solid (as PostgreSQL is) and designed to work under heavy database load. One feature it has to improve the efficiency of WAL archiving is the async mode.
In “standard” mode, pgbackrest will push WAL segments to the backup machine using the classical archive_command provided by PostgreSQL. As you probably already know, PostgreSQL will wait for archive_command to complete and acknowledge the WAL transfer. It can happen that:
- the archive_command takes a very long time: while PostgreSQL will continue to work, the not-yet-transferred WALs will make pg_wal grow;
- the archive_command fails: PostgreSQL will warn you (in the logs) about this event and will try again to archive the failed WALs, forever or, better, until it succeeds.
On the other hand, when doing a restore, PostgreSQL executes the restore_command to get a new WAL segment, and this in turn results in running pgbackrest for a single WAL request.
The key concept here is probably the single WAL request, both for push and get.
pgbackrest allows for an improvement on this situation by means of asynchronous archive management, both for push and for get. The idea is to give more control to pgbackrest so that it can optimize I/O operations.
When PostgreSQL archives a WAL segment, it executes the archive_command within a loop (allow me to simplify things): when a WAL is ready, archive_command is invoked and, until it has finished, there is no chance to archive another already available WAL segment. On the other hand, when PostgreSQL needs to get a WAL in order to do a restore/recovery, it executes restore_command for every WAL segment it is expecting to replay. Therefore, if the server has to replay many WALs, it has to execute restore_command and “download” every WAL one after the other.
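The synchronous behavior described above can be sketched as a loop. This is a conceptual shell sketch, not PostgreSQL's actual code; the function archive_one is an illustrative stand-in for archive_command:

```shell
# Conceptual sketch of synchronous archiving: one segment at a time,
# and the next segment waits until the previous command has succeeded.
# "archive_one" is a stand-in for archive_command, not a real tool.
archive_one() {
    echo "archived $1"   # a real archive_command would copy the segment
}

for wal in 000000010000000000000001 000000010000000000000002; do
    # PostgreSQL blocks here until the command exits successfully;
    # on failure it would retry the very same segment, forever.
    archive_one "$wal" || exit 1
done
```

The point is the serialization: no second segment is handed over while the first invocation is still running.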
How does the asynchronous mode improve on the above?
When archiving, that means pushing, pgbackrest can decide to group several WALs in a single transfer, for instance reducing the setup/tear-down operations needed to establish a network connection with the backup machine.
When restoring, that means getting, pgbackrest can perform a pre-fetch, downloading a few WALs to the local machine and making them available immediately to the PostgreSQL server when needed.
The test environment
In this post, I will demonstrate the asynchronous usage of pgbackrest with my usual two-machine setup:
- miguel is the PostgreSQL server, running Fedora Linux with PostgreSQL 13.3;
- carmensita is the backup machine, running Fedora Linux.
pgbackrest is at version 2.34.
Asynchronous Configuration Parameters
There are a bunch of configuration parameters that can be set within the pgbackrest.conf file or specified on the command line, as usual.
The settings mainly regard the spool directory, the queues, and the enabling of the asynchronous mode.
Enabling or disabling the asynchronous mode
There is a single configuration parameter to enable the asynchronous mode: archive-async.
By default it is false, meaning pgbackrest will work “normally”, as you expect. Turning it on will automatically make any archive-get and archive-push run in asynchronous mode.
The spool directory
In order to manage the async operations, pgbackrest creates a spool directory on the PostgreSQL machine, usually /var/spool/pgbackrest, where it places an archive directory and, inside it, a directory named after the server or, better, the stanza. Such a directory is then split into in and out subdirectories, used respectively by archive-get and archive-push.
The spool directory root can be defined with the spool-path configuration parameter.
For example, given the stanza named miguel, the spool directory will be either /var/spool/pgbackrest/archive/miguel/out or /var/spool/pgbackrest/archive/miguel/in.
In the out directory, the system will write book-keeping stuff, mainly small text files used to identify the point the archiving has reached.
In the in directory, the system will store incoming WALs ready to be restored by the PostgreSQL server.
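To visualize the layout, you can recreate it locally. This is only an illustrative sketch under /tmp; the real directories are created and managed by pgbackrest itself:

```shell
# Recreate the spool layout pgbackrest uses for a stanza named "miguel",
# purely to visualize it (pgbackrest manages the real one under
# /var/spool/pgbackrest by default).
SPOOL=/tmp/pgbackrest-spool-demo
mkdir -p "$SPOOL/archive/miguel/in" "$SPOOL/archive/miguel/out"
find "$SPOOL" -type d | sort
```

The find output shows the archive directory, the stanza directory below it, and the in and out leaves used for get and push respectively.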
Queues
There are two different settings to manage the queues of pgbackrest:
- archive-push-queue-max;
- archive-get-queue-max.
They configure the maximum size of the data enqueued for the push and get operations. When the queue is full, pgbackrest will behave differently depending on the ongoing operation, as explained below.
Configuration
The backup machine, named carmensita, has an /etc/pgbackrest.conf file configured as follows:
$ cat /etc/pgbackrest.conf
[global]
start-fast = y
stop-auto = y
repo1-path = /backup/pgbackrest
repo1-retention-full=2
repo1-host-user = backup
log-level-console = info
[miguel]
pg1-host = miguel
pg1-path = /postgres/13/data
while on the PostgreSQL server machine, named miguel, the /etc/pgbackrest.conf file is:
[global]
repo1-path = /backup/pgbackrest
repo1-host-user = backup
log-level-console = info
repo1-host = carmensita
archive-async = y
archive-push-queue-max = 500MB
spool-path = /var/spool/pgbackrest
archive-get-queue-max = 32MB
Last, the archive_command on the PostgreSQL machine is configured as follows:
archive_command = '/usr/bin/pgbackrest \
--pg1-path=/postgres/13/data \
--config=/etc/pgbackrest.conf \
--stanza=miguel \
archive-push %p'
archive_mode = on
Please note that the archive-async parameter is specified in the configuration file, instead of being passed to each archive-push or archive-get invocation. This simplifies, in my opinion, the usage of pgbackrest.
With all the above up and running, it is possible to see how the asynchronous mode works.
Archiving (archive-push)
Let’s start with the backup scenario, that is archive-push.
When things go right
Let’s see what happens when everything is fine: I launched a pgbench session in order to generate some traffic and, therefore, some WAL segment generation and archiving. pgbench was running as follows:
% pgbench -c 8 -T 120 -h miguel -U pgbench -n -P 5 pgbench
While pgbench is running, let’s inspect what is happening on the PostgreSQL machine, with particular regard to the spool folder:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
&& psql -h miguel -U postgres \
-c 'select last_archived_wal from pg_stat_archiver;' postgres
0 000000070000014E00000015.ok
last_archived_wal
--------------------------
000000070000014E00000015
# # after a while
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
&& psql -h miguel -U postgres \
-c 'select last_archived_wal from pg_stat_archiver;' postgres
0 000000070000014E00000016.ok
last_archived_wal
--------------------------
000000070000014E00000016
As you can see, in the spool directory there is an empty file, with the suffix .ok, named after the last archived WAL segment, that is the last segment sent to the backup machine.
In the PostgreSQL logs there is a notice when pgbackrest completes the push (depending on the log level you configured):
INFO: pushed WAL file '000000070000014E000000AD' to the archive asynchronously
When things go wrong
First case: shutting down the backup machine
Assume the backup machine, carmensita, is turned off. The archiving cannot work, of course, and if you generate some traffic again on the PostgreSQL server (e.g., by using pgbench as shown above), the situation in the spool directory is:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
&& psql -h miguel -U postgres \
-c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
4 global.error
last_archived_wal | last_failed_wal
--------------------------|--------------------------
000000070000014E0000001A | 000000070000014E0000001B
The file global.error contains a textual description of what is happening:
# cat /var/spool/pgbackrest/archive/miguel/out/global.error
103
unable to find a valid repository:
repo1: [UnknownError] remote-0 process on 'carmensita' terminated unexpectedly [255]: ssh: connect to host carmensita port 22: No route to host
If you then restart the backup machine, so that the archiving starts working again, the situation in the spool directory is as follows:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
&& psql -h miguel -U postgres \
-c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
0 000000070000014E0000001B.ok
0 000000070000014E0000001C.ok
0 000000070000014E0000001D.ok
0 000000070000014E0000001E.ok
0 000000070000014E0000001F.ok
last_archived_wal | last_failed_wal
--------------------------|--------------------------
000000070000014E0000001F | 000000070000014E0000001B
As you can see, the .ok files are there and the archiving is working again.
Over time there can be one or more .ok files: the idea is that the last .ok file indicates the last asynchronously archived WAL segment (in the above, the one ending with 1F).
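Since WAL segment names sort lexicographically, finding the most recent marker is just a matter of sorting. A small sketch, using the sample file names from the listing above:

```shell
# The lexicographically greatest .ok marker names the most recent
# asynchronously archived WAL segment, because WAL names sort naturally.
printf '%s\n' \
    000000070000014E0000001B.ok \
    000000070000014E0000001F.ok \
    000000070000014E0000001C.ok \
    | sort | tail -n 1
# prints 000000070000014E0000001F.ok
```

The same one-liner works against the real out directory with `ls`.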
Second case: generating more WALs
Shut down the backup machine again, so that the PostgreSQL server is not able to archive WAL segments; then generate quite an amount of traffic to increase the size of the WAL directory (pg_wal).
Let’s inspect the situation:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
&& psql -h miguel -U postgres \
-c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
4 global.error
last_archived_wal | last_failed_wal
--------------------------|--------------------------
000000070000014E000000ED | 000000070000014E000000EE
# cat /var/spool/pgbackrest/archive/miguel/out/global.error
103
unable to find a valid repository:
repo1: [UnknownError] remote-0 process on 'carmensita' terminated unexpectedly [255]: ssh: connect to host carmensita port 22: No route to host
Therefore, 000000070000014E000000ED is the last WAL archived on the backup machine.
Suppose now that a larger amount of data is changed on PostgreSQL, so that it keeps generating WAL segments. Clearly PostgreSQL cannot archive segments anymore, and it will start accumulating them into pg_wal to keep them available for when archive_command starts working again.
Or does it?
Inspect again the situation on disk:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
&& psql -h miguel -U postgres \
-c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
4 000000070000014F00000053.ok
4 000000070000014F00000054.ok
4 000000070000014F00000055.ok
4 000000070000014F00000056.ok
4 000000070000014F00000057.ok
4 000000070000014F00000058.ok
4 000000070000014F00000059.ok
4 000000070000014F0000005A.ok.pgbackrest.tmp
last_archived_wal | last_failed_wal
--------------------------|--------------------------
000000070000014F00000059 | 000000070000014F00000053
# cat /var/spool/pgbackrest/archive/miguel/out/000000070000014F00000059.ok
0
dropped WAL file '000000070000014F00000059' because archive queue exceeded 500MB
First of all: last_archived_wal advanced even though the archive_command is failing (remember that the backup machine is down)! How is that possible?
The answer lies in how the pgbackrest asynchronous mode works: if the amount of failed WAL grows beyond a specified size, pgbackrest decides to acknowledge the archiving to the PostgreSQL server anyway, which in turn advances its archiving even though the WAL never reached the backup machine!
The idea is that pgbackrest prevents pg_wal from growing indefinitely, which could end up stopping PostgreSQL from working at all. However, **acknowledging a fake archiving means that the WAL stream is broken, so Point In Time Recovery will not be possible anymore across this “hole”, and a new backup is strongly recommended!**
pgbackrest records this information in its .ok files, which are now non-empty and inform the administrator that the WAL segment has been dropped explicitly.
You can find the same information in the PostgreSQL logs, where pgbackrest prints a message to make it clear:
% sudo grep 000000070000014F00000059 $PGLOG
INFO: archive-push command begin 2.34: [pg_wal/000000070000014F00000059] --archive-async --archive-push-queue-max=500MB --config=/etc/pgbackrest.conf --exec-id=40124-af251b1c --log-level-console=info --pg1-path=/postgres/13/data --repo1-host=carmensita --repo1-host-user=backup --repo1-path=/backup/pgbackrest --spool-path=/var/spool/pgbackrest --stanza=miguel
WARN: dropped WAL file '000000070000014F00000059' because archive queue exceeded 500MB
INFO: pushed WAL file '000000070000014F00000059' to the archive asynchronously
There are three pieces of information:
- pgbackrest tried to archive the WAL segment, failing;
- there is a WARN that informs you that pgbackrest instructed PostgreSQL to drop the WAL file as if it was archived correctly;
- pgbackrest states that it has archived the file, so PostgreSQL can proceed to delete or recycle it.
When does pgbackrest decide to give up and start faking to PostgreSQL? The archive-push-queue-max configuration parameter establishes how much data pgbackrest can fall behind the normal WAL operations before trying to make PostgreSQL delete segments.
In my configuration there is archive-push-queue-max = 500MB, which means that after 500MB of failed WALs pgbackrest will start faking, and there will be a hole in the WAL stream. Roughly, this corresponds to 32 failed WALs in a row.
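A quick back-of-the-envelope check, assuming the default 16MB WAL segment size:

```shell
# With archive-push-queue-max = 500MB and 16MB per WAL segment,
# 31 whole segments fit in the queue, so the limit is exceeded
# around the 32nd failed segment in a row.
queue_max_mb=500
wal_segment_mb=16
echo $(( queue_max_mb / wal_segment_mb ))   # prints 31
```

If your cluster was initialized with a non-default --wal-segsize, adjust the segment size accordingly.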
Parallel Processes
The configuration parameter process-max can be used to control how many push workers are launched to serve the asynchronous system. Suppose the configuration contains process-max = 4; then, during WAL archiving, you could see something like the following in the process list:
# pstree -c -A
systemd-|-NetworkManager-|-{NetworkManager}
...
|-pgbackrest-|-pgbackrest---ssh
| |-pgbackrest---ssh
| |-pgbackrest---ssh
| |-pgbackrest---ssh
| `-ssh
...
|-postmaster-|-postmaster
| |-postmaster
| |-postmaster
| |-postmaster
| |-postmaster
| |-postmaster---pgbackrest
| |-postmaster
| `-postmaster
...
As you can see, PostgreSQL has launched pgbackrest (that is, it is executing the archive_command), and there are four pgbackrest worker processes.
If the system is pushing archives in synchronous mode, process-max is ignored.
Every concurrent process shares an exec-id that identifies the batch to which the process belongs:
# pstree -A -c -a -l | grep pgbackrest
...
|-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
| |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
| |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
| |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
...
Restoring (archive-get)
Let’s do a restore from a recent backup:
% sudo systemctl stop postgresql-13.service
% sudo -u postgres pgbackrest --stanza miguel \
--pg1-path /postgres/13/data --delta restore
...
INFO: restore command end: completed successfully (69861ms)
During the restore, the archive directory within the pgbackrest spool directory is cleaned; in particular, the server-specific directory miguel is removed, since no WAL archiving is in progress.
The postgresql.auto.conf file contains the restore_command, which invokes archive-get, ready to fetch the WAL segments:
% sudo cat /postgres/13/data/postgresql.auto.conf
# Recovery settings generated by pgBackRest restore on 2021-07-27 05:26:26
restore_command = 'pgbackrest --pg1-path=/postgres/13/data --stanza=miguel archive-get %f "%p"'
During the system startup, pgbackrest will get (as usual) the WAL segments from the backup machine, but this time in an asynchronous way:
INFO: archive-get command begin 2.34: [000000070000014F000000A4, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=32MB --exec-id=42831-f4ada646 --log-level-console=info --pg1-path=/postgres/13/data --repo1-host=carmensita --repo1-host-user=backup --repo1-path=/backup/pgbackrest --spool-path=/var/spool/pgbackrest --stanza=miguel
INFO: found 000000070000014F000000A4 in the archive asynchronously
INFO: archive-get command end: completed successfully (713ms)
The above is an excerpt of the PostgreSQL log. In the meantime, the spool directory was populated with an in subdirectory for the server, where the incoming WALs were stored while waiting to be replayed by the PostgreSQL server:
% sudo ls -1s /var/spool/pgbackrest/archive/miguel/in
16384 000000070000014F000000A4.pgbackrest.tmp
In this scenario, the archive-get-queue-max parameter specifies the size of the pre-fetched WALs: pgbackrest will fetch and store in the spool directory no more WAL data than the specified amount. Unlike the push configuration, setting this parameter does not imply the system will throw away WALs.
Conclusions
pgbackrest is an amazing backup tool, rock solid and with a lot of configuration parameters that can help improve resource usage, so that backup and restore work fast and reliably even under heavy loads.
The asynchronous mode can help improve performance by means of batching and pre-fetching of WAL segments. However, you need to be aware that, by design, the asynchronous pushing of WALs can produce holes in the WAL stream if the archiving accumulates too much data.
This is, in my opinion, an excellent feature, because in my experience I’ve seen many times a PostgreSQL server accumulating too many WAL segments (up to consuming all the storage) due to a faulty backup machine (or networking). After all, pgbackrest ensures that a backup exists and, at the very least, that your PostgreSQL server will not stop working because of a failing archive_command.
Love it or hate it!