pgbackrest async behavior
pgbackrest is an amazing backup tool, it is rock-solid (as PostgreSQL is) and designed to work under heavy database load.One feature it has to improve efficienty of WAL archiving is the async mode.
In “standard” mode,
pgbackrest will push WAL segments to the backup machine, using the classical archive_command provided by PostgreSQL. As you probably already know, PostgreSQL will wait for archive_command to complete and acknowledge the WAL transfert. It could happen that:
- the 
archive_commandcould take a very long time, and while PostgreSQL will continue to work, not yet transferred WALs will makepg_walto grow; - the 
archive_commandcould fail, and PostgreSQL will warn you (in the logs) about this event and will try again to archive the failed WALs (forever, or better, unless it succeed). 
On the other hand, when doing a restore, PostgreSQL executes the
restore_command to get a new WAL segment, and this in turn results in running pgbackrest for a single WAL request.
The key concept here is probably single WAL request, both for push and get.
pgbackrest allows for an improvement on this situation by means of asynchronous archive management, both push and get. The idea is to give more control to pgbackrest so that it can optimize I/O operations.
When PostgreSQL archives a WAL segment, it executes the
archive_command within a loop (allow me to simplify things): when a WAL is ready, archive_command is invoked and until it has finished, there is no chance to archive an already available WAL segment. On the other hand, when PostgreSQL needs to get a WAL in order to do a restore/recovery, it executes restore_command on every WAL segment it is expecting to replay. Therefore, if the server has to replay many WALs, it has to execute restore_command and “download” every WAL one after the other.
How does the asynchronous mode improve on the above?
When archiving, that means pushing,
pgbackrest can decide to group several WALs in a single transfert, that means for instance to reduce the setup/tear-down operations for establishing a network connection with the backup machine.
When restoring, that means getting,
pgbackrest could perform a pre-fetch, downloading a few WALs on the local machine and make them available immediatly to the PostgreSQL server when needed.
The test environment
In this post, I will demonstrate the usage ofpgbackrest asynchronously using my usual two-machine setup:
miguelis the PostgreSQL server, running Fedora Linux with PostgreSQL 13.3;carmensitais the backup machine, running Fedora Linux.
pgbackrest is at version 2.34.
Asynchronous Configuration Parameters
There are a bunch of configuration parameters that can be configured within thepgbackrest.conf file or specified on the command line, as usual.
The settings mainly regard the spool directory, the queues and the enabling of the asynchronous mode.
Enabling or disabling the asynchronous mode
There is a single configuration parameter to enable the asynchronous mode:async.
By default this is false, meaning pgbackrest will work “normally” as you expect. Turning it on, will automatically make any archive-get and archive-push in asynchronous mode.
The spool directory
In order to manage the async operations,pgbackrest creates on the PostgreSQL machine a spool directory, usually /var/spool/pgbackrest where it places an archive directory and a directory named after the server, or better, the stanza. Such directory could then be split into in or out for respectively archive-get and archive-push.
The spool directory root can be defined with the
spool-path configuration parameter.
For example, given the stanza named
miguel, the spool directory will either be /var/spool/pgbackrest/archive/miguel/out or /var/spool/pgbackrest/archive/miguel/in.
In the
out directory the system will write book-keeping stuff, mainly small text files that will be used to identify at which point the archiving has arrived.
In the
in directory, the system will store incoming WALs ready to be restored from the PostgreSQL server.
Queues
There are two different setting to manage the queues ofpgbackrest:
archive-push-max-queue;archive-get-max-queue.
They configure the max size of the data enqueued for the push and get operations. When the queue is full,
pgbackrest will behave differently depending on the operation that is ongoing, as explained below.
Configuration
The backup machine, namedcarmensita has a 7etc/pgbackrest.conf file configured as follows:
$ cat /etc/pgbackrest.conf
[global]
start-fast = y
stop-auto  = y
repo1-path = /backup/pgbackrest
repo1-retention-full=2
repo1-host-user = backup
log-level-console = info
[miguel]
pg1-host = miguel
pg1-path = /postgres/13/data
while on the PostgreSQL server machine, named
miguel the /etc/pgbackrest.conf file is
[global]
repo1-path = /backup/pgbackrest
repo1-host-user = backup
log-level-console = info
repo1-host = carmensita
archive-async          = y
archive-push-queue-max = 500MB
spool-path             = /var/spool/pgbackrest
archive-get-queue-max  = 32MB
Last, the
archive_command on the PostgreSQL machine is configured as follows:
archive_command = '/usr/bin/pgbackrest \
                    --pg1-path=/postgres/13/data \
                    --config=/etc/pgbackrest.conf \
                    --stanza=miguel \
                    archive-push %p'
archive_mode = on
Please note that the
archive-async parameter is specified in the configuration, instead of setting it in the archive-push or archive-get. This simplifies, in my opinion, the usage of pgbackrest.
With all the above up and running, it is possible to see how the asynchronous mode works.
Archiving (archive-push)
Let’s start with the backup scenario, that is archive-push.
When things go right
Let’s see what happens when everything is fine: I launched apgbench session in order to generate some traffic and, therefore, some WAL segment generation and archiving.
On one hand, pgbench was running as follows:
% pgbench -c 8 -T 120 -h miguel -U pgbench -n -P 5 pgbench
While
pgbench is running, let’s inspect what is happening on the PostgreSQL machine, with particular regard to the spooling folder:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
      && psql -h miguel -U postgres \
         -c 'select last_archived_wal from pg_stat_archiver;' postgres
0 000000070000014E00000015.ok
last_archived_wal     
--------------------------
 000000070000014E00000015
# # after a while
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
        && psql -h miguel -U postgres \
           -c 'select last_archived_wal from pg_stat_archiver;' postgres
0 000000070000014E00000016.ok
last_archived_wal     
--------------------------
 000000070000014E00000016
As you can see, in the spool directory there will be an empty file named after the last archived WAL segment, that is the last segment sent to the backup machine, and the suffix
.ok.
In the PostgreSQL logs, there will be a notice when the
pgbackrest completes the pushing (depending on the log level you configured):
INFO: pushed WAL file '000000070000014E000000AD' to the archive asynchronously
When things go wrong
First case: shutting down the backup machine
Assume the backup machine,carmensita, is turned off. The archiving cannot work, of course, and if you generate again some traffic on the PostgreSQL server (e.g., by using pgbench as shown above), the situation on the spool directory is:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
    && psql -h miguel -U postgres \
       -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
4 global.error
    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014E0000001A | 000000070000014E0000001B
The file
global.error contains a textual description of what is happening:
# cat /var/spool/pgbackrest/archive/miguel/out/global.error 
103
unable to find a valid repository:
repo1: [UnknownError] remote-0 process on 'carmensita' terminated unexpectedly [255]: ssh: connect to host carmensita port 22: No route to hos
If you then restart the backup machine, so that the archiving starts working again, the situation on the spool directory is as follows:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
    && psql -h miguel -U postgres \
       -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
0 000000070000014E0000001B.ok
0 000000070000014E0000001C.ok
0 000000070000014E0000001D.ok
0 000000070000014E0000001E.ok
0 000000070000014E0000001F.ok
    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014E0000001F | 000000070000014E0000001B
As you can see, the
.ok files are there and the archiving is working again.
During the time, there could be one or more
.ok files. The idea is that the last .ok file indicates the last asynchronously archived WAL segment (in the above, the one ending with 1F).
Second case: generating more WALs
Shutdown the backup machine again, so that the PostgreSQL server is not able to archive WAL segments; then generate quite an amount of traffic to increase the WAL directory size (pg_wal).
Let’s inspect the situation:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
     && psql -h miguel -U postgres \
        -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
4 global.error
    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014E000000ED | 000000070000014E000000EE
  
 # cat /var/spool/pgbackrest/archive/miguel/out/global.error 
103
unable to find a valid repository:
repo1: [UnknownError] remote-0 process on 'carmensita' terminated unexpectedly [255]: ssh: connect to host carmensita port 22: No route to host
Therefore, the
14E0xED is the last archived WAL on the backup machine.
Suppose now a larger amount of data is mangled on PostgreSQL, so that it starts generating WAL segments. Clearly PostgreSQL cannot archive segments anymore, and will start accumulating them into
pg_wal to keep them available for when the archive_command will start to work again.
Or does it?
Inspect again the situation on disk:
# ls -1s /var/spool/pgbackrest/archive/miguel/out \
   && psql -h miguel -U postgres      \
     -c 'select last_archived_wal, last_failed_wal from pg_stat_archiver;' postgres
4 000000070000014F00000053.ok
4 000000070000014F00000054.ok
4 000000070000014F00000055.ok
4 000000070000014F00000056.ok
4 000000070000014F00000057.ok
4 000000070000014F00000058.ok
4 000000070000014F00000059.ok
4 000000070000014F0000005A.ok.pgbackrest.tmp
    last_archived_wal     |     last_failed_wal      
--------------------------|--------------------------
 000000070000014F00000059 | 000000070000014F00000053
# cat /var/spool/pgbackrest/archive/miguel/out/000000070000014F00000059.ok
0
dropped WAL file '000000070000014F00000059' because archive queue exceeded 500MB
  
First of all:
last_archived_wal advanced even if the acrhive_command is failing (remember that the backup machine is down)! How is that possible?
The answer is in how
pgbackrest asynchronous works: if the number of failed WALs is greater than a specified size, pgbackrest decides to ackwnloedge the archiving to the PostgreSQL server, that in turn advances in archiving even if the archived WAL did not hit the backup machine!
The idea is that
pgbackrest will prevent the pg_wal to grow undefinetly, thus risking to stop PostgreSQL to work at all. However, ** acknowledging a fake archiving means that the WAL-stream is broken, so Point In Time Recovery will not be possible anymore around this “hole” and a new backup is strongly recommended!** 
pgbackrest inserts an information into its .ok files, that now are non-empty and inform the administrator that the WAL segment has been dropped explicitly.
You can find the same information into the PostgreSQL logs, where
pgbackrest prints a message to make it clear:
% sudo grep 000000070000014F00000059 $PGLOG
INFO: archive-push command begin 2.34: [pg_wal/000000070000014F00000059] --archive-async --archive-push-queue-max=500MB --config=/etc/pgbackrest.conf --exec-id=40124-af251b1c --log-level-console=info --pg1-path=/postgres/13/data --repo1-host=carmensita --repo1-host-user=backup --repo1-path=/backup/pgbackrest --spool-path=/var/spool/pgbackrest --stanza=miguel
WARN: dropped WAL file '000000070000014F00000059' because archive queue exceeded 500MB
INFO: pushed WAL file '000000070000014F00000059' to the archive asynchronously
There are three pieces of information:
pgbackresttried to archive the WAL segment, failing;- there is a 
WARNthat informs you thatpgbackrestinstrumented PostgreSQL to drop the WAL file as if it was archived correctly; pgbackreststates that it has archived the file, so PostgreSQL can proceed to delete or recycle it.
What and when does
pgbackrest decides to give up and starts faking to PostgreSQL? The acrhive-push-queue-max configuration paramater establish how many data pgbackrest can fail behind the normal WAL operations before trying to make PostgreSQL delete segments.
In my configuration, there is
archive-push-queue-max=500MB, that means that after 500MB of failed WALs, pgbackrest will start faking and there will be a hole into the WAL stream. Roughly, this corresponds to 32 failed WALs on a row.
Parallel Processes
The configuration parameterprocess-max can be used to control how many push workers can be launched to serve the asynchronous system. Suppose that in the configuration there is process-max = 4, then during WAL archiving you could see something as follows in the process list:
# pstree -c  -A
systemd-|-NetworkManager-|-{NetworkManager}
       ...                                                                                        
        |-pgbackrest-|-pgbackrest---ssh
        |            |-pgbackrest---ssh
        |            |-pgbackrest---ssh
        |            |-pgbackrest---ssh
        |            `-ssh
       ...                                        
        |-postmaster-|-postmaster
        |            |-postmaster
        |            |-postmaster
        |            |-postmaster
        |            |-postmaster
        |            |-postmaster---pgbackrest
        |            |-postmaster
        |            `-postmaster
      ...
As you can see, PostgreSQL has launched
pgbackrest (that is, is executing the archive_command), and there are four pgbackrest processes.
If the system is pushing archives in synchronous mode,
process-max is ignored.
Every concurrent process will share an
exec-id that identifies the batch to which the process belongs:
# pstree -A -c -a -l | grep pgbackrest
...
  |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
  |   |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
  |   |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
  |   |-pgbackrest --config=/etc/pgbackrest.conf --exec-id=46475-10e060a1
...
Restoring (archive-get)
Let’s do a restore from a recent backup:
% sudo systemctl stop postgresql-13.service
% sudo -u postgres pgbackrest --stanza miguel \
       --pg1-path /postgres/13/data --delta restore
       ...
       INFO: restore command end: completed successfully (69861ms)
During the restore, the
archive directory within the spool directory of pgbackrest is cleaned, in particular the specific server directory miguel is removed, since no WAL archiving is in progress.
The
postgresql.auto.conf file contains the archive-get command ready to fetch the WAL segments:
% sudo cat /postgres/13/data/postgresql.auto.conf
# Recovery settings generated by pgBackRest restore on 2021-07-27 05:26:26
restore_command = 'pgbackrest --pg1-path=/postgres/13/data --stanza=miguel archive-get %f "%p"'
During the system startup,
pgbackrest will get (as usual) WAL segments from the backup machine, but this time in an asynchronous way:
INFO: archive-get command begin 2.34: [000000070000014F000000A4, pg_wal/RECOVERYXLOG] --archive-async --archive-get-queue-max=32MB --exec-id=42831-f4ada646 --log-level-console=info --pg1-path=/postgres/13/data --repo1-host=carmensita --repo1-host-user=backup --repo1-path=/backup/pgbackrest --spool-path=/var/spool/pgbackrest --stanza=miguel
INFO: found 000000070000014F000000A4 in the archive asynchronously
INFO: archive-get command end: completed successfully (713ms)
The above is an excerpt of the PostgreSQL log. In the meantime, the spool directory was populated with a
in subdirectory for the server, and in such directory the incoming WALs were stored waiting to be replayed by the PostgreSQL server:
% sudo ls -1s /var/spool/pgbackrest/archive/miguel/in
16384 000000070000014F000000A4.pgbackrest.tmp
In this scenario, the
archive-get-queue-max parameter can specify the size of pre-fetched WALs: pgbackrest will fetch and store in the spooling directory no more WAL segments than the specified amount. Unlike the push configuration, setting this parameter does not imply the system will throw away WALs.
Conclusions
pgbackrest is an amazing backup tool, rock solid and with a lot of configuration parameters that can help improving the resource usage so that the backup and restore work fast and reliably even under heavy loads.
The asynchronous mode can help improving performances by means of batches and pre-fetching of WAL segments. However, you need to be aware about the fact that, by design, asynchronous pushing of WALs could produce holes in the WAL stream if the archiving accumulates too much data.
This is, in my opinion, an excellent feature, because in my experience I’ve seen many times a PostgreSQL server accumulating too much WAL segments (up to consuming all the storage) due to a faulty backup machine (or networking). After all,
pgbackrest is ensuring you that a backup exists, and at least that your PostgreSQL server will not go read-only due to archive_command failing.
Love it or hate it!