pgBackRest 2.33: multiple repositories (and more)
A few weeks ago a new release of pgbackrest, the 2.33 has been released. This release improves a lot of things, in particular two of them caught my attention:- multi repository support;
 - custom configuration path.
 
The former allows
pgbackrest to perform a multiple backup scattared over different repositories, in other words it allows the backup to be mirrored across different storages.
The second improvement fixes a few annoyances with non-Linux operating systems, such as FreeBSD.
In the following I give a glance at both this improvements, in no specific order.
Custom configuration path
FreeBSD and, most in general, non-Linux machines use different default configuration paths. For example, what is commonly used as/etc on Linux is usually /usr/local/etc. In previous releases, there was room for using the --prefix option during the configure phase, but this was tedious because there was the need to specify the path to non standard files manually before invoking the command.
In other words:
archive_command = '/usr/local/bin/pgbackrest --pg1-path=/postgres/12/data \
                       --config=/usr/local/etc/pgbackrest.conf \
                       --stanza=miguel  archive-push %p'
archive_mode = on
The important part to note in the above snippet, is that on FreeBSD if you wanted to use the standard (from an operating system point of view) path for the configuration,
pgbackrest did not have any clue about and would try to look up the configuration file as /etc/pgbackrest.conf. The solution was, of course, to specify the --config option with the appropriate file.
Things have changed in version 2.33, since the
configure command now can instrument the pgbackrest binary to find out the correct configuration file:
% ./configure --help
 ...
 --with-configdir=DIR    default configuration path
 ...
**The default configuration path remains
/etc/pgbackrest.conf ** but it is now possible to specify a default configuration file path at compile time, so that you don’t have to repeat yourself with --config at every invocation.
Multi Repository Support
This is a much more important improvement, at least in my opinion.pgbackrest has been designed with this feature in mind, but until now there was not support for multiple repositories.
Thanks to multiple repositories you can now scatter or even mirror your backups across different storage systems, so for example you can have a local repository and a remote one (e.g., in one of the supported cloud storages), or you can mount different storages and have the backup to be mirrored across all of them.
The advantage of this solution is that it provides a better redundancy in the case your single-point-of-failure backup storage dies.
One thing to take into account when working with multiple repositories is that a few
pgbackrest commands now require a repository specification other than the stanza. The rule of thumb is that whenever pgbackrest is able to find out which repository to use, it will do, and this applies to the case when a single repository is configured. In other words, backward compatibility is safe!
In the following, there will be two configured repositories on the same backup machine. While this is a very bad idea, because it emphasizes a single point of failure, it allows for a quick run on multiple repository setup. The
carmensita machine will handle two different local repositories:
/backup/pgbackrestis the main repository;/backup/pgbackrest-mirroris the secondary repository, attached to a different storage.
In the beginning there was only repo1
With pgbackrest prior to version 2.33, you could not configure multiple repositories: the configuration did accept a repo1 set of variables but it was unable to handle repositories with a specification different from 1. As an example, consider the following configuration:
[global]
start-fast = y
stop-auto  = y
repo1-path = /backup/pgbackrest
repo1-retention-full=2
repo1-retention-archive=5
repo2-path = /backup/pgbackrest-mirror
repo2-retention-full = 1
Such a configuration produces an error even in version 2.32:
$ pgbackrest --stanza miguel stanza-create
ERROR: [032]: only repo1 may be configured
Multiple Repositories
I have to confess that setting uppgbackrest for different repositories on the same machine was not as simple as I initially thought, but once again thanks to very professional community behind this great product  I was able to fix my setup:
[global]
start-fast = y
stop-auto  = y
repo1-path = /backup/pgbackrest
repo1-retention-full=2
repo1-retention-archive=5
repo2-path = /backup/pgbackrest-mirror
repo2-retention-full = 1
log-level-console = info
[miguel]
pg1-host = miguel
pg1-path = /postgres/12/data
pg1-host-user = postgres
while on the target machine the main configuration parameters are (
/usr/local/etc/pgbackrest.conf):
[global]
repo1-path = /backup/pgbackrest
repo1-host-user = backup
repo1-host = carmensita
repo2-host = sheriff
repo2-host-user = backup
repo2-path = /backup/pgbackrest-mirror
Creating a stanza
As you can imagine, thestanza-create command creates the stanza in all the repositories automatically:
$ pgbackrest --stanza miguel stanza-create
P00   INFO: stanza-create for stanza 'miguel' on repo1
P00   INFO: stanza-create for stanza 'miguel' on repo2
P00   INFO: stanza-create command end: completed successfully (1017ms)
Executing a backup
It is now time to execute a backup and see what happens:% pgbackrest --stanza miguel backup
...
INFO: repo option not specified, defaulting to repo1
...
INFO: new backup label = 20210413-105939F
INFO: backup command end: completed successfully (254377ms)
INFO: expire command begin 2.33: --exec-id=1606-12c0320b --log-level-console=info --repo1-path=/backup/pgbackrest --repo2-path=/backup/pgbackrest-mirror --repo1-retention-archive=5 --repo1-retention-full=2 --repo2-retention-full=1 --stanza=miguel
INFO: expire command end: completed successfully (59ms)
As you can see, since I did not specify any particular repository, the program program automatically selects the first repository.
Mixed backups
Having a single repository active in the backup list means the backup status is mixed:$ pgbackrest --stanza miguel info
stanza: miguel
    status: mixed
        repo1: ok
        repo2: error (no valid backups)
    cipher: none
    db (current)
        wal archive min/max (12): 0000000100000005000000F2/000000010000000600000004
        full backup: 20210413-105939F
            timestamp start/stop: 2021-04-13 10:59:39 / 2021-04-13 11:03:51
            wal start/stop: 000000010000000600000004 / 000000010000000600000004
            database size: 2.5GB, database backup size: 2.5GB
            repo1: backup set size: 142.8MB, backup size: 142.8MB
To some extent, the above is a degraded state, that means not all repositories are up with good backups.
Note that the single backup info now has a final line that indicates the repository where the backup can be found.
Specifying the repository for a backup
You can specify the--repo option to instrument pgbackrest on which repository to store the backup:
% pgbackrest --stanza miguel backup --repo 2
...
INFO: backup command end: completed successfully (4846ms)
The situation on the repositories
Theinfo command can, as always, display information about repositories and their content:
% pgbackrest --stanza miguel info
stanza: miguel
    status: ok
    cipher: none
    db (current)
        wal archive min/max (12): 0000000100000005000000F2/000000010000000600000016
        full backup: 20210413-105939F
            timestamp start/stop: 2021-04-13 10:59:39 / 2021-04-13 11:03:51
            wal start/stop: 000000010000000600000004 / 000000010000000600000004
            database size: 2.5GB, database backup size: 2.5GB
            repo1: backup set size: 142.8MB, backup size: 142.8MB
        full backup: 20210413-111525F
            timestamp start/stop: 2021-04-13 11:15:25 / 2021-04-13 11:19:37
            wal start/stop: 00000001000000060000000F / 00000001000000060000000F
            database size: 2.5GB, database backup size: 2.5GB
            repo2: backup set size: 142.8MB, backup size: 142.8MB
...
One backup at a time
It is not possible, as far as I know, to instrumentpgbackrest to do simultaneously backups on all the repositories. This means that you are in charge of scheduling backups on all the repositories manually!
Archiving on all the repositories
The archiving, however, is done on all repositories at the same time. However, as explained here, thearchive-push will iterate on every repository to push the same WAL segment. What this mean is that, from a PostgreSQL perspective, if a repository fails to get the WAL (while the others succeed), PostgreSQL will think the archiving has failed and will retry later.
One way to solve the problem is to use the
archive-push asynchronous mode.
Conclusions
I am very enthusiast about howpgbackrest is progressing and how it is enabling new features at every release.