pg_dump, text and xz
I have a database that contains around 50 GB
of data. I do continuous backups through pgBackRest, and I also take regular `pg_dump`
backups in directory format via multiple jobs, so I'm fine with backups.
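Just to give an idea of the starting point, the overall size can be checked with a quick query (a sketch, reusing the same connection parameters as the dump commands below):

% psql -h miguel -U postgres -c "SELECT pg_size_pretty( pg_database_size( 'sensorsdb' ) );" sensorsdb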
However, why not have a look at SQL backups?
First of all: the content of the database is mostly numeric, since it is quite a large container of sensor data. This means the data should compress very well.
Moreover, tables are partitioned on a per-year and per-month basis, therefore I have a regular structure with one yearly table and twelve monthly children. For instance, for the current year there is a table named `y2021` with partitions named `y2021m01` through `y2021m12`.
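For example, the partition layout for a single year can be quickly inspected from psql (a minimal sketch, assuming the `respi` schema used in the dump commands below):

% psql -h miguel -U postgres -c '\d+ respi.y2021' sensorsdb

The `\d+` output lists all the monthly partitions of `y2021` together with their bounds.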
`pg_dump` in text mode
I ran a simple `for` loop in my shell to produce a few backup files, one per year:
% for y in $(echo 2018 2019 2020 2021 2022 ); do
    echo "Backup year $y"
    time pg_dump -h miguel -U postgres -f sensorsdb.$y.sql -t "respi.y${y}*" sensorsdb
done
This produced the following amount of data:
% ls -sh1 *.sql
3,5G sensorsdb.2018.sql
13G sensorsdb.2019.sql
12G sensorsdb.2020.sql
10G sensorsdb.2021.sql
20K sensorsdb.2022.sql
| year | SQL size | time       |
|------|----------|------------|
| 2018 | 3.5 GB   | 7 minutes  |
| 2019 | 13 GB    | 20 minutes |
| 2020 | 12 GB    | 20 minutes |
| 2021 | 10 GB    | 17 minutes |
Compress them!
Use `xz` with the default settings, which according to my installation means compression level `-6`:
% for y in $(echo 2018 2019 2020 2021 2022 ); do
    echo "Compress year $y"
    time xz sensorsdb.$y.sql
done
Compress year 2018
xz sensorsdb.$y.sql 2911,75s user 12,62s system 98% cpu 49:22,22 total
Compress year 2019
xz sensorsdb.$y.sql 7411,57s user 41,22s system 98% cpu 2:06:24,38 total
Compress year 2020
xz sensorsdb.$y.sql 6599,22s user 19,08s system 98% cpu 1:52:07,38 total
Compress year 2021
xz sensorsdb.$y.sql 5487,37s user 15,25s system 98% cpu 1:33:08,32 total
Compress year 2022
xz sensorsdb.$y.sql 0,01s user 0,01s system 36% cpu 0,069 total
It requires from one to two hours to compress every single file, as summarized in the following table:
| File size | Time       | Compressed size | Compression ratio |
|-----------|------------|-----------------|-------------------|
| 3.5 GB    | 50 minutes | 227 MB          | 92 %              |
| 13 GB     | 2 hours    | 766 MB          | 94 %              |
| 12 GB     | 2 hours    | 658 MB          | 94 %              |
| 10 GB     | 1.5 hours  | 566 MB          | 94 %              |
Therefore, `xz` is a great tool to compress dump data, especially when that data is textual and mostly numeric. Unluckily, `xz` turns out to be quite slow at the default compression level.
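The timings above also show `xz` saturating a single core (around 98% CPU), so another option, not benchmarked here, is multi-threaded compression via the `-T` option (a sketch; `-T0` uses as many threads as there are cores):

% time xz -T0 sensorsdb.2019.sql

Keep in mind that multi-threaded `xz` splits the input into independent blocks, so the resulting file can end up slightly larger than a single-threaded one.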
How long does it take to decompress the data? Well, it takes around 4 minutes per file, which is much faster than the compression.
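A decompression test can be as simple as the following, where `-d` decompresses and `-k` keeps the `.xz` file around so the test can be repeated (a sketch on one of the yearly files):

% time xz -dk sensorsdb.2019.sql.xz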
Just as a comparison, compressing with `-2` instead of `-6` requires around one quarter of the time while giving up only about one third of the compression: for example, the 13 GB file required 35 minutes instead of 120 minutes, producing a 1.1 GB file instead of 0.77 GB.
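The loop is the same as before, simply passing the compression level explicitly (a sketch, assuming the plain `.sql` files are available again, e.g., after re-running the dump or decompressing the `-6` archives):

% for y in $(echo 2018 2019 2020 2021 2022 ); do
    echo "Compress year $y"
    time xz -2 sensorsdb.$y.sql
done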
Let's see the results using `-2` as the compression level:
| File size | Time       | Compressed size | Compression ratio |
|-----------|------------|-----------------|-------------------|
| 3.5 GB    | 10 minutes | 338 MB          | 90 %              |
| 13 GB     | 35 minutes | 1.1 GB          | 91 %              |
| 12 GB     | 37 minutes | 918 MB          | 92 %              |
| 10 GB     | 30 minutes | 786 MB          | 92 %              |
As you can see, compression level `-2` greatly improves the compression speed with a minimal extra disk space requirement.
What about the directory dump format? Well, the same backup with `pg_dump -Fd`, which by default creates compressed objects, required 4.7 GB of disk space. The `xz` version requires from 3.1 GB (compression `-2`) down to 2.2 GB (compression `-6`).
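For completeness, the directory-format dump used for this comparison looks something like the following (a sketch: the number of parallel jobs and the table pattern are assumptions, since the post only mentions "multiple jobs"):

% time pg_dump -h miguel -U postgres -Fd -j 4 -f sensorsdb.dir -t "respi.y*" sensorsdb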
Conclusions
`xz` can help you save a lot of disk storage for textual (SQL) backups, but the default compression level can require a huge amount of time, especially on not-so-powerful machines. However, with a lower compression level the `pg_dump` plus `xz` combination can be roughly as fast as `pg_dump -Fd`, with some extra space saving.
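A possible refinement, not benchmarked in this post, is to pipe the plain dump straight into `xz`, so that the uncompressed SQL never touches the disk (a sketch for a single year, reusing the same connection parameters):

% pg_dump -h miguel -U postgres -t "respi.y2021*" sensorsdb | xz -2 > sensorsdb.2021.sql.xz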