pg_dump, text and xz
I have a database that contains around 50 GB of data. I do continuous backups through pgBackRest, and I also do regular pg_dump runs in directory format with multiple jobs, so I'm fine with backups.
However, why not have a look at SQL backups?
First of all: the content of the database is mostly numeric, being quite a large container of sensor data. This means the data should compress very well.
Moreover, tables are partitioned on a per-year and per-month basis, so I have a regular structure with one yearly table and twelve monthly children. For instance, for the current year there is a table named
y2021 with its partitions named y2021m01 through y2021m12.
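A quick way to eyeball this layout, using the same host, schema, and database names as the dump commands below, is to ask psql for the matching tables (just a sketch; the pattern is the same one pg_dump will receive):
% psql -h miguel -U postgres -c '\dt respi.y2021*' sensorsdb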
pg_dump in text mode
I ran a simple for loop in my shell to produce a few backup files, one file per year:
% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Backup year $y"
time pg_dump -h miguel -U postgres -f sensorsdb.$y.sql -t "respi.y${y}*" sensorsdb
done
This produced the following amount of data:
% ls -sh1 *.sql
3,5G sensorsdb.2018.sql
13G sensorsdb.2019.sql
12G sensorsdb.2020.sql
10G sensorsdb.2021.sql
20K sensorsdb.2022.sql
| Year | SQL size | Dump time |
|---|---|---|
| 2018 | 3.5 GB | 7 minutes |
| 2019 | 13 GB | 20 minutes |
| 2020 | 12 GB | 20 minutes |
| 2021 | 10 GB | 17 minutes |
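A nice side effect of the plain-text format is that restoring does not require pg_restore at all: feeding one of the files back to psql is enough. As a minimal sketch, assuming a database ready to receive the tables:
% psql -h miguel -U postgres -f sensorsdb.2021.sql sensorsdb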
Compress them!
Use xz with its default settings, which according to my installation means compression level 6:
% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Compress year $y"
time xz sensorsdb.$y.sql
done
Compress year 2018
xz sensorsdb.$y.sql 2911,75s user 12,62s system 98% cpu 49:22,22 total
Compress year 2019
xz sensorsdb.$y.sql 7411,57s user 41,22s system 98% cpu 2:06:24,38 total
Compress year 2020
xz sensorsdb.$y.sql 6599,22s user 19,08s system 98% cpu 1:52:07,38 total
Compress year 2021
xz sensorsdb.$y.sql 5487,37s user 15,25s system 98% cpu 1:33:08,32 total
Compress year 2022
xz sensorsdb.$y.sql 0,01s user 0,01s system 36% cpu 0,069 total
It takes from one to two hours to compress each file, as summarized in the following table:
| File size | Time | Compressed size | Compression ratio |
|---|---|---|---|
| 3.5 GB | 50 minutes | 227 MB | 92 % |
| 13 GB | 2 hours | 766 MB | 94 % |
| 12 GB | 2 hours | 658 MB | 94 % |
| 10 GB | 1.5 hours | 566 MB | 94 % |
Therefore,
xz is a great tool for compressing dump data, especially when that data is textual and mostly numeric. Unfortunately, xz turns out to be quite slow at the default compression level.
How long does it take to decompress the data? Around 4 minutes per file, which is much faster than the compression.
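Decompression can also be piped straight into psql, so there is no need to write the uncompressed SQL back to disk; a minimal sketch with the same connection parameters as above:
% xz -dc sensorsdb.2021.sql.xz | psql -h miguel -U postgres sensorsdb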
As a comparison, compressing with -2 instead of the default -6 takes roughly a quarter of the time at the price of a somewhat larger output: the 13 GB file took 35 minutes instead of 120 minutes and compressed down to 1.1 GB instead of 0.77 GB.
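For reference, the lower-level run is just the same loop with the compression level made explicit (a sketch, assuming the plain .sql files have been regenerated first):
% for y in $(echo 2018 2019 2020 2021 2022 ); do
echo "Compress year $y"
time xz -2 sensorsdb.$y.sql
done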
Let's see the results using -2 as the compression level:
| File size | Time | Compressed size | Compression ratio |
|---|---|---|---|
| 3.5 GB | 10 minutes | 338 MB | 90 % |
| 13 GB | 35 minutes | 1.1 GB | 91 % |
| 12 GB | 37 minutes | 918 MB | 92 % |
| 10 GB | 30 minutes | 786 MB | 92 % |
As you can see, compression level
-2 greatly improves the compression speed at the cost of only a minimal amount of extra disk space.
What about the directory dump format? Well, the same backup taken with
pg_dump -Fd, which compresses its objects by default, required 4.7 GB of disk space. The xz version requires from 3.1 GB (compression level -2) down to 2.2 GB (compression level -6).
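For completeness, a directory-format dump of a single year can be taken with something like the following sketch; the -j 4 value is an arbitrary choice for the multiple parallel jobs mentioned at the beginning:
% time pg_dump -h miguel -U postgres -Fd -j 4 -f sensorsdb.2021.dir -t "respi.y2021*" sensorsdb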
Conclusions
xz can help you save a lot of disk storage for textual (SQL) backups, but the default compression level can require a huge amount of time, especially on not-so-powerful machines. However, a lower compression level can make the pg_dump plus xz combination about as fast as pg_dump -Fd, with some extra space saving.