pg_dump, text and xz
I have a database that contains around 50 GB
of data. I do continuous backups through pgBackRest, and I also take regular `pg_dump`
backups in directory format via multiple jobs, so I'm fine with backups.
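Just to give an idea of the starting point, the overall size can be checked with a quick query (a sketch, reusing the same connection parameters as the dump commands below):

% psql -h miguel -U postgres -c "SELECT pg_size_pretty( pg_database_size( 'sensorsdb' ) );" sensorsdb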
However, why not have a look at SQL backups?
First of all: the content of the database is mostly numeric, since it is quite a large container of sensor data. This means the data should compress very well.
Moreover, tables are partitioned on a per-year and per-month basis, therefore I have a regular structure with one yearly table and twelve monthly children. For instance, for the current year there is a table named `y2021` with partitions named `y2021m01` through `y2021m12`.
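For example, the partition layout for a single year can be quickly inspected from psql (a minimal sketch, assuming the `respi` schema used in the dump commands below):

% psql -h miguel -U postgres -c '\d+ respi.y2021' sensorsdb

The `\d+` output lists all the monthly partitions of `y2021` together with their bounds.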
`pg_dump` in text mode
I ran a simple `for` loop in my shell to produce a few backup files, one per year:
% for y in $(echo 2018 2019 2020 2021 2022 ); do
    echo "Backup year $y"
    time pg_dump -h miguel -U postgres -f sensorsdb.$y.sql -t "respi.y${y}*" sensorsdb
done
This produced the following amount of data:
% ls -sh1 *.sql
3,5G sensorsdb.2018.sql
13G sensorsdb.2019.sql
12G sensorsdb.2020.sql
10G sensorsdb.2021.sql
20K sensorsdb.2022.sql
| year | SQL size | time       |
|------|----------|------------|
| 2018 | 3.5 GB   | 7 minutes  |
| 2019 | 13 GB    | 20 minutes |
| 2020 | 12 GB    | 20 minutes |
| 2021 | 10 GB    | 17 minutes |
Compress them!
Use `xz` with the default settings, which according to my installation means compression level `-6`:
% for y in $(echo 2018 2019 2020 2021 2022 ); do
    echo "Compress year $y"
    time xz sensorsdb.$y.sql
done
Compress year 2018
xz sensorsdb.$y.sql 2911,75s user 12,62s system 98% cpu 49:22,22 total
Compress year 2019
xz sensorsdb.$y.sql 7411,57s user 41,22s system 98% cpu 2:06:24,38 total
Compress year 2020
xz sensorsdb.$y.sql 6599,22s user 19,08s system 98% cpu 1:52:07,38 total
Compress year 2021
xz sensorsdb.$y.sql 5487,37s user 15,25s system 98% cpu 1:33:08,32 total
Compress year 2022
xz sensorsdb.$y.sql 0,01s user 0,01s system 36% cpu 0,069 total
It requires from one to two hours to compress every single file, as summarized in the following table:
| File size | Time       | Compressed size | Compression ratio |
|-----------|------------|-----------------|-------------------|
| 3.5 GB    | 50 minutes | 227 MB          | 92 %              |
| 13 GB     | 2 hours    | 766 MB          | 94 %              |
| 12 GB     | 2 hours    | 658 MB          | 94 %              |
| 10 GB     | 1.5 hours  | 566 MB          | 94 %              |
Therefore, `xz` is a great tool to compress dump data, especially when that data is textual and mostly numeric. Unluckily, `xz` turns out to be quite slow at the default compression level.
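The timings above also show `xz` saturating a single core (around 98% CPU), so another option, not benchmarked here, is multi-threaded compression via the `-T` option (a sketch; `-T0` uses as many threads as there are cores):

% time xz -T0 sensorsdb.2019.sql

Keep in mind that multi-threaded `xz` splits the input into independent blocks, so the resulting file can end up slightly larger than a single-threaded one.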
How long does it take to decompress the data? Well, it takes around 4 minutes per file, which is much faster than the compression.
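A decompression test can be as simple as the following, where `-d` decompresses and `-k` keeps the `.xz` file around so the test can be repeated (a sketch on one of the yearly files):

% time xz -dk sensorsdb.2019.sql.xz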
Just as a comparison, compressing with `-2` instead of `-6` requires around one quarter of the time while giving up only about one third of the compression: for example, the 13 GB file required 35 minutes instead of 120 minutes, producing a 1.1 GB file instead of 0.77 GB.
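The loop is the same as before, simply passing the compression level explicitly (a sketch, assuming the plain `.sql` files are available again, e.g., after re-running the dump or decompressing the `-6` archives):

% for y in $(echo 2018 2019 2020 2021 2022 ); do
    echo "Compress year $y"
    time xz -2 sensorsdb.$y.sql
done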
Let's see the results using `-2` as the compression level:
| File size | Time       | Compressed size | Compression ratio |
|-----------|------------|-----------------|-------------------|
| 3.5 GB    | 10 minutes | 338 MB          | 90 %              |
| 13 GB     | 35 minutes | 1.1 GB          | 91 %              |
| 12 GB     | 37 minutes | 918 MB          | 92 %              |
| 10 GB     | 30 minutes | 786 MB          | 92 %              |
As you can see, compression level `-2` greatly improves the compression speed with a minimal extra disk space requirement.
What about the directory dump format? Well, the same backup with `pg_dump -Fd`, which by default creates compressed objects, required 4.7 GB of disk space. The `xz` version requires from 3.1 GB (compression `-2`) down to 2.2 GB (compression `-6`).
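For completeness, the directory-format dump used for this comparison looks something like the following (a sketch: the number of parallel jobs and the table pattern are assumptions, since the post only mentions "multiple jobs"):

% time pg_dump -h miguel -U postgres -Fd -j 4 -f sensorsdb.dir -t "respi.y*" sensorsdb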
Conclusions
`xz` can help you save a lot of disk storage for textual (SQL) backups, but the default compression level can require a huge amount of time, especially on not-so-powerful machines. However, with a lower compression level the `pg_dump` plus `xz` combination can be roughly as fast as `pg_dump -Fd`, with some extra space saving.
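A possible refinement, not benchmarked in this post, is to pipe the plain dump straight into `xz`, so that the uncompressed SQL never touches the disk (a sketch for a single year, reusing the same connection parameters):

% pg_dump -h miguel -U postgres -t "respi.y2021*" sensorsdb | xz -2 > sensorsdb.2021.sql.xz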