replace vs regexp_replaceWhile trying to help Stefan Stefanov with his
pg_spreadsheetmlI came across something that would have been obvious, but not too much to me.
The obvious thing is
replaceis generally faster than
The fact is that, probably due to my heavy usage of Perl and Raku, I tend to use regular expressions even where they are not really required, and that is why I tried to change a nested invocation of
replaceinto one of
regexp_replace. The pull request, and in particular the commit did transform something like:
into something like
replace(replace(replace(s, '&', '&'), '>', '>'), '<', '<');
regexp_replace( regexp_replace( regexp_replace( s, '&', '&', 'g' ) , '>' , '>' , 'g' ) , '<' , '<' , 'g' );
Now, despite the newlines, the usage of
regexp_replaceresulted in slower code. So we decided to benchmark, and I decided in particular to test it with
I created three sql scripts that essentially do the following:
- loop from 1 to the
- build a single XML piece of code with a sligthly different content to avoid caching;
- perform the substitution in three different ways
- store the results with timing (
clock_timestamp()) into a table for later analysis.
I did run the tests in a way similar to the following:
% pgbench -s 300000 -f benchmark_regexp_replace_compact.sql -U luca testdb
and at the end I asked to get the result for the type of test.
ResultsGetting the results is quite straightforward, and on my PostgreSQL 12.2 I got:
testdb=> SELECT replacement_type, avg( ms ), min( ms ), max( ms ) FROM benchmark_replace GROUP BY replacement_type; replacement_type | avg | min | max ------------------------|------------------------|-----|---------- regexp_replace | 2.0656612333436503e-05 | 0 | 0.039055 regexp_replace_compact | 0.00018001079899881362 | 0 | 0.06716 replace | 4.885953333294914e-06 | 0 | 0.027875 (3 rows)
that clearly show how
replaceis ten times faster than
regexp_replacethat in turns, is roughly ten time faster that a
regexp_replacewith backreferences, as you could expect (even if I was hoping for a lower difference due to a minor number of invocations of the function).
It is also interesting that the maximum times pretty much are
200%of the previous best case.
ConclusionsEven if the presented approach cannot be considered a good benchmarking, it does emphasizes how it is important to use the simplest function available for the task, in this case
replacewhen you don’t need to do a regular expression magic.