replace vs regexp_replace
While trying to help Stefan Stefanov with hispg_spreadsheetml I came across something that would have been obvious, but not too much to me.
The obvious thing is
replace is generally faster than regexp_replace.
The fact is that, probably due to my heavy usage of Perl and Raku, I tend to use regular expressions even where they are not really required, and that is why I tried to change a nested invocation of
replace into one of regexp_replace. The pull request, and in particular the commit did transform something like:
replace(replace(replace(s, '&', '&'), '>', '>'), '<', '<');
regexp_replace( regexp_replace( regexp_replace( s, '&', '&', 'g' )
, '>'
, '>'
, 'g' )
, '<'
, '<'
, 'g' );
Now, despite the newlines, the usage of
regexp_replace resulted in slower code.
So we decided to benchmark, and I decided in particular to test it with pgbench.
Testing with pgbench
I created three sql scripts that essentially do the following:
- loop from 1 to the
:scale; - build a single XML piece of code with a sligthly different content to avoid caching;
- perform the substitution in three different ways
- with
replace - with
regexp_replace - with
regexp_replaceand backreferences
- with
- store the results with timing (
clock_timestamp()) into a table for later analysis.
I did run the tests in a way similar to the following:
% pgbench -s 300000 -f benchmark_regexp_replace_compact.sql -U luca testdb
and at the end I asked to get the result for the type of test.
Results
Getting the results is quite straightforward, and on my PostgreSQL 12.2 I got:testdb=> SELECT replacement_type, avg( ms ), min( ms ), max( ms )
FROM benchmark_replace GROUP BY replacement_type;
replacement_type | avg | min | max
------------------------|------------------------|-----|----------
regexp_replace | 2.0656612333436503e-05 | 0 | 0.039055
regexp_replace_compact | 0.00018001079899881362 | 0 | 0.06716
replace | 4.885953333294914e-06 | 0 | 0.027875
(3 rows)
that clearly show how
replace is ten times faster than regexp_replace that in turns, is roughly ten time faster that a regexp_replace with backreferences, as you could expect (even if I was hoping for a lower difference due to a minor number of invocations of the function).
It is also interesting that the maximum times pretty much are
200% of the previous best case.
Conclusions
Even if the presented approach cannot be considered a good benchmarking, it does emphasizes how it is important to use the simplest function available for the task, in this casereplace when you don’t need to do a regular expression magic.