replace vs regexp_replace
While trying to help Stefan Stefanov with hispg_spreadsheetml
I came across something that would have been obvious, but not too much to me.
The obvious thing is
replace
is generally faster than regexp_replace
.
The fact is that, probably due to my heavy usage of Perl and Raku, I tend to use regular expressions even where they are not really required, and that is why I tried to change a nested invocation of
replace
into one of regexp_replace
. The pull request, and in particular the commit did transform something like:
replace(replace(replace(s, '&', '&'), '>', '>'), '<', '<');
regexp_replace( regexp_replace( regexp_replace( s, '&', '&', 'g' )
, '>'
, '>'
, 'g' )
, '<'
, '<'
, 'g' );
Now, despite the newlines, the usage of
regexp_replace
resulted in slower code.
So we decided to benchmark, and I decided in particular to test it with pgbench
.
Testing with pgbench
I created three sql scripts that essentially do the following:
- loop from 1 to the
:scale
; - build a single XML piece of code with a sligthly different content to avoid caching;
- perform the substitution in three different ways
- with
replace
- with
regexp_replace
- with
regexp_replace
and backreferences
- with
- store the results with timing (
clock_timestamp()
) into a table for later analysis.
I did run the tests in a way similar to the following:
% pgbench -s 300000 -f benchmark_regexp_replace_compact.sql -U luca testdb
and at the end I asked to get the result for the type of test.
Results
Getting the results is quite straightforward, and on my PostgreSQL 12.2 I got:testdb=> SELECT replacement_type, avg( ms ), min( ms ), max( ms )
FROM benchmark_replace GROUP BY replacement_type;
replacement_type | avg | min | max
------------------------|------------------------|-----|----------
regexp_replace | 2.0656612333436503e-05 | 0 | 0.039055
regexp_replace_compact | 0.00018001079899881362 | 0 | 0.06716
replace | 4.885953333294914e-06 | 0 | 0.027875
(3 rows)
that clearly show how
replace
is ten times faster than regexp_replace
that in turns, is roughly ten time faster that a regexp_replace
with backreferences, as you could expect (even if I was hoping for a lower difference due to a minor number of invocations of the function).
It is also interesting that the maximum times pretty much are
200%
of the previous best case.
Conclusions
Even if the presented approach cannot be considered a good benchmarking, it does emphasizes how it is important to use the simplest function available for the task, in this casereplace
when you don’t need to do a regular expression magic.