A PL/PgSQL Simple Roman Number Translator
In the last Weekly Challenge 227 the second task was about building a simple Roman numbers calculator. Since I usually try to implement those tasks also in PL/PgSQL (as well as in PL/Perl), I tried to implement such calculator and, along the path, I implemented a couple of simple functions to translate a number from and to roman notations.In this short post I explain how the two functions work.
My approach is based on a lookup table that stores arabic and roman correspondencies for special cases and base units.
The lookup table
I defined a lookup table, that can be in whatever schema you want, even temporary, and that is populated with base units and some special cases:CREATE SCHEMA IF NOT EXISTS fluca1978;
CREATE TABLE IF NOT EXISTS fluca1978.roman( r text, n int, repeatable boolean );
TRUNCATE TABLE fluca1978.roman;
INSERT INTO fluca1978.roman
VALUES
('I', 1, true )
,( 'IV', 4, false )
,( 'V', 5, false )
,( 'IX', 9, false )
,( 'X', 10, true )
,( 'XL', 40, false )
,( 'L', 50, false )
,( 'XC', 90, false )
,( 'C', 100, true )
,( 'CD', 400, false )
,( 'D', 500, false )
,( 'CM', 900, false )
,( 'M', 1000, true );
The
r
field holds the roman value for a number n
, while the repeatable
flag indicates if the number can be repeated consequently in the same stirng. For example, I
can be repeated to form III
, while IV
cannot be repeated into IVIV
. This will be useful during validation.
Validating a Roman String
The following function perform the minimal validation for a given input string that is supposed to be a roman number:CREATE OR REPLACE FUNCTION
fluca1978.validate_roman( r text )
RETURNS boolean
STRICT
AS $CODE$
DECLARE
current_record fluca1978.roman%rowtype;
rx text;
matches int;
BEGIN
FOR current_record IN SELECT * FROM fluca1978.roman ORDER BY n DESC LOOP
RAISE DEBUG 'Iterating over Roman value % = %', current_record.r, current_record.n;
matches := 0;
rx := format( '^%s', current_record.r );
WHILE r ~ rx LOOP
matches := matches + 1;
RAISE DEBUG 'Input string % -> % matches the Roman value %', r, matches, current_record.r;
IF NOT current_record.repeatable AND matches > 1 THEN
RAISE DEBUG 'Roman symbol % cannot be repeated!', current_record.r;
RETURN false;
END IF;
r := regexp_replace( r, rx, '' );
EXIT WHEN length( r ) = 0;
END LOOP;
EXIT WHEN length( r ) = 0;
END LOOP;
IF length( r ) > 0 THEN
RETURN false;
END IF;
RETURN true;
END
$CODE$
LANGUAGE plpgsql;
The idea is simple: I order the lookup table in descending order, so from the biggest value to the smallest one. At each iteration, I search fi the current roman string starts with the roman letter (or couple of letters). If that is the case, I keep track of how many
matches
I’ve found, then remove the roman symbol from the beginning of the string.
Then I see if the same letter/symbol can be found in the beginning of the string, and if so, I ensure it is a repeatable value, otherwise there is an error.
If everything goes well, the ending string r
will be empty due to the substitutions, otherwise if some characters remain then the string is wrong.
That happens, for example, when the roman values on the right are biggest than those on the left.
Converting from Roman to Arabic
The following function does the convertion starting from a Roman string:CREATE OR REPLACE FUNCTION
fluca1978.from_roman( r text )
RETURNS int
STRICT
AS $CODE$
DECLARE
v int := 0;
current_record fluca1978.roman%rowtype;
rx text;
BEGIN
IF r = '' THEN
RETURN 0;
END IF;
IF NOT fluca1978.validate_roman( r ) THEN
RETURN 0;
END IF;
FOR current_record IN SELECT * FROM fluca1978.roman ORDER BY n DESC LOOP
RAISE DEBUG 'Iterating over Roman value % = %', current_record.r, current_record.n;
rx := format( '^%s', current_record.r );
WHILE r ~ rx LOOP
RAISE DEBUG 'Input string % matches the Roman value %', r, current_record.r;
v := v + current_record.n;
r := regexp_replace( r, rx, '' );
END LOOP;
END LOOP;
RAISE DEBUG 'Converted value is %', v;
RETURN v;
END
$CODE$
LANGUAGE plpgsql;
It is really similar to the validating function: it iterates on each part of the string searching to decode as the biggest possible value in the roman lookup table.
It is possible to see the workflow of the function by means of using the
DEBUG
log level:
testdb=> set client_min_messages to debug;
SET
testdb=> select fluca1978.from_roman( 'MCMLXXVIII' );
DEBUG: Iterating over Roman value M = 1000
DEBUG: Input string MCMLXXVIII matches the Roman value M
DEBUG: Iterating over Roman value CM = 900
DEBUG: Input string CMLXXVIII matches the Roman value CM
DEBUG: Iterating over Roman value D = 500
DEBUG: Iterating over Roman value CD = 400
DEBUG: Iterating over Roman value C = 100
DEBUG: Iterating over Roman value XC = 90
DEBUG: Iterating over Roman value L = 50
DEBUG: Input string LXXVIII matches the Roman value L
DEBUG: Iterating over Roman value XL = 40
DEBUG: Iterating over Roman value X = 10
DEBUG: Input string XXVIII matches the Roman value X
DEBUG: Input string XVIII matches the Roman value X
DEBUG: Iterating over Roman value IX = 9
DEBUG: Iterating over Roman value V = 5
DEBUG: Input string VIII matches the Roman value V
DEBUG: Iterating over Roman value IV = 4
DEBUG: Iterating over Roman value I = 1
DEBUG: Input string III matches the Roman value I
DEBUG: Input string II matches the Roman value I
DEBUG: Input string I matches the Roman value I
DEBUG: Converted value is 1978
from_roman
------------
1978
(1 row)
As you can see, at every iteration the function removes the leftmost letter from the string and continues to see what it can find next.
The matching is performed by building a regular expression as condition to the
WHILE
loop: the condition has the begin at string anchor ^
followed by whatever roman symbole is in the current record out of the lookup table. The special EXIT
part ensures that there cannot be repetitions of two letetrs symbols. For example you cannot express IVIV
as 8
, so once IV
is encountered, the WHILE
knows it can safely exit the loop.
Converting From Arabic to Roman
The following function does the opposite: converts an integer into a roman string.CREATE OR REPLACE FUNCTION
fluca1978.to_roman( n int )
RETURNS text
STRICT
AS $CODE$
DECLARE
roman_value text := '';
current_record fluca1978.roman%rowtype;
BEGIN
IF n <= 0 THEN
RAISE DEBUG 'Cannot convert zero!';
RETURN NULL;
END IF;
FOR current_record IN SELECT * FROM fluca1978.roman ORDER BY n DESC LOOP
WHILE n >= current_record.n LOOP
RAISE DEBUG 'The value % is greater than % so appending a %', n, current_record.n, current_record.r;
roman_value := roman_value || current_record.r;
n := n - current_record.n;
EXIT WHEN length( current_record.r ) = 2;
END LOOP;
END LOOP;
RAISE DEBUG 'Computed value is %', roman_value;
RETURN roman_value;
END
$CODE$
LANGUAGE plpgsql;
Again, thanks to the debug output it is easy to understand the workflow of the converter:
testdb=> select fluca1978.to_roman( 1978 );
DEBUG: The value 1978 is greater than 1000 so appending a M
DEBUG: The value 978 is greater than 900 so appending a CM
DEBUG: The value 78 is greater than 50 so appending a L
DEBUG: The value 28 is greater than 10 so appending a X
DEBUG: The value 18 is greater than 10 so appending a X
DEBUG: The value 8 is greater than 5 so appending a V
DEBUG: The value 3 is greater than 1 so appending a I
DEBUG: The value 2 is greater than 1 so appending a I
DEBUG: The value 1 is greater than 1 so appending a I
DEBUG: Computed value is MCMLXXVIII
to_roman
------------
MCMLXXVIII
Caching Results
It is, clearly, very simple to define a materialized view or a cache table to handle all values for a faster lookup. As an example, imagine to create a table that serves as a cache:CREATE TABLE IF NOT EXISTS fluca1978.roman_cache_table( n int, r text );
TRUNCATE TABLE fluca1978.roman_cache_table;
INSERT INTO fluca1978.roman_cache_table( n, r )
SELECT n, r
FROM fluca1978.roman
ORDER BY n;
and then a function that, given a number, tries to understand if the caching table contains such a number, otherwise populates the table from the last found index to the given one
CREATE OR REPLACE FUNCTION
fluca1978.roman_cache( x int )
RETURNS text
STRICT
AS $CODE$
DECLARE
max_cached_value int;
i int;
v text;
BEGIN
SELECT max( n )
INTO max_cached_value
FROM fluca1978.roman_cache_table;
RAISE DEBUG 'Max cached value % and looking for %', max_cached_value, x;
IF max_cached_value IS NULL OR x > max_cached_value THEN
IF max_cached_value IS NULL THEN
max_cached_value := 1;
END IF;
RAISE DEBUG 'Repopulating the cache from % to %', max_cached_value, x;
FOR i IN max_cached_value + 1 .. x LOOP
INSERT INTO fluca1978.roman_cache_table( n, r )
SELECT i, fluca1978.to_roman( i );
END LOOP;
END IF;
SELECT r
INTO v
FROM fluca1978.roman_cache_table
WHERE n = x;
RETURN v;
END
$CODE$
LANGUAGE plpgsql;
When you query the above function, the system inspects the
roman_cache_table
for the asked arabic number, and the number is in there it returns it. If the number is greater than the max value within the caching table, the function populates the table up to the given number.