Preventing FreeBSD to kill PostgreSQL (aka OOM Killer prevention)

There are a lot of interesting articles on how to prevent the Out of Memory Killer (OOM Killer for short) on Linux from ruining your day, or rather your night. One particularly well done explanation of how the OOM Killer works, and how to help PostgreSQL survive it, is, in my humble opinion, the one from the Percona Blog.

I tend to run PostgreSQL on FreeBSD machines whenever possible, and quite frankly I still have a lot to learn. One of those little details is the FreeBSD OOM Killer.
It turns out FreeBSD has its own OOM Killer implementation, see this excellent article; I discovered it recently via the excellent FreeBSD forum and, as usual, thanks to the kindness and professionalism of the community behind this great operating system.

A difference between Linux and FreeBSD is that the former makes heavy use of the /proc filesystem to let the administrator interact with process configuration and information, while the latter does not. Thanks to the above article I discovered the protect(1) command, which is used to instruct the OOM Killer.

In the following I describe what I learnt so far and how to protect PostgreSQL from the OOM Killer.

protect(1) and FreeBSD OOM Killer

Processes in FreeBSD can be marked by means of PROC_SPROTECT, a command of the procctl(2) system call that, as its man page states, instructs the OOM Killer to skip the process when selecting a candidate to kill:

PROC_SPROTECT    Set process protection state.  This is used to mark
                 a process as protected from being killed if the
                 system exhausts the available memory and swap. 

The idea is that when the OOM Killer scans the processes to find one (or more) candidates to kill in order to immediately free memory, the protected processes must be skipped.
An important thing to note is that protection is not inherited by fork(2)-ed processes. Luckily, it is possible to mark a protected process so that its children inherit the protection status. In fact, calling PROC_SPROTECT with:

PPROT_SET protects the current process but not its children;
PPROT_SET | PPROT_INHERIT protects the current process and any children forked from then on.

Why is this detail important? Because, as we all know, PostgreSQL starts with a main process (the postmaster) that forks a new process for every connection. Therefore, you are free to control the OOM Killer protection at the postmaster level or at the level of a single connection.

WARNING: marking all processes as protected can prevent the OOM Killer from working at all, with the likely result of panicking the whole machine.

Protecting PostgreSQL from OOM Killer

There are two main ways to protect PostgreSQL from the OOM Killer:

manually use protect(1) against one or more PostgreSQL processes;
automatically use protect(1) at service startup.

Manually using protect(1) means that you are going to protect the process by means of its PID. As an example, suppose that on a machine there are the following processes:

% sudo pstree -s postgres
 \-+= 00776 postgres /usr/local/bin/postgres -D /postgres/12/data
   |--= 00777 postgres postgres: logger    (postgres)
   |--= 00779 postgres postgres: checkpointer    (postgres)
   |--= 00780 postgres postgres: background writer    (postgres)
   |--= 00781 postgres postgres: walwriter    (postgres)
   |--= 00782 postgres postgres: stats collector    (postgres)
   \--= 00783 postgres postgres: logical replication launcher    (postgres)

where the process with PID 776 is clearly the postmaster. Now, assume you want to protect the postmaster itself: you can call protect(1) specifying the PID of the process.

% sudo protect -p 776

The main flags for protect(1) are:

-p specifies the PID of the process to protect;
-d applies the operation to all the current children of the process;
-i applies the operation to all its future children;
-c removes the protection.

Therefore, in order to protect all new connections to the database the command to use is:

% sudo protect -i -p 776

which reads as "protect process 776 and all newly forked processes".
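
Hard-coding the PID quickly becomes annoying, since it changes at every restart. As a minimal sketch, and assuming the data directory used in this article, the PID can be read from the first line of the postmaster.pid file that PostgreSQL keeps in its data directory. The function name postmaster_pid is my own invention and is not part of any tool.

```shell
# Print the postmaster PID stored on the first line of postmaster.pid.
# postmaster_pid is a hypothetical helper name, not an existing command.
postmaster_pid() {
    datadir=$1
    pidfile="$datadir/postmaster.pid"

    # the file exists only while the cluster is running
    [ -r "$pidfile" ] || return 1

    pid=$(head -n 1 "$pidfile")

    # accept only a plain positive integer
    case $pid in
        ''|*[!0-9]*) return 1 ;;
    esac

    printf '%s\n' "$pid"
}

# Usage on the FreeBSD host, as root (the data directory is readable
# only by the postgres user and by root):
#   protect -i -p "$(postmaster_pid /postgres/12/data)"
```

Keep in mind that -i covers only the processes forked from then on; adding -d should extend the protection to the children that are already running.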

Doing all the protection manually is boring, and luckily the excellent rc.d system allows the protection to be configured at service startup. It is possible to specify the oomprotect configuration parameter for the service (all services, not only PostgreSQL!), which can take one of the following values:

yes enables protection for the (single) service process;
all enables protection for the service process and all its forked children.
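
As an illustration only, this is how the parameter could look in /etc/rc.conf for a service whose rc.d script relies entirely on rc.subr. I picked sshd as an example, assuming its script does so; any other service name works the same way, with the variable named after the service.

```shell
# /etc/rc.conf fragment (sshd is only an example service)
sshd_enable="YES"
# yes = protect only the daemon, all = protect it and its children
sshd_oomprotect="yes"
```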

Unluckily, this does not apply directly to PostgreSQL, since the service(8) script /usr/local/etc/rc.d/postgresql does not fully use /etc/rc.subr, which is in charge of examining the oomprotect variable. The postgresql script calls pg_ctl(1) directly to manage the cluster, without any "protection" possible. I suspect the problem is that pg_ctl(1) must be run as a normal user, so the script would need to run pg_ctl(1) without root privileges and, at the same time, wrap it in protect(1) with such privileges. In short, this means that even a configuration like the following will not apply protect(1) to PostgreSQL:

postgresql_enable="YES"
postgresql_data="/postgres/12/data"

# all = protect -i -p
# yes = protect -p
postgresql_oomprotect="all"

Therefore, in order to protect the postmaster or any other PostgreSQL process, you need to manually use protect(1) as already shown.
I am not sure if this is going to change in the future to allow the rc.d script to honor the oomprotect variable.
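
Until then, a possible workaround is to protect the postmaster right after the service has started. The following is a rough sketch, not a tested recipe: it assumes the data directory used above, and wait_for_postmaster is a hypothetical helper that simply polls for the postmaster.pid file and prints the PID found on its first line.

```shell
# Wait up to $2 seconds (default 30) for $1/postmaster.pid to appear,
# then print the PID stored on its first line.
# wait_for_postmaster is a hypothetical helper name.
wait_for_postmaster() {
    datadir=$1
    timeout=${2:-30}
    elapsed=0

    while [ "$elapsed" -lt "$timeout" ]; do
        # -s: the file exists and is not empty
        if [ -s "$datadir/postmaster.pid" ]; then
            head -n 1 "$datadir/postmaster.pid"
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done

    # the cluster did not come up in time
    return 1
}

# Possible usage as root on the FreeBSD host:
#   service postgresql start \
#     && pid=$(wait_for_postmaster /postgres/12/data 30) \
#     && protect -d -i -p "$pid"
```

The -d flag is there because the background processes may already be running when protect(1) is invoked, while -i takes care of the connections that come later.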

How to inspect the protection status

This was hard for me, but again thanks to the great FreeBSD community and IRC, I discovered that ps(1) has a special output keyword, named flags, that shows the protection status of a single process. There is also the flags2 keyword, which shows the status of the protection inheritance.
Both the flags and flags2 columns contain hexadecimal values that encode all the extra information tied to a process. In the case of P_PROTECTED the value is 0x100000 (and this is found in flags), while for P2_INHERIT_PROTECTED the value is 0x00000001 (and this is found in flags2).
Putting it all together, you can inspect your PostgreSQL processes as follows:

% sudo ps -ax -o flags,flags2,pid,command | grep postgres
10104000 00000001 3747 /usr/local/bin/postgres -D /postgres/12/data
10100000 00000001 3748 postgres: logger    (postgres)
10100000 00000001 3750 postgres: checkpointer    (postgres)
10100000 00000001 3751 postgres: background writer    (postgres)
10100000 00000001 3752 postgres: walwriter    (postgres)
10100000 00000001 3753 postgres: stats collector    (postgres)
10100000 00000001 3754 postgres: logical replication launcher    (postgres)

The first process, with PID 3747, is the already mentioned postmaster. It has a flags value of 10104000, which means it is OOM protected (the 0x100000 bit is set), and a flags2 value of 00000001, which means that any process it spawns will be protected too. You can check this with some math and Perl:

% sudo ps -ax -o flags,flags2,command \
       | grep postgres \
       | perl -lanE 'say "[OOM PROTECTED]\t@F[2 .. $#F]" if hex( $F[0] ) & 0x100000;'
[OOM PROTECTED] /usr/local/bin/postgres -D /postgres/12/data
[OOM PROTECTED] postgres: logger (postgres)
[OOM PROTECTED] postgres: checkpointer (postgres)
[OOM PROTECTED] postgres: background writer (postgres)
[OOM PROTECTED] postgres: walwriter (postgres)
[OOM PROTECTED] postgres: stats collector (postgres)
[OOM PROTECTED] postgres: logical replication launcher (postgres)

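The above Perl one-liner splits every line into the internal array @F, so that the first element is the flags value and the elements from the third onward are the command line, and checks whether the P_PROTECTED bit (0x100000) is set in flags; in such a case the process is protected against OOM killing.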
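
The same check can be done without Perl. What follows is a minimal sketch in plain sh that also looks at flags2; the function names are mine, and the bit values are the ones quoted above. It expects lines in the "flags flags2 pid command" layout of the first ps invocation.

```shell
# Succeed when the hexadecimal flags value has P_PROTECTED (0x100000) set.
is_oom_protected() {
    case $1 in
        ''|*[!0-9a-fA-F]*) return 2 ;;   # not a hexadecimal number
    esac
    [ $(( 0x$1 & 0x100000 )) -ne 0 ]
}

# Succeed when the hexadecimal flags2 value has the inherit bit (0x1) set.
inherits_oom_protection() {
    case $1 in
        ''|*[!0-9a-fA-F]*) return 2 ;;
    esac
    [ $(( 0x$1 & 0x1 )) -ne 0 ]
}

# Read "flags flags2 pid command" lines and print pid, state and command.
report_protection() {
    while read -r flags flags2 pid command; do
        if is_oom_protected "$flags"; then
            if inherits_oom_protection "$flags2"; then
                state="protected+inherit"
            else
                state="protected"
            fi
        else
            state="unprotected"
        fi
        printf '%s\t%s\t%s\n' "$pid" "$state" "$command"
    done
}

# Possible usage on the FreeBSD host:
#   ps -ax -o flags,flags2,pid,command | grep postgres | report_protection
```

Masking the whole value, instead of matching a single digit, also catches the cases where other bits happen to share the same hexadecimal digit with P_PROTECTED.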

Hey ‘ma, am I protected?

I created an example PL/pgSQL function to check if the current connection is protected against the OOM Killer. The function is defined with SECURITY DEFINER and has to be created by a superuser, because it internally uses the COPY command to execute the ps utility.

CREATE OR REPLACE FUNCTION
f_oomprotect( pid int DEFAULT NULL )
RETURNS boolean
AS
$CODE$
DECLARE
  -- P_PROTECTED bit mask
  p_protected  bit(32) = x'00100000';
  is_protected boolean = false;
  shell        text;
BEGIN
  -- if no pid supplied, use my own
  IF pid IS NULL OR pid < 0 THEN
    pid := pg_backend_pid();
  END IF;

  RAISE DEBUG 'Inspecting PostgreSQL process %', pid;

  shell :=    '/bin/ps -ax -o flags,flags2 -p '
                || pid
                || ' | /usr/bin/tail -n 1 ';
  -- ps prints the flags in hexadecimal, so load them as text
  CREATE TEMPORARY TABLE IF NOT EXISTS
            my_ps( flags text, flags2 text );
  TRUNCATE my_ps;
  EXECUTE format( '  COPY my_ps( flags , flags2 ) FROM PROGRAM $$ %s $$ WITH ( DELIMITER $$ $$, FORMAT TEXT)', shell );

   -- convert the hexadecimal text to a bit string, then mask it
   SELECT ( ( 'x' || lpad( flags, 8, '0' ) )::bit(32) & p_protected )::int > 0
   INTO is_protected
   FROM my_ps;

   RETURN is_protected;
END
$CODE$
LANGUAGE plpgsql
SECURITY DEFINER;

The idea is quite simple: the function takes a PID to check and, if none is specified, assumes we are interested in the current connection. Then the function creates (or empties) a temporary table my_ps to store the output of the ps shell command, in particular flags and flags2 (even if only the former is used). Since ps prints the flags in hexadecimal, they are loaded as text and converted to a bit string, so that the flag comparison becomes simple. Last, the flags field is combined by means of a bitwise AND with the p_protected internal variable, and the boolean result is returned.
Therefore, if the function returns true, the selected connection/backend process is protected against the OOM Killer.

Conclusions

As usual, FreeBSD reveals itself as a complex and well designed operating system. PostgreSQL can be protected against the OOM Killer in a more aggressive way than on Linux but, as usual, protecting everything is like protecting nothing at all, so I recommend not abusing the protect(1) command.

The article Preventing FreeBSD to kill PostgreSQL (aka OOM Killer prevention) has been posted by Luca Ferrari on April 2, 2021

Tags: postgresql , planet-postgresql-org , freebsd