Who needs comments?

My friend and colleague Enrico told me about one of those hidden features of pg_dump: --no-comments.

The option allows you to dump the database (or the part of it) without dumping any user defined comment, that is no comment on tables, data types, and nothing you placed with an explicit COMMENT ON statement.
This made me lough at firts: why should I don’t want comments on my dump? Are we still back in the ninenties where people thought that hiding information was a good strategy to ensure their job?
However, there are some cases I can think about where you don’t want comments. For example, some extensions use comments on objects to perform some magic, and pgaudit comes to mind. But it is not always true that you need to replicate the same configuration on another database, hence you should strip off the comments.

How pg_dump avoids comments

Having a quick look at pg_dump source code, the function dumpTableComment represent a good introduction to how the comments are dumped or not. In particular, in the very beginning of the function, you can find something like:

/* do nothing, if --no-comments is supplied */
if (dopt->no_comments)
	return;

/* Comments are SCHEMA not data */
if (dopt->dataOnly)
	return;

/* Search for comments associated with relation, using table */
ncomments = findComments(fout,
						 tbinfo->dobj.catId.tableoid,
						 tbinfo->dobj.catId.oid,
						 &comments);



If the --no-comments command line option is set (i.e., dopt->no_comments is true), the function returns immediatly since there is nothing to do.
Interestingly, if the user wants to dump only the data for the database, and not its schema, the comments are not dumped too. That’s quite obvious if you think about.
The findComments function is in charge of going to the storage to retrieve the comments, and it does in a strange way. It invokes, in turn, collectComments, that executes a query like the following one:

appendPQExpBufferStr(query, "SELECT description, classoid, objoid, objsubid "
						 "FROM pg_catalog.pg_description "
						 "ORDER BY classoid, objoid, objsubid");


Do you see something strange there? There is no WHERE clause in the query! It does mean that the function is going to get all the comments from all the objects in the database, as reported also by the function comments preamble:

/*
 * collectComments --
 *
 * Construct a table of all comments available for database objects.
 * We used to do per-object queries for the comments, but it's much faster
 * to pull them all over at once, and on most databases the memory cost
 * isn't high.
 *
 * The table is sorted by classoid/objid/objsubid for speed in lookup.
 */


The idea is that all the comments are retrieved on a single pass, and then findComments performs a kind of binary search to find out the exact range of comments that match the object it is dumping at that moment (i.e., the table).

The article Who needs comments? has been posted by Luca Ferrari on August 18, 2020