Query of the week: Find large field values in a jsonb column in PostgreSQL

The table is named 'foods'. The query uses a few interesting jsonb functions and runs at a reasonable clip. It lists the fields in a jsonb column named 'details_jsonb' that have a large average value length.

SELECT key, AVG(char_length(value)) AS value_length
FROM (SELECT (jsonb_each_text(details_jsonb)).* FROM foods) k
GROUP BY key
ORDER BY value_length DESC
LIMIT 250;
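
As an aside, since PostgreSQL 9.3 the same query can be written with a lateral function call in FROM, which avoids the (f(x)).* expansion in the SELECT list. A sketch, assuming the same table and column names:

SELECT key, AVG(char_length(value)) AS value_length
FROM foods
CROSS JOIN LATERAL jsonb_each_text(details_jsonb) AS t(key, value)
GROUP BY key
ORDER BY value_length DESC
LIMIT 250;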

Bonus:

Once you find those fields and they turn out to be noise, you may want to remove them with a query like so:

UPDATE foods
SET details_jsonb = details_jsonb - ARRAY(
  SELECT ky
  FROM (SELECT jsonb_object_keys(details_jsonb) AS ky) k
  WHERE ky LIKE 'ingredients_text_%_ocr_%');
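
Before running the UPDATE it can be worth a dry run: the same ARRAY() expression works in a plain SELECT, so you can preview which keys would be dropped per row. A sketch; the id column is an assumption, substitute your own primary key:

-- 'id' is a hypothetical primary key column; adjust to your schema.
SELECT id,
       ARRAY(SELECT ky
             FROM (SELECT jsonb_object_keys(details_jsonb) AS ky) k
             WHERE ky LIKE 'ingredients_text_%_ocr_%') AS keys_to_remove
FROM foods
LIMIT 10;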

What's interesting here is the subquery that is converted into an array with ARRAY() and passed to the - (minus) operator on a jsonb field: the jsonb - text[] operator deletes all matching top-level keys. It runs at a reasonable pace as well. You could (probably) even feed the first query, wrapped in another subquery, into the second, like so:

UPDATE foods
SET details_jsonb = details_jsonb - ARRAY(
  SELECT key
  FROM (SELECT key, AVG(char_length(value)) AS value_length
        FROM (SELECT (jsonb_each_text(details_jsonb)).* FROM foods) k1
        GROUP BY key
        ORDER BY value_length DESC
        LIMIT 250) k2);

... I haven't tested its performance with a large array, so go wild (after backups) trying different subqueries turned into arrays.
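
Speaking of backups, here is a minimal sketch of two cheap safety nets before a destructive UPDATE. The foods_backup table name and the 'some_noisy_key' key are hypothetical placeholders:

-- Snapshot the table so the UPDATE is reversible.
CREATE TABLE foods_backup AS SELECT * FROM foods;

-- Or run the UPDATE inside a transaction and inspect the result
-- before making it permanent ('some_noisy_key' is a placeholder):
BEGIN;
UPDATE foods SET details_jsonb = details_jsonb - ARRAY['some_noisy_key'];
SELECT details_jsonb FROM foods LIMIT 5;
ROLLBACK;  -- or COMMIT once it looks right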

Hope it helps.

Posted: 13 May 2022