Query of the week: Find large field values in a jsonb column in PostgreSQL
The table is named 'foods'. The query uses a few interesting jsonb functions and runs at a reasonable clip. It lists the fields in a jsonb column named 'details_jsonb' that have a large average value length.
SELECT key, AVG(char_length(value)) AS value_length
FROM (SELECT (jsonb_each_text(details_jsonb)).* FROM foods) k
GROUP BY key
ORDER BY value_length DESC
LIMIT 250;
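For reference, the same aggregation can also be written with a LATERAL join, which is often considered the more idiomatic way to call a set-returning function. This variant is a sketch of my own, not part of the original query, but under the same assumptions (a 'foods' table with a jsonb column 'details_jsonb') it should produce the same result:

-- Equivalent formulation using LATERAL instead of expanding the
-- set-returning function in the SELECT list (a sketch, same assumptions).
SELECT kv.key, AVG(char_length(kv.value)) AS value_length
FROM foods,
     LATERAL jsonb_each_text(details_jsonb) AS kv(key, value)
GROUP BY kv.key
ORDER BY value_length DESC
LIMIT 250;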
Bonus:
Once you find those fields and they turn out to be noise, you may want to remove them with a query like the following:
UPDATE foods SET details_jsonb = details_jsonb - ARRAY(
  SELECT ky
  FROM (SELECT jsonb_object_keys(details_jsonb) AS ky) k
  WHERE ky LIKE 'ingredients_text_%_ocr_%'
);
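Before running that UPDATE, a read-only dry run can show what would be removed. This preview query is my own addition, not from the original post; since keys are unique within a jsonb object, the count equals the number of rows carrying each matching key:

-- Preview (my addition, read-only): which matching keys exist,
-- and in how many rows each one appears.
SELECT k.ky, COUNT(*) AS rows_affected
FROM foods,
     LATERAL jsonb_object_keys(details_jsonb) AS k(ky)
WHERE k.ky LIKE 'ingredients_text_%_ocr_%'
GROUP BY k.ky
ORDER BY rows_affected DESC;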
The interesting part here is the subquery that is converted into an array with ARRAY() and passed to the - (minus) operator on the jsonb field. This runs at a reasonable pace as well. You could (probably) even feed the first query, wrapped in another subquery, into the second, like so:
UPDATE foods SET details_jsonb = details_jsonb - ARRAY(
  SELECT key FROM (
    SELECT key, AVG(char_length(value)) AS value_length
    FROM (SELECT (jsonb_each_text(details_jsonb)).* FROM foods) k1
    GROUP BY key
    ORDER BY value_length DESC
    LIMIT 250
  ) k2
);
I haven't tested how this performs with a large array, so go wild (after backups) trying different subqueries turned into arrays.
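On the "after backups" point: one cheap safety net is to snapshot the column first so individual rows can be restored. This is a sketch of my own, assuming 'foods' has a primary key column named 'id' (the original post doesn't say):

-- Snapshot the jsonb column before the destructive UPDATE
-- (assumes a primary key column named 'id', which is a guess).
CREATE TABLE foods_details_backup AS
SELECT id, details_jsonb FROM foods;

Running the UPDATE inside an explicit transaction (BEGIN; ... then ROLLBACK; or COMMIT;) also lets you inspect the result before making it permanent.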
Hope it helps.
Posted: 13 May 2022