How a few lines of code reduced database load by a few million queries
Back when I originally published Slonik PostgreSQL client for Node.js, I was working with large datasets, inserting tens of millions of records per hour. However, the shape of the data being inserted was usually pretty flat and therefore pretty easy to insert using INSERT INTO ... SELECT * FROM unnset() pattern
. At the time, I recommended the unnest
pattern as the go-to pattern for inserting large number of rows in batches (that was part I).
However, today I’ve found a better way: jsonb_to_recordset.
jsonb_to_recordset
expands the top-level JSON array of objects to a set of rows having the composite type defined by anAS
clause.
Put it simply, jsonb_to_recordset
allows to convert arbitrary JSON into a recordset which we can use to query and insert records from, just like with unnest. However, unlike unnest, since we are simply passing a JSON to PostgreSQL, the resulting format is a lot more expressive and powerful.
SELECT *
FROM json_to_recordset('[{"name":"John","tags":["foo","bar"]},{"name":"Jane","tags":["baz"]}]')
AS t1(name text, tags text[]);
name | tags
------+-----------
John | {foo,bar}
Jane | {baz}
(2 rows)
Let’s demonstrate how you would use it to insert data.
Inserting data using json_to_recordset
Let’s say you need to insert a list of persons into the database, and each person has an array of attributes associated with them.
const persons = [
{
name: 'John',
tags: ['foo', 'bar']
},
{
name: 'Jane',
tags: ['baz']
}
];
You might be tempted to simply iterate through the array and fire a separate INSERT
query for each record, e.g.
for (const person of persons) {
await pool.query(sql`
INSERT INTO person (name, tags)
VALUES (
${person.name},
${sql.array(person.tags, 'text[]')}
)
`);
}
When dealing with a few records, this is even preferable, because it is easy to read and understand. However, if you are like me and you find yourself debugging an INSERT
query that is called 2M+ times per day, then chances are that batching inserts is preferable.
What prompted the search for better alternatives.
You could also attempt to insert using unnest
pattern and it would look something like this:
await pool.query(sql`
INSERT INTO public.person (name, tags)
SELECT t1.name, t1.tags::text[]
FROM unnest(
${sql.array(['John', 'Jane'], 'text')},
${sql.array(['{foo,bar}', '{baz}'], 'text')}
) AS t1.(name, tags);
`);
The problem here is that you are effective responsible for converting arrays into PostgreSQL array string representation and passing them as text parameters, which is ugly. You also need to iterate the array to construct slices that represent values for each column, which is also quite ugly.
However, thanks to jsonb_to_recordset
, we can achieve the same result with:
await pool.query(sql`
INSERT INTO person (name, tags)
SELECT *
FROM jsonb_to_recordset(${sql.jsonb(persons)}) AS t(name text, tags text[])
`);
In contrast to the unnest
approach, using jsonb_to_recordset
we can easily insert complex nested data structures, and we can pass the original JSON document to the query without needing to manipulate it.
In terms of performance they are also exactly the same. As such, my current recommendation is to prefer jsonb_to_recordset
whenever inserting lots of rows or nested data structures.