I manage a substantial dataset comprising over 100,000 company profiles, each represented by several jsonb columns, including services, categories, and languages, all formatted as arrays. For weeks, background scripts diligently updated these columns without any apparent errors. However, I soon discovered that a significant portion of the records was failing to render values correctly. Containment filters, such as services @> '["SEO"]', returned empty results for rows that should have contained the expected services.
Upon investigation, I found that approximately 150,000 values across ten jsonb columns had become corrupted—not deleted, but rather stored in a format that appeared correct yet behaved incorrectly.
The one line that did the damage
The culprit was a seemingly innocuous line of code using postgres.js:
await sql` update listings set services = ${JSON.stringify(record.services)} where id = ${record.id}`;
Here, record.services contained ["SEO", "PPC"]. The instinct to use JSON.stringify was the root of the problem. The postgres.js library already handles serialization for you. When you provide a JavaScript array to a jsonb column, it stores it as a jsonb array. However, when you pass a string—what JSON.stringify returns—it stores it as a jsonb string.
This means that:
JSON.stringify(["SEO", "PPC"]) // => '["SEO","PPC"]' (a JS string)
does not translate into a jsonb array but rather becomes a jsonb string scalar containing JSON text:
"["SEO","PPC"]"
The array was double-encoded, wrapped in quotes and escaped.
Why it is so sneaky
- It throws no error. A jsonb column accepts a string scalar without complaint. Both
"hello"and"["SEO"]"are valid jsonb entries, allowing the insert to succeed. - It looks correct at a glance. When dumping the column, the data appears array-like. If you do not check the type, it seems valid.
- A cast does not save you. I assumed that an explicit
::jsonbwould resolve the issue:
set services = ${JSON.stringify(arr)}::jsonb // still wrong
This still results in the driver passing a string, and casting JSON-text-as-a-string to jsonb yields a jsonb string scalar instead of an array. The only indication of the issue is through behavior. Using jsonb_typeof reveals the truth:
select services, jsonb_typeof(services) from listings where id = '...';
This query would return:
"["SEO","PPC"]" | string <- should be 'array'
As a result, .map() in the UI produced no useful output, and @> matched nothing. The data was present yet inert.
Finding all of it
To detect the issue, I executed a query that focused on the type, one line per column:
select count(*) from listings where jsonb_typeof(services) = 'string';
Running this across all jsonb columns revealed a disproportionate amount of damage: two columns that my scripts wrote to most frequently were the hardest hit, with around 105,000 and 41,000 corrupted values respectively. The remaining jsonb columns shared the rest of the corrupted entries, totaling approximately 150,000 values, all affected by various scripts that shared the same JSON.stringify habit.
The repair
The solution involved extracting the inner text and re-casting it as valid jsonb. In Postgres, #>> '{}' allows you to extract a jsonb value as plain text, which, when applied to a string scalar, provides the underlying JSON text. This can then be cast back to jsonb, restoring the array:
update listings set services = (services #>> '{}')::jsonb where jsonb_typeof(services) = 'string';
However, two complications turned this one-liner into a small script:
- NUL bytes. Some scraped text contained
u0000escapes. Since Postgres jsonb cannot store NUL, the cast failed on those rows, necessitating their removal first:
set services = replace(services #>> '{}', 'u0000', '')::jsonb
The correct way to write jsonb with postgres.js
The key takeaway is straightforward: never use JSON.stringify on a value intended for a jsonb column. Allow the driver to handle serialization. With postgres.js, the explicit and safe method is:
await sql` update listings set services = ${sql.json(record.services)} where id = ${record.id}`;
For dynamic updates, wrap the values similarly:
JSON.stringify(["SEO", "PPC"]) // => '["SEO","PPC"]' (a JS string)
0
Using sql.json instructs the driver that the input is JSON, ensuring it is serialized only once. This is the complete solution.
What I took away from it
A corruption that generates no error and appears correct is the most costly type of issue. While a crash can be resolved within an hour, this problem lingered in production for weeks, silently degrading the most frequently accessed columns in the database. Every signal indicated that everything was fine, except for the one aspect I failed to check: the type.
Two habits emerged from this experience:
- When a value is destined for a jsonb column, serialization should be the driver’s responsibility, not mine. The moment I encounter JSON.stringify near a database write, I treat it as a potential bug until proven otherwise.
- Type invariants warrant testing. A query like
jsonb_typeof(col) = 'string'on an array column should always yield zero. This check now runs as part of a daily integrity job, ensuring that any errant script reintroducing the pattern is identified promptly, rather than weeks later.
FAQ
Why did JSON.stringify break a jsonb insert if jsonb is JSON?
The driver also performs serialization. In postgres.js, when you provide an array to a jsonb column, it is stored as a jsonb array. However, passing a string (the output of JSON.stringify) results in a jsonb string whose content is JSON text, leading to double encoding.
Does casting with ::jsonb fix it?
No, casting does not resolve the issue. The driver continues to send a string, and casting a JSON-text string to jsonb produces a jsonb string scalar, not an array. Avoid stringifying in the first place, or utilize your driver’s JSON helper.
How do I find double-encoded jsonb values?
Check the type. For a column that should contain arrays or objects, running select count(*) from t where jsonb_typeof(col) = 'string' should yield zero.
How do I repair them?
Extract the inner text using col #>> '{}' and cast it back: set col = (col #>> '{}')::jsonb where jsonb_typeof(col) = 'string'. If present, remove u0000 first, and loop through multiple passes to account for any repeated encoding.