Case Study 2 — The Idempotent Import: Fixing a Check-Then-Insert Race
"Check if it exists, then insert if it doesn't" looks correct and works in testing — then fails intermittently in production under concurrency.
INSERT ... ON CONFLICTfixes it atomically. This is a small story with a big lesson about why some bugs only appear when two things happen at once.
Background
A company synced product data from a supplier feed every few minutes. The job upserted each product: if a product with that sku existed, update it; otherwise insert it. The original code did the obvious thing — check, then write:
# Pseudocode of the original sync (check-then-insert)
row = db.query("SELECT product_id FROM products WHERE sku = %s", sku)
if row:
db.execute("UPDATE products SET name=%s, price=%s WHERE sku=%s", name, price, sku)
else:
db.execute("INSERT INTO products (sku, name, price, ...) VALUES (%s, %s, %s, ...)", ...)
In testing — one job, one thread — it worked perfectly. In production, the logs filled with sporadic errors:
ERROR: duplicate key value violates unique constraint "products_sku_key"
DETAIL: Key (sku)=(LAP-001) already exists.
Some sync runs failed partway through, leaving the catalog partially updated, and on-call kept "fixing" it by re-running the job.
What went wrong: a race condition
The feed sometimes ran as two overlapping workers (a retry fired before the first finished, or the schedule overlapped). Two workers could process the same new sku at nearly the same instant:
Worker A: SELECT sku 'LAP-001' → not found
Worker B: SELECT sku 'LAP-001' → not found (A hasn't inserted yet)
Worker A: INSERT 'LAP-001' → succeeds
Worker B: INSERT 'LAP-001' → ERROR: duplicate key
Both workers "checked" and saw nothing, because the check and the insert were separate steps with a gap between them. In that gap, the other worker inserted. The UNIQUE(sku) constraint then correctly rejected the second insert — but the application wasn't expecting it, so the job crashed. This is a race condition: a bug that exists only because of the interleaving of concurrent operations, invisible when one thing runs at a time.
The UNIQUE constraint did its job (it prevented duplicate skus — integrity held). The bug was that the application logic assumed check-then-insert was atomic. It isn't.
The fix: make it atomic with ON CONFLICT
INSERT ... ON CONFLICT performs the "insert, or update if it already exists" as a single atomic statement, so there's no gap for another worker to slip into:
INSERT INTO products (sku, name, price, category_id, supplier_id)
VALUES (%s, %s, %s, %s, %s)
ON CONFLICT (sku)
DO UPDATE SET name = EXCLUDED.name,
price = EXCLUDED.price,
category_id = EXCLUDED.category_id,
supplier_id = EXCLUDED.supplier_id;
Now, whichever worker reaches the row first inserts it; the other hits the conflict and updates instead of erroring. No crash, no partial runs, and — crucially — the operation is idempotent: running the sync twice produces the same result as running it once. The intermittent errors vanished, and the job became safe to retry.
The analysis
-
Check-then-act is not atomic. Any pattern of "read, decide, then write" has a gap between the read and the write where another session can change the world. Under concurrency, that gap is a bug. (You'll study this formally as the "lost update" and related anomalies in Chapter 27.)
-
The constraint caught it — that's the system working. The
UNIQUEconstraint preventing the duplicate is exactly what you want; integrity was never violated. The fix isn't to remove the constraint (never do that), it's to make the application's write atomic so it doesn't fight the constraint. -
ON CONFLICTcollapses check-and-write into one atomic step. The database does the existence check and the insert/update as a unit, with no exploitable gap. This is the right tool for "insert or update" — and for any idempotent load or sync. -
Idempotency is a feature. A job you can safely re-run (because it converges to the same state) is dramatically easier to operate — retries, overlaps, and replays become harmless. Upsert is a primary tool for building idempotent data jobs.
-
Concurrency bugs hide in single-threaded testing. The code was "correct" with one worker. Always ask of write logic: what if two of these run at the same time? If the answer is "trouble," you need atomicity (a single statement, or a transaction with the right isolation — Chapters 26–27).
Discussion questions
- Draw the interleaving of two workers that produces the duplicate-key error. Where exactly is the gap?
- Why is removing the
UNIQUEconstraint the wrong fix? - How does
ON CONFLICTeliminate the race? What makes it atomic where check-then-insert isn't? - Define idempotency and explain why the upsert version is safe to re-run while the original wasn't.
- ⭐ Name two other common "check-then-act" patterns (besides insert) that have the same race hazard, and how you'd make each atomic.