Case Study 1 — The Anti-Join That Silently Returned Nothing
The
NOT IN-with-NULLtrap, in production. A nightly job that found "inactive accounts" worked for two years, then one day quietly stopped finding any — because a singleNULLslipped into the subquery's result.
Background
A SaaS company ran a nightly job to identify accounts with no activity, so it could send re-engagement emails and eventually flag them for archival. The query found accounts that had no row in the activity_log:
-- "Accounts with no logged activity" ❌ NULL-fragile
SELECT account_id, email
FROM accounts
WHERE account_id NOT IN (SELECT account_id FROM activity_log);
For two years it worked. Then, after a release, the job started returning zero inactive accounts every night. No error, no alert — the job "succeeded" and simply emailed no one. The re-engagement program silently went dark for six weeks before anyone noticed the email volume had flatlined.
What went wrong
The release added a new code path that occasionally inserted an activity_log row with a NULL account_id (an anonymous, pre-login event). That single NULL in the subquery's result was enough to break NOT IN completely.
Here's the mechanism (three-valued logic, Chapters 3 and 5). account_id NOT IN (1, 2, NULL) is evaluated as:
account_id <> 1 AND account_id <> 2 AND account_id <> NULL
└──────────────┘
this is always NULL (unknown)
Because account_id <> NULL is NULL (never true), the whole AND chain can never be true for any account — at best it's NULL, which WHERE treats as "exclude." So the moment the subquery contains a single NULL, NOT IN returns no rows at all, regardless of the data. The "find inactive accounts" query became "find nothing."
The fix
Switch to NOT EXISTS, which is NULL-safe:
-- NULL-safe anti-join
SELECT a.account_id, a.email
FROM accounts a
WHERE NOT EXISTS (
SELECT 1 FROM activity_log l WHERE l.account_id = a.account_id
);
NOT EXISTS asks, per account, "does any activity row match this account_id?" A row with a NULL account_id simply doesn't match any real account, so it's harmlessly ignored — it can't poison the entire result the way it does inside NOT IN. The job immediately started finding inactive accounts again.
Two further hardening steps:
- Fix the data at the source. A
NULLaccount_idinactivity_logwas itself a bug; the column was madeNOT NULL(with a sentinel "anonymous" account for pre-login events), so the situation can't recur. - Add a sanity assertion. The job now logs the count of inactive accounts and alerts if it drops to zero (or changes by more than, say, 90% night-over-night) — so a silent "found nothing" can never again go unnoticed for weeks.
The analysis
-
NOT IN+ aNULLin the subquery = no rows. This is the single most dangerous beginner-and-expert SQL trap, because the query is correct for years until aNULLappears, then fails silently and totally. TreatNOT IN (subquery)as a code smell. -
Prefer
NOT EXISTSfor anti-joins. It'sNULL-safe, it short-circuits, and it usually optimizes well. Make it your default for every "X with no matching Y." -
NULL-safety is a property you must design for. The deeper fix wasn't just the query — it was making the columnNOT NULLso the bad value couldn't exist. Constraints (Chapter 3) prevent the data states that break queries. -
Silent "success" is worse than failure. The job didn't crash; it ran clean and did nothing. Jobs that produce no output on a normally-nonempty task should assert against that — "found zero" is often a bug, not a result. Build that check in.
-
Test queries against
NULL-containing data. The bug would have surfaced instantly in a test where the subquery's source contained aNULL. Anti-join queries deserve a "what if there's a NULL in here?" test case.
Discussion questions
- Walk through why one
NULLin the subquery makesNOT INreturn zero rows for every account. - Why is
NOT EXISTSimmune to this problem? - The team also made
account_idNOT NULL. Why fix the data and the query, rather than just one? - The job ran "successfully" for six weeks while doing nothing. What monitoring would catch this class of silent failure?
- ⭐ Are there cases where
NOT INis fine? State the precise condition under whichNOT IN (subquery)is safe.