Case Study 2 — Reconciling Two Systems with EXCEPT

Not every case study is a disaster. This one shows set operations doing exactly what they're best at: comparing two data sets to find precisely what's in one but not the other. EXCEPT in both directions is a reconciliation superpower.

Background

A company was migrating order data from a legacy system into a new PostgreSQL database. After the bulk import, the critical question was: did every order make it, and did anything get corrupted or duplicated in transit? With hundreds of thousands of orders, eyeballing was impossible. The team needed to know, precisely:

  • Which orders are in the legacy system but missing from the new one (lost in migration)?
  • Which orders are in the new system but not in the legacy one (spuriously created)?
  • Do the orders that exist in both agree on their key fields?

Set operations answered all three cleanly.

The reconciliation queries

Both systems exposed an order summary as (order_id, customer_id, order_date, total). (Assume a foreign-data-wrapper or a staging table legacy_orders made the legacy data queryable alongside the new orders.)

1. Orders missing from the new system — in legacy EXCEPT new:

SELECT order_id FROM legacy_orders
EXCEPT
SELECT order_id FROM orders;

If this returns any rows, those order ids were lost in migration. (Empty result = nothing lost.)

2. Orders that shouldn't be there — in new EXCEPT legacy:

SELECT order_id FROM orders
EXCEPT
SELECT order_id FROM legacy_orders;

Rows here are orders in the new system with no legacy counterpart — duplicates or spurious inserts.

3. Rows that exist in both but disagree on a field — compare full rows with EXCEPT both ways:

-- Rows in legacy whose exact values don't match new (same id, different data)
SELECT order_id, customer_id, order_date, total FROM legacy_orders
EXCEPT
SELECT order_id, customer_id, order_date, total FROM orders;

Because EXCEPT compares entire rows, any order whose customer_id, order_date, or total changed during migration shows up here — even though its order_id exists in both. Running it the other direction (orders EXCEPT legacy_orders) catches the mirror discrepancies.

4. A count sanity check with INTERSECT:

-- How many order_ids exist in BOTH systems?
SELECT order_id FROM legacy_orders
INTERSECT
SELECT order_id FROM orders;

If the INTERSECT count equals both source counts and both EXCEPTs are empty, the migration is provably complete and faithful.

The outcome

The first run revealed 37 orders in legacy EXCEPT new (lost — traced to a malformed character in an address field that aborted their import batch) and 0 in new EXCEPT legacy (nothing spurious). The full-row EXCEPT surfaced 12 orders whose total differed by a cent — a rounding difference from a currency-conversion change, caught before it reached finance. All were fixed, the queries re-run until both EXCEPTs returned empty, and the migration was signed off with evidence rather than hope.

The analysis

  1. EXCEPT is the precise tool for "what's in A but not B." Two EXCEPTs (A−B and B−A) together give you a complete picture of differences between two sets — the foundation of any reconciliation.

  2. Compare full rows to catch value-level drift. Selecting all the key columns (not just the id) makes EXCEPT flag rows that exist in both but disagree on content. This catches corruption that an id-only check would miss.

  3. INTERSECT confirms overlap. "How many are in both?" plus "both differences are empty" is a complete, auditable proof of a faithful copy.

  4. Set operations turn "looks right" into "provably right." For migrations, dedups, audits, and system reconciliations, these operators replace spot-checking with exhaustive comparison. Empty EXCEPT results are a strong, checkable guarantee.

  5. This is the constructive twin of the anti-join. The same idea (EXCEPT = difference) that finds "customers who never ordered" finds "orders that didn't migrate." One mental tool, many uses.

Discussion questions

  1. Explain what legacy_orders EXCEPT orders tells you, and why you must also run it the other way.
  2. Why does selecting all key columns (not just order_id) catch value-level corruption?
  3. How do two empty EXCEPTs plus a matching INTERSECT count constitute a proof of a faithful migration?
  4. Why are set operations better than manual spot-checking for this task?
  5. ⭐ Could you do the same reconciliation with NOT EXISTS joins? Sketch it, and say when the EXCEPT form is clearer.