Case Study 2 — The Transitive Dependency Nobody Noticed
Not every normalization failure is a wide spreadsheet table. This one was subtle: a single transitively-dependent column hid in an otherwise-clean schema and silently produced inconsistent data for months. A textbook 3NF violation, in the wild.
Background
A logistics company had a reasonable-looking shipments table:
CREATE TABLE shipments (
shipment_id integer PRIMARY KEY,
order_id integer NOT NULL REFERENCES orders(order_id),
warehouse_id integer NOT NULL REFERENCES warehouses(warehouse_id),
warehouse_region text, -- ⚠️ the culprit
shipped_at timestamptz,
carrier text
);
At a glance it's fine — keys, foreign keys, atomic values. But warehouse_region is a problem: a warehouse's region is a fact about the warehouse, not the shipment. The dependency chain is shipment_id → warehouse_id → warehouse_region — a transitive dependency, a 3NF violation. The region was being copied onto every shipment from that warehouse.
Nobody noticed at design time because the table looked clean and the region was "convenient to have on the shipment for reporting."
What went wrong
Months later, the company reorganized its regions — warehouse #2 moved from 'US-East' to 'US-Northeast'. The warehouses table was updated (one row). But shipments.warehouse_region held thousands of copies of the old 'US-East' value, and they were not updated (and updating them retroactively was itself ambiguous — should historical shipments show the old region or the new?).
Now the same warehouse reported two different regions depending on whether you read warehouses.region (new) or shipments.warehouse_region (old, for past shipments). Regional shipping reports — which joined neither consistently — produced numbers that didn't reconcile, and a logistics dashboard showed a region that, per the warehouses table, no longer existed. It took days to discover that the "regional discrepancy" was a duplicated, un-updated column, not a data-entry error.
This is the update anomaly in miniature: a fact (warehouse_region) stored redundantly (on every shipment) drifted out of sync with its source of truth (warehouses).
The fix
Remove the transitively-dependent column; derive the region from the warehouse via a join when needed:
ALTER TABLE shipments DROP COLUMN warehouse_region;
-- region comes from the warehouse, always current and consistent:
SELECT s.shipment_id, w.region
FROM shipments s
JOIN warehouses w ON w.warehouse_id = s.warehouse_id;
Now there is exactly one place a warehouse's region lives (warehouses.region); every report joins to it and gets the current value, consistently. The reorganization becomes a one-row update with no drift. If region-by-shipment appears in many queries, a view (Chapter 15) can package the join so it's as convenient as the old column — without the redundancy.
The deliberate-redundancy nuance
There's one legitimate exception worth noting: if the business needs the region as it was at ship time (point-in-time history), then storing it on the shipment is a deliberate denormalization (Chapter 20) — capturing a historical fact, like order_items.unit_price. The key difference: that's a conscious decision to record history, documented as such, not an accidental copy of a current fact. The logistics team decided they wanted current region (reports should reflect today's org), so removing the column was right. Had they wanted historical region, they'd keep it — but knowingly, and labeled as a historical snapshot.
The analysis
-
3NF violations hide in "convenient" columns.
warehouse_regionlooked harmless — just a handy field on the shipment. But it depended onwarehouse_id, not on the shipment, making it transitively dependent and redundant. The convenience masked the redundancy. -
Redundant copies of current facts drift. The instant the source (
warehouses.region) changes, every copy is stale unless perfectly synchronized — which never holds. The result is two sources of truth that disagree (the update anomaly). -
The fix is to derive, not store. Remove the column; get the region via a join (optionally packaged in a view). One source of truth, always consistent.
-
Distinguish "current fact" from "historical snapshot." Storing a current fact redundantly is a 3NF bug. Storing a point-in-time value deliberately (unit price at sale, region at ship time if the business wants history) is intentional denormalization — fine, when chosen and documented. The test: are you copying a fact that should stay in sync (bug) or recording a fact as-of-a-moment (feature)?
-
Normal forms catch subtle bugs, not just spreadsheet messes. A schema can look clean and still harbor a single transitive dependency that causes months of confusion. Checking each non-key column — "does this depend on the key, or on another column?" — catches them.
Discussion questions
- State the dependency chain that makes
warehouse_regiona 3NF violation. - Why did the regional reports stop reconciling after the reorganization?
- How does deriving region via a join (or a view) eliminate the anomaly?
- When would storing the region on the shipment be the correct, deliberate choice?
- ⭐ Give a checklist question to ask of every column in a table to catch transitive dependencies before they ship.