Chapter 19 — Key Takeaways

The big idea

Normalization structures tables to eliminate redundancy — the root cause of data anomalies. Each fact should live in exactly one place; then it can't contradict itself. This is theme #1's deepest tool.

The disease and its symptoms

  • Redundancy (same fact stored in many places) → three anomalies:
  • Update: change one copy, others go stale (two emails for one customer).
  • Insertion: can't record a fact without an unrelated one (a product with no order).
  • Deletion: deleting a row loses unrelated facts (delete an order, lose product info).

The tool: functional dependencies

A → B = "A determines B" (each A has exactly one B). Slogan: every non-key attribute depends on the key, the whole key, and nothing but the key.

The normal forms

Form Removes Informal rule
1NF lists-in-cells atomic values, no repeating groups
2NF partial dependencies non-key depends on the whole (composite) key
3NF transitive dependencies no non-key → non-key
BCNF rare 3NF edge cases every determinant is a candidate key
  • 3NF is the practical baseline (usually = BCNF). Stopping at 2NF leaves transitive-dependency redundancy.
  • Normalizing the flat orders table to 3NF yields (essentially) Mercado's actual schema — the anchor's payoff.

Subtlety: it's a tool, not dogma

  • Normalization optimizes write integrity at the cost of read-time joins.
  • Normalize to 3NF by default, then denormalize selectively and consciously for measured read performance (Chapter 20).
  • Don't over-normalize prematurely (10-way-join schemas before any perf problem).

Current fact vs. historical snapshot

  • Copying a current fact (warehouse_region on shipments) = a 3NF bug (it drifts — Case Study 2).
  • Recording a point-in-time value deliberately (order_items.unit_price at sale) = intentional denormalization (a feature, documented).
  • Test: "does this need to stay in sync (bug) or capture a moment (feature)?"

You can now…

  • ☐ Spot update/insertion/deletion anomalies and trace them to redundancy.
  • ☐ Write functional dependencies and use them to drive splits.
  • ☐ Normalize to 1NF/2NF/3NF (and recognize BCNF) step by step.
  • ☐ Explain which class of redundancy each normal form removes.
  • ☐ Distinguish accidental redundancy (a bug) from deliberate denormalization (a choice).

Looking ahead

Chapter 20 — Denormalization. The other half of the anchor: when and how to deliberately re-introduce redundancy for read performance, the trade-offs you accept, and why "normalize, then selectively denormalize" is the professional default.

One sentence to carry forward: Normalize so every fact lives once — then, and only with a measured reason, denormalize on purpose; redundancy by accident is a bug, redundancy by decision is a tool.