Appendix F — Normalization Reference
A compact companion to Chapter 19: functional dependencies, the normal forms (1NF–BCNF, with a note on 4NF/5NF), the anomalies they cure, and a worked example.
The disease: redundancy → anomalies
Storing the same fact in multiple places (redundancy) causes:
- Update anomaly — change one copy, others go stale (one customer, two emails).
- Insertion anomaly — can't record a fact without an unrelated one (a product with no order).
- Deletion anomaly — deleting a row loses unrelated facts (delete an order, lose product info).
Normalization eliminates redundancy so each fact lives in exactly one place.
Functional dependencies (FDs)
A → B ("A determines B"): for each value of A there is exactly one value of B. Normalization goal, as a slogan:
Every non-key attribute depends on the key, the whole key, and nothing but the key.
The normal forms
| Form | Requires | Removes |
|---|---|---|
| 1NF | atomic values; no repeating groups/lists in a cell | lists-in-cells, repeating columns |
| 2NF | 1NF + no partial dependency (on part of a composite key) | facts depending on part of the key |
| 3NF | 2NF + no transitive dependency (non-key → non-key) | facts depending on another non-key |
| BCNF | every determinant is a candidate key | rare 3NF edge cases (overlapping candidate keys) |
- 2NF only bites tables with composite keys.
- 3NF is the practical baseline (usually coincides with BCNF).
- 4NF (no multi-valued dependencies) and 5NF (join dependencies) handle independent multi-valued facts — rarely needed explicitly; good ER modeling (separate tables per independent relationship) usually achieves them.
Worked example (the Chapter 19 anchor)
Flat table (key (order_id, product_name)):
order_id, order_date, customer_name, customer_email, product_name, product_price, category, qty
→ 1NF: atomic values (already; ensure no lists/repeating groups).
→ 2NF (remove partial dependencies):
- order_id → order_date, customer_* (depends on part of key) → split to orders.
- product_name → product_price, category (depends on other part) → split to products.
- Line data → order_items(order_id, product_name, qty).
→ 3NF (remove transitive dependencies):
- In orders: order_id → customer_email → customer_name (transitive) → split customers.
- In products: product_id → category_id → category (transitive) → split categories.
Result (≈ Mercado's schema):
customers(customer_id, name, email)
orders(order_id, order_date, customer_id)
categories(category_id, category)
products(product_id, name, price, category_id)
order_items(order_id, product_id, qty)
Every fact now lives once; all three anomalies are gone.
Denormalization (the deliberate reverse — Ch. 20)
Normalization optimizes write integrity at the cost of read joins. Denormalization re-introduces controlled redundancy for read speed — deliberately and documented (e.g., order_items.unit_price as a point-in-time copy). Distinguish:
- Accidental redundancy of a current fact → a bug (it drifts).
- Deliberate redundancy (point-in-time snapshot, cached aggregate with a maintenance plan) → a feature.
Normalize to 3NF by default; denormalize only on measured need, with a consistency plan (triggers / scheduled refresh / accepted staleness).
Quick checklist (audit a table)
- Any cell holding a list / repeating columns? → 1NF fix (child table).
- Composite key with an attribute depending on only part of it? → 2NF fix (split).
- A non-key attribute depending on another non-key attribute? → 3NF fix (split).
- Is each remaining redundancy deliberate (documented denormalization) or accidental (a bug)?
See also: Chapter 17 (ER modeling), Chapter 18 (ER→tables), Chapter 19 (Normalization), Chapter 20 (Denormalization).