Appendix F — Normalization Reference

A compact companion to Chapter 19: functional dependencies, the normal forms (1NF–BCNF, with a note on 4NF/5NF), the anomalies they cure, and a worked example.

The disease: redundancy → anomalies

Storing the same fact in multiple places (redundancy) causes:

  • Update anomaly — change one copy, others go stale (one customer, two emails).
  • Insertion anomaly — can't record a fact without an unrelated one (a product with no order).
  • Deletion anomaly — deleting a row loses unrelated facts (delete an order, lose product info).

Normalization eliminates redundancy so each fact lives in exactly one place.

Functional dependencies (FDs)

A → B ("A determines B"): for each value of A there is exactly one value of B. Normalization goal, as a slogan:

Every non-key attribute depends on the key, the whole key, and nothing but the key.

The normal forms

Form Requires Removes
1NF atomic values; no repeating groups/lists in a cell lists-in-cells, repeating columns
2NF 1NF + no partial dependency (on part of a composite key) facts depending on part of the key
3NF 2NF + no transitive dependency (non-key → non-key) facts depending on another non-key
BCNF every determinant is a candidate key rare 3NF edge cases (overlapping candidate keys)
  • 2NF only bites tables with composite keys.
  • 3NF is the practical baseline (usually coincides with BCNF).
  • 4NF (no multi-valued dependencies) and 5NF (join dependencies) handle independent multi-valued facts — rarely needed explicitly; good ER modeling (separate tables per independent relationship) usually achieves them.

Worked example (the Chapter 19 anchor)

Flat table (key (order_id, product_name)):

order_id, order_date, customer_name, customer_email, product_name, product_price, category, qty

→ 1NF: atomic values (already; ensure no lists/repeating groups).

→ 2NF (remove partial dependencies): - order_id → order_date, customer_* (depends on part of key) → split to orders. - product_name → product_price, category (depends on other part) → split to products. - Line data → order_items(order_id, product_name, qty).

→ 3NF (remove transitive dependencies): - In orders: order_id → customer_email → customer_name (transitive) → split customers. - In products: product_id → category_id → category (transitive) → split categories.

Result (≈ Mercado's schema):

customers(customer_id, name, email)
orders(order_id, order_date, customer_id)
categories(category_id, category)
products(product_id, name, price, category_id)
order_items(order_id, product_id, qty)

Every fact now lives once; all three anomalies are gone.

Denormalization (the deliberate reverse — Ch. 20)

Normalization optimizes write integrity at the cost of read joins. Denormalization re-introduces controlled redundancy for read speed — deliberately and documented (e.g., order_items.unit_price as a point-in-time copy). Distinguish:

  • Accidental redundancy of a current fact → a bug (it drifts).
  • Deliberate redundancy (point-in-time snapshot, cached aggregate with a maintenance plan) → a feature.

Normalize to 3NF by default; denormalize only on measured need, with a consistency plan (triggers / scheduled refresh / accepted staleness).

Quick checklist (audit a table)

  1. Any cell holding a list / repeating columns? → 1NF fix (child table).
  2. Composite key with an attribute depending on only part of it? → 2NF fix (split).
  3. A non-key attribute depending on another non-key attribute? → 3NF fix (split).
  4. Is each remaining redundancy deliberate (documented denormalization) or accidental (a bug)?

See also: Chapter 17 (ER modeling), Chapter 18 (ER→tables), Chapter 19 (Normalization), Chapter 20 (Denormalization).