Appendix F — Normalization Reference

A compact companion to Chapter 19: functional dependencies, the normal forms (1NF–BCNF, with a note on 4NF/5NF), the anomalies they cure, and a worked example.

The disease: redundancy → anomalies

Storing the same fact in multiple places (redundancy) causes:

Update anomaly — change one copy, others go stale (one customer, two emails).
Insertion anomaly — can't record a fact without an unrelated one (a product with no order).
Deletion anomaly — deleting a row loses unrelated facts (delete an order, lose product info).

Normalization eliminates redundancy so each fact lives in exactly one place.

Functional dependencies (FDs)

A → B ("A determines B"): for each value of A there is exactly one value of B. Normalization goal, as a slogan:

Every non-key attribute depends on the key, the whole key, and nothing but the key.

The normal forms

Form	Requires	Removes
1NF	atomic values; no repeating groups/lists in a cell	lists-in-cells, repeating columns
2NF	1NF + no partial dependency (on part of a composite key)	facts depending on part of the key
3NF	2NF + no transitive dependency (non-key → non-key)	facts depending on another non-key
BCNF	every determinant is a candidate key	rare 3NF edge cases (overlapping candidate keys)

2NF only bites tables with composite keys.
3NF is the practical baseline (usually coincides with BCNF).
4NF (no multi-valued dependencies) and 5NF (join dependencies) handle independent multi-valued facts — rarely needed explicitly; good ER modeling (separate tables per independent relationship) usually achieves them.

Worked example (the Chapter 19 anchor)

Flat table (key (order_id, product_name)):

order_id, order_date, customer_name, customer_email, product_name, product_price, category, qty

→ 1NF: atomic values (already; ensure no lists/repeating groups).

→ 2NF (remove partial dependencies): - order_id → order_date, customer_* (depends on part of key) → split to orders. - product_name → product_price, category (depends on other part) → split to products. - Line data → order_items(order_id, product_name, qty).

→ 3NF (remove transitive dependencies): - In orders: order_id → customer_email → customer_name (transitive) → split customers. - In products: product_id → category_id → category (transitive) → split categories.

Result (≈ Mercado's schema):

customers(customer_id, name, email)
orders(order_id, order_date, customer_id)
categories(category_id, category)
products(product_id, name, price, category_id)
order_items(order_id, product_id, qty)

Every fact now lives once; all three anomalies are gone.

Denormalization (the deliberate reverse — Ch. 20)

Normalization optimizes write integrity at the cost of read joins. Denormalization re-introduces controlled redundancy for read speed — deliberately and documented (e.g., order_items.unit_price as a point-in-time copy). Distinguish:

Accidental redundancy of a current fact → a bug (it drifts).
Deliberate redundancy (point-in-time snapshot, cached aggregate with a maintenance plan) → a feature.

Normalize to 3NF by default; denormalize only on measured need, with a consistency plan (triggers / scheduled refresh / accepted staleness).

Quick checklist (audit a table)

Any cell holding a list / repeating columns? → 1NF fix (child table).
Composite key with an attribute depending on only part of it? → 2NF fix (split).
A non-key attribute depending on another non-key attribute? → 3NF fix (split).
Is each remaining redundancy deliberate (documented denormalization) or accidental (a bug)?

See also: Chapter 17 (ER modeling), Chapter 18 (ER→tables), Chapter 19 (Normalization), Chapter 20 (Denormalization).