Case Study 1 — The Spreadsheet Database

DataField.Dev

Case Study 1 — The Spreadsheet Database

The normalization-disaster anchor, in a real company. A team built their core data as one wide, denormalized table because it "matched the spreadsheet everyone already used." Within a year, all three anomalies were causing daily pain — and the cure was step-by-step normalization to 3NF.

Background

A field-services company tracked jobs in a single table that mirrored their old spreadsheet — one row per line item of work, with everything denormalized onto each row:

 job_id | job_date  | client_name | client_email      | client_tier | tech_name | service     | service_price | qty
--------+-----------+-------------+-------------------+-------------+-----------+-------------+---------------+----
  1001  | 2024-03-01| Acme Corp   | ap@acme.example   | gold        | Sam Lee   | Inspection  | 150.00        | 1
  1001  | 2024-03-01| Acme Corp   | ap@acme.example   | gold        | Sam Lee   | Repair      | 400.00        | 2
  1002  | 2024-03-02| Beta LLC    | pay@beta.example  | standard    | Sam Lee   | Inspection  | 150.00        | 1

Convenient to eyeball. A swamp to operate.

The anomalies, in production

Update anomaly. Acme changed their AP email. It lived on every line of every Acme job — hundreds of rows. A junior updated the recent ones and missed the old; now reports grouped Acme under two emails, and an automated invoice went to the stale address. (Exactly the redundancy disease.)

Update anomaly, again. The "Inspection" service price rose from $150 to $175. It was stored on every inspection line ever. Updating "current" prices while preserving historical ones became a nightmare of half-right data — nobody could tell which $150s were "old correct" vs "should-have-been-updated."

Insertion anomaly. Sales wanted to add a new service ("Calibration") to the catalog before anyone ordered it. There was nowhere to put it — the table only held performed work. A service couldn't exist without a job.

Deletion anomaly. A cancelled job was deleted. It happened to be the only job that had used a rarely-offered service — so that service's price and description vanished from the company's records entirely.

Client tier confusion. client_tier depended on the client, but was stored per line; a client's tier was upgraded in some rows and not others, so discount logic applied inconsistently within the same client.

Every one of these traces to redundancy: client facts repeated per line, service facts repeated per line, tier (transitively dependent on client) repeated per line.

The normalization

The team normalized step by step, exactly as the chapter teaches.

To 2NF — remove partial dependencies (the natural key was (job_id, service)): - job_id → job_date, client_*, tech_name depended on only part of the key → split into a jobs table. - service → service_price depended on only the other part → split into a services table. - The genuine line data stayed in job_lines(job_id, service_id, qty).

To 3NF — remove transitive dependencies: - In jobs, client_email → client_name, client_tier (transitive: client facts depend on the client, not the job) → split into a clients table; jobs keeps client_id. - tech_name likewise → a technicians table.

The result:

   clients(client_id, name, email, tier)
   technicians(technician_id, name)
   services(service_id, name, price)
   jobs(job_id, job_date, client_id, technician_id)
   job_lines(job_id, service_id, qty)

Now every fact lives once. Acme's email: one row. Inspection's price: one row (and historical line prices, if needed, are captured by storing unit_price on the line at time of service — a deliberate denormalization, Chapter 20). A new service: one row in services, no job required. Deleting a job: services untouched. Client tier: one place, always consistent.

The analysis

"It matches the spreadsheet" is how schemas rot. Spreadsheets are denormalized by nature (everything on one row). Copying that shape into a database imports all the redundancy and every anomaly. A database's power is splitting facts apart so each lives once.
Each anomaly maps to a specific dependency violation. Client facts repeated → transitive dependency → 3NF fix. Service price repeated → partial dependency → 2NF fix. Naming the FD that's violated tells you exactly which split to make.
Normalization made integrity automatic. After the split, you can't have two emails for Acme or two tiers for one client — the data structurally permits only one. That's the goal: not "be careful," but "make the bad state unrepresentable" (theme #1).
Some redundancy is intentional — but chosen. The line's unit_price is deliberately copied at sale time (so history is preserved even when the catalog price changes). That's denormalization done on purpose for a reason (Chapter 20), not the accidental redundancy of the flat table. The difference is intent.
Normalize to 3NF as the baseline. Stopping at 2NF would have left the client-tier and email anomalies (transitive). 3NF is where the anomalies actually stop.

Discussion questions

For each anomaly described, name the dependency it violates and the normal-form step that fixes it.
Why couldn't a new service be added before normalization? Which anomaly is that?
After normalization, why is it impossible for Acme to have two emails?
The line's unit_price is still copied onto each line. Why is that not a normalization failure?
⭐ Would stopping at 2NF have fixed the client-tier inconsistency? Explain using the relevant dependency.