Case Study 1 — The Bakery That Outgrew Its Spreadsheet

DataField.Dev

Case Study 1 — The Bakery That Outgrew Its Spreadsheet

A concrete look at the five problems from this chapter, playing out in a real-feeling small business. As you read, try to name which of the five (structure, integrity, concurrency, durability, querying-at-scale) is at work in each moment.

Background

Rolling Pin is a regional bakery that started as a single storefront and grew, over six years, into eleven locations plus a wholesale operation supplying cafés and grocers. Like almost every small business, it ran on spreadsheets. There was a "Products" workbook (recipes, prices, allergens), an "Orders" workbook (wholesale orders by account), an "Inventory" workbook (ingredients on hand per location), and a shared "Customers" sheet for the wholesale accounts.

For the first two years — one location, one manager, a few dozen wholesale accounts — this worked beautifully. The owner could see everything at a glance, change a price in seconds, and email the files to the accountant each month. There was no software to buy, no system to learn. It felt not just adequate but optimal.

The trouble began, as it usually does, gradually.

The breaking points

The double-booked Tuesday. Two locations shared the wholesale-orders sheet. One morning, the downtown manager and the wholesale coordinator both had it open. Downtown added three new café orders at the bottom; the coordinator, working from a copy she'd opened minutes earlier, re-sorted the sheet by delivery date and saved. When the dust settled, the three new orders were gone — overwritten by the coordinator's older copy. Nobody noticed until two cafés called asking where their croissants were.

This is a concurrency failure. A spreadsheet assumes a single editor. The instant two people edit in parallel, one person's work can silently vanish — there is no mechanism to merge or to prevent the conflict.

The two Mrs. Patels. A loyal wholesale customer, Corner Café, was entered by the downtown manager as "Corner Cafe" and by the eastside manager as "Corner Café (Elm St)." For a year, the business treated one customer as two. Loyalty discounts were calculated wrong. A "top 20 accounts" report missed Corner Café entirely, because its orders were split across two names and neither half cracked the top 20.

This is an integrity failure born of redundancy. The customer's identity was copied and re-typed instead of stored once and referenced. Two copies drifted apart, and now the data contradicted itself.

The allergen scare. A recipe's allergen list was updated in the master Products workbook — but each location kept its own copy of the products sheet for printing labels. One location's copy wasn't updated. A product that now contained tree nuts was labeled as nut-free for three weeks. Fortunately no one was harmed, but it was the kind of near-miss that ends businesses.

This is integrity again, and it's the most dangerous kind: the same fact (a product's allergens) lived in many places, and the copies disagreed. In a database, allergens would be stored once, in one row, and every label query would read that single source of truth.

The price column. Someone formatting the Products sheet typed two prices as text — "4.50" with quotes instead of the number 4.50. The monthly revenue projection, which summed the price column, silently skipped them. The owner over-ordered flour for a month based on inflated margins.

This is a structure failure. A spreadsheet cell will hold anything; "this column contains non-negative numbers" is a hope, not a rule. A database column declared numeric(10,2) CHECK (price >= 0) makes the bad value impossible to enter.

The 90-second open. By year four, the wholesale-orders workbook had grown past 150,000 rows (every line item of every order, across eleven locations and four years). It took a minute and a half to open and locked up when filtered. The "revenue by location by month" pivot table had to be rebuilt by hand each time and took an afternoon.

This is querying-at-scale. 150,000 rows is trivial for a database — milliseconds with the right index — but punishing for a spreadsheet. And the question itself ("revenue by location by month") is a one-line SQL query, not an afternoon of pivot-table surgery.

The laptop. The final straw: the owner's laptop, which held the only working copies of several sheets, was stolen from a car. The most recent backup was on a USB stick from eleven days earlier. Eleven days of orders, inventory counts, and customer updates were simply gone.

This is a durability failure. The data was only as safe as one device and one human's discipline about backups.

The analysis

Notice the pattern: not one of these failures was caused by carelessness alone. Each was the spreadsheet doing exactly what spreadsheets do — and being asked to do something spreadsheets fundamentally can't. The owner didn't have a discipline problem; she had a tool problem. The data had become shared (eleven locations), growing (years of history), long-lived (it had to be correct over time), and important (allergens, revenue, customer relationships). That combination is precisely the boundary where, as the chapter put it, you've outgrown a spreadsheet.

When Rolling Pin moved to a relational database (PostgreSQL, as it happens), each failure mapped to a built-in solution:

Failure	Database mechanism
Overwritten orders	Transactions + concurrency control (Ch. 26–27)
Duplicate "Corner Café"	One `customers` row, referenced by `customer_id`; a `UNIQUE` constraint on name/account (Ch. 3, 14)
Stale allergen copies	Allergens stored once; every label reads the single row (normalization, Ch. 19)
Text in the price column	`numeric` type + `CHECK (price >= 0)` (Ch. 3, 14)
90-second open	Indexed tables; "revenue by location by month" as one query (Ch. 7, 23)
Stolen-laptop data loss	Centralized server, automated backups, point-in-time recovery (Ch. 38)

The lessons

The five problems are not abstractions. Every one of them showed up as a real, costly incident — lost orders, mislabeled allergens, wrong revenue, lost data. When you study transactions, constraints, and backups later in this book, remember Rolling Pin: you're learning to prevent these exact events.
The tool was never "bad" — it was outgrown. Spreadsheets were the right choice at one location and the wrong choice at eleven. The skill is recognizing the boundary before the stolen-laptop moment, not after.
Design is what unlocks the benefits. Simply dumping the spreadsheets into database tables wouldn't have fixed the duplicate-customer problem. It took designing the data — one customer, stored once, referenced everywhere — to make integrity automatic. This is the book's first theme, and Rolling Pin is why it matters: a database gives you the ability to keep data correct; good design is how you use it.

Discussion questions

For each of the six incidents, which of the five problems is the primary cause? Which incidents involve more than one?
The allergen near-miss was the scariest. Explain precisely how storing allergens "once and referenced" would have prevented it.
Rolling Pin waited until a crisis to switch. List three earlier signals that should have prompted the move. What made them easy to ignore?
Could Rolling Pin have used a different tool short of a full database (e.g., a cloud spreadsheet with version history)? What would that have fixed, and what would it still have left exposed?