Case Study 1 — The Table With No Rules
A table created with the bare minimum — columns and types, but almost no constraints — looked fine on day one and became a swamp of contradictory data within a year. The fix wasn't more application code; it was the constraints that should have been there from the start.
Background
A team built a subscriptions table quickly, to ship a feature. The CREATE TABLE had columns and types but treated constraints as optional polish to "add later":
-- The "we'll add constraints later" version ❌
CREATE TABLE subscriptions (
id integer,
user_email text,
plan text,
status text,
price real,
started_at text,
renews_at text
);
"Later" never came. The application code was supposed to keep the data clean. Over a year, with several developers, a few bugs, and one bulk import, the data rotted:
- Duplicate
ids (no primary key) — two subscriptions sharedid = 1041, breaking a downstream join that assumed uniqueness. - Garbage
statusvalues —'active','Active','ACTIVE','cancelled','canceled','paused', andNULL, written by different code paths. Every query that filtered by status had to defensively handle all the variants. - Orphaned rows —
user_emails that no longer existed inusers(no foreign key), so "subscription per user" reports double-counted and mis-attributed. - Money drift —
price realaccumulated fractions of a cent; monthly revenue never quite reconciled. - Dates as text —
started_atheld'2024-01-05','01/05/2024', and'Jan 5 2024'depending on who wrote it; date math was impossible without parsing heroics. NULLeverywhere — required fields were sometimes missing because nothing required them.
Cleaning it up took a multi-week project: deduplicating, normalizing statuses, parsing dates, reconciling money, and removing orphans — all while the table was live.
What the constraints would have prevented
Every single problem corresponds to a constraint that was omitted. The table should have been:
CREATE TABLE subscriptions (
id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY, -- no dup ids
user_id integer NOT NULL REFERENCES users (user_id), -- no orphans
plan text NOT NULL CHECK (plan IN ('free','pro','enterprise')),
status text NOT NULL DEFAULT 'active'
CHECK (status IN ('active','paused','cancelled')), -- one vocabulary
price numeric(10,2) NOT NULL CHECK (price >= 0), -- exact money
started_at date NOT NULL, -- real dates
renews_at date,
CHECK (renews_at IS NULL OR renews_at >= started_at) -- sane ordering
);
With these, none of the rot could have occurred: the primary key forbids duplicate ids; the foreign key forbids orphans; the CHECK on status forbids the seven spellings (the database rejects anything but the three valid values); numeric keeps money exact; date forbids unparseable text; NOT NULL forbids missing required fields. The bad data states were unrepresentable.
The analysis
-
Constraints are not polish — they're the contract. "We'll enforce it in the app" means "we'll re-implement the rule in every code path, perfectly, forever, including the bulk importer and the migration script and the next developer's feature." That never holds. The database enforces a rule once, for everyone, with no exceptions (Chapter 3). That's not a nicety; it's the only reliable place for the rule.
-
Every omitted constraint is a future bug. Each problem here maps one-to-one to a missing constraint. The cost of adding them at
CREATEtime is seconds; the cost of cleaning up the resulting mess was weeks — plus the bugs that shipped in between. -
"Add constraints later" rarely works — and is hard once data is dirty. By the time the team tried to add
CHECK (status IN (...)), existing rows violated it, so the constraint addition itself failed until the data was cleaned. Constraints are cheapest to add when the table is empty (i.e., at creation). -
Types are constraints too.
price realandstarted_at textwere type-level mistakes that caused money drift and date chaos. Choosingnumericanddatewould have prevented entire categories of problems for free. -
This is theme #1 made vivid. Design is the most important skill. A good schema makes bad data impossible; a careless one makes bad data inevitable, and no amount of careful querying fixes data that's already wrong.
Discussion questions
- For each of the six problems, name the exact constraint (or type choice) that would have prevented it.
- Why is "enforce it in the application" an unreliable substitute for a database constraint?
- Why was it hard to add the
statusCHECKconstraint a year in? What does that imply about when to add constraints? - The team argued constraints would "slow down shipping." Weigh the seconds saved at creation against the weeks spent cleaning up.
- ⭐ Are there cases where you'd deliberately relax a constraint (e.g., a staging/ingest table)? How would you contain the looseness so it doesn't reach clean tables?