Case Study 1 — The Table With No Rules

A table created with the bare minimum — columns and types, but almost no constraints — looked fine on day one and became a swamp of contradictory data within a year. The fix wasn't more application code; it was the constraints that should have been there from the start.

Background

A team built a subscriptions table quickly, to ship a feature. The CREATE TABLE had columns and types but treated constraints as optional polish to "add later":

-- The "we'll add constraints later" version   ❌
CREATE TABLE subscriptions (
    id          integer,
    user_email  text,
    plan        text,
    status      text,
    price       real,
    started_at  text,
    renews_at   text
);

"Later" never came. The application code was supposed to keep the data clean. Over a year, with several developers, a few bugs, and one bulk import, the data rotted:

  • Duplicate ids (no primary key) — two subscriptions shared id = 1041, breaking a downstream join that assumed uniqueness.
  • Garbage status values'active', 'Active', 'ACTIVE', 'cancelled', 'canceled', 'paused', and NULL, written by different code paths. Every query that filtered by status had to defensively handle all the variants.
  • Orphaned rowsuser_emails that no longer existed in users (no foreign key), so "subscription per user" reports double-counted and mis-attributed.
  • Money driftprice real accumulated fractions of a cent; monthly revenue never quite reconciled.
  • Dates as textstarted_at held '2024-01-05', '01/05/2024', and 'Jan 5 2024' depending on who wrote it; date math was impossible without parsing heroics.
  • NULL everywhere — required fields were sometimes missing because nothing required them.

Cleaning it up took a multi-week project: deduplicating, normalizing statuses, parsing dates, reconciling money, and removing orphans — all while the table was live.

What the constraints would have prevented

Every single problem corresponds to a constraint that was omitted. The table should have been:

CREATE TABLE subscriptions (
    id          integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,   -- no dup ids
    user_id     integer NOT NULL REFERENCES users (user_id),         -- no orphans
    plan        text    NOT NULL CHECK (plan IN ('free','pro','enterprise')),
    status      text    NOT NULL DEFAULT 'active'
                        CHECK (status IN ('active','paused','cancelled')),  -- one vocabulary
    price       numeric(10,2) NOT NULL CHECK (price >= 0),           -- exact money
    started_at  date    NOT NULL,                                    -- real dates
    renews_at   date,
    CHECK (renews_at IS NULL OR renews_at >= started_at)             -- sane ordering
);

With these, none of the rot could have occurred: the primary key forbids duplicate ids; the foreign key forbids orphans; the CHECK on status forbids the seven spellings (the database rejects anything but the three valid values); numeric keeps money exact; date forbids unparseable text; NOT NULL forbids missing required fields. The bad data states were unrepresentable.

The analysis

  1. Constraints are not polish — they're the contract. "We'll enforce it in the app" means "we'll re-implement the rule in every code path, perfectly, forever, including the bulk importer and the migration script and the next developer's feature." That never holds. The database enforces a rule once, for everyone, with no exceptions (Chapter 3). That's not a nicety; it's the only reliable place for the rule.

  2. Every omitted constraint is a future bug. Each problem here maps one-to-one to a missing constraint. The cost of adding them at CREATE time is seconds; the cost of cleaning up the resulting mess was weeks — plus the bugs that shipped in between.

  3. "Add constraints later" rarely works — and is hard once data is dirty. By the time the team tried to add CHECK (status IN (...)), existing rows violated it, so the constraint addition itself failed until the data was cleaned. Constraints are cheapest to add when the table is empty (i.e., at creation).

  4. Types are constraints too. price real and started_at text were type-level mistakes that caused money drift and date chaos. Choosing numeric and date would have prevented entire categories of problems for free.

  5. This is theme #1 made vivid. Design is the most important skill. A good schema makes bad data impossible; a careless one makes bad data inevitable, and no amount of careful querying fixes data that's already wrong.

Discussion questions

  1. For each of the six problems, name the exact constraint (or type choice) that would have prevented it.
  2. Why is "enforce it in the application" an unreliable substitute for a database constraint?
  3. Why was it hard to add the status CHECK constraint a year in? What does that imply about when to add constraints?
  4. The team argued constraints would "slow down shipping." Weigh the seconds saved at creation against the weeks spent cleaning up.
  5. ⭐ Are there cases where you'd deliberately relax a constraint (e.g., a staging/ingest table)? How would you contain the looseness so it doesn't reach clean tables?