Case Study 1 — When the Primary Key Had to Change

DataField.Dev

Case Study 1 — When the Primary Key Had to Change

The chapter recommended a stable surrogate key as the primary key, and protecting natural keys with UNIQUE. Here's the cautionary tale of a team that used a natural key as its primary key — and the slow-motion disaster that followed when reality refused to cooperate.

Background

A regional health-services company built a patient-records system. Early on, an engineer made a decision that felt clean and even elegant: use each patient's government-issued ID number as the primary key of the patients table. After all, it was unique, every patient had one, and it meant they'd never need a separate id column. Every other table — appointments, prescriptions, lab_results, billing — referenced patients by that government ID as a foreign key.

For two years it worked. Then reality arrived, as it always does, in several inconvenient shapes.

The cracks

Patients without an ID. The clinic began serving newborns (who don't yet have an ID issued), foreign visitors, and undocumented patients. These people needed records now, before any government ID existed. But the primary key can't be NULL (entity integrity), and it must be unique. The team's ugly workaround was to invent fake IDs (TEMP-0001, UNKNOWN-2, all-zeros) — which promptly collided with each other and polluted the one column every other table depended on.

IDs that changed. Some patients legitimately received corrected government IDs — data-entry fixes at the issuing agency, identity corrections, status changes. Changing a primary key sounds simple. It is not. Because a dozen tables referenced the old ID as a foreign key, changing it meant updating the patient row and every appointment, prescription, lab result, and bill that pointed at it — in one careful transaction, in the right order, without violating any foreign key along the way. What should have been a one-field correction became a risky, multi-table migration. Some got done wrong, silently splitting a patient's history in two.

The merge problem. Occasionally the clinic discovered that one human being had two records (registered twice, under slightly different details). Merging them meant re-pointing all the foreign keys from one ID to the other and deleting the duplicate — again, a fragile cross-table operation, because the identity of the patient was smeared across every table instead of living in one stable place.

Privacy. The government ID was now duplicated into a dozen tables as a foreign key, appearing in logs, query results, and exports all over the system. A sensitive national identifier had become the most-copied value in the database — a compliance problem on top of an engineering one.

The fix

The team migrated to a surrogate primary key. They added an auto-generated patient_id (a meaningless integer) as the new primary key, made the government ID an ordinary attribute protected by a UNIQUE constraint that allowed NULL (so ID-less patients were fine), and re-pointed every foreign key in the system to the new patient_id.

The migration itself was painful — precisely because the old design had spread the natural key everywhere — but once done, every original problem dissolved:

Problem	Why the surrogate key fixed it
Patients without an ID	The primary key is auto-generated, so it always exists; the (now nullable) government ID can be filled in later
IDs that changed	Correcting the government ID is now a one-field `UPDATE` to one attribute in one row — no foreign keys reference it
Merging duplicates	Identity lives in one stable column; re-pointing FKs is still work, but the target never moves
Privacy	The value joined-on everywhere is a meaningless integer, not a sensitive identifier

The analysis

The root mistake was treating a natural key as if it were stable and universal, when in fact it was neither. The relational model gives clear guidance here, and this case is the argument for it:

Primary keys should be stable. A primary key is referenced by foreign keys throughout the schema; if it changes, all those references must change with it. Natural keys (emails, usernames, government IDs, phone numbers) change more often than you expect. Surrogate keys never need to.
Primary keys should always exist. Entity integrity demands a non-null primary key for every row. If some real entities lack a natural identifier (newborns without an ID), a natural key can't serve. A generated surrogate always can.
Natural keys still matter — protect them with UNIQUE. The lesson is not "ignore natural keys." The government ID still needed to be unique (no two patients share one). The right tool for that is a UNIQUE constraint, not the primary-key role. You get the uniqueness guarantee without the brittleness.
Don't spread sensitive data through foreign keys. Because foreign keys copy the referenced value into many tables, making a sensitive value the key multiplies its exposure. A meaningless surrogate keeps sensitive attributes in one place.

Discussion questions

Explain precisely why changing a primary key is so much harder than changing an ordinary attribute. What role do foreign keys play?
The team allowed the government ID's UNIQUE constraint to permit NULL. Why is that important for the newborn/visitor case, and is it consistent with the relational model?
Could the team have kept the government ID as the primary key and just "been careful"? Argue why discipline is not a substitute for a stable surrogate key here.
⭐ Are there cases where a natural key is a fine primary key? (Think about small, truly immutable reference data — e.g., a table of US state codes.) What properties make a natural key safe to use as a PK?