Case Study 2: Normalization in Practice at a Hospital System
Building a HIPAA-Compliant Patient Data Model for Great Plains Regional Health
The Organization
Great Plains Regional Health (GPRH) is a hospital system operating seven hospitals and forty-two outpatient clinics across Kansas, Nebraska, and western Missouri. With 3.2 million patient records, 18,000 employees (including 2,400 physicians), and $2.1 billion in annual revenue, GPRH runs its clinical and administrative systems on DB2 for z/OS — a common choice for large health systems where reliability, auditability, and mainframe integration are non-negotiable.
In 2021, GPRH undertook a major redesign of its patient data model. The legacy system — built in the 1990s — had grown organically for twenty-five years, accumulating denormalized structures, duplicated data, and workarounds that made HIPAA compliance audits increasingly painful. This case study examines how the database team applied normalization principles to build a model that was both clinically accurate and regulatory compliant.
The HIPAA Challenge
The Health Insurance Portability and Accountability Act (HIPAA) imposes strict requirements on how Protected Health Information (PHI) is stored, accessed, and audited. For database design, the key requirements include:
- Minimum Necessary Standard: Systems must be designed so that users and applications access only the minimum PHI necessary for their function. A billing clerk should not see clinical diagnoses. A radiologist should not see financial records.
- Audit Trail: Every access to PHI must be logged — who accessed what, when, and why.
- Data Integrity: Patient data must be accurate and complete. Errors in medication records or allergy information can be life-threatening, not just inconvenient.
- Access Controls: Different roles (physician, nurse, billing, research) require different levels of access to different subsets of data.
A denormalized design — where patient demographics, clinical data, insurance information, and billing details are intermixed in a few wide tables — makes all of these requirements harder to satisfy. If a single PATIENT table contains both the patient's diagnosis and their Social Security number, you cannot grant a billing clerk access to the SSN column without also exposing the diagnosis, or vice versa. DB2's column-level GRANT helps, but it is far easier to enforce access boundaries when the data is in separate tables.
The Legacy Design
The legacy system's core table looked approximately like this:
PATIENT_MASTER (approximately 120 columns)
---
PATIENT_ID, MRN, SSN, FIRST_NAME, LAST_NAME, DOB, GENDER,
ADDRESS, CITY, STATE, ZIP,
HOME_PHONE, WORK_PHONE, CELL_PHONE, EMAIL,
EMERGENCY_CONTACT_1_NAME, EMERGENCY_CONTACT_1_PHONE, EMERGENCY_CONTACT_1_RELATION,
EMERGENCY_CONTACT_2_NAME, EMERGENCY_CONTACT_2_PHONE, EMERGENCY_CONTACT_2_RELATION,
PRIMARY_INSURANCE_ID, PRIMARY_INSURANCE_NAME, PRIMARY_POLICY_NUM, PRIMARY_GROUP_NUM,
SECONDARY_INSURANCE_ID, SECONDARY_INSURANCE_NAME, SECONDARY_POLICY_NUM, SECONDARY_GROUP_NUM,
PRIMARY_PHYSICIAN_ID, PRIMARY_PHYSICIAN_NAME, PRIMARY_PHYSICIAN_PHONE,
ALLERGY_1, ALLERGY_2, ALLERGY_3, ALLERGY_4, ALLERGY_5,
CHRONIC_CONDITION_1, CHRONIC_CONDITION_2, CHRONIC_CONDITION_3,
... (and 80 more columns)
The problems were severe:
1NF violations. Allergies were stored in five fixed columns (ALLERGY_1 through ALLERGY_5). Patients with six allergies had the sixth allergy recorded in the NOTES field. Patients with no allergies had five NULL columns consuming space. Querying "all patients allergic to penicillin" required checking five columns with OR logic.
2NF / 3NF violations. Insurance company names and phone numbers were stored alongside the patient's policy information. When BlueCross BlueShield changed its customer service number, the change had to be applied to 1.8 million patient records. (It was not applied consistently. Six months later, different patients had different phone numbers for the same insurance company.)
Repeating groups. Emergency contacts were stored as numbered column sets (EMERGENCY_CONTACT_1_, EMERGENCY_CONTACT_2_). Adding a third emergency contact required an ALTER TABLE that added three more columns to a 120-column table.
Access control nightmare. Granting a billing clerk access to the PATIENT_MASTER table exposed clinical information (allergies, chronic conditions) that the clerk had no business seeing. GPRH had implemented row-level and column-level access controls, but they were brittle and expensive to maintain.
The Normalized Redesign
The database team, led by architect Dr. Sarah Okonkwo, decomposed PATIENT_MASTER into seventeen tables organized into four domains:
Domain 1: Patient Identity (Accessible by all authorized roles)
CREATE TABLE GPRH.PATIENT (
PATIENT_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
MRN CHAR(10) NOT NULL, -- Medical Record Number
FIRST_NAME VARCHAR(50) NOT NULL,
MIDDLE_NAME VARCHAR(50),
LAST_NAME VARCHAR(50) NOT NULL,
DATE_OF_BIRTH DATE NOT NULL,
GENDER_CODE CHAR(1) NOT NULL,
DECEASED_DATE DATE,
CREATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT TIMESTAMP,
CONSTRAINT PK_PATIENT PRIMARY KEY (PATIENT_ID),
CONSTRAINT UQ_PATIENT_MRN UNIQUE (MRN),
CONSTRAINT CK_GENDER CHECK (GENDER_CODE IN ('M','F','U','O'))
);
CREATE TABLE GPRH.PATIENT_ADDRESS (
ADDRESS_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
PATIENT_ID INTEGER NOT NULL,
ADDRESS_TYPE CHAR(4) NOT NULL, -- HOME, MAIL, TEMP, WORK
ADDRESS_LINE1 VARCHAR(200) NOT NULL,
ADDRESS_LINE2 VARCHAR(200),
CITY VARCHAR(50) NOT NULL,
STATE_CODE CHAR(2) NOT NULL,
ZIP_CODE CHAR(10) NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
END_DATE DATE,
IS_PRIMARY SMALLINT NOT NULL DEFAULT 1,
CONSTRAINT PK_PATIENT_ADDR PRIMARY KEY (ADDRESS_ID),
CONSTRAINT FK_PTADDR_PATIENT FOREIGN KEY (PATIENT_ID)
REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT,
CONSTRAINT CK_ADDR_TYPE CHECK (ADDRESS_TYPE IN ('HOME','MAIL','TEMP','WORK'))
);
CREATE TABLE GPRH.PATIENT_CONTACT (
CONTACT_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
PATIENT_ID INTEGER NOT NULL,
CONTACT_METHOD CHAR(5) NOT NULL, -- PHONE, EMAIL, FAX
CONTACT_VALUE VARCHAR(254) NOT NULL,
CONTACT_USE CHAR(4) NOT NULL, -- HOME, WORK, MOBL, EMER
IS_PRIMARY SMALLINT NOT NULL DEFAULT 0,
CONSTRAINT PK_PATIENT_CONTACT PRIMARY KEY (CONTACT_ID),
CONSTRAINT FK_PTCONT_PATIENT FOREIGN KEY (PATIENT_ID)
REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT
);
CREATE TABLE GPRH.PATIENT_EMERGENCY_CONTACT (
EMERG_CONTACT_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
PATIENT_ID INTEGER NOT NULL,
CONTACT_NAME VARCHAR(100) NOT NULL,
RELATIONSHIP VARCHAR(50) NOT NULL,
PHONE_NUMBER VARCHAR(20) NOT NULL,
PRIORITY_ORDER SMALLINT NOT NULL,
CONSTRAINT PK_EMERG_CONTACT PRIMARY KEY (EMERG_CONTACT_ID),
CONSTRAINT FK_EMCONT_PATIENT FOREIGN KEY (PATIENT_ID)
REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT
);
Design rationale: Patient addresses are historized with EFFECTIVE_DATE and END_DATE, so address history is preserved (critical for insurance claims that reference the address at the time of service). Emergency contacts are rows in a separate table, not numbered columns — a patient can have one or ten. Contact methods are fully normalized — no more PHONE_HOME, PHONE_WORK, PHONE_CELL columns.
Domain 2: Clinical (Accessible by physicians and nurses only)
CREATE TABLE GPRH.PATIENT_ALLERGY (
ALLERGY_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
PATIENT_ID INTEGER NOT NULL,
ALLERGEN_CODE VARCHAR(20) NOT NULL,
ALLERGEN_DESC VARCHAR(200) NOT NULL,
SEVERITY_CODE CHAR(2) NOT NULL, -- MI=Mild, MO=Moderate, SE=Severe, LT=Life-threatening
REACTION_DESC VARCHAR(500),
REPORTED_DATE DATE NOT NULL,
VERIFIED_BY INTEGER, -- FK to PROVIDER
CONSTRAINT PK_PT_ALLERGY PRIMARY KEY (ALLERGY_ID),
CONSTRAINT FK_PTALL_PATIENT FOREIGN KEY (PATIENT_ID)
REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT,
CONSTRAINT CK_SEVERITY CHECK (SEVERITY_CODE IN ('MI','MO','SE','LT'))
);
CREATE TABLE GPRH.PATIENT_CONDITION (
CONDITION_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
PATIENT_ID INTEGER NOT NULL,
ICD10_CODE VARCHAR(10) NOT NULL,
CONDITION_DESC VARCHAR(200) NOT NULL,
ONSET_DATE DATE,
RESOLVED_DATE DATE,
STATUS_CODE CHAR(1) NOT NULL DEFAULT 'A', -- A=Active, R=Resolved, I=Inactive
DIAGNOSED_BY INTEGER, -- FK to PROVIDER
CONSTRAINT PK_PT_CONDITION PRIMARY KEY (CONDITION_ID),
CONSTRAINT FK_PTCOND_PATIENT FOREIGN KEY (PATIENT_ID)
REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT
);
Design rationale: Allergies and conditions are now proper entities with unlimited cardinality — no more ALLERGY_1 through ALLERGY_5. Each allergy has a severity code, a reporting date, and a verifying provider. Conditions use ICD-10 codes (the international standard medical classification), enabling standardized reporting. The RESOLVED_DATE column allows tracking whether conditions are current or historical.
Domain 3: Insurance (Accessible by billing and admissions)
CREATE TABLE GPRH.INSURANCE_COMPANY (
INSURANCE_CO_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
COMPANY_NAME VARCHAR(200) NOT NULL,
COMPANY_PHONE VARCHAR(20),
COMPANY_ADDRESS VARCHAR(300),
PAYER_ID CHAR(10), -- Standard electronic payer ID
CONSTRAINT PK_INSURANCE_CO PRIMARY KEY (INSURANCE_CO_ID)
);
CREATE TABLE GPRH.PATIENT_INSURANCE (
PATIENT_INS_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY,
PATIENT_ID INTEGER NOT NULL,
INSURANCE_CO_ID INTEGER NOT NULL,
POLICY_NUMBER VARCHAR(30) NOT NULL,
GROUP_NUMBER VARCHAR(30),
COVERAGE_TYPE CHAR(1) NOT NULL, -- P=Primary, S=Secondary, T=Tertiary
EFFECTIVE_DATE DATE NOT NULL,
TERMINATION_DATE DATE,
CONSTRAINT PK_PT_INSURANCE PRIMARY KEY (PATIENT_INS_ID),
CONSTRAINT FK_PTINS_PATIENT FOREIGN KEY (PATIENT_ID)
REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT,
CONSTRAINT FK_PTINS_INSCO FOREIGN KEY (INSURANCE_CO_ID)
REFERENCES GPRH.INSURANCE_COMPANY (INSURANCE_CO_ID) ON DELETE RESTRICT
);
Design rationale: Insurance company data is stored once in INSURANCE_COMPANY. When BlueCross changes its phone number, one row is updated. The PATIENT_INSURANCE table links patients to their insurance with coverage type and date ranges, supporting patients who switch insurance or carry multiple policies.
Domain 4: Audit Trail
CREATE TABLE GPRH.PHI_ACCESS_LOG (
LOG_ID BIGINT NOT NULL GENERATED ALWAYS AS IDENTITY,
USER_ID VARCHAR(30) NOT NULL,
ACCESS_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT TIMESTAMP,
TABLE_ACCESSED VARCHAR(128) NOT NULL,
PATIENT_ID INTEGER,
ACCESS_TYPE CHAR(1) NOT NULL, -- R=Read, W=Write, D=Delete
ACCESS_REASON VARCHAR(200),
APPLICATION_NAME VARCHAR(50),
CONSTRAINT PK_PHI_ACCESS PRIMARY KEY (LOG_ID),
CONSTRAINT CK_ACCESS_TYPE CHECK (ACCESS_TYPE IN ('R','W','D'))
);
Design rationale: Every access to a PHI table is logged. The log itself is append-only — no UPDATE or DELETE is permitted (enforced through application-level controls and DB2 authority settings). BIGINT is used for LOG_ID because this table will grow very rapidly — potentially millions of entries per day across all applications.
The Results
The migration from the 120-column PATIENT_MASTER to the normalized seventeen-table schema took fourteen months. Key outcomes:
HIPAA compliance improvement. Access controls became table-level rather than column-level. The billing team accessed Domain 3 tables. Clinical staff accessed Domain 2 tables. The audit trail in Domain 4 was clear and complete. The next HIPAA audit took three days instead of three weeks. Zero findings.
Data quality improvement. The "find all patients allergic to penicillin" query went from a five-column OR scan to a single-column index lookup. The query returned 47,000 patients — 12% more than the old system found, because the old system had missed patients whose penicillin allergy was recorded in the NOTES field instead of the ALLERGY columns.
Storage efficiency. Despite having more tables, the normalized design used 22% less total storage. The 120-column table had been padded with NULLs in the allergy, emergency contact, and secondary insurance columns for most patients. The normalized tables stored only the data that existed.
Insurance update cost. Updating BlueCross's phone number went from a batch job that modified 1.8 million rows (taking 4 hours and requiring a maintenance window) to a single-row UPDATE that completed in milliseconds.
Clinical safety. Three months after go-live, the pharmacy department reported that the medication allergy checking system was catching 31% more potential drug interactions. The old system had missed interactions for patients whose allergies were in the NOTES field, whose allergy count exceeded five, or whose allergy descriptions used non-standard terminology. The normalized design with standardized ALLERGEN_CODE values eliminated all three failure modes.
Lessons for Meridian Bank
Healthcare and banking share a fundamental requirement: the data must be correct, because errors have consequences that go far beyond inconvenience. GPRH's experience validates several principles that apply directly to Meridian Bank's design:
-
Separate domains into separate tables. At GPRH, clinical data and billing data are different domains with different access requirements. At Meridian, customer identity data and account financial data are different domains. Keep them separate.
-
Use rows, not numbered columns. ALLERGY_1 through ALLERGY_5 at GPRH is the same anti-pattern as PHONE_1, PHONE_2, PHONE_3 in a customer table. Normalize repeating groups into child tables.
-
Historize when history matters. GPRH's patient addresses have EFFECTIVE_DATE and END_DATE, preserving the address at the time of each medical encounter. Meridian Bank should consider the same pattern for customer addresses — a loan application filed from an address matters even after the customer moves.
-
Normalization enables access control. When sensitive data is in its own table, you grant access to that table only to those who need it. When everything is in one table, access control becomes a column-by-column battle.
-
The audit trail is a first-class entity. GPRH's PHI_ACCESS_LOG is a proper table with constraints and a clear structure. Meridian Bank's transaction logs and access audit trails deserve the same treatment.
"In medicine, the right diagnosis depends on the right data. If the allergy list is incomplete because the database could only hold five, the computer did not make an error — the designer did." — Dr. Sarah Okonkwo, GPRH Database Architecture Lead