Case Study 2: Normalization in Practice at a Hospital System

Building a HIPAA-Compliant Patient Data Model for Great Plains Regional Health


The Organization

Great Plains Regional Health (GPRH) is a hospital system operating seven hospitals and forty-two outpatient clinics across Kansas, Nebraska, and western Missouri. With 3.2 million patient records, 18,000 employees (including 2,400 physicians), and $2.1 billion in annual revenue, GPRH runs its clinical and administrative systems on DB2 for z/OS — a common choice for large health systems where reliability, auditability, and mainframe integration are non-negotiable.

In 2021, GPRH undertook a major redesign of its patient data model. The legacy system — built in the 1990s — had grown organically for twenty-five years, accumulating denormalized structures, duplicated data, and workarounds that made HIPAA compliance audits increasingly painful. This case study examines how the database team applied normalization principles to build a model that was both clinically accurate and regulatory compliant.

The HIPAA Challenge

The Health Insurance Portability and Accountability Act (HIPAA) imposes strict requirements on how Protected Health Information (PHI) is stored, accessed, and audited. For database design, the key requirements include:

  • Minimum Necessary Standard: Systems must be designed so that users and applications access only the minimum PHI necessary for their function. A billing clerk should not see clinical diagnoses. A radiologist should not see financial records.
  • Audit Trail: Every access to PHI must be logged — who accessed what, when, and why.
  • Data Integrity: Patient data must be accurate and complete. Errors in medication records or allergy information can be life-threatening, not just inconvenient.
  • Access Controls: Different roles (physician, nurse, billing, research) require different levels of access to different subsets of data.

A denormalized design — where patient demographics, clinical data, insurance information, and billing details are intermixed in a few wide tables — makes all of these requirements harder to satisfy. If a single PATIENT table contains both the patient's diagnosis and their Social Security number, you cannot grant a billing clerk access to the SSN column without also exposing the diagnosis, or vice versa. DB2's column-level GRANT helps, but it is far easier to enforce access boundaries when the data is in separate tables.

The Legacy Design

The legacy system's core table looked approximately like this:

PATIENT_MASTER (approximately 120 columns)
---
PATIENT_ID, MRN, SSN, FIRST_NAME, LAST_NAME, DOB, GENDER,
ADDRESS, CITY, STATE, ZIP,
HOME_PHONE, WORK_PHONE, CELL_PHONE, EMAIL,
EMERGENCY_CONTACT_1_NAME, EMERGENCY_CONTACT_1_PHONE, EMERGENCY_CONTACT_1_RELATION,
EMERGENCY_CONTACT_2_NAME, EMERGENCY_CONTACT_2_PHONE, EMERGENCY_CONTACT_2_RELATION,
PRIMARY_INSURANCE_ID, PRIMARY_INSURANCE_NAME, PRIMARY_POLICY_NUM, PRIMARY_GROUP_NUM,
SECONDARY_INSURANCE_ID, SECONDARY_INSURANCE_NAME, SECONDARY_POLICY_NUM, SECONDARY_GROUP_NUM,
PRIMARY_PHYSICIAN_ID, PRIMARY_PHYSICIAN_NAME, PRIMARY_PHYSICIAN_PHONE,
ALLERGY_1, ALLERGY_2, ALLERGY_3, ALLERGY_4, ALLERGY_5,
CHRONIC_CONDITION_1, CHRONIC_CONDITION_2, CHRONIC_CONDITION_3,
... (and 80 more columns)

The problems were severe:

1NF violations. Allergies were stored in five fixed columns (ALLERGY_1 through ALLERGY_5). Patients with six allergies had the sixth allergy recorded in the NOTES field. Patients with no allergies had five NULL columns consuming space. Querying "all patients allergic to penicillin" required checking five columns with OR logic.

2NF / 3NF violations. Insurance company names and phone numbers were stored alongside the patient's policy information. When BlueCross BlueShield changed its customer service number, the change had to be applied to 1.8 million patient records. (It was not applied consistently. Six months later, different patients had different phone numbers for the same insurance company.)

Repeating groups. Emergency contacts were stored as numbered column sets (EMERGENCY_CONTACT_1_, EMERGENCY_CONTACT_2_). Adding a third emergency contact required an ALTER TABLE that added three more columns to a 120-column table.

Access control nightmare. Granting a billing clerk access to the PATIENT_MASTER table exposed clinical information (allergies, chronic conditions) that the clerk had no business seeing. GPRH had implemented row-level and column-level access controls, but they were brittle and expensive to maintain.

The Normalized Redesign

The database team, led by architect Dr. Sarah Okonkwo, decomposed PATIENT_MASTER into seventeen tables organized into four domains:

Domain 1: Patient Identity (Accessible by all authorized roles)

CREATE TABLE GPRH.PATIENT (
    PATIENT_ID        INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    MRN               CHAR(10)   NOT NULL,  -- Medical Record Number
    FIRST_NAME        VARCHAR(50) NOT NULL,
    MIDDLE_NAME       VARCHAR(50),
    LAST_NAME         VARCHAR(50) NOT NULL,
    DATE_OF_BIRTH     DATE        NOT NULL,
    GENDER_CODE       CHAR(1)     NOT NULL,
    DECEASED_DATE     DATE,
    CREATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT TIMESTAMP,
    CONSTRAINT PK_PATIENT PRIMARY KEY (PATIENT_ID),
    CONSTRAINT UQ_PATIENT_MRN UNIQUE (MRN),
    CONSTRAINT CK_GENDER CHECK (GENDER_CODE IN ('M','F','U','O'))
);

CREATE TABLE GPRH.PATIENT_ADDRESS (
    ADDRESS_ID        INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    PATIENT_ID        INTEGER     NOT NULL,
    ADDRESS_TYPE      CHAR(4)     NOT NULL,  -- HOME, MAIL, TEMP, WORK
    ADDRESS_LINE1     VARCHAR(200) NOT NULL,
    ADDRESS_LINE2     VARCHAR(200),
    CITY              VARCHAR(50)  NOT NULL,
    STATE_CODE        CHAR(2)      NOT NULL,
    ZIP_CODE          CHAR(10)     NOT NULL,
    EFFECTIVE_DATE    DATE         NOT NULL,
    END_DATE          DATE,
    IS_PRIMARY        SMALLINT     NOT NULL DEFAULT 1,
    CONSTRAINT PK_PATIENT_ADDR PRIMARY KEY (ADDRESS_ID),
    CONSTRAINT FK_PTADDR_PATIENT FOREIGN KEY (PATIENT_ID)
        REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT,
    CONSTRAINT CK_ADDR_TYPE CHECK (ADDRESS_TYPE IN ('HOME','MAIL','TEMP','WORK'))
);

CREATE TABLE GPRH.PATIENT_CONTACT (
    CONTACT_ID        INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    PATIENT_ID        INTEGER     NOT NULL,
    CONTACT_METHOD    CHAR(5)     NOT NULL,  -- PHONE, EMAIL, FAX
    CONTACT_VALUE     VARCHAR(254) NOT NULL,
    CONTACT_USE       CHAR(4)     NOT NULL,  -- HOME, WORK, MOBL, EMER
    IS_PRIMARY        SMALLINT    NOT NULL DEFAULT 0,
    CONSTRAINT PK_PATIENT_CONTACT PRIMARY KEY (CONTACT_ID),
    CONSTRAINT FK_PTCONT_PATIENT FOREIGN KEY (PATIENT_ID)
        REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT
);

CREATE TABLE GPRH.PATIENT_EMERGENCY_CONTACT (
    EMERG_CONTACT_ID  INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    PATIENT_ID        INTEGER     NOT NULL,
    CONTACT_NAME      VARCHAR(100) NOT NULL,
    RELATIONSHIP      VARCHAR(50)  NOT NULL,
    PHONE_NUMBER      VARCHAR(20)  NOT NULL,
    PRIORITY_ORDER    SMALLINT     NOT NULL,
    CONSTRAINT PK_EMERG_CONTACT PRIMARY KEY (EMERG_CONTACT_ID),
    CONSTRAINT FK_EMCONT_PATIENT FOREIGN KEY (PATIENT_ID)
        REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT
);

Design rationale: Patient addresses are historized with EFFECTIVE_DATE and END_DATE, so address history is preserved (critical for insurance claims that reference the address at the time of service). Emergency contacts are rows in a separate table, not numbered columns — a patient can have one or ten. Contact methods are fully normalized — no more PHONE_HOME, PHONE_WORK, PHONE_CELL columns.

Domain 2: Clinical (Accessible by physicians and nurses only)

CREATE TABLE GPRH.PATIENT_ALLERGY (
    ALLERGY_ID        INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    PATIENT_ID        INTEGER     NOT NULL,
    ALLERGEN_CODE     VARCHAR(20) NOT NULL,
    ALLERGEN_DESC     VARCHAR(200) NOT NULL,
    SEVERITY_CODE     CHAR(2)     NOT NULL,  -- MI=Mild, MO=Moderate, SE=Severe, LT=Life-threatening
    REACTION_DESC     VARCHAR(500),
    REPORTED_DATE     DATE        NOT NULL,
    VERIFIED_BY       INTEGER,    -- FK to PROVIDER
    CONSTRAINT PK_PT_ALLERGY PRIMARY KEY (ALLERGY_ID),
    CONSTRAINT FK_PTALL_PATIENT FOREIGN KEY (PATIENT_ID)
        REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT,
    CONSTRAINT CK_SEVERITY CHECK (SEVERITY_CODE IN ('MI','MO','SE','LT'))
);

CREATE TABLE GPRH.PATIENT_CONDITION (
    CONDITION_ID      INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    PATIENT_ID        INTEGER     NOT NULL,
    ICD10_CODE        VARCHAR(10) NOT NULL,
    CONDITION_DESC    VARCHAR(200) NOT NULL,
    ONSET_DATE        DATE,
    RESOLVED_DATE     DATE,
    STATUS_CODE       CHAR(1)     NOT NULL DEFAULT 'A',  -- A=Active, R=Resolved, I=Inactive
    DIAGNOSED_BY      INTEGER,    -- FK to PROVIDER
    CONSTRAINT PK_PT_CONDITION PRIMARY KEY (CONDITION_ID),
    CONSTRAINT FK_PTCOND_PATIENT FOREIGN KEY (PATIENT_ID)
        REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT
);

Design rationale: Allergies and conditions are now proper entities with unlimited cardinality — no more ALLERGY_1 through ALLERGY_5. Each allergy has a severity code, a reporting date, and a verifying provider. Conditions use ICD-10 codes (the international standard medical classification), enabling standardized reporting. The RESOLVED_DATE column allows tracking whether conditions are current or historical.

Domain 3: Insurance (Accessible by billing and admissions)

CREATE TABLE GPRH.INSURANCE_COMPANY (
    INSURANCE_CO_ID   INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    COMPANY_NAME      VARCHAR(200) NOT NULL,
    COMPANY_PHONE     VARCHAR(20),
    COMPANY_ADDRESS   VARCHAR(300),
    PAYER_ID          CHAR(10),    -- Standard electronic payer ID
    CONSTRAINT PK_INSURANCE_CO PRIMARY KEY (INSURANCE_CO_ID)
);

CREATE TABLE GPRH.PATIENT_INSURANCE (
    PATIENT_INS_ID    INTEGER     NOT NULL GENERATED ALWAYS AS IDENTITY,
    PATIENT_ID        INTEGER     NOT NULL,
    INSURANCE_CO_ID   INTEGER     NOT NULL,
    POLICY_NUMBER     VARCHAR(30) NOT NULL,
    GROUP_NUMBER      VARCHAR(30),
    COVERAGE_TYPE     CHAR(1)     NOT NULL,  -- P=Primary, S=Secondary, T=Tertiary
    EFFECTIVE_DATE    DATE        NOT NULL,
    TERMINATION_DATE  DATE,
    CONSTRAINT PK_PT_INSURANCE PRIMARY KEY (PATIENT_INS_ID),
    CONSTRAINT FK_PTINS_PATIENT FOREIGN KEY (PATIENT_ID)
        REFERENCES GPRH.PATIENT (PATIENT_ID) ON DELETE RESTRICT,
    CONSTRAINT FK_PTINS_INSCO FOREIGN KEY (INSURANCE_CO_ID)
        REFERENCES GPRH.INSURANCE_COMPANY (INSURANCE_CO_ID) ON DELETE RESTRICT
);

Design rationale: Insurance company data is stored once in INSURANCE_COMPANY. When BlueCross changes its phone number, one row is updated. The PATIENT_INSURANCE table links patients to their insurance with coverage type and date ranges, supporting patients who switch insurance or carry multiple policies.

Domain 4: Audit Trail

CREATE TABLE GPRH.PHI_ACCESS_LOG (
    LOG_ID            BIGINT      NOT NULL GENERATED ALWAYS AS IDENTITY,
    USER_ID           VARCHAR(30) NOT NULL,
    ACCESS_TIMESTAMP  TIMESTAMP(6) NOT NULL DEFAULT CURRENT TIMESTAMP,
    TABLE_ACCESSED    VARCHAR(128) NOT NULL,
    PATIENT_ID        INTEGER,
    ACCESS_TYPE       CHAR(1)     NOT NULL,  -- R=Read, W=Write, D=Delete
    ACCESS_REASON     VARCHAR(200),
    APPLICATION_NAME  VARCHAR(50),
    CONSTRAINT PK_PHI_ACCESS PRIMARY KEY (LOG_ID),
    CONSTRAINT CK_ACCESS_TYPE CHECK (ACCESS_TYPE IN ('R','W','D'))
);

Design rationale: Every access to a PHI table is logged. The log itself is append-only — no UPDATE or DELETE is permitted (enforced through application-level controls and DB2 authority settings). BIGINT is used for LOG_ID because this table will grow very rapidly — potentially millions of entries per day across all applications.

The Results

The migration from the 120-column PATIENT_MASTER to the normalized seventeen-table schema took fourteen months. Key outcomes:

HIPAA compliance improvement. Access controls became table-level rather than column-level. The billing team accessed Domain 3 tables. Clinical staff accessed Domain 2 tables. The audit trail in Domain 4 was clear and complete. The next HIPAA audit took three days instead of three weeks. Zero findings.

Data quality improvement. The "find all patients allergic to penicillin" query went from a five-column OR scan to a single-column index lookup. The query returned 47,000 patients — 12% more than the old system found, because the old system had missed patients whose penicillin allergy was recorded in the NOTES field instead of the ALLERGY columns.

Storage efficiency. Despite having more tables, the normalized design used 22% less total storage. The 120-column table had been padded with NULLs in the allergy, emergency contact, and secondary insurance columns for most patients. The normalized tables stored only the data that existed.

Insurance update cost. Updating BlueCross's phone number went from a batch job that modified 1.8 million rows (taking 4 hours and requiring a maintenance window) to a single-row UPDATE that completed in milliseconds.

Clinical safety. Three months after go-live, the pharmacy department reported that the medication allergy checking system was catching 31% more potential drug interactions. The old system had missed interactions for patients whose allergies were in the NOTES field, whose allergy count exceeded five, or whose allergy descriptions used non-standard terminology. The normalized design with standardized ALLERGEN_CODE values eliminated all three failure modes.

Lessons for Meridian Bank

Healthcare and banking share a fundamental requirement: the data must be correct, because errors have consequences that go far beyond inconvenience. GPRH's experience validates several principles that apply directly to Meridian Bank's design:

  1. Separate domains into separate tables. At GPRH, clinical data and billing data are different domains with different access requirements. At Meridian, customer identity data and account financial data are different domains. Keep them separate.

  2. Use rows, not numbered columns. ALLERGY_1 through ALLERGY_5 at GPRH is the same anti-pattern as PHONE_1, PHONE_2, PHONE_3 in a customer table. Normalize repeating groups into child tables.

  3. Historize when history matters. GPRH's patient addresses have EFFECTIVE_DATE and END_DATE, preserving the address at the time of each medical encounter. Meridian Bank should consider the same pattern for customer addresses — a loan application filed from an address matters even after the customer moves.

  4. Normalization enables access control. When sensitive data is in its own table, you grant access to that table only to those who need it. When everything is in one table, access control becomes a column-by-column battle.

  5. The audit trail is a first-class entity. GPRH's PHI_ACCESS_LOG is a proper table with constraints and a clear structure. Meridian Bank's transaction logs and access audit trails deserve the same treatment.


"In medicine, the right diagnosis depends on the right data. If the allergy list is incomplete because the database could only hold five, the computer did not make an error — the designer did." — Dr. Sarah Okonkwo, GPRH Database Architecture Lead