

Chapter 13: Logical Database Design — From Business Requirements to Entity-Relationship Models

"The SQL is the easy part. Designing the schema that makes the SQL simple, fast, and correct — that's where database professionals earn their salary." — Attributed to a senior DBA at a Fortune 100 bank, overheard at an IDUG conference


You know SQL. You can write a twelve-table JOIN with window functions, CTEs, and correlated subqueries. You can CREATE TABLE, define constraints, and build triggers that enforce business rules at the database layer. Parts I and II of this book made sure of that.

Now comes the hard question: What tables should you create in the first place?

This is not a syntactic question. It is a question about meaning — about what the business does, what data it needs to remember, and how those pieces of data relate to one another. Get it right, and the SQL practically writes itself. Queries are clean, indexes are obvious, and the schema evolves gracefully as requirements change. Get it wrong, and you spend the next five years writing tortured workarounds, patching data anomalies with nightly batch jobs, and explaining to auditors why the same customer has three different addresses — none of which match.

Chapter 13 is where we shift from using a database to designing one. We begin at the beginning: business requirements. We translate those requirements into a formal logical model using entity-relationship (ER) diagrams. We identify entities, attributes, and relationships. We model cardinality, handle subtypes, resolve many-to-many relationships, and validate our design through normalization. We catalog the anti-patterns that trip up even experienced designers. And we apply every technique to the Meridian National Bank, producing a complete logical data model with 30+ entities that will serve as the foundation for the rest of Part III.

By the end of this chapter, you will be able to walk into a conference room with business stakeholders, listen to their requirements, ask the right questions, and walk out with a logical model that both sides trust. That skill is what separates a database programmer from a data architect.


13.1 Why Logical Design Matters

Before we touch a single diagram, let us establish why logical design deserves its own chapter — and why so many projects skip it at enormous cost.

The Cost of Getting It Wrong

Consider three real-world scenarios that I have personally witnessed (names changed, patterns preserved):

Scenario 1: The Duplicated Customer. A regional bank built their core banking system without a unified CUSTOMER entity. The checking account system had its own customer table. The loan system had its own. The credit card system had a third. When regulators asked, "What is the total exposure for customer John Smith across all products?" the answer required a six-month data integration project, three consultants, and a probabilistic name-matching algorithm that was right 94% of the time. Six percent of the time it was wrong, and 6% of $2 billion in total exposure is $120 million in unquantified risk.

Scenario 2: The Denormalized Transaction Log. An e-commerce company stored the customer's shipping address directly in every order row. When a customer updated their address, old orders still showed the old address — which was intentional. But the company also stored the customer's name in the order row, and when a customer legally changed their name, some reports showed the old name and others showed the new name. Customer service calls doubled because representatives could not reconcile the data. The root cause: no one had drawn a logical model and asked, "Which attributes belong to the order and which belong to the customer?"

Scenario 3: The One Big Table. A startup decided that normalization was "overhead" and put everything into a single table with 187 columns. For the first year, queries were simple — just WHERE clauses on one table. By year three, the table had 400 million rows, most columns were NULL for most rows, and every ALTER TABLE ADD COLUMN required a full table reorganization that took the system offline for four hours. The company eventually re-platformed at a cost of $3 million.

These are not exotic edge cases. They are the predictable consequences of skipping logical design. The pattern is always the same:

  1. Design shortcuts save time in week one.
  2. Data anomalies appear by month three.
  3. Workarounds accumulate by year one.
  4. The cost to fix exceeds the cost of doing it right by year two.

What Logical Design Gives You

A proper logical model provides:

  • A shared vocabulary. Business stakeholders and technical teams agree on what "customer," "account," and "transaction" mean — including the edge cases. Is a joint account holder a customer? Is a voided check a transaction? These questions are answered in the model, not in a support ticket eighteen months from now.

  • Data integrity by design. When relationships and constraints are modeled correctly, the database prevents invalid data from being stored. You do not need application code to enforce that every account belongs to a customer, because the foreign key enforces it.

  • Predictable query performance. A normalized model with clear relationships tells you exactly which JOINs will be needed for common queries. You can design indexes before writing a single query. A denormalized or confused model forces you into table scans, self-joins, and parsing delimited strings out of VARCHAR columns.

  • Graceful evolution. When the business adds a new product type or regulatory requirement, a well-modeled schema accommodates it with an ALTER TABLE or a new table. A poorly modeled schema requires a migration project.

  • Audit and compliance readiness. Financial regulators (OCC, FDIC, FCA), healthcare regulators (HIPAA), and data privacy regulations (GDPR, CCPA) all require that organizations know what data they hold and how it relates. A logical model is the documentation that proves you know.
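The second benefit, integrity by design, is worth making concrete. In the sketch below, the NOT NULL foreign key makes an orphaned account impossible at the database layer; the table and column names are illustrative, not the final Meridian model.

```sql
-- Illustrative DB2 DDL: the foreign key, not application code,
-- guarantees that every account belongs to a customer.
CREATE TABLE CUSTOMER (
    CUSTOMER_ID   INTEGER      NOT NULL
                  GENERATED ALWAYS AS IDENTITY,
    LAST_NAME     VARCHAR(50)  NOT NULL,
    PRIMARY KEY (CUSTOMER_ID)
);

CREATE TABLE ACCOUNT (
    ACCOUNT_ID      INTEGER       NOT NULL
                    GENERATED ALWAYS AS IDENTITY,
    CUSTOMER_ID     INTEGER       NOT NULL,  -- mandatory participation
    CURRENT_BALANCE DECIMAL(15,2) NOT NULL DEFAULT 0,
    PRIMARY KEY (ACCOUNT_ID),
    FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (CUSTOMER_ID)
);
```

An INSERT into ACCOUNT with a CUSTOMER_ID that does not exist in CUSTOMER is rejected with a constraint violation, no matter which application issued it.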

The Threshold Concept for This Chapter

A logical model is a contract between the business and the technical team. It defines what data means, not where it lives. Getting the logical model wrong means the physical database will be correct code for the wrong problem.


13.2 Requirements Gathering for Database Design

The logical model does not emerge from the mind of a data architect sitting alone in a dark room. It emerges from a structured conversation between the people who understand the business and the people who understand data.

Sources of Requirements

For a project like Meridian National Bank, requirements come from multiple sources:

Stakeholder interviews. Sit down with the branch manager and ask: "Walk me through what happens when a customer opens a new account." Sit down with the loan officer and ask: "What information do you need to approve a loan?" Sit down with the compliance officer and ask: "What reports does the OCC require, and what data feeds them?" Each conversation reveals entities, attributes, and business rules that no single person knows completely.

Existing systems and databases. If there is an existing system being replaced or extended, its schema is a rich (if imperfect) source of requirements. Run SELECT * FROM SYSCAT.TABLES WHERE TABSCHEMA = 'LEGACY' and examine every table. But treat the existing schema as evidence of what the business did, not necessarily what it should do. Legacy schemas accumulate workarounds, obsolete columns, and design compromises.

Business documents. Regulatory filings, product brochures, service agreements, account applications, and internal policy manuals all contain data requirements. The loan application form, for instance, tells you exactly what attributes the LOAN entity needs — because the business already decided what information it collects.

Process flows and use cases. Business process diagrams and user stories reveal the verbs in the system — open account, transfer funds, approve loan, generate statement. Each verb implies entities that participate and attributes that are read or written. If the process says "calculate monthly interest based on the daily closing balance," you know you need a balance history, not just a current balance.

Regulatory requirements. Banking regulations in particular drive data requirements. The Bank Secrecy Act requires Currency Transaction Reports for cash transactions over $10,000 — which means the TRANSACTION entity needs a cash indicator and a regulatory reporting flag. Know Your Customer (KYC) rules require identity verification — which means the CUSTOMER entity needs document references and verification status.

The Requirements Interview

A good requirements interview for data modeling follows a pattern:

  1. Start with nouns. Ask the stakeholder to describe their domain. Write down every noun: customer, account, branch, loan, payment, statement, officer, product. These are candidate entities.

  2. Ask "what do you need to know about each?" For each noun, ask what information matters. "What do you need to know about a customer?" yields: name, address, phone, date of birth, SSN, risk rating, relationship start date. These are candidate attributes.

  3. Ask "how do these relate?" "Can a customer have multiple accounts? Can an account belong to multiple customers? Does every account belong to a branch, or can an account exist without one?" These questions reveal relationships and cardinality.

  4. Ask about the edge cases. "What happens when a customer closes all their accounts but has an outstanding loan? Are they still a customer? What about a deceased customer whose estate is in probate?" Edge cases reveal optionality, status attributes, and lifecycle rules that the "happy path" misses.

  5. Ask about time. "Do you need to know the customer's address as of today, or as of the date of a transaction? When an interest rate changes, do you need the history of previous rates?" These questions reveal whether you need temporal modeling — effective dates, history tables, and bitemporal patterns.

  6. Ask about volume. "How many customers? How many transactions per day? How many years of history?" Volume does not directly affect the logical model, but it influences design decisions (such as whether to model a many-to-many relationship as a junction table versus a different approach) and is critical for the physical design in Chapter 14.

Common Requirements Pitfalls

The "just use a spreadsheet" trap. Stakeholders often describe their data needs in terms of the spreadsheet they currently use. "I need a column for January, a column for February, a column for March..." This is not a data requirement; it is a reporting format. Your job is to extract the underlying entities (MONTHLY_BALANCE with a MONTH_DATE column, not twelve separate columns).
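The remodeling might look like this (a hypothetical sketch; the table and column names are assumptions for illustration):

```sql
-- One row per account per month, instead of twelve columns per row.
CREATE TABLE MONTHLY_BALANCE (
    ACCOUNT_ID      INTEGER       NOT NULL,
    MONTH_DATE      DATE          NOT NULL,  -- first day of the month
    CLOSING_BALANCE DECIMAL(15,2) NOT NULL,
    PRIMARY KEY (ACCOUNT_ID, MONTH_DATE)
);

-- The stakeholder's spreadsheet becomes a query, not a schema:
SELECT ACCOUNT_ID,
       SUM(CASE WHEN MONTH(MONTH_DATE) = 1 THEN CLOSING_BALANCE END) AS JAN,
       SUM(CASE WHEN MONTH(MONTH_DATE) = 2 THEN CLOSING_BALANCE END) AS FEB
  FROM MONTHLY_BALANCE
 WHERE YEAR(MONTH_DATE) = 2024
 GROUP BY ACCOUNT_ID;
```

The row-per-month shape supports any year, any date range, and a thirteenth period if the business ever needs one, without an ALTER TABLE.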

The "we might need it someday" trap. Adding columns "just in case" leads to bloated tables full of NULLs. If the business cannot articulate a current use for an attribute, leave it out. ALTER TABLE ADD COLUMN is always available later.

The "everything is a string" trap. Stakeholders who think in spreadsheet terms tend to say "it's just text." A date of birth is not text — it is a DATE. A monetary amount is not text — it is a DECIMAL. An account status is not text — it is a constrained domain with exactly four valid values. Your job as a data modeler is to push for precise typing.

The "I'll know it when I see it" trap. Some stakeholders cannot articulate requirements in the abstract. For these people, build a prototype — even a whiteboard sketch with sample data — and ask, "Is this what you mean?" Concrete examples unlock requirements that interviews miss.

Check Your Understanding — Section 13.2

  1. You are interviewing a branch manager about the loan approval process. She says, "The loan officer evaluates the application, checks the customer's credit score, and either approves or denies it. If approved, we set the rate based on the product and the customer's creditworthiness." Identify at least four candidate entities and three candidate relationships from this statement.
  2. Why is an existing database schema a useful but imperfect source of requirements?
  3. A stakeholder says, "We need a column for each month's sales total." What is wrong with this requirement, and how would you remodel it?

13.3 Entities and Attributes

With requirements gathered, we begin the formal modeling process. The first step is identifying entities and their attributes.

What Is an Entity?

An entity is a thing of significance about which the business needs to store data. In a banking context:

  • CUSTOMER — a person or business that holds accounts
  • ACCOUNT — a financial product held by a customer (checking, savings, loan, CD)
  • BRANCH — a physical location where the bank operates
  • EMPLOYEE — a person who works for the bank
  • TRANSACTION — a financial event that affects an account balance
  • LOAN — an extension of credit from the bank to a customer
  • PRODUCT — a type of financial product the bank offers

Each entity becomes a table in the physical database. But at the logical level, we do not think in terms of tables — we think in terms of business concepts.

Strong Entities vs. Weak Entities

A strong entity can be uniquely identified by its own attributes. CUSTOMER has a CUSTOMER_ID; BRANCH has a BRANCH_CODE. These exist independently.

A weak entity depends on another entity for its identification. LOAN_PAYMENT cannot be uniquely identified without knowing which LOAN it belongs to. The payment is identified by the combination of LOAN_ID and PAYMENT_NUMBER. If the loan is deleted, the payments have no independent meaning.

In Crow's foot notation (which we will adopt as our standard in Section 13.5), weak entities are sometimes drawn with double-bordered rectangles or with their identifying relationship shown as a solid line.

Identifying Attributes

For each entity, we identify its attributes — the individual data elements. Attributes fall into several categories:

Simple attributes hold a single, atomic value. FIRST_NAME, DATE_OF_BIRTH, and ACCOUNT_BALANCE are simple attributes.

Composite attributes are conceptually composed of smaller parts. An address consists of ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE_CODE, ZIP_CODE, and COUNTRY_CODE. In the logical model, you have a choice: model the address as a single composite attribute (useful for ER diagrams) or decompose it into its constituent simple attributes (required for implementation). Best practice: decompose at the logical level. You will inevitably need to query by city, state, or zip code, and if the address is stored as a single text blob, every such query requires string parsing.
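The decomposed form might be sketched as follows (illustrative names and lengths, not the final Meridian schema):

```sql
-- Address decomposed into simple attributes.
CREATE TABLE CUSTOMER_ADDRESS (
    CUSTOMER_ID   INTEGER     NOT NULL,
    ADDRESS_LINE1 VARCHAR(60) NOT NULL,
    ADDRESS_LINE2 VARCHAR(60),
    CITY          VARCHAR(40) NOT NULL,
    STATE_CODE    CHAR(2)     NOT NULL,
    ZIP_CODE      CHAR(10)    NOT NULL,
    COUNTRY_CODE  CHAR(2)     NOT NULL DEFAULT 'US',
    PRIMARY KEY (CUSTOMER_ID)
);

-- Querying by state requires no string parsing:
SELECT CUSTOMER_ID
  FROM CUSTOMER_ADDRESS
 WHERE STATE_CODE = 'NY';
```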

Derived attributes can be calculated from other stored data. ACCOUNT_AGE (the time since the account was opened) can be derived from OPEN_DATE and the current date. TOTAL_BALANCE (the sum of all account balances for a customer) can be derived by summing the ACCOUNT table. At the logical level, note derived attributes but do not plan to store them — storing them creates update anomalies. (There are exceptions: in Chapter 14, we will discuss strategic denormalization where a derived value is stored for performance, but that is a physical design decision, not a logical one.)

Multi-valued attributes can hold multiple values for a single entity instance. A customer may have multiple phone numbers or multiple email addresses. In the relational model, multi-valued attributes cannot be stored directly in a single column (that would violate First Normal Form). They must be modeled as a separate entity with a relationship back to the parent. CUSTOMER has many CUSTOMER_PHONE_NUMBER instances.
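A sketch of the child-entity pattern (assuming a CUSTOMER table keyed on CUSTOMER_ID; the PHONE_TYPE values are illustrative):

```sql
-- Multi-valued attribute modeled as a separate entity, not a repeated column.
CREATE TABLE CUSTOMER_PHONE_NUMBER (
    CUSTOMER_ID  INTEGER     NOT NULL,
    PHONE_TYPE   VARCHAR(10) NOT NULL,   -- e.g. 'HOME', 'WORK', 'MOBILE'
    PHONE_NUMBER VARCHAR(20) NOT NULL,
    PRIMARY KEY (CUSTOMER_ID, PHONE_TYPE),
    FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (CUSTOMER_ID)
);
```

Each additional phone number is an additional row, and First Normal Form is preserved.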

Attribute Domains

Every attribute has a domain — the set of valid values it can take. At the logical level, define domains precisely:

Attribute        Domain
───────────────  ──────────────────────────────────────────────────
CUSTOMER_STATUS  {'ACTIVE', 'INACTIVE', 'CLOSED', 'SUSPENDED'}
INTEREST_RATE    Decimal, 0.0000 to 99.9999
ACCOUNT_BALANCE  Decimal, -999,999,999,999.99 to 999,999,999,999.99
DATE_OF_BIRTH    Date, must be in the past
STATE_CODE       Two-letter US state abbreviation or territory code
EMAIL_ADDRESS    Valid email format (per RFC 5322)

In DB2, domains are implemented through data types and CHECK constraints, as we saw in Chapter 11. But the logical model defines them in business terms, not SQL syntax.
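As a preview of that mapping, here is an illustrative sketch (the table name is hypothetical; the column names come from the domain list above):

```sql
-- Logical domains mapped to DB2 data types plus CHECK constraints.
CREATE TABLE DOMAIN_DEMO (
    CUSTOMER_STATUS VARCHAR(9)   NOT NULL
        CHECK (CUSTOMER_STATUS IN ('ACTIVE', 'INACTIVE',
                                   'CLOSED', 'SUSPENDED')),
    INTEREST_RATE   DECIMAL(6,4) NOT NULL
        CHECK (INTEREST_RATE BETWEEN 0.0000 AND 99.9999),
    STATE_CODE      CHAR(2)      NOT NULL
);
```

Note that the "must be in the past" rule for DATE_OF_BIRTH cannot be a CHECK constraint in DB2, because check constraints must be deterministic and CURRENT DATE is not; that rule is enforced by a trigger or by application logic.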

Choosing the Primary Key

Every entity needs a primary key — one or more attributes that uniquely identify each instance. The choice of primary key has far-reaching consequences, as we discussed in Chapter 2. At the logical design level, the key decisions are:

Natural key vs. surrogate key. A natural key is derived from business data (SSN for CUSTOMER, BRANCH_CODE for BRANCH). A surrogate key is a system-generated identifier with no business meaning (CUSTOMER_ID as an auto-incrementing integer). The general recommendation for enterprise systems: use a surrogate key as the primary key, and enforce uniqueness on natural keys through UNIQUE constraints.

The rationale is practical: natural keys change. Customers change their SSN (it happens, especially after identity theft). Branch codes get renumbered in mergers. Email addresses change. When a natural key changes and it is the primary key, every foreign key reference in every related table must be updated — a cascading modification that can touch millions of rows. Surrogate keys never change because they carry no meaning.

Composite key vs. single-column key. Some entities are naturally identified by a combination of attributes. A LOAN_PAYMENT is identified by (LOAN_ID, PAYMENT_SEQUENCE). An ACCOUNT_HOLDER is identified by (ACCOUNT_ID, CUSTOMER_ID). Composite keys are legitimate and common, especially for associative entities (Section 13.6). But be aware that composite keys propagate into every child table as multi-column foreign keys, increasing storage and JOIN complexity.
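The surrogate-plus-natural-key recommendation looks like this in sketch form (illustrative DDL, not the final Meridian schema):

```sql
-- Surrogate primary key; natural key protected by a UNIQUE constraint.
CREATE TABLE BRANCH (
    BRANCH_ID   INTEGER     NOT NULL
                GENERATED ALWAYS AS IDENTITY,  -- surrogate, never changes
    BRANCH_CODE CHAR(4)     NOT NULL,          -- natural key, may be renumbered
    BRANCH_NAME VARCHAR(60) NOT NULL,
    PRIMARY KEY (BRANCH_ID),
    UNIQUE (BRANCH_CODE)
);
```

If a merger renumbers BRANCH_CODE, the update touches one row in one table; every foreign key still points at the unchanged BRANCH_ID.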

Check Your Understanding — Section 13.3

  1. Classify each of the following as a simple, composite, derived, or multi-valued attribute of CUSTOMER: (a) last name, (b) mailing address, (c) age, (d) phone numbers, (e) risk rating.
  2. Why should derived attributes generally not be stored in the database at the logical design level?
  3. A developer proposes using EMAIL_ADDRESS as the primary key for the CUSTOMER table. Give three arguments against this choice.

13.4 Relationships and Cardinality

Entities do not exist in isolation. They are connected by relationships that reflect business rules. The core of ER modeling is identifying these relationships and specifying their cardinality (how many instances of one entity can be associated with one instance of another) and participation (whether the relationship is optional or mandatory).

One-to-One (1:1)

A one-to-one relationship means each instance of Entity A is associated with at most one instance of Entity B, and vice versa. These are relatively rare in practice.

Example: Each BRANCH has at most one BRANCH_MANAGER (an EMPLOYEE), and each EMPLOYEE can manage at most one branch.

One-to-one relationships often indicate that two entities could be merged into one. Before modeling a 1:1 relationship, ask: "Is this really one entity that we split for some reason?" Legitimate reasons to keep them separate include:

  • Access control. EMPLOYEE and EMPLOYEE_SALARY might be separated so that the salary table can have different security permissions.
  • Performance isolation. A CUSTOMER table and a CUSTOMER_PHOTO table might be separated so that the photo (a large BLOB) does not slow down queries that only need the customer's name and address.
  • Optional existence. Not every CUSTOMER has a CUSTOMER_ONLINE_PROFILE. If only 30% of customers use online banking, storing the online profile attributes in the main CUSTOMER table wastes space for the other 70%.

One-to-Many (1:N)

The most common relationship type. One instance of Entity A is associated with many instances of Entity B, but each instance of Entity B is associated with at most one instance of Entity A.

Examples at Meridian Bank:

  • One BRANCH has many EMPLOYEES; each EMPLOYEE works at one BRANCH.
  • One CUSTOMER has many ACCOUNTS; each ACCOUNT belongs to one CUSTOMER (for now — we will revisit joint accounts later).
  • One ACCOUNT has many TRANSACTIONS; each TRANSACTION belongs to one ACCOUNT.
  • One LOAN_OFFICER (EMPLOYEE) manages many LOANS; each LOAN is managed by one LOAN_OFFICER.

In a relational database, one-to-many relationships are implemented by placing the primary key of the "one" side as a foreign key in the "many" side table. EMPLOYEE has a BRANCH_ID column that references BRANCH.BRANCH_ID.
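Sketched as DDL (assuming a BRANCH table keyed on BRANCH_ID, as above; column names are illustrative):

```sql
-- The "many" side (EMPLOYEE) carries the foreign key to the "one" side (BRANCH).
CREATE TABLE EMPLOYEE (
    EMPLOYEE_ID INTEGER     NOT NULL
                GENERATED ALWAYS AS IDENTITY,
    LAST_NAME   VARCHAR(50) NOT NULL,
    BRANCH_ID   INTEGER     NOT NULL,  -- every employee works at a branch
    PRIMARY KEY (EMPLOYEE_ID),
    FOREIGN KEY (BRANCH_ID) REFERENCES BRANCH (BRANCH_ID)
);
```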

Many-to-Many (M:N)

A many-to-many relationship means each instance of Entity A can be associated with many instances of Entity B, and each instance of Entity B can be associated with many instances of Entity A.

Examples at Meridian Bank:

  • A CUSTOMER can hold many ACCOUNTS, and an ACCOUNT can be held by many CUSTOMERS (joint accounts).
  • An EMPLOYEE can have many SKILLS (or certifications), and each SKILL can belong to many EMPLOYEES.
  • A PRODUCT can be offered at many BRANCHES, and each BRANCH can offer many PRODUCTS.

Many-to-many relationships cannot be directly implemented in a relational database. They must be resolved through a junction table (also called a bridge table, associative entity, or linking table). We cover this in detail in Section 13.6.

Identifying vs. Non-Identifying Relationships

An identifying relationship is one where the child entity depends on the parent for its identity. The parent's primary key becomes part of the child's primary key.

Example: LOAN_PAYMENT is identified by (LOAN_ID, PAYMENT_SEQUENCE). Without LOAN_ID, the payment has no identity. If the relationship between LOAN and LOAN_PAYMENT is identifying, LOAN_ID migrates into the primary key of LOAN_PAYMENT.

A non-identifying relationship is one where the child can be identified independently. The parent's primary key appears as a foreign key in the child, but not as part of the child's primary key.

Example: ACCOUNT has a CUSTOMER_ID foreign key, but ACCOUNT is identified by its own ACCOUNT_ID. The customer relationship is non-identifying.

In Crow's foot notation, identifying relationships are shown with a solid line; non-identifying relationships are shown with a dashed line.

Participation: Mandatory vs. Optional

Mandatory participation (also called total participation) means every instance of the entity must participate in the relationship. At Meridian Bank, every ACCOUNT must belong to a CUSTOMER. The relationship from ACCOUNT to CUSTOMER has mandatory participation on the ACCOUNT side.

Optional participation (also called partial participation) means some instances may not participate. Not every EMPLOYEE is a BRANCH_MANAGER. The relationship from EMPLOYEE to BRANCH (as manager) has optional participation on the EMPLOYEE side.

In the logical model, mandatory participation translates to NOT NULL foreign keys in the physical design. Optional participation translates to nullable foreign keys.

Reading Cardinality Notation

When we say "a CUSTOMER has many ACCOUNTS," we write the cardinality from each side:

  • From CUSTOMER to ACCOUNT: one CUSTOMER has zero or more ACCOUNTS (written as 0..* or 0..N)
  • From ACCOUNT to CUSTOMER: one ACCOUNT belongs to exactly one CUSTOMER (written as 1..1)

The full notation is: CUSTOMER 1 ——— 0..* ACCOUNT

This tells us:

  • The relationship is one-to-many (1:N).
  • Participation is mandatory on the ACCOUNT side: every ACCOUNT must have a CUSTOMER.
  • Participation is optional on the CUSTOMER side: a CUSTOMER with zero accounts is allowed.

Always read relationships from both directions. "A CUSTOMER has many ACCOUNTS" and "An ACCOUNT belongs to one CUSTOMER" are two perspectives on the same relationship. Both must be stated and validated with stakeholders.

Self-Referencing Relationships

An entity can have a relationship with itself. The most common example at Meridian Bank is the employee-manager hierarchy:

  • EMPLOYEE reports to EMPLOYEE: Each employee has zero or one manager (who is also an employee). Each manager can have zero or more direct reports.

This is a one-to-many self-referencing relationship. In the physical model, it is implemented as a MANAGER_ID column in the EMPLOYEE table that references EMPLOYEE_ID in the same table. We saw this in Chapter 4's initial schema, and we explored it with recursive CTEs in Chapter 8.
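The pattern can be sketched as follows (a simplified EMPLOYEE table for illustration):

```sql
-- Self-referencing foreign key: MANAGER_ID points back into the same table.
CREATE TABLE EMPLOYEE (
    EMPLOYEE_ID INTEGER     NOT NULL
                GENERATED ALWAYS AS IDENTITY,
    LAST_NAME   VARCHAR(50) NOT NULL,
    MANAGER_ID  INTEGER,  -- nullable: the top of the hierarchy has no manager
    PRIMARY KEY (EMPLOYEE_ID),
    FOREIGN KEY (MANAGER_ID) REFERENCES EMPLOYEE (EMPLOYEE_ID)
);
```

The nullable MANAGER_ID encodes the optional participation ("zero or one manager"); walking the hierarchy is the recursive-CTE exercise from Chapter 8.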

Another example: in a chart of accounts, ACCOUNT_CATEGORY can have a parent ACCOUNT_CATEGORY, forming a tree structure.

Check Your Understanding — Section 13.4

  1. Classify each relationship as 1:1, 1:N, or M:N: (a) CUSTOMER to ADDRESS, assuming each customer has one mailing address, (b) CUSTOMER to ADDRESS, assuming customers can have home, work, and mailing addresses, (c) ACCOUNT to PRODUCT_TYPE, (d) EMPLOYEE to DEPARTMENT.
  2. Why can many-to-many relationships not be directly implemented in a relational database?
  3. Describe a real-world 1:1 relationship in a banking context and explain why you might keep the two entities separate.

13.5 ER Diagram Notation

There are several notations for drawing ER diagrams. The three most common are:

Chen Notation

Invented by Peter Chen in 1976, this is the original ER notation. Entities are rectangles, relationships are diamonds, and attributes are ovals connected to their entity. Cardinality is shown with numbers (1, N, M) on the connecting lines.

┌──────────┐           ┌─────────┐          ┌────────┐
│ CUSTOMER │─1─&lt;has&gt;─N─│ ACCOUNT │─N─&lt;at&gt;─1─│ BRANCH │
└──────────┘           └─────────┘          └────────┘

Strengths: Visually clear for teaching. Relationship names are explicit (the diamond). Weaknesses: Takes up a lot of space. Attribute ovals create visual clutter for entities with many attributes. Rarely used in industry.

Crow's Foot (IE) Notation

Developed as part of Information Engineering (IE) methodology, this is the most widely used notation in industry and the one we will use for the rest of this book.

Entities are rectangles. Relationships are lines connecting entities. Cardinality and participation are shown with symbols at each end of the line:

Symbol     Meaning
─────────  ─────────────────────────
──||──     Exactly one (mandatory)
──|O──     Zero or one (optional)
──<──      Many (the crow's foot)
──||──<──  One or many (mandatory)
──O──<──   Zero or many (optional)

In Crow's foot notation, the "crow's foot" (a three-pronged fork) represents the "many" side. A vertical bar represents "one." A circle represents "zero" (optional).

For Meridian Bank's core relationship:

┌──────────┐                 ┌─────────┐
│ CUSTOMER │──1────────0..*──│ ACCOUNT │
└──────────┘                 └─────────┘

  Read: One CUSTOMER has zero or more ACCOUNTS.
        One ACCOUNT belongs to exactly one CUSTOMER.

Strengths: Compact. Cardinality and participation are immediately visible. Supported by virtually all data modeling tools (erwin, PowerDesigner, Lucidchart, draw.io). Industry standard. Weaknesses: Relationship names are often omitted (shown as labels on the line), which can reduce clarity for complex models.

UML Class Diagram Notation

UML (Unified Modeling Language) class diagrams can also represent data models. Entities are rectangles divided into compartments (name, attributes, operations). Relationships are lines with multiplicity annotations (1, 0..1, 0..*, 1..*).

┌─────────────┐            ┌─────────────┐
│  CUSTOMER   │ 1     0..* │   ACCOUNT   │
├─────────────┤────────────├─────────────┤
│ customer_id │            │ account_id  │
│ first_name  │            │ customer_id │
│ last_name   │            │ balance     │
└─────────────┘            └─────────────┘

Strengths: Familiar to software developers. Shows attributes in the entity box. Good for object-relational mapping discussions. Weaknesses: Less precise about identifying relationships. The "operations" compartment is irrelevant for data modeling. Not as widely used by DBAs.

Our Choice: Crow's Foot

For the remainder of this book, we use Crow's foot notation for all ER diagrams. The reasons:

  1. It is the industry standard for relational data modeling.
  2. It compactly conveys both cardinality and participation.
  3. Every major modeling tool supports it.
  4. DB2 professionals (your future colleagues) will expect it.

When you encounter diagrams in other notations — and you will — use the equivalence table above to translate.

Practitioner's Tip

The notation matters far less than the conversations it enables. A whiteboard sketch with boxes, lines, and numbers written next to them is a perfectly valid ER diagram if it helps the team agree on the data model. Do not let notation debates delay the design work.


13.6 Resolving Many-to-Many Relationships

Many-to-many (M:N) relationships are common in business requirements but cannot exist as-is in a relational database. A relational table can hold a foreign key to one parent, but it cannot hold a foreign key to "many" parents without repeating rows or using comma-separated lists (both of which violate normalization principles). The solution is a junction table — a new entity that sits between the two original entities and has a one-to-many relationship with each.

The Basic Pattern

Business requirement: A customer can hold multiple accounts, and an account can be held by multiple customers (joint accounts).

Direct M:N (cannot implement):

CUSTOMER ——— M:N ——— ACCOUNT

Resolved with junction table:

CUSTOMER ──── 1 ──── 0..* ──── ACCOUNT_HOLDER ──── 0..* ──── 1 ──── ACCOUNT

The junction table ACCOUNT_HOLDER contains:

Column        Description
────────────  ─────────────────────────────────────────────
ACCOUNT_ID    FK to ACCOUNT (part of composite PK)
CUSTOMER_ID   FK to CUSTOMER (part of composite PK)
HOLDER_TYPE   'PRIMARY' or 'JOINT'
ADDED_DATE    When this customer was added to the account
REMOVED_DATE  When (if) this customer was removed
CAN_TRANSACT  Whether this holder can execute transactions
CAN_CLOSE     Whether this holder can close the account

Attributes on the Relationship

Notice that ACCOUNT_HOLDER has its own attributes: HOLDER_TYPE, ADDED_DATE, CAN_TRANSACT. These attributes describe the relationship between a customer and an account, not the customer or the account individually. This is a key insight: when a many-to-many relationship has attributes, the junction table is not just a mechanical bridge — it is a meaningful entity in its own right.

The name matters. Do not call it CUSTOMER_ACCOUNT_XREF or CUST_ACCT_LINK. Call it ACCOUNT_HOLDER, because that is what it means. A junction table is an entity. Name it like one.
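Putting the pieces together, the junction table might be sketched as follows (assuming CUSTOMER and ACCOUNT tables keyed on CUSTOMER_ID and ACCOUNT_ID; the data types are illustrative):

```sql
-- Associative entity with a composite PK and its own attributes.
CREATE TABLE ACCOUNT_HOLDER (
    ACCOUNT_ID   INTEGER NOT NULL,
    CUSTOMER_ID  INTEGER NOT NULL,
    HOLDER_TYPE  CHAR(7) NOT NULL
        CHECK (HOLDER_TYPE IN ('PRIMARY', 'JOINT')),
    ADDED_DATE   DATE    NOT NULL,
    REMOVED_DATE DATE,                         -- NULL while still a holder
    CAN_TRANSACT CHAR(1) NOT NULL DEFAULT 'Y',
    CAN_CLOSE    CHAR(1) NOT NULL DEFAULT 'N',
    PRIMARY KEY (ACCOUNT_ID, CUSTOMER_ID),
    FOREIGN KEY (ACCOUNT_ID)  REFERENCES ACCOUNT (ACCOUNT_ID),
    FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (CUSTOMER_ID)
);
```

The composite primary key prevents the same customer from being added to the same account twice, and each foreign key carries a one-to-many relationship back to its parent.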

More Examples at Meridian Bank

Products offered at branches:

BRANCH ──── 1 ──── 0..* ──── BRANCH_PRODUCT ──── 0..* ──── 1 ──── PRODUCT

BRANCH_PRODUCT attributes: EFFECTIVE_DATE, END_DATE, CUSTOM_RATE (if the branch offers a promotional rate for that product).

Employee skills and certifications:

EMPLOYEE ──── 1 ──── 0..* ──── EMPLOYEE_CERTIFICATION ──── 0..* ──── 1 ──── CERTIFICATION

EMPLOYEE_CERTIFICATION attributes: CERTIFICATION_DATE, EXPIRY_DATE, SCORE, STATUS.

Customer documents:

CUSTOMER ──── 1 ──── 0..* ──── CUSTOMER_DOCUMENT ──── 0..* ──── 1 ──── DOCUMENT_TYPE

CUSTOMER_DOCUMENT attributes: DOCUMENT_NUMBER, ISSUE_DATE, EXPIRY_DATE, VERIFICATION_STATUS, VERIFIED_BY, VERIFIED_DATE.

When Not to Use a Junction Table

If the business rule is truly one-to-many — "each account belongs to exactly one customer" — do not introduce a junction table preemptively. Adding unnecessary junction tables complicates queries (every query that joins CUSTOMER to ACCOUNT now requires an intermediate JOIN), increases storage, and reduces clarity. Model what the business requires today, not what it might require someday.

If the requirement changes from one-to-many to many-to-many later, you can add the junction table at that time using the schema evolution techniques in Chapter 16. Because ACCOUNT references CUSTOMER through a CUSTOMER_ID foreign key, and not vice versa, adding a junction table is an additive change: you create the new table, migrate the existing foreign key data into it, and retire the old column once dependent queries have been updated.
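A minimal sketch of that additive migration, assuming the draft model where ACCOUNT carries a CUSTOMER_ID column, and using OPEN_DATE as the initial ADDED_DATE (both assumptions for illustration):

```sql
-- Step 1: create the junction table (columns simplified for the sketch).
CREATE TABLE ACCOUNT_HOLDER (
    ACCOUNT_ID   INTEGER     NOT NULL,
    CUSTOMER_ID  INTEGER     NOT NULL,
    HOLDER_TYPE  VARCHAR(20) NOT NULL,
    ADDED_DATE   DATE        NOT NULL,
    PRIMARY KEY (ACCOUNT_ID, CUSTOMER_ID)
);

-- Step 2: migrate the existing one-to-many data.
INSERT INTO ACCOUNT_HOLDER (ACCOUNT_ID, CUSTOMER_ID, HOLDER_TYPE, ADDED_DATE)
SELECT ACCOUNT_ID, CUSTOMER_ID, 'PRIMARY', OPEN_DATE
FROM ACCOUNT;

-- Step 3 (only after all dependent queries are rewritten):
ALTER TABLE ACCOUNT DROP COLUMN CUSTOMER_ID;
```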

Check Your Understanding — Section 13.6

  1. A library system has BOOKS and AUTHORS. An author can write many books, and a book can have many authors. Design the junction table, including at least two attributes that describe the relationship.
  2. Why is the name of a junction table important? How would you name the junction table for the BOOKS/AUTHORS example?
  3. Under what circumstances would you avoid creating a junction table even when the relationship appears to be many-to-many?

13.7 Subtypes and Supertypes

Not all entities of the same "type" have the same attributes. At Meridian Bank, all accounts share certain attributes (account number, balance, open date, status), but checking accounts have overdraft limits, savings accounts have minimum balance requirements, and certificates of deposit have maturity dates and early withdrawal penalties. This is the supertype/subtype pattern (also called generalization/specialization or inheritance in ER modeling).

The Supertype

The supertype is the general entity that contains attributes common to all subtypes:

ACCOUNT (supertype)
-------------------
ACCOUNT_ID (PK)
ACCOUNT_NUMBER
CUSTOMER_ID (FK)
BRANCH_ID (FK)
CURRENT_BALANCE
OPEN_DATE
STATUS

The Subtypes

Each subtype contains attributes specific to that category:

CHECKING_ACCOUNT (subtype)
--------------------------
ACCOUNT_ID (PK, FK to ACCOUNT)
OVERDRAFT_LIMIT
CHECK_REORDER_THRESHOLD
DEBIT_CARD_NUMBER

SAVINGS_ACCOUNT (subtype)
-------------------------
ACCOUNT_ID (PK, FK to ACCOUNT)
INTEREST_RATE
MIN_BALANCE
INTEREST_ACCRUAL_METHOD

CERTIFICATE_OF_DEPOSIT (subtype)
--------------------------------
ACCOUNT_ID (PK, FK to ACCOUNT)
MATURITY_DATE
TERM_MONTHS
EARLY_WITHDRAWAL_PENALTY_PCT
AUTO_RENEW

Completeness and Disjointness

Two properties define how subtypes relate to the supertype:

Completeness: Is every instance of the supertype also an instance of some subtype?

  • Total (mandatory): Every ACCOUNT must be a CHECKING, SAVINGS, CD, or some other specific type. No "untyped" accounts exist.
  • Partial (optional): Some ACCOUNTs might not belong to any subtype (perhaps a new product type that has not been modeled yet).

Disjointness: Can an instance belong to more than one subtype?

  • Disjoint (exclusive): An ACCOUNT is either a CHECKING_ACCOUNT or a SAVINGS_ACCOUNT, never both.
  • Overlapping (non-exclusive): An EMPLOYEE could be both a LOAN_OFFICER and a BRANCH_MANAGER simultaneously.

At Meridian Bank, accounts are total and disjoint: every account has exactly one type. A discriminator attribute (ACCOUNT_TYPE_CODE) in the supertype records which subtype table contains the additional attributes. Note that the discriminator indicates the subtype; guaranteeing that the matching subtype row actually exists requires constraints or application logic.

Implementation Strategies

The logical model describes the supertype/subtype hierarchy. When we move to physical design (Chapter 14), we have three implementation strategies:

Strategy 1: Table-per-hierarchy (single table). Put all attributes from all subtypes into one table. Subtype-specific attributes are NULL for rows of other subtypes.

CREATE TABLE ACCOUNT (
    ACCOUNT_ID          INTEGER NOT NULL,
    ACCOUNT_TYPE_CODE   CHAR(3) NOT NULL,
    -- Common attributes
    CURRENT_BALANCE     DECIMAL(15,2),
    OPEN_DATE           DATE,
    -- Checking-specific (NULL for non-checking)
    OVERDRAFT_LIMIT     DECIMAL(15,2),
    CHECK_REORDER_THRESHOLD INTEGER,
    -- Savings-specific (NULL for non-savings)
    INTEREST_RATE       DECIMAL(5,4),
    MIN_BALANCE         DECIMAL(15,2),
    -- CD-specific (NULL for non-CD)
    MATURITY_DATE       DATE,
    TERM_MONTHS         INTEGER,
    EARLY_WITHDRAWAL_PENALTY_PCT DECIMAL(5,2),
    ...
);

Pros: Simple queries (no JOINs needed). Good for read-heavy workloads. Cons: Many NULL columns. Cannot enforce NOT NULL on subtype-specific attributes through DDL (you need CHECK constraints). Table can become very wide.
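For example, a CHECK constraint can approximate the missing NOT NULL enforcement in the single-table strategy. The constraint below is a sketch against the columns shown above, assuming 'CHK' is the checking discriminator value:

```sql
-- Checking accounts must have an overdraft limit; other types must not.
ALTER TABLE ACCOUNT ADD CONSTRAINT CK_CHECKING_ATTRS
    CHECK (
        (ACCOUNT_TYPE_CODE =  'CHK' AND OVERDRAFT_LIMIT IS NOT NULL)
     OR (ACCOUNT_TYPE_CODE <> 'CHK' AND OVERDRAFT_LIMIT IS NULL)
    );
```

In practice you would need one such constraint per subtype, which is part of why the single-table strategy grows unwieldy as subtypes multiply.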

Strategy 2: Table-per-type (separate tables). Create one table for the supertype and one table for each subtype. The subtype tables reference the supertype through a shared primary key.

CREATE TABLE ACCOUNT (
    ACCOUNT_ID      INTEGER NOT NULL PRIMARY KEY,
    ACCOUNT_TYPE    CHAR(3) NOT NULL,
    CURRENT_BALANCE DECIMAL(15,2),
    OPEN_DATE       DATE
);

CREATE TABLE CHECKING_ACCOUNT (
    ACCOUNT_ID          INTEGER NOT NULL PRIMARY KEY
        REFERENCES ACCOUNT(ACCOUNT_ID),
    OVERDRAFT_LIMIT     DECIMAL(15,2) NOT NULL,
    CHECK_REORDER_THRESHOLD INTEGER
);

CREATE TABLE SAVINGS_ACCOUNT (
    ACCOUNT_ID      INTEGER NOT NULL PRIMARY KEY
        REFERENCES ACCOUNT(ACCOUNT_ID),
    INTEREST_RATE   DECIMAL(5,4) NOT NULL,
    MIN_BALANCE     DECIMAL(15,2) NOT NULL
);

Pros: Clean separation. Subtype-specific NOT NULL constraints are enforceable. No wasted NULL columns. Cons: Queries that need both common and subtype-specific attributes require a JOIN. INSERT requires two statements (or a stored procedure).
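Opening a checking account under this strategy takes two INSERTs, ideally wrapped in a single unit of work so the supertype row never exists without its subtype row. The values below are illustrative:

```sql
INSERT INTO ACCOUNT (ACCOUNT_ID, ACCOUNT_TYPE, CURRENT_BALANCE, OPEN_DATE)
VALUES (10042, 'CHK', 500.00, CURRENT DATE);

INSERT INTO CHECKING_ACCOUNT (ACCOUNT_ID, OVERDRAFT_LIMIT, CHECK_REORDER_THRESHOLD)
VALUES (10042, 1000.00, 25);

COMMIT;  -- both rows become visible together
```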

Strategy 3: Table-per-concrete-class (no supertype table). Each subtype is a standalone table containing both the common and subtype-specific attributes. There is no shared supertype table.

CREATE TABLE CHECKING_ACCOUNT (
    ACCOUNT_ID      INTEGER NOT NULL PRIMARY KEY,
    CURRENT_BALANCE DECIMAL(15,2),
    OPEN_DATE       DATE,
    OVERDRAFT_LIMIT DECIMAL(15,2) NOT NULL,
    CHECK_REORDER_THRESHOLD INTEGER
);

CREATE TABLE SAVINGS_ACCOUNT (
    ACCOUNT_ID      INTEGER NOT NULL PRIMARY KEY,
    CURRENT_BALANCE DECIMAL(15,2),
    OPEN_DATE       DATE,
    INTEREST_RATE   DECIMAL(5,4) NOT NULL,
    MIN_BALANCE     DECIMAL(15,2) NOT NULL
);

Pros: Each table is self-contained. No JOINs needed for subtype-specific queries. Cons: Common attributes are duplicated in every table. Queries across all account types require UNION ALL. Foreign keys from other tables (like TRANSACTION) must reference one of many possible tables — which is unworkable.
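To make the UNION ALL cost concrete, here is a sketch of the cross-type query that Strategy 2 answers with a single scan of ACCOUNT. Under Strategy 3 it needs one branch per subtype table:

```sql
-- All accounts and balances, regardless of type.
SELECT 'CHECKING' AS ACCT_KIND, ACCOUNT_ID, CURRENT_BALANCE
FROM CHECKING_ACCOUNT
UNION ALL
SELECT 'SAVINGS', ACCOUNT_ID, CURRENT_BALANCE
FROM SAVINGS_ACCOUNT;
-- ...plus one more branch for every additional subtype table.
```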

The Recommendation for Meridian Bank: Use table-per-type (Strategy 2) for the ACCOUNT hierarchy. It gives us the clean separation of subtypes with the ability to query all accounts through the shared ACCOUNT table. Every TRANSACTION references ACCOUNT.ACCOUNT_ID, regardless of the account type.

Spaced Review — Chapter 2 Connection

In Chapter 2, we introduced the concept of normalization and its normal forms. Supertype/subtype design is directly related: the reason we separate subtype-specific attributes into their own tables is to eliminate the NULL-heavy rows that would violate the spirit (if not the letter) of normalization. A single-table approach with many NULLs is technically in 3NF (NULLs are not a normalization violation per se), but it creates the same kinds of update anomalies that normalization was designed to prevent.


13.8 Normalization as Design Validation

You were introduced to normalization in Chapter 2. There, we learned normalization as a theoretical framework — a way of understanding how data should be organized. Now we revisit normalization as a practical validation technique for your ER model.

The workflow is:

  1. Gather requirements.
  2. Build the ER model based on business understanding.
  3. Translate the ER model into preliminary table definitions.
  4. Apply normalization rules to validate (and if needed, correct) the design.

Normalization is not a design method — it is a design check. You do not normalize first and design second. You design from the business requirements (top-down) and then validate with normalization (bottom-up). If normalization reveals a problem, you go back and fix the ER model.

Quick Review: The Normal Forms

First Normal Form (1NF): Every column contains atomic (indivisible) values. No repeating groups. No arrays or comma-separated lists in a single column.

Violation example: A CUSTOMER table with a PHONE_NUMBERS column containing "555-1234, 555-5678, 555-9012". Fix: Create a CUSTOMER_PHONE table.

Second Normal Form (2NF): Must be in 1NF, and every non-key attribute must depend on the entire primary key (not just part of it). This matters for tables with composite primary keys.

Violation example: An ORDER_LINE table with composite key (ORDER_ID, PRODUCT_ID) that includes ORDER_DATE. The ORDER_DATE depends only on ORDER_ID, not on the full key (ORDER_ID, PRODUCT_ID). Fix: Move ORDER_DATE to the ORDER table.

Third Normal Form (3NF): Must be in 2NF, and no non-key attribute depends on another non-key attribute (no transitive dependencies).

Violation example: A CUSTOMER table with BRANCH_ID, BRANCH_NAME, and BRANCH_ADDRESS. BRANCH_NAME and BRANCH_ADDRESS depend on BRANCH_ID, not directly on CUSTOMER_ID. Fix: Move BRANCH_NAME and BRANCH_ADDRESS to a BRANCH table; keep only BRANCH_ID in CUSTOMER.

Boyce-Codd Normal Form (BCNF): Must be in 3NF, and every determinant is a candidate key. BCNF handles an edge case where 3NF allows certain anomalies when multiple overlapping candidate keys exist.

Violation example: A COURSE_SCHEDULE table where (STUDENT, COURSE) is one candidate key and INSTRUCTOR determines COURSE (each instructor teaches only one course). INSTRUCTOR is a determinant but not a candidate key. Fix: Decompose into two tables.
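A sketch of that decomposition, with hypothetical table and column definitions:

```sql
-- Each instructor teaches exactly one course, so INSTRUCTOR is now a key.
CREATE TABLE INSTRUCTOR_COURSE (
    INSTRUCTOR  VARCHAR(60) NOT NULL PRIMARY KEY,
    COURSE      VARCHAR(60) NOT NULL
);

-- Enrollment records which instructor each student takes;
-- the course is derivable through INSTRUCTOR_COURSE.
CREATE TABLE STUDENT_INSTRUCTOR (
    STUDENT     VARCHAR(60) NOT NULL,
    INSTRUCTOR  VARCHAR(60) NOT NULL REFERENCES INSTRUCTOR_COURSE (INSTRUCTOR),
    PRIMARY KEY (STUDENT, INSTRUCTOR)
);
```

Every determinant in each table is now a candidate key, so both tables are in BCNF.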

Applying Normalization to the Meridian Bank Model

Let us validate a portion of our model. Consider the initial draft of the TRANSACTION entity:

TRANSACTION
-----------
TRANSACTION_ID (PK)
ACCOUNT_ID (FK)
TRANSACTION_TYPE
AMOUNT
TRANSACTION_DATE
DESCRIPTION
CUSTOMER_NAME        ← problem
ACCOUNT_NUMBER       ← problem
BRANCH_NAME          ← problem

1NF check: Passes. All attributes are atomic.

2NF check: Passes. The primary key is a single column (TRANSACTION_ID), so partial dependencies are impossible.

3NF check: Fails. CUSTOMER_NAME depends on ACCOUNT_ID (transitively through CUSTOMER_ID). ACCOUNT_NUMBER depends on ACCOUNT_ID. BRANCH_NAME depends on BRANCH_ID (through ACCOUNT). These are transitive dependencies.

Fix: Remove CUSTOMER_NAME, ACCOUNT_NUMBER, and BRANCH_NAME. They belong in their respective tables. When you need the customer name on a transaction report, JOIN TRANSACTION to ACCOUNT to CUSTOMER.

This is exactly the kind of design flaw that normalization catches. A developer might include CUSTOMER_NAME in the TRANSACTION table "for convenience" — to avoid a JOIN in the most common query. But the moment a customer changes their name, every historical transaction would show the new name instead of the name at the time of the transaction. Is that the business requirement? Maybe. Maybe not. The normalization check forces you to ask the question.
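After the fix, the transaction report recovers those columns with JOINs. A sketch, assuming the draft model in which ACCOUNT carries CUSTOMER_ID and BRANCH_ID foreign keys and CUSTOMER has a single CUSTOMER_NAME column (column names are illustrative):

```sql
SELECT T.TRANSACTION_ID, T.AMOUNT, T.TRANSACTION_DATE,
       A.ACCOUNT_NUMBER, C.CUSTOMER_NAME, B.BRANCH_NAME
FROM TRANSACTION  T
JOIN ACCOUNT      A ON A.ACCOUNT_ID  = T.ACCOUNT_ID
JOIN CUSTOMER     C ON C.CUSTOMER_ID = A.CUSTOMER_ID
JOIN BRANCH       B ON B.BRANCH_ID   = A.BRANCH_ID;
```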

When Normalization Reveals Design Flaws

If normalization keeps pointing to the same entity — "this attribute does not belong here, that attribute does not belong here" — it often means you are missing an entity. The attributes that keep getting removed are trying to tell you they belong to a table that does not exist yet.

Example: You design a LOAN table and include COLLATERAL_TYPE, COLLATERAL_VALUE, COLLATERAL_DESCRIPTION, COLLATERAL_APPRAISAL_DATE. Normalization says these depend on the collateral, not the loan. This suggests a COLLATERAL entity that the LOAN entity relates to — and once you create it, you realize that multiple loans can be secured by the same collateral (a house can secure both a mortgage and a home equity line of credit). A missing entity has been found.

Practitioner's Note

In practice, most ER models designed by experienced data architects are already in 3NF or BCNF because the architects think in terms of entities and relationships, not flat tables. Normalization violations tend to creep in when developers add "convenience columns" during physical design or when requirements change and new attributes get tacked onto existing tables. Run a normalization check every time you modify the model.


13.9 Common Design Patterns

Certain data modeling patterns recur across industries. Recognizing them saves design time and avoids reinventing wheels that have already been tested and refined.

Pattern 1: Audit Trail

Every change to critical data should be tracked: who changed it, when, what the old value was, and what the new value is. At Meridian Bank, this is a regulatory requirement.

Design options:

Option A: Centralized audit log. A single AUDIT_LOG table records changes across all tables.

AUDIT_LOG
---------
AUDIT_ID (PK)
TABLE_NAME
OPERATION (I/U/D)
PRIMARY_KEY_VALUE
COLUMN_NAME
OLD_VALUE (VARCHAR)
NEW_VALUE (VARCHAR)
CHANGED_BY
CHANGED_TIMESTAMP

Pros: Simple. One table to query for all audit data. Cons: OLD_VALUE and NEW_VALUE are VARCHAR regardless of the original data type. Querying "all changes to customer 10042" requires string matching on PRIMARY_KEY_VALUE.

Option B: Shadow tables. For each audited table, create a corresponding history table with the same columns plus audit metadata.

CUSTOMER_HISTORY
---------
HISTORY_ID (PK)
CUSTOMER_ID
FIRST_NAME
LAST_NAME
... (all CUSTOMER columns)
OPERATION (I/U/D)
CHANGED_BY
CHANGED_TIMESTAMP
EFFECTIVE_FROM
EFFECTIVE_TO

Pros: Preserves data types. Easy to reconstruct the state of a customer at any point in time. Efficient querying. Cons: Schema maintenance — every ALTER TABLE on CUSTOMER requires a matching ALTER TABLE on CUSTOMER_HISTORY.

Meridian Bank choice: Both. The centralized AUDIT_LOG captures fine-grained column-level changes for compliance queries ("what changed in the last 24 hours?"), while shadow tables capture row-level snapshots for historical reconstruction ("what was this customer's address on March 15, 2024?").

Pattern 2: Effective Dating

Many business attributes change over time, and the business needs to know not just the current value, but the value at any given point in the past (or future).

Example: Interest rates change. A savings account might earn 2.5% from January to March, 2.75% from April to June, and 3.0% from July onward.

INTEREST_RATE_HISTORY
---------------------
RATE_ID (PK)
ACCOUNT_ID (FK) or PRODUCT_ID (FK)
EFFECTIVE_DATE
END_DATE (NULL if current)
INTEREST_RATE
CHANGED_BY
CHANGED_TIMESTAMP

The query to find the rate on any given date is:

SELECT INTEREST_RATE
FROM INTEREST_RATE_HISTORY
WHERE ACCOUNT_ID = ?
  AND EFFECTIVE_DATE <= ?
  AND (END_DATE IS NULL OR END_DATE > ?)

Design rules for effective dating:

  • Use a half-open interval: EFFECTIVE_DATE is inclusive, END_DATE is exclusive. Adjacent periods then meet exactly (one period's END_DATE equals the next period's EFFECTIVE_DATE), leaving no ambiguity at the boundary date.
  • The current record has END_DATE = NULL.
  • When a new rate takes effect, update the previous record's END_DATE and insert a new record.
  • Consider adding a CHECK constraint or trigger to prevent overlapping date ranges.
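Putting a rate change into effect then follows the update-then-insert pattern described above, in a single unit of work. Values are illustrative; generation of the surrogate RATE_ID is omitted:

```sql
-- Close out the current rate row: its exclusive END_DATE
-- becomes the new rate's EFFECTIVE_DATE.
UPDATE INTEREST_RATE_HISTORY
   SET END_DATE = DATE('2024-07-01')
 WHERE ACCOUNT_ID = 10042
   AND END_DATE IS NULL;

-- Insert the new current rate.
INSERT INTO INTEREST_RATE_HISTORY
       (ACCOUNT_ID, EFFECTIVE_DATE, END_DATE, INTEREST_RATE)
VALUES (10042, DATE('2024-07-01'), NULL, 0.0300);

COMMIT;
```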

Pattern 3: Address and Phone Polymorphism

Customers, employees, and branches all have addresses. Do you duplicate the address columns in each table, or do you create a shared ADDRESS table?

Option A: Embedded addresses (addresses inline in each table).

CUSTOMER (... ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE, ZIP ...)
EMPLOYEE (... ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE, ZIP ...)
BRANCH   (... ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE, ZIP ...)

Pros: Simple JOINs. Each entity "owns" its address. Cons: Address validation logic is duplicated. If the business adds COUNTY or COUNTRY, three tables must be altered.

Option B: Shared address table with polymorphic relationship.

ADDRESS
-------
ADDRESS_ID (PK)
ENTITY_TYPE ('CUSTOMER', 'EMPLOYEE', 'BRANCH')
ENTITY_ID
ADDRESS_TYPE ('HOME', 'WORK', 'MAILING', 'LEGAL')
ADDRESS_LINE1
ADDRESS_LINE2
CITY
STATE_CODE
ZIP_CODE
COUNTRY_CODE

Pros: Single source of address logic. Supports multiple addresses per entity. Cons: Cannot enforce foreign keys (ENTITY_ID could reference any table, and DB2 cannot validate a polymorphic FK). Queries require filtering by ENTITY_TYPE.

Option C: Separate address tables per entity (recommended).

CUSTOMER_ADDRESS (CUSTOMER_ID FK, ADDRESS_TYPE, ADDRESS_LINE1, ...)
EMPLOYEE_ADDRESS (EMPLOYEE_ID FK, ADDRESS_TYPE, ADDRESS_LINE1, ...)

Pros: Foreign keys are enforceable. Each entity's addresses are isolated. Cons: Some column duplication across address tables.

Meridian Bank choice: For the initial model, we use Option A (embedded addresses) for BRANCH and EMPLOYEE (which typically have one address each) and Option C (CUSTOMER_ADDRESS) for CUSTOMER (who may have multiple addresses — home, work, mailing). This is a pragmatic compromise: we do not over-engineer the model where the business does not require it, but we handle the genuine many-address requirement for customers properly.

Pattern 4: Configurable Attributes (Flexible Fields)

Sometimes the business needs attributes that vary by category and change over time. Product features, for example: a checking account might have attributes (monthly fee, minimum balance, free checks per month) that differ from a savings account's attributes (interest rate, compounding frequency).

The proper approach: Model the known attributes directly and use a well-structured configuration table for truly variable attributes.

PRODUCT_FEATURE
---------------
PRODUCT_ID (FK, part of PK)
FEATURE_CODE (FK to FEATURE_DEFINITION, part of PK)
FEATURE_VALUE
EFFECTIVE_DATE
END_DATE

This is different from the Entity-Attribute-Value (EAV) anti-pattern (Section 13.10) because FEATURE_DEFINITION has a defined set of feature codes with data type information and validation rules. It is a controlled vocabulary, not a dumping ground.

Pattern 5: Tree Structures (Hierarchies)

Organizational charts, account categories, product taxonomies — trees appear everywhere. The most common implementations:

Adjacency list: Each row has a PARENT_ID column pointing to its parent. Simple to insert and update. Querying the full tree requires recursive CTEs (Chapter 8).

Path enumeration: Each row stores its full path: "/1/5/12/". Easy to query descendants (WHERE PATH LIKE '/1/5/%'), but path strings must be maintained on every move.

Nested sets: Each row stores a LEFT and RIGHT boundary number. Querying all descendants is a simple range query. But inserts and moves are expensive because boundary numbers must be recalculated.

Closure table: A separate table stores every ancestor-descendant pair with a depth. Flexible and efficient for reads. Requires maintenance of the closure table on every insert, move, or delete.

Meridian Bank choice: Adjacency list for the employee hierarchy and the account category hierarchy. DB2's recursive CTE support (Chapter 8) makes adjacency lists practical for the query patterns we need, and the trees are shallow (typically 5-7 levels for the employee hierarchy).
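A sketch of the recursive CTE traversal that makes the adjacency list workable, assuming EMPLOYEE carries EMPLOYEE_ID, EMPLOYEE_NAME, and a MANAGER_ID column pointing at the parent row (names assumed for illustration):

```sql
WITH ORG_CHART (EMPLOYEE_ID, EMPLOYEE_NAME, DEPTH) AS (
    -- Anchor: the top of the tree (no manager).
    SELECT EMPLOYEE_ID, EMPLOYEE_NAME, 0
    FROM EMPLOYEE
    WHERE MANAGER_ID IS NULL
    UNION ALL
    -- Recursive step: direct reports of rows already found.
    SELECT E.EMPLOYEE_ID, E.EMPLOYEE_NAME, O.DEPTH + 1
    FROM EMPLOYEE E
    JOIN ORG_CHART O ON E.MANAGER_ID = O.EMPLOYEE_ID
)
SELECT EMPLOYEE_ID, EMPLOYEE_NAME, DEPTH
FROM ORG_CHART
ORDER BY DEPTH, EMPLOYEE_NAME;
```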


13.10 Anti-Patterns to Avoid

A data modeling anti-pattern is a design approach that seems reasonable but leads to predictable problems. Learn to recognize these so you can catch them in design reviews.

Anti-Pattern 1: Entity-Attribute-Value (EAV) Abuse

The EAV model stores data as (entity_id, attribute_name, attribute_value) triples instead of as columns:

EAV_DATA
--------
ENTITY_ID    ATTRIBUTE_NAME    ATTRIBUTE_VALUE
10001        'FIRST_NAME'      'Margaret'
10001        'LAST_NAME'       'Chen'
10001        'DATE_OF_BIRTH'   '1978-03-15'
10001        'BALANCE'         '52340.50'

Why it seems attractive: "We can add any attribute without ALTER TABLE!" This appeals to developers who want infinite flexibility.

Why it is disastrous:

  • Type safety is gone. Every value is a string. DATE_OF_BIRTH and BALANCE are stored as VARCHAR. No CHECK constraints, no data type validation.
  • Queries are hideous. To reconstruct a single customer record, you need to PIVOT (or write one self-JOIN per attribute). A five-column query becomes five self-JOINs.
  • Referential integrity is impossible. You cannot create a foreign key from an EAV attribute to another table.
  • The optimizer cannot help. DB2's statistics and histogram-based optimization depend on column-level statistics. An EAV table has no meaningful statistics.
  • Indexing is reduced to brute force. You can index (ENTITY_ID, ATTRIBUTE_NAME), but you cannot create a targeted index on "all customers born before 1980."
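To see the query problem concretely, here is what it takes to reconstruct just three attributes of one customer from the EAV_DATA table shown above: one self-JOIN per attribute.

```sql
SELECT f.ENTITY_ID,
       f.ATTRIBUTE_VALUE AS FIRST_NAME,
       l.ATTRIBUTE_VALUE AS LAST_NAME,
       b.ATTRIBUTE_VALUE AS BALANCE      -- still a string, not a DECIMAL
FROM EAV_DATA f
JOIN EAV_DATA l ON l.ENTITY_ID = f.ENTITY_ID AND l.ATTRIBUTE_NAME = 'LAST_NAME'
JOIN EAV_DATA b ON b.ENTITY_ID = f.ENTITY_ID AND b.ATTRIBUTE_NAME = 'BALANCE'
WHERE f.ATTRIBUTE_NAME = 'FIRST_NAME'
  AND f.ENTITY_ID = 10001;
```

A conventional table answers the same question with `SELECT FIRST_NAME, LAST_NAME, BALANCE FROM CUSTOMER WHERE CUSTOMER_ID = 10001`.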

When EAV is acceptable: Only when the attributes are genuinely dynamic and user-defined — for example, custom fields in a CRM where each client defines their own attributes. Even then, a hybrid approach (core attributes as columns, custom attributes as EAV) is better than pure EAV.

At Meridian Bank: We do not use EAV. Every attribute is a column in a properly typed table. If the business needs a new attribute, we use ALTER TABLE. That is what ALTER TABLE is for.

Anti-Pattern 2: The "One Table to Rule Them All"

This is the mega-table: hundreds of columns, mixing data from multiple business concepts, with most columns NULL for most rows.

UNIVERSAL_RECORD
---------
RECORD_ID, RECORD_TYPE, NAME1, NAME2, NAME3, DATE1, DATE2, DATE3,
AMOUNT1, AMOUNT2, CODE1, CODE2, CODE3, CODE4, CODE5,
TEXT1, TEXT2, TEXT3, TEXT4, FLAG1, FLAG2, FLAG3, ...

Depending on RECORD_TYPE, different columns have different meanings. When RECORD_TYPE = 'CUSTOMER', NAME1 is the first name and DATE1 is the date of birth. When RECORD_TYPE = 'ACCOUNT', NAME1 is the account name and DATE1 is the open date.

Why it is disastrous: Column names are meaningless. Constraints are impossible (you cannot put a CHECK constraint on DATE1 that means different things for different record types). Queries require CASE expressions everywhere. New developers cannot understand the schema without a Rosetta Stone document. The optimizer's column statistics are polluted by mixed data.

Anti-Pattern 3: OTLT (One True Lookup Table)

Instead of separate lookup tables for account types, transaction types, status codes, and country codes, some designers create a single CODES table:

CODES
-----
CODE_TYPE     CODE_VALUE    CODE_DESCRIPTION
'ACCT_TYPE'   'CHK'         'Checking Account'
'ACCT_TYPE'   'SAV'         'Savings Account'
'TXN_TYPE'    'DEP'         'Deposit'
'TXN_TYPE'    'WDR'         'Withdrawal'
'COUNTRY'     'USA'         'United States'
'COUNTRY'     'CAN'         'Canada'

Why it seems attractive: "One table instead of twenty! Simple!"

Why it is problematic:

  • Foreign keys cannot validate. You cannot create a FK from ACCOUNT.ACCOUNT_TYPE to CODES where CODE_TYPE = 'ACCT_TYPE'. DB2 foreign keys reference primary keys or unique constraints — they do not support conditional references.
  • Each code type has different attributes. Country codes need a continent and ISO numeric code. Account types need interest-bearing flags and minimum balances. The CODES table can only hold the lowest common denominator.
  • Performance suffers. Every lookup JOIN hits the same table, creating a contention hotspot in the buffer pool.

The fix: Create separate lookup tables: ACCOUNT_TYPE, TRANSACTION_TYPE, COUNTRY, STATUS_CODE. Each can have its own specific attributes, constraints, and foreign keys.
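A sketch of one such dedicated lookup table, with the enforceable foreign key that the OTLT design cannot provide (column sizes are illustrative):

```sql
CREATE TABLE ACCOUNT_TYPE (
    ACCOUNT_TYPE_CODE    CHAR(3)     NOT NULL PRIMARY KEY,
    ACCOUNT_TYPE_NAME    VARCHAR(60) NOT NULL,
    IS_INTEREST_BEARING  SMALLINT    NOT NULL  -- attribute specific to this code type
);

-- A real, validated foreign key: impossible against a shared CODES table.
ALTER TABLE ACCOUNT
    ADD CONSTRAINT FK_ACCOUNT_TYPE
    FOREIGN KEY (ACCOUNT_TYPE_CODE) REFERENCES ACCOUNT_TYPE;
```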

Anti-Pattern 4: Over-Normalization

Yes, this is a real thing. Normalizing to an extreme — splitting every attribute into its own table — creates a schema where even simple queries require ten-way JOINs.

Example of over-normalization:

CUSTOMER (CUSTOMER_ID, NAME_ID FK, ADDRESS_ID FK, PHONE_ID FK, STATUS_ID FK, ...)
PERSON_NAME (NAME_ID, FIRST_NAME, LAST_NAME)
ADDRESS (ADDRESS_ID, LINE1, LINE2, CITY, STATE_ID FK, ZIP)
STATE (STATE_ID, STATE_CODE, STATE_NAME)

This is technically "more normalized," but now getting a customer's name requires a JOIN to PERSON_NAME, and getting their state requires two JOINs (through ADDRESS to STATE). If a customer's name is always stored and always retrieved with the customer, there is no update anomaly that normalization would fix — FIRST_NAME and LAST_NAME functionally depend on CUSTOMER_ID. Putting them in a separate table adds complexity without removing anomalies.

The rule: Normalize to eliminate real anomalies. Do not normalize to satisfy an abstract principle when no anomaly exists.

Anti-Pattern 5: Comma-Separated Lists in Columns

CUSTOMER
--------
CUSTOMER_ID    PHONE_NUMBERS
10001          '555-1234,555-5678,555-9012'

This violates 1NF. Every query that needs a specific phone number must parse the string. You cannot index individual phone numbers. You cannot enforce that a phone number appears only once. You cannot JOIN to a phone number. And when the list grows past the VARCHAR column length, you hit a hard limit.

The fix: Always model multi-valued data as a separate table:

CUSTOMER_PHONE
--------------
CUSTOMER_ID (FK)
PHONE_TYPE
PHONE_NUMBER
IS_PRIMARY

Practitioner's Warning

These anti-patterns are not theoretical. I have encountered every one of them in production banking systems. The EAV pattern and the OTLT pattern are particularly insidious because they look elegant in a design meeting and only reveal their costs months later when real queries need to be written against real data volumes.


13.11 The Meridian Bank Complete Logical Model

We have learned the theory. Now let us apply it. In this section, we design the complete logical data model for Meridian National Bank, expanding far beyond the initial sketch from Chapter 2 and the basic tables from Chapter 4.

Business Context Recap

Meridian National Bank is a mid-size commercial bank with:

  • 2 million customers (individuals and businesses)
  • 50 branches across three states
  • 4.5 million accounts (checking, savings, CDs, money market, loans, credit lines)
  • 500 million transactions per year
  • Core banking on DB2 for z/OS; digital banking on DB2 LUW
  • Full regulatory compliance requirements (OCC, FDIC, BSA/AML, KYC)

Entity Catalog

The complete logical model contains the following entities, organized by domain:

Customer Domain

CUSTOMER — A person or business that has a relationship with the bank. Supertype for INDIVIDUAL_CUSTOMER and BUSINESS_CUSTOMER.

| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | Integer (surrogate PK) | System-generated |
| CUSTOMER_TYPE | Code | 'I' (Individual), 'B' (Business) — discriminator |
| CUSTOMER_SINCE | Date | Relationship start date |
| PRIMARY_BRANCH_ID | FK to BRANCH | Customer's home branch |
| RISK_RATING | Smallint (1-5) | KYC risk assessment |
| STATUS | Code | ACTIVE, INACTIVE, CLOSED, SUSPENDED |
| TAX_ID_LAST_FOUR | Char(4) | Last 4 of SSN or EIN (masked) |

INDIVIDUAL_CUSTOMER — Subtype for personal customers.

| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK/PK to CUSTOMER | Identifying relationship |
| FIRST_NAME | Varchar | Legal first name |
| MIDDLE_NAME | Varchar | Optional |
| LAST_NAME | Varchar | Legal last name |
| DATE_OF_BIRTH | Date | Required for individuals |
| GENDER | Code | Optional |
| MARITAL_STATUS | Code | Optional |

BUSINESS_CUSTOMER — Subtype for commercial/business customers.

| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK/PK to CUSTOMER | Identifying relationship |
| BUSINESS_NAME | Varchar | Legal business name |
| DBA_NAME | Varchar | "Doing business as" name |
| BUSINESS_TYPE | Code | LLC, CORP, PARTNERSHIP, SOLE_PROP |
| INCORPORATION_STATE | Char(2) | State of incorporation |
| INCORPORATION_DATE | Date | |
| ANNUAL_REVENUE | Decimal | For creditworthiness |

CUSTOMER_ADDRESS — Multiple addresses per customer with type classification.

| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ADDRESS_ID | Integer (PK) | Surrogate key |
| CUSTOMER_ID | FK to CUSTOMER | |
| ADDRESS_TYPE | Code | HOME, WORK, MAILING, LEGAL |
| ADDRESS_LINE1 | Varchar | |
| ADDRESS_LINE2 | Varchar | Optional |
| CITY | Varchar | |
| STATE_CODE | Char(2) | |
| ZIP_CODE | Varchar(10) | Supports ZIP+4 |
| COUNTRY_CODE | Char(3) | ISO 3166-1 alpha-3 |
| IS_PRIMARY | Boolean | One primary per type |
| EFFECTIVE_DATE | Date | |
| END_DATE | Date | NULL if current |

CUSTOMER_PHONE — Multiple phone numbers per customer.

| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK to CUSTOMER (part of PK) | |
| PHONE_TYPE | Code (part of PK) | MOBILE, HOME, WORK, FAX |
| PHONE_NUMBER | Varchar(20) | E.164 format |
| IS_PRIMARY | Boolean | |

CUSTOMER_EMAIL — Multiple email addresses per customer.

| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK to CUSTOMER (part of PK) | |
| EMAIL_TYPE | Code (part of PK) | PERSONAL, WORK |
| EMAIL_ADDRESS | Varchar(200) | |
| IS_PRIMARY | Boolean | |
| IS_VERIFIED | Boolean | |

CUSTOMER_DOCUMENT — Identity and regulatory documents.

| Attribute | Type | Notes |
|---|---|---|
| DOCUMENT_ID | Integer (PK) | |
| CUSTOMER_ID | FK to CUSTOMER | |
| DOCUMENT_TYPE_CODE | FK to DOCUMENT_TYPE | |
| DOCUMENT_NUMBER | Varchar | Driver's license number, passport number, etc. |
| ISSUING_AUTHORITY | Varchar | |
| ISSUE_DATE | Date | |
| EXPIRY_DATE | Date | |
| VERIFICATION_STATUS | Code | PENDING, VERIFIED, EXPIRED, REJECTED |
| VERIFIED_BY | FK to EMPLOYEE | |
| VERIFIED_DATE | Date | |

DOCUMENT_TYPE — Lookup: types of identity documents.

| Attribute | Type | Notes |
|---|---|---|
| DOCUMENT_TYPE_CODE | Char(3) PK | DLI, PSP, SSC, EIN, etc. |
| DOCUMENT_TYPE_NAME | Varchar | 'Driver License', 'Passport', etc. |
| IS_GOVERNMENT_ISSUED | Boolean | |
| DESCRIPTION | Varchar | |

Account Domain

ACCOUNT — Supertype for all financial accounts.

| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | Integer (surrogate PK) | |
| ACCOUNT_NUMBER | Varchar (unique) | Customer-facing number |
| ACCOUNT_TYPE_CODE | FK to ACCOUNT_TYPE | Discriminator for subtypes |
| BRANCH_ID | FK to BRANCH | Account's home branch |
| CURRENT_BALANCE | Decimal(15,2) | |
| AVAILABLE_BALANCE | Decimal(15,2) | Balance minus holds |
| OPEN_DATE | Date | |
| CLOSE_DATE | Date | NULL if open |
| LAST_ACTIVITY_DATE | Date | |
| STATUS | Code | ACTIVE, INACTIVE, CLOSED, FROZEN |

ACCOUNT_HOLDER — Junction: M:N between CUSTOMER and ACCOUNT.

| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK to ACCOUNT (part of PK) | |
| CUSTOMER_ID | FK to CUSTOMER (part of PK) | |
| HOLDER_TYPE | Code | PRIMARY, JOINT, AUTHORIZED_SIGNER, BENEFICIARY |
| ADDED_DATE | Date | |
| REMOVED_DATE | Date | NULL if active |
| CAN_TRANSACT | Boolean | |
| CAN_CLOSE | Boolean | |

ACCOUNT_TYPE — Lookup: product types for accounts.

| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_TYPE_CODE | Char(3) PK | CHK, SAV, MMA, CDT, LON, CRL |
| ACCOUNT_TYPE_NAME | Varchar | 'Regular Checking', 'Premium Savings', etc. |
| CATEGORY | Code | CHECKING, SAVINGS, MONEY_MARKET, CERTIFICATE, LOAN, CREDIT |
| IS_INTEREST_BEARING | Boolean | |
| MIN_OPENING_BALANCE | Decimal | |
| DESCRIPTION | Varchar | |
| STATUS | Code | |

CHECKING_DETAIL — Subtype: checking-specific attributes.

| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK/PK to ACCOUNT | |
| OVERDRAFT_LIMIT | Decimal | |
| OVERDRAFT_PROTECTION_ACCT | FK to ACCOUNT | Linked savings |
| MONTHLY_FEE | Decimal | |
| FREE_TRANSACTIONS_PER_MONTH | Integer | |

SAVINGS_DETAIL — Subtype: savings-specific attributes.

| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK/PK to ACCOUNT | |
| INTEREST_RATE | Decimal(5,4) | |
| MIN_BALANCE | Decimal | |
| INTEREST_ACCRUAL_METHOD | Code | DAILY, MONTHLY |
| WITHDRAWAL_LIMIT_PER_MONTH | Integer | Reg D compliance |

CD_DETAIL — Subtype: certificate of deposit.

| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK/PK to ACCOUNT | |
| TERM_MONTHS | Integer | |
| MATURITY_DATE | Date | |
| INTEREST_RATE | Decimal(5,4) | Locked rate |
| EARLY_WITHDRAWAL_PENALTY_PCT | Decimal(5,2) | |
| AUTO_RENEW | Boolean | |
| RENEWAL_TERM_MONTHS | Integer | |

Transaction Domain

TRANSACTION — Every financial event that affects an account balance.

| Attribute | Type | Notes |
|---|---|---|
| TRANSACTION_ID | Bigint (PK) | High volume — use BIGINT |
| ACCOUNT_ID | FK to ACCOUNT | |
| TRANSACTION_TYPE_CODE | FK to TRANSACTION_TYPE | |
| AMOUNT | Decimal(15,2) | Positive for credits, negative for debits |
| RUNNING_BALANCE | Decimal(15,2) | Balance after this transaction |
| TRANSACTION_DATE | Date | Business date |
| TRANSACTION_TIMESTAMP | Timestamp | Exact time |
| DESCRIPTION | Varchar(200) | |
| REFERENCE_NUMBER | Varchar(30) | Check number, wire ref, etc. |
| RELATED_ACCOUNT_ID | FK to ACCOUNT | For transfers |
| CHANNEL | Code | BRANCH, ATM, ONLINE, MOBILE, WIRE, ACH, CHECK, SYSTEM |
| TELLER_ID | FK to EMPLOYEE | NULL for non-branch transactions |
| STATUS | Code | PENDING, COMPLETED, REVERSED, VOIDED |

TRANSACTION_TYPE — Lookup: classification of transactions.

| Attribute | Type | Notes |
|---|---|---|
| TRANSACTION_TYPE_CODE | Char(3) PK | DEP, WDR, TRF, PMT, FEE, INT, ADJ, CHK |
| TRANSACTION_TYPE_NAME | Varchar | |
| AFFECTS_BALANCE | Boolean | |
| REQUIRES_APPROVAL | Boolean | |
| DESCRIPTION | Varchar | |

PENDING_TRANSACTION — Transactions not yet posted (holds, pending ACH).

| Attribute | Type | Notes |
|---|---|---|
| PENDING_TXN_ID | Bigint (PK) | |
| ACCOUNT_ID | FK to ACCOUNT | |
| TRANSACTION_TYPE_CODE | FK to TRANSACTION_TYPE | |
| AMOUNT | Decimal(15,2) | |
| EXPECTED_POST_DATE | Date | |
| HOLD_EXPIRY_DATE | Date | |
| SOURCE_DESCRIPTION | Varchar | |
| STATUS | Code | PENDING, POSTED, EXPIRED, CANCELLED |
| POSTED_TXN_ID | FK to TRANSACTION | Populated when posted |

Loan Domain

LOAN — Extended attributes for loan accounts.

Attribute Type Notes
LOAN_ID Integer (PK)
ACCOUNT_ID FK to ACCOUNT Every loan is also an account
LOAN_TYPE Code MORTGAGE, AUTO, PERSONAL, BUSINESS, HELOC, STUDENT
ORIGINAL_AMOUNT Decimal(15,2)
CURRENT_PRINCIPAL Decimal(15,2)
INTEREST_RATE Decimal(5,4)
RATE_TYPE Code FIXED, VARIABLE
TERM_MONTHS Integer
MONTHLY_PAYMENT Decimal(12,2)
ORIGINATION_DATE Date
MATURITY_DATE Date
NEXT_PAYMENT_DATE Date
LOAN_OFFICER_ID FK to EMPLOYEE
STATUS Code ACTIVE, DELINQUENT, PAID_OFF, CHARGED_OFF, FORECLOSED

LOAN_PAYMENT — Individual payments against a loan (weak entity).

Attribute Type Notes
LOAN_ID FK to LOAN (part of PK) Identifying relationship
PAYMENT_SEQUENCE Integer (part of PK)
PAYMENT_DATE Date
TOTAL_AMOUNT Decimal(12,2)
PRINCIPAL_AMOUNT Decimal(12,2)
INTEREST_AMOUNT Decimal(12,2)
ESCROW_AMOUNT Decimal(12,2) For mortgages
LATE_FEE_AMOUNT Decimal(12,2)
TRANSACTION_ID FK to TRANSACTION Links to general transaction
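
As a weak entity, LOAN_PAYMENT has a composite primary key built from the parent key plus a sequence number. A hedged DDL sketch:

```sql
-- LOAN_PAYMENT: weak entity. The PK combines the parent key with a
-- sequence number; LOAN_ID participates in the identifying relationship.
CREATE TABLE LOAN_PAYMENT (
    LOAN_ID          INTEGER       NOT NULL,
    PAYMENT_SEQUENCE INTEGER       NOT NULL,
    PAYMENT_DATE     DATE          NOT NULL,
    TOTAL_AMOUNT     DECIMAL(12,2) NOT NULL,
    PRINCIPAL_AMOUNT DECIMAL(12,2) NOT NULL,
    INTEREST_AMOUNT  DECIMAL(12,2) NOT NULL,
    ESCROW_AMOUNT    DECIMAL(12,2),
    LATE_FEE_AMOUNT  DECIMAL(12,2),
    TRANSACTION_ID   BIGINT,
    PRIMARY KEY (LOAN_ID, PAYMENT_SEQUENCE),
    FOREIGN KEY (LOAN_ID) REFERENCES LOAN (LOAN_ID),
    FOREIGN KEY (TRANSACTION_ID) REFERENCES TRANSACTION (TRANSACTION_ID)
);
```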

LOAN_COLLATERAL — Assets securing a loan.

Attribute Type Notes
COLLATERAL_ID Integer (PK)
LOAN_ID FK to LOAN
COLLATERAL_TYPE Code REAL_ESTATE, VEHICLE, EQUIPMENT, SECURITIES, OTHER
DESCRIPTION Varchar(500)
ESTIMATED_VALUE Decimal(15,2)
APPRAISAL_DATE Date
APPRAISED_BY Varchar

Product and Fee Domain

PRODUCT — Financial products offered by the bank.

Attribute Type Notes
PRODUCT_ID Integer (PK)
PRODUCT_CODE Varchar(10) (unique)
PRODUCT_NAME Varchar(100)
PRODUCT_CATEGORY Code DEPOSIT, LOAN, CREDIT, INVESTMENT, SERVICE
ACCOUNT_TYPE_CODE FK to ACCOUNT_TYPE Maps product to account type
DESCRIPTION Varchar(500)
LAUNCH_DATE Date
DISCONTINUE_DATE Date
STATUS Code

FEE_SCHEDULE — Fee definitions for products and services.

Attribute Type Notes
FEE_SCHEDULE_ID Integer (PK)
PRODUCT_ID FK to PRODUCT Nullable (bank-wide fees)
FEE_TYPE Code MONTHLY, OVERDRAFT, ATM, WIRE, STMT, EARLY_CLOSE
FEE_NAME Varchar(100)
FEE_AMOUNT Decimal(10,2)
FEE_CALCULATION Code FLAT, PERCENTAGE, TIERED
EFFECTIVE_DATE Date
END_DATE Date
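
The effective-dating pattern in this table supports point-in-time queries. A sketch of the canonical lookup, where the product ID and date are purely illustrative:

```sql
-- Which overdraft fee applied to product 42 on 2024-06-15?
-- Open-ended rows (END_DATE IS NULL) are still in effect.
SELECT FEE_NAME, FEE_AMOUNT, FEE_CALCULATION
  FROM FEE_SCHEDULE
 WHERE PRODUCT_ID = 42
   AND FEE_TYPE = 'OVERDRAFT'
   AND EFFECTIVE_DATE <= '2024-06-15'
   AND (END_DATE IS NULL OR END_DATE >= '2024-06-15');
```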

INTEREST_RATE_SCHEDULE — Rate tiers for products.

Attribute Type Notes
RATE_SCHEDULE_ID Integer (PK)
PRODUCT_ID FK to PRODUCT
BALANCE_TIER_MIN Decimal(15,2)
BALANCE_TIER_MAX Decimal(15,2)
INTEREST_RATE Decimal(5,4)
EFFECTIVE_DATE Date
END_DATE Date

Branch and Employee Domain

BRANCH — Physical bank locations.

Attribute Type Notes
BRANCH_ID Integer (PK)
BRANCH_CODE Char(6) (unique)
BRANCH_NAME Varchar(100)
ADDRESS_LINE1 Varchar(200) Embedded (single address)
ADDRESS_LINE2 Varchar(200)
CITY Varchar(100)
STATE_CODE Char(2)
ZIP_CODE Varchar(10)
PHONE_NUMBER Varchar(20)
FAX_NUMBER Varchar(20)
MANAGER_EMP_ID FK to EMPLOYEE Branch manager
OPEN_DATE Date
CLOSE_DATE Date
STATUS Code ACTIVE, INACTIVE, CLOSED

EMPLOYEE — Bank staff.

Attribute Type Notes
EMPLOYEE_ID Integer (PK)
EMPLOYEE_NUMBER Varchar(10) (unique)
FIRST_NAME Varchar(50)
LAST_NAME Varchar(50)
EMAIL_ADDRESS Varchar(200)
PHONE_NUMBER Varchar(20)
JOB_TITLE Varchar(100)
DEPARTMENT Code
BRANCH_ID FK to BRANCH
HIRE_DATE Date
TERMINATION_DATE Date NULL if active
MANAGER_ID FK to EMPLOYEE Self-referencing
STATUS Code ACTIVE, INACTIVE, TERMINATED

BRANCH_PRODUCT — Junction: which products are offered at which branches.

Attribute Type Notes
BRANCH_ID FK to BRANCH (part of PK)
PRODUCT_ID FK to PRODUCT (part of PK)
EFFECTIVE_DATE Date
END_DATE Date
CUSTOM_RATE_ADJUSTMENT Decimal(5,4) Branch-specific rate modifier

Regulatory and Audit Domain

AUDIT_LOG — Centralized change tracking.

Attribute Type Notes
AUDIT_ID Bigint (PK)
TABLE_NAME Varchar(128)
OPERATION Char(1) I, U, D
PRIMARY_KEY_VALUE Varchar(200)
COLUMN_NAME Varchar(128) NULL for INSERT/DELETE
OLD_VALUE Varchar(1000)
NEW_VALUE Varchar(1000)
CHANGED_BY Varchar(128)
CHANGED_TIMESTAMP Timestamp
APPLICATION_ID Varchar(128)

REGULATORY_REPORT — Tracking of regulatory filings.

Attribute Type Notes
REPORT_ID Integer (PK)
REPORT_TYPE Code CTR, SAR, CRA, CALL
REPORTING_PERIOD_START Date
REPORTING_PERIOD_END Date
GENERATED_TIMESTAMP Timestamp
GENERATED_BY FK to EMPLOYEE
FILED_DATE Date
STATUS Code DRAFT, GENERATED, FILED, AMENDED

CTR_FILING — Currency Transaction Reports (BSA requirement for cash > $10,000).

Attribute Type Notes
CTR_ID Integer (PK)
REPORT_ID FK to REGULATORY_REPORT
CUSTOMER_ID FK to CUSTOMER
TRANSACTION_ID FK to TRANSACTION
CASH_IN_AMOUNT Decimal(15,2)
CASH_OUT_AMOUNT Decimal(15,2)
FILING_DATE Date

ALERT — Suspicious activity alerts for BSA/AML compliance.

Attribute Type Notes
ALERT_ID Integer (PK)
ALERT_TYPE Code LARGE_CASH, STRUCTURING, UNUSUAL_PATTERN, SANCTIONS
CUSTOMER_ID FK to CUSTOMER
ACCOUNT_ID FK to ACCOUNT
TRANSACTION_ID FK to TRANSACTION
ALERT_DATE Date
DESCRIPTION Varchar(1000)
ASSIGNED_TO FK to EMPLOYEE
RESOLUTION Code OPEN, INVESTIGATING, CLEARED, ESCALATED, SAR_FILED
RESOLVED_DATE Date

Online Banking Domain

ONLINE_PROFILE — Digital banking credentials and preferences (1:1 with CUSTOMER, optional).

Attribute Type Notes
CUSTOMER_ID FK/PK to CUSTOMER 1:1 with CUSTOMER
USERNAME Varchar (unique)
PASSWORD_HASH Varchar
MFA_ENABLED Boolean
MFA_METHOD Code SMS, EMAIL, AUTHENTICATOR
LAST_LOGIN_TIMESTAMP Timestamp
FAILED_LOGIN_COUNT Integer
ACCOUNT_LOCKED Boolean
ENROLLMENT_DATE Date

LOGIN_HISTORY — Audit trail for digital access.

Attribute Type Notes
LOGIN_ID Bigint (PK)
CUSTOMER_ID FK to CUSTOMER
LOGIN_TIMESTAMP Timestamp
IP_ADDRESS Varchar(45) IPv6 support
DEVICE_FINGERPRINT Varchar(200)
CHANNEL Code WEB, MOBILE_IOS, MOBILE_ANDROID
SUCCESS Boolean
FAILURE_REASON Code NULL if successful

Relationship Summary

The complete model contains the following key relationships:

| Parent Entity | Child Entity | Cardinality | Participation |
| --- | --- | --- | --- |
| CUSTOMER | INDIVIDUAL_CUSTOMER | 1:1 | Supertype/subtype |
| CUSTOMER | BUSINESS_CUSTOMER | 1:1 | Supertype/subtype |
| CUSTOMER | CUSTOMER_ADDRESS | 1:N | Optional (0..*) |
| CUSTOMER | CUSTOMER_PHONE | 1:N | Optional (0..*) |
| CUSTOMER | CUSTOMER_EMAIL | 1:N | Optional (0..*) |
| CUSTOMER | CUSTOMER_DOCUMENT | 1:N | Optional (0..*) |
| CUSTOMER -- ACCOUNT | ACCOUNT_HOLDER | M:N | Via junction |
| ACCOUNT | CHECKING_DETAIL | 1:1 | Supertype/subtype |
| ACCOUNT | SAVINGS_DETAIL | 1:1 | Supertype/subtype |
| ACCOUNT | CD_DETAIL | 1:1 | Supertype/subtype |
| ACCOUNT | TRANSACTION | 1:N | Mandatory |
| ACCOUNT | PENDING_TRANSACTION | 1:N | Optional |
| ACCOUNT | LOAN | 1:1 | Optional |
| LOAN | LOAN_PAYMENT | 1:N | Optional |
| LOAN | LOAN_COLLATERAL | 1:N | Optional |
| BRANCH | ACCOUNT | 1:N | |
| BRANCH | EMPLOYEE | 1:N | |
| BRANCH | CUSTOMER (primary) | 1:N | |
| BRANCH -- PRODUCT | BRANCH_PRODUCT | M:N | Via junction |
| EMPLOYEE | EMPLOYEE (manager) | 1:N (self) | Optional |
| EMPLOYEE | BRANCH (as manager) | 1:1 | Optional |
| EMPLOYEE | LOAN (as officer) | 1:N | Optional |
| PRODUCT | FEE_SCHEDULE | 1:N | |
| PRODUCT | INTEREST_RATE_SCHEDULE | 1:N | |
| CUSTOMER | ONLINE_PROFILE | 1:1 | Optional |
| CUSTOMER | LOGIN_HISTORY | 1:N | Optional |
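
The M:N relationships in this summary resolve through junction tables at query time. A hedged sketch of one traversal; the ACCOUNT_HOLDER column names (such as HOLDER_ROLE) are assumed from the junction described earlier in the chapter:

```sql
-- All accounts held by one customer, with the role they hold on each.
SELECT A.ACCOUNT_NUMBER, AH.HOLDER_ROLE, A.STATUS
  FROM ACCOUNT_HOLDER AH
  JOIN ACCOUNT A
    ON A.ACCOUNT_ID = AH.ACCOUNT_ID
 WHERE AH.CUSTOMER_ID = ?
 ORDER BY A.ACCOUNT_NUMBER;
```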

Design Decisions Log

Every non-obvious design decision should be documented. Here are the key decisions for the Meridian Bank model:

  1. Surrogate keys everywhere. Every entity has a system-generated integer primary key. Natural keys (ACCOUNT_NUMBER, EMPLOYEE_NUMBER, BRANCH_CODE) are enforced as UNIQUE constraints.

  2. CUSTOMER supertype/subtype split. Individuals and businesses share a customer relationship but have fundamentally different attributes. Using table-per-type avoids a wide table full of NULLs.

  3. ACCOUNT supertype/subtype split. Same reasoning as CUSTOMER. The discriminator is ACCOUNT_TYPE_CODE.

  4. ACCOUNT_HOLDER junction table. We model the customer-account relationship as M:N to support joint accounts, authorized signers, and beneficiaries from day one. The overhead is one additional JOIN compared to a simple FK, which is modest.

  5. Separate TRANSACTION and PENDING_TRANSACTION. Posted transactions are immutable (a regulatory requirement). Pending transactions change status. Keeping them in separate tables avoids mixing mutable and immutable data.

  6. LOAN as an extension of ACCOUNT, not a separate hierarchy. Every loan is also an account (it has a balance, receives payments, generates statements). LOAN is a detail table that extends ACCOUNT with loan-specific attributes.

  7. Embedded addresses for BRANCH and EMPLOYEE; separate table for CUSTOMER. Branches and employees have one address each. Customers have multiple addresses that change over time.

  8. BIGINT for TRANSACTION_ID and AUDIT_ID. With 500 million transactions per year and a multi-year retention requirement, an INTEGER key (max ~2.1 billion) would be exhausted in roughly four years. BIGINT provides effectively unlimited headroom.

  9. Effective dating on FEE_SCHEDULE and INTEREST_RATE_SCHEDULE. Fee and rate structures change over time. We need to know what fee was in effect when a transaction occurred.

  10. Separate DOCUMENT_TYPE lookup table. Different document types have different attributes (government-issued flag). This is not an OTLT — it is a properly designed lookup table for a specific domain.
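
Decisions 1 and 8 combine naturally in DDL. A sketch with an abbreviated column list:

```sql
-- Surrogate PK (system-generated identity) plus the natural key
-- enforced as a UNIQUE constraint, per design decision 1.
CREATE TABLE BRANCH (
    BRANCH_ID   INTEGER GENERATED ALWAYS AS IDENTITY,
    BRANCH_CODE CHAR(6)      NOT NULL,
    BRANCH_NAME VARCHAR(100) NOT NULL,
    PRIMARY KEY (BRANCH_ID),
    CONSTRAINT UQ_BRANCH_CODE UNIQUE (BRANCH_CODE)
);
```

For the high-volume tables, the identity column would simply be declared BIGINT instead of INTEGER, per decision 8.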

Check Your Understanding — Section 13.11

  1. Why did we choose a junction table (ACCOUNT_HOLDER) for the customer-account relationship instead of a simple foreign key in ACCOUNT?
  2. If Meridian Bank adds a new product type — "Health Savings Account" — what changes are needed in the logical model?
  3. Trace the JOINs needed to answer: "Show me all transactions for all accounts held by customer 'Margaret Chen' in the last 30 days." List every table involved.

13.12 From Logical to Physical — Preview

The logical model describes what data means. The physical model describes how DB2 stores it. The gap between them is where DBA expertise lives.

In Chapter 14, we will translate the Meridian Bank logical model into physical DB2 objects. Here is a preview of the decisions that await:

Data Type Selection

The logical model says "Decimal(15,2) for monetary amounts." The physical design must choose between DECIMAL and DECFLOAT, decide on the scale and precision, and consider whether DECIMAL(15,2) provides enough headroom for international currencies with different decimal precisions.

Tablespace Assignment

Each table will be assigned to a tablespace based on its access pattern:

  • TRANSACTION — Range-partitioned by TRANSACTION_DATE (monthly partitions) for efficient historical queries and partition-level maintenance.
  • CUSTOMER — Partition-by-growth (LUW) or universal tablespace (z/OS) for a large but slowly growing table.
  • ACCOUNT_TYPE — Small lookup table; can coexist with other lookup tables in a shared tablespace.
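
The first bullet might look like the following in DB2 LUW DDL. A sketch with an abbreviated column list; the partition boundaries are illustrative:

```sql
-- Monthly range partitions on the business date: old months can be
-- detached or archived without touching current data.
CREATE TABLE TRANSACTION (
    TRANSACTION_ID   BIGINT        NOT NULL,
    ACCOUNT_ID       INTEGER       NOT NULL,
    AMOUNT           DECIMAL(15,2) NOT NULL,
    TRANSACTION_DATE DATE          NOT NULL
)
PARTITION BY RANGE (TRANSACTION_DATE)
    (STARTING FROM ('2024-01-01') ENDING ('2024-12-31') EVERY (1 MONTH));
```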

Page Size

Wide tables (many columns or long VARCHAR/LOB columns) may need 16K or 32K page sizes. Narrow tables with high-volume point lookups perform better with 4K or 8K pages (more rows fit in the buffer pool).

Compression

The TRANSACTION table with 500 million rows per year is a prime candidate for row-level compression. DB2's dictionary-based compression typically achieves 60-80% space savings on transaction data with repetitive column values (same branch IDs, same transaction types).
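
On DB2 LUW (10.5 and later), enabling compression is a short exercise; on z/OS the equivalent is COMPRESS YES on the tablespace. A sketch:

```sql
-- Enable adaptive row compression on the transaction table.
ALTER TABLE TRANSACTION COMPRESS YES ADAPTIVE;

-- Rebuild existing rows through the compression dictionary.
CALL SYSPROC.ADMIN_CMD('REORG TABLE TRANSACTION');
```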

Denormalization Decisions

The logical model is in 3NF. The physical model may include controlled denormalization:

  • Storing CUSTOMER_NAME in TRANSACTION for statement generation performance (avoiding the JOIN chain TRANSACTION -> ACCOUNT -> ACCOUNT_HOLDER -> CUSTOMER for every statement line).
  • Storing RUNNING_BALANCE in TRANSACTION as a denormalized, derived attribute (avoiding a running-sum query over potentially millions of transactions).

Every denormalization decision will be documented with the anomaly risk it introduces, the performance benefit it provides, and the mechanism (trigger, stored procedure, or application code) that keeps the denormalized data in sync.
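
One of those sync mechanisms, the trigger, can be sketched for the RUNNING_BALANCE case. This is a hedged sketch only: it assumes ACCOUNT carries a CURRENT_BALANCE column (its full attribute list appears earlier in the chapter), and a companion AFTER trigger, not shown, would update that balance itself:

```sql
-- Populate the denormalized running balance as each row is inserted.
CREATE TRIGGER TRG_TXN_RUNNING_BAL
    NO CASCADE BEFORE INSERT ON TRANSACTION
    REFERENCING NEW AS N
    FOR EACH ROW
SET N.RUNNING_BALANCE =
    (SELECT A.CURRENT_BALANCE + N.AMOUNT
       FROM ACCOUNT A
      WHERE A.ACCOUNT_ID = N.ACCOUNT_ID);
```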

Referential Integrity Enforcement

DB2 offers several delete rules for foreign key enforcement: RESTRICT (prevent parent deletion), NO ACTION (the same effect, but checked at the end of the statement rather than row by row), CASCADE (delete children with the parent), and SET NULL (null out the foreign key on parent deletion). For the Meridian Bank model:

  • CUSTOMER -> ACCOUNT_HOLDER: CASCADE (if a customer is deleted, remove their account holder records)
  • ACCOUNT -> TRANSACTION: RESTRICT (cannot delete an account that has transactions)
  • EMPLOYEE -> LOAN (loan officer): SET NULL (if an employee leaves, the loan remains but the officer reference is cleared)
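
The three choices above, as they would appear in the DDL:

```sql
ALTER TABLE ACCOUNT_HOLDER
    ADD CONSTRAINT FK_AH_CUSTOMER
    FOREIGN KEY (CUSTOMER_ID) REFERENCES CUSTOMER (CUSTOMER_ID)
    ON DELETE CASCADE;

ALTER TABLE TRANSACTION
    ADD CONSTRAINT FK_TXN_ACCOUNT
    FOREIGN KEY (ACCOUNT_ID) REFERENCES ACCOUNT (ACCOUNT_ID)
    ON DELETE RESTRICT;

ALTER TABLE LOAN
    ADD CONSTRAINT FK_LOAN_OFFICER
    FOREIGN KEY (LOAN_OFFICER_ID) REFERENCES EMPLOYEE (EMPLOYEE_ID)
    ON DELETE SET NULL;
```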

These decisions are physical design choices with business implications, and they will be fully explored in Chapter 14.


Spaced Review

These questions revisit concepts from earlier chapters to strengthen long-term retention.

From Chapter 2: The Relational Model

Review Question 1: In Chapter 2, we distinguished between a relation (mathematical concept) and a table (SQL/DB2 implementation). Name two ways in which a DB2 table can deviate from a pure relation, and explain the constraint you would add to make the table behave more like a proper relation.

Answer: Two deviations:

  1. **Duplicate rows.** A relation is a set (no duplicates). A DB2 table allows duplicate rows unless you enforce a PRIMARY KEY or UNIQUE constraint. Fix: always define a primary key.

  2. **Row ordering.** A relation has no inherent ordering. DB2 can return rows in any order (which is relationally correct), but developers sometimes depend on insertion order. Fix: always use ORDER BY when order matters; never assume row order.

From Chapter 4: Setting Up Your DB2 Environment

Review Question 2: In Chapter 4, we created buffer pools BP_DATA_32K and BP_INDEX_8K with different page sizes. Why did we use a smaller page size for the index buffer pool?

Answer: B+ tree index entries consist of key columns and row pointers, which are narrow compared to full data rows, so even an 8K page holds a large number of entries. A fixed-size buffer pool holds four times as many 8K pages as 32K pages, and because index access is often random, smaller pages waste less memory on entries the workload never touches, which increases the cache hit ratio for index lookups. Using 32K pages for indexes would spend buffer pool memory caching large stretches of entries that are never read.

From Chapter 11: Data Definition Language

Review Question 3: In Chapter 11, we learned that CHECK constraints enforce domain rules. In the Meridian Bank logical model, we defined ACCOUNT.STATUS with valid values {ACTIVE, INACTIVE, CLOSED, FROZEN}. Write the CHECK constraint DDL, and explain why this approach is preferable to a foreign key to a STATUS lookup table for a domain with only four values.

Answer:

```sql
CONSTRAINT CHK_ACCT_STATUS
    CHECK (STATUS IN ('ACTIVE', 'INACTIVE', 'CLOSED', 'FROZEN'))
```

For a small, stable domain (four values that rarely change), a CHECK constraint is preferable because:

  1. No JOIN is needed to validate or display the status.
  2. The constraint is visible in the DDL, serving as documentation.
  3. There is no lookup table to manage, populate, or cache.

A foreign key to a lookup table is better when the domain is large (country codes), changes frequently (product types), or has additional attributes (status descriptions, display order, active flag).

Chapter Summary

This chapter transformed you from a SQL practitioner into a data modeler. We covered:

  1. Why logical design matters — the cost of getting it wrong, measured in data anomalies, performance degradation, and maintenance burden.

  2. Requirements gathering — interviewing stakeholders, analyzing existing systems, extracting entities from business processes, and avoiding common traps.

  3. Entities and attributes — identifying what the business needs to track, classifying attributes (simple, composite, derived, multi-valued), defining domains, and choosing primary keys.

  4. Relationships and cardinality — modeling one-to-one, one-to-many, and many-to-many relationships with correct participation and cardinality constraints.

  5. ER diagram notation — Chen, Crow's foot, and UML notation, with Crow's foot as our standard for the rest of the book.

  6. Resolving M:N relationships — junction tables as first-class entities with their own attributes.

  7. Subtypes and supertypes — modeling inheritance hierarchies and choosing implementation strategies (single table, table-per-type, table-per-concrete-class).

  8. Normalization as validation — using 1NF through BCNF as a check on your ER model, not as a design method.

  9. Common design patterns — audit trails, effective dating, address polymorphism, configurable attributes, and tree structures.

  10. Anti-patterns — EAV abuse, mega-tables, OTLT, over-normalization, and comma-separated lists.

  11. The Meridian Bank complete logical model — 30+ entities across customer, account, transaction, loan, product, branch/employee, regulatory, and online banking domains.

  12. The bridge to physical design — a preview of how the logical model maps to DB2 tablespaces, page sizes, compression, and denormalization decisions in Chapter 14.

The logical model is a contract. It says: "This is what the data means. These are the business rules. These are the relationships." The physical model (Chapter 14), the index strategy (Chapter 15), and the schema evolution plan (Chapter 16) all flow from this contract. Get the contract right, and everything downstream is tractable. Get it wrong, and no amount of tuning will save you.

Next, in Chapter 14, we will take this logical model and make it real — translating every entity into a DB2 table, every relationship into a foreign key, and every design decision into DDL that the optimizer can work with.


In the next chapter: Physical Database Design — translating the logical model into DB2 tablespaces, page sizes, compression, and partitioning strategies.