> "The SQL is the easy part. Designing the schema that makes the SQL simple, fast, and correct — that's where database professionals earn their salary."
In This Chapter
- 13.1 Why Logical Design Matters
- 13.2 Requirements Gathering for Database Design
- 13.3 Entities and Attributes
- 13.4 Relationships and Cardinality
- 13.5 ER Diagram Notation
- 13.6 Resolving Many-to-Many Relationships
- 13.7 Subtypes and Supertypes
- 13.8 Normalization as Design Validation
- 13.9 Common Design Patterns
- 13.10 Anti-Patterns to Avoid
- 13.11 The Meridian Bank Complete Logical Model
- 13.12 From Logical to Physical — Preview
- Spaced Review
- Chapter Summary
Chapter 13: Logical Database Design — From Business Requirements to Entity-Relationship Models
"The SQL is the easy part. Designing the schema that makes the SQL simple, fast, and correct — that's where database professionals earn their salary." — Attributed to a senior DBA at a Fortune 100 bank, overheard at an IDUG conference
You know SQL. You can write a twelve-table JOIN with window functions, CTEs, and correlated subqueries. You can CREATE TABLE, define constraints, and build triggers that enforce business rules at the database layer. Parts I and II of this book made sure of that.
Now comes the hard question: What tables should you create in the first place?
This is not a syntactic question. It is a question about meaning — about what the business does, what data it needs to remember, and how those pieces of data relate to one another. Get it right, and the SQL practically writes itself. Queries are clean, indexes are obvious, and the schema evolves gracefully as requirements change. Get it wrong, and you spend the next five years writing tortured workarounds, patching data anomalies with nightly batch jobs, and explaining to auditors why the same customer has three different addresses — none of which match.
Chapter 13 is where we shift from using a database to designing one. We begin at the beginning: business requirements. We translate those requirements into a formal logical model using entity-relationship (ER) diagrams. We identify entities, attributes, and relationships. We model cardinality, handle subtypes, resolve many-to-many relationships, and validate our design through normalization. We catalog the anti-patterns that trip up even experienced designers. And we apply every technique to the Meridian National Bank, producing a complete logical data model with 30+ entities that will serve as the foundation for the rest of Part III.
By the end of this chapter, you will be able to walk into a conference room with business stakeholders, listen to their requirements, ask the right questions, and walk out with a logical model that both sides trust. That skill is what separates a database programmer from a data architect.
13.1 Why Logical Design Matters
Before we touch a single diagram, let us establish why logical design deserves its own chapter — and why so many projects skip it at enormous cost.
The Cost of Getting It Wrong
Consider three real-world scenarios that I have personally witnessed (names changed, patterns preserved):
Scenario 1: The Duplicated Customer. A regional bank built their core banking system without a unified CUSTOMER entity. The checking account system had its own customer table. The loan system had its own. The credit card system had a third. When regulators asked, "What is the total exposure for customer John Smith across all products?" the answer required a six-month data integration project, three consultants, and a probabilistic name-matching algorithm that was right 94% of the time. Six percent of the time it was wrong, and 6% of $2 billion in total exposure is $120 million in unquantified risk.
Scenario 2: The Denormalized Transaction Log. An e-commerce company stored the customer's shipping address directly in every order row. When a customer updated their address, old orders still showed the old address — which was intentional. But the company also stored the customer's name in the order row, and when a customer legally changed their name, some reports showed the old name and others showed the new name. Customer service calls doubled because representatives could not reconcile the data. The root cause: no one had drawn a logical model and asked, "Which attributes belong to the order and which belong to the customer?"
Scenario 3: The One Big Table. A startup decided that normalization was "overhead" and put everything into a single table with 187 columns. For the first year, queries were simple — just WHERE clauses on one table. By year three, the table had 400 million rows, most columns were NULL for most rows, and every ALTER TABLE ADD COLUMN required a full table reorganization that took the system offline for four hours. The company eventually re-platformed at a cost of $3 million.
These are not exotic edge cases. They are the predictable consequences of skipping logical design. The pattern is always the same:
- Design shortcuts save time in week one.
- Data anomalies appear by month three.
- Workarounds accumulate by year one.
- The cost to fix exceeds the cost of doing it right by year two.
What Logical Design Gives You
A proper logical model provides:
-
A shared vocabulary. Business stakeholders and technical teams agree on what "customer," "account," and "transaction" mean — including the edge cases. Is a joint account holder a customer? Is a voided check a transaction? These questions are answered in the model, not in a support ticket eighteen months from now.
-
Data integrity by design. When relationships and constraints are modeled correctly, the database prevents invalid data from being stored. You do not need application code to enforce that every account belongs to a customer, because the foreign key enforces it.
-
Predictable query performance. A normalized model with clear relationships tells you exactly which JOINs will be needed for common queries. You can design indexes before writing a single query. A denormalized or confused model forces you into table scans, self-joins, and parsing delimited strings out of VARCHAR columns.
-
Graceful evolution. When the business adds a new product type or regulatory requirement, a well-modeled schema accommodates it with an ALTER TABLE or a new table. A poorly modeled schema requires a migration project.
-
Audit and compliance readiness. Financial regulators (OCC, FDIC, FCA), healthcare regulators (HIPAA), and data privacy regulations (GDPR, CCPA) all require that organizations know what data they hold and how it relates. A logical model is the documentation that proves you know.
The Threshold Concept for This Chapter
A logical model is a contract between the business and the technical team. It defines what data means, not where it lives. Getting the logical model wrong means the physical database will be correct code for the wrong problem.
13.2 Requirements Gathering for Database Design
The logical model does not emerge from the mind of a data architect sitting alone in a dark room. It emerges from a structured conversation between the people who understand the business and the people who understand data.
Sources of Requirements
For a project like Meridian National Bank, requirements come from multiple sources:
Stakeholder interviews. Sit down with the branch manager and ask: "Walk me through what happens when a customer opens a new account." Sit down with the loan officer and ask: "What information do you need to approve a loan?" Sit down with the compliance officer and ask: "What reports does the OCC require, and what data feeds them?" Each conversation reveals entities, attributes, and business rules that no single person knows completely.
Existing systems and databases. If there is an existing system being replaced or extended, its schema is a rich (if imperfect) source of requirements. Run SELECT * FROM SYSCAT.TABLES WHERE TABSCHEMA = 'LEGACY' and examine every table. But treat the existing schema as evidence of what the business did, not necessarily what it should do. Legacy schemas accumulate workarounds, obsolete columns, and design compromises.
Business documents. Regulatory filings, product brochures, service agreements, account applications, and internal policy manuals all contain data requirements. The loan application form, for instance, tells you exactly what attributes the LOAN entity needs — because the business already decided what information it collects.
Process flows and use cases. Business process diagrams and user stories reveal the verbs in the system — open account, transfer funds, approve loan, generate statement. Each verb implies entities that participate and attributes that are read or written. If the process says "calculate monthly interest based on the daily closing balance," you know you need a balance history, not just a current balance.
Regulatory requirements. Banking regulations in particular drive data requirements. The Bank Secrecy Act requires Currency Transaction Reports for cash transactions over $10,000 — which means the TRANSACTION entity needs a cash indicator and a regulatory reporting flag. Know Your Customer (KYC) rules require identity verification — which means the CUSTOMER entity needs document references and verification status.
The Requirements Interview
A good requirements interview for data modeling follows a pattern:
-
Start with nouns. Ask the stakeholder to describe their domain. Write down every noun: customer, account, branch, loan, payment, statement, officer, product. These are candidate entities.
-
Ask "what do you need to know about each?" For each noun, ask what information matters. "What do you need to know about a customer?" yields: name, address, phone, date of birth, SSN, risk rating, relationship start date. These are candidate attributes.
-
Ask "how do these relate?" "Can a customer have multiple accounts? Can an account belong to multiple customers? Does every account belong to a branch, or can an account exist without one?" These questions reveal relationships and cardinality.
-
Ask about the edge cases. "What happens when a customer closes all their accounts but has an outstanding loan? Are they still a customer? What about a deceased customer whose estate is in probate?" Edge cases reveal optionality, status attributes, and lifecycle rules that the "happy path" misses.
-
Ask about time. "Do you need to know the customer's address as of today, or as of the date of a transaction? When an interest rate changes, do you need the history of previous rates?" These questions reveal whether you need temporal modeling — effective dates, history tables, and bitemporal patterns.
-
Ask about volume. "How many customers? How many transactions per day? How many years of history?" Volume does not directly affect the logical model, but it influences design decisions (such as whether to model a many-to-many relationship as a junction table versus a different approach) and is critical for the physical design in Chapter 14.
Common Requirements Pitfalls
The "just use a spreadsheet" trap. Stakeholders often describe their data needs in terms of the spreadsheet they currently use. "I need a column for January, a column for February, a column for March..." This is not a data requirement; it is a reporting format. Your job is to extract the underlying entities (MONTHLY_BALANCE with a MONTH_DATE column, not twelve separate columns).
The "we might need it someday" trap. Adding columns "just in case" leads to bloated tables full of NULLs. If the business cannot articulate a current use for an attribute, leave it out. ALTER TABLE ADD COLUMN is always available later.
The "everything is a string" trap. Stakeholders who think in spreadsheet terms tend to say "it's just text." A date of birth is not text — it is a DATE. A monetary amount is not text — it is a DECIMAL. An account status is not text — it is a constrained domain with exactly four valid values. Your job as a data modeler is to push for precise typing.
The "I'll know it when I see it" trap. Some stakeholders cannot articulate requirements in the abstract. For these people, build a prototype — even a whiteboard sketch with sample data — and ask, "Is this what you mean?" Concrete examples unlock requirements that interviews miss.
Check Your Understanding — Section 13.2
- You are interviewing a branch manager about the loan approval process. She says, "The loan officer evaluates the application, checks the customer's credit score, and either approves or denies it. If approved, we set the rate based on the product and the customer's creditworthiness." Identify at least four candidate entities and three candidate relationships from this statement.
- Why is an existing database schema a useful but imperfect source of requirements?
- A stakeholder says, "We need a column for each month's sales total." What is wrong with this requirement, and how would you remodel it?
13.3 Entities and Attributes
With requirements gathered, we begin the formal modeling process. The first step is identifying entities and their attributes.
What Is an Entity?
An entity is a thing of significance about which the business needs to store data. In a banking context:
- CUSTOMER — a person or business that holds accounts
- ACCOUNT — a financial product held by a customer (checking, savings, loan, CD)
- BRANCH — a physical location where the bank operates
- EMPLOYEE — a person who works for the bank
- TRANSACTION — a financial event that affects an account balance
- LOAN — an extension of credit from the bank to a customer
- PRODUCT — a type of financial product the bank offers
Each entity becomes a table in the physical database. But at the logical level, we do not think in terms of tables — we think in terms of business concepts.
Strong Entities vs. Weak Entities
A strong entity can be uniquely identified by its own attributes. CUSTOMER has a CUSTOMER_ID; BRANCH has a BRANCH_CODE. These exist independently.
A weak entity depends on another entity for its identification. LOAN_PAYMENT cannot be uniquely identified without knowing which LOAN it belongs to. The payment is identified by the combination of LOAN_ID and PAYMENT_NUMBER. If the loan is deleted, the payments have no independent meaning.
In Crow's foot notation (which we will adopt as our standard in Section 13.5), weak entities are sometimes drawn with double-bordered rectangles or with their identifying relationship shown as a solid line.
Identifying Attributes
For each entity, we identify its attributes — the individual data elements. Attributes fall into several categories:
Simple attributes hold a single, atomic value. FIRST_NAME, DATE_OF_BIRTH, and ACCOUNT_BALANCE are simple attributes.
Composite attributes are conceptually composed of smaller parts. An address consists of ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE_CODE, ZIP_CODE, and COUNTRY_CODE. In the logical model, you have a choice: model the address as a single composite attribute (useful for ER diagrams) or decompose it into its constituent simple attributes (required for implementation). Best practice: decompose at the logical level. You will inevitably need to query by city, state, or zip code, and if the address is stored as a single text blob, every such query requires string parsing.
Derived attributes can be calculated from other stored data. ACCOUNT_AGE (the time since the account was opened) can be derived from OPEN_DATE and the current date. TOTAL_BALANCE (the sum of all account balances for a customer) can be derived by summing the ACCOUNT table. At the logical level, note derived attributes but do not plan to store them — storing them creates update anomalies. (There are exceptions: in Chapter 14, we will discuss strategic denormalization where a derived value is stored for performance, but that is a physical design decision, not a logical one.)
Multi-valued attributes can hold multiple values for a single entity instance. A customer may have multiple phone numbers or multiple email addresses. In the relational model, multi-valued attributes cannot be stored directly in a single column (that would violate First Normal Form). They must be modeled as a separate entity with a relationship back to the parent. CUSTOMER has many CUSTOMER_PHONE_NUMBER instances.
Attribute Domains
Every attribute has a domain — the set of valid values it can take. At the logical level, define domains precisely:
| Attribute | Domain |
|---|---|
| CUSTOMER_STATUS | {'ACTIVE', 'INACTIVE', 'CLOSED', 'SUSPENDED'} |
| INTEREST_RATE | Decimal, 0.0000 to 99.9999 |
| ACCOUNT_BALANCE | Decimal, -999,999,999,999.99 to 999,999,999,999.99 |
| DATE_OF_BIRTH | Date, must be in the past |
| STATE_CODE | Two-letter US state abbreviation or territory code |
| EMAIL_ADDRESS | Valid email format (per RFC 5322) |
In DB2, domains are implemented through data types and CHECK constraints, as we saw in Chapter 11. But the logical model defines them in business terms, not SQL syntax.
Choosing the Primary Key
Every entity needs a primary key — one or more attributes that uniquely identify each instance. The choice of primary key has far-reaching consequences, as we discussed in Chapter 2. At the logical design level, the key decisions are:
Natural key vs. surrogate key. A natural key is derived from business data (SSN for CUSTOMER, BRANCH_CODE for BRANCH). A surrogate key is a system-generated identifier with no business meaning (CUSTOMER_ID as an auto-incrementing integer). The general recommendation for enterprise systems: use a surrogate key as the primary key, and enforce uniqueness on natural keys through UNIQUE constraints.
The rationale is practical: natural keys change. Customers change their SSN (it happens, especially after identity theft). Branch codes get renumbered in mergers. Email addresses change. When a natural key changes and it is the primary key, every foreign key reference in every related table must be updated — a cascading modification that can touch millions of rows. Surrogate keys never change because they carry no meaning.
Composite key vs. single-column key. Some entities are naturally identified by a combination of attributes. A LOAN_PAYMENT is identified by (LOAN_ID, PAYMENT_SEQUENCE). An ACCOUNT_HOLDER is identified by (ACCOUNT_ID, CUSTOMER_ID). Composite keys are legitimate and common, especially for associative entities (Section 13.6). But be aware that composite keys propagate into every child table as multi-column foreign keys, increasing storage and JOIN complexity.
Check Your Understanding — Section 13.3
- Classify each of the following as a simple, composite, derived, or multi-valued attribute of CUSTOMER: (a) last name, (b) mailing address, (c) age, (d) phone numbers, (e) risk rating.
- Why should derived attributes generally not be stored in the database at the logical design level?
- A developer proposes using EMAIL_ADDRESS as the primary key for the CUSTOMER table. Give three arguments against this choice.
13.4 Relationships and Cardinality
Entities do not exist in isolation. They are connected by relationships that reflect business rules. The core of ER modeling is identifying these relationships and specifying their cardinality (how many instances of one entity can be associated with one instance of another) and participation (whether the relationship is optional or mandatory).
One-to-One (1:1)
A one-to-one relationship means each instance of Entity A is associated with at most one instance of Entity B, and vice versa. These are relatively rare in practice.
Example: Each BRANCH has at most one BRANCH_MANAGER (an EMPLOYEE), and each EMPLOYEE can manage at most one branch.
One-to-one relationships often indicate that two entities could be merged into one. Before modeling a 1:1 relationship, ask: "Is this really one entity that we split for some reason?" Legitimate reasons to keep them separate include:
- Access control. EMPLOYEE and EMPLOYEE_SALARY might be separated so that the salary table can have different security permissions.
- Performance isolation. A CUSTOMER table and a CUSTOMER_PHOTO table might be separated so that the photo (a large BLOB) does not slow down queries that only need the customer's name and address.
- Optional existence. Not every CUSTOMER has a CUSTOMER_ONLINE_PROFILE. If only 30% of customers use online banking, storing the online profile attributes in the main CUSTOMER table wastes space for the other 70%.
One-to-Many (1:N)
The most common relationship type. One instance of Entity A is associated with many instances of Entity B, but each instance of Entity B is associated with at most one instance of Entity A.
Examples at Meridian Bank:
- One BRANCH has many EMPLOYEES; each EMPLOYEE works at one BRANCH.
- One CUSTOMER has many ACCOUNTS; each ACCOUNT belongs to one CUSTOMER (for now — we will revisit joint accounts later).
- One ACCOUNT has many TRANSACTIONS; each TRANSACTION belongs to one ACCOUNT.
- One LOAN_OFFICER (EMPLOYEE) manages many LOANS; each LOAN is managed by one LOAN_OFFICER.
In a relational database, one-to-many relationships are implemented by placing the primary key of the "one" side as a foreign key in the "many" side table. EMPLOYEE has a BRANCH_ID column that references BRANCH.BRANCH_ID.
Many-to-Many (M:N)
A many-to-many relationship means each instance of Entity A can be associated with many instances of Entity B, and each instance of Entity B can be associated with many instances of Entity A.
Examples at Meridian Bank:
- A CUSTOMER can hold many ACCOUNTS, and an ACCOUNT can be held by many CUSTOMERS (joint accounts).
- An EMPLOYEE can have many SKILLS (or certifications), and each SKILL can belong to many EMPLOYEES.
- A PRODUCT can be offered at many BRANCHES, and each BRANCH can offer many PRODUCTS.
Many-to-many relationships cannot be directly implemented in a relational database. They must be resolved through a junction table (also called a bridge table, associative entity, or linking table). We cover this in detail in Section 13.6.
Identifying vs. Non-Identifying Relationships
An identifying relationship is one where the child entity depends on the parent for its identity. The parent's primary key becomes part of the child's primary key.
Example: LOAN_PAYMENT is identified by (LOAN_ID, PAYMENT_SEQUENCE). Without LOAN_ID, the payment has no identity. If the relationship between LOAN and LOAN_PAYMENT is identifying, LOAN_ID migrates into the primary key of LOAN_PAYMENT.
A non-identifying relationship is one where the child can be identified independently. The parent's primary key appears as a foreign key in the child, but not as part of the child's primary key.
Example: ACCOUNT has a CUSTOMER_ID foreign key, but ACCOUNT is identified by its own ACCOUNT_ID. The customer relationship is non-identifying.
In Crow's foot notation, identifying relationships are shown with a solid line; non-identifying relationships are shown with a dashed line.
Participation: Mandatory vs. Optional
Mandatory participation (also called total participation) means every instance of the entity must participate in the relationship. At Meridian Bank, every ACCOUNT must belong to a CUSTOMER. The relationship from ACCOUNT to CUSTOMER has mandatory participation on the ACCOUNT side.
Optional participation (also called partial participation) means some instances may not participate. Not every EMPLOYEE is a BRANCH_MANAGER. The relationship from EMPLOYEE to BRANCH (as manager) has optional participation on the EMPLOYEE side.
In the logical model, mandatory participation translates to NOT NULL foreign keys in the physical design. Optional participation translates to nullable foreign keys.
Reading Cardinality Notation
When we say "a CUSTOMER has many ACCOUNTS," we write the cardinality from each side:
- From CUSTOMER to ACCOUNT: one CUSTOMER has zero or more ACCOUNTS (written as 0..* or 0..N)
- From ACCOUNT to CUSTOMER: one ACCOUNT belongs to exactly one CUSTOMER (written as 1..1)
The full notation is: CUSTOMER 1 ——— 0..* ACCOUNT
This tells us: - The relationship is one-to-many (1:N) - Participation is mandatory on the ACCOUNT side (every ACCOUNT must have a CUSTOMER) - Participation is optional on the CUSTOMER side from the perspective of the "many" — a CUSTOMER with zero accounts is allowed
Always read relationships from both directions. "A CUSTOMER has many ACCOUNTS" and "An ACCOUNT belongs to one CUSTOMER" are two perspectives on the same relationship. Both must be stated and validated with stakeholders.
Self-Referencing Relationships
An entity can have a relationship with itself. The most common example at Meridian Bank is the employee-manager hierarchy:
- EMPLOYEE reports to EMPLOYEE: Each employee has zero or one manager (who is also an employee). Each manager can have zero or more direct reports.
This is a one-to-many self-referencing relationship. In the physical model, it is implemented as a MANAGER_ID column in the EMPLOYEE table that references EMPLOYEE_ID in the same table. We saw this in Chapter 4's initial schema, and we explored it with recursive CTEs in Chapter 8.
Another example: in a chart of accounts, ACCOUNT_CATEGORY can have a parent ACCOUNT_CATEGORY, forming a tree structure.
Check Your Understanding — Section 13.4
- Classify each relationship as 1:1, 1:N, or M:N: (a) CUSTOMER to ADDRESS, assuming each customer has one mailing address, (b) CUSTOMER to ADDRESS, assuming customers can have home, work, and mailing addresses, (c) ACCOUNT to PRODUCT_TYPE, (d) EMPLOYEE to DEPARTMENT.
- Why can many-to-many relationships not be directly implemented in a relational database?
- Describe a real-world 1:1 relationship in a banking context and explain why you might keep the two entities separate.
13.5 ER Diagram Notation
There are several notations for drawing ER diagrams. The three most common are:
Chen Notation
Invented by Peter Chen in 1976, this is the original ER notation. Entities are rectangles, relationships are diamonds, and attributes are ovals connected to their entity. Cardinality is shown with numbers (1, N, M) on the connecting lines.
┌──────────┐ ┌──────────┐ ┌──────────┐
│ CUSTOMER │——1———<has>———N——│ ACCOUNT │——1———<at>———N——│ BRANCH │
└──────────┘ └──────────┘ └──────────┘
Strengths: Visually clear for teaching. Relationship names are explicit (the diamond). Weaknesses: Takes up a lot of space. Attribute ovals create visual clutter for entities with many attributes. Rarely used in industry.
Crow's Foot (IE) Notation
Developed as part of Information Engineering (IE) methodology, this is the most widely used notation in industry and the one we will use for the rest of this book.
Entities are rectangles. Relationships are lines connecting entities. Cardinality and participation are shown with symbols at each end of the line:
| Symbol | Meaning |
|---|---|
──\|\|── |
Exactly one (mandatory) |
──\|O── |
Zero or one (optional) |
──<── or crow's foot |
Many |
──\|\|──<── |
One or many (mandatory) |
──O──<── |
Zero or many (optional) |
In Crow's foot notation, the "crow's foot" (a three-pronged fork) represents the "many" side. A vertical bar represents "one." A circle represents "zero" (optional).
For Meridian Bank's core relationship:
┌──────────┐ ┌──────────┐
│ CUSTOMER │ ──── 1 ──── 0..* ──── │ ACCOUNT │
└──────────┘ └──────────┘
Read: One CUSTOMER has zero or more ACCOUNTS.
One ACCOUNT belongs to exactly one CUSTOMER.
Strengths: Compact. Cardinality and participation are immediately visible. Supported by virtually all data modeling tools (erwin, PowerDesigner, Lucidchart, draw.io). Industry standard. Weaknesses: Relationship names are often omitted (shown as labels on the line), which can reduce clarity for complex models.
UML Class Diagram Notation
UML (Unified Modeling Language) class diagrams can also represent data models. Entities are rectangles divided into compartments (name, attributes, operations). Relationships are lines with multiplicity annotations (1, 0..1, , 1..).
┌──────────────┐ ┌──────────────┐
│ CUSTOMER │ 1 0..*│ ACCOUNT │
│──────────────│──────────│──────────────│
│ customer_id │ │ account_id │
│ first_name │ │ customer_id │
│ last_name │ │ balance │
└──────────────┘ └──────────────┘
Strengths: Familiar to software developers. Shows attributes in the entity box. Good for object-relational mapping discussions. Weaknesses: Less precise about identifying relationships. The "operations" compartment is irrelevant for data modeling. Not as widely used by DBAs.
Our Choice: Crow's Foot
For the remainder of this book, we use Crow's foot notation for all ER diagrams. The reasons:
- It is the industry standard for relational data modeling.
- It compactly conveys both cardinality and participation.
- Every major modeling tool supports it.
- DB2 professionals (your future colleagues) will expect it.
When you encounter diagrams in other notations — and you will — use the equivalence table above to translate.
Practitioner's Tip
The notation matters far less than the conversations it enables. A whiteboard sketch with boxes, lines, and numbers written next to them is a perfectly valid ER diagram if it helps the team agree on the data model. Do not let notation debates delay the design work.
13.6 Resolving Many-to-Many Relationships
Many-to-many (M:N) relationships are common in business requirements but cannot exist as-is in a relational database. A relational table can hold a foreign key to one parent, but it cannot hold a foreign key to "many" parents without repeating rows or using comma-separated lists (both of which violate normalization principles). The solution is a junction table — a new entity that sits between the two original entities and has a one-to-many relationship with each.
The Basic Pattern
Business requirement: A customer can hold multiple accounts, and an account can be held by multiple customers (joint accounts).
Direct M:N (cannot implement):
CUSTOMER ——— M:N ——— ACCOUNT
Resolved with junction table:
CUSTOMER ──── 1 ──── 0..* ──── ACCOUNT_HOLDER ──── 0..* ──── 1 ──── ACCOUNT
The junction table ACCOUNT_HOLDER contains:
| Column | Description |
|---|---|
| ACCOUNT_ID | FK to ACCOUNT (part of composite PK) |
| CUSTOMER_ID | FK to CUSTOMER (part of composite PK) |
| HOLDER_TYPE | 'PRIMARY' or 'JOINT' |
| ADDED_DATE | When this customer was added to the account |
| REMOVED_DATE | When (if) this customer was removed |
| CAN_TRANSACT | Whether this holder can execute transactions |
| CAN_CLOSE | Whether this holder can close the account |
Attributes on the Relationship
Notice that ACCOUNT_HOLDER has its own attributes: HOLDER_TYPE, ADDED_DATE, CAN_TRANSACT. These attributes describe the relationship between a customer and an account, not the customer or the account individually. This is a key insight: when a many-to-many relationship has attributes, the junction table is not just a mechanical bridge — it is a meaningful entity in its own right.
The name matters. Do not call it CUSTOMER_ACCOUNT_XREF or CUST_ACCT_LINK. Call it ACCOUNT_HOLDER, because that is what it means. A junction table is an entity. Name it like one.
More Examples at Meridian Bank
Products offered at branches:
BRANCH ──── 1 ──── 0..* ──── BRANCH_PRODUCT ──── 0..* ──── 1 ──── PRODUCT
BRANCH_PRODUCT attributes: EFFECTIVE_DATE, END_DATE, CUSTOM_RATE (if the branch offers a promotional rate for that product).
Employee skills and certifications:
EMPLOYEE ──── 1 ──── 0..* ──── EMPLOYEE_CERTIFICATION ──── 0..* ──── 1 ──── CERTIFICATION
EMPLOYEE_CERTIFICATION attributes: CERTIFICATION_DATE, EXPIRY_DATE, SCORE, STATUS.
Customer documents:
CUSTOMER ──── 1 ──── 0..* ──── CUSTOMER_DOCUMENT ──── 0..* ──── 1 ──── DOCUMENT_TYPE
CUSTOMER_DOCUMENT attributes: DOCUMENT_NUMBER, ISSUE_DATE, EXPIRY_DATE, VERIFICATION_STATUS, VERIFIED_BY, VERIFIED_DATE.
When Not to Use a Junction Table
If the business rule is truly one-to-many — "each account belongs to exactly one customer" — do not introduce a junction table preemptively. Adding unnecessary junction tables complicates queries (every query that joins CUSTOMER to ACCOUNT now requires an intermediate JOIN), increases storage, and reduces clarity. Model what the business requires today, not what it might require someday.
If the requirement changes from one-to-many to many-to-many later, you can add the junction table at that time using the schema evolution techniques in Chapter 16. This is one of the advantages of surrogate keys: because ACCOUNT references CUSTOMER through a CUSTOMER_ID foreign key, and not vice versa, adding a junction table is an additive change — you create the new table and migrate the foreign key data into it.
Check Your Understanding — Section 13.6
- A library system has BOOKS and AUTHORS. An author can write many books, and a book can have many authors. Design the junction table, including at least two attributes that describe the relationship.
- Why is the name of a junction table important? How would you name the junction table for the BOOKS/AUTHORS example?
- Under what circumstances would you avoid creating a junction table even when the relationship appears to be many-to-many?
13.7 Subtypes and Supertypes
Not all entities of the same "type" have the same attributes. At Meridian Bank, all accounts share certain attributes (account number, balance, open date, status), but checking accounts have overdraft limits, savings accounts have minimum balance requirements, and certificates of deposit have maturity dates and early withdrawal penalties. This is the supertype/subtype pattern (also called generalization/specialization or inheritance in ER modeling).
The Supertype
The supertype is the general entity that contains attributes common to all subtypes:
ACCOUNT (supertype) - ACCOUNT_ID (PK) - ACCOUNT_NUMBER - CUSTOMER_ID (FK) - BRANCH_ID (FK) - CURRENT_BALANCE - OPEN_DATE - STATUS
The Subtypes
Each subtype contains attributes specific to that category:
CHECKING_ACCOUNT (subtype) - ACCOUNT_ID (PK, FK to ACCOUNT) - OVERDRAFT_LIMIT - CHECK_REORDER_THRESHOLD - DEBIT_CARD_NUMBER
SAVINGS_ACCOUNT (subtype) - ACCOUNT_ID (PK, FK to ACCOUNT) - INTEREST_RATE - MIN_BALANCE - INTEREST_ACCRUAL_METHOD
CERTIFICATE_OF_DEPOSIT (subtype) - ACCOUNT_ID (PK, FK to ACCOUNT) - MATURITY_DATE - TERM_MONTHS - EARLY_WITHDRAWAL_PENALTY_PCT - AUTO_RENEW
Completeness and Disjointness
Two properties define how subtypes relate to the supertype:
Completeness: Is every instance of the supertype also an instance of some subtype? - Total (mandatory): Every ACCOUNT must be a CHECKING, SAVINGS, CD, or some other specific type. No "untyped" accounts exist. - Partial (optional): Some ACCOUNTs might not belong to any subtype (perhaps a new product type that has not been modeled yet).
Disjointness: Can an instance belong to more than one subtype? - Disjoint (exclusive): An ACCOUNT is either a CHECKING_ACCOUNT or a SAVINGS_ACCOUNT, never both. - Overlapping (non-exclusive): An EMPLOYEE could be both a LOAN_OFFICER and a BRANCH_MANAGER simultaneously.
At Meridian Bank, accounts are total and disjoint — every account has exactly one type. This is enforced by a discriminator attribute (ACCOUNT_TYPE_CODE) in the supertype that determines which subtype table contains the additional attributes.
Implementation Strategies
The logical model describes the supertype/subtype hierarchy. When we move to physical design (Chapter 14), we have three implementation strategies:
Strategy 1: Table-per-hierarchy (single table). Put all attributes from all subtypes into one table. Subtype-specific attributes are NULL for rows of other subtypes.
CREATE TABLE ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL,
ACCOUNT_TYPE_CODE CHAR(3) NOT NULL,
-- Common attributes
CURRENT_BALANCE DECIMAL(15,2),
OPEN_DATE DATE,
-- Checking-specific (NULL for non-checking)
OVERDRAFT_LIMIT DECIMAL(15,2),
CHECK_REORDER_THRESHOLD INTEGER,
-- Savings-specific (NULL for non-savings)
INTEREST_RATE DECIMAL(5,4),
MIN_BALANCE DECIMAL(15,2),
-- CD-specific (NULL for non-CD)
MATURITY_DATE DATE,
TERM_MONTHS INTEGER,
EARLY_WITHDRAWAL_PENALTY_PCT DECIMAL(5,2),
...
);
Pros: Simple queries (no JOINs needed). Good for read-heavy workloads. Cons: Many NULL columns. Cannot enforce NOT NULL on subtype-specific attributes through DDL (you need CHECK constraints). Table can become very wide.
Strategy 2: Table-per-type (separate tables). Create one table for the supertype and one table for each subtype. The subtype tables reference the supertype through a shared primary key.
CREATE TABLE ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL PRIMARY KEY,
ACCOUNT_TYPE CHAR(3) NOT NULL,
CURRENT_BALANCE DECIMAL(15,2),
OPEN_DATE DATE
);
CREATE TABLE CHECKING_ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL PRIMARY KEY
REFERENCES ACCOUNT(ACCOUNT_ID),
OVERDRAFT_LIMIT DECIMAL(15,2) NOT NULL,
CHECK_REORDER_THRESHOLD INTEGER
);
CREATE TABLE SAVINGS_ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL PRIMARY KEY
REFERENCES ACCOUNT(ACCOUNT_ID),
INTEREST_RATE DECIMAL(5,4) NOT NULL,
MIN_BALANCE DECIMAL(15,2) NOT NULL
);
Pros: Clean separation. Subtype-specific NOT NULL constraints are enforceable. No wasted NULL columns. Cons: Queries that need both common and subtype-specific attributes require a JOIN. INSERT requires two statements (or a stored procedure).
Strategy 3: Table-per-concrete-class (no supertype table). Each subtype is a standalone table containing both the common and subtype-specific attributes. There is no shared supertype table.
CREATE TABLE CHECKING_ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL PRIMARY KEY,
CURRENT_BALANCE DECIMAL(15,2),
OPEN_DATE DATE,
OVERDRAFT_LIMIT DECIMAL(15,2) NOT NULL,
CHECK_REORDER_THRESHOLD INTEGER
);
CREATE TABLE SAVINGS_ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL PRIMARY KEY,
CURRENT_BALANCE DECIMAL(15,2),
OPEN_DATE DATE,
INTEREST_RATE DECIMAL(5,4) NOT NULL,
MIN_BALANCE DECIMAL(15,2) NOT NULL
);
Pros: Each table is self-contained. No JOINs needed for subtype-specific queries. Cons: Common attributes are duplicated in every table. Queries across all account types require UNION ALL. Foreign keys from other tables (like TRANSACTION) must reference one of many possible tables — which is unworkable.
The Recommendation for Meridian Bank: Use table-per-type (Strategy 2) for the ACCOUNT hierarchy. It gives us the clean separation of subtypes with the ability to query all accounts through the shared ACCOUNT table. Every TRANSACTION references ACCOUNT.ACCOUNT_ID, regardless of the account type.
Spaced Review — Chapter 2 Connection
In Chapter 2, we introduced the concept of normalization and its normal forms. Supertype/subtype design is directly related: the reason we separate subtype-specific attributes into their own tables is to eliminate the NULL-heavy rows that would violate the spirit (if not the letter) of normalization. A single-table approach with many NULLs is technically in 3NF (NULLs are not a normalization violation per se), but it creates the same kinds of update anomalies that normalization was designed to prevent.
13.8 Normalization as Design Validation
You were introduced to normalization in Chapter 2. There, we learned normalization as a theoretical framework — a way of understanding how data should be organized. Now we revisit normalization as a practical validation technique for your ER model.
The workflow is:
- Gather requirements.
- Build the ER model based on business understanding.
- Translate the ER model into preliminary table definitions.
- Apply normalization rules to validate (and if needed, correct) the design.
Normalization is not a design method — it is a design check. You do not normalize first and design second. You design from the business requirements (top-down) and then validate with normalization (bottom-up). If normalization reveals a problem, you go back and fix the ER model.
Quick Review: The Normal Forms
First Normal Form (1NF): Every column contains atomic (indivisible) values. No repeating groups. No arrays or comma-separated lists in a single column.
Violation example: A CUSTOMER table with a PHONE_NUMBERS column containing "555-1234, 555-5678, 555-9012". Fix: Create a CUSTOMER_PHONE table.
Second Normal Form (2NF): Must be in 1NF, and every non-key attribute must depend on the entire primary key (not just part of it). This matters for tables with composite primary keys.
Violation example: An ORDER_LINE table with composite key (ORDER_ID, PRODUCT_ID) that includes ORDER_DATE. The ORDER_DATE depends only on ORDER_ID, not on the full key (ORDER_ID, PRODUCT_ID). Fix: Move ORDER_DATE to the ORDER table.
Third Normal Form (3NF): Must be in 2NF, and no non-key attribute depends on another non-key attribute (no transitive dependencies).
Violation example: A CUSTOMER table with BRANCH_ID, BRANCH_NAME, and BRANCH_ADDRESS. BRANCH_NAME and BRANCH_ADDRESS depend on BRANCH_ID, not directly on CUSTOMER_ID. Fix: Move BRANCH_NAME and BRANCH_ADDRESS to a BRANCH table; keep only BRANCH_ID in CUSTOMER.
Boyce-Codd Normal Form (BCNF): Must be in 3NF, and every determinant is a candidate key. BCNF handles an edge case where 3NF allows certain anomalies when multiple overlapping candidate keys exist.
Violation example: A COURSE_SCHEDULE table where (STUDENT, COURSE) is one candidate key and INSTRUCTOR determines COURSE (each instructor teaches only one course). INSTRUCTOR is a determinant but not a candidate key. Fix: Decompose into two tables.
Applying Normalization to the Meridian Bank Model
Let us validate a portion of our model. Consider the initial draft of the TRANSACTION entity:
TRANSACTION
-----------
TRANSACTION_ID (PK)
ACCOUNT_ID (FK)
TRANSACTION_TYPE
AMOUNT
TRANSACTION_DATE
DESCRIPTION
CUSTOMER_NAME ← problem
ACCOUNT_NUMBER ← problem
BRANCH_NAME ← problem
1NF check: Passes. All attributes are atomic.
2NF check: Passes. The primary key is a single column (TRANSACTION_ID), so partial dependencies are impossible.
3NF check: Fails. CUSTOMER_NAME depends on ACCOUNT_ID (transitively through CUSTOMER_ID). ACCOUNT_NUMBER depends on ACCOUNT_ID. BRANCH_NAME depends on BRANCH_ID (through ACCOUNT). These are transitive dependencies.
Fix: Remove CUSTOMER_NAME, ACCOUNT_NUMBER, and BRANCH_NAME. They belong in their respective tables. When you need the customer name on a transaction report, JOIN TRANSACTION to ACCOUNT to CUSTOMER.
This is exactly the kind of design flaw that normalization catches. A developer might include CUSTOMER_NAME in the TRANSACTION table "for convenience" — to avoid a JOIN in the most common query. But the moment a customer changes their name, every historical transaction would show the new name instead of the name at the time of the transaction. Is that the business requirement? Maybe. Maybe not. The normalization check forces you to ask the question.
When Normalization Reveals Design Flaws
If normalization keeps pointing to the same entity — "this attribute does not belong here, that attribute does not belong here" — it often means you are missing an entity. The attributes that keep getting removed are trying to tell you they belong to a table that does not exist yet.
Example: You design a LOAN table and include COLLATERAL_TYPE, COLLATERAL_VALUE, COLLATERAL_DESCRIPTION, COLLATERAL_APPRAISAL_DATE. Normalization says these depend on the collateral, not the loan. This suggests a COLLATERAL entity that the LOAN entity relates to — and once you create it, you realize that multiple loans can be secured by the same collateral (a house can secure both a mortgage and a home equity line of credit). A missing entity has been found.
Practitioner's Note
In practice, most ER models designed by experienced data architects are already in 3NF or BCNF because the architects think in terms of entities and relationships, not flat tables. Normalization violations tend to creep in when developers add "convenience columns" during physical design or when requirements change and new attributes get tacked onto existing tables. Run a normalization check every time you modify the model.
13.9 Common Design Patterns
Certain data modeling patterns recur across industries. Recognizing them saves design time and avoids reinventing wheels that have already been tested and refined.
Pattern 1: Audit Trail
Every change to critical data should be tracked: who changed it, when, what the old value was, and what the new value is. At Meridian Bank, this is a regulatory requirement.
Design options:
Option A: Centralized audit log. A single AUDIT_LOG table records changes across all tables.
AUDIT_LOG
---------
AUDIT_ID (PK)
TABLE_NAME
OPERATION (I/U/D)
PRIMARY_KEY_VALUE
COLUMN_NAME
OLD_VALUE (VARCHAR)
NEW_VALUE (VARCHAR)
CHANGED_BY
CHANGED_TIMESTAMP
Pros: Simple. One table to query for all audit data. Cons: OLD_VALUE and NEW_VALUE are VARCHAR regardless of the original data type. Querying "all changes to customer 10042" requires string matching on PRIMARY_KEY_VALUE.
Option B: Shadow tables. For each audited table, create a corresponding history table with the same columns plus audit metadata.
CUSTOMER_HISTORY
---------
HISTORY_ID (PK)
CUSTOMER_ID
FIRST_NAME
LAST_NAME
... (all CUSTOMER columns)
OPERATION (I/U/D)
CHANGED_BY
CHANGED_TIMESTAMP
EFFECTIVE_FROM
EFFECTIVE_TO
Pros: Preserves data types. Easy to reconstruct the state of a customer at any point in time. Efficient querying. Cons: Schema maintenance — every ALTER TABLE on CUSTOMER requires a matching ALTER TABLE on CUSTOMER_HISTORY.
Meridian Bank choice: Both. The centralized AUDIT_LOG captures fine-grained column-level changes for compliance queries ("what changed in the last 24 hours?"), while shadow tables capture row-level snapshots for historical reconstruction ("what was this customer's address on March 15, 2024?").
Pattern 2: Effective Dating
Many business attributes change over time, and the business needs to know not just the current value, but the value at any given point in the past (or future).
Example: Interest rates change. A savings account might earn 2.5% from January to March, 2.75% from April to June, and 3.0% from July onward.
INTEREST_RATE_HISTORY
---------------------
RATE_ID (PK)
ACCOUNT_ID (FK) or PRODUCT_ID (FK)
EFFECTIVE_DATE
END_DATE (NULL if current)
INTEREST_RATE
CHANGED_BY
CHANGED_TIMESTAMP
The query to find the rate on any given date is:
SELECT INTEREST_RATE
FROM INTEREST_RATE_HISTORY
WHERE ACCOUNT_ID = ?
AND EFFECTIVE_DATE <= ?
AND (END_DATE IS NULL OR END_DATE > ?)
Design rules for effective dating:
- Use a half-open interval: EFFECTIVE_DATE is inclusive, END_DATE is exclusive. This prevents gaps and overlaps.
- The current record has END_DATE = NULL.
- When a new rate takes effect, update the previous record's END_DATE and insert a new record.
- Consider adding a CHECK constraint or trigger to prevent overlapping date ranges.
Pattern 3: Address and Phone Polymorphism
Customers, employees, and branches all have addresses. Do you duplicate the address columns in each table, or do you create a shared ADDRESS table?
Option A: Embedded addresses (addresses inline in each table).
CUSTOMER (... ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE, ZIP ...)
EMPLOYEE (... ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE, ZIP ...)
BRANCH (... ADDRESS_LINE1, ADDRESS_LINE2, CITY, STATE, ZIP ...)
Pros: Simple JOINs. Each entity "owns" its address. Cons: Address validation logic is duplicated. If the business adds COUNTY or COUNTRY, three tables must be altered.
Option B: Shared address table with polymorphic relationship.
ADDRESS
-------
ADDRESS_ID (PK)
ENTITY_TYPE ('CUSTOMER', 'EMPLOYEE', 'BRANCH')
ENTITY_ID
ADDRESS_TYPE ('HOME', 'WORK', 'MAILING', 'LEGAL')
ADDRESS_LINE1
ADDRESS_LINE2
CITY
STATE_CODE
ZIP_CODE
COUNTRY_CODE
Pros: Single source of address logic. Supports multiple addresses per entity. Cons: Cannot enforce foreign keys (ENTITY_ID could reference any table, and DB2 cannot validate a polymorphic FK). Queries require filtering by ENTITY_TYPE.
Option C: Separate address tables per entity (recommended).
CUSTOMER_ADDRESS (CUSTOMER_ID FK, ADDRESS_TYPE, ADDRESS_LINE1, ...)
EMPLOYEE_ADDRESS (EMPLOYEE_ID FK, ADDRESS_TYPE, ADDRESS_LINE1, ...)
Pros: Foreign keys are enforceable. Each entity's addresses are isolated. Cons: Some column duplication across address tables.
Meridian Bank choice: For the initial model, we use Option A (embedded addresses) for BRANCH and EMPLOYEE (which typically have one address each) and Option C (CUSTOMER_ADDRESS) for CUSTOMER (who may have multiple addresses — home, work, mailing). This is a pragmatic compromise: we do not over-engineer the model where the business does not require it, but we handle the genuine many-address requirement for customers properly.
Pattern 4: Configurable Attributes (Flexible Fields)
Sometimes the business needs attributes that vary by category and change over time. Product features, for example: a checking account might have attributes (monthly fee, minimum balance, free checks per month) that differ from a savings account's attributes (interest rate, compounding frequency).
The proper approach: Model the known attributes directly and use a well-structured configuration table for truly variable attributes.
PRODUCT_FEATURE
---------------
PRODUCT_ID (FK, part of PK)
FEATURE_CODE (FK to FEATURE_DEFINITION, part of PK)
FEATURE_VALUE
EFFECTIVE_DATE
END_DATE
This is different from the Entity-Attribute-Value (EAV) anti-pattern (Section 13.10) because FEATURE_DEFINITION has a defined set of feature codes with data type information and validation rules. It is a controlled vocabulary, not a dumping ground.
Pattern 5: Tree Structures (Hierarchies)
Organizational charts, account categories, product taxonomies — trees appear everywhere. The most common implementations:
Adjacency list: Each row has a PARENT_ID column pointing to its parent. Simple to insert and update. Querying the full tree requires recursive CTEs (Chapter 8).
Path enumeration: Each row stores its full path: "/1/5/12/". Easy to query descendants (WHERE PATH LIKE '/1/5/%'), but path strings must be maintained on every move.
Nested sets: Each row stores a LEFT and RIGHT boundary number. Querying all descendants is a simple range query. But inserts and moves are expensive because boundary numbers must be recalculated.
Closure table: A separate table stores every ancestor-descendant pair with a depth. Flexible and efficient for reads. Requires maintenance of the closure table on every insert, move, or delete.
Meridian Bank choice: Adjacency list for the employee hierarchy and account category hierarchy. DB2's recursive CTE support (Chapter 8) makes adjacency lists practical for the query patterns we need, and the tree depth is shallow (typically 5-7 levels for employee hierarchy).
13.10 Anti-Patterns to Avoid
A data modeling anti-pattern is a design approach that seems reasonable but leads to predictable problems. Learn to recognize these so you can catch them in design reviews.
Anti-Pattern 1: Entity-Attribute-Value (EAV) Abuse
The EAV model stores data as (entity_id, attribute_name, attribute_value) triples instead of as columns:
EAV_DATA
--------
ENTITY_ID ATTRIBUTE_NAME ATTRIBUTE_VALUE
10001 'FIRST_NAME' 'Margaret'
10001 'LAST_NAME' 'Chen'
10001 'DATE_OF_BIRTH' '1978-03-15'
10001 'BALANCE' '52340.50'
Why it seems attractive: "We can add any attribute without ALTER TABLE!" This appeals to developers who want infinite flexibility.
Why it is disastrous:
- Type safety is gone. Every value is a string. DATE_OF_BIRTH and BALANCE are stored as VARCHAR. No CHECK constraints, no data type validation.
- Queries are hideous. To reconstruct a single customer record, you need to PIVOT (or write one self-JOIN per attribute). A five-column query becomes five self-JOINs.
- Referential integrity is impossible. You cannot create a foreign key from an EAV attribute to another table.
- The optimizer cannot help. DB2's statistics and histogram-based optimization depend on column-level statistics. An EAV table has no meaningful statistics.
- Indexing is reduced to brute force. You can index (ENTITY_ID, ATTRIBUTE_NAME), but you cannot create a targeted index on "all customers born before 1980."
When EAV is acceptable: Only when the attributes are genuinely dynamic and user-defined — for example, custom fields in a CRM where each client defines their own attributes. Even then, a hybrid approach (core attributes as columns, custom attributes as EAV) is better than pure EAV.
At Meridian Bank: We do not use EAV. Every attribute is a column in a properly typed table. If the business needs a new attribute, we use ALTER TABLE. That is what ALTER TABLE is for.
Anti-Pattern 2: The "One Table to Rule Them All"
This is the mega-table: hundreds of columns, mixing data from multiple business concepts, with most columns NULL for most rows.
UNIVERSAL_RECORD
---------
RECORD_ID, RECORD_TYPE, NAME1, NAME2, NAME3, DATE1, DATE2, DATE3,
AMOUNT1, AMOUNT2, CODE1, CODE2, CODE3, CODE4, CODE5,
TEXT1, TEXT2, TEXT3, TEXT4, FLAG1, FLAG2, FLAG3, ...
Depending on RECORD_TYPE, different columns have different meanings. When RECORD_TYPE = 'CUSTOMER', NAME1 is the first name and DATE1 is the date of birth. When RECORD_TYPE = 'ACCOUNT', NAME1 is the account name and DATE1 is the open date.
Why it is disastrous: Column names are meaningless. Constraints are impossible (you cannot put a CHECK constraint on DATE1 that means different things for different record types). Queries require CASE expressions everywhere. New developers cannot understand the schema without a Rosetta Stone document. The optimizer's column statistics are polluted by mixed data.
Anti-Pattern 3: OTLT (One True Lookup Table)
Instead of separate lookup tables for account types, transaction types, status codes, and country codes, some designers create a single CODES table:
CODES
-----
CODE_TYPE CODE_VALUE CODE_DESCRIPTION
'ACCT_TYPE' 'CHK' 'Checking Account'
'ACCT_TYPE' 'SAV' 'Savings Account'
'TXN_TYPE' 'DEP' 'Deposit'
'TXN_TYPE' 'WDR' 'Withdrawal'
'COUNTRY' 'USA' 'United States'
'COUNTRY' 'CAN' 'Canada'
Why it seems attractive: "One table instead of twenty! Simple!"
Why it is problematic:
- Foreign keys cannot validate. You cannot create a FK from ACCOUNT.ACCOUNT_TYPE to CODES where CODE_TYPE = 'ACCT_TYPE'. DB2 foreign keys reference primary keys or unique constraints — they do not support conditional references.
- Each code type has different attributes. Country codes need a continent and ISO numeric code. Account types need interest-bearing flags and minimum balances. The CODES table can only hold the lowest common denominator.
- Performance suffers. Every lookup JOIN hits the same table, creating a contention hotspot in the buffer pool.
The fix: Create separate lookup tables: ACCOUNT_TYPE, TRANSACTION_TYPE, COUNTRY, STATUS_CODE. Each can have its own specific attributes, constraints, and foreign keys.
Anti-Pattern 4: Over-Normalization
Yes, this is a real thing. Normalizing to an extreme — splitting every attribute into its own table — creates a schema where even simple queries require ten-way JOINs.
Example of over-normalization:
CUSTOMER (CUSTOMER_ID, NAME_ID FK, ADDRESS_ID FK, PHONE_ID FK, STATUS_ID FK, ...)
PERSON_NAME (NAME_ID, FIRST_NAME, LAST_NAME)
ADDRESS (ADDRESS_ID, LINE1, LINE2, CITY, STATE_ID FK, ZIP)
STATE (STATE_ID, STATE_CODE, STATE_NAME)
This is technically "more normalized," but now getting a customer's name requires a JOIN to PERSON_NAME, and getting their state requires two JOINs (through ADDRESS to STATE). If a customer's name is always stored and always retrieved with the customer, there is no update anomaly that normalization would fix — FIRST_NAME and LAST_NAME functionally depend on CUSTOMER_ID. Putting them in a separate table adds complexity without removing anomalies.
The rule: Normalize to eliminate real anomalies. Do not normalize to satisfy an abstract principle when no anomaly exists.
Anti-Pattern 5: Comma-Separated Lists in Columns
CUSTOMER
--------
CUSTOMER_ID PHONE_NUMBERS
10001 '555-1234,555-5678,555-9012'
This violates 1NF. Every query that needs a specific phone number must parse the string. You cannot index individual phone numbers. You cannot enforce that a phone number appears only once. You cannot JOIN to a phone number. And when the list grows past the VARCHAR column length, you hit a hard limit.
The fix: Always model multi-valued data as a separate table:
CUSTOMER_PHONE
--------------
CUSTOMER_ID (FK)
PHONE_TYPE
PHONE_NUMBER
IS_PRIMARY
Practitioner's Warning
These anti-patterns are not theoretical. I have encountered every one of them in production banking systems. The EAV pattern and the OTLT pattern are particularly insidious because they look elegant in a design meeting and only reveal their costs months later when real queries need to be written against real data volumes.
13.11 The Meridian Bank Complete Logical Model
We have learned the theory. Now let us apply it. In this section, we design the complete logical data model for Meridian National Bank, expanding far beyond the initial sketch from Chapter 2 and the basic tables from Chapter 4.
Business Context Recap
Meridian National Bank is a mid-size commercial bank with:
- 2 million customers (individuals and businesses)
- 50 branches across three states
- 4.5 million accounts (checking, savings, CDs, money market, loans, credit lines)
- 500 million transactions per year
- Core banking on DB2 for z/OS; digital banking on DB2 LUW
- Full regulatory compliance requirements (OCC, FDIC, BSA/AML, KYC)
Entity Catalog
The complete logical model contains the following entities, organized by domain:
Customer Domain
CUSTOMER — A person or business that has a relationship with the bank. Supertype for INDIVIDUAL_CUSTOMER and BUSINESS_CUSTOMER.
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | Integer (surrogate PK) | System-generated |
| CUSTOMER_TYPE | Code | 'I' (Individual), 'B' (Business) — discriminator |
| CUSTOMER_SINCE | Date | Relationship start date |
| PRIMARY_BRANCH_ID | FK to BRANCH | Customer's home branch |
| RISK_RATING | Smallint (1-5) | KYC risk assessment |
| STATUS | Code | ACTIVE, INACTIVE, CLOSED, SUSPENDED |
| TAX_ID_LAST_FOUR | Char(4) | Last 4 of SSN or EIN (masked) |
INDIVIDUAL_CUSTOMER — Subtype for personal customers.
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK/PK to CUSTOMER | Identifying relationship |
| FIRST_NAME | Varchar | Legal first name |
| MIDDLE_NAME | Varchar | Optional |
| LAST_NAME | Varchar | Legal last name |
| DATE_OF_BIRTH | Date | Required for individuals |
| GENDER | Code | Optional |
| MARITAL_STATUS | Code | Optional |
BUSINESS_CUSTOMER — Subtype for commercial/business customers.
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK/PK to CUSTOMER | Identifying relationship |
| BUSINESS_NAME | Varchar | Legal business name |
| DBA_NAME | Varchar | "Doing business as" name |
| BUSINESS_TYPE | Code | LLC, CORP, PARTNERSHIP, SOLE_PROP |
| INCORPORATION_STATE | Char(2) | State of incorporation |
| INCORPORATION_DATE | Date | |
| ANNUAL_REVENUE | Decimal | For creditworthiness |
CUSTOMER_ADDRESS — Multiple addresses per customer with type classification.
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ADDRESS_ID | Integer (PK) | Surrogate key |
| CUSTOMER_ID | FK to CUSTOMER | |
| ADDRESS_TYPE | Code | HOME, WORK, MAILING, LEGAL |
| ADDRESS_LINE1 | Varchar | |
| ADDRESS_LINE2 | Varchar | Optional |
| CITY | Varchar | |
| STATE_CODE | Char(2) | |
| ZIP_CODE | Varchar(10) | Supports ZIP+4 |
| COUNTRY_CODE | Char(3) | ISO 3166-1 alpha-3 |
| IS_PRIMARY | Boolean | One primary per type |
| EFFECTIVE_DATE | Date | |
| END_DATE | Date | NULL if current |
CUSTOMER_PHONE — Multiple phone numbers per customer.
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK to CUSTOMER (part of PK) | |
| PHONE_TYPE | Code (part of PK) | MOBILE, HOME, WORK, FAX |
| PHONE_NUMBER | Varchar(20) | E.164 format |
| IS_PRIMARY | Boolean |
CUSTOMER_EMAIL — Multiple email addresses per customer.
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK to CUSTOMER (part of PK) | |
| EMAIL_TYPE | Code (part of PK) | PERSONAL, WORK |
| EMAIL_ADDRESS | Varchar(200) | |
| IS_PRIMARY | Boolean | |
| IS_VERIFIED | Boolean |
CUSTOMER_DOCUMENT — Identity and regulatory documents.
| Attribute | Type | Notes |
|---|---|---|
| DOCUMENT_ID | Integer (PK) | |
| CUSTOMER_ID | FK to CUSTOMER | |
| DOCUMENT_TYPE_CODE | FK to DOCUMENT_TYPE | |
| DOCUMENT_NUMBER | Varchar | Driver's license number, passport number, etc. |
| ISSUING_AUTHORITY | Varchar | |
| ISSUE_DATE | Date | |
| EXPIRY_DATE | Date | |
| VERIFICATION_STATUS | Code | PENDING, VERIFIED, EXPIRED, REJECTED |
| VERIFIED_BY | FK to EMPLOYEE | |
| VERIFIED_DATE | Date |
DOCUMENT_TYPE — Lookup: types of identity documents.
| Attribute | Type | Notes |
|---|---|---|
| DOCUMENT_TYPE_CODE | Char(3) PK | DLI, PSP, SSC, EIN, etc. |
| DOCUMENT_TYPE_NAME | Varchar | 'Driver License', 'Passport', etc. |
| IS_GOVERNMENT_ISSUED | Boolean | |
| DESCRIPTION | Varchar |
Account Domain
ACCOUNT — Supertype for all financial accounts.
| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | Integer (surrogate PK) | |
| ACCOUNT_NUMBER | Varchar (unique) | Customer-facing number |
| ACCOUNT_TYPE_CODE | FK to ACCOUNT_TYPE | Discriminator for subtypes |
| BRANCH_ID | FK to BRANCH | Account's home branch |
| CURRENT_BALANCE | Decimal(15,2) | |
| AVAILABLE_BALANCE | Decimal(15,2) | Balance minus holds |
| OPEN_DATE | Date | |
| CLOSE_DATE | Date | NULL if open |
| LAST_ACTIVITY_DATE | Date | |
| STATUS | Code | ACTIVE, INACTIVE, CLOSED, FROZEN |
ACCOUNT_HOLDER — Junction: M:N between CUSTOMER and ACCOUNT.
| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK to ACCOUNT (part of PK) | |
| CUSTOMER_ID | FK to CUSTOMER (part of PK) | |
| HOLDER_TYPE | Code | PRIMARY, JOINT, AUTHORIZED_SIGNER, BENEFICIARY |
| ADDED_DATE | Date | |
| REMOVED_DATE | Date | NULL if active |
| CAN_TRANSACT | Boolean | |
| CAN_CLOSE | Boolean |
ACCOUNT_TYPE — Lookup: product types for accounts.
| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_TYPE_CODE | Char(3) PK | CHK, SAV, MMA, CDT, LON, CRL |
| ACCOUNT_TYPE_NAME | Varchar | 'Regular Checking', 'Premium Savings', etc. |
| CATEGORY | Code | CHECKING, SAVINGS, MONEY_MARKET, CERTIFICATE, LOAN, CREDIT |
| IS_INTEREST_BEARING | Boolean | |
| MIN_OPENING_BALANCE | Decimal | |
| DESCRIPTION | Varchar | |
| STATUS | Code |
CHECKING_DETAIL — Subtype: checking-specific attributes.
| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK/PK to ACCOUNT | |
| OVERDRAFT_LIMIT | Decimal | |
| OVERDRAFT_PROTECTION_ACCT | FK to ACCOUNT | Linked savings |
| MONTHLY_FEE | Decimal | |
| FREE_TRANSACTIONS_PER_MONTH | Integer |
SAVINGS_DETAIL — Subtype: savings-specific attributes.
| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK/PK to ACCOUNT | |
| INTEREST_RATE | Decimal(5,4) | |
| MIN_BALANCE | Decimal | |
| INTEREST_ACCRUAL_METHOD | Code | DAILY, MONTHLY |
| WITHDRAWAL_LIMIT_PER_MONTH | Integer | Reg D compliance |
CD_DETAIL — Subtype: certificate of deposit.
| Attribute | Type | Notes |
|---|---|---|
| ACCOUNT_ID | FK/PK to ACCOUNT | |
| TERM_MONTHS | Integer | |
| MATURITY_DATE | Date | |
| INTEREST_RATE | Decimal(5,4) | Locked rate |
| EARLY_WITHDRAWAL_PENALTY_PCT | Decimal(5,2) | |
| AUTO_RENEW | Boolean | |
| RENEWAL_TERM_MONTHS | Integer |
Transaction Domain
TRANSACTION — Every financial event that affects an account balance.
| Attribute | Type | Notes |
|---|---|---|
| TRANSACTION_ID | Bigint (PK) | High volume — use BIGINT |
| ACCOUNT_ID | FK to ACCOUNT | |
| TRANSACTION_TYPE_CODE | FK to TRANSACTION_TYPE | |
| AMOUNT | Decimal(15,2) | Positive for credits, negative for debits |
| RUNNING_BALANCE | Decimal(15,2) | Balance after this transaction |
| TRANSACTION_DATE | Date | Business date |
| TRANSACTION_TIMESTAMP | Timestamp | Exact time |
| DESCRIPTION | Varchar(200) | |
| REFERENCE_NUMBER | Varchar(30) | Check number, wire ref, etc. |
| RELATED_ACCOUNT_ID | FK to ACCOUNT | For transfers |
| CHANNEL | Code | BRANCH, ATM, ONLINE, MOBILE, WIRE, ACH, CHECK, SYSTEM |
| TELLER_ID | FK to EMPLOYEE | NULL for non-branch transactions |
| STATUS | Code | PENDING, COMPLETED, REVERSED, VOIDED |
TRANSACTION_TYPE — Lookup: classification of transactions.
| Attribute | Type | Notes |
|---|---|---|
| TRANSACTION_TYPE_CODE | Char(3) PK | DEP, WDR, TRF, PMT, FEE, INT, ADJ, CHK |
| TRANSACTION_TYPE_NAME | Varchar | |
| AFFECTS_BALANCE | Boolean | |
| REQUIRES_APPROVAL | Boolean | |
| DESCRIPTION | Varchar |
PENDING_TRANSACTION — Transactions not yet posted (holds, pending ACH).
| Attribute | Type | Notes |
|---|---|---|
| PENDING_TXN_ID | Bigint (PK) | |
| ACCOUNT_ID | FK to ACCOUNT | |
| TRANSACTION_TYPE_CODE | FK to TRANSACTION_TYPE | |
| AMOUNT | Decimal(15,2) | |
| EXPECTED_POST_DATE | Date | |
| HOLD_EXPIRY_DATE | Date | |
| SOURCE_DESCRIPTION | Varchar | |
| STATUS | Code | PENDING, POSTED, EXPIRED, CANCELLED |
| POSTED_TXN_ID | FK to TRANSACTION | Populated when posted |
Loan Domain
LOAN — Extended attributes for loan accounts.
| Attribute | Type | Notes |
|---|---|---|
| LOAN_ID | Integer (PK) | |
| ACCOUNT_ID | FK to ACCOUNT | Every loan is also an account |
| LOAN_TYPE | Code | MORTGAGE, AUTO, PERSONAL, BUSINESS, HELOC, STUDENT |
| ORIGINAL_AMOUNT | Decimal(15,2) | |
| CURRENT_PRINCIPAL | Decimal(15,2) | |
| INTEREST_RATE | Decimal(5,4) | |
| RATE_TYPE | Code | FIXED, VARIABLE |
| TERM_MONTHS | Integer | |
| MONTHLY_PAYMENT | Decimal(12,2) | |
| ORIGINATION_DATE | Date | |
| MATURITY_DATE | Date | |
| NEXT_PAYMENT_DATE | Date | |
| LOAN_OFFICER_ID | FK to EMPLOYEE | |
| STATUS | Code | ACTIVE, DELINQUENT, PAID_OFF, CHARGED_OFF, FORECLOSED |
LOAN_PAYMENT — Individual payments against a loan (weak entity).
| Attribute | Type | Notes |
|---|---|---|
| LOAN_ID | FK to LOAN (part of PK) | Identifying relationship |
| PAYMENT_SEQUENCE | Integer (part of PK) | |
| PAYMENT_DATE | Date | |
| TOTAL_AMOUNT | Decimal(12,2) | |
| PRINCIPAL_AMOUNT | Decimal(12,2) | |
| INTEREST_AMOUNT | Decimal(12,2) | |
| ESCROW_AMOUNT | Decimal(12,2) | For mortgages |
| LATE_FEE_AMOUNT | Decimal(12,2) | |
| TRANSACTION_ID | FK to TRANSACTION | Links to general transaction |
LOAN_COLLATERAL — Assets securing a loan.
| Attribute | Type | Notes |
|---|---|---|
| COLLATERAL_ID | Integer (PK) | |
| LOAN_ID | FK to LOAN | |
| COLLATERAL_TYPE | Code | REAL_ESTATE, VEHICLE, EQUIPMENT, SECURITIES, OTHER |
| DESCRIPTION | Varchar(500) | |
| ESTIMATED_VALUE | Decimal(15,2) | |
| APPRAISAL_DATE | Date | |
| APPRAISED_BY | Varchar |
Product and Fee Domain
PRODUCT — Financial products offered by the bank.
| Attribute | Type | Notes |
|---|---|---|
| PRODUCT_ID | Integer (PK) | |
| PRODUCT_CODE | Varchar(10) (unique) | |
| PRODUCT_NAME | Varchar(100) | |
| PRODUCT_CATEGORY | Code | DEPOSIT, LOAN, CREDIT, INVESTMENT, SERVICE |
| ACCOUNT_TYPE_CODE | FK to ACCOUNT_TYPE | Maps product to account type |
| DESCRIPTION | Varchar(500) | |
| LAUNCH_DATE | Date | |
| DISCONTINUE_DATE | Date | |
| STATUS | Code |
FEE_SCHEDULE — Fee definitions for products and services.
| Attribute | Type | Notes |
|---|---|---|
| FEE_SCHEDULE_ID | Integer (PK) | |
| PRODUCT_ID | FK to PRODUCT | Nullable (bank-wide fees) |
| FEE_TYPE | Code | MONTHLY, OVERDRAFT, ATM, WIRE, STMT, EARLY_CLOSE |
| FEE_NAME | Varchar(100) | |
| FEE_AMOUNT | Decimal(10,2) | |
| FEE_CALCULATION | Code | FLAT, PERCENTAGE, TIERED |
| EFFECTIVE_DATE | Date | |
| END_DATE | Date |
INTEREST_RATE_SCHEDULE — Rate tiers for products.
| Attribute | Type | Notes |
|---|---|---|
| RATE_SCHEDULE_ID | Integer (PK) | |
| PRODUCT_ID | FK to PRODUCT | |
| BALANCE_TIER_MIN | Decimal(15,2) | |
| BALANCE_TIER_MAX | Decimal(15,2) | |
| INTEREST_RATE | Decimal(5,4) | |
| EFFECTIVE_DATE | Date | |
| END_DATE | Date |
Branch and Employee Domain
BRANCH — Physical bank locations.
| Attribute | Type | Notes |
|---|---|---|
| BRANCH_ID | Integer (PK) | |
| BRANCH_CODE | Char(6) (unique) | |
| BRANCH_NAME | Varchar(100) | |
| ADDRESS_LINE1 | Varchar(200) | Embedded (single address) |
| ADDRESS_LINE2 | Varchar(200) | |
| CITY | Varchar(100) | |
| STATE_CODE | Char(2) | |
| ZIP_CODE | Varchar(10) | |
| PHONE_NUMBER | Varchar(20) | |
| FAX_NUMBER | Varchar(20) | |
| MANAGER_EMP_ID | FK to EMPLOYEE | Branch manager |
| OPEN_DATE | Date | |
| CLOSE_DATE | Date | |
| STATUS | Code | ACTIVE, INACTIVE, CLOSED |
EMPLOYEE — Bank staff.
| Attribute | Type | Notes |
|---|---|---|
| EMPLOYEE_ID | Integer (PK) | |
| EMPLOYEE_NUMBER | Varchar(10) (unique) | |
| FIRST_NAME | Varchar(50) | |
| LAST_NAME | Varchar(50) | |
| EMAIL_ADDRESS | Varchar(200) | |
| PHONE_NUMBER | Varchar(20) | |
| JOB_TITLE | Varchar(100) | |
| DEPARTMENT | Code | |
| BRANCH_ID | FK to BRANCH | |
| HIRE_DATE | Date | |
| TERMINATION_DATE | Date | NULL if active |
| MANAGER_ID | FK to EMPLOYEE | Self-referencing |
| STATUS | Code | ACTIVE, INACTIVE, TERMINATED |
BRANCH_PRODUCT — Junction: which products are offered at which branches.
| Attribute | Type | Notes |
|---|---|---|
| BRANCH_ID | FK to BRANCH (part of PK) | |
| PRODUCT_ID | FK to PRODUCT (part of PK) | |
| EFFECTIVE_DATE | Date | |
| END_DATE | Date | |
| CUSTOM_RATE_ADJUSTMENT | Decimal(5,4) | Branch-specific rate modifier |
Regulatory and Audit Domain
AUDIT_LOG — Centralized change tracking.
| Attribute | Type | Notes |
|---|---|---|
| AUDIT_ID | Bigint (PK) | |
| TABLE_NAME | Varchar(128) | |
| OPERATION | Char(1) | I, U, D |
| PRIMARY_KEY_VALUE | Varchar(200) | |
| COLUMN_NAME | Varchar(128) | NULL for INSERT/DELETE |
| OLD_VALUE | Varchar(1000) | |
| NEW_VALUE | Varchar(1000) | |
| CHANGED_BY | Varchar(128) | |
| CHANGED_TIMESTAMP | Timestamp | |
| APPLICATION_ID | Varchar(128) |
REGULATORY_REPORT — Tracking of regulatory filings.
| Attribute | Type | Notes |
|---|---|---|
| REPORT_ID | Integer (PK) | |
| REPORT_TYPE | Code | CTR, SAR, CRA, CALL |
| REPORTING_PERIOD_START | Date | |
| REPORTING_PERIOD_END | Date | |
| GENERATED_TIMESTAMP | Timestamp | |
| GENERATED_BY | FK to EMPLOYEE | |
| FILED_DATE | Date | |
| STATUS | Code | DRAFT, GENERATED, FILED, AMENDED |
CTR_FILING — Currency Transaction Reports (BSA requirement for cash > $10,000).
| Attribute | Type | Notes |
|---|---|---|
| CTR_ID | Integer (PK) | |
| REPORT_ID | FK to REGULATORY_REPORT | |
| CUSTOMER_ID | FK to CUSTOMER | |
| TRANSACTION_ID | FK to TRANSACTION | |
| CASH_IN_AMOUNT | Decimal(15,2) | |
| CASH_OUT_AMOUNT | Decimal(15,2) | |
| FILING_DATE | Date |
ALERT — Suspicious activity alerts for BSA/AML compliance.
| Attribute | Type | Notes |
|---|---|---|
| ALERT_ID | Integer (PK) | |
| ALERT_TYPE | Code | LARGE_CASH, STRUCTURING, UNUSUAL_PATTERN, SANCTIONS |
| CUSTOMER_ID | FK to CUSTOMER | |
| ACCOUNT_ID | FK to ACCOUNT | |
| TRANSACTION_ID | FK to TRANSACTION | |
| ALERT_DATE | Date | |
| DESCRIPTION | Varchar(1000) | |
| ASSIGNED_TO | FK to EMPLOYEE | |
| RESOLUTION | Code | OPEN, INVESTIGATING, CLEARED, ESCALATED, SAR_FILED |
| RESOLVED_DATE | Date |
Online Banking Domain
ONLINE_PROFILE — Digital banking credentials and preferences (1:1 with CUSTOMER, optional).
| Attribute | Type | Notes |
|---|---|---|
| CUSTOMER_ID | FK/PK to CUSTOMER | 1:1 with CUSTOMER |
| USERNAME | Varchar (unique) | |
| PASSWORD_HASH | Varchar | |
| MFA_ENABLED | Boolean | |
| MFA_METHOD | Code | SMS, EMAIL, AUTHENTICATOR |
| LAST_LOGIN_TIMESTAMP | Timestamp | |
| FAILED_LOGIN_COUNT | Integer | |
| ACCOUNT_LOCKED | Boolean | |
| ENROLLMENT_DATE | Date |
LOGIN_HISTORY — Audit trail for digital access.
| Attribute | Type | Notes |
|---|---|---|
| LOGIN_ID | Bigint (PK) | |
| CUSTOMER_ID | FK to CUSTOMER | |
| LOGIN_TIMESTAMP | Timestamp | |
| IP_ADDRESS | Varchar(45) | IPv6 support |
| DEVICE_FINGERPRINT | Varchar(200) | |
| CHANNEL | Code | WEB, MOBILE_IOS, MOBILE_ANDROID |
| SUCCESS | Boolean | |
| FAILURE_REASON | Code | NULL if successful |
Relationship Summary
The complete model contains the following key relationships:
| Parent Entity | Child Entity | Cardinality | Participation |
|---|---|---|---|
| CUSTOMER | INDIVIDUAL_CUSTOMER | 1:1 | Supertype/subtype |
| CUSTOMER | BUSINESS_CUSTOMER | 1:1 | Supertype/subtype |
| CUSTOMER | CUSTOMER_ADDRESS | 1:N | Optional (0..*) |
| CUSTOMER | CUSTOMER_PHONE | 1:N | Optional (0..*) |
| CUSTOMER | CUSTOMER_EMAIL | 1:N | Optional (0..*) |
| CUSTOMER | CUSTOMER_DOCUMENT | 1:N | Optional (0..*) |
| CUSTOMER -- ACCOUNT | ACCOUNT_HOLDER | M:N (via junction) | |
| ACCOUNT | CHECKING_DETAIL | 1:1 | Supertype/subtype |
| ACCOUNT | SAVINGS_DETAIL | 1:1 | Supertype/subtype |
| ACCOUNT | CD_DETAIL | 1:1 | Supertype/subtype |
| ACCOUNT | TRANSACTION | 1:N | Mandatory |
| ACCOUNT | PENDING_TRANSACTION | 1:N | Optional |
| ACCOUNT | LOAN | 1:1 | Optional |
| LOAN | LOAN_PAYMENT | 1:N | Optional |
| LOAN | LOAN_COLLATERAL | 1:N | Optional |
| BRANCH | ACCOUNT | 1:N | |
| BRANCH | EMPLOYEE | 1:N | |
| BRANCH | CUSTOMER (primary) | 1:N | |
| BRANCH -- PRODUCT | BRANCH_PRODUCT | M:N (via junction) | |
| EMPLOYEE | EMPLOYEE (manager) | 1:N (self) | Optional |
| EMPLOYEE | BRANCH (as manager) | 1:1 | Optional |
| EMPLOYEE | LOAN (as officer) | 1:N | Optional |
| PRODUCT | FEE_SCHEDULE | 1:N | |
| PRODUCT | INTEREST_RATE_SCHEDULE | 1:N | |
| CUSTOMER | ONLINE_PROFILE | 1:1 | Optional |
| CUSTOMER | LOGIN_HISTORY | 1:N | Optional |
Design Decisions Log
Every non-obvious design decision should be documented. Here are the key decisions for the Meridian Bank model:
-
Surrogate keys everywhere. Every entity has a system-generated integer primary key. Natural keys (ACCOUNT_NUMBER, EMPLOYEE_NUMBER, BRANCH_CODE) are enforced as UNIQUE constraints.
-
CUSTOMER supertype/subtype split. Individuals and businesses share a customer relationship but have fundamentally different attributes. Using table-per-type avoids a wide table full of NULLs.
-
ACCOUNT supertype/subtype split. Same reasoning as CUSTOMER. The discriminator is ACCOUNT_TYPE_CODE.
-
ACCOUNT_HOLDER junction table. We model the customer-account relationship as M:N to support joint accounts, authorized signers, and beneficiaries from day one. The overhead is one additional JOIN compared to a simple FK, which is modest.
-
Separate TRANSACTION and PENDING_TRANSACTION. Posted transactions are immutable (a regulatory requirement). Pending transactions change status. Keeping them in separate tables avoids mixing mutable and immutable data.
-
LOAN as an extension of ACCOUNT, not a separate hierarchy. Every loan is also an account (it has a balance, receives payments, generates statements). LOAN is a detail table that extends ACCOUNT with loan-specific attributes.
-
Embedded addresses for BRANCH and EMPLOYEE; separate table for CUSTOMER. Branches and employees have one address each. Customers have multiple addresses that change over time.
-
BIGINT for TRANSACTION_ID and AUDIT_ID. With 500 million transactions per year and a multi-year retention requirement, INTEGER (max ~2.1 billion) will be exhausted within a few years. BIGINT provides headroom.
-
Effective dating on FEE_SCHEDULE and INTEREST_RATE_SCHEDULE. Fee and rate structures change over time. We need to know what fee was in effect when a transaction occurred.
-
Separate DOCUMENT_TYPE lookup table. Different document types have different attributes (government-issued flag). This is not an OTLT — it is a properly designed lookup table for a specific domain.
Check Your Understanding — Section 13.11
- Why did we choose a junction table (ACCOUNT_HOLDER) for the customer-account relationship instead of a simple foreign key in ACCOUNT?
- If Meridian Bank adds a new product type — "Health Savings Account" — what changes are needed in the logical model?
- Trace the JOINs needed to answer: "Show me all transactions for all accounts held by customer 'Margaret Chen' in the last 30 days." List every table involved.
13.12 From Logical to Physical — Preview
The logical model describes what data means. The physical model describes how DB2 stores it. The gap between them is where DBA expertise lives.
In Chapter 14, we will translate the Meridian Bank logical model into physical DB2 objects. Here is a preview of the decisions that await:
Data Type Selection
The logical model says "Decimal(15,2) for monetary amounts." The physical design must choose between DECIMAL and DECFLOAT, decide on the scale and precision, and consider whether DECIMAL(15,2) provides enough headroom for international currencies with different decimal precisions.
Tablespace Assignment
Each table will be assigned to a tablespace based on its access pattern:
- TRANSACTION — Range-partitioned by TRANSACTION_DATE (monthly partitions) for efficient historical queries and partition-level maintenance.
- CUSTOMER — Partition-by-growth (LUW) or universal tablespace (z/OS) for a large but slowly growing table.
- ACCOUNT_TYPE — Small lookup table; can coexist with other lookup tables in a shared tablespace.
Page Size
Wide tables (many columns or long VARCHAR/LOB columns) may need 16K or 32K page sizes. Narrow tables with high-volume point lookups perform better with 4K or 8K pages (more rows fit in the buffer pool).
Compression
The TRANSACTION table with 500 million rows per year is a prime candidate for row-level compression. DB2's dictionary-based compression typically achieves 60-80% space savings on transaction data with repetitive column values (same branch IDs, same transaction types).
Denormalization Decisions
The logical model is in 3NF. The physical model may include controlled denormalization:
- Storing CUSTOMER_NAME in TRANSACTION for statement generation performance (avoiding the JOIN chain TRANSACTION -> ACCOUNT -> ACCOUNT_HOLDER -> CUSTOMER for every statement line).
- Storing RUNNING_BALANCE in TRANSACTION as a denormalized, derived attribute (avoiding a running-sum query over potentially millions of transactions).
Every denormalization decision will be documented with the anomaly risk it introduces, the performance benefit it provides, and the mechanism (trigger, stored procedure, or application code) that keeps the denormalized data in sync.
Referential Integrity Enforcement
DB2 offers several options for foreign key enforcement: RESTRICT (prevent parent deletion), CASCADE (delete children with parent), SET NULL (null out foreign key on parent deletion). For the Meridian Bank model:
- CUSTOMER -> ACCOUNT_HOLDER: CASCADE (if a customer is deleted, remove their account holder records)
- ACCOUNT -> TRANSACTION: RESTRICT (cannot delete an account that has transactions)
- EMPLOYEE -> LOAN (loan officer): SET NULL (if an employee leaves, the loan remains but the officer reference is cleared)
These decisions are physical design choices with business implications, and they will be fully explored in Chapter 14.
Spaced Review
These questions revisit concepts from earlier chapters to strengthen long-term retention.
From Chapter 2: The Relational Model
Review Question 1: In Chapter 2, we distinguished between a relation (mathematical concept) and a table (SQL/DB2 implementation). Name two ways in which a DB2 table can deviate from a pure relation, and explain the constraint you would add to make the table behave more like a proper relation.
Show Answer
Two deviations: 1. **Duplicate rows.** A relation is a set (no duplicates). A DB2 table allows duplicate rows unless you enforce a PRIMARY KEY or UNIQUE constraint. Fix: always define a primary key. 2. **Row ordering.** A relation has no inherent ordering. DB2 can return rows in any order (which is relationally correct), but developers sometimes depend on insertion order. Fix: always use ORDER BY when order matters; never assume row order.From Chapter 4: Setting Up Your DB2 Environment
Review Question 2: In Chapter 4, we created buffer pools BP_DATA_32K and BP_INDEX_8K with different page sizes. Why did we use a smaller page size for the index buffer pool?
Show Answer
Index pages are typically much smaller than data pages. B+ tree index entries consist of key columns and row pointers, which are narrow compared to full data rows. Using 8K pages for indexes means more index entries fit in each page, and more index pages fit in the buffer pool, increasing the cache hit ratio for index lookups. Using 32K pages for indexes would waste buffer pool memory storing partially empty pages.From Chapter 11: Data Definition Language
Review Question 3: In Chapter 11, we learned that CHECK constraints enforce domain rules. In the Meridian Bank logical model, we defined ACCOUNT.STATUS with valid values {ACTIVE, INACTIVE, CLOSED, FROZEN}. Write the CHECK constraint DDL, and explain why this approach is preferable to a foreign key to a STATUS lookup table for a domain with only four values.
Show Answer
CONSTRAINT CHK_ACCT_STATUS CHECK (STATUS IN ('A', 'I', 'C', 'F'))
For a small, stable domain (four values that rarely change), a CHECK constraint is preferable because:
1. No JOIN is needed to validate or display the status.
2. The constraint is visible in the DDL, serving as documentation.
3. There is no lookup table to manage, populate, or cache.
A foreign key to a lookup table is better when the domain is large (country codes), changes frequently (product types), or has additional attributes (status descriptions, display order, active flag).
Chapter Summary
This chapter transformed you from a SQL practitioner into a data modeler. We covered:
-
Why logical design matters — the cost of getting it wrong, measured in data anomalies, performance degradation, and maintenance burden.
-
Requirements gathering — interviewing stakeholders, analyzing existing systems, extracting entities from business processes, and avoiding common traps.
-
Entities and attributes — identifying what the business needs to track, classifying attributes (simple, composite, derived, multi-valued), defining domains, and choosing primary keys.
-
Relationships and cardinality — modeling one-to-one, one-to-many, and many-to-many relationships with correct participation and cardinality constraints.
-
ER diagram notation — Chen, Crow's foot, and UML notation, with Crow's foot as our standard for the rest of the book.
-
Resolving M:N relationships — junction tables as first-class entities with their own attributes.
-
Subtypes and supertypes — modeling inheritance hierarchies and choosing implementation strategies (single table, table-per-type, table-per-concrete-class).
-
Normalization as validation — using 1NF through BCNF as a check on your ER model, not as a design method.
-
Common design patterns — audit trails, effective dating, address polymorphism, configurable attributes, and tree structures.
-
Anti-patterns — EAV abuse, mega-tables, OTLT, over-normalization, and comma-separated lists.
-
The Meridian Bank complete logical model — 30+ entities across customer, account, transaction, loan, product, branch/employee, regulatory, and online banking domains.
-
The bridge to physical design — a preview of how the logical model maps to DB2 tablespaces, page sizes, compression, and denormalization decisions in Chapter 14.
The logical model is a contract. It says: "This is what the data means. These are the business rules. These are the relationships." The physical model (Chapter 14), the index strategy (Chapter 15), and the schema evolution plan (Chapter 16) all flow from this contract. Get the contract right, and everything downstream is tractable. Get it wrong, and no amount of tuning will save you.
Next, in Chapter 14, we will take this logical model and make it real — translating every entity into a DB2 table, every relationship into a foreign key, and every design decision into DDL that the optimizer can work with.
In the next chapter: Physical Database Design — translating the logical model into DB2 tablespaces, page sizes, compression, and partitioning strategies.
Related Reading
Explore this topic in other books
IBM DB2 Physical Design IBM DB2 The Relational Model Intro to Data Science Reshaping and Transforming Data