In This Chapter
- 2.1 Before Relational — Hierarchical and Network Databases
- 2.2 Codd's Vision — The Relational Model
- 2.3 Tables, Rows, and Columns — The Building Blocks
- 2.4 Keys — The Glue That Holds It All Together
- 2.5 Data Types in DB2
- 2.6 Normalization — Organizing Data to Eliminate Redundancy
- 2.7 Denormalization — When and Why to Break the Rules
- 2.8 Relational Algebra Intuition
- 2.9 The Meridian Bank Core Tables
- 2.10 Constraints and Integrity — DB2 as Your Safety Net
- Spaced Review — Chapter 1
- Bringing It Together
- Key Terms Introduced in This Chapter
Chapter 2: The Relational Model — How DB2 Thinks About Data
"The relational model is based on the concept that all data is represented as mathematical n-ary relations. An n-ary relation is a subset of the Cartesian product of n domains." — Edgar F. Codd, "A Relational Model of Data for Large Shared Data Banks," 1970
Picture this. It is Monday morning at Meridian National Bank, and someone from the mortgage department has just emailed you a spreadsheet. It contains 14,000 rows of customer data. Some rows have the customer's name in a single cell — "John Q. Public" — while others split it across three columns. The branch column sometimes says "Downtown," sometimes "DWNTWN," and sometimes "Branch #4 (Main St)." Phone numbers appear in at least six different formats. Three hundred rows have the same customer listed multiple times because they hold multiple accounts, and every row duplicates the customer's address, date of birth, and Social Security number. When you scroll to row 8,712, you notice that Margaret Chen's address is different in two rows — and you have no idea which one is current.
Now picture a different world. You open DB2 and query the CUSTOMER table. One row per customer. One canonical address. One phone number column with a CHECK constraint enforcing format. Margaret Chen appears exactly once. Her three accounts live in a separate ACCOUNT table, linked to her customer record by a foreign key. If you need Margaret's address alongside her account balances, you JOIN the two tables — and you always get the single, authoritative address.
The difference between that spreadsheet and that database is not just tooling. It is a fundamentally different way of thinking about data. That way of thinking has a name: the relational model. And it is the foundation upon which every DB2 database is built.
In this chapter, we will take the relational model apart piece by piece. We will learn where it came from, why it displaced everything that came before it, and how DB2 implements it in practice. By the end, you will have designed the first core tables for Meridian National Bank — and you will understand why each design decision matters.
2.1 Before Relational — Hierarchical and Network Databases
To appreciate why the relational model was revolutionary, you need to understand what it replaced. In the 1960s and early 1970s, two data models dominated commercial computing: the hierarchical model and the network model.
The Hierarchical Model and IMS
IBM's own Information Management System (IMS), first deployed in 1966 for the Apollo space program, is the canonical example of a hierarchical database. In IMS, data is organized into tree structures. A parent segment contains child segments, which can contain grandchild segments, and so on. If you wanted to store customers and their accounts, you would define CUSTOMER as a parent segment and ACCOUNT as a child segment beneath it.
This worked beautifully — as long as your queries followed the tree. "Give me all accounts for customer 10042" was fast and natural. But "give me all customers who have an account at the Downtown branch" was a nightmare, because BRANCH was not an ancestor of ACCOUNT in the tree — you had to traverse every customer to find their accounts and check the branch. The data model dictated what questions you could ask efficiently, and asking the "wrong" question meant reorganizing your entire database or writing tortuous application code.
Here is what a simplified IMS hierarchy might look like for a bank:
BANK (root)
└── BRANCH
    └── CUSTOMER
        └── ACCOUNT
            └── TRANSACTION
Need to find all transactions across all branches for a single customer? You are walking the entire tree. Need to find all customers at a branch who have savings accounts? Different walk. The physical organization of the data and the logical access paths were tightly coupled.
The Network Model and CODASYL
The network model, standardized by the CODASYL (Conference on Data Systems Languages) committee in 1969, was more flexible. Instead of strict parent-child trees, it allowed records to participate in multiple "sets" — essentially many-to-many relationships implemented through pointer chains. A CUSTOMER record could be linked to multiple ACCOUNT records, and an ACCOUNT record could be linked back to a BRANCH record through a different set.
This solved some of the hierarchical model's limitations, but at a steep cost: complexity. Application programmers had to navigate pointer chains explicitly. They wrote code like "FIND NEXT ACCOUNT WITHIN CUSTOMER-ACCOUNT SET." Changing the database structure meant rewriting application programs. The database was not just storing data — it was imposing a physical navigation path on every program that used it.
Why Relational Won
Both models had the same fundamental problem: physical data independence did not exist. The way data was physically stored dictated how programs accessed it. If you reorganized the storage for performance, programs broke. If you needed to ask a new kind of question, you might need to restructure the entire database.
Edgar F. Codd, a mathematician working at IBM's San Jose Research Laboratory, saw this and asked a deceptively simple question: What if we separated the logical view of data from its physical storage entirely?
That question — and his answer — changed everything.
A Note on IMS and DB2 at IBM
It is one of the great ironies of computing history that IBM developed both IMS and DB2. IMS was (and remains) deeply embedded in IBM's mainframe ecosystem. When Codd proposed the relational model, there was significant internal resistance at IBM — IMS was a cash cow, and the relational model threatened it. DB2 for z/OS was not released until 1983, thirteen years after Codd's paper. Even today, IMS runs at many large enterprises alongside DB2 on z/OS. You may encounter both in your career, and understanding why relational won helps you appreciate what DB2 gives you that IMS does not.
2.2 Codd's Vision — The Relational Model
In June 1970, Edgar F. Codd published "A Relational Model of Data for Large Shared Data Banks" in the Communications of the ACM. It is one of the most influential papers in the history of computer science, and it is worth understanding not just as history, but as a living design philosophy that DB2 embodies every day.
The Core Idea
Codd's insight was this: represent all data as relations — mathematical tables where each row is a tuple (an ordered collection of values) and each column is an attribute (a named domain of values). Provide a high-level language for manipulating these relations (what became SQL), and let the database management system figure out the most efficient way to execute the operations. The programmer describes what data they want. The system determines how to get it.
This separation of what from how is the relational model's deepest gift. When you write:
SELECT c.CUSTOMER_NAME, a.ACCOUNT_BALANCE
FROM CUSTOMER c
JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.ACCOUNT_TYPE = 'SAVINGS'
AND a.ACCOUNT_BALANCE > 50000.00;
You are not telling DB2 which index to use, which table to read first, or how to combine the results. You are declaring what you want, and DB2's optimizer — one of the most sophisticated query optimizers ever built — decides the execution plan. If you add an index next month, or reorganize the tablespace, or partition the table differently, your query does not change. That is physical data independence, and it was Codd's central goal.
Codd's Twelve Rules
In 1985, Codd published twelve rules (numbered 0 through 12, so actually thirteen) that a database management system must satisfy to be considered fully relational. No commercial RDBMS satisfies all of them perfectly — they are aspirational standards. But understanding them tells you what the relational model is trying to achieve, and where DB2 comes close.
Here are the rules that matter most for a practicing DB2 professional:
Rule 0 — The Foundation Rule. A relational DBMS must manage its stored data using only its relational capabilities. This seems obvious, but it means you should not need to bypass the relational engine to get work done. DB2 satisfies this — even its system catalog is stored in relational tables that you can query with SQL.
Rule 1 — The Information Rule. All information in the database is represented explicitly as values in tables. In DB2, this extends to metadata: the SYSCAT views (LUW) and SYSIBM catalog tables (z/OS) store table definitions, column definitions, index definitions, and statistics as ordinary rows in ordinary tables. You can SELECT * FROM SYSCAT.TABLES and see every table in the database.
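Rule 1 in action looks like ordinary SQL. On LUW, for instance, you can list every table in our running MERIDIAN schema directly from the catalog (on z/OS, the equivalent query goes against SYSIBM.SYSTABLES):

```sql
-- Metadata queried as ordinary rows in ordinary tables.
-- CARD is the optimizer's estimated row count, taken from statistics.
SELECT TABSCHEMA, TABNAME, CARD
FROM   SYSCAT.TABLES
WHERE  TABSCHEMA = 'MERIDIAN'
ORDER BY TABNAME;
```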
Rule 2 — Guaranteed Access. Every datum is accessible by a combination of table name, column name, and primary key value. This is why primary keys matter — they are not just a good idea, they are a fundamental requirement of the relational model.
Rule 3 — Systematic Treatment of NULL. NULL represents missing or inapplicable information, distinct from zero or an empty string. DB2 supports NULL fully, and we will see in Section 2.10 how NOT NULL constraints let you control where NULLs are allowed.
Rule 4 — Active Online Catalog. The database description is stored in the same relational format as ordinary data, accessible through the same relational language. DB2's catalog tables are one of its great strengths — experienced DBAs query them constantly.
Rule 5 — Comprehensive Data Sublanguage. The system must support at least one language with data definition, data manipulation, integrity constraints, authorization, and transaction management. SQL is that language for DB2. A single CREATE TABLE statement defines structure, constraints, and (with GRANT) authorization.
Rule 6 — View Updating. All views that are theoretically updatable must be updatable by the system. DB2 supports updatable views, though with some restrictions (views with JOINs, aggregations, or DISTINCT are generally not updatable — and for good reason).
Rule 8 — Physical Data Independence. Application programs are unaffected by changes to storage structures or access methods. This is the big one. You can reorganize a DB2 tablespace, add or drop an index, redistribute data across partitions, and your SQL stays the same. The optimizer adjusts. Your programs do not break.
Rule 9 — Logical Data Independence. Application programs are unaffected by information-preserving changes to table structures. DB2 supports this through views and the ALTER TABLE statement. You can add a column to a table without breaking existing queries (they simply do not select the new column unless asked).
Rule 12 — Non-Subversion. If the system provides a low-level access path, that path must not be able to bypass the relational integrity constraints. DB2 enforces this — you cannot sneak past a foreign key constraint by using a utility or internal API in normal operation. (There are narrow exceptions with certain administrative utilities on z/OS, but they require explicit DBA authority and are intended for recovery scenarios, not routine use.)
The rules we skipped — Rule 7 (high-level insert, update, and delete on whole sets of rows), Rule 10 (integrity independence), and Rule 11 (distribution independence) — are important in theory and largely satisfied by modern DB2 in practice.
Check Your Understanding — Section 2.2
- In your own words, what is the difference between telling a database what data you want versus how to get it?
- Why is physical data independence valuable for a bank like Meridian National that will operate its database for decades?
- Rule 1 says all information is represented as values in tables. Where in DB2 can you find information about your tables stored as values in tables?
2.3 Tables, Rows, and Columns — The Building Blocks
The relational model uses formal mathematical terminology. The database industry uses informal terminology. DB2 uses SQL terminology. All three describe the same things, and you need to be fluent in all three.
| Formal (Relational Theory) | Informal (Common Usage) | SQL / DB2 Term |
|---|---|---|
| Relation | Table | TABLE |
| Tuple | Row | ROW |
| Attribute | Column | COLUMN |
| Domain | Data type / valid values | Data type + constraints |
| Relation schema | Table definition | CREATE TABLE statement |
| Relation instance | Table contents at a point in time | Result of SELECT * |
Relations (Tables)
A relation is a set of tuples that all have the same attributes. In DB2, this is a table. The word "set" is important — in pure relational theory, a relation cannot contain duplicate tuples. In SQL and DB2, a table can contain duplicate rows unless you enforce uniqueness through constraints. This is one of SQL's departures from pure relational theory, and it is why primary keys and unique constraints are not optional luxuries — they are how you get DB2 to behave like a proper relational system.
A table in DB2 has:
- A schema (namespace) — for example, MERIDIAN or HR.
- A name — for example, CUSTOMER.
- A set of columns, each with a name and data type.
- Zero or more constraints (primary keys, foreign keys, check constraints, unique constraints).
- Rows containing the actual data.
The fully qualified name of a table in DB2 is SCHEMA.TABLE_NAME — for example, MERIDIAN.CUSTOMER. On z/OS, the schema is called the qualifier and is often tied to the authorization ID. On LUW, you can create explicit schemas with CREATE SCHEMA.
Tuples (Rows)
A tuple is an ordered collection of values, one for each attribute in the relation. In DB2, this is a row. When you INSERT INTO CUSTOMER VALUES (10001, 'Margaret', 'Chen', '1978-03-15', ...), you are adding a tuple to the CUSTOMER relation.
One critical property: in relational theory, tuples within a relation are unordered. There is no "first row" or "last row." When you SELECT * FROM CUSTOMER, DB2 does not guarantee any particular order unless you specify ORDER BY. This surprises beginners who expect rows to come back in insertion order. They might — or they might not. DB2 is free to return them in whatever order is most efficient.
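A minimal illustration, using the CUSTOMER table from our running example:

```sql
-- No ORDER BY: DB2 may return rows in any order it finds efficient.
SELECT CUSTOMER_ID, LAST_NAME
FROM   MERIDIAN.CUSTOMER;

-- With ORDER BY: the ordering is guaranteed by the statement itself.
SELECT CUSTOMER_ID, LAST_NAME
FROM   MERIDIAN.CUSTOMER
ORDER BY LAST_NAME, CUSTOMER_ID;
```

If your application depends on row order, say so with ORDER BY; anything else is an accident of the access path that a reorganization or a new index can silently change.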
Attributes (Columns)
An attribute is a named component of a tuple, drawn from a domain (a set of permitted values). In DB2, attributes are columns, and domains are implemented through data types and constraints.
Every column has:
- A name (unique within the table).
- A data type (INTEGER, VARCHAR(100), DATE, etc.).
- A nullability setting (whether NULL is allowed).
- An optional default value.
- Optional constraints (CHECK, UNIQUE, etc.).
The column name should be descriptive and consistent. At Meridian National Bank, we will adopt a naming convention early: table-qualified column names like CUSTOMER_ID, ACCOUNT_ID, BRANCH_ID. This convention prevents ambiguity when joining tables — CUSTOMER.CUSTOMER_ID and ACCOUNT.CUSTOMER_ID make the relationship obvious.
The Degree and Cardinality of a Table
Two terms you will encounter in DB2 documentation and performance tuning:
- Degree: The number of columns in a table. The CUSTOMER table might have a degree of 15 (fifteen columns).
- Cardinality: The number of rows in a table. If Meridian has 2 million customers, the CUSTOMER table has a cardinality of 2,000,000.
DB2's optimizer uses cardinality estimates (stored in the catalog as statistics) to choose execution plans. When you run the RUNSTATS command (LUW) or the RUNSTATS utility (z/OS), you are updating these estimates. We will cover this in depth in the performance chapters, but know that the optimizer's decisions are only as good as its statistics.
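On LUW, refreshing statistics and checking the resulting cardinality estimate looks like this (RUNSTATS is issued from the command line processor; on z/OS the utility syntax names a tablespace and varies by installation):

```sql
-- [LUW] Collect table, distribution, and index statistics for CUSTOMER.
RUNSTATS ON TABLE MERIDIAN.CUSTOMER
  WITH DISTRIBUTION AND DETAILED INDEXES ALL;

-- The optimizer's cardinality estimate is then visible in the catalog:
SELECT CARD
FROM   SYSCAT.TABLES
WHERE  TABSCHEMA = 'MERIDIAN' AND TABNAME = 'CUSTOMER';
```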
2.4 Keys — The Glue That Holds It All Together
If tables are the bones of a relational database, keys are the joints. They connect tables to each other, enforce uniqueness, and give every row a guaranteed identity. Understanding keys deeply is not optional — it is the difference between a database that works and one that slowly descends into chaos.
Candidate Keys
A candidate key is any column (or combination of columns) that uniquely identifies every row in a table. The formal requirements are:
- Uniqueness: No two rows can have the same value(s) for the candidate key columns.
- Irreducibility (minimality): No subset of the candidate key columns also satisfies uniqueness. If columns (A, B) together are unique, but column A alone is also unique, then (A, B) is not a candidate key — A is.
Consider the CUSTOMER table at Meridian Bank. Several column combinations might uniquely identify a customer:
- CUSTOMER_ID (a system-generated number) — unique by design.
- SSN (Social Security Number) — unique in practice for US customers.
- (LAST_NAME, FIRST_NAME, DATE_OF_BIRTH) — probably not unique (there could be two John Smiths born on the same day).
CUSTOMER_ID and SSN are both candidate keys. (LAST_NAME, FIRST_NAME, DATE_OF_BIRTH) is not, because it fails the uniqueness requirement.
Primary Keys
The primary key is the candidate key you designate as the official row identifier. A table can have only one primary key. In DB2, declaring a primary key automatically creates a unique index (unless one already exists) and enforces NOT NULL on all primary key columns.
CREATE TABLE MERIDIAN.CUSTOMER (
CUSTOMER_ID INTEGER NOT NULL,
SSN CHAR(11) NOT NULL,
FIRST_NAME VARCHAR(50) NOT NULL,
LAST_NAME VARCHAR(50) NOT NULL,
-- ... more columns ...
CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMER_ID)
);
The other candidate keys that are not chosen as the primary key are called alternate keys. You should still enforce their uniqueness with UNIQUE constraints:
ALTER TABLE MERIDIAN.CUSTOMER
ADD CONSTRAINT UQ_CUSTOMER_SSN UNIQUE (SSN);
Surrogate Keys vs. Natural Keys
This is one of the most debated topics in database design, and you will encounter passionate advocates on both sides. Here is the practical reality.
A natural key is a candidate key drawn from the real-world data — like SSN, email address, or product SKU. A surrogate key is an artificial identifier generated by the system, typically an auto-incrementing integer or a sequence-generated value.
Arguments for surrogate keys:
- They never change. An SSN can be corrected (data entry error), an email address can change, a product SKU can be reassigned. A surrogate key is immutable.
- They are compact. A 4-byte INTEGER is smaller and faster to index and join than an 11-byte CHAR SSN or a 254-byte VARCHAR email.
- They are meaningless — which is a feature, not a bug. No one will try to encode information in them or parse them.
Arguments for natural keys:
- They carry meaning. When you see SSN 123-45-6789 in a foreign key column, you know what it refers to without a JOIN.
- They can prevent the need for some JOINs (if the natural key is already in the referencing table, you may not need to look up the referenced table).
- They enforce real-world uniqueness at the database level.
The Meridian Bank decision: We will use surrogate keys (CUSTOMER_ID, ACCOUNT_ID, BRANCH_ID) as primary keys, and enforce natural key uniqueness with UNIQUE constraints. This is the most common pattern in enterprise DB2 databases, and it is what we recommend.
In DB2 for LUW, you generate surrogate keys with:
CUSTOMER_ID INTEGER NOT NULL GENERATED ALWAYS AS IDENTITY
(START WITH 1, INCREMENT BY 1)
In DB2 for z/OS, identity columns are also supported (since DB2 V8), or you can use a SEQUENCE object:
CREATE SEQUENCE MERIDIAN.SEQ_CUSTOMER_ID
AS INTEGER
START WITH 1
INCREMENT BY 1
NO MAXVALUE
NO CYCLE
CACHE 20;
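Using the sequence is then straightforward. A sketch (the CUSTOMER table's remaining NOT NULL columns are omitted here for brevity):

```sql
-- Draw the next surrogate key from the sequence during the INSERT.
INSERT INTO MERIDIAN.CUSTOMER (CUSTOMER_ID, SSN, FIRST_NAME, LAST_NAME)
VALUES (NEXT VALUE FOR MERIDIAN.SEQ_CUSTOMER_ID,
        '123-45-6789', 'Margaret', 'Chen');

-- PREVIOUS VALUE FOR returns the value most recently generated
-- by this sequence in the current session.
SELECT PREVIOUS VALUE FOR MERIDIAN.SEQ_CUSTOMER_ID
FROM   SYSIBM.SYSDUMMY1;
```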
Foreign Keys
A foreign key is a column (or set of columns) in one table that references the primary key of another table. It is the mechanism by which DB2 enforces referential integrity — the guarantee that relationships between tables are valid.
CREATE TABLE MERIDIAN.ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL,
CUSTOMER_ID INTEGER NOT NULL,
BRANCH_ID INTEGER NOT NULL,
ACCOUNT_TYPE CHAR(3) NOT NULL,
-- ... more columns ...
CONSTRAINT PK_ACCOUNT PRIMARY KEY (ACCOUNT_ID),
CONSTRAINT FK_ACCOUNT_CUSTOMER
FOREIGN KEY (CUSTOMER_ID)
REFERENCES MERIDIAN.CUSTOMER (CUSTOMER_ID)
ON DELETE RESTRICT,
CONSTRAINT FK_ACCOUNT_BRANCH
FOREIGN KEY (BRANCH_ID)
REFERENCES MERIDIAN.BRANCH (BRANCH_ID)
ON DELETE RESTRICT
);
The ON DELETE RESTRICT clause tells DB2: if someone tries to delete a customer who still has accounts, refuse the deletion. Other options include:
- ON DELETE CASCADE — delete the customer and automatically delete all their accounts. Powerful but dangerous in a banking system.
- ON DELETE SET NULL — set the CUSTOMER_ID in the ACCOUNT rows to NULL. Rarely appropriate when the foreign key is NOT NULL.
- ON DELETE NO ACTION — similar to RESTRICT, but the check happens at the end of the statement rather than immediately. The practical difference is subtle and matters mainly when triggers are involved.
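Here is what RESTRICT looks like in practice, assuming the FK_ACCOUNT_CUSTOMER constraint defined above and a customer 10001 who still holds accounts (the exact message text varies; on LUW the error is SQL0532N, on z/OS SQLCODE -532):

```sql
-- Rejected: customer 10001 still has dependent ACCOUNT rows.
DELETE FROM MERIDIAN.CUSTOMER WHERE CUSTOMER_ID = 10001;

-- The dependents must be removed (or reassigned) first:
DELETE FROM MERIDIAN.ACCOUNT  WHERE CUSTOMER_ID = 10001;
DELETE FROM MERIDIAN.CUSTOMER WHERE CUSTOMER_ID = 10001;
```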
[!THRESHOLD CONCEPT] Referential Integrity
Referential integrity is the idea that every foreign key value must correspond to an existing primary key value in the referenced table. This is a threshold concept — once you truly understand it, you will never design a database without it, and you will be horrified by databases that lack it.
Without referential integrity, you can have an ACCOUNT row with CUSTOMER_ID = 99999 when no customer 99999 exists. This is called an orphan row. Orphan rows corrupt reports, break application logic, and erode trust in the data. They are the relational equivalent of a broken hyperlink — except the consequences are financial.

At Meridian Bank, an orphan account row could mean a deposit gets credited to a non-existent customer, a loan payment goes untracked, or a regulatory report undercounts accounts. DB2 enforces referential integrity in the engine, not in the application — and that is exactly where it belongs.
Composite Keys
A composite key (or compound key) is a key made up of two or more columns. They are most common in association tables (also called junction tables or bridge tables) that implement many-to-many relationships.
For example, if a customer can have multiple accounts and an account can have multiple authorized signers (joint accounts), you might create:
CREATE TABLE MERIDIAN.ACCOUNT_SIGNER (
ACCOUNT_ID INTEGER NOT NULL,
CUSTOMER_ID INTEGER NOT NULL,
SIGNER_ROLE CHAR(1) NOT NULL, -- 'P' = Primary, 'J' = Joint
CONSTRAINT PK_ACCOUNT_SIGNER
PRIMARY KEY (ACCOUNT_ID, CUSTOMER_ID),
CONSTRAINT FK_ACCSIGN_ACCOUNT
FOREIGN KEY (ACCOUNT_ID)
REFERENCES MERIDIAN.ACCOUNT (ACCOUNT_ID),
CONSTRAINT FK_ACCSIGN_CUSTOMER
FOREIGN KEY (CUSTOMER_ID)
REFERENCES MERIDIAN.CUSTOMER (CUSTOMER_ID)
);
The primary key is (ACCOUNT_ID, CUSTOMER_ID) — neither column alone is unique (a customer can be a signer on multiple accounts, and an account can have multiple signers), but together they are.
Check Your Understanding — Section 2.4
- A table has columns (A, B, C, D). The combination (A, B) is always unique, A alone is sometimes duplicated, and B alone is sometimes duplicated. Is (A, B) a candidate key? Why or why not?
- Why would ON DELETE CASCADE on a foreign key from ACCOUNT to CUSTOMER be risky at a bank?
- You discover that the TRANSACTION table has 47 rows where the ACCOUNT_ID does not match any row in the ACCOUNT table. What is this problem called, and what constraint would have prevented it?
2.5 Data Types in DB2
Choosing the right data type is not a cosmetic decision. It affects storage, performance, precision, and what operations you can perform on the data. DB2 offers a rich set of data types, and there are important differences between z/OS and LUW.
Numeric Types
| Data Type | Storage | Range / Precision | Notes |
|---|---|---|---|
| SMALLINT | 2 bytes | -32,768 to 32,767 | Good for codes, flags, small counts |
| INTEGER (INT) | 4 bytes | -2,147,483,648 to 2,147,483,647 | Standard for surrogate keys |
| BIGINT | 8 bytes | -9.2 quintillion to 9.2 quintillion | Transaction IDs, large sequence values |
| DECIMAL(p,s) | Varies | Up to 31 digits | Use for all monetary values. DECIMAL(15,2) for dollars and cents |
| REAL | 4 bytes | ~7 significant digits | Approximate — never for money |
| DOUBLE | 8 bytes | ~15 significant digits | Scientific calculations |
| DECFLOAT(16) or DECFLOAT(34) | 8 or 16 bytes | Decimal floating point | [LUW] and [z/OS V9+] — useful for financial calculations needing large range |
The cardinal rule: use DECIMAL for money, never REAL or DOUBLE. Floating-point arithmetic introduces rounding errors that are unacceptable in financial systems. At Meridian Bank, all monetary columns will be DECIMAL(15,2) — fifteen total digits, two after the decimal point, supporting values up to $9,999,999,999,999.99.
Character Types
| Data Type | Maximum Length | Notes |
|---|---|---|
| CHAR(n) | 254 bytes | Fixed-length. Padded with spaces. Use for codes and fixed-format data (state codes, account type codes) |
| VARCHAR(n) | 32,672 bytes [LUW], 32,704 bytes [z/OS] | Variable-length. Use for names, addresses, descriptions |
| CLOB(n) | 2 GB | Character Large Object. Use for documents, long text |
| GRAPHIC(n) | 127 double-byte chars | Double-byte fixed-length [z/OS] |
| VARGRAPHIC(n) | 16,336 double-byte chars | Double-byte variable-length [z/OS] |
| DBCLOB(n) | 1 GB | Double-byte CLOB [z/OS] |
CHAR vs. VARCHAR: Use CHAR when the length is truly fixed — state codes (CHAR(2)), account type codes (CHAR(3)), SSN format (CHAR(11)). Use VARCHAR for everything else. CHAR wastes space when values vary in length (a CHAR(50) column storing "Ed" wastes 48 bytes per row), and those wasted bytes add up in a 2-million-row table.
[z/OS] On z/OS, you will encounter GRAPHIC and VARGRAPHIC types for double-byte character set (DBCS) data, particularly in installations that support East Asian languages. On LUW, UTF-8 encoding handles multibyte characters in standard VARCHAR columns.
[LUW] DB2 for LUW supports a FOR BIT DATA clause on CHAR and VARCHAR columns, storing raw binary data without code page conversion. This is sometimes used for hashed values or encrypted data.
Date and Time Types
| Data Type | Storage | Format | Notes |
|---|---|---|---|
| DATE | 4 bytes | YYYY-MM-DD | Calendar date only |
| TIME | 3 bytes | HH.MM.SS | Time of day only |
| TIMESTAMP | 10-12 bytes | YYYY-MM-DD-HH.MM.SS.nnnnnnnnnnnn | Date, time, and fractional seconds |
DB2 timestamps support up to 12 digits of fractional second precision (picoseconds) on z/OS V10+ and LUW 9.7+. For most applications, TIMESTAMP(6) (microseconds) is sufficient. At Meridian Bank, we will use TIMESTAMP(6) for all audit columns (CREATED_TIMESTAMP, LAST_UPDATED_TIMESTAMP) and transaction timestamps.
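A sketch of the audit-column pattern (the table name here is illustrative, not part of the Meridian schema; the second column uses DB2's row change timestamp feature, which maintains the value automatically on every update):

```sql
CREATE TABLE MERIDIAN.AUDIT_DEMO (
    DEMO_ID                INTEGER      NOT NULL PRIMARY KEY,
    -- Set once, when the row is inserted:
    CREATED_TIMESTAMP      TIMESTAMP(6) NOT NULL DEFAULT CURRENT TIMESTAMP,
    -- Maintained by DB2 on every INSERT and UPDATE:
    LAST_UPDATED_TIMESTAMP TIMESTAMP(6) NOT NULL
        GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP
);
```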
[z/OS] Date and time formats in DB2 for z/OS can be configured at the subsystem level (ISO, USA, EUR, JIS, or a local format). The ISO format (YYYY-MM-DD) is recommended for portability.
[LUW] DB2 for LUW also supports the TIMESTAMP WITH TIME ZONE attribute starting in DB2 11.1, which is valuable for applications spanning multiple time zones — as Meridian Bank's online banking system surely does.
Binary Types
| Data Type | Maximum Length | Notes |
|---|---|---|
| BLOB(n) | 2 GB | Binary Large Object — images, documents, encrypted data |
| BINARY(n) | 254 bytes | Fixed-length binary [LUW 9.7+] |
| VARBINARY(n) | 32,672 bytes | Variable-length binary [LUW 9.7+] |
[z/OS] DB2 for z/OS supports BLOB but not BINARY or VARBINARY as standalone types. Use CHAR(n) FOR BIT DATA or VARCHAR(n) FOR BIT DATA for shorter binary values.
XML Type
DB2 supports a native XML data type on both z/OS and LUW, with full XQuery and SQL/XML support. This is useful for storing semi-structured data — regulatory filings, customer correspondence metadata, integration payloads — alongside relational data. We will explore XML in later chapters.
Why Does This Work? — Why DECIMAL for Money?
Try this thought experiment. The number 0.1 in decimal (one-tenth) cannot be represented exactly in binary floating point. It becomes a repeating binary fraction, much as 1/3 becomes 0.333... in decimal. So 0.1 + 0.1 + 0.1 in floating point does not equal 0.3 — it equals something like 0.30000000000000004.

Now imagine that rounding error multiplied across 500 million annual transactions at Meridian Bank. Pennies appear and disappear. Account balances do not reconcile. Regulatory audits flag discrepancies.
DECIMAL (also called packed decimal or fixed-point) stores numbers exactly as decimal digits — the way humans write them. DECIMAL(15,2) stores exactly two decimal places with no rounding. 0.1 + 0.1 + 0.1 equals exactly 0.3. This is why every financial database uses DECIMAL for monetary values, and why DB2's DECIMAL implementation — inherited from decades of mainframe financial computing — is one of its most trusted features.
2.6 Normalization — Organizing Data to Eliminate Redundancy
Normalization is the process of organizing tables and columns to reduce data redundancy and prevent data anomalies. It is one of the most important skills in database design, and it is where many designs go wrong — either by not normalizing enough (leading to data corruption) or by not understanding when to stop (leading to performance problems from excessive joins).
We will walk through each normal form with concrete Meridian Bank examples, showing the specific problems each form prevents.
The Unnormalized Mess
Imagine we start with a single flat table that stores everything about customer accounts:
CUSTOMER_ACCOUNT_FLAT
---
CUSTOMER_ID | CUSTOMER_NAME | SSN | ADDRESS | PHONE_NUMBERS | BRANCH_NAME | BRANCH_ADDRESS | ACCOUNT_NUM | ACCOUNT_TYPE | BALANCE | OPENED_DATE
This table has every problem in the book:
- PHONE_NUMBERS stores multiple values in one cell ("555-0100, 555-0101").
- Customer data is duplicated for every account they hold.
- Branch data is duplicated for every account at that branch.
- If a customer changes their address, you must update every row for that customer.
- If you delete the last account for a customer, you lose all their information.
These are called data anomalies, and they come in three flavors:
- Update anomaly: Changing a customer's address requires updating multiple rows. If you miss one, the data is inconsistent.
- Insert anomaly: You cannot record a new branch until at least one account exists at that branch (the branch data has no independent existence).
- Delete anomaly: Deleting the last account at a branch destroys all information about that branch.
Normalization systematically eliminates these anomalies.
First Normal Form (1NF)
Rule: Every column must contain only atomic (indivisible) values. No repeating groups, no multi-valued fields.
Our flat table violates 1NF because PHONE_NUMBERS contains multiple values. To fix this:
CUSTOMER_PHONE
---
CUSTOMER_ID | PHONE_NUMBER | PHONE_TYPE
10001 | 555-0100 | HOME
10001 | 555-0101 | MOBILE
10002 | 555-0200 | HOME
Each phone number gets its own row in a separate table. The CUSTOMER table no longer stores phone numbers at all — instead, CUSTOMER_PHONE references CUSTOMER_ID.
What 1NF prevents: The inability to query or index individual values within a multi-valued field. Try writing a query to find all customers with a specific phone number when phone numbers are comma-separated in a single column. It requires string parsing, cannot use an index, and is fragile. With 1NF, it is a simple WHERE PHONE_NUMBER = '555-0100'.
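With the CUSTOMER_PHONE table shown above, that lookup becomes an ordinary indexable join (a sketch, assuming the Meridian CUSTOMER table from this chapter):

```sql
-- Find every customer who has the phone number 555-0100.
SELECT c.CUSTOMER_ID, c.FIRST_NAME, c.LAST_NAME
FROM   MERIDIAN.CUSTOMER       c
JOIN   MERIDIAN.CUSTOMER_PHONE p ON p.CUSTOMER_ID = c.CUSTOMER_ID
WHERE  p.PHONE_NUMBER = '555-0100';
```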
Second Normal Form (2NF)
Rule: The table must be in 1NF, and every non-key column must depend on the entire primary key, not just part of it.
2NF matters when you have composite primary keys. Consider a table tracking which customers have which accounts at which branches:
CUSTOMER_ACCOUNT (Primary Key: CUSTOMER_ID, ACCOUNT_NUM)
---
CUSTOMER_ID | ACCOUNT_NUM | CUSTOMER_NAME | CUSTOMER_ADDRESS | ACCOUNT_TYPE | BALANCE | BRANCH_NAME
Here, CUSTOMER_NAME and CUSTOMER_ADDRESS depend only on CUSTOMER_ID, not on the full key (CUSTOMER_ID, ACCOUNT_NUM). This is a partial dependency, and it violates 2NF.
To fix it, decompose:
CUSTOMER (Primary Key: CUSTOMER_ID)
---
CUSTOMER_ID | CUSTOMER_NAME | CUSTOMER_ADDRESS
ACCOUNT (Primary Key: ACCOUNT_NUM)
---
ACCOUNT_NUM | CUSTOMER_ID | ACCOUNT_TYPE | BALANCE | BRANCH_NAME
What 2NF prevents: If Margaret Chen has three accounts and changes her address, the old design forces you to update three rows. In the normalized design, you update one row in the CUSTOMER table. The update anomaly is eliminated.
Third Normal Form (3NF)
Rule: The table must be in 2NF, and every non-key column must depend on the primary key directly — not through another non-key column. No transitive dependencies.
Now suppose the ACCOUNT table from above also carries the branch details:
ACCOUNT (Primary Key: ACCOUNT_NUM)
---
ACCOUNT_NUM | CUSTOMER_ID | ACCOUNT_TYPE | BALANCE | BRANCH_NAME | BRANCH_ADDRESS | BRANCH_MANAGER
BRANCH_ADDRESS and BRANCH_MANAGER depend on BRANCH_NAME, which is not the primary key. The dependency chain is: ACCOUNT_NUM → BRANCH_NAME → BRANCH_ADDRESS. This is a transitive dependency.
To fix it:
ACCOUNT (Primary Key: ACCOUNT_NUM)
---
ACCOUNT_NUM | CUSTOMER_ID | ACCOUNT_TYPE | BALANCE | BRANCH_ID
BRANCH (Primary Key: BRANCH_ID)
---
BRANCH_ID | BRANCH_NAME | BRANCH_ADDRESS | BRANCH_MANAGER
What 3NF prevents: If the Downtown branch moves to a new address, you update one row in the BRANCH table instead of thousands of rows in the ACCOUNT table. If the last account at a branch is closed, the branch information survives independently.
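To see what that buys you in practice, here is the address change as SQL (the address value is illustrative):

```
-- Normalized: the Downtown branch moves. One row changes:
UPDATE BRANCH
SET    BRANCH_ADDRESS = '500 Main St'
WHERE  BRANCH_NAME = 'Downtown';

-- Unnormalized: the same change would touch every account row at
-- that branch -- thousands of updates, and missing one leaves the
-- data inconsistent:
-- UPDATE ACCOUNT
-- SET    BRANCH_ADDRESS = '500 Main St'
-- WHERE  BRANCH_NAME = 'Downtown';
```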
Productive Struggle — Normalize This Table
Before reading further, try normalizing this table yourself. Identify the anomalies, the dependencies, and decompose it into 3NF.
EMPLOYEE_PROJECT
---
EMP_ID | EMP_NAME | DEPT_NAME | DEPT_LOCATION | PROJECT_ID | PROJECT_NAME | HOURS_WORKED
Primary key: (EMP_ID, PROJECT_ID)
Questions to guide you:
- Which columns depend on only EMP_ID? (Partial dependency — violates 2NF)
- Which columns depend on only PROJECT_ID? (Partial dependency — violates 2NF)
- Among the columns that depend on EMP_ID, are there any transitive dependencies? (Violates 3NF)
- What tables do you end up with?
Take five minutes with pencil and paper before moving on. The struggle of working through this yourself builds understanding that reading the answer never can.
Boyce-Codd Normal Form (BCNF)
Rule: For every functional dependency X → Y in the table, X must be a superkey (a candidate key or a superset of one).
BCNF is a stricter version of 3NF. They differ only in rare cases where a table has overlapping composite candidate keys. Here is a simplified example:
Suppose each Meridian Bank customer is assigned one personal banker at each branch where they hold accounts, each personal banker works at exactly one branch, and banker names are unique across the bank.
CUSTOMER_BANKER
---
CUSTOMER_ID | BRANCH_ID | BANKER_NAME
Candidate keys: (CUSTOMER_ID, BRANCH_ID) — each customer has one banker per branch. Also (CUSTOMER_ID, BANKER_NAME) — because a banker works at only one branch, the customer plus the banker also identifies the row. Notice that the two composite candidate keys overlap on CUSTOMER_ID.
There is a functional dependency: BANKER_NAME → BRANCH_ID (because each banker works at exactly one branch). But BANKER_NAME alone is not a superkey — it does not uniquely identify a row (one banker serves many customers). The table is in 3NF, because BRANCH_ID is part of a candidate key, yet it violates BCNF.
To fix it:
CUSTOMER_BANKER (Primary Key: CUSTOMER_ID, BANKER_NAME)
---
CUSTOMER_ID | BANKER_NAME
BANKER_BRANCH (Primary Key: BANKER_NAME)
---
BANKER_NAME | BRANCH_ID
In practice, BCNF violations are uncommon, and most professionals aim for 3NF as the standard target. If you achieve 3NF and your design feels right, you are almost certainly in BCNF too.
Check Your Understanding — Section 2.6
- You have a table where a customer's city name is stored in every order row. What type of anomaly occurs when the customer moves to a new city?
- A table has a composite primary key (A, B) and a column C that depends only on A. Which normal form does this violate?
- What is the practical difference between 3NF and BCNF? When would a working DBA care about the distinction?
2.7 Denormalization — When and Why to Break the Rules
If normalization is so beneficial, why would anyone deliberately denormalize? Because normalization optimizes for data integrity, not for query performance. And in a real system processing 500 million transactions per year, performance matters.
The Cost of Normalization
A perfectly normalized database might require five-way JOINs for common queries. Each JOIN is an operation that DB2's optimizer must plan and execute. On z/OS, where CPU time is metered and billed, excessive JOINs directly increase costs. On LUW, they increase response time and I/O.
Consider this query to get a customer statement at Meridian Bank in a fully normalized design:
SELECT c.FIRST_NAME, c.LAST_NAME,
a.ACCOUNT_NUMBER, at.TYPE_DESCRIPTION,
b.BRANCH_NAME, b.BRANCH_CITY,
t.TRANSACTION_DATE, t.AMOUNT,
tc.CATEGORY_DESCRIPTION
FROM CUSTOMER c
JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
JOIN ACCOUNT_TYPE at ON a.ACCOUNT_TYPE_CODE = at.TYPE_CODE
JOIN BRANCH b ON a.BRANCH_ID = b.BRANCH_ID
JOIN TRANSACTION t ON a.ACCOUNT_ID = t.ACCOUNT_ID
JOIN TRANSACTION_CATEGORY tc ON t.CATEGORY_CODE = tc.CATEGORY_CODE
WHERE c.CUSTOMER_ID = 10001
AND t.TRANSACTION_DATE BETWEEN '2025-01-01' AND '2025-12-31'
ORDER BY t.TRANSACTION_DATE DESC;
Six tables joined. For an online query that must return in under a second, this is manageable with proper indexing. But for a batch report that runs across all 2 million customers? The accumulated cost of joining every row is significant.
Strategic Denormalization Techniques
1. Redundant columns. Store the branch name directly in the ACCOUNT table alongside the BRANCH_ID. This eliminates one JOIN for the most common query pattern. The trade-off: when a branch is renamed, you must update both the BRANCH table and every ACCOUNT row at that branch.
2. Summary tables. Create pre-computed aggregate tables. Instead of summing transactions every time someone asks for an account balance, maintain a DAILY_ACCOUNT_BALANCE table that is updated nightly. DB2 for LUW supports Materialized Query Tables (MQTs) — summary tables that DB2 can maintain automatically and use transparently when the optimizer recognizes that a query can be satisfied from the summary.
-- DB2 LUW Materialized Query Table
CREATE TABLE MERIDIAN.DAILY_BALANCE_SUMMARY AS (
SELECT ACCOUNT_ID,
DATE(TRANSACTION_TIMESTAMP) AS BALANCE_DATE,
SUM(AMOUNT) AS DAILY_NET
FROM MERIDIAN.TRANSACTION
GROUP BY ACCOUNT_ID, DATE(TRANSACTION_TIMESTAMP)
) DATA INITIALLY DEFERRED REFRESH DEFERRED;
[z/OS] DB2 for z/OS supports MQTs as well, and the optimizer is particularly sophisticated about routing queries to them automatically when CURRENT REFRESH AGE is set appropriately.
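A REFRESH DEFERRED MQT starts out empty and must be populated explicitly. A minimal sketch of the nightly maintenance and the session setting that lets the optimizer route queries to deferred MQTs:

```
-- Populate (or re-populate) the deferred-refresh MQT, typically
-- in the nightly batch window:
REFRESH TABLE MERIDIAN.DAILY_BALANCE_SUMMARY;

-- Allow the optimizer to consider deferred MQTs in this session.
-- ANY means "accept summary data of any age":
SET CURRENT REFRESH AGE ANY;
```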
3. Derived columns. Store a calculated value like ACCOUNT_BALANCE directly, updated by triggers or application logic, rather than computing it from the TRANSACTION table every time. This is extremely common in banking systems — the alternative (summing every transaction since account opening) becomes slower every year.
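One way to keep such a derived column in sync is a trigger. A sketch, assuming the ACCOUNT and TRANSACTION tables defined in this chapter; the trigger name is illustrative, and a real deployment would also need DELETE/UPDATE handling:

```
-- Apply each new transaction's amount to the stored balance.
CREATE TRIGGER MERIDIAN.TRG_APPLY_TXN
  AFTER INSERT ON MERIDIAN.TRANSACTION
  REFERENCING NEW AS N
  FOR EACH ROW MODE DB2SQL
  UPDATE MERIDIAN.ACCOUNT
  SET    CURRENT_BALANCE    = CURRENT_BALANCE + N.AMOUNT,
         LAST_ACTIVITY_DATE = CURRENT DATE
  WHERE  ACCOUNT_ID = N.ACCOUNT_ID;
```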
The Denormalization Decision Framework
Before denormalizing, ask:
- Is there a proven performance problem? Never denormalize prophylactically. Measure first.
- Can the problem be solved with indexes, MQTs, or query rewriting? These are less invasive.
- What is the update-to-read ratio? If the denormalized data is read 10,000 times for every update, the redundancy pays for itself. If it is updated frequently, the maintenance cost may outweigh the read benefit.
- Can you ensure consistency? If you denormalize, you need a mechanism (triggers, application logic, batch processes) to keep the redundant data in sync. What happens if that mechanism fails?
- Document the decision. Future developers (and future you) need to know why this table is denormalized and how consistency is maintained.
At Meridian Bank, our initial design will be fully normalized to 3NF. We will denormalize strategically in later chapters when we have performance data to guide decisions.
2.8 Relational Algebra Intuition
SQL is a declarative language — you state what you want. Under the hood, DB2 translates your SQL into operations from relational algebra, the mathematical foundation of the relational model. You do not need to be a mathematician to use DB2, but understanding these operations builds intuition about what your queries are doing and why some are fast and others are slow.
The Three Core Operations
1. SELECT (Restriction / σ)
Not the SQL SELECT statement — in relational algebra, SELECT means choosing rows that satisfy a condition. In SQL, this maps to the WHERE clause.
Relational algebra: σ(BALANCE > 50000)(ACCOUNT)
SQL: SELECT * FROM ACCOUNT WHERE BALANCE > 50000
Think of it as a horizontal slice through the table — you keep some rows, discard others.
2. PROJECT (Projection / π)
PROJECT means choosing columns — keeping only the attributes you need and discarding the rest. In SQL, this maps to the column list in the SELECT clause.
Relational algebra: π(CUSTOMER_NAME, PHONE)(CUSTOMER)
SQL: SELECT CUSTOMER_NAME, PHONE FROM CUSTOMER
Think of it as a vertical slice — you keep some columns, discard others.
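The two operations compose naturally — restrict to the rows you want, then project the columns you need:

```
-- π(CUSTOMER_NAME, PHONE)(σ(STATE_CODE = 'NY')(CUSTOMER)) in SQL
-- (column names follow this section's simplified CUSTOMER relation):
SELECT CUSTOMER_NAME, PHONE
FROM   CUSTOMER
WHERE  STATE_CODE = 'NY';
```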
3. JOIN (⋈)
JOIN combines rows from two tables based on a condition, typically matching a foreign key in one table to a primary key in another.
SQL:
SELECT c.CUSTOMER_NAME, a.ACCOUNT_NUMBER, a.BALANCE
FROM CUSTOMER c
JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID;
Think of JOIN as a bridge between tables. The condition (c.CUSTOMER_ID = a.CUSTOMER_ID) is the bridge's foundation. Without it, you get a Cartesian product — every customer matched with every account — which is almost never what you want. (A Cartesian product of 2 million customers and 5 million accounts is 10 trillion rows. Your query will not finish.)
The Venn Diagram Mental Model
JOINs are often visualized with Venn diagrams, representing two overlapping circles (one for each table):
- INNER JOIN: Only the overlap — rows that have matching keys in both tables. Customers who have accounts; accounts that have customers.
- LEFT OUTER JOIN: All rows from the left table, plus matching rows from the right. All customers, whether or not they have accounts. Non-matching rows get NULLs for the right table's columns.
- RIGHT OUTER JOIN: All rows from the right table, plus matching rows from the left.
- FULL OUTER JOIN: All rows from both tables. Non-matching rows on either side get NULLs.
For Meridian Bank, INNER JOIN is the workhorse — almost every query joins customers to accounts, accounts to transactions, accounts to branches, and all of these relationships should have matching rows. LEFT OUTER JOIN is useful for finding orphans or including optional data (customers who may or may not have a mailing address on file).
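The orphan-finding pattern looks like this — a sketch using this section's simplified column names:

```
-- Customers with no accounts at all: the left join preserves every
-- customer, and rows with no match carry NULLs from ACCOUNT.
SELECT c.CUSTOMER_ID, c.CUSTOMER_NAME
FROM   CUSTOMER c
LEFT OUTER JOIN ACCOUNT a
       ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE  a.ACCOUNT_ID IS NULL;
```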
Set Operations
Relational algebra also includes set operations:
- UNION: Combine rows from two compatible tables, eliminating duplicates. (Use UNION ALL in DB2 to keep duplicates — it is faster because no duplicate-eliminating sort is required.)
- INTERSECT: Rows that appear in both result sets.
- EXCEPT (called MINUS in some databases): Rows in the first result set that are not in the second.
These are useful at Meridian for queries like "customers who have a savings account but not a checking account" (EXCEPT) or "all transactions from all account types in a single list" (UNION ALL).
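The first of those queries can be sketched directly with EXCEPT, using the ACCOUNT_TYPE codes defined later in this chapter:

```
-- Customers who hold a savings account but no checking account:
SELECT CUSTOMER_ID FROM ACCOUNT WHERE ACCOUNT_TYPE = 'SAV'
EXCEPT
SELECT CUSTOMER_ID FROM ACCOUNT WHERE ACCOUNT_TYPE = 'CHK';
```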
Check Your Understanding — Section 2.8
- In relational algebra terms, what SQL clause performs the SELECT (restriction) operation?
- You join a 2-million-row CUSTOMER table with a 5-million-row ACCOUNT table. If every customer has at least one account and every account belongs to exactly one customer, approximately how many rows does the INNER JOIN produce?
- When would you use a LEFT OUTER JOIN instead of an INNER JOIN at Meridian Bank?
2.9 The Meridian Bank Core Tables
Let us put everything together and design the first three core tables for Meridian National Bank. These will serve as the foundation for the entire book's project.
Design Principles
Before writing DDL, we establish conventions:
- Schema: All Meridian tables live in the MERIDIAN schema.
- Naming: Table names are singular nouns in UPPER_CASE. Column names are UPPER_CASE with underscores. Primary keys follow the pattern TABLE_ID (e.g., CUSTOMER_ID).
- Surrogate keys: All entity tables use INTEGER identity columns as primary keys.
- Audit columns: Every table includes CREATED_TIMESTAMP and LAST_UPDATED_TIMESTAMP.
- Natural key enforcement: Business identifiers (SSN, account numbers, branch codes) get UNIQUE constraints.
- Referential integrity: All relationships are enforced with foreign keys, ON DELETE RESTRICT.
The BRANCH Table
A branch is a physical location of Meridian National Bank. This is a reference table — it changes infrequently and is referenced by many other tables.
CREATE TABLE MERIDIAN.BRANCH (
BRANCH_ID INTEGER NOT NULL
GENERATED ALWAYS AS IDENTITY
(START WITH 1, INCREMENT BY 1),
BRANCH_CODE CHAR(6) NOT NULL,
BRANCH_NAME VARCHAR(100) NOT NULL,
BRANCH_ADDRESS VARCHAR(200) NOT NULL,
BRANCH_CITY VARCHAR(50) NOT NULL,
BRANCH_STATE CHAR(2) NOT NULL,
BRANCH_ZIP CHAR(10) NOT NULL,
BRANCH_PHONE VARCHAR(20),
BRANCH_MANAGER_NAME VARCHAR(100),
OPENED_DATE DATE NOT NULL,
CLOSED_DATE DATE,
IS_ACTIVE SMALLINT NOT NULL DEFAULT 1,
CREATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
LAST_UPDATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT PK_BRANCH PRIMARY KEY (BRANCH_ID),
CONSTRAINT UQ_BRANCH_CODE UNIQUE (BRANCH_CODE),
CONSTRAINT CK_BRANCH_STATE CHECK (LENGTH(BRANCH_STATE) = 2),
CONSTRAINT CK_BRANCH_ACTIVE CHECK (IS_ACTIVE IN (0, 1))
);
Notice the design decisions:
- BRANCH_CODE is the human-readable identifier ("DWN001"), enforced unique. BRANCH_ID is the surrogate key used in foreign key references.
- CLOSED_DATE is nullable — most branches are not closed.
- IS_ACTIVE is a SMALLINT flag rather than a BOOLEAN because DB2 for z/OS does not natively support a BOOLEAN data type. (DB2 LUW added BOOLEAN in version 11.1, but using SMALLINT ensures cross-platform compatibility.)
The CUSTOMER Table
A customer is a person or entity that holds one or more accounts. This is a high-volume table — 2 million rows at Meridian, with thousands of updates daily (address changes, phone updates, new customers).
CREATE TABLE MERIDIAN.CUSTOMER (
CUSTOMER_ID INTEGER NOT NULL
GENERATED ALWAYS AS IDENTITY
(START WITH 100001, INCREMENT BY 1),
SSN CHAR(11) NOT NULL,
FIRST_NAME VARCHAR(50) NOT NULL,
MIDDLE_NAME VARCHAR(50),
LAST_NAME VARCHAR(50) NOT NULL,
DATE_OF_BIRTH DATE NOT NULL,
EMAIL VARCHAR(254),
PHONE_PRIMARY VARCHAR(20),
ADDRESS_LINE1 VARCHAR(200) NOT NULL,
ADDRESS_LINE2 VARCHAR(200),
CITY VARCHAR(50) NOT NULL,
STATE_CODE CHAR(2) NOT NULL,
ZIP_CODE CHAR(10) NOT NULL,
CUSTOMER_SINCE DATE NOT NULL DEFAULT CURRENT DATE,
CUSTOMER_STATUS CHAR(1) NOT NULL DEFAULT 'A',
CREDIT_SCORE SMALLINT,
HOME_BRANCH_ID INTEGER,
CREATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
LAST_UPDATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT PK_CUSTOMER PRIMARY KEY (CUSTOMER_ID),
CONSTRAINT UQ_CUSTOMER_SSN UNIQUE (SSN),
CONSTRAINT FK_CUSTOMER_BRANCH
FOREIGN KEY (HOME_BRANCH_ID)
REFERENCES MERIDIAN.BRANCH (BRANCH_ID)
ON DELETE SET NULL,
CONSTRAINT CK_CUSTOMER_STATUS CHECK (CUSTOMER_STATUS IN ('A','I','C','S')),
CONSTRAINT CK_CUSTOMER_SSN_FMT CHECK (SSN LIKE '___-__-____')
);
Key decisions:
- HOME_BRANCH_ID is the customer's primary branch. It is nullable (some customers may not have a home branch — online-only customers, perhaps) and uses ON DELETE SET NULL (if a branch closes, we do not delete the customer).
- CUSTOMER_STATUS: 'A' = Active, 'I' = Inactive, 'C' = Closed, 'S' = Suspended. A CHECK constraint limits valid values.
- SSN format is enforced with a CHECK constraint using the LIKE pattern. This is 1NF in action — the SSN is stored in a single, consistent format.
- CREDIT_SCORE is nullable — not all customers have one, and we may not have it at account opening.
The ACCOUNT Table
An account is a financial product held by a customer at a branch. This is the central transaction table — every deposit, withdrawal, and transfer references an account.
CREATE TABLE MERIDIAN.ACCOUNT (
ACCOUNT_ID INTEGER NOT NULL
GENERATED ALWAYS AS IDENTITY
(START WITH 1000001, INCREMENT BY 1),
ACCOUNT_NUMBER CHAR(12) NOT NULL,
CUSTOMER_ID INTEGER NOT NULL,
BRANCH_ID INTEGER NOT NULL,
ACCOUNT_TYPE CHAR(3) NOT NULL,
ACCOUNT_STATUS CHAR(1) NOT NULL DEFAULT 'A',
CURRENT_BALANCE DECIMAL(15,2) NOT NULL DEFAULT 0.00,
AVAILABLE_BALANCE DECIMAL(15,2) NOT NULL DEFAULT 0.00,
INTEREST_RATE DECIMAL(7,4),
OPENED_DATE DATE NOT NULL DEFAULT CURRENT DATE,
CLOSED_DATE DATE,
LAST_ACTIVITY_DATE DATE,
OVERDRAFT_LIMIT DECIMAL(15,2) NOT NULL DEFAULT 0.00,
CREATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
LAST_UPDATED_TIMESTAMP TIMESTAMP(6) NOT NULL DEFAULT CURRENT_TIMESTAMP,
CONSTRAINT PK_ACCOUNT PRIMARY KEY (ACCOUNT_ID),
CONSTRAINT UQ_ACCOUNT_NUMBER UNIQUE (ACCOUNT_NUMBER),
CONSTRAINT FK_ACCOUNT_CUSTOMER
FOREIGN KEY (CUSTOMER_ID)
REFERENCES MERIDIAN.CUSTOMER (CUSTOMER_ID)
ON DELETE RESTRICT,
CONSTRAINT FK_ACCOUNT_BRANCH
FOREIGN KEY (BRANCH_ID)
REFERENCES MERIDIAN.BRANCH (BRANCH_ID)
ON DELETE RESTRICT,
CONSTRAINT CK_ACCOUNT_TYPE CHECK (ACCOUNT_TYPE IN ('CHK','SAV','MMA','CDA','LON')),
CONSTRAINT CK_ACCOUNT_STATUS CHECK (ACCOUNT_STATUS IN ('A','I','C','F')),
CONSTRAINT CK_ACCOUNT_BALANCE CHECK (CURRENT_BALANCE >= -OVERDRAFT_LIMIT)
);
Key decisions:
- ACCOUNT_NUMBER is the customer-facing identifier (printed on checks, shown on statements). ACCOUNT_ID is the internal surrogate key. Never expose surrogate keys to customers.
- CURRENT_BALANCE is a denormalized column — it could be derived by summing all transactions. We store it directly for performance, accepting the responsibility to keep it in sync.
- INTEREST_RATE uses DECIMAL(7,4) — rates like 4.7500% (four digits after the decimal point, up to 999.9999%).
- ACCOUNT_TYPE codes: CHK = Checking, SAV = Savings, MMA = Money Market, CDA = Certificate of Deposit, LON = Loan.
- The CHECK constraint on CURRENT_BALANCE ensures the balance never drops below the negative of the overdraft limit — a business rule enforced at the database level.
- ON DELETE RESTRICT on both foreign keys: you cannot delete a customer or branch while accounts exist. This is the correct choice for a bank — accounts must be explicitly closed, not cascade-deleted.
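You can see these rules bite with a pair of statements that DB2 rejects. A sketch — the values are illustrative, and the exact SQLSTATEs reported can vary by platform and statement:

```
-- Fails: CUSTOMER_ID 999999 does not exist (foreign key violation,
-- typically SQLSTATE 23503), and 'XYZ' would also violate
-- CK_ACCOUNT_TYPE (typically SQLSTATE 23513).
INSERT INTO MERIDIAN.ACCOUNT
       (ACCOUNT_NUMBER, CUSTOMER_ID, BRANCH_ID, ACCOUNT_TYPE)
VALUES ('000012345678', 999999, 1, 'XYZ');

-- Fails while the customer still has accounts, because of
-- ON DELETE RESTRICT on FK_ACCOUNT_CUSTOMER:
DELETE FROM MERIDIAN.CUSTOMER WHERE CUSTOMER_ID = 100001;
```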
The Relationship Diagram
┌──────────────┐
│ BRANCH │
├──────────────┤
│ BRANCH_ID PK │
│ BRANCH_CODE │
│ BRANCH_NAME │
│ ... │
└──────┬───────┘
│
┌────────────┼────────────┐
│ 1 │ 1 │
│ │ │
│ * │ * │
┌─────────┴──┐ ┌────┴─────────┐
│ CUSTOMER │ │ ACCOUNT │
├────────────┤ ├──────────────┤
│CUSTOMER_ID │──<│ CUSTOMER_ID │
│ SSN │ 1 │ ACCOUNT_ID PK│
│ FIRST_NAME │ │ BRANCH_ID FK │
│ ... │ │ ... │
└────────────┘ └──────────────┘
1 *
A BRANCH has many CUSTOMERs (as their home branch) and many ACCOUNTs. A CUSTOMER has many ACCOUNTs. An ACCOUNT belongs to one CUSTOMER and one BRANCH.
This is a clean, normalized design in 3NF. Every non-key column depends directly on the primary key. There are no transitive dependencies. The only denormalized element is CURRENT_BALANCE in the ACCOUNT table, which we have documented and justified.
2.10 Constraints and Integrity — DB2 as Your Safety Net
Constraints are the relational model's enforcement mechanism. They are not suggestions — they are rules that DB2 will enforce on every INSERT, UPDATE, and DELETE, regardless of which application, utility, or user initiates the change. This is one of the most important advantages of enforcing business rules in the database rather than in application code: the rules apply universally.
NOT NULL
The simplest constraint, and one of the most important. A NOT NULL column must have a value in every row. No exceptions.
FIRST_NAME VARCHAR(50) NOT NULL
When should a column be NOT NULL? When a NULL value would be meaningless or dangerous. A customer without a name? That should not exist. An account without a customer? That is an orphan. An account without a balance? That is a data error.
When should a column be nullable? When the absence of a value is a legitimate state. A customer without a middle name. An account without a closed date (it is still open). A customer without a credit score (not yet assessed).
Be aggressive with NOT NULL. Allowing NULLs where they should not exist is one of the most common design mistakes, and it leads to a lifetime of IS NOT NULL checks scattered across application code and reporting queries.
UNIQUE
Ensures that no two rows have the same value (or combination of values) in the specified columns. DB2 automatically creates a unique index to enforce this.
CONSTRAINT UQ_CUSTOMER_SSN UNIQUE (SSN)
What about NULLs? In DB2, the columns of a UNIQUE constraint must be declared NOT NULL — unlike some other database systems, DB2 does not allow a nullable column in a UNIQUE constraint. To enforce uniqueness on a nullable column, you create a unique index directly, and NULL handling then depends on the platform.
[z/OS] On DB2 for z/OS, a plain unique index on a nullable column permits at most one NULL. Defining the index with UNIQUE WHERE NOT NULL enforces uniqueness only among non-null values, so any number of NULLs is allowed.
[LUW] On DB2 for LUW, a unique index likewise treats NULLs as duplicates of each other and allows at most one. If you create the index with EXCLUDE NULL KEYS, NULL values are not stored in the index at all, and multiple NULLs are permitted.
CHECK
Enforces a condition that every row must satisfy. CHECK constraints can reference any column in the same row (but not other rows or tables).
CONSTRAINT CK_ACCOUNT_TYPE
CHECK (ACCOUNT_TYPE IN ('CHK','SAV','MMA','CDA','LON'))
CONSTRAINT CK_CUSTOMER_SSN_FMT
CHECK (SSN LIKE '___-__-____')
CONSTRAINT CK_ACCOUNT_BALANCE
CHECK (CURRENT_BALANCE >= -OVERDRAFT_LIMIT)
The third example is particularly powerful — it references two columns in the same row, enforcing the business rule that an account can never be overdrawn beyond its overdraft limit. Without this constraint, the rule would need to be enforced in every application that modifies the balance. With it, DB2 enforces it everywhere, always.
[z/OS] DB2 for z/OS CHECK constraints can include a limited set of functions (basic arithmetic, SUBSTR, LENGTH, etc.). Complex conditions may require a trigger instead.
[LUW] DB2 for LUW supports a broader set of built-in functions within CHECK constraints.
PRIMARY KEY and FOREIGN KEY
We covered these in Section 2.4. To summarize the DB2 implementation:
- PRIMARY KEY creates a unique index (if needed), enforces NOT NULL, and serves as the default target for foreign key references.
- FOREIGN KEY enforces referential integrity, with configurable actions for DELETE (RESTRICT, CASCADE, SET NULL, NO ACTION) and UPDATE (RESTRICT, NO ACTION).
[z/OS] On DB2 for z/OS, referential integrity can also be enforced through CHECK-pending status — if a utility such as LOAD adds data that may violate a constraint, the object is placed in CHECK-pending until the DBA runs the CHECK DATA utility to verify or correct the data. This is a production reality you will encounter in z/OS shops.
[LUW] On DB2 for LUW, you can create informational constraints — foreign keys that are declared but not enforced (NOT ENFORCED). These are used when the application guarantees integrity and you want the optimizer to have the constraint information for query optimization without the overhead of enforcement. Use with extreme caution.
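Declared on the Meridian schema, an informational constraint would look like this sketch (the constraint name is illustrative):

```
-- LUW informational constraint: visible to the optimizer,
-- but DB2 does not check it on INSERT/UPDATE/DELETE.
ALTER TABLE MERIDIAN.ACCOUNT
  ADD CONSTRAINT FK_ACCT_CUST_INFO
      FOREIGN KEY (CUSTOMER_ID)
      REFERENCES MERIDIAN.CUSTOMER (CUSTOMER_ID)
      NOT ENFORCED
      ENABLE QUERY OPTIMIZATION;
```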
Integrity in Practice
At Meridian Bank, our constraint strategy is:
- Every table has a primary key. No exceptions.
- Every relationship has a foreign key. No exceptions.
- Every column that must have a value is NOT NULL. No exceptions.
- Every column with a fixed set of valid values has a CHECK constraint.
- Every natural identifier (SSN, account number, branch code) has a UNIQUE constraint.
This may seem like overhead — and it is, in the sense that every INSERT and UPDATE incurs constraint checking. But the cost is tiny compared to the cost of bad data. One orphan account row that goes undetected for a month can trigger a regulatory inquiry that costs more than a year of constraint checking overhead.
Check Your Understanding — Section 2.10
- Why is NOT NULL more important than it appears? What is the long-term cost of allowing NULLs in a column that should not have them?
- In DB2, can a column with a UNIQUE constraint contain NULLs at all? How would you enforce uniqueness on a nullable column, and how many NULLs are then allowed?
- A junior developer suggests removing foreign key constraints "because they slow down inserts." How would you respond?
Spaced Review — Chapter 1
Before we close this chapter, let us revisit key concepts from Chapter 1 to strengthen retention.
- DB2 families: Name the two main DB2 product families and the primary operating environments for each. (Chapter 1, Section 1.2)
- Meridian National Bank: What are the three key metrics that characterize Meridian's scale? (2 million customers, 500 million annual transactions, mid-size regional bank.)
- Why DB2 for banking: What properties of DB2 make it particularly suited to financial institutions? (ACID compliance, z/OS integration, proven reliability, regulatory acceptance.)
- Historical context: In what decade was the relational model first proposed, and who proposed it? (1970, Edgar F. Codd — and now you know the full story from this chapter.)
Bringing It Together
We have covered enormous ground in this chapter. Let us trace the arc from beginning to end.
We started with the mess — a spreadsheet with duplicated data, inconsistent formats, and no enforced relationships. We learned why the hierarchical and network models of the 1960s shared similar problems, binding physical storage to logical access in ways that made change expensive and errors inevitable.
Codd's relational model severed that bond. By representing data as mathematical relations, accessed through a declarative language (SQL), and managed by an intelligent optimizer, the relational model gave us physical data independence, logical data independence, and a principled approach to data integrity.
We learned the building blocks — tables (relations), rows (tuples), and columns (attributes) — and the keys that connect them: candidate keys, primary keys, foreign keys, and composite keys. We saw why surrogate keys work well in practice, and why foreign keys enforcing referential integrity are not optional in a serious database.
We walked through normalization — 1NF through BCNF — with concrete Meridian Bank examples, seeing how each normal form eliminates specific data anomalies. And we acknowledged that denormalization has its place, but only when driven by measured performance needs, documented carefully, and maintained rigorously.
We built the first three tables of the Meridian Bank database: BRANCH, CUSTOMER, and ACCOUNT. These tables are in 3NF, with one documented denormalization (CURRENT_BALANCE). They have primary keys, foreign keys, CHECK constraints, and NOT NULL constraints. They follow consistent naming conventions and support both DB2 for z/OS and DB2 for LUW.
This is how DB2 thinks about data. Not as files, not as spreadsheets, not as trees of records, but as relations — sets of tuples with defined attributes, connected by keys, protected by constraints, and accessed through the expressive power of SQL.
In Chapter 3, we will start writing SQL against these tables — SELECT, INSERT, UPDATE, DELETE — and learn the language that brings the relational model to life.
Key Terms Introduced in This Chapter
| Term | Definition |
|---|---|
| Relation | A table — a set of tuples with the same attributes |
| Tuple | A row — one record in a table |
| Attribute | A column — one named field in a table |
| Domain | The set of valid values for an attribute (implemented by data type + constraints) |
| Candidate key | A minimal set of columns that uniquely identifies every row |
| Primary key | The designated candidate key for a table |
| Foreign key | A column referencing a primary key in another table |
| Surrogate key | An artificial system-generated key (e.g., identity column) |
| Natural key | A key derived from real-world data (e.g., SSN) |
| Composite key | A key consisting of two or more columns |
| Referential integrity | The guarantee that every foreign key matches a valid primary key |
| Normalization | The process of organizing data to reduce redundancy and prevent anomalies |
| 1NF | No multi-valued columns; all values are atomic |
| 2NF | No partial dependencies on a composite primary key |
| 3NF | No transitive dependencies (non-key → non-key) |
| BCNF | Every determinant is a superkey |
| Denormalization | Deliberately introducing redundancy for performance |
| Data anomaly | An inconsistency caused by redundant data (update, insert, or delete anomaly) |
| Cardinality | The number of rows in a table |
| Degree | The number of columns in a table |
| MQT | Materialized Query Table — a pre-computed summary table maintained by DB2 |