Chapter 6: Joining Tables — INNER, OUTER, CROSS, and LATERAL

34 min read

> "Data lives in separate tables for good reason. Joins bring it together."

In This Chapter

6.1 Why Joins Exist — The Relational Design Trade-Off
6.2 INNER JOIN — The Workhorse
6.3 Table Aliases — Writing Readable Join Queries
6.4 LEFT OUTER JOIN — When the Right Side Might Be Missing
6.5 RIGHT OUTER JOIN and FULL OUTER JOIN
6.6 CROSS JOIN — The Cartesian Product
6.7 LATERAL Joins and TABLE Functions
6.8 Joining Three or More Tables
6.9 Join Conditions vs. WHERE Conditions
6.10 Common Join Mistakes
6.11 Meridian Bank Join Queries
6.12 A Preview: How the Optimizer Processes Joins
Spaced Review — Chapters 1 Through 5
Chapter Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 6: Joining Tables — INNER, OUTER, CROSS, and LATERAL

"Data lives in separate tables for good reason. Joins bring it together."

In Chapter 5 you learned to query a single table with precision — filtering rows, sorting results, transforming columns with scalar functions, and handling NULLs. Every query targeted one table at a time. That constraint ends now.

Relational databases spread data across many tables by design. A customer's name lives in CUSTOMER. Their account balances live in ACCOUNT. The branch where they opened that account lives in BRANCH. The transactions they made live in TRANSACTION. The employee who helped them lives in EMPLOYEE. No single table tells the whole story. Joins are how you reassemble the pieces.

This chapter is the pivot point of SQL fluency. Everything that follows — subqueries, aggregation over multiple entities, reporting, analytics — depends on your ability to join tables correctly and efficiently. We will cover every join type DB2 supports, show exactly when each one is appropriate, and work through realistic scenarios using our Meridian National Bank schema.

By the end of this chapter you will be able to:

Explain why normalization requires joins and articulate the trade-offs
Write INNER JOIN queries with single-column, multi-column, and non-equi join conditions
Choose correctly among LEFT, RIGHT, and FULL OUTER JOIN based on business requirements
Apply CROSS JOIN deliberately for Cartesian products (and recognize when you have created one by accident)
Use LATERAL joins for correlated subquery-like behavior
Join three, four, or more tables in a single query with confidence
Distinguish join conditions from WHERE conditions, especially in outer joins
Recognize the most common join mistakes before they reach production

6.1 Why Joins Exist — The Relational Design Trade-Off

Before we write a single join, let us understand why data is split across tables in the first place.

The Redundancy Problem

Consider what happens when you try to store everything in a single flat table. In the early days of computing — before relational databases — this is exactly how data was organized: one giant file with every piece of information on every record. Some organizations still maintain spreadsheets this way, and the problems become apparent quickly.

Imagine storing everything about a customer's transactions in one massive table:

CUST_ID	CUST_NAME	CUST_ADDRESS	ACCT_ID	ACCT_TYPE	BRANCH_NAME	BRANCH_CITY	TXN_ID	TXN_DATE	TXN_AMT
1001	Alice Morgan	42 Elm St, Chicago	5001	CHECKING	Downtown Branch	Chicago	90001	2024-01-15	250.00
1001	Alice Morgan	42 Elm St, Chicago	5001	CHECKING	Downtown Branch	Chicago	90002	2024-01-16	-45.00
1001	Alice Morgan	42 Elm St, Chicago	5001	CHECKING	Downtown Branch	Chicago	90003	2024-01-17	1200.00
1001	Alice Morgan	42 Elm St, Chicago	5002	SAVINGS	Downtown Branch	Chicago	90004	2024-02-01	5000.00

Alice's name, address, account type, and branch information are repeated on every single transaction row. If Alice has 500 transactions across two accounts, her name appears 500 times. Her branch city appears 500 times. This is called update anomaly territory: if Alice moves to a new address, you must update 500 rows — and if you miss one, your data is inconsistent.

The problems with this flat structure go beyond storage waste:

Update anomalies: Change Alice's address and you must find every row. Miss one and you have contradictory data.
Insert anomalies: You cannot record a new branch until someone opens an account there — the branch has no independent existence.
Delete anomalies: If you delete Alice's last transaction, you lose her customer information entirely.

These are the classic motivations for normalization, and they explain why every well-designed relational database distributes data across multiple tables.

The Normalized Solution

Relational database design (normalization, covered in depth in Chapter 16) solves this by storing each fact exactly once:

CUSTOMER stores Alice's name and address once.
ACCOUNT stores each account once, linking to CUSTOMER via CUSTOMER_ID.
BRANCH stores each branch once, linked from ACCOUNT via BRANCH_ID.
TRANSACTION stores each transaction once, linking to ACCOUNT via ACCOUNT_ID.

This eliminates redundancy, but it creates a new requirement: when you need a report that shows customer names alongside their transaction amounts and branch locations, you must join the tables back together.

The Meridian National Bank Schema (Reminder)

Here are the tables we have been building since Chapter 2, shown with their primary and foreign keys:

CUSTOMER (CUSTOMER_ID PK, FIRST_NAME, LAST_NAME, DATE_OF_BIRTH, EMAIL, PHONE, ADDRESS, CITY, STATE, ZIP_CODE, CREATED_DATE)

ACCOUNT (ACCOUNT_ID PK, CUSTOMER_ID FK→CUSTOMER, BRANCH_ID FK→BRANCH, ACCOUNT_TYPE, BALANCE, OPENED_DATE, STATUS)

BRANCH (BRANCH_ID PK, BRANCH_NAME, ADDRESS, CITY, STATE, ZIP_CODE, PHONE, MANAGER_ID FK→EMPLOYEE)

TRANSACTION (TXN_ID PK, ACCOUNT_ID FK→ACCOUNT, TXN_TYPE, AMOUNT, TXN_DATE, DESCRIPTION, RELATED_ACCOUNT_ID FK→ACCOUNT)

EMPLOYEE (EMPLOYEE_ID PK, FIRST_NAME, LAST_NAME, BRANCH_ID FK→BRANCH, POSITION, HIRE_DATE, SALARY, MANAGER_EMPLOYEE_ID FK→EMPLOYEE)

The foreign key arrows are your roadmap for joins. Every FK→PK relationship is a natural join path. Keep this schema in front of you as we work through the chapter.

The Join Mental Model

Here is a way to think about joins that will serve you throughout your career: a join is a question about relationships. When you write CUSTOMER INNER JOIN ACCOUNT ON c.CUSTOMER_ID = a.CUSTOMER_ID, you are asking "for each customer, what accounts do they have?" When you write ACCOUNT LEFT OUTER JOIN TRANSACTION ON a.ACCOUNT_ID = t.ACCOUNT_ID, you are asking "for each account, what transactions exist — and show me the account even if the answer is none."

Every join answers a relationship question. The join type determines what happens when one side of the relationship is empty.

6.2 INNER JOIN — The Workhorse

The INNER JOIN is the most common join type. It returns only the rows where matching values exist in both tables. If a row in the left table has no match in the right table, it is excluded. If a row in the right table has no match in the left table, it is excluded.

Basic Syntax

SELECT columns
FROM table_a
INNER JOIN table_b
    ON table_a.join_column = table_b.join_column;

The INNER keyword is optional — a plain JOIN defaults to INNER JOIN in DB2 (and in the SQL standard). However, we will always write INNER JOIN explicitly for clarity.

Your First Join: Customers and Their Accounts

Business question: Show every customer alongside their account information.

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID;

Expected output:

CUSTOMER_ID  FIRST_NAME  LAST_NAME  ACCOUNT_ID  ACCOUNT_TYPE  BALANCE
-----------  ----------  ---------  ----------  ------------  ----------
1001         Alice       Morgan     5001        CHECKING       4,250.00
1001         Alice       Morgan     5002        SAVINGS       12,800.00
1002         Brian       Chen       5003        CHECKING       1,100.00
1003         Clara       Davis      5004        CHECKING       8,900.00
1003         Clara       Davis      5005        SAVINGS       45,000.00
1003         Clara       Davis      5006        CD            25,000.00
1004         David       Kim        5007        CHECKING         320.00

Notice that Alice appears twice (she has two accounts) and Clara appears three times (she has three accounts). This is expected and correct — the join produces one row for every matching pair.

This one-to-many relationship is fundamental. The CUSTOMER table has a one-to-many relationship with ACCOUNT (one customer can own many accounts). When you join them, the "one" side (CUSTOMER) is duplicated to match each row on the "many" side (ACCOUNT). The result set has as many rows as there are accounts — not as many as there are customers.

Understanding this row-multiplication behavior is crucial. It is not a bug; it is the defined behavior of SQL joins. But it can cause problems if you later aggregate these results without accounting for it (we cover this trap in Section 6.10).

How INNER JOIN Works Conceptually

Think of it as two nested loops:

Pick a row from CUSTOMER.
Scan every row in ACCOUNT looking for rows where ACCOUNT.CUSTOMER_ID = CUSTOMER.CUSTOMER_ID.
For each match, produce an output row combining columns from both tables.
Move to the next CUSTOMER row and repeat.

(The actual execution may differ — DB2's optimizer may use hash joins, merge joins, or index lookups — but the logical result is identical. We will preview optimizer strategies in Section 6.12.)

Multi-Column Join Conditions

Sometimes a single column is not enough to establish the relationship. Consider a hypothetical ACCOUNT_HISTORY table that tracks balance snapshots by account and date:

SELECT a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       h.SNAPSHOT_DATE,
       h.BALANCE_SNAPSHOT
FROM ACCOUNT a
INNER JOIN ACCOUNT_HISTORY h
    ON a.ACCOUNT_ID = h.ACCOUNT_ID
   AND h.SNAPSHOT_DATE = CURRENT DATE;

The ON clause can contain multiple conditions connected with AND. Both conditions must be true for a row pair to appear in the result.

Multi-column joins are common in several scenarios:

Composite keys: Some tables use multiple columns as their primary key. For example, a DAILY_BALANCE table might have a composite key of (ACCOUNT_ID, BALANCE_DATE).
Temporal joins: Joining to a history or audit table where you need to match both the entity and a specific time period.
Business rules: The relationship between two tables may require more than one column to be meaningful — for example, matching on both department and job code.

You can include as many conditions in the ON clause as needed. All must evaluate to TRUE for a row pair to be included in the result.

Non-Equi Joins

Most joins use equality (=), but DB2 allows any comparison operator in the ON clause:

-- Find employees whose salary exceeds their branch's average
SELECT e.FIRST_NAME,
       e.LAST_NAME,
       e.SALARY,
       ba.AVG_SALARY
FROM EMPLOYEE e
INNER JOIN (
    SELECT BRANCH_ID, AVG(SALARY) AS AVG_SALARY
    FROM EMPLOYEE
    GROUP BY BRANCH_ID
) ba
    ON e.BRANCH_ID = ba.BRANCH_ID
   AND e.SALARY > ba.AVG_SALARY;

Non-equi joins are also useful for range-based lookups — for example, joining a transaction to a fee schedule where the fee depends on the transaction amount falling within a range:

SELECT t.TXN_ID,
       t.AMOUNT,
       f.FEE_PERCENTAGE
FROM TRANSACTION t
INNER JOIN FEE_SCHEDULE f
    ON t.AMOUNT >= f.MIN_AMOUNT
   AND t.AMOUNT <  f.MAX_AMOUNT;

Self-Joins

A self-join joins a table to itself. This is essential when a table contains a hierarchical or self-referencing relationship. Our EMPLOYEE table has a MANAGER_EMPLOYEE_ID column that points back to EMPLOYEE_ID in the same table.

Business question: List every employee alongside their manager's name.

SELECT e.EMPLOYEE_ID,
       e.FIRST_NAME || ' ' || e.LAST_NAME AS EMPLOYEE_NAME,
       e.POSITION,
       m.FIRST_NAME || ' ' || m.LAST_NAME AS MANAGER_NAME
FROM EMPLOYEE e
INNER JOIN EMPLOYEE m
    ON e.MANAGER_EMPLOYEE_ID = m.EMPLOYEE_ID;

Expected output:

EMPLOYEE_ID  EMPLOYEE_NAME    POSITION              MANAGER_NAME
-----------  ---------------  --------------------  ---------------
2002         Sarah Palmer     Senior Teller         James Whitfield
2003         Miguel Reyes     Loan Officer          James Whitfield
2004         Karen Liu        Teller                Sarah Palmer
2005         Tom Bradley      Operations Analyst    James Whitfield
2006         Nina Patel       Branch Manager        Robert Haines
2007         Derek Olsen      Teller                Nina Patel

Notice that James Whitfield (the top-level manager or CEO-equivalent in our sample data) does not appear as an employee in this result — his MANAGER_EMPLOYEE_ID is NULL, and an INNER JOIN excludes NULLs. We will fix this with a LEFT OUTER JOIN in Section 6.4.

Self-joins are also useful for comparing rows within the same table. For example, finding all pairs of employees at the same branch who have a salary difference of more than $10,000:

SELECT e1.FIRST_NAME || ' ' || e1.LAST_NAME AS EMPLOYEE_A,
       e1.SALARY AS SALARY_A,
       e2.FIRST_NAME || ' ' || e2.LAST_NAME AS EMPLOYEE_B,
       e2.SALARY AS SALARY_B,
       ABS(e1.SALARY - e2.SALARY) AS SALARY_GAP
FROM EMPLOYEE e1
INNER JOIN EMPLOYEE e2
    ON e1.BRANCH_ID = e2.BRANCH_ID
   AND e1.EMPLOYEE_ID < e2.EMPLOYEE_ID
WHERE ABS(e1.SALARY - e2.SALARY) > 10000.00
ORDER BY SALARY_GAP DESC;

The condition e1.EMPLOYEE_ID < e2.EMPLOYEE_ID ensures each pair appears only once (not both A-B and B-A) and excludes self-pairs (A-A). This is a standard technique whenever you compare rows within the same table.

The USING Clause (Alternative to ON)

DB2 supports an alternative syntax when the join columns have the same name in both tables:

SELECT c.FIRST_NAME, a.ACCOUNT_ID, a.BALANCE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    USING (CUSTOMER_ID);

The USING clause is shorthand for ON c.CUSTOMER_ID = a.CUSTOMER_ID. There is one behavioral difference: with USING, the join column appears only once in the result (not qualified by either table alias). With ON, both c.CUSTOMER_ID and a.CUSTOMER_ID are available.

In this book we prefer the ON clause because it is more explicit and works for all join conditions, including multi-column joins and non-equi joins. However, you may encounter USING in production code, and it is perfectly valid.

6.3 Table Aliases — Writing Readable Join Queries

You have already seen aliases like c for CUSTOMER and a for ACCOUNT. In single-table queries, aliases are a convenience. In joins, they are a necessity.

Why Aliases Are Essential

When two tables share a column name (e.g., both CUSTOMER and ACCOUNT have a CUSTOMER_ID column), you must qualify every reference:

-- Without aliases (verbose but legal)
SELECT CUSTOMER.FIRST_NAME,
       ACCOUNT.ACCOUNT_ID,
       ACCOUNT.BALANCE
FROM CUSTOMER
INNER JOIN ACCOUNT
    ON CUSTOMER.CUSTOMER_ID = ACCOUNT.CUSTOMER_ID;

With aliases, the same query is far more readable:

-- With aliases (preferred)
SELECT c.FIRST_NAME,
       a.ACCOUNT_ID,
       a.BALANCE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID;

Self-Join Aliases Are Mandatory

In a self-join, aliases are not optional — they are required to distinguish the two "copies" of the table:

FROM EMPLOYEE e          -- the employee copy
INNER JOIN EMPLOYEE m    -- the manager copy
    ON e.MANAGER_EMPLOYEE_ID = m.EMPLOYEE_ID

Without aliases, DB2 would have no way to know which EMPLOYEE table reference you mean.

Alias Conventions

Adopt a consistent convention. Here are common approaches:

Convention	Example	Pros
First letter	`c` for CUSTOMER	Short, fast to type
Abbreviation	`cust` for CUSTOMER	More readable in complex queries
Role-based	`mgr` for EMPLOYEE (manager copy)	Clarifies purpose in self-joins

In this book, we use single-letter or short abbreviations. In production SQL with five or more tables joined, slightly longer aliases like cust, acct, txn improve readability significantly.

Column Qualification Best Practices

Even when a column name is unique across all joined tables, qualify it with the alias anyway. Consider:

-- Ambiguous: which table does CITY come from?
SELECT c.FIRST_NAME, a.ACCOUNT_ID, CITY
FROM CUSTOMER c
INNER JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b ON a.BRANCH_ID = b.BRANCH_ID;

Both CUSTOMER and BRANCH have a CITY column. If you do not qualify it, DB2 will raise an error: SQLCODE -203 (SQLSTATE 42702): column name is ambiguous. Even for columns that are currently unambiguous, qualifying them protects you against future schema changes — if someone adds a CITY column to ACCOUNT later, your unqualified query will break.

Rule of thumb: In any query with more than one table, qualify every column with its table alias. No exceptions.

6.4 LEFT OUTER JOIN — When the Right Side Might Be Missing

An INNER JOIN excludes rows that have no match. Sometimes that is exactly what you want. Other times, you need to preserve all rows from one table, whether or not a match exists in the other.

The Syntax

SELECT columns
FROM table_a
LEFT OUTER JOIN table_b
    ON table_a.join_column = table_b.join_column;

The OUTER keyword is optional in DB2. LEFT JOIN and LEFT OUTER JOIN are identical. We write LEFT OUTER JOIN for clarity.

How It Works

A LEFT OUTER JOIN returns:

Every matched pair (same as INNER JOIN), plus
Every row from the left table that has no match in the right table, with NULLs filling in the right table's columns.

Customers With or Without Accounts

Business question: List all customers, showing their accounts if they have any. Include customers who have no accounts.

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE
FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
ORDER BY c.CUSTOMER_ID;

Expected output:

CUSTOMER_ID  FIRST_NAME  LAST_NAME  ACCOUNT_ID  ACCOUNT_TYPE  BALANCE
-----------  ----------  ---------  ----------  ------------  ----------
1001         Alice       Morgan     5001        CHECKING       4,250.00
1001         Alice       Morgan     5002        SAVINGS       12,800.00
1002         Brian       Chen       5003        CHECKING       1,100.00
1003         Clara       Davis      5004        CHECKING       8,900.00
1003         Clara       Davis      5005        SAVINGS       45,000.00
1003         Clara       Davis      5006        CD            25,000.00
1004         David       Kim        5007        CHECKING         320.00
1005         Elena       Vargas     -           -              -

Elena Vargas has no accounts. With an INNER JOIN she would vanish from the result. With a LEFT OUTER JOIN she appears, and all ACCOUNT columns are NULL (shown as - here for readability).

Finding Rows With No Match (Anti-Join Pattern)

One of the most practical uses of LEFT OUTER JOIN is finding rows in the left table that have no corresponding row in the right table:

-- Find customers with no accounts
SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME
FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.ACCOUNT_ID IS NULL;

Expected output:

CUSTOMER_ID  FIRST_NAME  LAST_NAME
-----------  ----------  ---------
1005         Elena       Vargas

The WHERE a.ACCOUNT_ID IS NULL filter keeps only the unmatched rows. This is called an anti-join pattern. It answers questions like:

Which customers have never made a transaction?
Which branches have no employees assigned?
Which products have never been ordered?

This pattern is functionally equivalent to NOT EXISTS and NOT IN subqueries, but many developers find the LEFT JOIN version more readable. DB2's optimizer typically generates the same access plan for all three forms.

Here are the three equivalent ways to express an anti-join, for comparison:

-- Method 1: LEFT OUTER JOIN + IS NULL (covered above)
SELECT c.CUSTOMER_ID, c.FIRST_NAME, c.LAST_NAME
FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.ACCOUNT_ID IS NULL;

-- Method 2: NOT EXISTS (correlated subquery)
SELECT c.CUSTOMER_ID, c.FIRST_NAME, c.LAST_NAME
FROM CUSTOMER c
WHERE NOT EXISTS (
    SELECT 1 FROM ACCOUNT a WHERE a.CUSTOMER_ID = c.CUSTOMER_ID
);

-- Method 3: NOT IN (simple subquery — be careful with NULLs!)
SELECT c.CUSTOMER_ID, c.FIRST_NAME, c.LAST_NAME
FROM CUSTOMER c
WHERE c.CUSTOMER_ID NOT IN (
    SELECT a.CUSTOMER_ID FROM ACCOUNT a WHERE a.CUSTOMER_ID IS NOT NULL
);

All three return the same result. Method 3 has a subtle trap: if the subquery returns any NULL values, NOT IN returns no rows at all (because x NOT IN (1, 2, NULL) evaluates to UNKNOWN for any x). The WHERE a.CUSTOMER_ID IS NOT NULL guard clause prevents this, but it is easy to forget. For this reason, most DB2 practitioners prefer Method 1 or Method 2.

Fixing the Self-Join: Employees Including Top-Level Managers

Remember that our INNER JOIN self-join in Section 6.2 excluded the top-level manager because their MANAGER_EMPLOYEE_ID is NULL. A LEFT OUTER JOIN fixes this:

SELECT e.EMPLOYEE_ID,
       e.FIRST_NAME || ' ' || e.LAST_NAME AS EMPLOYEE_NAME,
       e.POSITION,
       COALESCE(m.FIRST_NAME || ' ' || m.LAST_NAME, '(none — top level)') AS MANAGER_NAME
FROM EMPLOYEE e
LEFT OUTER JOIN EMPLOYEE m
    ON e.MANAGER_EMPLOYEE_ID = m.EMPLOYEE_ID
ORDER BY e.EMPLOYEE_ID;

Expected output:

EMPLOYEE_ID  EMPLOYEE_NAME    POSITION              MANAGER_NAME
-----------  ---------------  --------------------  -------------------
2001         James Whitfield  Regional Director     (none — top level)
2002         Sarah Palmer     Senior Teller         James Whitfield
2003         Miguel Reyes     Loan Officer          James Whitfield
2004         Karen Liu        Teller                Sarah Palmer
2005         Tom Bradley      Operations Analyst    James Whitfield
2006         Nina Patel       Branch Manager        Robert Haines
2007         Derek Olsen      Teller                Nina Patel

James Whitfield now appears, with (none — top level) for his manager name, thanks to COALESCE handling the NULL from the unmatched right side.

Branches With or Without Employees

Business question: Show all branches and their employee count, including branches that might have zero employees.

SELECT b.BRANCH_ID,
       b.BRANCH_NAME,
       COUNT(e.EMPLOYEE_ID) AS EMPLOYEE_COUNT
FROM BRANCH b
LEFT OUTER JOIN EMPLOYEE e
    ON b.BRANCH_ID = e.BRANCH_ID
GROUP BY b.BRANCH_ID, b.BRANCH_NAME
ORDER BY EMPLOYEE_COUNT DESC;

Expected output:

BRANCH_ID  BRANCH_NAME         EMPLOYEE_COUNT
---------  ------------------  --------------
101        Downtown Branch              4
102        Northside Branch             3
103        Airport Branch               2
104        University Branch            0

COUNT(e.EMPLOYEE_ID) correctly returns 0 for University Branch because COUNT(column) does not count NULLs. If we had written COUNT(*) instead, it would return 1 for University Branch (counting the preserved left-side row), which would be incorrect.

This is one of the most common bugs in LEFT OUTER JOIN queries with aggregation. Memorize this rule:

Function	NULL Behavior	Use With Outer Joins
`COUNT(*)`	Counts all rows including NULL rows	Almost always wrong
`COUNT(column)`	Skips NULLs	Correct for counting matched rows
`SUM(column)`	Skips NULLs (returns NULL if all NULL)	Wrap in COALESCE for 0
`AVG(column)`	Skips NULLs	Wrap in COALESCE for 0

When you combine LEFT OUTER JOIN with GROUP BY, always use COUNT(specific_column) from the right-side table, and wrap SUM() and AVG() in COALESCE() to handle the all-NULL case.

6.5 RIGHT OUTER JOIN and FULL OUTER JOIN

RIGHT OUTER JOIN

A RIGHT OUTER JOIN preserves all rows from the right table, filling in NULLs for the left table when there is no match. It is the mirror image of LEFT OUTER JOIN.

SELECT a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       t.TXN_ID,
       t.TXN_TYPE,
       t.AMOUNT
FROM TRANSACTION t
RIGHT OUTER JOIN ACCOUNT a
    ON t.ACCOUNT_ID = a.ACCOUNT_ID
ORDER BY a.ACCOUNT_ID;

This returns all accounts, including those with no transactions.

In practice, RIGHT OUTER JOIN is rarely used. Most SQL authors structure their queries so that the "must-preserve" table appears on the left side, making LEFT OUTER JOIN the natural choice. You can always rewrite a RIGHT JOIN as a LEFT JOIN by swapping the table order:

-- These two queries produce identical results
-- Version 1: RIGHT OUTER JOIN
SELECT a.ACCOUNT_ID, t.TXN_ID
FROM TRANSACTION t
RIGHT OUTER JOIN ACCOUNT a ON t.ACCOUNT_ID = a.ACCOUNT_ID;

-- Version 2: LEFT OUTER JOIN (preferred — same result, more conventional)
SELECT a.ACCOUNT_ID, t.TXN_ID
FROM ACCOUNT a
LEFT OUTER JOIN TRANSACTION t ON a.ACCOUNT_ID = t.ACCOUNT_ID;

Our recommendation: prefer LEFT OUTER JOIN for consistency. Reserve RIGHT OUTER JOIN for situations where rewriting would make the query less readable — for example, when adding a new table to an existing multi-join query and you do not want to rearrange the FROM clause.

In over twenty years of DB2 production code, the author has seen RIGHT OUTER JOIN used intentionally fewer than a dozen times. It is not wrong, but the SQL community has standardized on LEFT OUTER JOIN as the default outer join direction. When you see RIGHT OUTER JOIN in someone else's code, the first question to ask is whether the tables could be swapped to use LEFT instead.

FULL OUTER JOIN

A FULL OUTER JOIN preserves unmatched rows from both sides:

SELECT c.CUSTOMER_ID,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE
FROM CUSTOMER c
FULL OUTER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
ORDER BY c.CUSTOMER_ID;

This returns: - All matched customer-account pairs - Customers with no accounts (ACCOUNT columns are NULL) - Accounts with no customer (CUSTOMER columns are NULL — this would indicate orphaned data)

When is FULL OUTER JOIN useful?

FULL OUTER JOIN is less common than LEFT OUTER JOIN but invaluable in specific scenarios:

Data reconciliation: Comparing two data sources to find records present in one but not the other.

-- Compare internal accounts with external audit file
SELECT i.ACCOUNT_ID AS INTERNAL_ACCT,
       e.ACCOUNT_ID AS EXTERNAL_ACCT,
       CASE
           WHEN i.ACCOUNT_ID IS NULL THEN 'Missing internally'
           WHEN e.ACCOUNT_ID IS NULL THEN 'Missing externally'
           ELSE 'Matched'
       END AS STATUS
FROM INTERNAL_ACCOUNTS i
FULL OUTER JOIN EXTERNAL_AUDIT e
    ON i.ACCOUNT_ID = e.ACCOUNT_ID
WHERE i.ACCOUNT_ID IS NULL
   OR e.ACCOUNT_ID IS NULL;

Merging time series: Combining data from two sources that may have entries on different dates.
Gap detection: Finding periods where expected data is missing from either side.

FULL OUTER JOIN is sometimes called "the reconciliation join" because of its power in data quality scenarios. In the Meridian Bank context, consider a nightly batch process that imports transaction data from an ATM network. A FULL OUTER JOIN between the internal transaction log and the external ATM feed quickly reveals:

Transactions recorded internally but not in the ATM feed (potential system error)
Transactions in the ATM feed but not recorded internally (potential data loss)
Transactions in both systems but with mismatched amounts (potential corruption)

No other single join type provides this complete three-way classification.

6.6 CROSS JOIN — The Cartesian Product

A CROSS JOIN combines every row from the left table with every row from the right table. If the left table has m rows and the right table has n rows, the result has m × n rows.

Syntax

SELECT columns
FROM table_a
CROSS JOIN table_b;

There is no ON clause — every row pairs with every other row.

Why Would You Want This?

Cross joins are useful for generating combinations:

Example 1: Generate all account-type/branch combinations for a report template.

SELECT b.BRANCH_NAME,
       at.ACCOUNT_TYPE
FROM BRANCH b
CROSS JOIN (VALUES ('CHECKING'), ('SAVINGS'), ('CD'), ('MONEY_MARKET')) AS at(ACCOUNT_TYPE)
ORDER BY b.BRANCH_NAME, at.ACCOUNT_TYPE;

Expected output:

BRANCH_NAME         ACCOUNT_TYPE
------------------  ------------
Airport Branch      CD
Airport Branch      CHECKING
Airport Branch      MONEY_MARKET
Airport Branch      SAVINGS
Downtown Branch     CD
Downtown Branch     CHECKING
Downtown Branch     MONEY_MARKET
Downtown Branch     SAVINGS
Northside Branch    CD
Northside Branch    CHECKING
Northside Branch    MONEY_MARKET
Northside Branch    SAVINGS
University Branch   CD
University Branch   CHECKING
University Branch   MONEY_MARKET
University Branch   SAVINGS

This produces a skeleton that can be LEFT OUTER JOINed to actual account data to create a report showing zero where no accounts of that type exist at a branch.

Example 2: Generate a date range for reporting.

-- Generate every date in January 2024
WITH DATE_RANGE(D) AS (
    VALUES DATE('2024-01-01')
    UNION ALL
    SELECT D + 1 DAY FROM DATE_RANGE WHERE D < DATE('2024-01-31')
)
SELECT dr.D AS REPORT_DATE,
       b.BRANCH_ID,
       b.BRANCH_NAME
FROM DATE_RANGE dr
CROSS JOIN BRANCH b
ORDER BY dr.D, b.BRANCH_ID;

This produces 31 days × 4 branches = 124 rows — a reporting skeleton for daily branch metrics.

Example 3: Comparing every pair of branches.

SELECT b1.BRANCH_NAME AS BRANCH_A,
       b2.BRANCH_NAME AS BRANCH_B
FROM BRANCH b1
CROSS JOIN BRANCH b2
WHERE b1.BRANCH_ID < b2.BRANCH_ID
ORDER BY b1.BRANCH_NAME, b2.BRANCH_NAME;

The WHERE b1.BRANCH_ID < b2.BRANCH_ID clause eliminates duplicates and self-pairs, giving unique combinations.

Understanding Cross Join Sizes

Always calculate the expected result size before running a CROSS JOIN:

Left Table Rows	Right Table Rows	Result Rows	Assessment
4	4	16	Trivial
31	4	124	Small reporting skeleton
365	4	1,460	Yearly daily report — still fine
10,000	10,000	100,000,000	Dangerous — probably unintended

A CROSS JOIN between two tables of 10,000 rows each produces 100 million rows. If the tables have wide columns, this can exhaust memory and fill temp tablespaces. Always ask yourself: "Do I genuinely need every combination?" If the answer is no, you probably want an INNER JOIN with a specific condition instead.

The Danger of Accidental Cross Joins

If you join two tables and forget the ON clause, you get a cross join whether you intended one or not:

-- DANGEROUS: missing ON clause produces a Cartesian product!
SELECT c.FIRST_NAME, a.ACCOUNT_ID
FROM CUSTOMER c, ACCOUNT a;
-- If CUSTOMER has 5,000 rows and ACCOUNT has 20,000 rows,
-- this returns 100,000,000 rows.

The old-style comma-separated FROM clause (pre-SQL:1992 syntax) still works in DB2 but makes accidental cross joins easy. Always use explicit JOIN ... ON syntax. We will return to this in Section 6.10.

CROSS JOIN vs. Comma Syntax

Technically, FROM A, B (with no WHERE clause) and FROM A CROSS JOIN B are equivalent — both produce a Cartesian product. The difference is intent. When you write CROSS JOIN, you are explicitly declaring that you want every combination. When you write FROM A, B and forget the WHERE clause, you have a bug. Use CROSS JOIN when you mean it, and INNER JOIN ... ON when you want matched rows.

6.7 LATERAL Joins and TABLE Functions

LATERAL is a powerful but often overlooked join type. It allows the right side of a join to reference columns from the left side — effectively embedding a correlated subquery in the FROM clause.

The Problem LATERAL Solves

Suppose you want to find, for each customer, their two most recent transactions. With a regular join you cannot express "top N per group" cleanly in the FROM clause. LATERAL makes it possible.

Syntax

SELECT columns
FROM table_a
, LATERAL (
    SELECT columns
    FROM table_b
    WHERE table_b.some_column = table_a.some_column
    ORDER BY ...
    FETCH FIRST n ROWS ONLY
) AS alias;

The subquery inside LATERAL (...) can reference table_a columns, which a regular subquery in FROM cannot do.

Example: Top 2 Transactions Per Customer

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME,
       lt.TXN_ID,
       lt.AMOUNT,
       lt.TXN_DATE
FROM CUSTOMER c,
LATERAL (
    SELECT t.TXN_ID, t.AMOUNT, t.TXN_DATE
    FROM TRANSACTION t
    INNER JOIN ACCOUNT a ON t.ACCOUNT_ID = a.ACCOUNT_ID
    WHERE a.CUSTOMER_ID = c.CUSTOMER_ID
    ORDER BY t.TXN_DATE DESC
    FETCH FIRST 2 ROWS ONLY
) AS lt
ORDER BY c.CUSTOMER_ID, lt.TXN_DATE DESC;

Expected output:

CUSTOMER_ID  FIRST_NAME  LAST_NAME  TXN_ID  AMOUNT     TXN_DATE
-----------  ----------  ---------  ------  ---------  ----------
1001         Alice       Morgan     90015    1,200.00  2024-03-15
1001         Alice       Morgan     90012     -200.00  2024-03-10
1002         Brian       Chen       90020      500.00  2024-03-14
1002         Brian       Chen       90018     -75.00   2024-03-12
1003         Clara       Davis      90025    5,000.00  2024-03-16
1003         Clara       Davis      90022   -1,500.00  2024-03-13

Without LATERAL, achieving this result requires window functions (Chapter 14) or complex correlated subqueries in the SELECT list. LATERAL provides a clean, readable alternative.

Important Notes on LATERAL in DB2

DB2 for LUW (Linux, UNIX, Windows) has supported LATERAL since version 9.1.
DB2 for z/OS added LATERAL support in version 12.
The comma before LATERAL acts as a cross-apply-style join. If the LATERAL subquery returns no rows for a given left-side row, that left-side row is excluded (like an INNER JOIN). To preserve left-side rows with no matches, use LEFT OUTER JOIN LATERAL (...):

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME,
       lt.TXN_ID,
       lt.AMOUNT
FROM CUSTOMER c
LEFT OUTER JOIN LATERAL (
    SELECT t.TXN_ID, t.AMOUNT
    FROM TRANSACTION t
    INNER JOIN ACCOUNT a ON t.ACCOUNT_ID = a.ACCOUNT_ID
    WHERE a.CUSTOMER_ID = c.CUSTOMER_ID
    ORDER BY t.TXN_DATE DESC
    FETCH FIRST 2 ROWS ONLY
) AS lt ON 1=1
ORDER BY c.CUSTOMER_ID;

The ON 1=1 is required syntactically for the LEFT OUTER JOIN but has no filtering effect — the correlation is inside the LATERAL subquery.

TABLE Functions

DB2 also supports TABLE() function calls in the FROM clause, which work similarly to LATERAL in that they can accept correlated arguments:

SELECT c.CUSTOMER_ID,
       c.LAST_NAME,
       t.*
FROM CUSTOMER c,
TABLE(GET_CUSTOMER_TRANSACTIONS(c.CUSTOMER_ID)) AS t;

Here GET_CUSTOMER_TRANSACTIONS is a user-defined table function that returns a result set. The TABLE() syntax allows the function to receive c.CUSTOMER_ID from the outer query — the same correlated behavior as LATERAL.

When to Use LATERAL vs. Other Approaches

LATERAL is not the only way to solve "top N per group" and similar problems. Here is a decision guide:

Approach	Syntax Complexity	Performance	DB2 Version Required
LATERAL	Moderate	Good (optimizer can push predicates)	LUW 9.1+, z/OS 12+
Window functions (ROW_NUMBER)	Moderate	Good	LUW 8.2+, z/OS 9+
Correlated subquery in SELECT	Low	Can be poor for many columns	All versions
Correlated subquery with EXISTS	Low	Good with indexes	All versions

LATERAL shines when you need multiple columns from the correlated result and when the correlated subquery benefits from ORDER BY with FETCH FIRST. Window functions (covered in Chapter 14) are often the most efficient approach and are widely understood. Choose based on readability, your DB2 version, and your team's SQL familiarity.

LATERAL and Performance Considerations

When DB2 processes a LATERAL join, it evaluates the LATERAL subquery once for each row from the left side — similar to a correlated subquery. However, the optimizer can often transform the LATERAL into a more efficient join internally. Index support on the correlated column (e.g., an index on ACCOUNT.CUSTOMER_ID or TRANSACTION.ACCOUNT_ID) is critical for good LATERAL performance. Without an index, each evaluation of the LATERAL subquery requires a table scan, and the total cost becomes proportional to left_rows × right_table_size.

6.8 Joining Three or More Tables

Real-world queries frequently join three, four, five, or more tables. The principle is simple: chain your joins.

Three-Table Join: Customer + Account + Branch

Business question: For each customer, show their account numbers, account types, and the name of the branch where each account was opened.

SELECT c.FIRST_NAME,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE,
       b.BRANCH_NAME,
       b.CITY AS BRANCH_CITY
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
ORDER BY c.LAST_NAME, a.ACCOUNT_ID;

Expected output:

FIRST_NAME  LAST_NAME  ACCOUNT_ID  ACCOUNT_TYPE  BALANCE     BRANCH_NAME       BRANCH_CITY
----------  ---------  ----------  ------------  ----------  ----------------  -----------
Brian       Chen       5003        CHECKING       1,100.00   Northside Branch  Chicago
Clara       Davis      5004        CHECKING       8,900.00   Downtown Branch   Chicago
Clara       Davis      5005        SAVINGS       45,000.00   Downtown Branch   Chicago
Clara       Davis      5006        CD            25,000.00   Airport Branch    Chicago
David       Kim        5007        CHECKING         320.00   Northside Branch  Chicago
Alice       Morgan     5001        CHECKING       4,250.00   Downtown Branch   Chicago
Alice       Morgan     5002        SAVINGS       12,800.00   Downtown Branch   Chicago

Four-Table Join: Adding Transaction Details

Business question: Show customer names, account types, branch names, and transaction details for all transactions over $1,000.

SELECT c.FIRST_NAME || ' ' || c.LAST_NAME AS CUSTOMER_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       b.BRANCH_NAME,
       t.TXN_ID,
       t.TXN_TYPE,
       t.AMOUNT,
       t.TXN_DATE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
INNER JOIN TRANSACTION t
    ON a.ACCOUNT_ID = t.ACCOUNT_ID
WHERE t.AMOUNT > 1000.00
ORDER BY t.TXN_DATE DESC;

Five-Table Join: Full Picture with Employee

Business question: For each large transaction (over $5,000), show the customer name, account, branch, and the branch manager who oversees that branch.

SELECT c.FIRST_NAME || ' ' || c.LAST_NAME AS CUSTOMER_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       b.BRANCH_NAME,
       e.FIRST_NAME || ' ' || e.LAST_NAME AS BRANCH_MANAGER,
       t.TXN_ID,
       t.AMOUNT,
       t.TXN_DATE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
INNER JOIN EMPLOYEE e
    ON b.MANAGER_ID = e.EMPLOYEE_ID
INNER JOIN TRANSACTION t
    ON a.ACCOUNT_ID = t.ACCOUNT_ID
WHERE t.AMOUNT > 5000.00
ORDER BY t.AMOUNT DESC;

Readability Tips for Multi-Table Joins

Indent the ON clause under its JOIN to visually pair them.
List tables in logical order — typically starting from the "main" entity (CUSTOMER) and joining outward through foreign keys.
Use meaningful aliases — c, a, b, t, e map naturally to our five tables.
Comment complex joins:

FROM CUSTOMER c
INNER JOIN ACCOUNT a            -- customer's accounts
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b             -- branch where account opened
    ON a.BRANCH_ID = b.BRANCH_ID
LEFT OUTER JOIN EMPLOYEE e      -- branch manager (might be vacant)
    ON b.MANAGER_ID = e.EMPLOYEE_ID

Mixing Join Types

You can freely mix INNER, LEFT, RIGHT, and CROSS joins in a single query. A common pattern is an INNER JOIN chain with one LEFT OUTER JOIN at the end for an optional lookup:

SELECT c.FIRST_NAME,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       b.BRANCH_NAME,
       t.TXN_ID,
       t.AMOUNT
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
LEFT OUTER JOIN TRANSACTION t
    ON a.ACCOUNT_ID = t.ACCOUNT_ID
ORDER BY c.LAST_NAME, a.ACCOUNT_ID, t.TXN_DATE;

This ensures we see every customer's accounts and branches, even if some accounts have no transactions yet.

Join Order and the Concept of "Driving Table"

When writing multi-table queries, developers sometimes worry about which table to list first. In the query above, we started with CUSTOMER. Should we have started with TRANSACTION instead?

From a correctness standpoint, the order does not matter. All of the following produce identical results:

-- Order 1: Start from CUSTOMER
FROM CUSTOMER c
INNER JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b ON a.BRANCH_ID = b.BRANCH_ID

-- Order 2: Start from BRANCH
FROM BRANCH b
INNER JOIN ACCOUNT a ON b.BRANCH_ID = a.BRANCH_ID
INNER JOIN CUSTOMER c ON a.CUSTOMER_ID = c.CUSTOMER_ID

-- Order 3: Start from ACCOUNT
FROM ACCOUNT a
INNER JOIN CUSTOMER c ON a.CUSTOMER_ID = c.CUSTOMER_ID
INNER JOIN BRANCH b ON a.BRANCH_ID = b.BRANCH_ID

From a performance standpoint, the textual order in your SQL does not determine the physical execution order. DB2's cost-based optimizer evaluates multiple join orderings and picks the cheapest one. Write the order that makes the query most readable to humans.

From a readability standpoint, start with the "main" entity of your query — the one the business question is about — and join outward through the foreign key relationships. If the question is "show me each customer's accounts," start with CUSTOMER. If the question is "show me each transaction with context," start with TRANSACTION.

6.9 Join Conditions vs. WHERE Conditions

This is one of the most misunderstood aspects of SQL joins, and getting it wrong with outer joins produces incorrect results.

For INNER JOINs: No Difference

With INNER JOIN, putting a condition in the ON clause or the WHERE clause produces the same result:

-- Version A: condition in ON
SELECT c.FIRST_NAME, a.ACCOUNT_ID, a.BALANCE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
   AND a.ACCOUNT_TYPE = 'CHECKING';

-- Version B: condition in WHERE
SELECT c.FIRST_NAME, a.ACCOUNT_ID, a.BALANCE
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.ACCOUNT_TYPE = 'CHECKING';

Both return only checking accounts. DB2's optimizer treats them identically.

For OUTER JOINs: Critical Difference

With LEFT OUTER JOIN, placing a condition in the ON clause versus the WHERE clause changes the result dramatically.

Scenario: Show all customers with their checking accounts, but also show customers who have no checking accounts.

Correct — condition in ON clause:

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE
FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
   AND a.ACCOUNT_TYPE = 'CHECKING'
ORDER BY c.CUSTOMER_ID;

Expected output:

CUSTOMER_ID  FIRST_NAME  LAST_NAME  ACCOUNT_ID  ACCOUNT_TYPE  BALANCE
-----------  ----------  ---------  ----------  ------------  ----------
1001         Alice       Morgan     5001        CHECKING       4,250.00
1002         Brian       Chen       5003        CHECKING       1,100.00
1003         Clara       Davis      5004        CHECKING       8,900.00
1004         David       Kim        5007        CHECKING         320.00
1005         Elena       Vargas     -           -              -

Elena Vargas appears with NULLs because she has no checking account — the LEFT OUTER JOIN preserves her. Customers who have only savings or CD accounts (but no checking) would also appear with NULLs.

Incorrect — condition in WHERE clause:

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME,
       c.LAST_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE
FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.ACCOUNT_TYPE = 'CHECKING'
ORDER BY c.CUSTOMER_ID;

Expected output:

CUSTOMER_ID  FIRST_NAME  LAST_NAME  ACCOUNT_ID  ACCOUNT_TYPE  BALANCE
-----------  ----------  ---------  ----------  ------------  ----------
1001         Alice       Morgan     5001        CHECKING       4,250.00
1002         Brian       Chen       5003        CHECKING       1,100.00
1003         Clara       Davis      5004        CHECKING       8,900.00
1004         David       Kim        5007        CHECKING         320.00

Elena is gone. The WHERE clause filters after the join. For Elena, a.ACCOUNT_TYPE is NULL (no match), and NULL = 'CHECKING' evaluates to UNKNOWN, which is treated as FALSE. The WHERE clause eliminates her. The LEFT OUTER JOIN effectively became an INNER JOIN.

The Rule

Condition Placement	Effect on OUTER JOIN
ON clause	Filters during the join — unmatched rows are still preserved with NULLs
WHERE clause	Filters after the join — can eliminate the preserved NULL rows, converting the outer join to an inner join

Best practice: Place conditions about the right-side table (the "optional" table) in the ON clause. Place conditions about the left-side table (the "preserved" table) in the WHERE clause.

A Visual Way to Think About It

Imagine the query execution in two phases:

Phase 1 (JOIN + ON): Build the joined result set, applying ON conditions during the join. For LEFT OUTER JOIN, add NULLs for unmatched left-side rows.
Phase 2 (WHERE): Filter the joined result set, removing rows that do not satisfy WHERE conditions.

If your WHERE condition tests a right-side column, it removes the NULL rows that the LEFT OUTER JOIN just added. That is why conditions on the right-side table belong in the ON clause — they participate in Phase 1, where the outer join logic preserves unmatched rows.

This is probably the single most important concept in this chapter. If you take away one lesson, let it be this: in a LEFT OUTER JOIN, filtering the right-side table in WHERE destroys the "outer" behavior.

6.10 Common Join Mistakes

After reviewing thousands of SQL queries in production systems, these are the mistakes we see most often.

Mistake 1: Accidental Cartesian Product

-- Missing join condition between CUSTOMER and BRANCH
SELECT c.FIRST_NAME, a.ACCOUNT_ID, b.BRANCH_NAME
FROM CUSTOMER c
INNER JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b ON 1 = 1;   -- oops, this is a cross join!

Every account row gets paired with every branch row. If you expected 7 result rows but got 28 (7 accounts × 4 branches), check your join conditions.

How to spot it: If your result set is much larger than expected, count the rows and check whether it equals the product of the table sizes.

Mistake 2: Wrong Outer Join Side

-- Intent: show ALL branches, even those with no employees
-- Bug: RIGHT side is BRANCH, but we used LEFT OUTER JOIN
SELECT b.BRANCH_NAME, e.FIRST_NAME
FROM BRANCH b
LEFT OUTER JOIN EMPLOYEE e ON b.BRANCH_ID = e.BRANCH_ID;

Actually, this one is correct — BRANCH is on the left, and we are preserving it. The mistake version would be:

-- Bug: BRANCH is on the RIGHT, and we used LEFT OUTER JOIN
-- This preserves EMPLOYEE rows with no branch (not what we wanted)
SELECT b.BRANCH_NAME, e.FIRST_NAME
FROM EMPLOYEE e
LEFT OUTER JOIN BRANCH b ON e.BRANCH_ID = b.BRANCH_ID;

This preserves all employees (even those without a branch), not all branches. Read the join direction carefully.

Mistake 3: Filtering Outer Join Results in WHERE

We covered this in Section 6.9, but it is worth repeating because it is so common:

-- Bug: WHERE clause nullifies the LEFT OUTER JOIN
FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
WHERE a.STATUS = 'ACTIVE'   -- this eliminates customers with no accounts!

Fix: move the filter to the ON clause.

FROM CUSTOMER c
LEFT OUTER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
   AND a.STATUS = 'ACTIVE'

Mistake 4: Duplicate Rows from One-to-Many Joins

When CUSTOMER has 1 row but ACCOUNT has 3 rows for that customer, a join produces 3 rows. If you then join TRANSACTION (and each account has 100 transactions), you get 300 rows. If you aggregate (SUM, COUNT) without understanding this multiplication, you get inflated numbers.

-- Bug: double-counting balances when customer has multiple accounts
SELECT c.CUSTOMER_ID,
       c.LAST_NAME,
       SUM(a.BALANCE) AS TOTAL_BALANCE,    -- correct
       COUNT(t.TXN_ID) AS TXN_COUNT        -- correct
FROM CUSTOMER c
INNER JOIN ACCOUNT a ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN TRANSACTION t ON a.ACCOUNT_ID = t.ACCOUNT_ID
GROUP BY c.CUSTOMER_ID, c.LAST_NAME;

This query actually works correctly for COUNT(t.TXN_ID) because each transaction row appears once. But SUM(a.BALANCE) is wrong — Alice's checking account balance is summed once for each transaction on that account, wildly inflating the total. The fix is to aggregate at the right level, often using subqueries or CTEs:

-- Correct: aggregate balances separately from transaction counts
SELECT c.CUSTOMER_ID,
       c.LAST_NAME,
       acct.TOTAL_BALANCE,
       txn.TXN_COUNT
FROM CUSTOMER c
INNER JOIN (
    SELECT CUSTOMER_ID, SUM(BALANCE) AS TOTAL_BALANCE
    FROM ACCOUNT
    GROUP BY CUSTOMER_ID
) acct ON c.CUSTOMER_ID = acct.CUSTOMER_ID
INNER JOIN (
    SELECT a.CUSTOMER_ID, COUNT(*) AS TXN_COUNT
    FROM TRANSACTION t
    INNER JOIN ACCOUNT a ON t.ACCOUNT_ID = a.ACCOUNT_ID
    GROUP BY a.CUSTOMER_ID
) txn ON c.CUSTOMER_ID = txn.CUSTOMER_ID;

Mistake 5: Using Old-Style Comma Joins

-- Old style (pre-SQL:1992) — avoid this
SELECT c.FIRST_NAME, a.ACCOUNT_ID
FROM CUSTOMER c, ACCOUNT a
WHERE c.CUSTOMER_ID = a.CUSTOMER_ID;

This works, but it mixes join logic (how tables relate) with filter logic (which rows you want) in the WHERE clause. It makes it easy to accidentally omit a join condition, and it cannot express outer joins cleanly. Always use explicit JOIN ... ON syntax.

A counting trick for verifying comma-style joins: If you have N tables in the FROM clause, you need exactly N-1 equi-join conditions in the WHERE clause to avoid a Cartesian product. With 5 tables, you need 4 join conditions. Count them. If you are one short, you have a hidden cross join. With explicit JOIN ... ON syntax, this verification is unnecessary because each JOIN requires its own ON clause.

Mistake 6: Join Elimination Surprises

DB2's optimizer can eliminate a join entirely if it determines the join does not affect the result. For example:

SELECT a.ACCOUNT_ID, a.BALANCE
FROM ACCOUNT a
INNER JOIN CUSTOMER c ON a.CUSTOMER_ID = c.CUSTOMER_ID;

If there is a foreign key constraint guaranteeing that every ACCOUNT.CUSTOMER_ID exists in CUSTOMER, and you are not selecting any columns from CUSTOMER, DB2 may skip the CUSTOMER table entirely. This is usually a performance benefit, but it can be confusing when analyzing EXPLAIN plans.

Join elimination is one of many optimizer transformations that rely on foreign key constraints. This is one practical reason to define foreign keys even when your application layer already enforces referential integrity — the optimizer uses them to produce better plans.

Debugging Join Problems

When a join query produces unexpected results, use this systematic approach:

Count the result rows. Is it more or fewer than expected? - Way too many rows: probably a missing or wrong join condition (accidental Cartesian product). - Fewer rows than expected: probably using INNER JOIN where you need LEFT OUTER JOIN, or a WHERE clause that eliminates outer-joined NULLs. - Duplicate rows: probably a many-to-many relationship you did not anticipate.
Isolate each join. Start with two tables and verify the result. Add the third table and verify again. Continue until you find where the problem appears.
Check for NULLs in join columns. If a foreign key column contains NULLs, INNER JOIN silently drops those rows. This is correct behavior but can surprise you.
Verify your aliases. In a five-table query with single-letter aliases, it is easy to type a.BRANCH_ID when you mean b.BRANCH_ID. The query might still run if the column exists in both tables — but it gives the wrong answer.

6.11 Meridian Bank Join Queries

Let us put it all together with comprehensive queries against our Meridian National Bank schema.

Query 1: Complete Customer Account Summary

Business question: For each customer, show all their accounts with branch information, ordered by customer name and account type.

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME || ' ' || c.LAST_NAME AS CUSTOMER_NAME,
       c.CITY AS CUSTOMER_CITY,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE,
       a.STATUS,
       b.BRANCH_NAME,
       b.CITY AS BRANCH_CITY
FROM CUSTOMER c
INNER JOIN ACCOUNT a
    ON c.CUSTOMER_ID = a.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
ORDER BY c.LAST_NAME, c.FIRST_NAME, a.ACCOUNT_TYPE;

Query 2: Transaction Ledger with Full Context

Business question: Build a complete transaction ledger showing customer, account, and branch for each transaction in March 2024.

SELECT t.TXN_ID,
       t.TXN_DATE,
       c.FIRST_NAME || ' ' || c.LAST_NAME AS CUSTOMER_NAME,
       a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       b.BRANCH_NAME,
       t.TXN_TYPE,
       t.AMOUNT,
       t.DESCRIPTION
FROM TRANSACTION t
INNER JOIN ACCOUNT a
    ON t.ACCOUNT_ID = a.ACCOUNT_ID
INNER JOIN CUSTOMER c
    ON a.CUSTOMER_ID = c.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
WHERE t.TXN_DATE BETWEEN '2024-03-01' AND '2024-03-31'
ORDER BY t.TXN_DATE, t.TXN_ID;

Query 3: Branch Staffing Report

Business question: For each branch, show the branch manager, total employee count, and total salary expense. Include branches with no employees.

SELECT b.BRANCH_ID,
       b.BRANCH_NAME,
       COALESCE(mgr.FIRST_NAME || ' ' || mgr.LAST_NAME, 'Vacant') AS MANAGER_NAME,
       COUNT(e.EMPLOYEE_ID) AS EMPLOYEE_COUNT,
       COALESCE(SUM(e.SALARY), 0) AS TOTAL_SALARY_EXPENSE
FROM BRANCH b
LEFT OUTER JOIN EMPLOYEE mgr
    ON b.MANAGER_ID = mgr.EMPLOYEE_ID
LEFT OUTER JOIN EMPLOYEE e
    ON b.BRANCH_ID = e.BRANCH_ID
GROUP BY b.BRANCH_ID, b.BRANCH_NAME,
         mgr.FIRST_NAME, mgr.LAST_NAME
ORDER BY b.BRANCH_NAME;

Query 4: Transfer Pairs (Self-Join on TRANSACTION)

Business question: For every transfer transaction, show the source account, destination account, and both account holders' names.

SELECT t.TXN_ID,
       t.TXN_DATE,
       t.AMOUNT,
       -- Source account details
       src.ACCOUNT_ID AS FROM_ACCOUNT,
       src_cust.FIRST_NAME || ' ' || src_cust.LAST_NAME AS FROM_CUSTOMER,
       -- Destination account details
       dst.ACCOUNT_ID AS TO_ACCOUNT,
       dst_cust.FIRST_NAME || ' ' || dst_cust.LAST_NAME AS TO_CUSTOMER
FROM TRANSACTION t
INNER JOIN ACCOUNT src
    ON t.ACCOUNT_ID = src.ACCOUNT_ID
INNER JOIN CUSTOMER src_cust
    ON src.CUSTOMER_ID = src_cust.CUSTOMER_ID
INNER JOIN ACCOUNT dst
    ON t.RELATED_ACCOUNT_ID = dst.ACCOUNT_ID
INNER JOIN CUSTOMER dst_cust
    ON dst.CUSTOMER_ID = dst_cust.CUSTOMER_ID
WHERE t.TXN_TYPE = 'TRANSFER'
ORDER BY t.TXN_DATE;

Query 5: Customers and Their Most Recent Transaction (LATERAL)

Business question: For every customer, show their most recent transaction across all accounts.

SELECT c.CUSTOMER_ID,
       c.FIRST_NAME || ' ' || c.LAST_NAME AS CUSTOMER_NAME,
       lt.ACCOUNT_ID,
       lt.TXN_ID,
       lt.TXN_TYPE,
       lt.AMOUNT,
       lt.TXN_DATE
FROM CUSTOMER c
LEFT OUTER JOIN LATERAL (
    SELECT t.TXN_ID, t.ACCOUNT_ID, t.TXN_TYPE, t.AMOUNT, t.TXN_DATE
    FROM TRANSACTION t
    INNER JOIN ACCOUNT a ON t.ACCOUNT_ID = a.ACCOUNT_ID
    WHERE a.CUSTOMER_ID = c.CUSTOMER_ID
    ORDER BY t.TXN_DATE DESC
    FETCH FIRST 1 ROW ONLY
) AS lt ON 1=1
ORDER BY c.CUSTOMER_ID;

Query 6: Inactive Accounts Report (Anti-Join)

Business question: Find all accounts that have had no transactions in the last 90 days.

SELECT a.ACCOUNT_ID,
       a.ACCOUNT_TYPE,
       a.BALANCE,
       c.FIRST_NAME || ' ' || c.LAST_NAME AS CUSTOMER_NAME,
       b.BRANCH_NAME
FROM ACCOUNT a
INNER JOIN CUSTOMER c
    ON a.CUSTOMER_ID = c.CUSTOMER_ID
INNER JOIN BRANCH b
    ON a.BRANCH_ID = b.BRANCH_ID
LEFT OUTER JOIN TRANSACTION t
    ON a.ACCOUNT_ID = t.ACCOUNT_ID
   AND t.TXN_DATE >= CURRENT DATE - 90 DAYS
WHERE a.STATUS = 'ACTIVE'
  AND t.TXN_ID IS NULL
ORDER BY a.BALANCE DESC;

Notice the anti-join pattern: the transaction date filter is in the ON clause (to only consider recent transactions as matches), and the IS NULL check is in the WHERE clause (to keep only unmatched accounts).

This query is a perfect example of the ON-vs-WHERE distinction in action. If we moved t.TXN_DATE >= CURRENT DATE - 90 DAYS to the WHERE clause, it would eliminate every account whose only transactions are older than 90 days — those accounts would show t.TXN_DATE as NULL (from the outer join), and NULL >= CURRENT DATE - 90 DAYS evaluates to UNKNOWN (filtered out). The ON placement ensures we only look for recent transactions as matches, while still preserving accounts that have no recent transactions.

Query 7: Account Type Distribution by Branch (CROSS JOIN + LEFT JOIN)

Business question: For each branch, show the count of each account type. Include zero counts for account types that do not exist at a branch.

SELECT b.BRANCH_NAME,
       at.ACCOUNT_TYPE,
       COUNT(a.ACCOUNT_ID) AS ACCOUNT_COUNT,
       COALESCE(SUM(a.BALANCE), 0) AS TOTAL_BALANCE
FROM BRANCH b
CROSS JOIN (
    VALUES ('CHECKING'), ('SAVINGS'), ('CD'), ('MONEY_MARKET')
) AS at(ACCOUNT_TYPE)
LEFT OUTER JOIN ACCOUNT a
    ON b.BRANCH_ID = a.BRANCH_ID
   AND a.ACCOUNT_TYPE = at.ACCOUNT_TYPE
GROUP BY b.BRANCH_NAME, at.ACCOUNT_TYPE
ORDER BY b.BRANCH_NAME, at.ACCOUNT_TYPE;

This is a three-step pattern that appears frequently in reporting: 1. CROSS JOIN creates every branch-accounttype combination (the "skeleton"). 2. LEFT OUTER JOIN brings in actual data where it exists. 3. COUNT and SUM with COALESCE fill in zeros where there is no data.

Without the CROSS JOIN skeleton, branches that have no CD accounts would simply not appear in the CD row — the report would have invisible gaps.

6.12 A Preview: How the Optimizer Processes Joins

DB2's query optimizer is one of the most sophisticated in the industry. When you write a join, you describe what you want. The optimizer decides how to get it. Understanding the basics helps you write better queries and diagnose performance problems.

Three Join Methods

DB2 uses three primary strategies to execute a join:

1. Nested Loop Join

The conceptual model from Section 6.2: for each row in the outer table, look up matching rows in the inner table (typically using an index).

For each row in CUSTOMER:
    Look up matching rows in ACCOUNT using index on CUSTOMER_ID
    Output combined rows

Best for: Small outer table, indexed inner table, or highly selective joins.
Cost: Proportional to outer_rows × (index lookup cost).

2. Merge Scan Join (Sort-Merge Join)

Both tables are sorted on the join column, then merged in a single pass:

Sort CUSTOMER by CUSTOMER_ID
Sort ACCOUNT by CUSTOMER_ID
Merge both sorted streams, matching rows in sequence

Best for: Large tables that are already sorted or have clustered indexes on the join columns.
Cost: Proportional to sort_cost + (rows_a + rows_b).

3. Hash Join

Build a hash table from the smaller table, then probe it with rows from the larger table:

Build hash table from CUSTOMER on CUSTOMER_ID
For each row in ACCOUNT:
    Probe hash table with ACCOUNT.CUSTOMER_ID
    If match found, output combined row

Best for: Large tables with no useful indexes, equi-joins only.
Cost: Proportional to rows_a + rows_b, but requires memory for the hash table.

What You Control

You do not (and should not) tell DB2 which join method to use. Instead, you influence the optimizer's choices by:

Creating appropriate indexes (Chapter 20) — especially on foreign key columns and commonly filtered columns.
Keeping statistics current with RUNSTATS (Chapter 21) — so the optimizer has accurate row counts and data distribution information.
Writing clear join conditions — ambiguous or overly complex ON clauses can confuse the optimizer.

Join Order

When you join five tables, there are 5! = 120 possible orders in which the optimizer could process them. The optimizer evaluates many of these (pruning obviously bad candidates) and chooses the order with the lowest estimated cost.

You might write:

FROM CUSTOMER c
INNER JOIN ACCOUNT a ON ...
INNER JOIN BRANCH b ON ...
INNER JOIN TRANSACTION t ON ...
INNER JOIN EMPLOYEE e ON ...

But the optimizer might process them as: TRANSACTION → ACCOUNT → CUSTOMER → BRANCH → EMPLOYEE, because starting with TRANSACTION (after a selective date filter) produces the fewest intermediate rows.

Key insight: The order you list tables in the FROM clause does not determine execution order. The optimizer makes that decision. Write your joins in the order that makes the query most readable for humans.

What Affects Join Performance in Practice

While the full treatment belongs in Chapter 22, here are the factors that most commonly affect join performance in day-to-day DB2 work:

Indexes on join columns. An index on the foreign key column (e.g., ACCOUNT.CUSTOMER_ID) dramatically speeds up nested loop joins. Without it, each lookup requires a full table scan.
Data volume. A join between two 100-row tables is instantaneous regardless of method. A join between two 100-million-row tables requires careful optimization. The strategies that work at small scale may fail catastrophically at large scale.
Selectivity of WHERE predicates. If your WHERE clause filters TRANSACTION to just 50 rows before joining, even a nested loop join is fast. If no filtering happens until after the join, DB2 must join millions of rows before discarding most of them.
Available memory (sort heap, buffer pool). Hash joins need memory for the hash table. Merge joins may need memory for sorting. If memory is insufficient, DB2 spills to disk, which is orders of magnitude slower.
Statistics freshness. Run RUNSTATS after loading data, after major updates, and after adding or dropping indexes. The optimizer cannot make good decisions without accurate statistics.

A Rule of Thumb

For most OLTP (online transaction processing) queries that join 2-5 tables with selective WHERE conditions and proper indexes, DB2 will choose an efficient plan automatically. Spend your time on clear SQL and good indexing. Reserve optimizer hints and manual plan overrides for the rare cases where the optimizer genuinely makes a mistake — and always confirm with EXPLAIN before and after.

We will return to join optimization in depth in Chapter 22 (Query Performance Tuning), including how to read EXPLAIN plans, identify join bottlenecks, and use optimization hints when the optimizer makes suboptimal choices.

Spaced Review — Chapters 1 Through 5

Each chapter includes review questions from earlier material to reinforce retention.

From Chapter 1 (Introduction to DB2): 1. What is the difference between DB2 for LUW and DB2 for z/OS? Name two differences in features or configuration. 2. What tool do you use to run SQL interactively in DB2 for LUW?

From Chapter 2 (Creating Your First Database): 3. Write the DB2 command to create a database named MERIDIAN_BANK with UTF-8 encoding. 4. What is the purpose of a tablespace, and how does it relate to tables?

From Chapter 3 (Tables, Columns, and Data Types): 5. What is the difference between CHAR(10) and VARCHAR(10)? 6. Why should you use DECIMAL(12,2) rather than FLOAT for monetary values?

From Chapter 4 (Inserting, Updating, and Deleting Data): 7. Write a single INSERT statement that adds three rows to a table. 8. What happens if you run DELETE FROM CUSTOMER without a WHERE clause?

From Chapter 5 (SELECT, WHERE, and Ordering): 9. What is the difference between WHERE CITY = 'Chicago' and WHERE CITY LIKE 'Chi%'? 10. Explain why WHERE SALARY > NULL returns no rows, and how you should test for NULL instead.

Chapter Summary

Joins are the mechanism that transforms a normalized relational design into meaningful, combined results. Here is what we covered:

Join Type	Returns	Use When
INNER JOIN	Only matching rows from both tables	You want only rows with data on both sides
LEFT OUTER JOIN	All left rows + matching right rows (NULLs for no match)	You need every row from the left table regardless
RIGHT OUTER JOIN	All right rows + matching left rows (NULLs for no match)	Mirror of LEFT (prefer LEFT for consistency)
FULL OUTER JOIN	All rows from both tables (NULLs on both sides for no match)	Data reconciliation, gap detection
CROSS JOIN	Every row from A combined with every row from B	Deliberate Cartesian products, generating combinations
LATERAL	Right side can reference left side columns	Top-N per group, correlated row generation

The most important lessons from this chapter:

INNER JOIN is your default. Use it when both sides must have data.
LEFT OUTER JOIN is your safety net. Use it when the right side is optional.
Put right-side filters in ON, not WHERE, when using outer joins.
Use explicit JOIN ... ON syntax. Never use comma-separated FROM clauses.
Watch for row multiplication when joining one-to-many relationships and aggregating.
The optimizer chooses the execution strategy. Your job is clear SQL and good indexes.

In Chapter 7, we will build on joins by adding aggregation — GROUP BY, HAVING, and aggregate functions — to answer questions like "What is the total balance per branch?" and "Which customers have more than three accounts?"

Next: Chapter 7 — Aggregation: GROUP BY, HAVING, and Aggregate Functions