Case Study 1: EXPLAIN-Driven Tuning — 10x Performance Improvement
The Situation
Meridian National Bank's end-of-day batch process includes a reconciliation report that summarizes all wire transfers processed during the business day. The report joins three tables — TRANSACTIONS, ACCOUNTS, and BRANCHES — to produce a summary by branch showing the number of wire transfers, total amounts, and average amounts.
The query had been running acceptably for years, averaging about 45 seconds. After a data migration that tripled the size of the TRANSACTIONS table from 18 million to 54 million rows, the same query began taking over 8 minutes. The batch window was being exceeded, and downstream processes were backing up.
The production DBA, Janelle, was asked to fix it overnight.
The Query
SELECT B.BRANCH_NAME,
B.BRANCH_CITY,
COUNT(*) AS WIRE_COUNT,
SUM(T.TRANS_AMT) AS TOTAL_WIRE_AMT,
AVG(T.TRANS_AMT) AS AVG_WIRE_AMT
FROM MERIDIAN.TRANSACTIONS T
JOIN MERIDIAN.ACCOUNTS A ON T.ACCT_ID = A.ACCT_ID
JOIN MERIDIAN.BRANCHES B ON A.BRANCH_ID = B.BRANCH_ID
WHERE T.TRANS_TYPE = 'WIRE'
AND T.TRANS_DATE = CURRENT DATE
GROUP BY B.BRANCH_NAME, B.BRANCH_CITY
ORDER BY TOTAL_WIRE_AMT DESC;
Step 1: Capture the Baseline EXPLAIN
Janelle's first action was to capture the current access plan. She did not guess. She did not start creating indexes. She looked at the plan.
EXPLAIN PLAN SET QUERYNO = 5001 FOR
SELECT B.BRANCH_NAME, B.BRANCH_CITY,
COUNT(*) AS WIRE_COUNT,
SUM(T.TRANS_AMT) AS TOTAL_WIRE_AMT,
AVG(T.TRANS_AMT) AS AVG_WIRE_AMT
FROM MERIDIAN.TRANSACTIONS T
JOIN MERIDIAN.ACCOUNTS A ON T.ACCT_ID = A.ACCT_ID
JOIN MERIDIAN.BRANCHES B ON A.BRANCH_ID = B.BRANCH_ID
WHERE T.TRANS_TYPE = 'WIRE'
AND T.TRANS_DATE = CURRENT DATE
GROUP BY B.BRANCH_NAME, B.BRANCH_CITY
ORDER BY TOTAL_WIRE_AMT DESC;
The Baseline Plan (z/OS)
| PLANNO | METHOD | TNAME | ACCESSTYPE | MATCHCOLS | ACCESSNAME | INDEXONLY | PREFETCH |
|---|---|---|---|---|---|---|---|
| 1 | 0 | TRANSACTIONS | R | 0 | — | N | S |
| 2 | 1 | ACCOUNTS | I | 1 | IX_ACCT_PK | N | — |
| 3 | 1 | BRANCHES | I | 1 | IX_BRANCH_PK | N | — |
Statement cost (DSN_STATEMNT_TABLE): PROCSU = 485,000, COST_CATEGORY = A
Step 2: Analyze the Plan
Janelle read the plan and immediately saw the problem.
PLANNO 1: TRANSACTIONS — Tablespace scan. DB2 was scanning all 54 million rows of TRANSACTIONS with sequential prefetch. The TRANS_TYPE = 'WIRE' and TRANS_DATE = CURRENT DATE predicates were being applied as stage 1 predicates during the scan, but DB2 still had to read every page of the tablespace to find the qualifying rows.
PLANNO 2: ACCOUNTS — Nested loop join. For each qualifying TRANSACTION row, DB2 looked up the ACCOUNT via the primary key. This was fine.
PLANNO 3: BRANCHES — Nested loop join. For each joined row, DB2 looked up the BRANCH via primary key. Also fine.
The problem was entirely in step 1. The TRANSACTIONS table had no index on TRANS_TYPE or TRANS_DATE that the optimizer could use. The only existing indexes were:
- IX_TRANS_PK on (TRANS_ID) — primary key
- IX_TRANS_ACCT on (ACCT_ID, TRANS_DATE) — for account-based lookups
Neither index had TRANS_TYPE as a leading column, and IX_TRANS_ACCT had ACCT_ID as the leading column, making it useless for a TRANS_TYPE + TRANS_DATE filter.
Step 3: Identify the Problem
Root cause: No suitable index for the TRANS_TYPE + TRANS_DATE predicate combination. The tablespace scan on 54 million rows was reading approximately 1.2 million data pages just to find roughly 3,500 wire transfers from the current day.
Step 4: Hypothesize a Fix
Janelle considered two options:
Option A: Create an index on (TRANS_TYPE, TRANS_DATE). This would allow an index scan with MATCHCOLS = 2, positioning directly to WIRE transfers on the current date.
Option B: Create an index on (TRANS_DATE, TRANS_TYPE). This would also allow MATCHCOLS = 2, but with TRANS_DATE as the leading column.
She chose Option A — (TRANS_TYPE, TRANS_DATE) — for a specific reason. TRANS_TYPE = 'WIRE' is an equality predicate with very few distinct values (about 8 transaction types). TRANS_DATE = CURRENT DATE is also an equality predicate. With both columns as equality predicates, either column order would give MATCHCOLS = 2. However, (TRANS_TYPE, TRANS_DATE) would cluster all WIRE entries together in the index, making the range of index entries to scan smaller and more contiguous.
She also considered adding ACCT_ID and TRANS_AMT as include columns for index-only access:
CREATE INDEX MERIDIAN.IX_TRANS_TYPE_DATE
ON MERIDIAN.TRANSACTIONS(TRANS_TYPE, TRANS_DATE)
INCLUDE(ACCT_ID, TRANS_AMT);
With ACCT_ID and TRANS_AMT included, DB2 could satisfy the entire TRANSACTIONS access from the index alone, without touching any data pages.
Step 5: Test the Fix
Janelle created the index in a test environment, ran RUNSTATS, and re-explained:
EXPLAIN PLAN SET QUERYNO = 5002 FOR <same query>;
The Optimized Plan
| PLANNO | METHOD | TNAME | ACCESSTYPE | MATCHCOLS | ACCESSNAME | INDEXONLY | PREFETCH |
|---|---|---|---|---|---|---|---|
| 1 | 0 | TRANSACTIONS | I | 2 | IX_TRANS_TYPE_DATE | Y | — |
| 2 | 1 | ACCOUNTS | I | 1 | IX_ACCT_PK | N | — |
| 3 | 1 | BRANCHES | I | 1 | IX_BRANCH_PK | N | — |
Statement cost: PROCSU = 42,000, COST_CATEGORY = A
Step 6: Verify the Improvement
The comparison was dramatic:
| Metric | Before | After | Improvement |
|---|---|---|---|
| ACCESSTYPE on TRANSACTIONS | R (table scan) | I (index scan) | Targeted access |
| MATCHCOLS | 0 | 2 | Full matching |
| INDEXONLY | N | Y | No data page access |
| Estimated cost (PROCSU) | 485,000 | 42,000 | 11.5x reduction |
| Actual elapsed time | 8 min 12 sec | 0.7 sec | ~700x improvement |
The actual improvement exceeded the optimizer's estimate because the index-only access also eliminated all buffer pool misses on the TRANSACTIONS data pages, which had been the dominant I/O cost.
Step 7: Deploy and Monitor
Janelle created the index in production during the next maintenance window:
CREATE INDEX MERIDIAN.IX_TRANS_TYPE_DATE
ON MERIDIAN.TRANSACTIONS(TRANS_TYPE, TRANS_DATE)
INCLUDE(ACCT_ID, TRANS_AMT);
-- RUNSTATS on the new index
-- (z/OS RUNSTATS utility)
She then rebound the package containing the query to pick up the new index, and the batch job completed in under 2 seconds that night.
Lessons Learned
-
Always EXPLAIN first. Janelle spent 5 minutes reading the EXPLAIN output and immediately identified the problem. Without EXPLAIN, she might have spent hours trying different approaches blindly.
-
Index-only access is transformative. The INCLUDE columns (ACCT_ID, TRANS_AMT) cost virtually nothing in terms of index maintenance overhead but eliminated all data page I/O for this query.
-
Column order matters, but both equality predicates give full MATCHCOLS. When both predicates are equalities, either column order in the index gives MATCHCOLS = 2. The choice between them depends on other queries that might use the same index.
-
Optimizer estimates are directional, not exact. The optimizer predicted an 11.5x improvement; the actual improvement was over 700x. The optimizer does not account for buffer pool miss ratios, system load, or I/O scheduling effects. But it correctly identified that the new plan was dramatically better.
-
Data growth changes everything. A plan that was acceptable at 18 million rows became intolerable at 54 million rows. Proactive EXPLAIN monitoring after major data loads can prevent these surprises.
Discussion Questions
- Why did Janelle choose to include ACCT_ID and TRANS_AMT in the index rather than adding them as key columns?
- If the query also filtered by BRANCH_ID (adding
AND A.BRANCH_ID = 'BR-0042'), would the same index help? Why or why not? - What maintenance burden does the new index add, and is it justified for a query that runs once per day?
- If TRANS_TYPE had 1,000 distinct values instead of 8, would the index design change? How?