Chapter 23 Exercises: Reading and Interpreting EXPLAIN

DataField.Dev

Chapter 23 Exercises: Reading and Interpreting EXPLAIN

These exercises test your ability to read EXPLAIN output, identify performance problems, and propose solutions. They range from straightforward plan reading (Section A) to advanced diagnosis and optimization (Section D).

Section A: Reading PLAN_TABLE Output (z/OS)

Exercise 1: Identify the Access Path

Given this PLAN_TABLE output:

PLANNO	METHOD	TNAME	ACCESSTYPE	MATCHCOLS	ACCESSNAME	INDEXONLY
1	0	CUSTOMERS	I	1	IX_CUST_PK	N

(a) What type of access is being performed? (b) How many index key columns are matching the predicate? (c) Why might INDEXONLY be 'N'? (d) Is this a good access plan for a primary key lookup? Why or why not?

Exercise 2: Two-Table Join Analysis

Given this PLAN_TABLE output:

PLANNO	METHOD	TNAME	ACCESSTYPE	MATCHCOLS	ACCESSNAME	INDEXONLY	SORTC_ORDERBY
1	0	ACCOUNTS	I	2	IX_ACCT_BRANCH	N	N
2	1	CUSTOMERS	I	1	IX_CUST_PK	N	Y

(a) What is the driving table? (b) What join method is used? (c) Why is SORTC_ORDERBY = 'Y' on step 2? (d) If the query returns approximately 200 rows, is the sort likely to be a performance concern?

Exercise 3: Decode the ACCESSTYPE

For each of the following ACCESSTYPE values, describe what DB2 is doing: (a) R (b) I with MATCHCOLS = 3 (c) I1 (d) M (e) N

Exercise 4: Sort Column Interpretation

A PLAN_TABLE row shows the following sort indicators:

SORTN_UNIQ	SORTN_JOIN	SORTN_ORDERBY	SORTN_GROUPBY	SORTC_UNIQ	SORTC_JOIN	SORTC_ORDERBY	SORTC_GROUPBY
N	N	N	N	N	Y	N	N

(a) What type of sort is occurring? (b) Is the sort on the "new" table or the "composite" table? (c) What join method is likely in use (given that a sort for join is occurring on the composite)? (d) How might you eliminate this sort?

Exercise 5: Prefetch Identification

Given these two PLAN_TABLE rows:

PLANNO	TNAME	ACCESSTYPE	PREFETCH
1	TRANSACTIONS	I	L
2	ACCOUNTS	R	S

(a) What type of prefetch is used for TRANSACTIONS? (b) What type of prefetch is used for ACCOUNTS? (c) What does list prefetch on an index scan suggest about the data? (d) What does sequential prefetch on a table scan indicate?

Section B: Reading LUW Operator Trees

Exercise 6: Identify the Operators

The following db2expln output shows an operator tree. Describe what each operator does:

RETURN
  |
SORT
  |
NLJOIN
 /    \
FETCH  FETCH
  |      |
IXSCAN IXSCAN

(a) What is the purpose of each operator in this tree? (b) How does data flow through the tree? (c) What is the SORT for (assuming the query has an ORDER BY clause)? (d) Why are there two FETCH operators?

Exercise 7: Index-Only Access Recognition

Consider these two operator trees. Which one uses index-only access?

Tree A:

RETURN -> IXSCAN (IX_ACCT_BRANCH)

Tree B:

RETURN -> FETCH -> IXSCAN (IX_ACCT_BRANCH)

(a) Which tree shows index-only access? (b) What type of query might produce Tree A? (c) What additional columns would need to be in the index to convert Tree B to index-only?

Exercise 8: Hash Join vs Nested Loop

You see the following two alternative plans for the same query:

Plan A:

RETURN -> HSJOIN -> (TBSCAN TRANSACTIONS, TBSCAN ACCOUNTS)
Total Cost: 45,000 timerons

Plan B:

RETURN -> NLJOIN -> (FETCH -> IXSCAN on TRANSACTIONS, TBSCAN ACCOUNTS)
Total Cost: 1,200,000 timerons

(a) Why is Plan A so much cheaper than Plan B? (b) Under what conditions might the optimizer incorrectly choose Plan B? (c) What index could potentially make a nested loop plan competitive?

Exercise 9: Multi-Operator Analysis

Analyze this operator tree:

RETURN (Total Cost: 12,500)
  |
SORT (Cost: 12,490, Est. Rows: 500,000)
  |
HSJOIN (Cost: 8,200, Est. Rows: 500,000)
 /              \
TBSCAN           FETCH (Cost: 120, Est. Rows: 300)
TRANSACTIONS       |
(Cost: 7,800)    IXSCAN
(Est. Rows:      IX_ACCT_BRANCH
 500,000)        (Est. Rows: 300)

(a) What is the most expensive operation? (b) Why is the SORT expensive? (c) What is the join method and is it appropriate? (d) Could an index on TRANSACTIONS improve this plan? (e) If the sort is for ORDER BY, what index might eliminate it?

Exercise 10: TBSCAN Diagnosis

You see a TBSCAN on the TRANSACTIONS table (50 million rows) with estimated output of 500 rows. The query predicate is:

WHERE YEAR(TRANS_DATE) = 2025 AND MONTH(TRANS_DATE) = 6

(a) Why is DB2 using a table scan despite the selective predicate? (b) Rewrite the predicate to enable index usage. (c) What index would you create for the rewritten predicate?

Section C: Diagnosing Problems ("What's Wrong with This Plan?")

Exercise 11: The Disastrous Nested Loop

PLAN_TABLE:
| PLANNO | METHOD | TNAME        | ACCESSTYPE | MATCHCOLS | ACCESSNAME |
|--------|--------|--------------|------------|-----------|------------|
| 1      | 0      | ACCOUNTS     | I          | 2         | IX_ACCT_BR |
| 2      | 1      | TRANSACTIONS | R          | 0         | —          |

The ACCOUNTS step returns approximately 5,000 rows. TRANSACTIONS has 50 million rows.

(a) What is wrong with this plan? (b) Estimate the number of rows DB2 will read from TRANSACTIONS. (c) What single change would most dramatically improve performance? (d) What index would you create?

Exercise 12: The Missing MATCHCOLS

Index IX_TRANS_COMPOSITE is defined as (BRANCH_ID, TRANS_TYPE, TRANS_DATE). The query is:

SELECT * FROM MERIDIAN.TRANSACTIONS
WHERE TRANS_TYPE = 'DEPOSIT'
  AND TRANS_DATE = '2025-06-15';

PLAN_TABLE shows: ACCESSTYPE = 'I', MATCHCOLS = 0, ACCESSNAME = IX_TRANS_COMPOSITE

(a) Why is MATCHCOLS = 0 even though two of the three index columns are in the predicate? (b) What would MATCHCOLS be if the query also included WHERE BRANCH_ID = 'BR-0042'? (c) What alternative index design would give better matching for this query pattern?

Exercise 13: The Cardinality Error

A query joins CUSTOMERS and ACCOUNTS. The optimizer estimates the join will produce 10 rows, but the actual result is 250,000 rows. Downstream, a nested loop join into TRANSACTIONS is chosen.

(a) What is the root cause of the performance problem? (b) How does the cardinality underestimate lead to a bad plan? (c) What should you check first? (d) If statistics are current, what other cause might explain the cardinality error?

Exercise 14: The Unnecessary Sort

PLAN_TABLE:
| PLANNO | METHOD | TNAME    | ACCESSTYPE | ACCESSNAME   | SORTC_ORDERBY |
|--------|--------|----------|------------|--------------|---------------|
| 1      | 0      | ACCOUNTS | I          | IX_ACCT_DATE | N             |
| 2      | 1      | CUSTOMERS| I          | IX_CUST_PK   | Y             |

The query has ORDER BY A.ACCT_OPEN_DATE. IX_ACCT_DATE is on (ACCT_OPEN_DATE).

(a) The index provides data in ACCT_OPEN_DATE order, so why does SORTC_ORDERBY = 'Y' appear? (b) Under what conditions could the sort be eliminated? (c) Does the nested loop join preserve the order from the outer table?

Exercise 15: The Phantom Index

You create a new index IX_TRANS_OPT on TRANSACTIONS(ACCT_ID, TRANS_DATE, TRANS_AMT). You re-explain the query but the optimizer still chooses the old IX_TRANS_ACCT index (which has only ACCT_ID, TRANS_DATE).

(a) List three possible reasons the optimizer did not choose the new index. (b) What is the most likely reason? (c) What step did you probably skip?

Exercise 16: Multiple Index Access

PLAN_TABLE:
| PLANNO | METHOD | TNAME        | ACCESSTYPE | ACCESSNAME       | MIXOPSEQ |
|--------|--------|--------------|------------|------------------|----------|
| 1      | 0      | TRANSACTIONS | MX         | IX_TRANS_DATE    | 1        |
| 1      | 0      | TRANSACTIONS | MX         | IX_TRANS_TYPE    | 2        |

(a) What is DB2 doing here? (b) Why are there two rows with the same PLANNO? (c) What single index could replace this multiple index access? (d) Would the replacement index always be better?

Exercise 17: Filter Factor Investigation

The DSN_PREDICAT_TABLE shows:

PLANNO	PREDNO	TYPE	FILTER_FACTOR	TEXT
1	1	I	0.5	Q1.ACCT_STATUS = 'ACTIVE'
1	2	I	0.01	Q1.BRANCH_ID = 'BR-0042'

(a) Which predicate is more selective? (b) What is the combined filter factor (assuming independence)? (c) If the ACCOUNTS table has 2,000,000 rows, how many rows does the optimizer expect after both predicates? (d) If 80% of accounts are actually ACTIVE, is the filter factor for predicate 1 correct?

Section D: Optimization Exercises

Exercise 18: Design a Covering Index

The following query runs thousands of times per day:

SELECT ACCT_ID, BALANCE
FROM MERIDIAN.ACCOUNTS
WHERE BRANCH_ID = 'BR-0042'
  AND ACCT_STATUS = 'ACTIVE';

Current plan: IXSCAN on IX_ACCT_BRANCH(BRANCH_ID, ACCT_STATUS) + FETCH. INDEXONLY = N.

(a) Design a covering index that enables index-only access. (b) What are the columns and their order? (c) What is the trade-off of the wider index? (d) Write the CREATE INDEX statement.

Exercise 19: Eliminate a Sort

The query is:

SELECT ACCT_ID, TRANS_DATE, TRANS_AMT
FROM MERIDIAN.TRANSACTIONS
WHERE ACCT_ID = 'A-0012345'
ORDER BY TRANS_DATE DESC
FETCH FIRST 10 ROWS ONLY;

The current plan shows: IXSCAN on IX_TRANS_ACCT(ACCT_ID, TRANS_DATE) + FETCH + SORT.

(a) Why is DB2 sorting when the index already includes TRANS_DATE? (b) How can the sort be eliminated? (c) Does the existing index support reverse scanning? (d) On LUW, what argument in the IXSCAN operator would tell you the scan direction?

Exercise 20: Choose the Right Join Method

For each scenario, state which join method (NLJOIN, MSJOIN, or HSJOIN) is most appropriate and why:

(a) Table A has 100 rows after filtering. Table B has 10,000,000 rows with an index on the join column. (b) Table A has 5,000,000 rows. Table B has 8,000,000 rows. Neither has an index on the join column. (c) Table A has 1,000 rows. Table B has 1,000 rows. Both have indexes on the join column. (d) Table A has 200 rows. Table B has 50,000,000 rows with no index on the join column.

Exercise 21: Rewrite for Better Plans

Rewrite each query to enable a better access plan:

(a) Original: WHERE YEAR(TRANS_DATE) = 2025 (b) Original: WHERE CUST_NAME LIKE '%Johnson%' (c) Original: WHERE ACCT_ID IN (SELECT ACCT_ID FROM TRANSACTIONS WHERE TRANS_AMT > 50000) (d) Original: WHERE CAST(BRANCH_ID AS INTEGER) = 42

Exercise 22: EXPLAIN-Driven Index Recommendation

Given this PLAN_TABLE output for a production query that runs every 15 minutes:

PLANNO	METHOD	TNAME	ACCESSTYPE	MATCHCOLS	ACCESSNAME	INDEXONLY	PREFETCH
1	0	TRANSACTIONS	R	0	—	N	S
2	1	ACCOUNTS	I	1	IX_ACCT_PK	N	—
3	1	CUSTOMERS	I	1	IX_CUST_PK	N	—

The query WHERE clause includes: T.TRANS_DATE >= CURRENT DATE - 1 DAY AND T.TRANS_TYPE = 'WIRE'

(a) What is the most significant problem? (b) Propose an index for TRANSACTIONS. (c) What ACCESSTYPE and MATCHCOLS do you expect after creating the index? (d) If TRANS_TYPE has only 5 distinct values and TRANS_DATE has 3,650 distinct values (10 years), which column should be first in the index? Why?

Exercise 23: Interpreting Cost Changes

You run EXPLAIN before and after adding an index. Here are the results:

Before: Total Cost = 45,000 timerons, I/O Cost = 42,000, CPU Cost = 3,000 After: Total Cost = 850 timerons, I/O Cost = 120, CPU Cost = 730

(a) What was the dominant cost factor before the change? (b) What is the dominant cost factor after the change? (c) By what factor did the total cost improve? (d) Does the shift from I/O-dominant to CPU-dominant make sense? Why?

Exercise 24: Complex Plan Analysis

Analyze this LUW operator tree for a monthly batch report:

RETURN (Total Cost: 2,500,000)
  |
SORT (ORDER BY BRANCH_ID) — Est. Rows: 150
  |
GRPBY (GROUP BY BRANCH_ID) — Est. Rows: 150
  |
SORT (GROUP BY BRANCH_ID) — Est. Rows: 12,000,000
  |
HSJOIN — Est. Rows: 12,000,000
 /                  \
FETCH               TBSCAN (ACCOUNTS)
  |                 Est. Rows: 2,000,000
IXSCAN
IX_TRANS_DATE
Est. Rows: 12,000,000

(a) What is the most expensive part of this plan? (b) The SORT before GRPBY processes 12 million rows to produce 150 groups. Is there a more efficient way to aggregate? (c) The TBSCAN on ACCOUNTS reads all 2 million rows. Is this necessarily bad here? (d) Could a hash-based GROUP BY avoid the 12-million-row sort? (e) What would you recommend to improve this plan?

Exercise 25: Real-World Scenario — Meridian ATM Query

Meridian's ATM system runs this query for every balance check:

SELECT A.BALANCE, A.ACCT_STATUS, C.CUST_NAME
FROM MERIDIAN.ACCOUNTS A
JOIN MERIDIAN.CUSTOMERS C ON A.CUST_ID = C.CUST_ID
WHERE A.ACCT_ID = ?

The query runs 50,000 times per hour. Current EXPLAIN shows:

PLANNO	METHOD	TNAME	ACCESSTYPE	MATCHCOLS	ACCESSNAME	INDEXONLY
1	0	ACCOUNTS	I	1	IX_ACCT_PK	N
2	1	CUSTOMERS	I	1	IX_CUST_PK	N

(a) Is this a good plan? Justify your answer. (b) Could index-only access be achieved on either table? (c) Would you recommend any changes? Consider the trade-offs. (d) What would the plan look like if IX_ACCT_PK were dropped (accidentally)?

Section E: Cross-Platform Comparison

Exercise 26: Translate z/OS to LUW

Given this z/OS PLAN_TABLE output, describe what the equivalent LUW operator tree would look like:

PLANNO	METHOD	TNAME	ACCESSTYPE	MATCHCOLS	ACCESSNAME	INDEXONLY	SORTC_ORDERBY
1	0	TRANSACTIONS	I	2	IX_TRANS_ACCT	Y	N
2	4	ACCOUNTS	R	0	—	N	N
(composite)							Y

Exercise 27: Translate LUW to z/OS

Given this LUW operator tree, describe what the z/OS PLAN_TABLE rows would look like:

RETURN
  |
NLJOIN
 /      \
FETCH    IXSCAN (IX_CUST_PK)
  |
IXSCAN (IX_ACCT_BRANCH)

Exercise 28: Platform Differences

For each of the following, explain how the concept is represented differently on z/OS vs LUW:

(a) Index-only access (b) The join method (c) Sort operations (d) Estimated cardinality (e) Total query cost

Answer Key Hints

For self-study, here are brief hints for selected exercises:

Exercise 1(c): INDEXONLY = N when the SELECT list or other clauses reference columns not in the index.
Exercise 10(a): Applying YEAR() and MONTH() functions to TRANS_DATE prevents index usage because the predicate is not sargable.
Exercise 11(c): Create an index on TRANSACTIONS(ACCT_ID) — or better, on TRANSACTIONS(ACCT_ID, TRANS_DATE) if date filtering is also needed.
Exercise 12(a): MATCHCOLS counts consecutive leading key columns with equality predicates. BRANCH_ID is the first column but has no predicate.
Exercise 15(b): Most likely reason: RUNSTATS was not executed after creating the index, so the optimizer has no statistics for it.
Exercise 18(a): CREATE INDEX IX_ACCT_BRANCH_COV ON ACCOUNTS(BRANCH_ID, ACCT_STATUS, ACCT_ID, BALANCE).
Exercise 22(d): TRANS_TYPE should be first because equality predicates on low-cardinality columns act as effective leading key filters, and TRANS_DATE as a range predicate works well as the second column.