Chapter 24: SQL Tuning — Rewriting Queries for Performance Without Changing Results

"The fastest query is the one that does the least work. Your job is to figure out where it is doing work it does not need to do."

You have spent the last two chapters learning how the DB2 optimizer thinks. You can read access paths, interpret EXPLAIN output, and understand what the optimizer is trying to accomplish when it builds a plan for your SQL. That knowledge was foundational — but it was also passive. You were reading plans, not changing them.

This chapter is where you become active. SQL tuning is the craft of rewriting queries so that the optimizer can find dramatically better plans — without changing a single row in the result set. It is, in many ways, the highest-leverage skill a DBA or application developer can possess. A single rewrite of a badly written query can reduce a five-minute batch job to five seconds. A sargable predicate pattern can turn a tablespace scan into an index-only access path. A well-placed EXISTS clause can eliminate millions of rows that an IN subquery would have materialized into a temporary table.

But here is the part that many people get wrong: SQL tuning is not about applying tricks. It is about understanding why a query is slow, forming a hypothesis about what would make it faster, testing that hypothesis with EXPLAIN and measurement, and iterating until you have proof. The tuning mindset matters more than any individual technique.

Throughout this chapter, we will apply every technique to the Meridian National Bank schema. The bank has real problems — a monthly statement generation batch that runs too long, an account lookup that frustrates tellers, a transaction search that times out for high-volume customers. By the end of this chapter, you will have rewritten all of them.


24.1 The SQL Tuning Mindset

24.1.1 Measure First, Tune Second

The single most important rule of SQL tuning is this: never tune blind. Before you change a single character of SQL, you must know three things:

  1. How long does the query currently take? Not "it feels slow" — actual elapsed time, CPU time, and I/O counts.
  2. What is the current access path? Run EXPLAIN. Read the plan. Identify where the time is going.
  3. What does the data look like? How many rows in each table? How selective are the predicates? What is the cardinality of the join columns?

Without this baseline, you cannot know whether your rewrite helped. You might make the query faster for your test data but slower for production volumes. You might eliminate one bottleneck and introduce another. Measurement is not optional — it is the foundation.

On z/OS, your primary tools for baseline measurement are:

-- Capture EXPLAIN for the original query
EXPLAIN PLAN SET QUERYNO = 2401 FOR
SELECT c.customer_name, a.account_number, t.txn_amount
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
JOIN   TRANSACTIONS t ON t.account_id = a.account_id
WHERE  t.txn_date BETWEEN '2025-01-01' AND '2025-01-31'
  AND  c.branch_id = 101;

On LUW, use db2exfmt or the EXPLAIN_STATEMENT procedure, and supplement with db2batch for timing:

# Capture timing and EXPLAIN simultaneously
db2batch -d MERIDIAN -f original_query.sql -r explain_original.txt

24.1.2 The Tuning Cycle

SQL tuning follows a disciplined cycle:

  1. Identify the problem query (from monitoring, user complaints, or batch overruns).
  2. Baseline the current performance (elapsed time, getpages, I/O, CPU).
  3. Analyze the EXPLAIN output (access path, join method, sort operations, estimated costs).
  4. Hypothesize what change will improve performance (new index? rewrite predicate? change join order?).
  5. Implement the change (rewrite SQL, add index, update statistics).
  6. Measure again with the same data and workload.
  7. Validate that results are identical (critical — a rewrite that returns different results is a bug, not a tune).
  8. Document what you changed and why, including before/after metrics.

Steps 6 and 7 are where most people cut corners. Do not. A tuning change that was never measured is just a guess with extra steps. And a tuning change that silently changed the result set is a production incident waiting to happen.
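Step 7 can be automated. The sketch below is a minimal result-set comparison harness, illustrated with SQLite through Python's sqlite3 module (the table and queries are invented stand-ins, not Meridian schema objects). It sorts both result sets so that row-order differences cannot mask a genuine mismatch.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ACCOUNTS (account_id INTEGER, customer_id INTEGER, balance REAL);
INSERT INTO ACCOUNTS VALUES (1, 100, 500.0), (2, 100, 900.0), (3, 200, 50.0);
""")

def result_set(sql):
    # Sort the rows so that ordering differences cannot mask identical results
    return sorted(con.execute(sql).fetchall())

original  = "SELECT customer_id FROM ACCOUNTS WHERE balance + 0 > 100"
rewritten = "SELECT customer_id FROM ACCOUNTS WHERE balance > 100"

assert result_set(original) == result_set(rewritten), "rewrite changed the result set"
```

In practice you would run both statements against the same DB2 test database; for very large result sets, compare row counts plus checksums, or use an EXCEPT query in each direction.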

24.1.3 When Not to Tune SQL

Not every slow query needs a SQL rewrite. Before you start rewriting, check:

  • Are statistics current? Stale RUNSTATS can cause the optimizer to choose terrible plans. Update statistics first and re-EXPLAIN.
  • Are the right indexes in place? Sometimes the query is fine — it just needs an index that does not exist yet. We covered index design in Chapter 15.
  • Is the system under resource pressure? A query that is slow because the buffer pool is thrashing or the CPU is saturated will not get faster from a SQL rewrite. Check system health first (Chapter 27).
  • Is the data model the problem? If a query requires joining eight tables to answer a simple question, the real fix might be denormalization or a materialized query table.

SQL tuning is the right answer when the access path itself is suboptimal — when the optimizer is scanning where it should seek, sorting where it should use an index, or materializing a temporary result that could be avoided.


24.2 Sargable vs Non-Sargable Predicates

24.2.1 What "Sargable" Means

The term "sargable" comes from Search ARGument ABLE — a predicate is sargable if the database engine can use an index to evaluate it. This is the single most impactful concept in SQL tuning, because it determines whether a query can use index access or must fall back to a scan.

The rule is deceptively simple: a predicate is sargable when the column appears alone on one side of the comparison operator, unmodified by any function or expression.

-- SARGABLE: column alone on the left
WHERE txn_date >= '2025-01-01'
WHERE customer_id = 12345
WHERE account_type IN ('CHK', 'SAV')
WHERE customer_name LIKE 'SMITH%'

-- NON-SARGABLE: column wrapped in a function or expression
WHERE YEAR(txn_date) = 2025
WHERE customer_id + 0 = 12345
WHERE UPPER(customer_name) = 'SMITH'
WHERE SUBSTR(account_number, 1, 3) = 'CHK'
WHERE txn_amount * 1.1 > 1000
WHERE customer_name LIKE '%SMITH%'

When you wrap a column in a function, the optimizer cannot do a range lookup in an index B-tree. It must read every row (or every index entry), apply the function, and then check the result. The cost difference between sargable and non-sargable is often orders of magnitude.

24.2.2 Common Non-Sargable Patterns and Their Rewrites

Pattern 1: Function on a date column

-- NON-SARGABLE: YEAR() on column
WHERE YEAR(txn_date) = 2025 AND MONTH(txn_date) = 6

-- SARGABLE REWRITE: range predicate
WHERE txn_date >= '2025-06-01' AND txn_date < '2025-07-01'

The rewrite converts a function-on-column into a range predicate that DB2 can match to an index on txn_date. On Meridian's TRANSACTIONS table with 50 million rows, this single change reduced getpages from 1.2 million to 3,400 in our testing.
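You can watch this effect directly in any engine that exposes its access path. The sketch below uses SQLite's EXPLAIN QUERY PLAN through Python's sqlite3 module as a stand-in for DB2 EXPLAIN (the table is a toy version of TRANSACTIONS, and strftime replaces YEAR, which SQLite lacks): the function-wrapped predicate produces a scan, while the range rewrite produces a B-tree search.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TRANSACTIONS (txn_id INTEGER PRIMARY KEY, txn_date TEXT, txn_amount REAL);
CREATE INDEX IX_TXN_DATE ON TRANSACTIONS(txn_date);
INSERT INTO TRANSACTIONS VALUES
  (1, '2025-05-30', 10.0), (2, '2025-06-02', 20.0), (3, '2025-07-01', 30.0);
""")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

non_sargable = ("SELECT txn_id FROM TRANSACTIONS "
                "WHERE strftime('%Y-%m', txn_date) = '2025-06'")
sargable = ("SELECT txn_id FROM TRANSACTIONS "
            "WHERE txn_date >= '2025-06-01' AND txn_date < '2025-07-01'")

print(plan(non_sargable))  # reports a SCAN: every entry read, function applied to each
print(plan(sargable))      # reports a SEARCH: a B-tree range seek
```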

Pattern 2: Function on a string column

-- NON-SARGABLE: UPPER() on column
WHERE UPPER(customer_name) = 'JOHN SMITH'

-- SARGABLE REWRITE (option A): store data in consistent case, query without function
WHERE customer_name = 'JOHN SMITH'

-- SARGABLE REWRITE (option B): use a generated column with an index [LUW]
ALTER TABLE CUSTOMERS ADD COLUMN customer_name_upper VARCHAR(100)
    GENERATED ALWAYS AS (UPPER(customer_name));
CREATE INDEX IX_CUST_NAME_UPPER ON CUSTOMERS(customer_name_upper);
-- Then query:
WHERE customer_name_upper = 'JOHN SMITH'

-- SARGABLE REWRITE (option C): use an expression-based index [LUW 11.5+]
CREATE INDEX IX_CUST_NAME_UPPER ON CUSTOMERS(UPPER(customer_name));
WHERE UPPER(customer_name) = 'JOHN SMITH'  -- now sargable with this index

On z/OS, the counterpart to option C is an index on expression, available since DB2 9 for z/OS; if your environment cannot use one, option A (data standardization) is the simplest fix.

Pattern 3: Implicit data type conversion

-- NON-SARGABLE: comparing VARCHAR column to INTEGER
WHERE account_number = 12345   -- account_number is VARCHAR

-- SARGABLE REWRITE: use matching data type
WHERE account_number = '12345'

When data types do not match, DB2 must apply an implicit conversion function. If the conversion is applied to the column (rather than the literal), the predicate becomes non-sargable. Always ensure literal types match column types.

Pattern 4: Arithmetic on the column side

-- NON-SARGABLE: arithmetic on the column
WHERE txn_amount * 1.10 > 5000
WHERE balance - 100 > minimum_balance

-- SARGABLE REWRITE: move arithmetic to the literal side
WHERE txn_amount > 5000 / 1.10        -- i.e., > 4545.45
WHERE balance > minimum_balance + 100

Pattern 5: OR predicates that defeat index usage

-- Potentially non-sargable: OR across different columns
WHERE customer_id = 100 OR branch_id = 5

-- SARGABLE REWRITE: UNION ALL (when result sets are disjoint or duplicates acceptable)
SELECT * FROM ACCOUNTS WHERE customer_id = 100
UNION ALL
SELECT * FROM ACCOUNTS WHERE branch_id = 5 AND customer_id <> 100

An OR between two columns that are indexed separately forces a scan unless the optimizer chooses index ORing (which it does not always do). Splitting into UNION ALL allows each branch to use its own index. Use UNION ALL (not UNION) to avoid an unnecessary sort for duplicate elimination, and add the exclusion predicate on the second branch to avoid duplicates.
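The equivalence of the two forms is easy to verify mechanically. A minimal check with SQLite via Python's sqlite3 (toy data, invented values), comparing sorted result sets of the OR form and the UNION ALL rewrite:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE ACCOUNTS (account_id INTEGER, customer_id INTEGER, branch_id INTEGER);
INSERT INTO ACCOUNTS VALUES (1, 100, 5), (2, 100, 7), (3, 200, 5), (4, 300, 9);
""")

or_form = ("SELECT account_id FROM ACCOUNTS "
           "WHERE customer_id = 100 OR branch_id = 5")
union_form = ("SELECT account_id FROM ACCOUNTS WHERE customer_id = 100 "
              "UNION ALL "
              "SELECT account_id FROM ACCOUNTS WHERE branch_id = 5 AND customer_id <> 100")

# Account 1 matches both branches; the <> 100 exclusion keeps it from duplicating
assert sorted(con.execute(or_form)) == sorted(con.execute(union_form)) == [(1,), (2,), (3,)]
```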

Pattern 6: LIKE with leading wildcard

-- NON-SARGABLE: leading wildcard
WHERE customer_name LIKE '%SMITH%'

-- Partially sargable (trailing wildcard only)
WHERE customer_name LIKE 'SMITH%'

-- For true substring search, consider a text search index
-- z/OS: OmniFind Text Search
-- LUW: db2text CREATE INDEX

A leading % prevents index use because the B-tree is sorted by the beginning of the string. If you need true substring search on a high-volume table, a text search index is the correct solution.

24.2.3 Testing Sargability

The definitive test for sargability is EXPLAIN. Run the original and rewritten queries and compare:

  • z/OS: Check MATCHCOLS in PLAN_TABLE. A sargable predicate contributes to MATCHCOLS > 0. A non-sargable predicate yields MATCHCOLS = 0 or falls to residual evaluation.
  • LUW: Check the "Predicates Applied" section in db2exfmt. Look for "Sargable Predicate" vs "Residual Predicate" annotations.

-- z/OS: Quick sargability check
SELECT QUERYNO, PLANNO, QBLOCKNO, TNAME, ACCESSTYPE, MATCHCOLS, ACCESSNAME
FROM   PLAN_TABLE
WHERE  QUERYNO IN (2401, 2402)  -- original vs rewrite
ORDER BY QUERYNO, PLANNO;

If MATCHCOLS increases from 0 to 1 (or higher), your rewrite successfully made the predicate sargable.


24.3 Stage 1 vs Stage 2 Predicate Classification (z/OS)

24.3.1 The Two-Stage Evaluation Model

On z/OS, DB2 evaluates predicates in two stages, and understanding this distinction is essential for tuning:

  • Stage 1 predicates are evaluated by Data Manager (DM), the component that reads data from buffer pools. Stage 1 predicates are applied as data is being read, which means non-qualifying rows are rejected immediately — they never reach the next component.
  • Stage 2 predicates are evaluated by Relational Data Services (RDS), the component above Data Manager. By the time RDS evaluates a predicate, the row has already been fetched into the buffer pool and passed up through the processing pipeline.

The performance difference is significant. A Stage 1 predicate avoids passing irrelevant rows through the entire processing stack. A Stage 2 predicate does all the work of reading the row and then discards it — wasted CPU and potentially wasted I/O.

24.3.2 Stage 1 Predicate Patterns

The following predicate forms are Stage 1:

  Predicate form                       Example                                           Stage 1?
  COL = literal                        customer_id = 100                                 Yes
  COL = host variable                  customer_id = :hv1                                Yes
  COL > literal (also >=, <, <=)       txn_date > '2025-01-01'                           Yes
  COL BETWEEN literal AND literal      txn_date BETWEEN '2025-01-01' AND '2025-12-31'    Yes
  COL IN (list)                        status IN ('A','I','C')                           Yes
  COL LIKE 'pattern' (no leading %)    name LIKE 'SM%'                                   Yes
  COL IS NULL                          close_date IS NULL                                Yes
  COL IS NOT NULL                      email IS NOT NULL                                 Yes
  COL = COL (same table)               col1 = col2                                       Yes
  COL <> literal                       status <> 'X'                                     Yes

24.3.3 Stage 2 Predicate Patterns

The following are always Stage 2:

  Predicate form                       Example                          Stage
  function(COL) = literal              YEAR(txn_date) = 2025            Stage 2
  COL = (non-correlated subquery)      col = (SELECT MAX(...))          Stage 2
  COL IN (subquery)                    cust_id IN (SELECT ...)          Stage 2
  EXISTS (subquery)                    EXISTS (SELECT 1 ...)            Stage 2
  COL LIKE '%pattern'                  name LIKE '%SMITH'               Stage 2
  COL1 + COL2 > literal                bal1 + bal2 > 10000              Stage 2
  value BETWEEN COL1 AND COL2          50 BETWEEN low AND high          Stage 2

24.3.4 Rewriting Stage 2 to Stage 1

Every Stage 2 predicate is an opportunity for a rewrite:

-- STAGE 2: function on column
WHERE YEAR(txn_date) = 2025

-- STAGE 1 rewrite: range predicate
WHERE txn_date >= '2025-01-01' AND txn_date < '2026-01-01'

-- STAGE 2: expression on column
WHERE balance + overdraft_limit > 5000

-- STAGE 1 rewrite (if overdraft_limit is a column, add a generated column)
-- Or restructure the query logic:
WHERE balance > 5000 - overdraft_limit  -- Still Stage 2 if overdraft_limit is a column
-- Better: add a generated column for the sum
ALTER TABLE ACCOUNTS ADD COLUMN effective_balance DECIMAL(15,2)
    GENERATED ALWAYS AS (balance + overdraft_limit);
CREATE INDEX IX_EFF_BAL ON ACCOUNTS(effective_balance);
-- Now:
WHERE effective_balance > 5000   -- Stage 1 with index

-- STAGE 2: IN (subquery)
WHERE a.customer_id IN (SELECT customer_id FROM VIP_CUSTOMERS)

-- Potential Stage 1 rewrite: JOIN (allows index nested loop join)
FROM ACCOUNTS a
JOIN VIP_CUSTOMERS v ON v.customer_id = a.customer_id

24.3.5 Checking Stage Classification

On z/OS, the DSN_FILTER_TABLE (available in DB2 12+) or the DSN_PREDICAT_TABLE shows how each predicate was classified:

-- Check predicate stages in the predicate table
SELECT QUERYNO, QBLOCKNO, PREDNO,
       TEXT, STAGE, INDEXABLE
FROM   DSN_PREDICAT_TABLE
WHERE  QUERYNO = 2401
ORDER BY PREDNO;

The STAGE column will show 1 or 2. The INDEXABLE column tells you whether the predicate could be used for index matching. A predicate can be Stage 1 without being indexable (if no matching index exists), but it cannot be indexable without being Stage 1.


24.4 Join Optimization

24.4.1 Join Order Matters

The order in which DB2 joins tables can dramatically affect performance. Consider a three-table join between CUSTOMERS (500,000 rows), ACCOUNTS (2,000,000 rows), and TRANSACTIONS (50,000,000 rows):

SELECT c.customer_name, a.account_number, t.txn_date, t.txn_amount
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
JOIN   TRANSACTIONS t ON t.account_id = a.account_id
WHERE  c.customer_name = 'JOHN SMITH'
  AND  t.txn_date >= '2025-06-01';

If the optimizer starts with TRANSACTIONS (50M rows) and filters by date first, it might get 5 million June transactions, then join to ACCOUNTS, then join to CUSTOMERS filtering by name. That is an enormous amount of intermediate work.

If instead it starts with CUSTOMERS, finds the one row for 'JOHN SMITH' (assuming an index on customer_name), joins to ACCOUNTS (maybe 4 accounts), then joins to TRANSACTIONS (maybe 200 transactions), the total work is trivial.
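A back-of-envelope calculation with the illustrative row counts above shows why join order dominates. This is a rough model (one "touch" per row read or probed), not a DB2 cost formula:

```python
# Rows touched starting from TRANSACTIONS: read 5M June rows, then one
# probe into ACCOUNTS and one into CUSTOMERS per surviving row (rough model)
transactions_first = 5_000_000 * 3

# Rows touched starting from CUSTOMERS: 1 customer, ~4 accounts, ~200 transactions
customers_first = 1 + 4 + 200

print(f"work ratio: roughly {transactions_first // customers_first:,}x")
```

The exact multiplier is invented, but the shape of the result is the point: starting from the most selective table shrinks the intermediate results by several orders of magnitude.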

The optimizer usually gets this right — but not always, especially when:

  • Statistics are stale or missing
  • The optimizer underestimates the selectivity of a predicate
  • The query involves complex expressions or UDFs that the optimizer cannot estimate

24.4.2 Influencing Join Order

On z/OS, you can use OPTIMIZE FOR n ROWS to hint that the optimizer should favor a nested-loop join plan (which benefits from selective driving tables):

SELECT c.customer_name, a.account_number, t.txn_date, t.txn_amount
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
JOIN   TRANSACTIONS t ON t.account_id = a.account_id
WHERE  c.customer_name = 'JOHN SMITH'
  AND  t.txn_date >= '2025-06-01'
OPTIMIZE FOR 1 ROW;

On LUW, the OPTIMIZE FOR n ROWS clause is also available. Additionally, you can use optimization profiles or guidelines to lock in a specific join order:

-- LUW: optimization guideline forcing the join order C -> A -> T.
-- Guidelines live in an XML optimization profile registered in
-- SYSTOOLS.OPT_PROFILE; DB2 does not honor Oracle-style /*+ ORDERED */ hints.
<OPTGUIDELINES>
  <NLJOIN>
    <NLJOIN>
      <ACCESS TABLE='C'/>
      <ACCESS TABLE='A'/>
    </NLJOIN>
    <ACCESS TABLE='T'/>
  </NLJOIN>
</OPTGUIDELINES>

24.4.3 Join Methods: Nested Loop, Merge, Hash

Understanding the three join methods helps you predict and influence optimizer choices:

Nested Loop Join (NLJ): For each row in the outer (driving) table, look up matching rows in the inner table, typically via an index. Best when the outer table is small (highly filtered) and the inner table has a good index on the join column.

Sort Merge Join (SMJ): Sort both tables on the join column, then merge them in a single pass. Best when both tables are large and the result set is large. Requires sorts unless both inputs are already in the right order.

Hash Join: Build a hash table from the smaller input, then probe it with the larger input. Best when one table is much smaller than the other and no useful index exists on the join column. Available on LUW; DB2 for z/OS does not implement a hash join, and its closest analogue there is the hybrid join.

Join Method Selection Heuristic:

  Is one table very   ──YES──► Nested Loop Join
  small after filtering?        (with index on inner table)
        │
        NO
        │
  Are both tables     ──YES──► Sort Merge Join
  large and sorted?            (or can be cheaply sorted)
        │
        NO
        │
  Is one table much   ──YES──► Hash Join
  smaller than the other?      (build on smaller, probe larger)
        │
        NO
        │
  ► Optimizer picks based on cost estimates

24.4.4 EXISTS vs IN

This is one of the most frequently discussed SQL tuning topics, and the conventional wisdom ("EXISTS is always faster than IN") is an oversimplification. The real answer depends on the data distribution and optimizer behavior.

-- VERSION 1: IN with subquery
SELECT c.customer_name, c.customer_id
FROM   CUSTOMERS c
WHERE  c.customer_id IN (
    SELECT a.customer_id
    FROM   ACCOUNTS a
    WHERE  a.account_type = 'CHK'
      AND  a.current_balance > 100000
);

-- VERSION 2: EXISTS with correlated subquery
SELECT c.customer_name, c.customer_id
FROM   CUSTOMERS c
WHERE  EXISTS (
    SELECT 1
    FROM   ACCOUNTS a
    WHERE  a.customer_id = c.customer_id
      AND  a.account_type = 'CHK'
      AND  a.current_balance > 100000
);

When IN is better: The subquery returns a small number of distinct values. DB2 can materialize the list once and use it for index matching on the outer table.

When EXISTS is better: The outer table is small (highly filtered by other predicates) and the inner table has an index on the correlation column. EXISTS can use early-out — it stops looking after the first match.

When JOIN is better than both: If you need data from both tables, and the join produces a one-to-one or many-to-one result, a JOIN is cleaner and often more efficient:

-- VERSION 3: JOIN (but be aware of potential duplicates if many-to-many)
SELECT DISTINCT c.customer_name, c.customer_id
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  a.account_type = 'CHK'
  AND  a.current_balance > 100000;

Meridian Bank guideline: We standardize on EXISTS for existence checks (no data needed from the subquery table) and JOIN when we need columns from both tables. We avoid IN with large subqueries (more than a few hundred values) because the optimizer may struggle with cardinality estimation.

24.4.5 Anti-Join Patterns: NOT EXISTS vs NOT IN

This is a case where the difference is not just about performance — it is about correctness:

-- DANGEROUS: NOT IN with nullable column
SELECT c.customer_name
FROM   CUSTOMERS c
WHERE  c.customer_id NOT IN (
    SELECT a.customer_id FROM ACCOUNTS a
);
-- If ANY row in ACCOUNTS has customer_id = NULL, this returns ZERO rows!

-- SAFE and typically faster: NOT EXISTS
SELECT c.customer_name
FROM   CUSTOMERS c
WHERE  NOT EXISTS (
    SELECT 1 FROM ACCOUNTS a WHERE a.customer_id = c.customer_id
);

The NOT IN null trap is one of the most insidious bugs in SQL. If the subquery can ever return a NULL value, NOT IN treats it as "unknown" for every comparison, and the entire WHERE clause evaluates to unknown — returning no rows. NOT EXISTS does not have this problem because it checks for the absence of correlating rows, not for a value comparison.

Rule at Meridian Bank: Always use NOT EXISTS instead of NOT IN. No exceptions.
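The trap is easy to demonstrate. The following sketch uses SQLite through Python's sqlite3 (the NULL semantics involved are standard SQL and match DB2's); a single account row with a NULL customer_id empties the NOT IN result, while NOT EXISTS behaves as intended.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (customer_id INTEGER, customer_name TEXT);
CREATE TABLE ACCOUNTS (account_id INTEGER, customer_id INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'ALICE'), (2, 'BOB');
INSERT INTO ACCOUNTS VALUES (10, 1), (11, NULL);  -- one row with NULL customer_id
""")

not_in = con.execute(
    "SELECT customer_name FROM CUSTOMERS "
    "WHERE customer_id NOT IN (SELECT customer_id FROM ACCOUNTS)").fetchall()

not_exists = con.execute(
    "SELECT customer_name FROM CUSTOMERS c "
    "WHERE NOT EXISTS (SELECT 1 FROM ACCOUNTS a WHERE a.customer_id = c.customer_id)"
).fetchall()

print(not_in)      # [] -- the NULL poisons every NOT IN comparison
print(not_exists)  # [('BOB',)] -- the accountless customer, as intended
```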


24.5 Subquery Optimization

24.5.1 Converting Subqueries to Joins

The optimizer can often transform a subquery into a join internally (a process called "subquery flattening" or "subquery unnesting"). But not all subqueries can be flattened — complex correlated subqueries, subqueries with DISTINCT, GROUP BY, or HAVING, and subqueries in certain positions may remain as subquery access.

When a subquery cannot be flattened, it executes once per qualifying row of the outer query. For large outer result sets, this is catastrophic. The fix is to rewrite the subquery as a join yourself:

-- SLOW: Correlated subquery for each customer's most recent transaction
SELECT c.customer_id, c.customer_name,
       (SELECT MAX(t.txn_date)
        FROM   TRANSACTIONS t
        JOIN   ACCOUNTS a ON a.account_id = t.account_id
        WHERE  a.customer_id = c.customer_id) AS last_txn_date
FROM   CUSTOMERS c
WHERE  c.branch_id = 101;

-- FASTER: Rewrite as a join with a derived table
SELECT c.customer_id, c.customer_name, lt.last_txn_date
FROM   CUSTOMERS c
LEFT JOIN (
    SELECT a.customer_id, MAX(t.txn_date) AS last_txn_date
    FROM   TRANSACTIONS t
    JOIN   ACCOUNTS a ON a.account_id = t.account_id
    GROUP BY a.customer_id
) lt ON lt.customer_id = c.customer_id
WHERE  c.branch_id = 101;

The rewrite allows the optimizer to compute the MAX(txn_date) in a single pass per customer, then join the result. The correlated version may re-execute the subquery for every row in CUSTOMERS where branch_id = 101.
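Before shipping such a rewrite, verify that the two forms agree, including for customers who have an account but no transactions at all (both forms must return NULL for them). A minimal equivalence check with SQLite via Python's sqlite3, using invented toy data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (customer_id INTEGER, branch_id INTEGER);
CREATE TABLE ACCOUNTS (account_id INTEGER, customer_id INTEGER);
CREATE TABLE TRANSACTIONS (txn_id INTEGER, account_id INTEGER, txn_date TEXT);
INSERT INTO CUSTOMERS VALUES (1, 101), (2, 101), (3, 102);
INSERT INTO ACCOUNTS VALUES (10, 1), (11, 1), (12, 2);
INSERT INTO TRANSACTIONS VALUES (100, 10, '2025-01-05'), (101, 11, '2025-03-09');
""")

correlated = """
SELECT c.customer_id,
       (SELECT MAX(t.txn_date)
        FROM   TRANSACTIONS t
        JOIN   ACCOUNTS a ON a.account_id = t.account_id
        WHERE  a.customer_id = c.customer_id) AS last_txn_date
FROM   CUSTOMERS c
WHERE  c.branch_id = 101
"""

derived = """
SELECT c.customer_id, lt.last_txn_date
FROM   CUSTOMERS c
LEFT JOIN (
    SELECT a.customer_id, MAX(t.txn_date) AS last_txn_date
    FROM   TRANSACTIONS t
    JOIN   ACCOUNTS a ON a.account_id = t.account_id
    GROUP BY a.customer_id
) lt ON lt.customer_id = c.customer_id
WHERE  c.branch_id = 101
"""

# Customer 2 has an account but no transactions: both forms must return NULL for it
assert sorted(con.execute(correlated)) == sorted(con.execute(derived))
```

Note the LEFT JOIN in the rewrite: an inner join would silently drop transaction-less customers, which is exactly the kind of result change step 7 of the tuning cycle exists to catch.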

24.5.2 EXISTS vs COUNT for Existence Checks

A surprisingly common anti-pattern:

-- SLOW: Using COUNT just to check existence
SELECT c.customer_name
FROM   CUSTOMERS c
WHERE  (SELECT COUNT(*)
        FROM   ACCOUNTS a
        WHERE  a.customer_id = c.customer_id) > 0;

-- FAST: Using EXISTS (can stop at the first match)
SELECT c.customer_name
FROM   CUSTOMERS c
WHERE  EXISTS (
    SELECT 1 FROM ACCOUNTS a WHERE a.customer_id = c.customer_id
);

COUNT(*) must examine every matching row to produce a count. If a customer has 15 accounts, it reads all 15 just to establish that "15 > 0" is true. EXISTS reads the first matching account and immediately returns TRUE. When the inner table has 1,000 matching rows per outer row, the difference approaches 1,000x.

The same principle applies to NOT EXISTS vs COUNT(*) = 0:

-- SLOW
WHERE (SELECT COUNT(*) FROM ACCOUNTS a WHERE a.customer_id = c.customer_id) = 0

-- FAST
WHERE NOT EXISTS (SELECT 1 FROM ACCOUNTS a WHERE a.customer_id = c.customer_id)
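Both forms return identical rows, which is what makes the rewrite safe. A quick equivalence check with SQLite via Python's sqlite3 on toy data:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CUSTOMERS (customer_id INTEGER, customer_name TEXT);
CREATE TABLE ACCOUNTS (account_id INTEGER, customer_id INTEGER);
INSERT INTO CUSTOMERS VALUES (1, 'ALICE'), (2, 'BOB');
INSERT INTO ACCOUNTS VALUES (10, 1), (11, 1);  -- ALICE has two accounts, BOB none
""")

count_form = ("SELECT customer_name FROM CUSTOMERS c "
              "WHERE (SELECT COUNT(*) FROM ACCOUNTS a "
              "       WHERE a.customer_id = c.customer_id) > 0")
exists_form = ("SELECT customer_name FROM CUSTOMERS c "
               "WHERE EXISTS (SELECT 1 FROM ACCOUNTS a "
               "              WHERE a.customer_id = c.customer_id)")

assert sorted(con.execute(count_form)) == sorted(con.execute(exists_form)) == [('ALICE',)]
```

The results match; only the work differs. The COUNT form tallies both of ALICE's accounts, while the EXISTS form stops after the first.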

24.5.3 Scalar Subquery Caching

DB2 can cache the results of scalar subqueries (subqueries that return a single value). If the same correlation value appears multiple times, DB2 can reuse the cached result instead of re-executing the subquery.

This is particularly effective when the outer table has many duplicate values in the correlation column. For example, if you are correlating on branch_id and there are only 50 branches, a scalar subquery might execute only 50 times instead of 500,000 times (once per customer).

-- This benefits from scalar subquery caching because branch_id has low cardinality
SELECT c.customer_id, c.customer_name,
       (SELECT b.branch_name
        FROM   BRANCHES b
        WHERE  b.branch_id = c.branch_id) AS branch_name
FROM   CUSTOMERS c;

On both z/OS and LUW, this caching happens automatically, but the cache is limited in size, so do not assume every correlation value will be retained.

However, do not rely on caching as a tuning strategy. The best approach is to rewrite the scalar subquery as a JOIN:

-- Better: explicit JOIN, no subquery caching needed
SELECT c.customer_id, c.customer_name, b.branch_name
FROM   CUSTOMERS c
JOIN   BRANCHES b ON b.branch_id = c.branch_id;

24.6 Sort Avoidance

24.6.1 Why Sorts Are Expensive

Sorts consume CPU, require memory (sort work area or temporary space), and add latency. On z/OS, sorts may require work file allocation in a work file database. On LUW, sorts consume sort heap memory and may spill to temporary tablespaces.

DB2 performs sorts for:

  • ORDER BY clauses
  • GROUP BY clauses
  • DISTINCT operations
  • UNION (not UNION ALL)
  • Merge joins
  • Some subquery evaluations
  • Index creation

The goal of sort avoidance is to structure your query and indexes so that data arrives in the required order naturally, eliminating the sort step entirely.

24.6.2 ORDER BY with Index

If an index exists that matches the ORDER BY columns in the same sequence and direction, DB2 can read data in order without sorting:

-- Index: CREATE INDEX IX_TXN_DATE ON TRANSACTIONS(txn_date)

-- This query can avoid a sort:
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  txn_date >= '2025-06-01'
ORDER BY txn_date;

-- This query CANNOT avoid the sort (different column):
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  txn_date >= '2025-06-01'
ORDER BY txn_amount;

For composite indexes, the ORDER BY must match a prefix of the index key:

-- Index: CREATE INDEX IX_TXN_ACCT_DATE ON TRANSACTIONS(account_id, txn_date)

-- Sort avoided: ORDER BY matches index prefix after equality predicate
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  account_id = 12345
ORDER BY txn_date;

-- Sort NOT avoided: ORDER BY does not follow index key sequence
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  txn_date >= '2025-06-01'
ORDER BY account_id;

The key insight: when the leading index columns are fixed by equality predicates, the ORDER BY needs to match the remaining index columns.
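The sort step is visible in any plan output. The sketch below uses SQLite's EXPLAIN QUERY PLAN via Python's sqlite3 as a stand-in for DB2 EXPLAIN (toy table, invented rows): SQLite's temp B-tree line plays the role of a DB2 sort step, and it disappears when the ORDER BY follows the index key after the equality predicate.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE TRANSACTIONS (txn_id INTEGER PRIMARY KEY, account_id INTEGER,
                           txn_date TEXT, txn_amount REAL);
CREATE INDEX IX_TXN_ACCT_DATE ON TRANSACTIONS(account_id, txn_date);
INSERT INTO TRANSACTIONS VALUES
  (1, 12345, '2025-06-01', 50.0), (2, 12345, '2025-06-03', 75.0);
""")

def plan(sql):
    # The last column of EXPLAIN QUERY PLAN output describes the access path
    return " ".join(row[3] for row in con.execute("EXPLAIN QUERY PLAN " + sql))

sort_avoided = ("SELECT txn_id FROM TRANSACTIONS "
                "WHERE account_id = 12345 ORDER BY txn_date")
sort_needed = ("SELECT txn_id FROM TRANSACTIONS "
               "WHERE txn_date >= '2025-06-01' ORDER BY txn_amount")

assert "TEMP B-TREE" not in plan(sort_avoided)  # index delivers rows already ordered
assert "TEMP B-TREE" in plan(sort_needed)       # an explicit sort step is required
```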

24.6.3 GROUP BY Optimization

GROUP BY also requires sorting (or hashing on LUW). The same index-ordering strategies apply:

-- Index: CREATE INDEX IX_TXN_ACCT_DATE ON TRANSACTIONS(account_id, txn_date)

-- Sort avoided for GROUP BY:
SELECT account_id, DATE(txn_date) AS txn_day, SUM(txn_amount) AS total
FROM   TRANSACTIONS
GROUP BY account_id, DATE(txn_date);
-- But note: DATE(txn_date) is a function, so the optimizer may not recognize the ordering.

-- Better for sort avoidance:
SELECT account_id, txn_date, SUM(txn_amount) AS total
FROM   TRANSACTIONS
GROUP BY account_id, txn_date;

24.6.4 DISTINCT Elimination

DISTINCT forces a sort to identify and remove duplicates. Often, DISTINCT is used as a crutch when the real problem is an incorrect join producing duplicates:

-- Unnecessary DISTINCT caused by a bad join
SELECT DISTINCT c.customer_id, c.customer_name
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  a.account_type IN ('CHK', 'SAV');
-- A customer with both checking and savings appears twice; DISTINCT hides the problem.

-- Better: use EXISTS (no duplicates, no sort needed)
SELECT c.customer_id, c.customer_name
FROM   CUSTOMERS c
WHERE  EXISTS (
    SELECT 1 FROM ACCOUNTS a
    WHERE  a.customer_id = c.customer_id
      AND  a.account_type IN ('CHK', 'SAV')
);

24.6.5 UNION vs UNION ALL

UNION performs a sort (or hash) to eliminate duplicates. UNION ALL does not. If you know the result sets are disjoint, or if duplicates are acceptable, always use UNION ALL:

-- SLOW: UNION forces sort for duplicate elimination
SELECT customer_id FROM CHECKING_ACCOUNTS
UNION
SELECT customer_id FROM SAVINGS_ACCOUNTS;

-- FAST: UNION ALL skips the sort (if duplicates are OK or guaranteed absent)
SELECT customer_id FROM CHECKING_ACCOUNTS
UNION ALL
SELECT customer_id FROM SAVINGS_ACCOUNTS;

If you need distinct results but want to avoid UNION's sort, you can sometimes restructure:

-- Alternative: EXISTS on a combined condition
SELECT c.customer_id
FROM   CUSTOMERS c
WHERE  EXISTS (SELECT 1 FROM CHECKING_ACCOUNTS WHERE customer_id = c.customer_id)
   OR  EXISTS (SELECT 1 FROM SAVINGS_ACCOUNTS WHERE customer_id = c.customer_id);
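The behavioral difference is easy to confirm: UNION deduplicates, UNION ALL does not. A toy check with SQLite via Python's sqlite3, mirroring the example tables:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE CHECKING_ACCOUNTS (customer_id INTEGER);
CREATE TABLE SAVINGS_ACCOUNTS (customer_id INTEGER);
INSERT INTO CHECKING_ACCOUNTS VALUES (100), (200);
INSERT INTO SAVINGS_ACCOUNTS VALUES (100), (300);   -- 100 appears in both tables
""")

union_rows = con.execute(
    "SELECT customer_id FROM CHECKING_ACCOUNTS "
    "UNION SELECT customer_id FROM SAVINGS_ACCOUNTS").fetchall()
union_all_rows = con.execute(
    "SELECT customer_id FROM CHECKING_ACCOUNTS "
    "UNION ALL SELECT customer_id FROM SAVINGS_ACCOUNTS").fetchall()

print(len(union_rows))      # 3 -- duplicate 100 eliminated, at duplicate-removal cost
print(len(union_all_rows))  # 4 -- duplicates kept, no sort needed
```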

24.7 FETCH FIRST Optimization and Pagination Patterns

24.7.1 FETCH FIRST n ROWS ONLY

The FETCH FIRST n ROWS ONLY clause tells DB2 that you only need a limited number of rows. This is a powerful optimization signal because it allows the optimizer to choose plans that favor fast initial return over total throughput:

-- Without FETCH FIRST: optimizer plans for all rows
SELECT customer_name, current_balance
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
ORDER BY current_balance DESC;

-- With FETCH FIRST: optimizer may choose an index on current_balance
SELECT customer_name, current_balance
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
ORDER BY current_balance DESC
FETCH FIRST 10 ROWS ONLY;

On z/OS, FETCH FIRST n ROWS ONLY combined with OPTIMIZE FOR n ROWS gives the optimizer the strongest signal to favor nested loop joins and index access:

SELECT customer_name, account_number, current_balance
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  c.branch_id = 101
ORDER BY a.current_balance DESC
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;

24.7.2 Offset Pagination: Simple but Dangerous

The traditional pagination pattern using OFFSET (or ROW_NUMBER) becomes increasingly expensive as the page number grows:

-- Page 1: fast (skip 0 rows)
SELECT * FROM TRANSACTIONS
WHERE account_id = 12345
ORDER BY txn_date DESC
OFFSET 0 ROWS FETCH FIRST 20 ROWS ONLY;

-- Page 100: slow (must skip 1,980 rows)
SELECT * FROM TRANSACTIONS
WHERE account_id = 12345
ORDER BY txn_date DESC
OFFSET 1980 ROWS FETCH FIRST 20 ROWS ONLY;

-- Page 10000: very slow (must skip 199,980 rows!)
SELECT * FROM TRANSACTIONS
WHERE account_id = 12345
ORDER BY txn_date DESC
OFFSET 199980 ROWS FETCH FIRST 20 ROWS ONLY;

The problem is that DB2 must read and discard all the skipped rows. For page 10,000, it reads 200,000 rows and discards 199,980 of them.

24.7.3 Keyset Pagination: The Scalable Alternative

Keyset pagination (also called "seek method" or "cursor-based pagination") uses the last row of the previous page as the starting point for the next page:

-- Page 1: start from the beginning
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  account_id = 12345
ORDER BY txn_date DESC, txn_id DESC
FETCH FIRST 20 ROWS ONLY;
-- Last row returned: txn_date = '2025-06-15', txn_id = 98765

-- Page 2: start after the last row of page 1
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  account_id = 12345
  AND  (txn_date < '2025-06-15'
       OR (txn_date = '2025-06-15' AND txn_id < 98765))
ORDER BY txn_date DESC, txn_id DESC
FETCH FIRST 20 ROWS ONLY;

This approach has constant cost regardless of page depth because DB2 can start reading directly from the right position in the index. The compound WHERE clause looks complex, but with an index on (account_id, txn_date DESC, txn_id DESC), it is highly efficient.

Simplified with row-value comparison (DB2 LUW 11.5+):

-- More readable keyset pagination using row-value comparison
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  account_id = 12345
  AND  (txn_date, txn_id) < ('2025-06-15', 98765)
ORDER BY txn_date DESC, txn_id DESC
FETCH FIRST 20 ROWS ONLY;
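Both pagination styles must return the same pages; only the cost profile differs. The sketch below checks this with SQLite via Python's sqlite3 (SQLite's LIMIT/OFFSET standing in for FETCH FIRST/OFFSET, row-value comparisons available in SQLite 3.15+, data invented): page 2 computed by keyset matches page 2 computed by offset, including the txn_id tie-break on duplicate dates.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE TRANSACTIONS (txn_id INTEGER, account_id INTEGER, txn_date TEXT)")
con.executemany(
    "INSERT INTO TRANSACTIONS VALUES (?, 12345, ?)",
    [(i, f"2025-06-{i % 30 + 1:02d}") for i in range(1, 61)])  # duplicate dates force tie-breaks

ORDER = "ORDER BY txn_date DESC, txn_id DESC"

page1 = con.execute(
    f"SELECT txn_id, txn_date FROM TRANSACTIONS WHERE account_id = 12345 {ORDER} LIMIT 20"
).fetchall()
last_id, last_date = page1[-1][0], page1[-1][1]

offset_page2 = con.execute(
    f"SELECT txn_id, txn_date FROM TRANSACTIONS WHERE account_id = 12345 "
    f"{ORDER} LIMIT 20 OFFSET 20").fetchall()
keyset_page2 = con.execute(
    f"SELECT txn_id, txn_date FROM TRANSACTIONS WHERE account_id = 12345 "
    f"AND (txn_date, txn_id) < (?, ?) {ORDER} LIMIT 20",
    (last_date, last_id)).fetchall()

assert keyset_page2 == offset_page2   # identical page, very different cost at depth
```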

24.7.4 Meridian Bank Pagination Strategy

At Meridian, the teller application uses keyset pagination for transaction history. The initial page loads in under 50ms regardless of how many transactions the account has. Without keyset pagination, customers with 10+ years of history experienced 5-10 second load times for deep pages.

-- Meridian teller application: transaction history (first page)
SELECT t.txn_id, t.txn_date, t.txn_type, t.txn_amount,
       t.description, t.running_balance
FROM   TRANSACTIONS t
WHERE  t.account_id = :account_id
ORDER BY t.txn_date DESC, t.txn_id DESC
FETCH FIRST 25 ROWS ONLY
OPTIMIZE FOR 25 ROWS;

-- Subsequent pages: pass last txn_date and txn_id from previous page
SELECT t.txn_id, t.txn_date, t.txn_type, t.txn_amount,
       t.description, t.running_balance
FROM   TRANSACTIONS t
WHERE  t.account_id = :account_id
  AND  (t.txn_date, t.txn_id) < (:last_txn_date, :last_txn_id)
ORDER BY t.txn_date DESC, t.txn_id DESC
FETCH FIRST 25 ROWS ONLY
OPTIMIZE FOR 25 ROWS;
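On the application side, the page loop simply carries the last row's key forward. A minimal Python sketch of that logic (hypothetical names; `rows` stands in for the DB2 result set, assumed to arrive sorted by (txn_date, txn_id) descending — a real program would re-execute the parameterized SQL above with :last_txn_date and :last_txn_id bound from the saved key):

```python
# Application-side keyset pagination loop (a sketch; hypothetical names).
# `rows` is an in-memory stand-in for the DB2 result set, already sorted
# by (txn_date, txn_id) DESC.

def next_page(rows, page_size, last_key=None):
    """Return one page strictly after last_key in descending key order --
    the same filter the row-value predicate expresses in SQL."""
    if last_key is not None:
        # (txn_date, txn_id) < (:last_txn_date, :last_txn_id)
        rows = [r for r in rows if (r["txn_date"], r["txn_id"]) < last_key]
    return rows[:page_size]

def paginate(rows, page_size):
    """Yield successive pages, carrying the last row's key forward."""
    last_key = None
    while True:
        page = next_page(rows, page_size, last_key)
        if not page:
            break
        yield page
        last_key = (page[-1]["txn_date"], page[-1]["txn_id"])
```

Because each page starts from a key rather than an offset, the work per page is constant no matter how deep the user scrolls.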

24.8 Materialized Query Tables (MQTs) for Query Acceleration

24.8.1 When MQTs Help

A Materialized Query Table (MQT) is a table whose content is defined by a query. DB2 stores the precomputed results and can automatically route queries to the MQT when it determines that the MQT can satisfy the query more efficiently than the base tables.

MQTs are most valuable when:

  - Complex aggregations are queried repeatedly with slight variations
  - The base tables are large but the aggregated result is small
  - The data changes infrequently relative to how often it is queried
  - Multiple queries can benefit from the same precomputed summary

MQTs are not appropriate when:

  - The base data changes frequently and staleness is unacceptable
  - The MQT would be nearly as large as the base table
  - Only one or two queries would benefit
  - The application requires real-time accuracy

24.8.2 Creating an MQT

-- z/OS and LUW: Create an MQT for daily transaction summaries
CREATE TABLE MQT_DAILY_TXN_SUMMARY AS (
    SELECT a.branch_id,
           a.account_type,
           DATE(t.txn_date) AS txn_day,
           t.txn_type,
           COUNT(*) AS txn_count,
           SUM(t.txn_amount) AS total_amount,
           AVG(t.txn_amount) AS avg_amount,
           MIN(t.txn_amount) AS min_amount,
           MAX(t.txn_amount) AS max_amount
    FROM   TRANSACTIONS t
    JOIN   ACCOUNTS a ON a.account_id = t.account_id
    GROUP BY a.branch_id, a.account_type, DATE(t.txn_date), t.txn_type
) DATA INITIALLY DEFERRED REFRESH DEFERRED
  MAINTAINED BY SYSTEM;
-- Note: DATA INITIALLY DEFERRED means the MQT is empty until first REFRESH

-- Populate the MQT
REFRESH TABLE MQT_DAILY_TXN_SUMMARY;

24.8.3 Refresh Strategies

REFRESH DEFERRED: Data is updated only when you explicitly run REFRESH TABLE. Best for batch reporting workloads where data is loaded on a schedule.

REFRESH IMMEDIATE (LUW only): DB2 updates the MQT automatically whenever the base tables change. This keeps the MQT current but adds overhead to every INSERT, UPDATE, and DELETE on the base tables. Best for small base tables or low-change environments.

-- LUW: Immediate refresh MQT
CREATE TABLE MQT_BRANCH_BALANCE_SUMMARY AS (
    SELECT branch_id,
           account_type,
           COUNT(*) AS account_count,
           SUM(current_balance) AS total_balance
    FROM   ACCOUNTS
    GROUP BY branch_id, account_type
) DATA INITIALLY DEFERRED REFRESH IMMEDIATE;

On z/OS, only REFRESH DEFERRED is available. Schedule refreshes to run after batch loads:

-- z/OS: Refresh MQT after nightly batch load
REFRESH TABLE MQT_DAILY_TXN_SUMMARY;
RUNSTATS TABLESPACE dbname.tsname TABLE(MQT_DAILY_TXN_SUMMARY) INDEX(ALL);

24.8.4 Optimizer Routing

For the optimizer to route a query to an MQT, several conditions must be met:

  1. CURRENT REFRESH AGE must be set appropriately. For REFRESH DEFERRED MQTs, the optimizer only considers them if the session allows stale data:
-- Allow the optimizer to use MQTs with stale data
SET CURRENT REFRESH AGE = ANY;     -- Use any MQT regardless of age
-- Or:
SET CURRENT REFRESH AGE = 99999999999999;  -- Same effect, explicit maximum

-- On z/OS, also need:
SET CURRENT MAINTAINED TABLE TYPES FOR OPTIMIZATION = ALL;
  2. The query must be "covered" by the MQT. The MQT must contain all the columns referenced in the query, and the grouping columns must be at the same level or a more detailed level than what the query requests.

  3. Statistics must be current on both the base tables and the MQT, so the optimizer can compare costs.

-- This query CAN be routed to MQT_DAILY_TXN_SUMMARY:
SELECT branch_id, SUM(total_amount) AS monthly_total
FROM   MQT_DAILY_TXN_SUMMARY          -- explicit reference
WHERE  txn_day BETWEEN '2025-06-01' AND '2025-06-30'
  AND  txn_type = 'D'
GROUP BY branch_id;

-- This query can ALSO be routed (implicit routing by optimizer):
SET CURRENT REFRESH AGE = ANY;
SELECT a.branch_id, SUM(t.txn_amount) AS monthly_total
FROM   TRANSACTIONS t
JOIN   ACCOUNTS a ON a.account_id = t.account_id
WHERE  DATE(t.txn_date) BETWEEN '2025-06-01' AND '2025-06-30'
  AND  t.txn_type = 'D'
GROUP BY a.branch_id;
-- The optimizer recognizes this matches the MQT definition and routes automatically.

24.8.5 Meridian Bank MQT Strategy

Meridian uses three MQTs for their reporting workload:

  1. MQT_DAILY_TXN_SUMMARY: Refreshed nightly after the batch cycle. Used by the branch management dashboard.
  2. MQT_MONTHLY_CUSTOMER_SUMMARY: Refreshed on the 1st of each month. Used by the monthly statement generation.
  3. MQT_BRANCH_PERFORMANCE: Refreshed hourly during business hours. Used by the executive dashboard.

The monthly statement generation query, which previously scanned 50 million transactions, now reads from an MQT with 1.5 million summary rows. Elapsed time dropped from 47 minutes to 3 minutes.


24.9 SQL Rewrite Cookbook

This section provides 15 before-and-after SQL rewrites with performance impact analysis. Each entry follows the same format: the problem, the slow query, the fast rewrite, and, where measured, the impact on Meridian Bank data.

Rewrite 1: Date Function to Range Predicate

-- BEFORE: Non-sargable date function (tablespace scan)
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  YEAR(txn_date) = 2025 AND MONTH(txn_date) = 1;
-- Cost: 1,200,000 getpages, 45 seconds

-- AFTER: Sargable range predicate (index range scan)
SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  txn_date >= '2025-01-01' AND txn_date < '2025-02-01';
-- Cost: 3,400 getpages, 0.2 seconds
-- Improvement: 353x fewer getpages

Rewrite 2: UPPER() Function to Expression Index

-- BEFORE: Case-insensitive search with function on column
SELECT customer_id, customer_name, email
FROM   CUSTOMERS
WHERE  UPPER(email) = 'JOHN.SMITH@EXAMPLE.COM';
-- Cost: tablespace scan, 8 seconds for 500K rows

-- AFTER (LUW): Expression-based index
CREATE INDEX IX_CUST_EMAIL_UPPER ON CUSTOMERS(UPPER(email));
-- Same query now uses the index
-- Cost: 4 getpages, 0.001 seconds

-- AFTER (z/OS): Generated column approach
ALTER TABLE CUSTOMERS ADD COLUMN email_upper VARCHAR(255)
    GENERATED ALWAYS AS (UPPER(email));
CREATE INDEX IX_CUST_EMAIL_UPPER ON CUSTOMERS(email_upper);
SELECT customer_id, customer_name, email
FROM   CUSTOMERS
WHERE  email_upper = 'JOHN.SMITH@EXAMPLE.COM';
-- Cost: 4 getpages, 0.001 seconds

Rewrite 3: COUNT(*) Existence Check to EXISTS

-- BEFORE: Count all matching rows just to check existence
SELECT c.customer_id, c.customer_name
FROM   CUSTOMERS c
WHERE  (SELECT COUNT(*) FROM TRANSACTIONS t
        JOIN ACCOUNTS a ON a.account_id = t.account_id
        WHERE a.customer_id = c.customer_id
          AND t.txn_date >= '2025-01-01') > 0;
-- Cost: correlated subquery scans transactions for every customer

-- AFTER: EXISTS with early-out
SELECT c.customer_id, c.customer_name
FROM   CUSTOMERS c
WHERE  EXISTS (
    SELECT 1 FROM TRANSACTIONS t
    JOIN   ACCOUNTS a ON a.account_id = t.account_id
    WHERE  a.customer_id = c.customer_id
      AND  t.txn_date >= '2025-01-01'
);
-- Improvement: 10-100x for customers with many transactions

Rewrite 4: NOT IN to NOT EXISTS (Correctness and Performance)

-- BEFORE: NOT IN (incorrect if NULLs exist, and slow)
SELECT c.customer_id, c.customer_name
FROM   CUSTOMERS c
WHERE  c.customer_id NOT IN (
    SELECT a.customer_id FROM ACCOUNTS a WHERE a.status = 'A'
);

-- AFTER: NOT EXISTS (correct and typically faster)
SELECT c.customer_id, c.customer_name
FROM   CUSTOMERS c
WHERE  NOT EXISTS (
    SELECT 1 FROM ACCOUNTS a
    WHERE  a.customer_id = c.customer_id AND a.status = 'A'
);

Rewrite 5: Correlated Scalar Subquery to JOIN

-- BEFORE: Scalar subquery executes once per row
SELECT a.account_id, a.account_number,
       (SELECT b.branch_name FROM BRANCHES b WHERE b.branch_id = a.branch_id) AS branch_name,
       (SELECT COUNT(*) FROM TRANSACTIONS t WHERE t.account_id = a.account_id) AS txn_count
FROM   ACCOUNTS a
WHERE  a.account_type = 'CHK';
-- Two correlated subqueries per row

-- AFTER: JOINs replace both subqueries
SELECT a.account_id, a.account_number, b.branch_name, tc.txn_count
FROM   ACCOUNTS a
JOIN   BRANCHES b ON b.branch_id = a.branch_id
LEFT JOIN (
    SELECT account_id, COUNT(*) AS txn_count
    FROM   TRANSACTIONS
    GROUP BY account_id
) tc ON tc.account_id = a.account_id
WHERE  a.account_type = 'CHK';

Rewrite 6: OR to UNION ALL

-- BEFORE: OR prevents single-index access
SELECT customer_id, customer_name, phone, email
FROM   CUSTOMERS
WHERE  phone = '555-0100' OR email = 'john@example.com';
-- May require tablespace scan if optimizer cannot use index ORing

-- AFTER: UNION ALL with each branch using its own index
SELECT customer_id, customer_name, phone, email
FROM   CUSTOMERS WHERE phone = '555-0100'
UNION ALL
SELECT customer_id, customer_name, phone, email
FROM   CUSTOMERS WHERE email = 'john@example.com'
  AND  (phone <> '555-0100' OR phone IS NULL);  -- NULL-safe de-duplication

Rewrite 7: DISTINCT Elimination via EXISTS

-- BEFORE: JOIN produces duplicates, DISTINCT sorts them away
SELECT DISTINCT c.customer_id, c.customer_name
FROM   CUSTOMERS c
JOIN   TRANSACTIONS t ON t.customer_id = c.customer_id
WHERE  t.txn_date >= '2025-01-01';

-- AFTER: EXISTS avoids duplicates in the first place
SELECT c.customer_id, c.customer_name
FROM   CUSTOMERS c
WHERE  EXISTS (
    SELECT 1 FROM TRANSACTIONS t
    WHERE  t.customer_id = c.customer_id
      AND  t.txn_date >= '2025-01-01'
);

Rewrite 8: Unnecessary ORDER BY in Subquery

-- BEFORE: ORDER BY in a subquery is meaningless and may cause an unnecessary sort
SELECT customer_id, customer_name
FROM   CUSTOMERS
WHERE  customer_id IN (
    SELECT customer_id FROM ACCOUNTS
    WHERE  current_balance > 50000
    ORDER BY current_balance DESC  -- pointless in a subquery
);

-- AFTER: Remove the ORDER BY
SELECT customer_id, customer_name
FROM   CUSTOMERS
WHERE  customer_id IN (
    SELECT customer_id FROM ACCOUNTS WHERE current_balance > 50000
);

Rewrite 9: HAVING Without GROUP BY Work

-- BEFORE: GROUP BY + HAVING when a simple WHERE suffices
SELECT account_id, COUNT(*) AS txn_count
FROM   TRANSACTIONS
GROUP BY account_id
HAVING account_id = 12345;

-- AFTER: Filter in WHERE before grouping
SELECT account_id, COUNT(*) AS txn_count
FROM   TRANSACTIONS
WHERE  account_id = 12345
GROUP BY account_id;
-- WHERE filters before GROUP BY (reduces data to group)
-- HAVING filters after GROUP BY (groups everything first)

Rewrite 10: SELECT * to Specific Columns (Index-Only Access)

-- BEFORE: SELECT * forces a table access even if index has what we need
SELECT * FROM TRANSACTIONS
WHERE  account_id = 12345 AND txn_date = '2025-06-15';

-- AFTER: Select only needed columns (may enable index-only access)
SELECT txn_id, txn_amount, txn_type
FROM   TRANSACTIONS
WHERE  account_id = 12345 AND txn_date = '2025-06-15';
-- With index on (account_id, txn_date, txn_id, txn_amount, txn_type),
-- this becomes an index-only access path

Rewrite 11: Implicit Conversion Fix

-- BEFORE: Implicit conversion (account_number is CHAR(12))
SELECT * FROM ACCOUNTS WHERE account_number = 100012345678;
-- DB2 converts every CHAR value to numeric for comparison -- tablespace scan

-- AFTER: Match the data type
SELECT * FROM ACCOUNTS WHERE account_number = '100012345678';
-- Index lookup, 3 getpages

Rewrite 12: Predicate Pushdown into Views

-- View definition:
CREATE VIEW V_ACTIVE_ACCOUNTS AS
SELECT a.*, c.customer_name
FROM   ACCOUNTS a
JOIN   CUSTOMERS c ON c.customer_id = a.customer_id
WHERE  a.status = 'A';

-- BEFORE: Query with filter that cannot be pushed into the view
SELECT * FROM V_ACTIVE_ACCOUNTS
WHERE  SUBSTR(account_number, 1, 3) = 'CHK';
-- The SUBSTR makes the predicate non-sargable, so even when merged into the view it cannot use an index

-- AFTER: Rewrite with a sargable predicate
SELECT * FROM V_ACTIVE_ACCOUNTS
WHERE  account_number LIKE 'CHK%';
-- This can be pushed down and potentially matched to an index

Rewrite 13: Multiple Single-Row Queries to Set-Based

-- BEFORE: Application issues 500 individual queries in a loop
-- (pseudocode)
FOR EACH customer_id IN :list
    SELECT customer_name, balance FROM ACCOUNTS WHERE customer_id = :cid;
END FOR

-- AFTER: Single set-based query with IN list
SELECT customer_id, customer_name, balance
FROM   ACCOUNTS
WHERE  customer_id IN (101, 102, 103, ..., 600);
-- Or use a temporary table / table-valued parameter for large lists

-- Even better on z/OS: use multi-row FETCH
DECLARE cursor1 CURSOR WITH ROWSET POSITIONING FOR
    SELECT customer_id, customer_name, balance
    FROM   ACCOUNTS
    WHERE  customer_id IN (SELECT id FROM SESSION.TEMP_IDS);
FETCH NEXT ROWSET FROM cursor1 FOR 100 ROWS INTO :array;
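The application-side batching that replaces the row-at-a-time loop can be sketched in Python (hypothetical helper names; a real program would execute each statement through its DB2 driver with the chunk values bound as parameters):

```python
# Batching a large ID list into bounded IN-list queries (a sketch).

def chunked(ids, size):
    """Split ids into lists of at most `size` elements."""
    return [ids[i:i + size] for i in range(0, len(ids), size)]

def build_in_query(chunk):
    """Return a parameterized IN-list statement plus its bind values."""
    markers = ", ".join("?" for _ in chunk)
    sql = ("SELECT customer_id, customer_name, balance "
           "FROM ACCOUNTS WHERE customer_id IN ({})").format(markers)
    return sql, list(chunk)
```

Keeping the chunk size fixed (say, 100) also keeps the number of distinct statement shapes small, which helps the dynamic statement cache.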

Rewrite 14: Unnecessary CASE in WHERE Clause

-- BEFORE: CASE expression in WHERE defeats index usage
SELECT * FROM ACCOUNTS
WHERE  CASE WHEN :search_type = 'BRANCH'
            THEN branch_id
            ELSE customer_id
       END = :search_value;

-- AFTER: Dynamic SQL or UNION ALL approach
-- Option A: Build the query dynamically in the application
-- Option B: Static SQL with explicit branches
SELECT * FROM ACCOUNTS WHERE branch_id = :search_value
    AND :search_type = 'BRANCH'
UNION ALL
SELECT * FROM ACCOUNTS WHERE customer_id = :search_value
    AND :search_type = 'CUSTOMER';

Rewrite 15: Excessive Joins Replaced by Denormalized MQT

-- BEFORE: 6-table join for a frequently-run dashboard query
SELECT b.branch_name, c.customer_name, a.account_number,
       a.current_balance, p.product_name, r.region_name
FROM   ACCOUNTS a
JOIN   CUSTOMERS c ON c.customer_id = a.customer_id
JOIN   BRANCHES b ON b.branch_id = a.branch_id
JOIN   PRODUCTS p ON p.product_id = a.product_id
JOIN   REGIONS r ON r.region_id = b.region_id
JOIN   ACCOUNT_STATUS s ON s.status_code = a.status
WHERE  r.region_name = 'NORTHEAST'
  AND  s.status_desc = 'ACTIVE';

-- AFTER: MQT that pre-joins and pre-filters
CREATE TABLE MQT_ACCOUNT_DASHBOARD AS (
    SELECT b.branch_name, c.customer_name, a.account_number,
           a.current_balance, p.product_name, r.region_name,
           s.status_desc
    FROM   ACCOUNTS a
    JOIN   CUSTOMERS c ON c.customer_id = a.customer_id
    JOIN   BRANCHES b ON b.branch_id = a.branch_id
    JOIN   PRODUCTS p ON p.product_id = a.product_id
    JOIN   REGIONS r ON r.region_id = b.region_id
    JOIN   ACCOUNT_STATUS s ON s.status_code = a.status
) DATA INITIALLY DEFERRED REFRESH DEFERRED;

-- Dashboard query now reads from MQT:
SELECT branch_name, customer_name, account_number,
       current_balance, product_name
FROM   MQT_ACCOUNT_DASHBOARD
WHERE  region_name = 'NORTHEAST' AND status_desc = 'ACTIVE';

24.10 Meridian Bank SQL Tuning

24.10.1 Case: Monthly Statement Generation

The monthly statement generation job is Meridian's most critical batch process. It runs on the 1st of each month and must complete within a 4-hour batch window. As the bank has grown, it has been creeping past the window.

Original Query (simplified):

-- Monthly statement: for each customer, summarize all transactions last month
SELECT c.customer_id, c.customer_name, c.mailing_address,
       a.account_number, a.account_type,
       (SELECT SUM(t.txn_amount) FROM TRANSACTIONS t
        WHERE t.account_id = a.account_id
          AND YEAR(t.txn_date) = YEAR(CURRENT DATE - 1 MONTH)
          AND MONTH(t.txn_date) = MONTH(CURRENT DATE - 1 MONTH)
          AND t.txn_type = 'D') AS total_deposits,
       (SELECT SUM(t.txn_amount) FROM TRANSACTIONS t
        WHERE t.account_id = a.account_id
          AND YEAR(t.txn_date) = YEAR(CURRENT DATE - 1 MONTH)
          AND MONTH(t.txn_date) = MONTH(CURRENT DATE - 1 MONTH)
          AND t.txn_type = 'W') AS total_withdrawals,
       (SELECT COUNT(*) FROM TRANSACTIONS t
        WHERE t.account_id = a.account_id
          AND YEAR(t.txn_date) = YEAR(CURRENT DATE - 1 MONTH)
          AND MONTH(t.txn_date) = MONTH(CURRENT DATE - 1 MONTH)) AS txn_count
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  a.status = 'A'
ORDER BY c.customer_id, a.account_number;

Problems identified:

  1. Three correlated subqueries per account, each scanning TRANSACTIONS
  2. YEAR() and MONTH() functions on txn_date make the predicates non-sargable
  3. No FETCH FIRST or OPTIMIZE FOR guidance

Tuned Query:

-- Step 1: Calculate the date range once
-- (In application code, compute start_date and end_date before executing)
-- :start_date = first day of previous month
-- :end_date   = first day of current month

-- Step 2: Rewrite with a single JOIN + GROUP BY replacing all subqueries
SELECT c.customer_id, c.customer_name, c.mailing_address,
       a.account_number, a.account_type,
       COALESCE(ts.total_deposits, 0)    AS total_deposits,
       COALESCE(ts.total_withdrawals, 0) AS total_withdrawals,
       COALESCE(ts.txn_count, 0)         AS txn_count
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
LEFT JOIN (
    SELECT t.account_id,
           SUM(CASE WHEN t.txn_type = 'D' THEN t.txn_amount ELSE 0 END) AS total_deposits,
           SUM(CASE WHEN t.txn_type = 'W' THEN t.txn_amount ELSE 0 END) AS total_withdrawals,
           COUNT(*) AS txn_count
    FROM   TRANSACTIONS t
    WHERE  t.txn_date >= :start_date
      AND  t.txn_date <  :end_date
    GROUP BY t.account_id
) ts ON ts.account_id = a.account_id
WHERE  a.status = 'A'
ORDER BY c.customer_id, a.account_number;

Results:

  | Metric       | Before | After      | Improvement |
  |--------------|--------|------------|-------------|
  | Elapsed time | 3h 47m | 22 minutes | 10.3x       |
  | CPU seconds  | 14,200 | 1,380      | 10.3x       |
  | Getpages     | 890M   | 62M        | 14.4x       |
  | Sorts        | 3      | 1          | 3x          |

The three correlated subqueries were replaced by a single derived table that scans TRANSACTIONS once. The date functions were replaced by sargable range predicates. The LEFT JOIN preserves accounts with no transactions in the period.
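Step 1's host variables can be derived in application code; a Python sketch (hypothetical function name) that yields the half-open interval the sargable predicates expect:

```python
# Computing :start_date and :end_date in application code (a sketch).
# The half-open interval [start_date, end_date) matches the
# txn_date >= :start_date AND txn_date < :end_date predicates.
from datetime import date

def statement_window(today):
    end_date = today.replace(day=1)              # first day of current month
    if end_date.month == 1:                      # January wraps to prior year
        start_date = end_date.replace(year=end_date.year - 1, month=12)
    else:
        start_date = end_date.replace(month=end_date.month - 1)
    return start_date, end_date
```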

24.10.2 Case: Account Lookup

The teller application searches for accounts by partial customer name. The original query was unbearably slow:

-- Original: wildcard search with function
SELECT c.customer_id, c.customer_name, a.account_number, a.current_balance
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  UPPER(c.customer_name) LIKE '%' || UPPER(:search_term) || '%'
ORDER BY c.customer_name
FETCH FIRST 20 ROWS ONLY;

Problems: UPPER() on the column prevents index use, and the leading % wildcard rules out an index range scan, so every search performs a tablespace scan of 500K customers.

Tuned approach:

-- Solution 1: If tellers always search by last name prefix
CREATE INDEX IX_CUST_NAME ON CUSTOMERS(customer_name);

SELECT c.customer_id, c.customer_name, a.account_number, a.current_balance
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  c.customer_name LIKE :search_term || '%'   -- trailing wildcard only
ORDER BY c.customer_name
FETCH FIRST 20 ROWS ONLY
OPTIMIZE FOR 20 ROWS;

-- Solution 2: For true substring search, use a full-text index
-- (DB2 Text Search on LUW), or maintain a search table of
-- trigrams for fast lookup
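The trigram search table mentioned above is straightforward to populate; a Python sketch of the trigram extraction (hypothetical function — the table itself would store (trigram, customer_id) rows indexed on the trigram column, and a search intersects the customer_ids of the search term's trigrams):

```python
# Trigram extraction for the search-table approach (a sketch).

def trigrams(text):
    """Return the set of 3-character substrings of the uppercased text."""
    s = text.upper()
    return {s[i:i + 3] for i in range(len(s) - 2)}
```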

Results: Response time dropped from 8 seconds to 40 milliseconds for prefix searches.

24.10.3 Case: Transaction Search

Customer service representatives search for specific transactions. The original query used a "universal search" design that was inherently non-sargable:

-- Original: universal search with CASE-based predicates
SELECT t.txn_id, t.txn_date, t.txn_amount, t.description
FROM   TRANSACTIONS t
WHERE  (:search_field = 'AMOUNT' AND CAST(t.txn_amount AS VARCHAR(20)) LIKE '%' || :search_value || '%')
   OR  (:search_field = 'DESC'   AND UPPER(t.description) LIKE '%' || UPPER(:search_value) || '%')
   OR  (:search_field = 'DATE'   AND CHAR(t.txn_date) = :search_value)
ORDER BY t.txn_date DESC
FETCH FIRST 50 ROWS ONLY;

Problems: Every single predicate is non-sargable. The OR structure prevents any index use. This query always performs a tablespace scan of 50 million rows.

Tuned approach: Replace the universal search with specific queries built dynamically:

-- The application builds the appropriate query based on search_field

-- When searching by amount (exact match):
SELECT t.txn_id, t.txn_date, t.txn_amount, t.description
FROM   TRANSACTIONS t
WHERE  t.account_id = :acct_id      -- always scope to account first!
  AND  t.txn_amount = :search_amount
ORDER BY t.txn_date DESC
FETCH FIRST 50 ROWS ONLY
OPTIMIZE FOR 50 ROWS;

-- When searching by date:
SELECT t.txn_id, t.txn_date, t.txn_amount, t.description
FROM   TRANSACTIONS t
WHERE  t.account_id = :acct_id
  AND  t.txn_date = :search_date
ORDER BY t.txn_id DESC
FETCH FIRST 50 ROWS ONLY
OPTIMIZE FOR 50 ROWS;

-- When searching by description (use text index or scope narrowly):
SELECT t.txn_id, t.txn_date, t.txn_amount, t.description
FROM   TRANSACTIONS t
WHERE  t.account_id = :acct_id
  AND  t.txn_date >= :start_date AND t.txn_date < :end_date  -- narrow the window
  AND  UPPER(t.description) LIKE '%' || UPPER(:search_value) || '%'
ORDER BY t.txn_date DESC
FETCH FIRST 50 ROWS ONLY;

The key insight: always scope the search to an account first (reducing 50M rows to perhaps 5,000), then apply less selective predicates within that narrowed scope. The non-sargable description search is acceptable when applied to 5,000 rows instead of 50 million.
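The dynamic construction in the application can be sketched as a simple dispatch table in Python (hypothetical names; the WHERE fragments abbreviate the full statements above, and values are always bound as parameters, never concatenated into the SQL text):

```python
# Dispatching to the right statement per search field (a sketch).

TEMPLATES = {
    "AMOUNT": ("WHERE t.account_id = ? AND t.txn_amount = ?",
               ("acct_id", "search_amount")),
    "DATE":   ("WHERE t.account_id = ? AND t.txn_date = ?",
               ("acct_id", "search_date")),
    "DESC":   ("WHERE t.account_id = ? AND t.txn_date >= ? AND t.txn_date < ?"
               " AND UPPER(t.description) LIKE '%' || UPPER(?) || '%'",
               ("acct_id", "start_date", "end_date", "search_value")),
}

def build_search(search_field, params):
    """Return (where_fragment, bind_values) for the chosen search field."""
    where, names = TEMPLATES[search_field]
    return where, tuple(params[name] for name in names)
```

Each branch is a distinct static statement shape, so every one of them can get its own sargable access path instead of sharing one OR'd plan.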


Spaced Review

From Chapter 8: Subqueries and CTEs

  1. What is the difference between a correlated and a non-correlated subquery? A non-correlated subquery can execute independently; a correlated subquery references a column from the outer query and executes once per outer row (conceptually). This distinction directly affects tuning: correlated subqueries are often candidates for rewriting as JOINs.

  2. When would you use a CTE instead of a derived table? CTEs improve readability for complex queries and allow self-referencing (recursive CTEs). From a tuning perspective, DB2 may materialize a CTE as a temporary table, which can help or hurt depending on the situation.

From Chapter 15: Index Design

  1. What makes an index "covering" for a query? A covering index includes all columns referenced by the query (in SELECT, WHERE, ORDER BY, GROUP BY) so that DB2 never needs to access the base table. This is the ultimate sort-avoidance and I/O-reduction technique.

  2. How does the order of columns in a composite index affect query performance? The leftmost columns in the index key must match equality predicates or range predicates for the index to be used effectively. ORDER BY and GROUP BY benefit when they match the index key sequence after the equality columns.

From Chapter 22: The DB2 Optimizer

  1. What is the optimizer's cost model based on? The optimizer estimates cost primarily from catalog statistics (cardinality, distribution, high/low key values) combined with knowledge of system resources (I/O speed, CPU cost per operation). Stale statistics lead to bad estimates, which lead to bad plans.

  2. Why might the optimizer choose a tablespace scan over an index access path? When the optimizer estimates that a large percentage of rows qualify, a sequential tablespace scan may actually be faster than an index access path that would require many random I/Os. This is why tuning begins with examining selectivity — a non-selective predicate may legitimately result in a scan.


Summary

SQL tuning is a disciplined practice, not a collection of tricks. The foundation is measurement: baseline performance before you change anything, and verify both correctness and improvement after every change. The most impactful techniques are often the simplest: making predicates sargable, avoiding unnecessary sorts, using EXISTS instead of COUNT for existence checks, and scoping searches narrowly before applying expensive predicates.

On z/OS, the Stage 1 vs Stage 2 predicate classification adds another dimension of optimization. Every Stage 2 predicate represents wasted work — rows that were fetched only to be discarded. Systematically identifying and rewriting Stage 2 predicates to Stage 1 is one of the highest-return activities a z/OS DBA can perform.

The SQL Rewrite Cookbook provides patterns you will encounter repeatedly throughout your career. Memorize the principles behind them, not the specific rewrites — because every query is unique, but the patterns repeat endlessly.

In the next chapter, we turn from SQL-level tuning to system-level tuning: buffer pools and memory configuration. The best SQL in the world cannot compensate for a database that does not have enough memory to cache its most frequently accessed data.