Chapter 24 Exercises: SQL Tuning — Rewriting Queries for Performance Without Changing Results

Instructions: These exercises build from individual predicate rewrites through full query optimization. All queries should be written against the Meridian National Bank schema. For each exercise, provide both the original (slow) and rewritten (fast) query, and explain why the rewrite is faster. Where possible, describe what you would expect to see in EXPLAIN output for both versions.


Section A: Sargable Predicate Rewrites (Exercises 1-6)

Exercise 1 — Date Function Elimination

The following query retrieves all transactions from Q4 2025. Rewrite it to be sargable:

SELECT txn_id, txn_date, txn_amount
FROM   TRANSACTIONS
WHERE  YEAR(txn_date) = 2025
  AND  QUARTER(txn_date) = 4;

State what index would benefit the rewrite and predict the MATCHCOLS value.

Exercise 2 — String Function Elimination

Meridian's customer service application searches by email domain. Rewrite this query to be sargable:

SELECT customer_id, customer_name, email
FROM   CUSTOMERS
WHERE  SUBSTR(email, LOCATE('@', email) + 1) = 'meridianbank.com';

Discuss two different approaches (one using data model changes, one using an expression index on LUW) and explain which is more portable across z/OS and LUW.

Exercise 3 — Implicit Conversion

The ACCOUNT_NUMBER column is defined as CHAR(12). Identify the performance problem and fix it:

SELECT account_id, account_number, current_balance
FROM   ACCOUNTS
WHERE  account_number = 100012345678;

Exercise 4 — Arithmetic on Column

Rewrite this query so the predicate is sargable. The goal is to find accounts where the available balance (current_balance minus hold_amount) exceeds $10,000:

SELECT account_id, account_number, current_balance, hold_amount
FROM   ACCOUNTS
WHERE  current_balance - hold_amount > 10000;

Provide two solutions: one using algebraic rearrangement, and one using a generated column with an index.

Exercise 5 — OR Predicate Split

Rewrite this query using UNION ALL so each branch can use its own index. Ensure no duplicate rows in the result:

SELECT customer_id, customer_name, phone, email
FROM   CUSTOMERS
WHERE  phone = '555-0147'
   OR  email = 'jane.doe@example.com';

Exercise 6 — LIKE with Leading Wildcard

A compliance officer needs to find all transactions whose description contains the word "WIRE." Given that TRANSACTIONS has 50 million rows, propose three approaches ranked by performance, and write the SQL for each:

  1. A brute-force approach (slow but correct)
  2. A scoped approach (narrow the search first)
  3. A text search approach (if available on the platform)

Section B: Stage 1 vs Stage 2 Rewrites — z/OS (Exercises 7-10)

Exercise 7 — Predicate Stage Classification

Classify each of the following predicates as Stage 1 or Stage 2. For each Stage 2 predicate, provide a Stage 1 rewrite:

a) WHERE customer_id = 100
b) WHERE YEAR(open_date) = 2024
c) WHERE branch_id IN (SELECT branch_id FROM REGIONS WHERE region = 'EAST')
d) WHERE current_balance BETWEEN 1000 AND 5000
e) WHERE LENGTH(customer_name) > 20
f) WHERE account_type IS NOT NULL
g) WHERE txn_amount + fee_amount > 500
h) WHERE UPPER(status_code) = 'ACTIVE'

Exercise 8 — Stage 2 to Stage 1 Conversion

The following query has three Stage 2 predicates. Identify them and rewrite the entire query so that all predicates are Stage 1:

SELECT c.customer_id, c.customer_name, a.account_number
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
WHERE  YEAR(a.open_date) = 2024
  AND  c.customer_name LIKE '%JOHNSON%'
  AND  a.current_balance + a.hold_amount > 50000;

Exercise 9 — Predicate Stage Impact Measurement

Write the EXPLAIN SQL and the query against DSN_PREDICAT_TABLE that would show you the stage classification of each predicate in Exercise 8's original query. What columns would you examine?

Exercise 10 — Mixed Stage Optimization

Given this query with both Stage 1 and Stage 2 predicates, reorder and rewrite to maximize Stage 1 filtering:

SELECT t.txn_id, t.txn_date, t.txn_amount, t.description
FROM   TRANSACTIONS t
WHERE  t.account_id = 12345
  AND  DAYOFWEEK(t.txn_date) = 1          -- Sundays
  AND  t.txn_amount > 1000
  AND  UPPER(t.description) LIKE '%ATM%';

Which predicates can be made Stage 1? Which must remain Stage 2? What index would you recommend?


Section C: Join Optimization (Exercises 11-15)

Exercise 11 — JOIN Order Analysis

The following query joins four tables. Given these approximate cardinalities after filtering: - REGIONS: 8 rows (region = 'NORTHEAST') - BRANCHES: 25 rows (in NORTHEAST region) - ACCOUNTS: 200,000 rows (active accounts at those branches) - TRANSACTIONS: 5,000,000 rows (last quarter)

What is the optimal join order? Write the query with comments explaining your chosen order and the join method you expect for each step.

SELECT r.region_name, b.branch_name, a.account_number,
       SUM(t.txn_amount) AS total_amount
FROM   REGIONS r
JOIN   BRANCHES b ON b.region_id = r.region_id
JOIN   ACCOUNTS a ON a.branch_id = b.branch_id
JOIN   TRANSACTIONS t ON t.account_id = a.account_id
WHERE  r.region_name = 'NORTHEAST'
  AND  a.status = 'A'
  AND  t.txn_date >= '2025-07-01' AND t.txn_date < '2025-10-01'
GROUP BY r.region_name, b.branch_name, a.account_number;

Exercise 12 — EXISTS vs IN vs JOIN

Write three versions of this query (EXISTS, IN, JOIN) and explain which is best given that ACCOUNTS has 2 million rows and VIP_CUSTOMERS has 500 rows:

"Find all accounts belonging to VIP customers."

Exercise 13 — Anti-Join

Write a query to find all customers who have no active accounts (status = 'A'). Write it three ways (NOT IN, NOT EXISTS, LEFT JOIN ... IS NULL). Explain the correctness issue with NOT IN if customer_id can be NULL in the ACCOUNTS table.

Exercise 14 — Correlated Subquery to JOIN

Rewrite this query to eliminate the correlated subqueries:

SELECT a.account_id,
       a.account_number,
       (SELECT MAX(t.txn_date) FROM TRANSACTIONS t WHERE t.account_id = a.account_id) AS last_txn,
       (SELECT MIN(t.txn_date) FROM TRANSACTIONS t WHERE t.account_id = a.account_id) AS first_txn,
       (SELECT SUM(t.txn_amount) FROM TRANSACTIONS t WHERE t.account_id = a.account_id AND t.txn_type = 'D') AS total_deposits
FROM   ACCOUNTS a
WHERE  a.branch_id = 101;

Exercise 15 — OPTIMIZE FOR Hint

The Meridian teller application needs the first 10 matching customers quickly. Write two versions of this query — one without hints and one with OPTIMIZE FOR and FETCH FIRST. Explain what access path change you expect:

SELECT c.customer_id, c.customer_name, c.phone
FROM   CUSTOMERS c
WHERE  c.branch_id = 101
  AND  c.customer_name >= 'S'
ORDER BY c.customer_name;

Section D: Sort Avoidance (Exercises 16-18)

Exercise 16 — ORDER BY Index Alignment

Given this index:

CREATE INDEX IX_TXN_ACCT_DATE ON TRANSACTIONS(account_id, txn_date);

Which of the following queries can avoid a sort? Explain each:

-- Query A
SELECT * FROM TRANSACTIONS WHERE account_id = 100 ORDER BY txn_date;

-- Query B
SELECT * FROM TRANSACTIONS WHERE account_id = 100 ORDER BY txn_date DESC;

-- Query C
SELECT * FROM TRANSACTIONS WHERE account_id BETWEEN 100 AND 200 ORDER BY account_id, txn_date;

-- Query D
SELECT * FROM TRANSACTIONS WHERE txn_date = '2025-06-15' ORDER BY account_id;

Exercise 17 — DISTINCT Elimination

Rewrite this query to eliminate the DISTINCT without changing the result:

SELECT DISTINCT c.customer_id, c.customer_name, c.branch_id
FROM   CUSTOMERS c
JOIN   ACCOUNTS a ON a.customer_id = c.customer_id
JOIN   TRANSACTIONS t ON t.account_id = a.account_id
WHERE  t.txn_date >= '2025-01-01'
  AND  t.txn_amount > 10000;

Exercise 18 — UNION to UNION ALL

Determine whether these two result sets can overlap. If not, safely replace UNION with UNION ALL:

SELECT customer_id, customer_name, 'CHECKING' AS acct_type
FROM   CUSTOMERS
WHERE  customer_id IN (SELECT customer_id FROM ACCOUNTS WHERE account_type = 'CHK')
UNION
SELECT customer_id, customer_name, 'SAVINGS' AS acct_type
FROM   CUSTOMERS
WHERE  customer_id IN (SELECT customer_id FROM ACCOUNTS WHERE account_type = 'SAV');

Section E: Pagination and FETCH FIRST (Exercises 19-21)

Exercise 19 — Offset vs Keyset Pagination

Write two implementations for paginating through a customer's transaction history (25 rows per page): 1. Offset-based pagination (page N) 2. Keyset-based pagination (using last row from previous page)

Include the supporting index for each approach. Explain why keyset pagination is O(1) per page while offset pagination is O(N).

Exercise 20 — Top-N Query

Write an efficient query to find the 10 highest-balance checking accounts across the entire bank. Include customer name and branch name. Use FETCH FIRST and OPTIMIZE FOR. What index would make this query optimal?

Exercise 21 — Pagination with Filtering

The Meridian web application lets customers search their transactions by date range and amount range, with paginated results. Design a keyset pagination query for: - Account ID: provided by login session - Date range: user-specified start and end dates - Amount filter: minimum transaction amount - Sort: by txn_date DESC, txn_id DESC - Page size: 20 rows


Section F: MQT Design (Exercises 22-24)

Exercise 22 — MQT Creation

Create an MQT that summarizes monthly transaction volumes and amounts by branch, account type, and transaction type. Include COUNT, SUM, AVG, MIN, and MAX. Write the CREATE TABLE, REFRESH, and RUNSTATS statements.

Exercise 23 — MQT Routing

After creating the MQT from Exercise 22, write three queries against the base tables that should be automatically routed to the MQT. Include the SET CURRENT REFRESH AGE statement. Then write one query that cannot be routed, and explain why.

Exercise 24 — MQT Refresh Strategy

Meridian's data warehouse receives batch loads at 2 AM and 2 PM daily. Design a refresh strategy for three MQTs with different staleness requirements: 1. Branch performance dashboard (tolerable staleness: 1 hour during business hours) 2. Monthly regulatory report (must be exact as of month-end) 3. Customer self-service balance summary (near-real-time)


Section G: Comprehensive Tuning (Exercises 25-28)

Exercise 25 — Full Query Rewrite

The following query appears in Meridian's nightly batch for calculating interest. It currently takes 90 minutes. Identify all performance problems and rewrite it:

SELECT a.account_id, a.account_number,
       a.current_balance,
       (SELECT interest_rate FROM PRODUCTS p WHERE p.product_id = a.product_id) AS rate,
       a.current_balance * (SELECT interest_rate FROM PRODUCTS p WHERE p.product_id = a.product_id) / 365 AS daily_interest,
       (SELECT COUNT(*) FROM TRANSACTIONS t
        WHERE t.account_id = a.account_id
          AND MONTH(t.txn_date) = MONTH(CURRENT DATE)
          AND YEAR(t.txn_date) = YEAR(CURRENT DATE)) AS monthly_txn_count
FROM   ACCOUNTS a
WHERE  a.status = 'A'
  AND  a.account_type IN ('SAV', 'CD', 'MMA')
ORDER BY a.branch_id, a.account_id;

Exercise 26 — Tuning with EXPLAIN

For the original and rewritten versions of Exercise 25, write the EXPLAIN statements (for both z/OS and LUW). Describe what specific metrics you would compare in the EXPLAIN output.

Exercise 27 — Before/After Documentation

Create a tuning documentation template and fill it in for one of the rewrites from this chapter. Include: query name, business context, original SQL, original EXPLAIN summary, identified problems, rewritten SQL, new EXPLAIN summary, before/after metrics, and validation approach.

Exercise 28 — Real-World Tuning Scenario

The Meridian fraud detection system runs this query every 5 minutes to find suspicious patterns. It is consuming too much CPU. Analyze and tune it:

SELECT c.customer_id, c.customer_name, c.phone,
       t1.txn_id, t1.txn_date, t1.txn_amount, t1.description
FROM   TRANSACTIONS t1
JOIN   ACCOUNTS a ON a.account_id = t1.account_id
JOIN   CUSTOMERS c ON c.customer_id = a.customer_id
WHERE  t1.txn_amount > 10000
  AND  t1.txn_date >= CURRENT DATE
  AND  EXISTS (
    SELECT 1 FROM TRANSACTIONS t2
    WHERE  t2.account_id = t1.account_id
      AND  t2.txn_id <> t1.txn_id
      AND  t2.txn_date >= CURRENT DATE
      AND  t2.txn_amount > 10000
      AND  ABS(TIMESTAMPDIFF(4, CHAR(t2.txn_timestamp - t1.txn_timestamp))) < 60
  )
ORDER BY t1.txn_date DESC, t1.txn_amount DESC;

Consider: What index supports the outer query? Can the EXISTS subquery be optimized? What about the TIMESTAMPDIFF function?