Chapter 24 Quiz: SQL Tuning — Rewriting Queries for Performance Without Changing Results
Instructions: Choose the single best answer for each multiple-choice question. For short-answer questions, write 2-4 sentences. Answers are provided at the end.
Question 1
What does the term "sargable" stand for?
- A) Structured Argument Readable
- B) Search Argument Able
- C) Sequential Argument Blockable
- D) Sort Argument Buildable
Question 2
Which of the following predicates is sargable?
- A)
WHERE YEAR(txn_date) = 2025 - B)
WHERE txn_date >= '2025-01-01' AND txn_date < '2026-01-01' - C)
WHERE SUBSTR(account_number, 1, 3) = 'CHK' - D)
WHERE current_balance * 1.1 > 5000
Question 3
On z/OS, what is the key difference between Stage 1 and Stage 2 predicates?
- A) Stage 1 predicates are evaluated by RDS; Stage 2 by Data Manager
- B) Stage 1 predicates are evaluated by Data Manager; Stage 2 by RDS
- C) Stage 1 predicates require an index; Stage 2 do not
- D) Stage 1 predicates are applied after sorting; Stage 2 before sorting
Question 4
Which predicate is classified as Stage 2 on z/OS?
- A)
WHERE customer_id = 100 - B)
WHERE status IN ('A', 'I') - C)
WHERE LENGTH(customer_name) > 20 - D)
WHERE txn_date BETWEEN '2025-01-01' AND '2025-12-31'
Question 5
Why is NOT EXISTS generally preferred over NOT IN?
- A) NOT EXISTS is always faster regardless of data distribution
- B) NOT IN returns incorrect results if the subquery returns any NULL values
- C) NOT EXISTS can use parallel processing; NOT IN cannot
- D) NOT IN requires a sort; NOT EXISTS does not
Question 6
An EXISTS subquery stops processing after finding the first matching row (early-out). In which scenario does this provide the greatest benefit?
- A) When the outer query returns 1 row and the inner table has 1 row per match
- B) When the outer query returns 1,000 rows and each has 10,000 matching inner rows
- C) When both tables have exactly 1 row
- D) When the inner table has no matching rows for any outer row
Question 7
Which join method is best when one table is very small after filtering and the inner table has an index on the join column?
- A) Hash Join
- B) Sort Merge Join
- C) Nested Loop Join
- D) Cartesian Join
Question 8
What is the primary benefit of keyset pagination over offset pagination?
- A) Keyset pagination produces results in a different order
- B) Keyset pagination has constant cost per page regardless of page depth
- C) Keyset pagination does not require an ORDER BY clause
- D) Keyset pagination automatically removes duplicate rows
Question 9
Which of the following causes an unnecessary sort that could be eliminated?
- A) ORDER BY on a column that matches the leading index key
- B) UNION ALL between two queries
- C) UNION between two queries with identical result sets
- D) GROUP BY on the leading columns of a covering index
Question 10
When does the DB2 optimizer automatically route a query to a Materialized Query Table (MQT)?
- A) Always, whenever an MQT exists for the base tables
- B) When CURRENT REFRESH AGE is set appropriately and the MQT covers the query
- C) Only when the MQT is referenced explicitly in the FROM clause
- D) When the query uses FETCH FIRST
Question 11
What is the primary risk of replacing a correlated subquery with a JOIN?
- A) The JOIN version is always slower
- B) A many-to-many relationship may produce duplicate rows that the subquery did not
- C) JOINs cannot filter on the same columns as subqueries
- D) The optimizer cannot use indexes with JOINs
Question 12
Which rewrite correctly converts a non-sargable date predicate to a sargable one?
- A)
WHERE MONTH(txn_date) = 6becomesWHERE txn_date LIKE '%-06-%' - B)
WHERE YEAR(txn_date) = 2025becomesWHERE txn_date >= '2025-01-01' AND txn_date < '2026-01-01' - C)
WHERE DATE(txn_timestamp) = '2025-06-15'becomesWHERE txn_timestamp = '2025-06-15' - D)
WHERE DAYOFWEEK(txn_date) = 1becomesWHERE txn_date IN ('2025-01-05', '2025-01-12', ...)
Question 13
A query uses SELECT DISTINCT because a JOIN produces duplicates. What is a better approach?
- A) Add more columns to the SELECT list to make rows unique
- B) Rewrite using EXISTS to avoid producing duplicates in the first place
- C) Add an ORDER BY to naturally eliminate duplicates
- D) Use UNION instead of JOIN
Question 14
In the context of SQL tuning, what should you do BEFORE rewriting any query?
- A) Add all possible indexes to the tables involved
- B) Measure current performance and examine the EXPLAIN output
- C) Convert the query to use stored procedures
- D) Increase buffer pool sizes to compensate for slow SQL
Question 15 (Short Answer)
Explain why WHERE account_number = 12345 is non-sargable when account_number is defined as CHAR(12), and how to fix it.
Question 16 (Short Answer)
A developer argues that since the optimizer should transform IN (subquery) into a join automatically, there is no need to manually rewrite subqueries. Under what circumstances is this argument incorrect?
Question 17 (Short Answer)
Explain the diminishing returns problem with offset-based pagination. At what page depth does the problem become severe for a table with 10 million rows and 20 rows per page?
Question 18 (Short Answer)
Why is UNION ALL preferred over UNION when you know the result sets are disjoint? What specific processing step does UNION ALL skip?
Question 19 (Short Answer)
A z/OS DBA finds that a predicate is Stage 1 but not indexable. Explain what this means and whether it is still beneficial compared to Stage 2.
Question 20 (Short Answer)
Describe one scenario where a Materialized Query Table (MQT) would hurt performance rather than help it.
Answer Key
Question 1: B — Search Argument Able. The term describes a predicate that the database engine can use as a search argument for an index.
Question 2: B — A range predicate with the column alone on one side is sargable. Options A, C, and D all apply functions or expressions to the column.
Question 3: B — Stage 1 predicates are evaluated by Data Manager (DM) as data is read. Stage 2 predicates are evaluated by Relational Data Services (RDS) after the row has already been fetched.
Question 4: C — LENGTH(customer_name) > 20 applies a function to the column, making it Stage 2. All other options are standard comparison patterns classified as Stage 1.
Question 5: B — If the subquery contains any NULL value, NOT IN produces an "unknown" result for every comparison, which causes the entire WHERE clause to return no rows. NOT EXISTS does not have this problem.
Question 6: B — Early-out provides the greatest benefit when each outer row has many inner matches, because EXISTS stops at the first match. With 10,000 matches per outer row, EXISTS avoids reading 9,999 rows per outer row.
Question 7: C — Nested Loop Join. When the driving (outer) table is small and the inner table has an index on the join column, NLJ performs a small number of efficient index lookups.
Question 8: B — Keyset pagination uses a WHERE clause to start reading from the right position in the index, making every page equally fast regardless of depth. Offset pagination must read and discard all preceding rows.
Question 9: C — UNION between two queries forces a sort for duplicate elimination. If the result sets are identical (complete overlap), all rows would be processed through the sort just to remove duplicates.
Question 10: B — The optimizer routes to an MQT when CURRENT REFRESH AGE allows stale data (or the MQT is current) and the MQT's definition covers the query's columns, predicates, and grouping.
Question 11: B — If the join creates a many-to-many relationship, it produces duplicate rows in the outer table that the original correlated subquery would not have produced. You may need DISTINCT or a derived table to handle this.
Question 12: B — This correctly replaces the YEAR() function with a range predicate using >= and < bounds. Option A uses LIKE on a date (not reliable). Option C changes the semantics (timestamp vs date). Option D is impractical and not a general solution.
Question 13: B — Rewriting with EXISTS avoids producing duplicates in the first place, eliminating the need for DISTINCT and its associated sort.
Question 14: B — The SQL tuning mindset requires measuring first. Without a baseline and EXPLAIN analysis, you cannot make informed decisions about what to change.
Question 15: The literal 12345 is numeric, but account_number is CHAR(12). DB2 must apply an implicit CAST to convert the CHAR column to numeric for comparison, which prevents index use. Fix: WHERE account_number = '100012345678' (using a string literal that matches the column type).
Question 16: The optimizer cannot always flatten subqueries. Complex subqueries with DISTINCT, GROUP BY, HAVING, or certain forms of correlation may remain as subquery access paths that execute once per outer row. Additionally, poor cardinality estimates can cause the optimizer to choose a suboptimal transformation. Manual rewriting gives you control.
Question 17: Offset-based pagination has O(N) cost per page because DB2 must read and discard all preceding rows. For 10 million rows at 20 per page, page 100,000 (the midpoint) requires reading 2 million rows to return 20. Pages beyond the midpoint are even worse. The problem becomes noticeable around page 1,000 (20,000 rows to skip) and severe beyond page 10,000.
Question 18: UNION ALL skips the sort (or hash) operation required for duplicate elimination. When the result sets are disjoint, there are no duplicates to eliminate, so UNION would do a sort that discards nothing — pure overhead.
Question 19: A Stage 1, non-indexable predicate is evaluated early by Data Manager (filtering rows before they pass to RDS) but cannot be used for index matching (no suitable index key exists). It is still beneficial compared to Stage 2 because it rejects non-qualifying rows earlier in the processing pipeline, saving CPU in upper-level processing.
Question 20: If the base tables are frequently updated and the MQT uses REFRESH IMMEDIATE, every INSERT, UPDATE, and DELETE on the base tables incurs additional overhead to maintain the MQT. If the MQT is queried infrequently relative to how often the base data changes, the maintenance cost exceeds the query benefit.