Chapter 22: Quiz — How the DB2 Optimizer Works

DataField.Dev

Chapter 22: Quiz — How the DB2 Optimizer Works

Test your understanding of the DB2 optimizer. Select the best answer for each question.

Question 1. What type of optimization has DB2 used since its initial release?

A) Rule-based optimization B) Cost-based optimization C) Heuristic-only optimization D) Adaptive optimization without statistics

Question 2. In DB2's cost-based optimizer, what does the "cost" primarily represent?

A) The exact elapsed time in seconds B) The dollar cost of hardware resources consumed C) A relative metric blending estimated I/O and CPU cost D) The number of rows returned by the query

Question 3. What is a filter factor?

A) The number of rows that match a predicate B) The fraction of rows in a table that satisfy a given predicate (between 0 and 1) C) The percentage of pages in a buffer pool that are dirty D) The ratio of index pages to data pages

Question 4. Under the uniform distribution assumption, what is the filter factor for WHERE BRANCH_ID = 7 if the BRANCH_ID column has COLCARD = 50?

A) 0.50 B) 0.07 C) 0.02 D) 0.005

Question 5. Which catalog statistics are used to estimate filter factors for range predicates?

A) COLCARD and NLEVELS B) HIGH2KEY and LOW2KEY C) FULLKEYCARD and FIRSTKEYCARD D) NPAGES and CARDF

Question 6. When combining two AND predicates, how does the optimizer compute the combined filter factor under the independence assumption?

A) FF(A) + FF(B) B) FF(A) x FF(B) C) MAX(FF(A), FF(B)) D) MIN(FF(A), FF(B))

Question 7. What is the primary risk of the independence assumption for compound predicates?

A) It always overestimates the number of qualifying rows B) It cannot handle more than two predicates C) It can severely mis-estimate cardinality when columns are correlated D) It only works with equality predicates

Question 8. Which DB2 feature helps the optimizer accurately estimate filter factors for skewed column values?

A) Buffer pool tuning B) Frequency value statistics (FREQVAL) C) Increasing the sort heap D) Enabling parallel query

Question 9. In which scenario is a table scan most likely to be the optimal access path?

A) A highly selective query on an indexed column B) A query that matches only 5 rows in a 10-million-row table C) A query where the predicate matches 95% of rows D) A query using an index-only access path

Question 10. What is "index-only access"?

A) Using an index with only one key column B) Accessing only the index leaf pages, without reading any data pages, because all needed columns are in the index C) Using only one index when multiple indexes are available D) An index access path that retrieves only the first row

Question 11. What does list prefetch accomplish?

A) It pre-reads index pages before they are needed B) It collects qualifying RIDs from an index, sorts them by page number, and reads data pages in sequential order C) It prefetches rows from a joined table before the join starts D) It caches query results for future identical queries

Question 12. Which join method is most efficient when the outer table is very small and an index exists on the inner table's join column?

A) Hash join B) Merge scan join C) Nested loop join D) Cartesian join

Question 13. When is hash join typically the best choice?

A) When one input has exactly one row B) When both inputs are large, no useful index exists on the join column, and an equi-join is used C) When both inputs are already sorted on the join column D) When the join is a non-equi-join (theta join)

Question 14. On z/OS, what is the difference between Stage 1 and Stage 2 predicates?

A) Stage 1 predicates use indexes; Stage 2 predicates use table scans B) Stage 1 predicates are evaluated by the Data Manager close to the data; Stage 2 predicates are evaluated higher in the processing chain after row materialization C) Stage 1 predicates are simple; Stage 2 predicates are complex D) Stage 1 predicates run first chronologically; Stage 2 predicates run second

Question 15. Which of the following predicates is classified as Stage 2 (non-sargable) on z/OS?

A) WHERE BALANCE > 5000 B) WHERE STATUS = 'ACTIVE' C) WHERE YEAR(TXN_DATE) = 2024 D) WHERE TXN_DATE BETWEEN '2024-01-01' AND '2024-12-31'

Question 16. What is the correct Stage 1 rewrite of WHERE YEAR(TXN_DATE) = 2024?

A) WHERE TXN_DATE = 2024 B) WHERE TXN_DATE >= '2024-01-01' AND TXN_DATE < '2025-01-01' C) WHERE EXTRACT(YEAR FROM TXN_DATE) = 2024 D) WHERE TXN_DATE LIKE '2024%'

Question 17. What is MQT routing?

A) Routing queries to a different database server B) The optimizer automatically directing a query to a Materialized Query Table instead of the base tables when the MQT can satisfy the query C) Creating a new MQT at query execution time D) Using an MQT as an index for the base table

Question 18. What is the single most common cause of suboptimal access paths in production DB2 systems?

A) Insufficient memory B) Too many indexes C) Stale catalog statistics D) Network latency

Question 19. After running RUNSTATS on a table, what should you do before issuing REBIND on static SQL packages?

A) Drop and recreate all indexes B) Verify that the new statistics are reasonable and reflect actual data characteristics C) Restart the DB2 subsystem D) Delete the dynamic statement cache

Question 20. What does the PLANMGMT(EXTENDED) option on z/OS provide?

A) It creates an execution plan for extended SQL syntax B) It keeps copies of previous access plans so you can switch back if the new plan regresses C) It extends the optimization time limit D) It manages the plan table automatically

Question 21. What does OPTIMIZE FOR 20 ROWS tell the optimizer?

A) The query will return exactly 20 rows B) The optimizer should choose a plan optimized for quickly returning the first 20 rows rather than all rows C) Only 20 rows should be stored in the buffer pool D) The query should use 20 parallel threads

Question 22. When is REOPT(ONCE) most beneficial?

A) For static SQL with literal values B) For queries with parameter markers where the actual values significantly affect filter factors and the query is executed multiple times with similar values C) For DDL statements D) For queries that always perform table scans

Question 23. On LUW, what does the CURRENT QUERY OPTIMIZATION special register control?

A) The maximum number of rows the optimizer will process B) The level of effort the optimizer spends exploring the search space for candidate plans C) Whether the optimizer uses cost-based or rule-based optimization D) The maximum elapsed time for query execution

Question 24. Why do access paths sometimes change after a DB2 version migration?

A) The new version always uses different indexes B) The new version may have updated cost formulas, new access path types, and new query rewrite rules C) Statistics are automatically deleted during migration D) The new version forces all queries to use table scans initially

Question 25. What is the "threshold concept" of optimizer understanding described in this chapter?

A) The optimizer has a cost threshold above which it rejects plans B) The optimizer makes statistical decisions based on a model of the data (catalog statistics), not the actual data at optimization time — and when the model diverges from reality, plans diverge from optimal C) There is a threshold number of rows below which the optimizer always uses index access D) The optimizer has a threshold for the maximum number of tables it can join

Answer Key

B — DB2 has used cost-based optimization since its initial release in 1983.
C — Cost is a relative metric (timerons) blending estimated I/O and CPU cost; it does not predict exact elapsed time.
B — A filter factor is the fraction of rows satisfying a predicate, ranging from 0 to 1.
C — FF = 1/COLCARD = 1/50 = 0.02.
B — HIGH2KEY and LOW2KEY define the value range for range predicate estimation.
B — Under independence, FF(A AND B) = FF(A) x FF(B).
C — Correlated columns cause the independence assumption to severely mis-estimate cardinality.
B — Frequency value statistics capture the actual occurrence count for the most common values.
C — When 95% of rows match, an index adds overhead without meaningful filtering; a sequential scan is cheaper.
B — Index-only access satisfies the query entirely from index leaf pages without reading data pages.
B — List prefetch collects RIDs, sorts by page number, and reads data pages sequentially.
C — Nested loop join is ideal when the outer is tiny and the inner has an efficient index probe.
B — Hash join excels for large-to-large equi-joins without useful indexes.
B — Stage 1 is evaluated by the Data Manager at the page level; Stage 2 is evaluated after row materialization in the Relational Data System.
C — YEAR() is a function on a column, making it non-sargable (Stage 2).
B — Rewrite as a range predicate: TXN_DATE >= '2024-01-01' AND TXN_DATE < '2025-01-01'.
B — MQT routing automatically directs qualifying queries to a precomputed Materialized Query Table.
C — Stale statistics are the most common cause of suboptimal access paths.
B — Verify that the new statistics are reasonable before REBIND to avoid plan regressions.
B — PLANMGMT(EXTENDED) retains previous plans, enabling rollback with SWITCH(PREVIOUS).
B — It biases the optimizer toward plans that return the first rows quickly.
B — REOPT(ONCE) reoptimizes with actual parameter values on first execution, then reuses the plan.
B — It controls the optimization level (0-9), determining how thoroughly the optimizer searches the plan space.
B — New DB2 versions bring updated cost models, new access methods, and new rewrite rules.
B — The optimizer works from a statistical model, not from the actual data. Model accuracy determines plan quality.