Chapter 15: Key Takeaways
These are the essential points from Chapter 15 that you will need throughout the rest of this book and in production environments. If you can explain each of these clearly and from memory, you have a solid foundation for index design in DB2.
Core Concepts to Remember
-
B+ trees are wide, shallow, and balanced. A B+ tree of height 3 can index millions of rows; height 4 handles billions. Each level adds one I/O operation to a lookup, and the upper levels are almost always cached in the buffer pool. The practical cost of an indexed lookup is typically 1-2 physical I/O operations regardless of table size.
-
Leaf page chaining enables efficient range scans. Once DB2 locates the first qualifying leaf page via the tree, it follows forward pointers through the linked list of leaf pages without returning to the root. A range scan that covers 100 leaf pages requires roughly 103 I/O operations (3 for tree traversal + 100 for leaf pages), not 100 separate tree traversals.
-
Clustering is the single most impactful index decision. A clustering index determines the physical order of rows in the table. Range scans on the clustering key read contiguous pages with sequential prefetch. The same query on a non-clustered column reads random pages — potentially orders of magnitude slower. You get exactly one clustering index per table. Choose based on the dominant access pattern.
-
Cluster ratio degrades over time. Inserts to a clustered table gradually scatter rows away from their ideal position. Monitor the cluster ratio in the catalog. When it drops below 80-90%, REORG the tablespace to restore physical order.
-
Index-only access eliminates the most expensive I/O. When every column referenced by a query exists in the index (as key columns or INCLUDE columns), DB2 never reads data pages. This can reduce I/O by 90% or more for queries that otherwise require many random data page reads.
-
The left prefix rule governs composite index usage. A composite index on (A, B, C) supports matching scans on A, (A, B), or (A, B, C) — but not on B alone, C alone, or (B, C). Column order is therefore a critical design decision: place equality predicate columns first, then range predicate columns, then ORDER BY columns.
-
Every index is a write-cost investment. Each index on a table adds one page write (plus possible page splits) to every INSERT, one page write per DELETE, and one page write per UPDATE of an indexed column. A table with 5 indexes performs 6 writes for every INSERT. Design indexes to justify their write cost through measurable read improvement.
-
Redundant indexes are pure waste. An index on (A) is redundant if (A, B) exists. Redundant indexes consume storage, slow writes, extend REORG time, and provide no unique benefit. Periodic index review should identify and eliminate them.
-
Functions on indexed columns prevent index use. Wrapping an indexed column in a function (UPPER, YEAR, CAST) in a WHERE clause prevents the optimizer from using the index for a matching scan. Use expression-based indexes (LUW) or generated columns (z/OS) to make function-based queries indexable.
-
Index design is workload-driven, not rule-driven. There is no formula for "the right number of indexes." The answer depends on query patterns, read/write ratios, response time requirements, and maintenance windows. Analyze the workload, design candidate indexes, estimate overhead, test, and measure.
Platform-Specific Reminders
z/OS
- DPSI vs. NPSI: Data-Partitioned Secondary Indexes provide partition independence (REORG one partition at a time) but penalize queries that lack the partition key predicate. Non-Partitioned Secondary Indexes are better for cross-partition queries but reduce partition independence.
- Generated columns: Use generated columns to create indexable expressions, since z/OS does not support expression-based indexes directly.
- CLUSTER keyword: Specify on CREATE INDEX to define the clustering index.
LUW
- INCLUDE clause: Use on unique indexes to add non-key columns for index-only access. INCLUDE columns are stored in leaf pages but do not participate in the key sort or uniqueness.
- Expression-based indexes: Create indexes directly on expressions like
UPPER(LAST_NAME)orYEAR(HIRE_DATE). - MDC (Multi-Dimensional Clustering): Allows clustering on multiple dimensions simultaneously, using block-level indexing. Powerful for data warehouse workloads with multi-dimensional filtering.
- ALLOW REVERSE SCANS: Enables efficient index scanning in both forward and reverse directions. Default in modern versions.
What to Carry Forward
As you proceed through the remaining chapters, keep these principles in active memory:
-
The optimizer depends on your indexes. In Chapter 16 (Query Optimization), you will see how the optimizer's access path decisions are directly shaped by the indexes you create. A well-designed index strategy makes the optimizer's job easier and your queries faster.
-
Statistics must be current. The optimizer uses catalog statistics (cardinality, cluster ratio, distribution) to choose access paths. Stale statistics lead to poor choices. RUNSTATS is not optional — it is part of the index lifecycle.
-
Index design is an ongoing discipline. Workloads evolve. Queries change. Data volumes grow. An index strategy that was optimal last year may be suboptimal today. Schedule periodic reviews. Monitor usage. Drop what is not needed. Add what is.
-
The Meridian Bank index strategy is your reference model. The 15 indexes across 4 tables, each with documented justification, represent a disciplined approach. When you design indexes for your own databases, follow the same methodology: analyze the workload, design with intent, document the rationale, and review periodically.
Preview of Chapter 16
In the next chapter, you will move from index design to query optimization — the process by which DB2's optimizer transforms your SQL into an execution plan. You will:
- Understand how the optimizer evaluates access paths and estimates costs
- Read and interpret EXPLAIN output on both z/OS and LUW
- Identify common performance problems in access plans
- Use optimization techniques to guide the optimizer toward better plans
- Apply optimization to the Meridian Bank query workload
The index design decisions you made in this chapter will directly influence the optimizer's choices in Chapter 16.
Return to Chapter 15 | Continue to Further Reading