Chapter 25 Key Takeaways: Buffer Pool and Memory Tuning
Buffer Pool Fundamentals
- The buffer pool is the most critical memory structure in DB2. It determines whether page requests are served from memory (microseconds) or disk (milliseconds) — a difference of three orders of magnitude. Everything else in performance tuning assumes the buffer pool is doing its job.
- Target hit ratios depend on workload type. OLTP pools should target 98-99%+. Batch/sequential pools at 80-90% are acceptable because prefetch handles throughput. A blended hit ratio across workload types masks problems — always measure per pool.
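As a concrete check, the per-pool hit ratio on LUW can be computed from the MON_GET_BUFFERPOOL table function. This is a monitoring sketch, not the only approach; on z/OS the same counters come from -DISPLAY BUFFERPOOL(ACTIVE) DETAIL or the statistics trace.

```sql
-- Overall hit ratio per pool (LUW). -2 = all database members.
SELECT bp_name,
       pool_data_l_reads + pool_index_l_reads AS logical_reads,
       pool_data_p_reads + pool_index_p_reads AS physical_reads,
       DECIMAL(100.0 * (1.0 -
               DOUBLE(pool_data_p_reads + pool_index_p_reads) /
               NULLIF(pool_data_l_reads + pool_index_l_reads, 0)), 5, 2)
         AS hit_ratio_pct
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t
ORDER BY bp_name;
```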
- Random (synchronous) reads are the most expensive I/O. They suspend the application thread and add disk latency directly to response time. Sequential (prefetch) reads are asynchronous and amortize I/O cost across many pages. Tuning should focus on reducing synchronous reads.
- Diminishing returns are real. Doubling a pool from 98% to 99% hit ratio cuts physical reads in half — but the absolute number is already small. The same memory might produce a larger improvement in another pool or memory consumer.
- DB2 supports four page sizes (4K, 8K, 16K, 32K). Each buffer pool serves exactly one page size. Match page size to row width and access pattern. The choice is permanent for a table space — plan carefully.
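On LUW, pairing a wide-row table space with a 32K pool might look like the sketch below (pool and table space names are illustrative). On z/OS, page size is implied by the pool's name instead (e.g. BP32K).

```sql
-- LUW sketch: a dedicated 32K pool and a matching table space.
CREATE BUFFERPOOL bp32k_data SIZE 50000 PAGESIZE 32K;

CREATE TABLESPACE ts_wide_rows
  PAGESIZE 32K
  BUFFERPOOL bp32k_data;
```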
z/OS Buffer Pool Configuration
- BP0 should not be a dumping ground. Assign application objects to dedicated pools. Keep BP0 small and reserved for the system catalog and directory.
- VPSIZE is the primary sizing lever. Size pools to hold 100% of the hot working set where possible. When the working set fits, the hit ratio approaches 100%.
- VPSEQT protects OLTP from batch scans. Set it low (5-20%) on OLTP data and index pools to prevent sequential prefetch from evicting hot random-access pages. This is the most underappreciated z/OS buffer pool parameter.
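On z/OS, both levers are set with console commands (the leading hyphen is the command prefix; the values shown are illustrative):

```
-ALTER BUFFERPOOL(BP1) VPSIZE(200000)
-ALTER BUFFERPOOL(BP1) VPSEQT(10)
-DISPLAY BUFFERPOOL(BP1) DETAIL
```

Here VPSIZE(200000) allocates 200,000 buffers of the pool's page size, and VPSEQT(10) caps sequentially accessed pages at 10% of the pool, protecting the random-access working set.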
- DWQT and VDWQT prevent synchronous writes. If you observe synchronous writes, lower DWQT to trigger deferred writes earlier. Synchronous writes add write latency directly to the application thread — a more serious problem than synchronous reads.
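Both thresholds can be lowered on the same command; these values are a sketch, and the right settings depend on the pool's update rate:

```
-ALTER BUFFERPOOL(BP1) DWQT(30) VDWQT(5)
```

DWQT(30) starts deferred writes when 30% of the pool holds updated pages; VDWQT(5) applies the same trigger per data set at 5%.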
- PGFIX(YES) is mandatory for production. Without it, z/OS may page buffer pool memory to auxiliary storage, causing invisible double paging that degrades performance without appearing in DB2 statistics.
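The setting is a single console command, but note that the change is pending: it takes effect only when the pool is next deallocated and reallocated.

```
-ALTER BUFFERPOOL(BP1) PGFIX(YES)
```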
- Hiperpool is obsolete on modern z/OS. Disable it (HPSIZE 0) and invest the memory in larger virtual buffer pools. Modern hardware has no separate expanded storage tier for hiperpool to exploit.
LUW Buffer Pool Configuration
- Use SIZE AUTOMATIC to enable STMM management for pools where flexibility is acceptable. Use fixed SIZE for critical OLTP pools that must not shrink, even during low-activity periods.
- NUMBLOCKPAGES improves sequential prefetch efficiency. Reserve 20-30% of batch/scan pools for block I/O. Do not use block areas in OLTP pools — the reservation wastes memory needed for random page caching.
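A batch pool with a block-based area might be created like this (a sketch with illustrative names and sizes; here 25% of the pool is reserved for block I/O):

```sql
-- LUW sketch: 100,000-page batch pool, 25,000 pages reserved
-- as a block-based area for sequential prefetch.
CREATE BUFFERPOOL bp_batch
  SIZE 100000 PAGESIZE 4K
  NUMBLOCKPAGES 25000 BLOCKSIZE 32;
```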
- Buffer pool changes are dynamic on LUW. No database restart is needed to create pools, alter sizes, or reassign table spaces. This makes tuning low-risk and reversible.
- Do not leave everything in IBMDEFAULTBP. The default pool should hold only the system catalog. Create purpose-built pools with meaningful names for application workloads.
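Creating purpose-built pools and moving a table space into one is ordinary DDL and takes effect without a restart (names and sizes below are illustrative):

```sql
-- LUW sketch: dedicated OLTP pools, then reassign a table space.
CREATE BUFFERPOOL bp_oltp_data  SIZE 250000 PAGESIZE 4K;
CREATE BUFFERPOOL bp_oltp_index SIZE 150000 PAGESIZE 4K;

ALTER TABLESPACE ts_orders BUFFERPOOL bp_oltp_data;
```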
Buffer Pool Assignment Strategy
- Separate things that behave differently. This is the core principle. Random vs. sequential, data vs. indexes, hot vs. cold, OLTP vs. batch, persistent vs. temporary — each distinction justifies a separate pool.
- Give indexes their own pool. Index non-leaf pages have extremely high reuse and are accessed by virtually every query. A dedicated index pool with VPSEQT near zero ensures these high-value pages are never evicted.
- Isolate batch workloads from OLTP. A single sequential scan can flush an entire OLTP cache. Separate pools eliminate this interference completely — the most common source of "OLTP spikes during batch" problems.
- Use PGSTEAL(NONE) for small reference tables. When the entire table fits comfortably in the pool and must always be cached (branch codes, currency codes), PGSTEAL(NONE) guarantees zero physical I/O.
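On z/OS this is one command against a pool reserved for those objects (the pool name is illustrative); the pool must be sized to hold every assigned object in full:

```
-ALTER BUFFERPOOL(BP7) PGSTEAL(NONE)
```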
z/OS Memory Components (EDM, RID, Sort)
- The EDM pool caches plans, packages, DBDs, and dynamic SQL. An undersized EDM pool causes repeated I/O for metadata and re-preparation of SQL, consuming CPU. EDMSTMTC (dynamic statement cache) is especially important for Java/JDBC applications.
- The RID pool supports list prefetch and multi-index access. When it is too small, DB2 silently falls back from efficient list prefetch to full table space scans (RID failure). This causes dramatic, invisible performance degradation.
- The sort pool determines whether sorts complete in memory. Sort overflows to work files dramatically increase elapsed time. Monitor sort overflow rate — target below 2-3%.
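On LUW the overflow rate can be computed directly from the monitoring table functions; this sketch assumes the MON_GET_DATABASE function available in recent releases (on z/OS the equivalent counters come from the statistics trace).

```sql
-- LUW sketch: sort overflow rate for the database. -2 = all members.
SELECT total_sorts,
       sort_overflows,
       DECIMAL(100.0 * sort_overflows / NULLIF(total_sorts, 0), 5, 2)
         AS overflow_pct
FROM TABLE(MON_GET_DATABASE(-2)) AS t;
```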
LUW Memory Architecture
- Three tiers: INSTANCE_MEMORY > DATABASE_MEMORY > individual consumers. Each tier constrains the tier below it. Misalignment between tiers (e.g., buffer pools summing to more than DATABASE_MEMORY) causes allocation failures.
- STMM redistributes memory based on benefit analysis. Enable it for DATABASE_MEMORY, buffer pools, sort heap, package cache, and lock list. It works well for varying workloads and new systems where optimal allocation is unknown.
- The hybrid STMM approach works best. Fix sizes for critical OLTP pools (STMM cannot shrink them). Use AUTOMATIC for batch, temp, and less critical pools. This gives STMM flexibility while protecting what matters most.
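The hybrid split is expressed per pool (names and sizes below are illustrative):

```sql
-- LUW sketch: fixed size protects the OLTP pool from STMM shrinkage;
-- AUTOMATIC lets STMM grow and shrink the batch pool as needed.
ALTER BUFFERPOOL bp_oltp_data  SIZE 250000;
ALTER BUFFERPOOL bp_batch_data SIZE AUTOMATIC;
```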
- Reserve 20-25% of physical RAM for the OS, even on a dedicated database server. Overcommitting memory causes OS paging, which is catastrophically worse than a slightly lower buffer pool hit ratio.
Monitoring
- Monitor hit ratios separately for data and index pages, per pool. A blended hit ratio can mask a serious problem. An index pool at 92% is a very different problem from a batch data pool at 92%.
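The data/index split is available per pool from the same LUW counters used for the overall ratio (a monitoring sketch; -2 means all members):

```sql
-- LUW sketch: data and index hit ratios reported separately per pool.
SELECT bp_name,
       DECIMAL(100.0 * (1.0 - DOUBLE(pool_data_p_reads) /
               NULLIF(pool_data_l_reads, 0)), 5, 2)  AS data_hit_pct,
       DECIMAL(100.0 * (1.0 - DOUBLE(pool_index_p_reads) /
               NULLIF(pool_index_l_reads, 0)), 5, 2) AS index_hit_pct
FROM TABLE(MON_GET_BUFFERPOOL(NULL, -2)) AS t
ORDER BY bp_name;
```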
- Track synchronous reads and synchronous writes. Sync reads indicate cache misses. Sync writes indicate the pool is running out of clean pages — a more urgent problem that needs immediate attention.
- Establish a baseline before tuning. Capture metrics during a full business cycle (one week minimum). Without a baseline, you cannot measure whether changes helped or hurt.
Tuning Methodology
- Follow the cycle: baseline, analyze, adjust one thing, verify. Never change multiple parameters simultaneously. The discipline of single-variable changes is what separates methodical tuning from guessing.
- Prioritize by impact. Tune OLTP buffer pools first (response time), then package cache/EDM pool (CPU), then sort memory (batch/report speed), then batch pools (throughput). Fix the most painful problem first.
- The working set matters more than the data set. A 2 TB database may have a 500 MB working set. Cache the working set and the hit ratio approaches 100%. You do not need to cache the entire database — just the pages that are actually accessed.