Key Takeaways — Chapter 11: DB2 Performance Diagnosis

Core Principles

  1. Measure before you tune. The accounting trace tells you where time is spent. EXPLAIN tells you how DB2 accesses data. Without these, you are guessing. Guessing costs more than diagnosis.

  2. Define problems with numbers. "The batch is slow" is not a problem statement. "Job CNBGL410 elapsed time increased from 45 minutes to 3 hours 47 minutes" is a problem statement. No numbers, no diagnosis.

  3. The Five Common Problems cover 90% of cases. Tablespace scans, lock contention, thread reuse issues, commit frequency, and data skew. Master these five patterns and you will handle nearly every DB2 performance issue you encounter.

  4. Commit frequency is the single most impactful tuning lever for COBOL batch programs. Most programs commit far too infrequently. The overhead of frequent commits (1-3ms per commit) is almost always dwarfed by the lock wait reduction.

  5. Performance is continuous, not episodic. Daily monitoring, weekly reviews, monthly analysis, quarterly deep dives. Gradual degradation is invisible without trend analysis.

EXPLAIN Essentials

  1. MATCHCOLS is the most important number in PLAN_TABLE. It tells you how precisely the index is targeting your data. Higher is better. When MATCHCOLS is lower than expected, check for functions on columns, type mismatches, and missing leading predicates.

  2. ACCESSTYPE='R' on a large table is a red flag. Tablespace scans read every page. On a 50-million-row table, this is catastrophic. On a 200-row reference table, it is fine. Context matters.

  3. DSN_STATEMNT_TABLE's COST_CATEGORY is gold. COST_CATEGORY='B' means the optimizer is uncertain about its estimate. Uncertain estimates lead to suboptimal access paths. Fix the statistics.

  4. INDEXONLY='Y' is the best possible access. When DB2 can satisfy a query entirely from the index, it never touches data pages. Design queries and indexes to enable index-only access when possible.

  5. Never apply functions to indexed columns in WHERE clauses. WHERE YEAR(TRANS_DATE) = 2025 kills index matching. Rewrite as WHERE TRANS_DATE BETWEEN '2025-01-01' AND '2025-12-31'.

Accounting Trace Essentials

  1. IFCID 3 (accounting trace) should always be active. The 1-3% overhead is the cheapest performance insurance available. Without it, you have no baseline and no diagnostic data when problems occur.

  2. Read accounting data like bloodwork. CPU = brain. Lock wait = circulatory. I/O wait = muscles. The ratios tell you which organ is sick before you order specialized tests.

  3. High elapsed with low CPU means the program is waiting. Check lock wait first (most common), then I/O wait, then other waits. If CPU is proportional to elapsed, the program is doing too much work — check access paths.

  4. Compare SQL call counts to business volume. If a program processes 2 million transactions and executes 14 million SELECTs, that is 7 SELECTs per transaction. Is that reasonable, or are there redundant lookups that could be cached in COBOL working storage?

Problem-Specific Takeaways

  1. When access paths change suddenly and no code changed, check RUNSTATS. Stale or incorrect statistics cause more sudden performance degradations than all other causes combined.

  2. Lock contention is solved by reducing lock duration, not by adding hardware. Increase commit frequency. Resequence jobs. Use appropriate isolation levels. Lock contention is a design problem.

  3. Data skew is the hardest problem to diagnose. EXPLAIN looks correct for the average case. You find skew by correlating slow executions with specific input values. FREQVAL statistics and REOPT help the optimizer handle it.

  4. Thread reuse issues show bimodal response times. If a transaction is either fast or slow with nothing in between, check the dynamic SQL cache, RELEASE options, and package allocation.

Monitoring and Testing

  1. Every REBIND is a change event. Compare EXPLAIN before and after. Test with production volumes. No REBIND moves to production without access path verification.

  2. Every COBOL-DB2 program needs a performance budget. Maximum elapsed, CPU, GETPAGE, SQL calls, commit frequency. If testing exceeds the budget, fix it before deployment.

Quick Reference: Diagnosis Flowchart

Problem reported
    |
    v
Define precisely (what, when, how much)
    |
    v
Pull accounting trace (IFCID 3)
    |
    +---> High CPU, normal elapsed ---> Check EXPLAIN (access paths)
    |
    +---> High elapsed, low CPU -------> Check lock wait / I/O wait
    |         |
    |         +---> Lock wait high -----> Commit frequency / job scheduling
    |         +---> I/O wait high ------> Buffer pools / prefetch / REORG
    |
    +---> Sudden change, no code ------> Check RUNSTATS / REBIND history
    |
    +---> Gradual degradation ----------> Data growth / index fragmentation
    |
    v
Identify root cause ---> Fix ---> Measure ---> Compare to baseline