Exercises — Chapter 11: DB2 Performance Diagnosis
Section 11.1 — The Performance Investigation Methodology
Exercise 1: Problem Statement Formulation
Convert each of the following vague complaints into precise, measurable problem statements. Include the metric, the expected value, the actual value, and the timeframe.
a) "The nightly batch is running long." b) "Our online transactions feel sluggish today." c) "DB2 is using too much CPU." d) "The claims processing job keeps timing out."
Exercise 2: Component Isolation
A COBOL batch program that normally completes in 45 minutes finished in 3 hours 12 minutes last night. You have the following accounting trace data:
Normal Run: Last Night:
Elapsed: 2,700 sec Elapsed: 11,520 sec
CPU: 340 sec CPU: 355 sec
Lock Wait: 12 sec Lock Wait: 8,847 sec
I/O Wait: 187 sec I/O Wait: 201 sec
GETPAGE: 12,447,000 GETPAGE: 12,512,000
SQL Calls: 3,200,000 SQL Calls: 3,205,000
COMMITS: 6,400 COMMITS: 6,401
a) Identify the component responsible for the degradation. b) What does the consistency of CPU, GETPAGE, and SQL calls tell you? c) What would you investigate next? d) Why does the nearly identical COMMIT count matter for your diagnosis?
Exercise 3: Baseline Creation
You are assigned to create performance baselines for a COBOL-DB2 program that runs daily. Describe:
a) What metrics you would capture b) How many days of data you need before the baseline is reliable c) How you would handle weekday vs. weekend differences d) How you would handle month-end runs that process higher volumes e) What threshold multipliers you would set for alerting (watch, investigate, escalate)
Exercise 4: Investigation Sequencing
A developer tells you: "I think we need a new index on the TRANSACTIONS table." Using the five-step methodology, describe what questions you would ask and what data you would examine before agreeing or disagreeing.
Exercise 5: Multi-Program Diagnosis
Three batch programs access the same set of tables. All three ran long last night. Describe how you would determine whether:
a) There is a single root cause affecting all three b) One program is the "victim" and another is the "blocker" c) The problem is external to all three programs (e.g., system-level)
Section 11.2 — EXPLAIN in Depth
Exercise 6: PLAN_TABLE Reading
Given the following PLAN_TABLE output:
QUERYNO PLANNO METHOD TNAME ACCESSTYPE MATCHCOLS ACCESSNAME INDEXONLY PREFETCH
------- ------ ------ -------------- ---------- --------- --------------- -------- --------
22 1 0 POLICY R 0 N S
22 2 1 POLICYHOLDER I 1 XPOLH_PK Y
22 3 1 AGENT I 1 XAGNT_PK N L
a) Describe the access path in plain English. b) Identify the performance concern. c) What would you check first in the COBOL program's WHERE clause? d) If the POLICY table has 50 million rows, estimate the relative cost of this access path vs. an index access with MATCHCOLS=3.
Exercise 7: MATCHCOLS Analysis
For each of the following predicates and indexes, determine the expected MATCHCOLS value and explain why:
a) Index: (ACCT_TYPE, ACCT_STATUS, OPEN_DATE)
Predicate: WHERE ACCT_TYPE = :HV1 AND ACCT_STATUS = :HV2 AND OPEN_DATE > :HV3
b) Index: (ACCT_TYPE, ACCT_STATUS, OPEN_DATE)
Predicate: WHERE ACCT_STATUS = :HV2 AND OPEN_DATE > :HV3
c) Index: (ACCT_TYPE, ACCT_STATUS, OPEN_DATE)
Predicate: WHERE ACCT_TYPE = :HV1 AND OPEN_DATE > :HV3
d) Index: (TRANS_DATE, TRANS_TYPE, ACCT_ID)
Predicate: WHERE YEAR(TRANS_DATE) = 2025 AND TRANS_TYPE = 'D'
e) Index: (LAST_NAME, FIRST_NAME)
Predicate: WHERE LAST_NAME LIKE :HV1
(where :HV1 contains 'SMITH%')
Exercise 8: DSN_STATEMNT_TABLE Interpretation
You query DSN_STATEMNT_TABLE and find the following:
QUERYNO PROCMS COST_CATEGORY STMT_TYPE
------- ------ ------------- ---------
15 12 A SELECT
22 4847 B SELECT
38 28 A UPDATE
45 18322 B SELECT
112 3 A SELECT
a) Which statements require investigation and why? b) What does COST_CATEGORY='B' tell you about the optimizer's confidence? c) For QUERYNO 45, what actions would you take? d) If the actual CPU for QUERYNO 22 is 35 milliseconds, is the estimate reasonable? What if it is 35,000 milliseconds?
Exercise 9: Access Path Comparison
You ran EXPLAIN before and after a REBIND. Here are the differences:
QUERYNO 88 BEFORE: ACCESSTYPE=I, MATCHCOLS=3, ACCESSNAME=XCLM_PROC_DATE, INDEXONLY=N
QUERYNO 88 AFTER: ACCESSTYPE=I, MATCHCOLS=1, ACCESSNAME=XCLM_STATUS, INDEXONLY=N
a) What changed and is it likely better or worse? b) What could cause the optimizer to choose a different index? c) What investigation steps would you take before allowing this REBIND into production?
Exercise 10: Non-Matching Index Scan
Explain why ACCESSTYPE='I' with MATCHCOLS=0 can be worse than ACCESSTYPE='R' (tablespace scan). Under what circumstances would the optimizer still choose it?
Section 11.3 — DB2 Accounting Traces
Exercise 11: Accounting Trace Analysis
Given the following accounting data for a CICS transaction:
PROGRAM: PINQ0100 PLAN: PINQPLN CONNTYPE: CICS
ELAPSED: 4.7 SEC IN-DB2: 4.1 SEC
CLASS 2 CPU: 0.047 SEC
CLASS 3 WAITS:
SYNC I/O: 0.012 SEC
LOCK WAIT: 3.840 SEC
LOG WRITE: 0.003 SEC
SQL ACTIVITY:
SELECT: 4 FETCH: 12 COMMIT: 1
UPDATE: 2 GETPAGE: 847
a) What percentage of elapsed time is lock wait? b) Is this a CPU-bound or lock-bound problem? c) With only 4 SELECTs and 2 UPDATEs, what could be holding locks for 3.84 seconds? d) What would you ask the operations team?
Exercise 12: Batch Job Accounting Analysis
Two batch programs run concurrently. Analyze their accounting data:
Program A:
ELAPSED: 7,200 SEC CPU: 890 SEC
LOCK WAIT: 4,100 SEC
COMMITS: 120 SQL UPDATES: 6,000,000
GETPAGE: 180,000,000
Program B:
ELAPSED: 5,400 SEC CPU: 420 SEC
LOCK WAIT: 2,800 SEC
COMMITS: 45 SQL UPDATES: 2,250,000
GETPAGE: 67,000,000
a) Calculate the commit interval for each program. b) Which program is likely the primary "blocker"? c) What commit frequency would you recommend for each? d) Calculate the average lock hold time per commit interval.
Exercise 13: Buffer Pool Analysis
From the accounting data:
GETPAGE: 50,000,000
SYNC I/O READ: 2,500,000
PREFETCH REQUESTS: 175,000
PREFETCH PAGES READ: 12,250,000
a) Calculate the buffer pool hit ratio. b) Calculate the average pages per prefetch request. c) Is sequential prefetch working effectively? d) If each synchronous I/O takes 0.5ms on average, calculate total I/O wait time.
Exercise 14: Class 3 Wait Time Analysis
Given the following wait time breakdown:
TOTAL ELAPSED: 18,000 SEC
DB2 CPU: 2,400 SEC
SYNC I/O WAIT: 1,200 SEC
LOCK WAIT: 8,400 SEC
LOG WRITE WAIT: 600 SEC
DRAIN WAIT: 300 SEC
CLAIM WAIT: 150 SEC
OTHER WAIT: 4,950 SEC
a) Account for all time. What is the "unaccounted" time? b) What is the most significant wait category? c) The LOG WRITE WAIT of 600 seconds is unusual. What could cause this? d) The DRAIN WAIT of 300 seconds suggests what is happening? e) Propose an investigation plan prioritized by impact.
Exercise 15: SQL Activity Ratio Analysis
Examine these SQL activity counters and identify the anomaly:
OPEN CURSOR: 500,000
FETCH: 25,000,000
CLOSE CURSOR: 500,000
SELECT: 500,000
COMMIT: 1,000
a) Calculate the average fetches per cursor open. b) Is 50 rows per cursor open reasonable? Under what circumstances? c) If this is a batch program processing one row at a time, what pattern would explain these numbers? d) What would the FETCH count be if the cursor were properly filtering with WHERE clause predicates instead of fetching and filtering in COBOL?
Section 11.4 — The Five Common DB2 Problems
Exercise 16: Tablespace Scan Elimination
Rewrite each of the following SQL statements to enable index matching. Assume an index exists on the columns referenced in the WHERE clause.
a) WHERE YEAR(TRANS_DATE) = 2025 AND MONTH(TRANS_DATE) = 3
b) WHERE UPPER(LAST_NAME) = 'SMITH'
c) WHERE CAST(ACCT_NUM AS INTEGER) = :HV-ACCT-NUM
(ACCT_NUM is CHAR(10), HV-ACCT-NUM is INTEGER)
d) WHERE ACCT_BALANCE + PENDING_AMT > 10000
(Index on ACCT_BALANCE)
e) WHERE SUBSTR(POLICY_NUM, 1, 3) = 'HMO'
Exercise 17: Lock Contention Resolution
A nightly batch job and an online transaction are deadlocking. The batch job updates ACCOUNT rows by ACCOUNT_ID order. The online transaction reads and updates ACCOUNT rows by CUSTOMER_ID order (which maps to a different physical sequence).
a) Explain why deadlocks occur. b) Propose three different solutions, explaining trade-offs of each. c) Which solution would you implement first and why? d) How would you verify the fix is working?
Exercise 18: Commit Frequency Optimization
A batch program processes 5 million rows. It currently commits once at the end. The program takes 4 hours to run, with 2.5 hours of lock wait. You need to add commit logic.
a) Design the COBOL commit logic with a configurable commit interval. b) What value would you start with for the commit interval? c) How does adding commits affect the program's restart logic? d) If the program abends at row 4,999,999 with the current approach, how long might rollback take? How long with commits every 500 rows? e) Write the COBOL paragraph for checkpoint/restart that records the last committed key value.
Exercise 19: Data Skew Diagnosis
An account inquiry transaction has the following response time distribution:
< 0.5 sec: 92% of executions
0.5-1.0 sec: 5% of executions
1.0-5.0 sec: 2% of executions
> 5.0 sec: 1% of executions
a) What does this distribution suggest? b) How would you identify which accounts fall into the >5.0 sec category? c) What DB2 facilities address data skew? d) Compare REOPT(ALWAYS), REOPT(ONCE), and FREQVAL statistics as solutions. e) Could you solve this at the application level without DB2 changes? How?
Exercise 20: Thread Reuse Diagnosis
A CICS transaction shows the following pattern:
First execution after CICS start: 0.15 sec
Next 100 executions: 0.08 sec average
After 2 hours idle: 0.15 sec again
After pool reset: 0.15 sec again
a) What causes the first-execution overhead? b) What DB2 resources are being cached on the first execution? c) How does RELEASE(DEALLOCATE) vs. RELEASE(COMMIT) affect this? d) For a transaction that executes 10,000 times per day, quantify the total overhead difference between the two behaviors.
Section 11.5 — Performance Monitoring Tools
Exercise 21: Monitoring Strategy Design
Design a monitoring strategy for a COBOL-DB2 application that processes insurance claims during business hours (8 AM-6 PM) and runs batch processing overnight (10 PM-6 AM).
a) What metrics would you monitor in real-time during business hours? b) What thresholds would trigger alerts? c) What batch monitoring would you configure? d) What daily report would you produce? e) How would you detect a slow degradation over weeks that never triggers an alert?
Exercise 22: DISPLAY Command Interpretation
Interpret the following DB2 command output:
-DIS THREAD(*) TYPE(ACTIVE) DETAIL
DSNV402I - ACTIVE THREADS -
NAME ST A REQ ID AUTHID PLAN ASID TOKEN
BATCH TR * 847K CNBGL410 CNBGLDBA CNBGLPLN 0047 12
V 652-LOCK WAIT, RESOURCE=DATABASE CNBGLDB
SPACE GLTRANS PAGE X'0001A47C'
HOLDER: TOKEN 15 PLAN CNBGL420
BATCH TR * 1.2M CNBGL420 CNBGLDBA CNBGLPLN 0047 15
V 652-LOCK WAIT, RESOURCE=DATABASE CNBGLDB
SPACE GLTRANS PAGE X'0001A480'
HOLDER: TOKEN 12 PLAN CNBGL410
a) What is happening between these two threads? b) Is this a deadlock or lock contention? How can you tell? c) What will DB2 do to resolve this? d) What long-term fix would you implement?
Exercise 23: Buffer Pool Sizing
Given the following buffer pool statistics:
POOL PAGES GETPAGE HIT_RATIO SYNC_READ PREFETCH_READ STEAL_COUNT
BP0 50000 847M 97.2% 23.7M 142M 2.4M
BP1 20000 124M 99.1% 1.1M 18M 0.1M
BP2 5000 47M 82.4% 8.3M 22M 4.7M
a) Which buffer pool needs attention and why? b) What does a high STEAL_COUNT indicate? c) If you could add 30,000 pages total across all pools, how would you distribute them? d) What objects are in BP2, and how would you find out?
Section 11.6 — Performance Testing Strategy
Exercise 24: Performance Test Plan
Write a performance test plan for a new COBOL-DB2 program that will: - Process 2 million account records nightly - Update 3 tables (ACCOUNTS, TRANSACTIONS, AUDIT_LOG) - Run concurrently with 2 other programs that read ACCOUNTS
Your test plan should include: a) Test environment requirements b) Test data requirements c) Specific measurements to capture d) Pass/fail criteria e) Concurrent testing approach
Exercise 25: Performance Regression Analysis
You have the following before/after data from a code change:
METRIC BEFORE AFTER CHANGE
Elapsed (sec) 2,400 2,850 +18.8%
CPU (sec) 340 410 +20.6%
GETPAGE 45.2M 62.1M +37.4%
SQL SELECT 3.2M 4.8M +50.0%
SQL UPDATE 1.1M 1.1M 0.0%
COMMITS 2,200 2,200 0.0%
Lock Wait (sec) 45 52 +15.6%
Sync I/O (sec) 187 312 +66.8%
a) Which metric changed the most, and what does it suggest? b) The SELECT count increased by 50% but UPDATE count did not change. What design pattern could explain this? c) The GETPAGE increase is proportionally less than the SELECT increase. What does this tell you about the new SELECTs? d) Would you approve this change for production? What additional information would you need? e) What EXPLAIN comparison would you perform?
Exercise 26: Load Testing Design
Design a load test for a banking system that must handle: - 500 concurrent CICS users - Peak transaction rate: 200 transactions/second - Mix: 60% inquiry, 25% update, 15% batch-initiated - SLA: 95th percentile response time < 1 second
Describe: a) How you would simulate the workload b) What you would measure c) How you would determine the system's breaking point d) What DB2-specific metrics matter most under load e) How you would test for lock contention at scale
Exercise 27: Performance Budget Creation
Create performance budgets for each of the following HA Banking system transactions. Justify your numbers.
a) ATM balance inquiry b) Teller deposit c) End-of-day interest calculation (batch, 10M accounts) d) Monthly statement generation (batch, 10M accounts) e) Fraud detection real-time scoring
Exercise 28: Continuous Monitoring Dashboard
Design a DB2 performance dashboard for operations. Specify:
a) Five metrics that should be displayed in real-time b) Three daily trend charts c) Two weekly comparison reports d) Alert thresholds for each real-time metric e) Escalation procedures for each alert level
Applied Exercises
Exercise 29: End-to-End Diagnosis Scenario
You receive the following problem report at 7:00 AM Monday:
"The weekend batch cycle did not complete. Job CNBGL410 started at 22:00 Saturday and was still running when operations cancelled it at 06:00 Sunday. Normally this job completes by 23:30."
Walk through the complete diagnosis: a) What is your precise problem statement? b) What data do you request first? c) You receive the accounting trace — it shows 28,000 seconds elapsed, 400 seconds CPU, 24,000 seconds lock wait. What do you conclude? d) The lock holder is a thread from an ad-hoc query tool running under a developer's ID. What do you do immediately? What process change do you recommend? e) Write the after-action report.
Exercise 30: Performance Improvement Proposal
The HA Banking System's end-of-day settlement process currently takes 4.5 hours. The business requires it to complete in 2 hours. You have analyzed the accounting data and found:
- 2.8 hours is CPU time
- 0.9 hours is lock wait
- 0.5 hours is I/O wait
- 0.3 hours is other (commit processing, log writes)
Propose a performance improvement plan that addresses each component. For each recommendation: a) Estimate the expected improvement b) Describe the implementation effort c) Identify risks d) Specify how you will measure success