Exercises — Chapter 11: DB2 Performance Diagnosis

Exercises — Chapter 11: DB2 Performance Diagnosis

Section 11.1 — The Performance Investigation Methodology

Exercise 1: Problem Statement Formulation

Convert each of the following vague complaints into precise, measurable problem statements. Include the metric, the expected value, the actual value, and the timeframe.

a) "The nightly batch is running long." b) "Our online transactions feel sluggish today." c) "DB2 is using too much CPU." d) "The claims processing job keeps timing out."

Exercise 2: Component Isolation

A COBOL batch program that normally completes in 45 minutes finished in 3 hours 12 minutes last night. You have the following accounting trace data:

Normal Run:                    Last Night:
Elapsed:      2,700 sec        Elapsed:     11,520 sec
CPU:            340 sec        CPU:            355 sec
Lock Wait:       12 sec        Lock Wait:    8,847 sec
I/O Wait:       187 sec        I/O Wait:       201 sec
GETPAGE:  12,447,000           GETPAGE:   12,512,000
SQL Calls:  3,200,000          SQL Calls:   3,205,000
COMMITS:        6,400          COMMITS:        6,401

a) Identify the component responsible for the degradation. b) What does the consistency of CPU, GETPAGE, and SQL calls tell you? c) What would you investigate next? d) Why does the nearly identical COMMIT count matter for your diagnosis?

Exercise 3: Baseline Creation

You are assigned to create performance baselines for a COBOL-DB2 program that runs daily. Describe:

a) What metrics you would capture b) How many days of data you need before the baseline is reliable c) How you would handle weekday vs. weekend differences d) How you would handle month-end runs that process higher volumes e) What threshold multipliers you would set for alerting (watch, investigate, escalate)

Exercise 4: Investigation Sequencing

A developer tells you: "I think we need a new index on the TRANSACTIONS table." Using the five-step methodology, describe what questions you would ask and what data you would examine before agreeing or disagreeing.

Exercise 5: Multi-Program Diagnosis

Three batch programs access the same set of tables. All three ran long last night. Describe how you would determine whether:

a) There is a single root cause affecting all three b) One program is the "victim" and another is the "blocker" c) The problem is external to all three programs (e.g., system-level)

Section 11.2 — EXPLAIN in Depth

Exercise 6: PLAN_TABLE Reading

Given the following PLAN_TABLE output:

QUERYNO PLANNO METHOD TNAME          ACCESSTYPE MATCHCOLS ACCESSNAME      INDEXONLY PREFETCH
------- ------ ------ -------------- ---------- --------- --------------- -------- --------
    22      1      0  POLICY         R                  0                  N        S
    22      2      1  POLICYHOLDER   I                  1  XPOLH_PK        Y
    22      3      1  AGENT          I                  1  XAGNT_PK        N        L

a) Describe the access path in plain English. b) Identify the performance concern. c) What would you check first in the COBOL program's WHERE clause? d) If the POLICY table has 50 million rows, estimate the relative cost of this access path vs. an index access with MATCHCOLS=3.

Exercise 7: MATCHCOLS Analysis

For each of the following predicates and indexes, determine the expected MATCHCOLS value and explain why:

a) Index: (ACCT_TYPE, ACCT_STATUS, OPEN_DATE) Predicate: WHERE ACCT_TYPE = :HV1 AND ACCT_STATUS = :HV2 AND OPEN_DATE > :HV3

b) Index: (ACCT_TYPE, ACCT_STATUS, OPEN_DATE) Predicate: WHERE ACCT_STATUS = :HV2 AND OPEN_DATE > :HV3

c) Index: (ACCT_TYPE, ACCT_STATUS, OPEN_DATE) Predicate: WHERE ACCT_TYPE = :HV1 AND OPEN_DATE > :HV3

d) Index: (TRANS_DATE, TRANS_TYPE, ACCT_ID) Predicate: WHERE YEAR(TRANS_DATE) = 2025 AND TRANS_TYPE = 'D'

e) Index: (LAST_NAME, FIRST_NAME) Predicate: WHERE LAST_NAME LIKE :HV1 (where :HV1 contains 'SMITH%')

Exercise 8: DSN_STATEMNT_TABLE Interpretation

You query DSN_STATEMNT_TABLE and find the following:

QUERYNO  PROCMS  COST_CATEGORY  STMT_TYPE
-------  ------  -------------  ---------
    15      12   A              SELECT
    22    4847   B              SELECT
    38      28   A              UPDATE
    45   18322   B              SELECT
   112       3   A              SELECT

a) Which statements require investigation and why? b) What does COST_CATEGORY='B' tell you about the optimizer's confidence? c) For QUERYNO 45, what actions would you take? d) If the actual CPU for QUERYNO 22 is 35 milliseconds, is the estimate reasonable? What if it is 35,000 milliseconds?

Exercise 9: Access Path Comparison

You ran EXPLAIN before and after a REBIND. Here are the differences:

QUERYNO 88 BEFORE: ACCESSTYPE=I, MATCHCOLS=3, ACCESSNAME=XCLM_PROC_DATE, INDEXONLY=N
QUERYNO 88 AFTER:  ACCESSTYPE=I, MATCHCOLS=1, ACCESSNAME=XCLM_STATUS, INDEXONLY=N

a) What changed and is it likely better or worse? b) What could cause the optimizer to choose a different index? c) What investigation steps would you take before allowing this REBIND into production?

Exercise 10: Non-Matching Index Scan

Explain why ACCESSTYPE='I' with MATCHCOLS=0 can be worse than ACCESSTYPE='R' (tablespace scan). Under what circumstances would the optimizer still choose it?

Section 11.3 — DB2 Accounting Traces

Exercise 11: Accounting Trace Analysis

Given the following accounting data for a CICS transaction:

PROGRAM: PINQ0100   PLAN: PINQPLN   CONNTYPE: CICS
ELAPSED: 4.7 SEC    IN-DB2: 4.1 SEC

CLASS 2 CPU: 0.047 SEC
CLASS 3 WAITS:
  SYNC I/O:     0.012 SEC
  LOCK WAIT:    3.840 SEC
  LOG WRITE:    0.003 SEC

SQL ACTIVITY:
  SELECT:    4     FETCH:    12     COMMIT: 1
  UPDATE:    2     GETPAGE:  847

a) What percentage of elapsed time is lock wait? b) Is this a CPU-bound or lock-bound problem? c) With only 4 SELECTs and 2 UPDATEs, what could be holding locks for 3.84 seconds? d) What would you ask the operations team?

Exercise 12: Batch Job Accounting Analysis

Two batch programs run concurrently. Analyze their accounting data:

Program A:

ELAPSED: 7,200 SEC   CPU: 890 SEC
LOCK WAIT: 4,100 SEC
COMMITS: 120   SQL UPDATES: 6,000,000
GETPAGE: 180,000,000

Program B:

ELAPSED: 5,400 SEC   CPU: 420 SEC
LOCK WAIT: 2,800 SEC
COMMITS: 45   SQL UPDATES: 2,250,000
GETPAGE: 67,000,000

a) Calculate the commit interval for each program. b) Which program is likely the primary "blocker"? c) What commit frequency would you recommend for each? d) Calculate the average lock hold time per commit interval.

Exercise 13: Buffer Pool Analysis

From the accounting data:

GETPAGE: 50,000,000
SYNC I/O READ: 2,500,000
PREFETCH REQUESTS: 175,000
PREFETCH PAGES READ: 12,250,000

a) Calculate the buffer pool hit ratio. b) Calculate the average pages per prefetch request. c) Is sequential prefetch working effectively? d) If each synchronous I/O takes 0.5ms on average, calculate total I/O wait time.

Exercise 14: Class 3 Wait Time Analysis

Given the following wait time breakdown:

TOTAL ELAPSED:     18,000 SEC
DB2 CPU:            2,400 SEC
SYNC I/O WAIT:      1,200 SEC
LOCK WAIT:          8,400 SEC
LOG WRITE WAIT:       600 SEC
DRAIN WAIT:           300 SEC
CLAIM WAIT:           150 SEC
OTHER WAIT:         4,950 SEC

a) Account for all time. What is the "unaccounted" time? b) What is the most significant wait category? c) The LOG WRITE WAIT of 600 seconds is unusual. What could cause this? d) The DRAIN WAIT of 300 seconds suggests what is happening? e) Propose an investigation plan prioritized by impact.

Exercise 15: SQL Activity Ratio Analysis

Examine these SQL activity counters and identify the anomaly:

OPEN CURSOR:    500,000
FETCH:       25,000,000
CLOSE CURSOR:   500,000
SELECT:         500,000
COMMIT:           1,000

a) Calculate the average fetches per cursor open. b) Is 50 rows per cursor open reasonable? Under what circumstances? c) If this is a batch program processing one row at a time, what pattern would explain these numbers? d) What would the FETCH count be if the cursor were properly filtering with WHERE clause predicates instead of fetching and filtering in COBOL?

Section 11.4 — The Five Common DB2 Problems

Exercise 16: Tablespace Scan Elimination

Rewrite each of the following SQL statements to enable index matching. Assume an index exists on the columns referenced in the WHERE clause.

a) WHERE YEAR(TRANS_DATE) = 2025 AND MONTH(TRANS_DATE) = 3

b) WHERE UPPER(LAST_NAME) = 'SMITH'

c) WHERE CAST(ACCT_NUM AS INTEGER) = :HV-ACCT-NUM (ACCT_NUM is CHAR(10), HV-ACCT-NUM is INTEGER)

d) WHERE ACCT_BALANCE + PENDING_AMT > 10000 (Index on ACCT_BALANCE)

e) WHERE SUBSTR(POLICY_NUM, 1, 3) = 'HMO'

Exercise 17: Lock Contention Resolution

A nightly batch job and an online transaction are deadlocking. The batch job updates ACCOUNT rows by ACCOUNT_ID order. The online transaction reads and updates ACCOUNT rows by CUSTOMER_ID order (which maps to a different physical sequence).

a) Explain why deadlocks occur. b) Propose three different solutions, explaining trade-offs of each. c) Which solution would you implement first and why? d) How would you verify the fix is working?

Exercise 18: Commit Frequency Optimization

A batch program processes 5 million rows. It currently commits once at the end. The program takes 4 hours to run, with 2.5 hours of lock wait. You need to add commit logic.

a) Design the COBOL commit logic with a configurable commit interval. b) What value would you start with for the commit interval? c) How does adding commits affect the program's restart logic? d) If the program abends at row 4,999,999 with the current approach, how long might rollback take? How long with commits every 500 rows? e) Write the COBOL paragraph for checkpoint/restart that records the last committed key value.

Exercise 19: Data Skew Diagnosis

An account inquiry transaction has the following response time distribution:

< 0.5 sec:  92% of executions
0.5-1.0 sec: 5% of executions
1.0-5.0 sec: 2% of executions
> 5.0 sec:   1% of executions

a) What does this distribution suggest? b) How would you identify which accounts fall into the >5.0 sec category? c) What DB2 facilities address data skew? d) Compare REOPT(ALWAYS), REOPT(ONCE), and FREQVAL statistics as solutions. e) Could you solve this at the application level without DB2 changes? How?

Exercise 20: Thread Reuse Diagnosis

A CICS transaction shows the following pattern:

First execution after CICS start: 0.15 sec
Next 100 executions: 0.08 sec average
After 2 hours idle: 0.15 sec again
After pool reset: 0.15 sec again

a) What causes the first-execution overhead? b) What DB2 resources are being cached on the first execution? c) How does RELEASE(DEALLOCATE) vs. RELEASE(COMMIT) affect this? d) For a transaction that executes 10,000 times per day, quantify the total overhead difference between the two behaviors.

Section 11.5 — Performance Monitoring Tools

Exercise 21: Monitoring Strategy Design

Design a monitoring strategy for a COBOL-DB2 application that processes insurance claims during business hours (8 AM-6 PM) and runs batch processing overnight (10 PM-6 AM).

a) What metrics would you monitor in real-time during business hours? b) What thresholds would trigger alerts? c) What batch monitoring would you configure? d) What daily report would you produce? e) How would you detect a slow degradation over weeks that never triggers an alert?

Exercise 22: DISPLAY Command Interpretation

Interpret the following DB2 command output:

-DIS THREAD(*) TYPE(ACTIVE) DETAIL

DSNV402I - ACTIVE THREADS -
NAME     ST A  REQ ID        AUTHID    PLAN     ASID  TOKEN
BATCH    TR *  847K  CNBGL410 CNBGLDBA  CNBGLPLN 0047  12
           V 652-LOCK WAIT, RESOURCE=DATABASE CNBGLDB
              SPACE GLTRANS PAGE X'0001A47C'
              HOLDER: TOKEN 15 PLAN CNBGL420

BATCH    TR *  1.2M  CNBGL420 CNBGLDBA  CNBGLPLN 0047  15
           V 652-LOCK WAIT, RESOURCE=DATABASE CNBGLDB
              SPACE GLTRANS PAGE X'0001A480'
              HOLDER: TOKEN 12 PLAN CNBGL410

a) What is happening between these two threads? b) Is this a deadlock or lock contention? How can you tell? c) What will DB2 do to resolve this? d) What long-term fix would you implement?

Exercise 23: Buffer Pool Sizing

Given the following buffer pool statistics:

POOL  PAGES   GETPAGE   HIT_RATIO  SYNC_READ  PREFETCH_READ  STEAL_COUNT
BP0   50000   847M       97.2%      23.7M      142M            2.4M
BP1   20000   124M       99.1%       1.1M       18M            0.1M
BP2    5000    47M       82.4%       8.3M       22M            4.7M

a) Which buffer pool needs attention and why? b) What does a high STEAL_COUNT indicate? c) If you could add 30,000 pages total across all pools, how would you distribute them? d) What objects are in BP2, and how would you find out?

Section 11.6 — Performance Testing Strategy

Exercise 24: Performance Test Plan

Write a performance test plan for a new COBOL-DB2 program that will: - Process 2 million account records nightly - Update 3 tables (ACCOUNTS, TRANSACTIONS, AUDIT_LOG) - Run concurrently with 2 other programs that read ACCOUNTS

Your test plan should include: a) Test environment requirements b) Test data requirements c) Specific measurements to capture d) Pass/fail criteria e) Concurrent testing approach

Exercise 25: Performance Regression Analysis

You have the following before/after data from a code change:

METRIC              BEFORE      AFTER       CHANGE
Elapsed (sec)        2,400       2,850      +18.8%
CPU (sec)              340         410      +20.6%
GETPAGE            45.2M        62.1M      +37.4%
SQL SELECT          3.2M         4.8M      +50.0%
SQL UPDATE          1.1M         1.1M       0.0%
COMMITS             2,200        2,200       0.0%
Lock Wait (sec)        45           52      +15.6%
Sync I/O (sec)        187          312      +66.8%

a) Which metric changed the most, and what does it suggest? b) The SELECT count increased by 50% but UPDATE count did not change. What design pattern could explain this? c) The GETPAGE increase is proportionally less than the SELECT increase. What does this tell you about the new SELECTs? d) Would you approve this change for production? What additional information would you need? e) What EXPLAIN comparison would you perform?

Exercise 26: Load Testing Design

Design a load test for a banking system that must handle: - 500 concurrent CICS users - Peak transaction rate: 200 transactions/second - Mix: 60% inquiry, 25% update, 15% batch-initiated - SLA: 95th percentile response time < 1 second

Describe: a) How you would simulate the workload b) What you would measure c) How you would determine the system's breaking point d) What DB2-specific metrics matter most under load e) How you would test for lock contention at scale

Exercise 27: Performance Budget Creation

Create performance budgets for each of the following HA Banking system transactions. Justify your numbers.

a) ATM balance inquiry b) Teller deposit c) End-of-day interest calculation (batch, 10M accounts) d) Monthly statement generation (batch, 10M accounts) e) Fraud detection real-time scoring

Exercise 28: Continuous Monitoring Dashboard

Design a DB2 performance dashboard for operations. Specify:

a) Five metrics that should be displayed in real-time b) Three daily trend charts c) Two weekly comparison reports d) Alert thresholds for each real-time metric e) Escalation procedures for each alert level

Applied Exercises

Exercise 29: End-to-End Diagnosis Scenario

You receive the following problem report at 7:00 AM Monday:

"The weekend batch cycle did not complete. Job CNBGL410 started at 22:00 Saturday and was still running when operations cancelled it at 06:00 Sunday. Normally this job completes by 23:30."

Walk through the complete diagnosis: a) What is your precise problem statement? b) What data do you request first? c) You receive the accounting trace — it shows 28,000 seconds elapsed, 400 seconds CPU, 24,000 seconds lock wait. What do you conclude? d) The lock holder is a thread from an ad-hoc query tool running under a developer's ID. What do you do immediately? What process change do you recommend? e) Write the after-action report.

Exercise 30: Performance Improvement Proposal

The HA Banking System's end-of-day settlement process currently takes 4.5 hours. The business requires it to complete in 2 hours. You have analyzed the accounting data and found:

2.8 hours is CPU time
0.9 hours is lock wait
0.5 hours is I/O wait
0.3 hours is other (commit processing, log writes)

Propose a performance improvement plan that addresses each component. For each recommendation: a) Estimate the expected improvement b) Describe the implementation effort c) Identify risks d) Specify how you will measure success