> "The final exam for a DBA is not a written test. It is the morning you walk into the data center, look at every blinking light, every scrolling log, every query in flight — and declare with confidence: this system is ready." — attributed to a...
In This Chapter
- 36.1 The Capstone Challenge
- 36.2 System Architecture Review
- 36.3 Schema and Design Review
- 36.4 Performance Baseline and Stress Test
- 36.5 EXPLAIN Analysis Under Load
- 36.6 Security Audit
- 36.7 Backup and Recovery Drill
- 36.8 Concurrency and Locking Analysis
- 36.9 Utility and Maintenance Review
- 36.10 Capacity Planning Report
- 36.11 Executive Summary Presentation
- 36.12 The Final Meridian Bank Report
- Spaced Review: Comprehensive Topic Recall
- Summary
Chapter 36: Capstone — The Meridian National Bank: Complete System Review, Stress Test, and Disaster Recovery Drill
"The final exam for a DBA is not a written test. It is the morning you walk into the data center, look at every blinking light, every scrolling log, every query in flight — and declare with confidence: this system is ready." — attributed to a senior IBM Systems Engineer, circa 2005
36.1 The Capstone Challenge
You have traveled a long road. Thirty-five chapters have taken you from your first SELECT * FROM SYSIBM.SYSDUMMY1 to the heights of enterprise architecture — data sharing groups on z/OS, HADR failover on Linux, cloud-native analytics on IBM Db2 Warehouse, row and column access control, advanced optimization, and everything in between. Now it is time to bring every skill together in a single, integrated exercise.
Here is the scenario.
Meridian National Bank — our progressive case study throughout this book — is preparing for its annual regulatory examination. The Office of the Comptroller of the Currency (OCC) has notified the bank that examiners will arrive in six weeks. The Chief Information Officer has called an emergency meeting. She turns to you — the lead database administrator — and says:
"I need a complete health assessment of every DB2 system we run. I need proof that our disaster recovery works. I need a capacity plan for the next three years. And I need it all summarized in a report I can hand to the board of directors. You have four weeks."
This chapter walks through every phase of that engagement, step by step. It is not a theoretical exercise. Every SQL statement, every monitoring query, every checklist item reflects what a real DBA would execute in a real bank. By the end of this chapter you will have produced:
- A system architecture verification report covering all three DB2 environments (z/OS data sharing, LUW HADR, cloud analytics).
- A schema and design review with quantified findings.
- A performance baseline and the results of a simulated stress test.
- An EXPLAIN-based analysis of the top queries under load.
- A security audit mapped to PCI-DSS requirements.
- A documented disaster recovery drill with measured Recovery Time Objective (RTO) and Recovery Point Objective (RPO).
- A concurrency and locking analysis.
- A utility and maintenance schedule review.
- A three-year capacity plan.
- An executive summary presentation.
Let us begin.
36.1.1 How This Chapter Is Organized
Each section of this chapter follows a consistent pattern. First, we describe what we are reviewing and why it matters. Then we provide the actual SQL statements and commands you would execute — not pseudocode, not summaries, but production-ready queries. Next, we show the results we found in the Meridian environment and how we interpreted them. Finally, we document the findings in a standardized format suitable for the final report.
This structure mirrors how a real assessment engagement works. The client does not want a theoretical framework. They want evidence: what did you check, what did you find, and what do you recommend? Every finding has a severity level (CRITICAL, HIGH, MEDIUM, LOW, or OK), a business impact statement, and a recommended action with estimated timeline and cost.
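The finding format above lends itself to a small data structure for collecting results across all sections of the assessment. A minimal sketch (the field names and helper are illustrative, not part of any IBM tooling):

```python
from dataclasses import dataclass

# Severity levels used throughout the assessment, ordered worst-first
SEVERITIES = ("CRITICAL", "HIGH", "MEDIUM", "LOW", "OK")

@dataclass
class Finding:
    finding_id: str      # e.g. "ARCH-002"
    environment: str     # "z/OS", "LUW", or "Cloud"
    component: str
    severity: str        # one of SEVERITIES
    impact: str          # business impact statement
    recommendation: str  # action, with timeline and cost estimate

def worst_severity(findings):
    """Return the most severe level present, for the report's headline."""
    present = {f.severity for f in findings}
    for level in SEVERITIES:
        if level in present:
            return level
    return "OK"
```

A report containing one MEDIUM and one OK finding, for example, headlines as MEDIUM.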
36.1.2 Prerequisites and Preparation
Before beginning the assessment, the lead DBA should ensure:
- Administrative access to all three DB2 environments (z/OS SYSADM or equivalent, LUW DBADM, cloud instance administrator).
- Monitoring enabled — on LUW, the MON_GET_* table functions require the appropriate monitor switches to be active. Verify with SELECT * FROM TABLE(MON_GET_DATABASE(-2)).
- EXPLAIN tables created — the EXPLAIN facility requires EXPLAIN tables in the schema of the user running EXPLAIN. If they do not exist, create them with the EXPLAIN.DDL script provided by IBM.
- A test environment for the stress test — never run a stress test against production. The test environment should be a recent clone of production, with production-representative data volumes.
- Stakeholder notification — inform the operations team, the application teams, and management that an assessment is underway. Some activities (particularly the DR drill) require coordination with other teams.
- Documentation baseline — collect the current enterprise architecture diagram, the last capacity planning report, the security policy, the DR plan, and the maintenance schedule. These are the documents against which you will compare your findings.
With preparations complete, we proceed to the first phase of the assessment.
36.2 System Architecture Review
36.2.1 The Meridian Bank DB2 Landscape
Meridian National Bank operates three distinct DB2 environments:
| Environment | Platform | Version | Purpose | HA Strategy |
|---|---|---|---|---|
| Production OLTP | z/OS 2.5, DB2 13 for z/OS | V13R1 | Core banking transactions | 3-member data sharing group |
| Reporting / Mid-tier | RHEL 8.8, DB2 11.5.8 | 11.5.8.0 | Branch applications, regulatory reporting | HADR with automatic client reroute |
| Analytics | IBM Db2 Warehouse on Cloud (Dallas region) | SaaS | Business intelligence, data science | Managed HA by IBM |
The architecture review begins by confirming that each environment matches the approved enterprise architecture diagram — a document maintained by the data architecture team and last updated eighteen months ago. Discrepancies between the diagram and reality are findings that must be reported.
36.2.2 z/OS Data Sharing Group Verification
The data sharing group is the crown jewel of the Meridian OLTP platform. Three members — DB2A, DB2B, and DB2C — share a single set of databases through the Coupling Facility. Our verification checklist:
Coupling Facility Health
-- Display data sharing group status: members, release levels,
-- and Coupling Facility structure information.
-- Issue as a DB2 command from any active member (DSN command
-- processor, SDSF, or the console):
-DISPLAY GROUP DETAIL
We verify:
- All three members show STATUS = 'ACTIVE'.
- The group attach name resolves correctly through WLM.
- Lock structure, SCA (Shared Communications Area), and group buffer pools are allocated in the Coupling Facility and show no rebuild-pending status.
Group Buffer Pool Verification
-- Display group buffer pool statistics for all GBPs.
-- Hit ratios and cross-invalidation rates are computed from the
-- reported read, write, and cross-invalidation counts:
-DISPLAY GROUPBUFFERPOOL(*) GDETAIL
A group buffer pool hit ratio below 95% is a warning; below 85% is a critical finding. Cross-invalidation rates are also checked — excessive cross-invalidation indicates that members are frequently updating the same pages, a sign of poor workload distribution.
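The thresholds above are easy to encode when post-processing the displayed counters. A small helper, assuming the hit ratio has already been derived as a percentage:

```python
def classify_gbp_hit_ratio(hit_ratio_pct):
    """Apply the review thresholds from the text: below 95% is a
    warning finding, below 85% is critical, otherwise OK."""
    if hit_ratio_pct < 85.0:
        return "CRITICAL"
    if hit_ratio_pct < 95.0:
        return "WARNING"
    return "OK"
```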
IRLM (Internal Resource Lock Manager) Verification
-- Global lock contention is not exposed through a catalog table.
-- The suspend rate comes from the DB2 statistics trace (global
-- contention counters in the statistics and accounting reports
-- produced by your monitor, e.g. IBM OMEGAMON for Db2).
-- The metric we compute from those counters:
--   suspend_pct = global_lock_suspends * 100.0 / global_lock_requests
Global lock suspend percentages above 2% warrant investigation. Above 5% is a critical finding.
36.2.3 LUW HADR Pair Verification
The LUW environment runs a primary database server and a standby in a separate data center (30 km away). Verification steps:
-- Check HADR status on primary
SELECT HADR_ROLE, HADR_STATE, HADR_SYNCMODE,
HADR_CONNECT_STATUS, HADR_LOG_GAP,
       PEER_WINDOW_END
FROM TABLE(MON_GET_HADR(NULL)) AS T;
Expected results:
- HADR_ROLE = 'PRIMARY'
- HADR_STATE = 'PEER' (for synchronous mode) or 'REMOTE_CATCHUP' (acceptable for async)
- HADR_CONNECT_STATUS = 'CONNECTED'
- HADR_LOG_GAP should be near zero for SYNC/NEARSYNC modes
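These expectations can be checked programmatically. A sketch, assuming the caller has fetched one MON_GET_HADR row into a Python dict keyed by column name:

```python
def check_hadr(row):
    """Flag deviations from the expected HADR posture described above.
    `row` maps MON_GET_HADR column names to values (an assumption
    about how the caller fetched the result)."""
    issues = []
    if row.get("HADR_ROLE") != "PRIMARY":
        issues.append("not primary")
    if row.get("HADR_CONNECT_STATUS") != "CONNECTED":
        issues.append("standby not connected")
    sync = row.get("HADR_SYNCMODE")
    state = row.get("HADR_STATE")
    if sync in ("SYNC", "NEARSYNC"):
        if state != "PEER":
            issues.append(f"state {state}, expected PEER")
        if row.get("HADR_LOG_GAP", 0) > 0:
            issues.append("log gap nonzero in sync mode")
    elif state not in ("PEER", "REMOTE_CATCHUP"):
        issues.append(f"unexpected state {state}")
    return issues
```

An empty list means the pair matches the expected posture; anything else becomes a finding.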
We also verify the HADR configuration parameters on which failover depends; the Automatic Client Reroute (ACR) alternate server itself is set with UPDATE ALTERNATE SERVER FOR DATABASE and confirmed in the database directory:
-- Verify HADR configuration parameters
SELECT DBPARTITIONNUM, NAME, VALUE
FROM SYSIBMADM.DBCFG
WHERE NAME IN ('hadr_local_host', 'hadr_local_svc',
'hadr_remote_host', 'hadr_remote_svc',
'hadr_syncmode', 'hadr_timeout');
36.2.4 Cloud Analytics Environment
For the IBM Db2 Warehouse on Cloud instance, verification is performed through the IBM Cloud dashboard and REST APIs:
- Confirm the instance is running in the Dallas (us-south) region.
- Verify the instance plan (Flex One, Enterprise, or equivalent).
- Check that IP allowlisting is configured to permit connections only from Meridian's corporate network and VPN ranges.
- Verify that encryption at rest and in transit are enabled (TLS 1.2+).
- Confirm that the ETL pipeline from the LUW reporting database to the cloud warehouse completed its last scheduled load within the expected window.
36.2.5 Architecture Findings Template
| Finding ID | Environment | Component | Expected | Actual | Severity | Recommendation |
|---|---|---|---|---|---|---|
| ARCH-001 | z/OS | Data Sharing Members | 3 active | 3 active | OK | None |
| ARCH-002 | LUW | HADR State | PEER | REMOTE_CATCHUP | WARNING | Investigate network latency |
| ARCH-003 | Cloud | IP Allowlist | Corporate + VPN only | Includes deprecated VPN range | MEDIUM | Remove stale IP range |
36.3 Schema and Design Review
The schema review examines the logical and physical data model for correctness, efficiency, and compliance with Meridian's data governance standards.
36.3.1 Data Model Validation
We begin with a catalog-based audit. Every table in the MERIDIAN schema should have:
- A primary key.
- Appropriate foreign keys enforcing referential integrity.
- Column names following the bank's naming convention (prefix with entity abbreviation, no abbreviations in business-critical columns).
- Appropriate data types (no VARCHAR(4000) for a 10-character code field).
-- Find tables without primary keys
SELECT T.TABSCHEMA, T.TABNAME, T.CARD, T.NPAGES
FROM SYSCAT.TABLES T
WHERE T.TABSCHEMA = 'MERIDIAN'
AND T.TYPE = 'T'
AND NOT EXISTS (
SELECT 1 FROM SYSCAT.TABCONST TC
WHERE TC.TABSCHEMA = T.TABSCHEMA
AND TC.TABNAME = T.TABNAME
AND TC.TYPE = 'P'
)
ORDER BY T.CARD DESC;
Tables without primary keys are a critical finding for any banking application. Without a primary key, there is no guaranteed unique row identifier, which complicates replication, auditing, and regulatory data lineage requirements.
-- Find potential data type mismatches (oversized VARCHAR columns)
SELECT TABNAME, COLNAME, TYPENAME, LENGTH,
CASE WHEN TYPENAME = 'VARCHAR' AND LENGTH > 1000
THEN 'REVIEW: Potentially oversized'
ELSE 'OK'
END AS ASSESSMENT
FROM SYSCAT.COLUMNS
WHERE TABSCHEMA = 'MERIDIAN'
AND TYPENAME = 'VARCHAR'
AND LENGTH > 500
ORDER BY LENGTH DESC;
36.3.2 Index Effectiveness Analysis
Indexes are the single most impactful performance lever. Our review checks for:
- Missing indexes — tables with high read counts but only a primary key index.
- Redundant indexes — indexes whose leading columns are a prefix of another index on the same table.
- Unused indexes — indexes that have not been used in any access path since the last statistics reset.
-- Find potentially redundant indexes on LUW
SELECT A.TABSCHEMA, A.TABNAME,
A.INDNAME AS SHORTER_INDEX,
B.INDNAME AS LONGER_INDEX,
A.COLNAMES AS SHORT_COLS,
B.COLNAMES AS LONG_COLS
FROM SYSCAT.INDEXES A
JOIN SYSCAT.INDEXES B
ON A.TABSCHEMA = B.TABSCHEMA
AND A.TABNAME = B.TABNAME
AND A.INDNAME <> B.INDNAME
WHERE A.TABSCHEMA = 'MERIDIAN'
AND B.COLNAMES LIKE A.COLNAMES || '%'
AND A.UNIQUERULE <> 'P'
ORDER BY A.TABNAME, A.INDNAME;
-- Find unused indexes (LUW — requires monitoring enabled)
SELECT I.TABSCHEMA, I.TABNAME, I.INDNAME, I.COLNAMES,
M.INDEX_SCANS, M.INDEX_ONLY_SCANS,
I.NPAGES AS INDEX_PAGES
FROM SYSCAT.INDEXES I
LEFT JOIN TABLE(MON_GET_INDEX('MERIDIAN', '', -2)) AS M
ON I.IID = M.IID AND I.TABSCHEMA = M.TABSCHEMA
AND I.TABNAME = M.TABNAME
WHERE I.TABSCHEMA = 'MERIDIAN'
AND I.UNIQUERULE <> 'P'
AND COALESCE(M.INDEX_SCANS, 0) + COALESCE(M.INDEX_ONLY_SCANS, 0) = 0
ORDER BY I.NPAGES DESC;
Each unused index consumes storage, slows INSERT/UPDATE/DELETE operations, and adds REORG/RUNSTATS overhead. However, before recommending removal, we check whether the index supports a quarterly or annual process that may not have run since the monitoring reset.
36.3.3 Tablespace Health
-- Tablespace utilization and health (LUW)
SELECT TBSP_NAME, TBSP_TYPE, TBSP_STATE,
TBSP_TOTAL_PAGES, TBSP_USED_PAGES, TBSP_FREE_PAGES,
DECIMAL(TBSP_USED_PAGES * 100.0 /
NULLIF(TBSP_TOTAL_PAGES, 0), 5, 2) AS USED_PCT,
TBSP_AUTO_RESIZE_ENABLED
FROM TABLE(MON_GET_TABLESPACE('', -2)) AS T
ORDER BY USED_PCT DESC;
Tablespaces above 85% utilization are flagged as warnings. Above 95% is critical. We also verify that auto-resize is enabled for DMS tablespaces in non-production environments and that production tablespaces follow the approved growth plan.
36.3.4 Foreign Key and Referential Integrity Review
Referential integrity is the backbone of a banking data model. Every ACCOUNT must belong to a CUSTOMER. Every TRANSACTION must belong to an ACCOUNT. Every WIRE_TRANSFER must reference a valid source account. Without enforced foreign keys, orphaned records can accumulate — an account without a customer, a transaction without an account — creating data quality issues that are difficult to detect and expensive to remediate.
-- Find tables that reference other tables but lack foreign keys
-- Identify likely parent-child relationships by naming convention
SELECT CHILD.TABNAME AS POTENTIAL_CHILD,
CHILD.COLNAME AS FK_COLUMN,
PARENT.TABNAME AS LIKELY_PARENT
FROM SYSCAT.COLUMNS CHILD
JOIN SYSCAT.COLUMNS PARENT
ON CHILD.COLNAME = PARENT.COLNAME
AND CHILD.TABNAME <> PARENT.TABNAME
WHERE CHILD.TABSCHEMA = 'MERIDIAN'
AND PARENT.TABSCHEMA = 'MERIDIAN'
AND CHILD.COLNAME LIKE '%_ID'
AND NOT EXISTS (
SELECT 1 FROM SYSCAT.REFERENCES R
WHERE R.TABSCHEMA = CHILD.TABSCHEMA
AND R.TABNAME = CHILD.TABNAME
AND R.REFTABNAME = PARENT.TABNAME
)
AND EXISTS (
SELECT 1 FROM SYSCAT.TABCONST TC
WHERE TC.TABSCHEMA = PARENT.TABSCHEMA
AND TC.TABNAME = PARENT.TABNAME
AND TC.TYPE = 'P'
)
ORDER BY CHILD.TABNAME, CHILD.COLNAME;
This query uses a heuristic approach — looking for columns with matching names ending in _ID — to identify relationships that probably should have foreign keys but do not. Each result must be manually reviewed: some matches are coincidental (not every column named BRANCH_ID is a foreign key to the BRANCH table), but many reveal genuine missing constraints.
For the Meridian schema, this query identified two missing foreign keys:
- WIRE_TRANSFER_QUEUE.SOURCE_ACCOUNT_ID should reference ACCOUNT.ACCOUNT_ID.
- PAYROLL_STAGING.EMPLOYEE_ACCOUNT_ID should reference ACCOUNT.ACCOUNT_ID.
Both staging tables were designed without foreign keys to improve LOAD performance (foreign key checking adds overhead during bulk loads). This is a valid technical decision, but it must be documented, and the application must enforce the relationship in code.
36.3.5 Data Type Consistency Review
In a well-governed schema, the same business concept should use the same data type everywhere. If CUSTOMER_ID is INTEGER in the CUSTOMER table but BIGINT in the TRANSACTION_HISTORY table, implicit data type conversion occurs during joins — potentially preventing index usage and degrading performance.
-- Find columns with the same name but different data types across tables
SELECT A.COLNAME,
A.TABNAME AS TABLE_A, A.TYPENAME AS TYPE_A, A.LENGTH AS LEN_A,
B.TABNAME AS TABLE_B, B.TYPENAME AS TYPE_B, B.LENGTH AS LEN_B
FROM SYSCAT.COLUMNS A
JOIN SYSCAT.COLUMNS B
ON A.COLNAME = B.COLNAME
AND A.TABNAME < B.TABNAME
WHERE A.TABSCHEMA = 'MERIDIAN'
AND B.TABSCHEMA = 'MERIDIAN'
AND (A.TYPENAME <> B.TYPENAME OR A.LENGTH <> B.LENGTH)
AND A.COLNAME LIKE '%_ID'
ORDER BY A.COLNAME, A.TABNAME;
36.3.6 Catalog-Based Audit Summary
The schema review produces a findings table:
| Finding ID | Object | Issue | Impact | Severity | Recommendation |
|---|---|---|---|---|---|
| SCHEMA-001 | MERIDIAN.TRANSACTION_STAGING | No primary key | Data integrity risk | CRITICAL | Add surrogate PK |
| SCHEMA-002 | MERIDIAN.CUSTOMER.ADDR_LINE_3 | VARCHAR(4000) for address | Wasted buffer pool memory | LOW | Alter to VARCHAR(200) |
| SCHEMA-003 | MERIDIAN.IDX_ACCT_TYPE | Redundant (prefix of IDX_ACCT_TYPE_STATUS) | INSERT overhead | MEDIUM | Drop after validation |
| SCHEMA-004 | MERIDIAN.IDX_CUST_LEGACY | Zero scans since reset | Maintenance overhead | MEDIUM | Validate with batch team, then drop |
36.4 Performance Baseline and Stress Test
36.4.1 Establishing the Baseline
Before we can interpret stress test results, we need a baseline — a snapshot of normal system performance under typical weekday load. The baseline captures:
- Throughput: transactions per second (TPS), SQL statements per second.
- Response time: average and 95th percentile elapsed time for key transaction types.
- Resource utilization: CPU, memory, I/O rates, buffer pool hit ratios.
- Locking: lock wait time, deadlock frequency.
-- Capture baseline metrics snapshot (LUW)
SELECT SNAPSHOT_TIMESTAMP,
TOTAL_APP_COMMITS,
TOTAL_APP_ROLLBACKS,
ACT_COMPLETED_TOTAL,
AVG_RQST_CPU_TIME,
TOTAL_WAIT_TIME,
LOCK_WAIT_TIME,
LOCK_WAITS,
DEADLOCKS,
POOL_DATA_L_READS,
POOL_DATA_P_READS,
DECIMAL(
(POOL_DATA_L_READS - POOL_DATA_P_READS) * 100.0 /
NULLIF(POOL_DATA_L_READS, 0), 5, 2
) AS BP_HIT_RATIO
FROM TABLE(MON_GET_DATABASE(-2)) AS T;
We capture this snapshot every 15 minutes for one full business day — typically a Tuesday or Wednesday, avoiding month-end processing. The resulting 40+ data points form our baseline profile.
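The hit-ratio arithmetic in the snapshot query can be reproduced when post-processing the collected samples; a minimal helper:

```python
def bp_hit_ratio(logical_reads, physical_reads):
    """Buffer pool hit ratio, using the same formula as the snapshot
    query: (logical - physical) * 100 / logical. Returns None when
    there were no logical reads (mirroring the NULLIF guard)."""
    if logical_reads == 0:
        return None
    return round((logical_reads - physical_reads) * 100.0 / logical_reads, 2)
```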
36.4.2 Designing the Stress Test Workload
The stress test simulates peak load conditions. For Meridian National Bank, peak load occurs on the last business day of each month when:
- Payroll processing for 14,000 employees runs concurrently with normal banking operations.
- Statement generation for 2.3 million accounts runs in batch.
- Wire transfer volume increases by 340% over the daily average.
- The online banking portal experiences 2.5x normal concurrent users.
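To size the simulated workload, the multipliers above are applied to baseline daily volumes. A sketch of the arithmetic (the baseline figures in the test are hypothetical; a 340% increase means 4.4x the daily average):

```python
def peak_targets(baseline):
    """Scale baseline daily volumes to month-end peak using the
    multipliers above: wire volume +340% (i.e. 4.4x) and 2.5x
    concurrent portal users; other volumes pass through unchanged."""
    targets = dict(baseline)
    targets["wire_transfers"] = round(baseline["wire_transfers"] * 4.4)
    targets["portal_users"] = round(baseline["portal_users"] * 2.5)
    return targets
```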
Our simulated workload (see code/stress-test-workload.sql) includes:
- Account inquiry — read-only, high frequency (simulates online banking).
- Fund transfer — read-write, medium frequency (simulates teller and wire operations).
- Statement generation — batch read, sequential scan of large tables.
- Payroll posting — bulk INSERT/UPDATE, high contention on account balance tables.
- Regulatory reporting — complex aggregation queries against historical data.
36.4.3 Executing the Stress Test
The stress test is orchestrated using a combination of DB2's built-in tools and custom scripts:
# Launch stress test with 200 concurrent connections
# Each connection runs a randomized mix of the 5 transaction types
# Each background subshell needs its own CLP connection — a connection
# made in the parent shell is not shared with backgrounded db2 processes
for i in $(seq 1 200); do
  ( db2 connect to MERIDIANDB user STRESSTEST using $STRESS_PWD >/dev/null
    db2 -tvf stress-test-workload.sql ) &
done
wait
# Monitor in real-time
db2 "SELECT * FROM TABLE(MON_GET_ACTIVITY(NULL, -2))
WHERE ACTIVITY_STATE = 'EXECUTING'
ORDER BY ELAPSED_TIME_SEC DESC
FETCH FIRST 20 ROWS ONLY"
In practice, most organizations use a load testing tool such as IBM Rational Performance Tester, Apache JMeter with a JDBC driver, or a custom harness. The key requirement is that the tool can simulate realistic concurrency — not just sequential statement execution, but genuinely concurrent sessions with think times and transaction boundaries that mimic real user behavior.
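The concurrency pattern such a harness needs — genuinely parallel sessions, a weighted-random transaction mix, think times — can be sketched as follows; the transaction names and weights are placeholders, and a real harness would execute SQL through a database driver rather than the `execute` stub used here:

```python
import random
import threading
import time

# Placeholder transaction mix: (name, relative weight)
MIX = [("account_inquiry", 50), ("fund_transfer", 25),
       ("statement_gen", 5), ("payroll_post", 5), ("reg_report", 15)]

def run_session(session_id, iterations, results, execute):
    """One simulated user session: pick a weighted-random transaction,
    run it, then sleep a short 'think time' before the next one."""
    names = [m[0] for m in MIX]
    weights = [m[1] for m in MIX]
    for _ in range(iterations):
        txn = random.choices(names, weights=weights, k=1)[0]
        execute(session_id, txn)               # stub for the real SQL call
        results.append(txn)                    # list.append is thread-safe
        time.sleep(random.uniform(0.0, 0.01))  # think time (shortened here)

def run_test(sessions=10, iterations=20, execute=lambda s, t: None):
    """Launch the sessions as concurrent threads and collect results."""
    results = []
    threads = [threading.Thread(target=run_session,
                                args=(i, iterations, results, execute))
               for i in range(sessions)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The essential point is that the sessions overlap in time, so lock contention and buffer pool pressure emerge as they would in production.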
36.4.4 Monitoring Under Load
During the stress test, we capture metrics at 60-second intervals:
-- Real-time throughput monitoring
SELECT CURRENT TIMESTAMP AS SAMPLE_TIME,
MEMBER,
ACT_COMPLETED_TOTAL AS COMPLETED_ACTS,
APP_RQSTS_COMPLETED_TOTAL AS COMPLETED_RQSTS,
AVG_RQST_CPU_TIME,
TOTAL_CPU_TIME,
TOTAL_WAIT_TIME,
LOCK_WAIT_TIME,
IO_WAIT_TIME,
POOL_DATA_L_READS,
POOL_DATA_P_READS,
ROWS_READ,
ROWS_RETURNED
FROM TABLE(MON_GET_DATABASE(-2)) AS T;
-- Top resource-consuming statements during stress test
SELECT EXECUTABLE_ID,
NUM_EXECUTIONS,
TOTAL_CPU_TIME / NUM_EXECUTIONS AS AVG_CPU_PER_EXEC,
TOTAL_ACT_TIME / NUM_EXECUTIONS AS AVG_ELAPSED_PER_EXEC,
ROWS_READ / NUM_EXECUTIONS AS AVG_ROWS_READ_PER_EXEC,
ROWS_RETURNED / NUM_EXECUTIONS AS AVG_ROWS_RET_PER_EXEC,
DECIMAL(ROWS_RETURNED * 100.0 / NULLIF(ROWS_READ, 0), 5, 2)
AS EFFICIENCY_PCT
FROM TABLE(MON_GET_PKG_CACHE_STMT(NULL, NULL, NULL, -2)) AS T
WHERE NUM_EXECUTIONS > 0
ORDER BY TOTAL_CPU_TIME DESC
FETCH FIRST 20 ROWS ONLY;
36.4.5 Identifying Bottlenecks
During the Meridian stress test, we observe three bottlenecks:
Bottleneck 1: Buffer Pool Pressure
The buffer pool hit ratio for the ACCT_DATA tablespace drops from 99.2% at baseline to 87.4% under load. Root cause: the statement generation batch job performs sequential scans of the ACCOUNT_HISTORY table (480 million rows), flushing recently used pages from the buffer pool.
Resolution: Assign the batch workload to a separate buffer pool (BP_BATCH) sized appropriately for sequential prefetch. This isolates the batch I/O from the OLTP buffer pool.
Bottleneck 2: Lock Contention on ACCT_BALANCE
The payroll posting process updates the same account balance rows that teller transactions read. Lock wait time spikes to 340ms average (baseline: 8ms).
Resolution: Change the payroll posting to use SKIP LOCKED DATA for non-critical reads, and restructure the posting to batch updates per account rather than individual row-level updates in a tight loop. Consider Currently Committed semantics for reader isolation.
Bottleneck 3: Sort Overflow
The regulatory reporting query performs a large GROUP BY across 18 months of transaction data. Under concurrent load, the sort heap is exhausted, and sorts spill to disk — a 15x performance degradation.
Resolution: Increase SORTHEAP from 4096 to 16384 pages for the reporting workload class. Alternatively, create a materialized query table (MQT) for the pre-aggregated regulatory report data.
36.4.6 Stress Test Results Summary
| Metric | Baseline | Stress Test | Threshold | Status |
|---|---|---|---|---|
| Throughput (TPS) | 2,400 | 3,800 | >3,500 target | PASS |
| Avg Response Time (ms) | 12 | 48 | <100 | PASS |
| 95th Percentile RT (ms) | 34 | 210 | <500 | PASS |
| BP Hit Ratio (OLTP) | 99.2% | 87.4% | >95% | FAIL |
| Lock Wait Time (avg ms) | 8 | 340 | <50 | FAIL |
| Sort Overflows / hr | 0 | 847 | <10 | FAIL |
| CPU Utilization | 28% | 72% | <85% | PASS |
| Deadlocks / hr | 0 | 3 | <5 | PASS |
Three of eight key metrics fail under stress. Each has an identified resolution. After implementing the fixes, the stress test is repeated. The results after remediation:
| Metric | After Remediation | Threshold | Status |
|---|---|---|---|
| BP Hit Ratio (OLTP) | 98.7% | >95% | PASS |
| Lock Wait Time (avg ms) | 22 | <50 | PASS |
| Sort Overflows / hr | 0 | <10 | PASS |
36.5 EXPLAIN Analysis Under Load
36.5.1 Capturing Access Paths
During the stress test, we identify the top 10 queries by total CPU consumption and capture their access paths using the EXPLAIN facility:
-- Populate EXPLAIN tables for a specific statement
EXPLAIN PLAN SET QUERYNO = 1001 FOR
SELECT C.CUSTOMER_ID, C.CUSTOMER_NAME,
A.ACCOUNT_NUMBER, A.CURRENT_BALANCE,
T.TRANSACTION_DATE, T.TRANSACTION_AMOUNT, T.TRANSACTION_TYPE
FROM MERIDIAN.CUSTOMER C
JOIN MERIDIAN.ACCOUNT A ON C.CUSTOMER_ID = A.CUSTOMER_ID
JOIN MERIDIAN.TRANSACTION_HISTORY T ON A.ACCOUNT_ID = T.ACCOUNT_ID
WHERE C.CUSTOMER_ID = ?
AND T.TRANSACTION_DATE BETWEEN ? AND ?
ORDER BY T.TRANSACTION_DATE DESC
FETCH FIRST 50 ROWS ONLY;
-- Review the access plan
SELECT OPERATOR_ID, TARGET_ID, OPERATOR_TYPE,
OBJECT_SCHEMA, OBJECT_NAME,
TOTAL_COST, IO_COST, CPU_COST,
STREAM_COUNT
FROM EXPLAIN_OPERATOR
WHERE QUERYNO = 1001
ORDER BY OPERATOR_ID;
36.5.2 Common Access Path Issues Found
Issue 1: Table Scan on TRANSACTION_HISTORY
The optimizer chooses a table scan instead of using the index on (ACCOUNT_ID, TRANSACTION_DATE) because statistics are stale — the last RUNSTATS was 45 days ago, and 38 million new rows have been inserted since.
-- Fix: Update statistics
RUNSTATS ON TABLE MERIDIAN.TRANSACTION_HISTORY
ON ALL COLUMNS AND INDEXES ALL
WITH DISTRIBUTION ON ALL COLUMNS
ALLOW WRITE ACCESS;
After RUNSTATS, the optimizer switches to an index scan with a 94% reduction in estimated cost.
Issue 2: Nested Loop Join Where Hash Join Would Be Better
The regulatory summary query joins ACCOUNT (2.3 million rows) with TRANSACTION_HISTORY (480 million rows) using a nested loop because the join predicate uses a function — YEAR(T.TRANSACTION_DATE) = YEAR(CURRENT DATE). The function prevents the optimizer from using range predicates.
-- Rewrite to enable range predicates
-- Before (problematic):
WHERE YEAR(T.TRANSACTION_DATE) = YEAR(CURRENT DATE)
-- After (optimizer-friendly):
WHERE T.TRANSACTION_DATE >= DATE(VARCHAR(YEAR(CURRENT DATE)) || '-01-01')
  AND T.TRANSACTION_DATE < DATE(VARCHAR(YEAR(CURRENT DATE) + 1) || '-01-01')
Issue 3: Index-Only Access Opportunity Missed
The account balance inquiry query accesses ACCOUNT only for ACCOUNT_NUMBER and CURRENT_BALANCE, but the existing index includes only ACCOUNT_NUMBER. Adding CURRENT_BALANCE to the index enables index-only access, eliminating all data page I/O for this high-frequency query.
CREATE INDEX MERIDIAN.IDX_ACCT_BAL_COVER
ON MERIDIAN.ACCOUNT (ACCOUNT_NUMBER)
INCLUDE (CURRENT_BALANCE);
36.5.3 Access Path Regression Detection
We compare access paths from the stress test to the production baseline captured last month. A regression is any case where the optimizer chooses a more expensive plan for the same query. We use the EXPLAIN tables to compare:
-- Compare estimated costs between baseline and current
SELECT B.QUERYNO,
B.TOTAL_COST AS BASELINE_COST,
C.TOTAL_COST AS CURRENT_COST,
DECIMAL((C.TOTAL_COST - B.TOTAL_COST) * 100.0 /
NULLIF(B.TOTAL_COST, 0), 7, 2) AS COST_CHANGE_PCT
FROM EXPLAIN_OPERATOR_BASELINE B
JOIN EXPLAIN_OPERATOR C
ON B.QUERYNO = C.QUERYNO
AND B.OPERATOR_ID = 1 -- root operator
AND C.OPERATOR_ID = 1
WHERE C.TOTAL_COST > B.TOTAL_COST * 1.20 -- >20% regression
ORDER BY COST_CHANGE_PCT DESC;
Any regression greater than 20% is investigated. Common causes include stale statistics, changed data distribution, new indexes that create suboptimal plan choices, and changes to database configuration parameters.
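The same 20% filter can be applied when comparing cost snapshots exported from the EXPLAIN tables; a sketch, assuming the costs have been fetched into dicts keyed by QUERYNO:

```python
def cost_regressions(baseline, current, threshold_pct=20.0):
    """Compare per-query optimizer cost estimates between two runs and
    return {queryno: change_pct} for regressions above the threshold,
    mirroring the 20% filter in the SQL above."""
    flagged = {}
    for qno, base_cost in baseline.items():
        cur = current.get(qno)
        if cur is None or base_cost == 0:
            continue
        change_pct = (cur - base_cost) * 100.0 / base_cost
        if change_pct > threshold_pct:
            flagged[qno] = round(change_pct, 2)
    return flagged
```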
36.5.4 Top 10 Query Optimization Summary
After completing the EXPLAIN analysis of all top 10 queries, we compile the results into a summary table that becomes part of the final report:
| Query # | Description | Before Cost | After Cost | Improvement | Change Made |
|---|---|---|---|---|---|
| Q1 | Customer account inquiry | 124 | 12 | 90% | Covering index added |
| Q2 | Transaction history lookup | 89,400 | 4,200 | 95% | RUNSTATS + index scan enabled |
| Q3 | Account balance check | 34 | 8 | 76% | Covering index (index-only access) |
| Q4 | Wire transfer validation | 18 | 18 | 0% | No change needed — already optimal |
| Q5 | Daily transaction summary | 12,800 | 12,800 | 0% | No change — appropriate table scan |
| Q6 | Regulatory CTR report | 456,000 | 23,400 | 95% | Predicate rewrite (removed function) |
| Q7 | Branch activity report | 34,200 | 8,900 | 74% | RUNSTATS + better join order |
| Q8 | Payroll batch posting | 2,400 | 2,400 | 0% | No change — bulk operation, expected cost |
| Q9 | Statement generation | 890,000 | 890,000 | 0% | Expected cost for sequential scan; isolated to separate BP |
| Q10 | Year-over-year comparison | 234,000 | 18,000 | 92% | MQT created for pre-aggregated data |
Note that not every query needs optimization. Queries Q4, Q5, Q8, and Q9 were already using appropriate access paths. The optimizer chose a table scan for Q5 because the query aggregates the entire day's transactions — an index would not help. Q9 is a sequential scan of a large table that is expected and managed through buffer pool isolation. A common mistake in performance tuning is optimizing queries that are already optimal. The EXPLAIN analysis tells us where to invest effort and where to leave the optimizer's choices alone.
36.5.5 Optimization Profile Considerations
For queries where the optimizer consistently chooses a suboptimal plan despite current statistics, DB2 offers optimization profiles — XML documents that provide hints to the optimizer about preferred access methods. However, optimization profiles should be used sparingly and as a last resort:
-- Example: register an optimization profile to force index usage.
-- Use only when the optimizer consistently makes the wrong choice
-- despite accurate statistics. On LUW there is no CREATE OPTIMIZATION
-- PROFILE statement: the profile is an XML document inserted into
-- SYSTOOLS.OPT_PROFILE and then activated by name.
INSERT INTO SYSTOOLS.OPT_PROFILE (SCHEMA, NAME, PROFILE)
VALUES ('MERIDIAN', 'PROF_CTR_REPORT', BLOB(
'<?xml version="1.0" encoding="UTF-8"?>
<OPTPROFILE>
 <STMTPROFILE ID="CTR report">
  <STMTKEY><![CDATA[...the exact statement text...]]></STMTKEY>
  <OPTGUIDELINES><IXSCAN TABLE="T" INDEX="IX_TXN_DATE_TYPE"/></OPTGUIDELINES>
 </STMTPROFILE>
</OPTPROFILE>'));
SET CURRENT OPTIMIZATION PROFILE = 'MERIDIAN.PROF_CTR_REPORT';
In the Meridian assessment, no optimization profiles were needed — all performance issues were resolved through RUNSTATS, index creation, and query rewriting. If optimization profiles are present in your environment, document each one and verify that it is still needed. Profiles that override the optimizer's decisions based on outdated assumptions can cause performance regressions when data distributions change.
36.6 Security Audit
36.6.1 Privilege Review
The security audit begins with a comprehensive privilege inventory. Every GRANT in the system is examined:
-- Find users with excessive privileges (DBADM or higher)
SELECT GRANTEE, GRANTEETYPE, DBADMAUTH, SECURITYADMAUTH, DATAACCESSAUTH,
       ACCESSCTRLAUTH, SQLADMAUTH, WLMADMAUTH, EXPLAINAUTH
FROM SYSCAT.DBAUTH
WHERE DBADMAUTH = 'Y' OR SECURITYADMAUTH = 'Y' OR DATAACCESSAUTH = 'Y'
ORDER BY GRANTEE;
Findings to look for:
- Application service accounts with DBADM authority (they should have only the minimum privileges needed).
- Individual user IDs with direct GRANT instead of role-based access.
- Orphaned grants — privileges granted to users who have left the organization.
-- Find all table-level privileges
SELECT GRANTOR, GRANTEE, GRANTEETYPE, TABSCHEMA, TABNAME,
CONTROLAUTH, ALTERAUTH, DELETEAUTH, INSERTAUTH,
SELECTAUTH, UPDATEAUTH, REFAUTH, INDEXAUTH
FROM SYSCAT.TABAUTH
WHERE TABSCHEMA = 'MERIDIAN'
AND GRANTEETYPE = 'U' -- Individual users, not roles
ORDER BY GRANTEE, TABNAME;
Individual user grants are a finding. The bank's security policy requires all access to be granted through roles.
36.6.2 RCAC (Row and Column Access Control) Verification
Meridian uses RCAC to enforce data masking and row filtering:
-- Verify row permissions are active
-- (RCAC objects are cataloged in SYSCAT.CONTROLS)
SELECT TABSCHEMA, TABNAME, CONTROLNAME, ENABLE, RULETEXT
FROM SYSCAT.CONTROLS
WHERE TABSCHEMA = 'MERIDIAN'
  AND CONTROLTYPE = 'R'
ORDER BY TABNAME;
-- Verify column masks are active
SELECT TABSCHEMA, TABNAME, CONTROLNAME, COLNAME, ENABLE
FROM SYSCAT.CONTROLS
WHERE TABSCHEMA = 'MERIDIAN'
  AND CONTROLTYPE = 'C'
ORDER BY TABNAME, COLNAME;
We verify:
- Customer SSN is masked for all users except the COMPLIANCE_ROLE.
- Account balances are masked for the MARKETING_ROLE (they can see account existence but not balances).
- Row-level filtering ensures that branch managers can see only accounts belonging to their branch.
- The RCAC policies are ENABLED, not merely defined.
36.6.3 Audit Policy Check
-- Review audit policies
SELECT AUDITPOLICYNAME, AUDITSTATUS,
       CHECKINGSTATUS, CONTEXTSTATUS, EXECUTESTATUS,
       OBJMAINTSTATUS, SECMAINTSTATUS, SYSADMINSTATUS,
       VALIDATESTATUS
FROM SYSCAT.AUDITPOLICIES
ORDER BY AUDITPOLICYNAME;
The bank's compliance requirements mandate:
- All SECMAINT events are audited (privilege changes).
- All VALIDATE events are audited (authentication attempts).
- EXECUTE auditing is enabled for sensitive tables (CUSTOMER, ACCOUNT, TRANSACTION_HISTORY).
- Audit logs are forwarded to the enterprise SIEM (Security Information and Event Management) system within 60 seconds.
36.6.4 PCI-DSS Compliance Mapping
As a financial institution that processes credit card transactions, Meridian must comply with PCI-DSS. The relevant DB2 requirements map as follows:
| PCI-DSS Requirement | DB2 Control | Status | Evidence |
|---|---|---|---|
| 2.2: Develop configuration standards | DB2 hardening guide applied | PASS | Config audit report |
| 3.4: Render PAN unreadable | Column mask on CARD_NUMBER | PASS | RCAC policy MASK_PAN active |
| 7.1: Limit access to need-to-know | Role-based access control | PARTIAL | 3 individual grants found |
| 7.2: Access control system | DB2 LBAC + RCAC | PASS | Policies verified |
| 8.5: Do not use shared accounts | Service accounts reviewed | PASS | Each app has unique ID |
| 10.1: Audit trails | DB2 audit facility | PASS | All categories enabled |
| 10.5: Secure audit trails | Audit logs on WORM storage | PASS | Storage config verified |
| 10.7: Retain audit trail for 1 year | Log retention policy | PASS | 13-month retention confirmed |
The three individual user grants (Requirement 7.1) are a finding that must be remediated before the examination.
36.6.5 Penetration Test Results
The security team conducted a database-focused penetration test. Key findings relevant to DB2:
- SQL Injection Attempt: The web application properly uses parameterized queries. No injection vectors found.
- Privilege Escalation: Attempted to escalate from a restricted application role to DBADM. Failed — no known vulnerabilities in current patch level.
- Network Sniffing: All DB2 connections use TLS 1.2. No cleartext credentials observed.
- Default Credentials: No default or well-known passwords found on any DB2 instance.
36.7 Backup and Recovery Drill
36.7.1 The DR Scenario
The disaster recovery drill simulates the following scenario:
At 2:47 AM on a Tuesday, the primary SAN controller for the LUW production database fails catastrophically. The storage array enters a degraded state, and the DB2 instance crashes with media errors on the ACCT_DATA tablespace. The standby database (HADR) is 30 km away in the alternate data center.
The drill has two phases:
1. HADR Failover: Bring the standby online as the new primary.
2. Point-in-Time Recovery: Restore a specific tablespace from backup to recover data that may have been in-flight at the moment of failure.
36.7.2 HADR Failover Execution
-- On the standby server, execute takeover
-- Force option used because primary is unreachable
TAKEOVER HADR ON DATABASE MERIDIANDB BY FORCE;
The takeover command promotes the standby to primary. In NEARSYNC mode, a commit is acknowledged to the application only after the log data has reached the standby's main memory, so no committed transactions are lost unless both primary and standby fail at the same instant.
Measured Failover Time:
| Phase | Duration |
|---|---|
| Detect primary failure (monitoring alert) | 45 seconds |
| Operator decision and authorization | 3 minutes 20 seconds |
| Execute TAKEOVER BY FORCE | 12 seconds |
| Database becomes accessible | 8 seconds |
| Application reconnect (ACR) | 15 seconds |
| Total RTO | 4 minutes 40 seconds |
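The measured total can be sanity-checked by summing the phase durations from the timeline above; a minimal Python sketch:

```python
# Phase durations from the failover timeline, in seconds.
phases = {
    "detect primary failure": 45,
    "operator decision and authorization": 3 * 60 + 20,
    "execute TAKEOVER BY FORCE": 12,
    "database becomes accessible": 8,
    "application reconnect (ACR)": 15,
}

total = sum(phases.values())
print(f"Total RTO: {total // 60} min {total % 60} sec")  # Total RTO: 4 min 40 sec
```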
The bank's RTO target for the LUW system is 15 minutes. We are well within the target.
36.7.3 Point-in-Time Recovery
After the HADR failover stabilizes the environment, we perform a parallel exercise: tablespace-level point-in-time recovery (PITR) to demonstrate the ability to recover a specific tablespace to a moment just before a simulated data corruption event.
-- Step 1: List available backups
LIST HISTORY BACKUP ALL FOR MERIDIANDB;
-- Step 2: Restore a single tablespace from the most recent backup.
-- The tablespace is left in rollforward-pending state so that
-- Steps 3 and 4 can apply logs up to the chosen point in time.
RESTORE DATABASE MERIDIANDB
TABLESPACE (ACCT_DATA)
FROM /db2backup/meridiandb
TAKEN AT 20260314021500;
-- Step 3: Roll forward to a specific point in time
-- (USING LOCAL TIME avoids the default UTC interpretation of the timestamp)
ROLLFORWARD DATABASE MERIDIANDB
TO 2026-03-14-02.45.00.000000 USING LOCAL TIME
TABLESPACE (ACCT_DATA)
ONLINE;
-- Step 4: Complete the rollforward
ROLLFORWARD DATABASE MERIDIANDB
STOP TABLESPACE (ACCT_DATA);
36.7.4 Data Integrity Verification
After recovery, we verify data integrity:
-- Verify row counts match expected values
SELECT 'CUSTOMER' AS TABLENAME, COUNT(*) AS ROW_COUNT
FROM MERIDIAN.CUSTOMER
UNION ALL
SELECT 'ACCOUNT', COUNT(*) FROM MERIDIAN.ACCOUNT
UNION ALL
SELECT 'TRANSACTION_HISTORY', COUNT(*) FROM MERIDIAN.TRANSACTION_HISTORY;
-- Run SET INTEGRITY to verify referential consistency
-- (invoked via the command line; each checked table needs its own
-- exception table, so TXN_EXCEPTIONS is assumed alongside ACCT_EXCEPTIONS)
# Verify referential integrity
db2 "SET INTEGRITY FOR MERIDIAN.ACCOUNT,
MERIDIAN.TRANSACTION_HISTORY
IMMEDIATE CHECKED
FOR EXCEPTION IN MERIDIAN.ACCOUNT USE MERIDIAN.ACCT_EXCEPTIONS,
IN MERIDIAN.TRANSACTION_HISTORY USE MERIDIAN.TXN_EXCEPTIONS"
36.7.5 DR Drill Results
| Metric | Target | Actual | Status |
|---|---|---|---|
| RTO (HADR Failover) | < 15 minutes | 4 min 40 sec | PASS |
| RPO (Data Loss) | 0 transactions | 0 transactions | PASS |
| PITR Completion | < 60 minutes | 34 minutes | PASS |
| Data Integrity Check | 0 errors | 0 errors | PASS |
| Application Reconnect | < 5 minutes | 15 seconds | PASS |
36.7.6 z/OS Recovery Considerations
On the z/OS platform, the recovery drill takes a different form. DB2 for z/OS uses:
- Image copies (full and incremental) instead of the BACKUP DATABASE command.
- RECOVER utility with the TORBA (To Relative Byte Address) option for point-in-time recovery; in a data sharing group, TOLOGPOINT with an LRSN value is used instead.
- Log Apply through the recovery utility, reading archive logs from tape or disk.
- Data sharing group recovery where individual members can be recovered without affecting other members.
//RECOVER EXEC DSNUPROC,SYSTEM=DB2A,UID='RECOVER'
//SYSIN DD *
RECOVER TABLESPACE MERIDIAN.ACCTDATA
TORBA X'00000C8A23F0'
LOGONLY
/*
The z/OS recovery drill confirms that the bank can recover any single tablespace to any point in time within the log retention window (currently 7 days of online logs plus 90 days of archive logs on tape).
36.8 Concurrency and Locking Analysis
36.8.1 Lock Contention Under Concurrent Load
During the stress test, we capture detailed locking information:
-- Current lock waits (MON_LOCKWAITS administrative view)
SELECT REQ_APPLICATION_HANDLE AS WAITING_APPL,
HLD_APPLICATION_HANDLE AS HOLDING_APPL,
LOCK_OBJECT_TYPE,
LOCK_MODE AS HELD_MODE,
LOCK_MODE_REQUESTED,
LOCK_WAIT_ELAPSED_TIME AS WAIT_SECS,
TABSCHEMA, TABNAME
FROM SYSIBMADM.MON_LOCKWAITS
ORDER BY LOCK_WAIT_ELAPSED_TIME DESC;
36.8.2 Deadlock Analysis
-- Review deadlock history from the locking event monitor
-- (events written by CREATE EVENT MONITOR ... FOR LOCKING and
-- formatted into tables with EVMON_FORMAT_UE_TO_TABLES)
SELECT P.EVENT_ID, P.PARTICIPANT_NO,
P.LOCK_MODE, P.LOCK_OBJECT_TYPE,
P.TABLE_SCHEMA, P.TABLE_NAME,
A.STMT_TEXT
FROM LOCK_PARTICIPANTS P
JOIN LOCK_PARTICIPANT_ACTIVITIES A
ON P.EVENT_ID = A.EVENT_ID
AND P.PARTICIPANT_NO = A.PARTICIPANT_NO
ORDER BY P.EVENT_ID, P.PARTICIPANT_NO;
During the stress test, three deadlocks occurred, all following the same pattern: two streams of the payroll posting process contending for rows in the ACCOUNT table while real-time balance inquiries added read traffic. The deadlock graph reveals a classic cycle:
- Session A (payroll) holds an X lock on Account 10042, waits for an X lock on Account 10087.
- Session B (payroll) holds an X lock on Account 10087, waits for an X lock on Account 10042.
Resolution: The payroll posting batches are restructured to process accounts in ascending ACCOUNT_ID order, eliminating the possibility of deadlock from opposing lock acquisition sequences. This is a fundamental concurrency principle — consistent lock ordering prevents deadlocks.
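The fix can be illustrated outside the database. In this minimal Python sketch (account IDs mirror the drill scenario), both transfers acquire their locks in ascending ID order, so the opposing-direction cycle above cannot form:

```python
import threading

# One lock per account, keyed by account ID.
locks = {10042: threading.Lock(), 10087: threading.Lock()}

def transfer(src, dst):
    # Acquire locks in ascending account-ID order, regardless of the
    # transfer direction. This is the consistent-ordering fix.
    first, second = sorted((src, dst))
    with locks[first]:
        with locks[second]:
            pass  # debit src, credit dst

# Opposing directions: the exact pattern that deadlocked the payroll batch.
t1 = threading.Thread(target=transfer, args=(10042, 10087))
t2 = threading.Thread(target=transfer, args=(10087, 10042))
t1.start(); t2.start()
t1.join(); t2.join()
print("both transfers completed without deadlock")
```

Without the `sorted()` call, each thread could hold its first lock while waiting forever for the other thread's lock, which is precisely the cycle the deadlock graph showed.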
36.8.3 Isolation Level Verification
We verify that each application workload uses the correct isolation level:
| Application | Required Isolation | Configured | Status |
|---|---|---|---|
| Online Banking (reads) | Cursor Stability (CS) | CS | PASS |
| Fund Transfer | Repeatable Read (RR) | RR | PASS |
| Statement Generation | Uncommitted Read (UR) | CS | FINDING |
| Regulatory Reporting | Read Stability (RS) | RS | PASS |
The statement generation process runs under CS when UR would suffice (it reads historical data that will not change during the batch window). Changing to UR eliminates lock acquisition entirely for this workload, reducing contention with OLTP operations.
-- Fix: set the isolation level for the statement generation connection.
-- With the IBM JCC JDBC driver, the defaultIsolationLevel connection
-- property (1 = read uncommitted, i.e. UR) can be set in the URL:
-- jdbc:db2://host:port/MERIDIANDB:defaultIsolationLevel=1;
36.8.4 Lock Escalation Review
Lock escalation — when DB2 promotes many row-level locks to a table-level lock — is a serious performance concern in a concurrent environment:
-- Check lock escalation history
SELECT TABSCHEMA, TABNAME,
LOCK_ESCALS, LOCK_ESCALS_GLOBAL,
LOCK_ESCALS_LOCKLIST, LOCK_ESCALS_MAXLOCKS
FROM TABLE(MON_GET_TABLE('MERIDIAN', '', -2)) AS T
WHERE LOCK_ESCALS > 0
ORDER BY LOCK_ESCALS DESC;
If lock escalation is detected on the ACCOUNT or TRANSACTION_HISTORY tables, we increase the LOCKLIST and MAXLOCKS database configuration parameters. Alternatively, we restructure the application to commit more frequently, releasing locks and reducing the total lock count per transaction.
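The escalation trigger is simple arithmetic: one application may hold roughly MAXLOCKS percent of the LOCKLIST (sized in 4 KB pages) before DB2 escalates. A back-of-envelope Python sketch, assuming about 128 bytes per lock request (the exact size varies by platform and version):

```python
# Estimate how many locks one application may hold before escalation.
# bytes_per_lock = 128 is an assumption; check your platform's value.
def locks_before_escalation(locklist_4k_pages, maxlocks_pct, bytes_per_lock=128):
    locklist_bytes = locklist_4k_pages * 4096
    return int(locklist_bytes * maxlocks_pct / 100 / bytes_per_lock)

# e.g. LOCKLIST=10000 pages, MAXLOCKS=20
print(locks_before_escalation(10000, 20))  # 64000
```

This kind of estimate helps decide whether raising LOCKLIST/MAXLOCKS is realistic for the transaction's lock footprint, or whether more frequent commits are the better fix.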
36.9 Utility and Maintenance Review
36.9.1 REORG Schedule Verification
-- Check tables needing REORG (LUW)
SELECT TABSCHEMA, TABNAME,
REORG_PENDING, NUM_OVERFLOW_ROWS,
PCT_ROWS_COMPRESSED,
STATS_TIME
FROM TABLE(ADMIN_GET_TAB_INFO('MERIDIAN', '')) AS T
WHERE REORG_PENDING = 'Y'
OR NUM_OVERFLOW_ROWS > 1000
ORDER BY NUM_OVERFLOW_ROWS DESC;
Meridian's maintenance schedule calls for:
- Online REORG of high-activity tables every Sunday at 02:00 AM.
- Offline REORG of the largest tables (TRANSACTION_HISTORY) during the quarterly maintenance window.
- RUNSTATS after every REORG and weekly on all tables with significant DML activity.
We verify that the last REORG completed successfully and that no tables have entered REORG-pending state since:
-- Verify REORG history (SNAPTAB_REORG administrative view)
SELECT TABSCHEMA, TABNAME, REORG_TYPE,
REORG_STATUS, REORG_START, REORG_END,
REORG_COMPLETION
FROM SYSIBMADM.SNAPTAB_REORG
WHERE TABSCHEMA = 'MERIDIAN'
ORDER BY REORG_END DESC
FETCH FIRST 20 ROWS ONLY;
36.9.2 RUNSTATS Currency
Stale statistics are one of the top causes of suboptimal access paths. We check how current the statistics are:
-- Find tables with stale statistics
-- (TIMESTAMPDIFF interval code 16 = days)
SELECT TABSCHEMA, TABNAME, CARD, STATS_TIME,
TIMESTAMPDIFF(16, CHAR(CURRENT TIMESTAMP - STATS_TIME)) AS DAYS_SINCE_STATS
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MERIDIAN'
AND TYPE = 'T'
AND (STATS_TIME IS NULL
OR TIMESTAMPDIFF(16, CHAR(CURRENT TIMESTAMP - STATS_TIME)) > 7)
ORDER BY DAYS_SINCE_STATS DESC;
Any table with statistics older than 7 days is a finding for high-activity tables. For low-activity reference tables, 30 days is acceptable.
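The staleness policy above can be expressed as a small helper; a Python sketch with illustrative inputs:

```python
# Classify a table's statistics age against Meridian's policy:
# > 7 days is a finding for high-activity tables, > 30 days for
# low-activity reference tables; never-collected stats are always a finding.
def stats_finding(days_since_stats, high_activity):
    if days_since_stats is None:
        return "FINDING"
    limit = 7 if high_activity else 30
    return "FINDING" if days_since_stats > limit else "OK"

print(stats_finding(12, high_activity=True))   # FINDING
print(stats_finding(12, high_activity=False))  # OK
```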
36.9.3 Backup Schedule Verification
-- Verify backup history
SELECT OPERATION, OBJECTTYPE, OPERATIONTYPE,
START_TIME, END_TIME, SQLCAID,
DEVICETYPE, FIRSTLOG, LASTLOG
FROM TABLE(ADMIN_LIST_HIST()) AS T
WHERE OPERATION = 'B'
ORDER BY START_TIME DESC
FETCH FIRST 10 ROWS ONLY;
The backup schedule requires:
- Full online backup every Saturday at 23:00.
- Incremental delta backup every weeknight at 23:00.
- Transaction log archiving continuously to both local disk and a remote NFS share.
We verify that no backup has failed and that the backup chain is unbroken — an incremental restore depends on a complete chain from the last full backup through every subsequent incremental.
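The chain check can be scripted. A Python sketch with illustrative dates, assuming the nightly delta schedule above (each delta depends on the backup taken the previous night):

```python
from datetime import date, timedelta

def chain_ok(backups):
    """backups: list of (date, kind) tuples, kind in {'full', 'incr'}.
    The chain is unbroken if every incremental after the most recent
    full backup follows its predecessor by exactly one night."""
    fulls = [d for d, kind in backups if kind == "full"]
    if not fulls:
        return False                      # no full backup: nothing to anchor on
    last_full = max(fulls)
    incrs = sorted(d for d, kind in backups if kind == "incr" and d > last_full)
    prev = last_full
    for d in incrs:
        if (d - prev).days > 1:
            return False                  # a nightly backup is missing: chain broken
        prev = d
    return True

sat = date(2026, 3, 7)                    # Saturday full backup
good = [(sat, "full")] + [(sat + timedelta(days=i), "incr") for i in (1, 2, 3)]
gap = [(sat, "full"),
       (sat + timedelta(days=1), "incr"),
       (sat + timedelta(days=3), "incr")]  # Monday's backup failed
print(chain_ok(good), chain_ok(gap))      # True False
```

In practice the (date, kind) pairs would be parsed from the ADMIN_LIST_HIST output above rather than hard-coded.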
36.9.4 Pending States
Pending states are one of the most dangerous situations in a DB2 environment — they can prevent applications from accessing data:
-- Check for any pending states
SELECT TABSCHEMA, TABNAME, STATUS, TYPE,
CONST_CHECKED, ACCESS_MODE
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MERIDIAN'
AND STATUS <> 'N' -- N = Normal
ORDER BY TABNAME;
Common pending states include:
- Check Pending: Referential integrity constraints need verification after a LOAD operation.
- Inoperative: A view or trigger has become inoperative due to a schema change.
- Reorg Pending: A table requires reorganization before certain operations.
Any table in a pending state that serves production applications is a critical finding.
36.9.5 Monitoring Alerts
We verify that the monitoring infrastructure is properly configured:
-- Verify health monitor is active
SELECT ALERT_ID, ALERT_TYPE, ALARM_STATE,
THRESHOLD_VALUE, VALUE
FROM TABLE(HEALTH_GET_ALERT_CFG(-2)) AS T
WHERE ALARM_STATE <> 'NORMAL'
ORDER BY ALERT_TYPE;
The monitoring system should alert on:
- Tablespace utilization > 85%.
- Buffer pool hit ratio < 95%.
- Lock escalation events.
- Deadlocks.
- Backup failures.
- HADR log gap exceeding 5 minutes.
- Sort overflows exceeding 10 per hour.
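The buffer pool threshold uses the standard hit-ratio formula: (logical reads minus physical reads) divided by logical reads. A Python sketch, with parameter names mirroring the monitoring counters:

```python
# Buffer pool hit ratio: the percentage of page requests satisfied
# from memory rather than disk.
def bp_hit_ratio(pool_data_l_reads, pool_data_p_reads):
    if pool_data_l_reads == 0:
        return 100.0  # no activity: treat as a perfect ratio
    return 100.0 * (pool_data_l_reads - pool_data_p_reads) / pool_data_l_reads

ratio = bp_hit_ratio(1_000_000, 50_000)
print(f"{ratio:.1f}%")  # 95.0% -- exactly at the alert threshold
```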
36.10 Capacity Planning Report
36.10.1 Current Utilization
-- Database size and growth (LUW, TBSP_UTILIZATION administrative view)
SELECT TBSP_NAME, TBSP_TYPE,
TBSP_TOTAL_SIZE_KB / 1024 / 1024 AS TOTAL_GB,
TBSP_USED_SIZE_KB / 1024 / 1024 AS USED_GB,
TBSP_FREE_SIZE_KB / 1024 / 1024 AS FREE_GB,
DECIMAL(TBSP_USED_SIZE_KB * 100.0 /
NULLIF(TBSP_TOTAL_SIZE_KB, 0), 5, 2) AS USED_PCT
FROM SYSIBMADM.TBSP_UTILIZATION
ORDER BY TBSP_USED_SIZE_KB DESC;
Current state:
| Resource | Capacity | Current Usage | Utilization |
|---|---|---|---|
| Storage (OLTP) | 12 TB | 7.8 TB | 65% |
| Storage (Archive) | 48 TB | 31.2 TB | 65% |
| CPU (z/OS) | 8 CPs | Avg 2.2 CPs | 28% |
| CPU (LUW) | 32 cores | Avg 9.1 cores | 28% |
| Memory (LUW) | 256 GB | 188 GB allocated | 73% |
| Buffer Pool (OLTP) | 64 GB | 58 GB used | 91% |
36.10.2 Growth Projections
Based on analysis of the past 24 months of data, we project:
| Metric | Current | Year 1 | Year 2 | Year 3 |
|---|---|---|---|---|
| Customer Count | 1.2M | 1.4M (+17%) | 1.7M (+21%) | 2.1M (+24%) |
| Account Count | 2.3M | 2.8M (+22%) | 3.5M (+25%) | 4.4M (+26%) |
| Daily Transactions | 4.2M | 5.5M (+31%) | 7.2M (+31%) | 9.4M (+31%) |
| OLTP Storage | 7.8 TB | 10.4 TB | 14.1 TB | 19.2 TB |
| Archive Storage | 31.2 TB | 42.8 TB | 58.6 TB | 80.3 TB |
The growth is driven by Meridian's acquisition of two smaller banks (scheduled Year 1) and the launch of a mobile banking platform (Year 2).
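The storage rows follow from simple compound growth. A Python sketch; the per-year rates passed in are assumptions loosely derived from the table above (acquisitions in Year 1, the mobile launch in Year 2), not an exact model:

```python
# Project a starting value forward through a list of annual growth rates.
def project(current, annual_rates):
    values, v = [], current
    for r in annual_rates:
        v *= 1 + r
        values.append(round(v, 1))
    return values

# e.g. 7.8 TB of OLTP storage growing roughly a third per year
print(project(7.8, [0.33, 0.35, 0.36]))
```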
36.10.3 Hardware Recommendations
Year 1 (Immediate):
- Expand OLTP storage from 12 TB to 16 TB (add one SAN shelf).
- Increase LUW buffer pool memory from 64 GB to 96 GB (add RAM to both primary and standby).
- No CPU changes needed — headroom is sufficient.
Year 2:
- Upgrade the LUW server from 32 cores to 48 cores (or an equivalent cloud instance).
- Expand OLTP storage to 20 TB.
- Consider moving archive storage to object storage (IBM Cloud Object Storage or equivalent) to reduce SAN costs.
- Evaluate DB2 pureScale for the LUW tier if the bank exceeds the HADR pair's capacity.
Year 3:
- Full cloud migration assessment for the analytics tier.
- z/OS capacity upgrade (additional CP).
- Implement data lifecycle management: compress and archive transactions older than 7 years to cold storage.
36.10.4 Cloud Expansion Plan
The analytics workload is already in the cloud. The capacity plan recommends expanding cloud usage:
- Year 1: Migrate read-only regulatory reporting to a DB2 read replica on IBM Cloud.
- Year 2: Implement a hybrid architecture where the LUW reporting database feeds a cloud-native data lake for advanced analytics.
- Year 3: Evaluate migrating the LUW mid-tier to Db2 on IBM Cloud Pak for Data, retaining z/OS for core OLTP.
36.10.5 Cost Projections
| Item | Year 1 | Year 2 | Year 3 | 3-Year Total |
|---|---|---|---|---|
| Storage Expansion | $180K | $240K | $150K | $570K |
| Server Upgrades | $0 | $320K | $0 | $320K |
| Cloud Services | $48K | $96K | $144K | $288K |
| DB2 Licensing | $0 (current) | $85K (pureScale eval) | $170K (expansion) | $255K |
| Professional Services | $60K | $120K | $90K | $270K |
| **Total** | **$288K** | **$861K** | **$554K** | **$1,703K** |
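A quick cross-check of the table's arithmetic (amounts in $K):

```python
# Each tuple is (Year 1, Year 2, Year 3) for one line item.
rows = {
    "Storage Expansion":     (180, 240, 150),
    "Server Upgrades":       (0,   320, 0),
    "Cloud Services":        (48,  96,  144),
    "DB2 Licensing":         (0,   85,  170),
    "Professional Services": (60,  120, 90),
}

year_totals = [sum(col) for col in zip(*rows.values())]
grand_total = sum(year_totals)
print(year_totals, grand_total)  # [288, 861, 554] 1703
```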
36.11 Executive Summary Presentation
36.11.1 How to Present Technical Findings to Non-Technical Stakeholders
The CIO asked for a report she can hand to the board. Board members do not want to see SQL. They want to know three things:
- Are we safe? (security, compliance, disaster recovery)
- Are we fast? (performance, reliability, uptime)
- Are we ready for growth? (capacity, scalability, cost)
The executive summary follows this structure:
Page 1: Overall System Health Score
Use a simple traffic light dashboard:
- Green: Meets or exceeds all targets.
- Yellow: Minor findings that require attention within 90 days.
- Red: Critical findings that require immediate action.
| Area | Score | Status |
|---|---|---|
| System Availability | 99.97% uptime (last 12 months) | GREEN |
| Disaster Recovery | RTO 4m40s (target 15m) | GREEN |
| Security & Compliance | 3 individual user grants (PCI-DSS 7.1) | YELLOW |
| Performance | 3 stress test failures (remediated) | YELLOW |
| Capacity | 35% growth headroom on storage | GREEN |
Page 2: Key Findings and Actions
| # | Finding | Risk | Action | Timeline | Cost |
|---|---|---|---|---|---|
| 1 | 3 individual user grants violate role-based access policy | Audit finding | Migrate to role grants | 2 weeks | $0 |
| 2 | Buffer pool undersized for peak load | Performance degradation at month-end | Expand buffer pool memory | 30 days | $12K |
| 3 | Stale statistics on 4 tables | Suboptimal query performance | Automate RUNSTATS | 1 week | $0 |
| 4 | Statement generation uses wrong isolation level | Unnecessary lock contention | Application change | 2 weeks | $0 |
Page 3: Three-Year Investment Plan
Present the capacity plan cost projections as a simple bar chart with brief narrative explaining the growth drivers.
36.11.2 The DBA as Business Partner
This capstone exercise illustrates a critical truth: the DBA role is not purely technical. The DBA who can translate database metrics into business impact — who can explain that "a 340ms lock wait time means customers see a 3-second delay at the ATM" — is the DBA who earns a seat at the leadership table.
Throughout this book, we have emphasized understanding not just the how but the why. Why does the optimizer choose a particular access path? Because understanding the why allows you to predict behavior, prevent problems, and communicate effectively with people who will never write a line of SQL.
The capstone report is your proof of competence. It demonstrates that you can:
- Operate complex multi-platform DB2 environments.
- Identify problems before they affect users.
- Quantify risk in terms the business understands.
- Propose solutions with cost and timeline.
- Execute recovery procedures under pressure.
36.12 The Final Meridian Bank Report
Below is the complete system health report template, filled in with the findings from this capstone exercise.
MERIDIAN NATIONAL BANK — DB2 SYSTEM HEALTH ASSESSMENT REPORT
Report Date: March 2026 Prepared By: Lead Database Administrator Classification: Confidential — Internal Use Only
1. Executive Summary
Meridian National Bank operates three DB2 environments supporting core banking, regulatory reporting, and business analytics. This assessment evaluated all three environments across six dimensions: architecture, schema design, performance, security, disaster recovery, and capacity. Overall system health is GOOD with four findings requiring attention before the upcoming regulatory examination.
2. Architecture Review
All three DB2 environments are operational and conform to the approved enterprise architecture with two exceptions:
- The LUW HADR pair is in REMOTE_CATCHUP state instead of PEER due to a recent network configuration change (Finding ARCH-002, severity WARNING).
- The cloud analytics environment includes a deprecated VPN IP range in the allowlist (Finding ARCH-003, severity MEDIUM).
Recommendation: Resolve the HADR synchronization issue within 48 hours. Remove the stale IP range immediately.
3. Schema and Design Review
The data model is well-structured with four findings:
- One staging table lacks a primary key (CRITICAL).
- One column is oversized — VARCHAR(4000) for an address field (LOW).
- One redundant index identified (MEDIUM).
- One unused index identified (MEDIUM).
Recommendation: Add primary key to staging table before examination. Schedule remaining items for next maintenance window.
4. Performance Assessment
Baseline performance is excellent. Under simulated peak load (month-end conditions), three metrics fell below thresholds:
- Buffer pool hit ratio dropped to 87.4% (target >95%).
- Lock wait time increased to 340ms (target <50ms).
- Sort overflows reached 847/hr (target <10/hr).
All three issues were identified, root-caused, and remediated during the assessment. Re-test confirmed all metrics within thresholds.
Recommendation: Implement the buffer pool expansion and isolation level changes before the next month-end processing cycle.
5. Security and Compliance
The security posture is strong with one finding:
- Three individual user grants exist outside the role-based access model (PCI-DSS Requirement 7.1 — PARTIAL compliance).
RCAC policies, audit policies, encryption, and network security all meet or exceed requirements.
Recommendation: Remediate the three individual grants within two weeks. This is the most examination-sensitive finding.
6. Disaster Recovery
The DR drill was successful:
- HADR failover completed in 4 minutes 40 seconds (target: 15 minutes).
- Zero data loss confirmed.
- Point-in-time recovery completed in 34 minutes.
- Data integrity verified post-recovery.
Recommendation: No changes required. Consider reducing the operator decision time through automation for future improvement.
7. Capacity Plan
Current utilization across all environments is healthy (28-73% depending on resource type). Growth projections indicate that storage expansion is needed within 12 months and a server upgrade within 24 months. The three-year infrastructure investment is estimated at $1.7M.
Recommendation: Approve Year 1 storage expansion ($180K) in the next capital budget cycle.
Spaced Review: Comprehensive Topic Recall
This capstone chapter draws on concepts from every part of the book. Use these questions to verify your recall:
1. From Chapter 3 (SQL Foundations): Write a query that joins three tables with appropriate WHERE clause predicates. Did you use sargable predicates?
2. From Chapter 8 (Indexing): Given the stress test results, which index strategy eliminated the most I/O? Why did the covering index help the account balance inquiry?
3. From Chapter 12 (EXPLAIN): What is the difference between an index scan and a table scan in the EXPLAIN output? When would the optimizer legitimately choose a table scan?
4. From Chapter 16 (Concurrency): Explain why consistent lock ordering prevents deadlocks. How would you diagnose a deadlock using the DB2 monitoring interfaces?
5. From Chapter 19 (Security): What is the difference between DBADM and SECADM authority? Why should application service accounts never have DBADM?
6. From Chapter 22 (Backup and Recovery): What is the difference between RTO and RPO? Why does NEARSYNC HADR guarantee zero RPO?
7. From Chapter 26 (Data Sharing): What role does the Coupling Facility play in a z/OS data sharing group? What happens to the group if the Coupling Facility fails?
8. From Chapter 30 (Performance Tuning): Name three causes of sort overflow and three remediation strategies.
9. From Chapter 33 (Cloud): What are the trade-offs between managing your own DB2 instance on a cloud VM versus using a fully managed service?
10. From Chapter 35 (Enterprise Architecture): How does a three-tier architecture (OLTP, reporting, analytics) reduce contention compared to running all workloads on a single database?
Summary
This capstone chapter has walked through a complete, production-realistic assessment of a multi-platform DB2 environment. You established baselines, ran stress tests, diagnosed performance problems, verified security controls, executed a disaster recovery drill, analyzed concurrency behavior, reviewed maintenance schedules, and produced a capacity plan with cost projections — all wrapped in an executive summary suitable for a board of directors.
This is the work of a senior DBA. It requires breadth across every topic in this book, depth to diagnose root causes, and communication skill to translate technical findings into business language. If you can execute this capstone exercise on a real system, you are ready for the most demanding DB2 roles in the industry.
The system is reviewed. The findings are documented. The report is delivered.
Now let us turn to the final chapter — your career.