Chapter 29 Exercises: HADR and Replication
Exercise 29.1: HADR Synchronization Mode Analysis
Objective: Analyze the trade-offs between HADR synchronization modes for different scenarios.
Scenario: Meridian National Bank has three database environments that need HADR protection:
| Database | Purpose | Transaction Rate | Data Sensitivity | Standby Location |
|---|---|---|---|---|
| CORE_BANK | Core banking OLTP | 5,000 TPS | Financial (zero loss required) | Same data center (1 ms latency) |
| WEB_PORTAL | Customer web portal | 2,000 TPS | Session data (some loss acceptable) | Different city (15 ms latency) |
| ANALYTICS | Analytics warehouse | 200 bulk loads/day | Derived data (re-creatable) | Different continent (150 ms latency) |
Tasks:
1. For each database, recommend an HADR synchronization mode. Justify your choice with:
   - RPO analysis (what data loss is acceptable?)
   - Performance impact analysis (what latency will the sync mode add?)
   - Availability behavior (what happens if the standby becomes unreachable?)
2. Calculate the per-transaction commit latency added by each recommended sync mode:
   - SYNC: network_roundtrip + standby_disk_write + ack
   - NEARSYNC: network_roundtrip + ack
   - ASYNC: 0 (no wait)
3. For CORE_BANK with NEARSYNC mode, calculate the probability of data loss. Under what exact conditions would committed data be lost?
4. For WEB_PORTAL with the standby 15 ms away, calculate the throughput impact of using SYNC mode versus ASYNC mode. Assume each transaction performs one commit.
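As a sanity check for tasks 2 and 4, the formulas above can be turned into a quick worked example. Only the 15 ms one-way network latency comes from the scenario; the standby disk write (1.0 ms), ack processing (0.5 ms), and local commit cost (1.0 ms) are illustrative assumptions.

```python
# Per-connection commit latency and serial throughput for WEB_PORTAL.
# Assumed (not from the scenario): standby disk write = 1.0 ms,
# ack processing = 0.5 ms, local commit = 1.0 ms; round trip = 2 x one-way.

ONE_WAY_MS = 15.0              # scenario: standby is 15 ms away
ROUNDTRIP_MS = 2 * ONE_WAY_MS
STANDBY_WRITE_MS = 1.0         # assumed
ACK_MS = 0.5                   # assumed

def commit_latency_ms(mode: str) -> float:
    """Extra commit latency per the formulas in task 2."""
    if mode == "SYNC":
        return ROUNDTRIP_MS + STANDBY_WRITE_MS + ACK_MS
    if mode == "NEARSYNC":
        return ROUNDTRIP_MS + ACK_MS
    if mode == "ASYNC":
        return 0.0
    raise ValueError(mode)

def max_serial_commits_per_sec(added_ms: float, local_commit_ms: float = 1.0) -> float:
    """Upper bound on commits/sec for ONE connection committing serially."""
    return 1000.0 / (local_commit_ms + added_ms)

for mode in ("SYNC", "NEARSYNC", "ASYNC"):
    lat = commit_latency_ms(mode)
    print(f"{mode:9s} +{lat:5.1f} ms -> "
          f"{max_serial_commits_per_sec(lat):7.1f} commits/s per connection")
```

Note that the per-connection ceiling, not total server throughput, is what the sync mode caps: hitting 2,000 TPS under SYNC simply requires proportionally more concurrent connections.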
Exercise 29.2: HADR Configuration and Verification
Objective: Practice configuring HADR from scratch.
Scenario: You have two servers:
- Primary: db2-prod.meridian.bank (IP: 10.10.1.10)
- Standby: db2-stby.meridian.bank (IP: 10.10.1.20)
- Database name: MERIDIAN
- Instance name: db2inst1
- HADR port: 55001
- Sync mode: NEARSYNC
- Peer window: 120 seconds
Tasks:
1. Write the complete sequence of commands to:
   a. Take a backup on the primary
   b. Restore on the standby
   c. Configure HADR parameters on both servers
   d. Start HADR on both servers (correct order!)
   e. Verify HADR is in PEER state
2. Write the command to perform a graceful (planned) takeover to the standby.
3. Write the command to perform a forced takeover when the primary is unreachable.
4. After a forced takeover, the old primary comes back online. Write the commands to:
   a. Reinitialize the old primary as the new standby
   b. Verify the new HADR configuration
5. Write a monitoring query using MON_GET_HADR() that checks:
   - Current HADR role and state
   - Log gap in bytes and time
   - Connection status
   - Primary and standby log positions
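As a starting point for task 5, here is a sketch of the health-check logic that the MON_GET_HADR() output would feed. The row is a hard-coded sample dict; the column names match the monitoring columns the task lists, but the alert thresholds are illustrative assumptions, not product defaults.

```python
# Evaluate one row returned by a query such as:
#   SELECT HADR_ROLE, HADR_STATE, HADR_CONNECT_STATUS, HADR_LOG_GAP,
#          PRIMARY_LOG_POS, STANDBY_LOG_POS
#   FROM TABLE(MON_GET_HADR(NULL))
# The threshold below is an assumption for illustration.

MAX_LOG_GAP_BYTES = 10 * 1024 * 1024  # assumed: alert above a 10 MB gap

def hadr_alerts(row: dict) -> list[str]:
    alerts = []
    if row["HADR_STATE"] != "PEER":
        alerts.append(f"state is {row['HADR_STATE']}, expected PEER")
    if row["HADR_CONNECT_STATUS"] != "CONNECTED":
        alerts.append("primary and standby are not connected")
    if row["HADR_LOG_GAP"] > MAX_LOG_GAP_BYTES:
        alerts.append(f"log gap {row['HADR_LOG_GAP']} bytes exceeds threshold")
    return alerts

sample = {
    "HADR_ROLE": "PRIMARY",
    "HADR_STATE": "PEER",
    "HADR_CONNECT_STATUS": "CONNECTED",
    "HADR_LOG_GAP": 4096,
    "PRIMARY_LOG_POS": 123456789,
    "STANDBY_LOG_POS": 123452693,
}
print(hadr_alerts(sample))  # a healthy PEER row raises no alerts
```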
Exercise 29.3: Automatic Client Reroute Design
Objective: Design an ACR configuration for a multi-tier application.
Scenario: Meridian Bank's online banking application has the following architecture:
Mobile App → API Gateway → Application Server (Java) → DB2 HADR
Web Browser → Web Server → Application Server (Java) → DB2 HADR
Batch Scheduler → Batch Server (Python) → DB2 HADR
The application servers use JDBC Type 4 (JCC) drivers. The batch server uses Python with ibm_db.
Tasks:
1. Write the db2dsdriver.cfg XML configuration that enables:
   - ACR with seamless failover
   - Maximum 3 retry attempts
   - 10-second retry interval
   - 30-second ACR timeout
   - Alternate server pointing to the standby
2. Write the Java JDBC connection URL that enables ACR:
   jdbc:db2://primary:50000/MERIDIAN:...
3. For the Java application, write the exception handling code that detects an ACR reroute (SQL error -30108) and retries the transaction.
4. For the Python batch application, write equivalent error handling code.
5. Design a test plan to verify ACR works correctly. Include:
   - Test case for graceful takeover
   - Test case for sudden primary failure
   - Test case for network partition between primary and standby
   - Expected application behavior for each test case
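For task 4, the core of the answer is a resubmit loop: a reroute surfaces as SQL30108N, which means the connection was re-established on the alternate server and the in-flight transaction was rolled back, so the application must run it again. The `RerouteError` class and the one-shot failing `batch_step` below are stand-ins so the retry logic itself is runnable without a database; real ibm_db code would inspect the driver's error message for the 30108 code instead.

```python
import time

SQLCODE_REROUTED = "-30108"  # SQL30108N: connection rerouted, resubmit the txn

class RerouteError(Exception):
    """Stand-in for the ibm_db exception carrying SQLCODE -30108."""
    sqlcode = SQLCODE_REROUTED

def retry_on_reroute(txn, attempts=3, delay_s=0.0):
    """Run txn(); if the connection was rerouted mid-flight, resubmit it."""
    for attempt in range(1, attempts + 1):
        try:
            return txn()
        except RerouteError:
            if attempt == attempts:
                raise  # give up after the final attempt
            time.sleep(delay_s)  # back off before resubmitting

# Simulated batch step: fails once with a reroute, then succeeds.
calls = {"n": 0}
def batch_step():
    calls["n"] += 1
    if calls["n"] == 1:
        raise RerouteError("connection rerouted to standby")
    return "committed"

print(retry_on_reroute(batch_step))  # -> committed (on the second attempt)
```

The same shape applies to the Java version in task 3, with `SQLException.getErrorCode()` checked against -30108.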
Exercise 29.4: Reads-on-Standby Configuration
Objective: Configure and validate reads-on-standby for reporting workloads.
Scenario: Meridian's analytics team wants to run reports against the HADR standby to offload the primary. The reports include:
| Report | Frequency | Duration | Tables Accessed |
|---|---|---|---|
| Daily transaction summary | Hourly | 5 minutes | TRANSACTIONS, ACCOUNTS |
| Customer portfolio analysis | Daily | 45 minutes | CUSTOMERS, ACCOUNTS, LOANS |
| Branch performance dashboard | Real-time | Continuous | All tables |
| Regulatory compliance extract | Monthly | 2 hours | All tables |
Tasks:
1. Identify which reports are good candidates for reads-on-standby and which should remain on the primary. Justify each decision based on:
   - Isolation level requirements
   - Tolerance for slightly stale data
   - Duration and resource consumption
2. For the reports running on the standby, what isolation levels are available? What impact does this have on the report results?
3. Write the connection configuration for the reporting application to connect to the standby server.
4. If the HADR log gap increases to 60 seconds during the 2-hour regulatory compliance extract, analyze:
   - Why is this happening?
   - What is the impact on the primary?
   - What corrective actions should be taken?
5. Design a monitoring approach that alerts when reads-on-standby are causing the log gap to grow beyond an acceptable threshold.
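One way to approach task 5 is to alert on the gap's growth rate, not just its absolute value, so a long-running standby read is flagged before the gap hits its ceiling. The sketch below works on periodic samples of the replay gap; both thresholds are illustrative assumptions.

```python
# Alert when the standby replay gap, sampled periodically, is too large
# or growing too fast. Both thresholds are assumed for illustration.

GAP_LIMIT_SECONDS = 60        # assumed ceiling for standby staleness
GROWTH_LIMIT_PER_MIN = 5.0    # assumed: alert if gap grows > 5 s per minute

def gap_alert(samples):
    """samples: list of (minutes_since_start, gap_seconds), oldest first."""
    t0, g0 = samples[0]
    t1, g1 = samples[-1]
    growth_per_min = (g1 - g0) / (t1 - t0)
    if g1 > GAP_LIMIT_SECONDS:
        return f"gap {g1}s exceeds {GAP_LIMIT_SECONDS}s limit"
    if growth_per_min > GROWTH_LIMIT_PER_MIN:
        return (f"gap growing at {growth_per_min:.1f}s/min "
                "during long-running standby reads")
    return None

# Steady gap -> no alert; climbing gap during the extract -> early alert.
print(gap_alert([(0, 2), (5, 2), (10, 3)]))
print(gap_alert([(0, 5), (5, 35), (10, 58)]))
```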
Exercise 29.5: pureScale vs. HADR Comparison
Objective: Evaluate pureScale and HADR for a specific workload.
Scenario: A new Meridian Bank application processes real-time payment transactions. Requirements:
- Zero data loss (RPO = 0)
- Sub-second RTO (no perceptible outage)
- 10,000 TPS write throughput
- 50,000 TPS read throughput
- 99.999% availability target
Tasks:
1. Evaluate whether HADR can meet these requirements. For each requirement, explain whether HADR satisfies it and note any limitations.
2. Evaluate whether pureScale can meet these requirements. For each requirement, explain whether pureScale satisfies it.
3. Create a comparison table covering:
   - Infrastructure requirements (servers, storage, networking)
   - Licensing and operational costs (estimate ranges)
   - Operational complexity
   - Failure scenarios and recovery behavior
   - Read/write scalability
4. Make a recommendation with supporting rationale. Under what conditions would you change your recommendation?
5. If the budget allows only two servers (primary + standby), can any combination of technologies meet the requirements? What compromises would be necessary?
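Before evaluating either technology, it helps to quantify what 99.999% actually permits. The arithmetic below is standard; the ~60-second takeover time used for comparison is an illustrative assumption, not a measured HADR figure.

```python
# Size the annual outage budget implied by a 99.999% availability target.

MINUTES_PER_YEAR = 365 * 24 * 60

def downtime_budget_min(availability: float) -> float:
    return MINUTES_PER_YEAR * (1 - availability)

budget = downtime_budget_min(0.99999)
print(f"Annual downtime budget: {budget:.2f} minutes")  # about 5.26 min/year

# Assumed takeover time of ~60 s per failover (illustrative only):
takeover_min = 1.0
print(f"Failovers per year before the budget is spent: {budget / takeover_min:.1f}")
```

Framed this way, the "sub-second RTO" requirement is the discriminating one: a handful of minute-long failovers per year already exhausts five nines.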
Exercise 29.6: Q Replication Design
Objective: Design a Q Replication topology for cross-site data distribution.
Scenario: Meridian Bank operates two data centers:
- DC-East (New York): Primary operations center, 60% of customers
- DC-West (Dallas): Secondary operations center, 40% of customers
The bank wants active-active capability: customers can be served from either data center, and both data centers can process transactions. If one data center fails, the other takes over all traffic.
Tasks:
1. Design a Q Replication topology showing:
   - Source and target databases at each site
   - MQ queue managers and channels
   - Bidirectional replication subscriptions
   - Which tables are replicated
2. For the ACCOUNTS table, design a conflict resolution strategy. Consider:
   - What happens if the same account is updated at both sites simultaneously?
   - How does the bank ensure account balances remain consistent?
   - Should certain operations be restricted to one site (e.g., account closure)?
3. Calculate the end-to-end replication latency assuming:
   - Network latency (NY to Dallas): 35 ms
   - Q Capture processing: 50 ms
   - MQ transport: network latency + 10 ms overhead
   - Q Apply processing: 50 ms
4. Design a monitoring approach for the Q Replication environment including:
   - Latency monitoring
   - Conflict rate monitoring
   - Queue depth monitoring
   - Data consistency verification
5. Write a failover procedure for the scenario where DC-East becomes completely unavailable.
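The task 3 figures chain additively: a change must be captured, shipped over MQ, and applied before it is visible at the other site. All numbers below come from the scenario.

```python
# One-way end-to-end Q Replication latency from the task 3 figures.
NETWORK_MS = 35       # NY <-> Dallas one-way network latency
CAPTURE_MS = 50       # Q Capture processing
MQ_OVERHEAD_MS = 10   # MQ transport overhead on top of the network
APPLY_MS = 50         # Q Apply processing

def replication_latency_ms() -> int:
    mq_transport = NETWORK_MS + MQ_OVERHEAD_MS  # "network latency + 10 ms overhead"
    return CAPTURE_MS + mq_transport + APPLY_MS

print(replication_latency_ms())  # 145 ms from commit at one site to apply at the other
```

That 145 ms window is also the exposure period for the task 2 conflicts: two sites can each commit a change to the same account before either sees the other's update.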
Exercise 29.7: CDC to Kafka Pipeline
Objective: Design a CDC pipeline from DB2 to Apache Kafka.
Scenario: Meridian Bank wants to stream changes from the MERIDIAN database to Kafka for consumption by three downstream systems:
- Fraud detection engine — needs transaction changes within 2 seconds of commit
- Data warehouse loader — needs all table changes, can tolerate 30-second latency
- Customer notification service — needs ACCOUNTS and TRANSACTIONS changes, 5-second latency
Tasks:
1. Design the Kafka topic structure. Consider:
   - One topic per table vs. one topic for all tables
   - Topic naming conventions
   - Partitioning strategy (what key to partition on?)
   - Retention policy
2. For the TRANSACTIONS table, write the expected Kafka message format (JSON) for an INSERT and an UPDATE operation. Include:
   - Operation type (I/U/D)
   - Before-image (for updates and deletes)
   - After-image (for inserts and updates)
   - Transaction timestamp
   - Source metadata (LSN, transaction ID)
3. Calculate the required Kafka cluster sizing, given:
   - 5,000 transactions per second
   - Average message size: 500 bytes
   - 7-day retention
   - Replication factor: 3
4. Design error handling for the CDC pipeline:
   - What happens if Kafka is temporarily unavailable?
   - What happens if the CDC engine crashes and restarts?
   - How do you detect and handle message duplicates in consumers?
5. Write a consumer group design for the three downstream systems, ensuring each system gets the messages it needs without interfering with the others.
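The task 3 sizing is straight multiplication of the given figures. The result is raw retained message volume only; it deliberately ignores compression, index/segment overhead, and the headroom a real cluster would need, so treat it as a lower bound.

```python
# Kafka storage sizing from the task 3 inputs (raw message bytes only).
TPS = 5_000                 # change events per second
MSG_BYTES = 500             # average message size
RETENTION_DAYS = 7
REPLICATION_FACTOR = 3
SECONDS_PER_DAY = 86_400

bytes_per_sec = TPS * MSG_BYTES                               # ingest rate
logical_bytes = bytes_per_sec * SECONDS_PER_DAY * RETENTION_DAYS
physical_bytes = logical_bytes * REPLICATION_FACTOR           # across the cluster

print(f"Ingest: {bytes_per_sec / 1e6:.1f} MB/s")
print(f"Retained (logical): {logical_bytes / 1e12:.2f} TB")
print(f"Retained (with RF=3): {physical_bytes / 1e12:.2f} TB")
```

At roughly 4.5 TB of replicated retention plus overhead and growth headroom, a small multi-broker cluster suffices; partition count, not storage, is more likely to be driven by the fraud engine's 2-second latency budget.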