Case Study 29.2: Multi-Site Replication for Global Operations
Background
Meridian National Bank has grown through acquisition and now operates in three regions:
- Americas (New York): Core banking, 15 million customers, primary data center
- Europe (London): Acquired European retail bank, 8 million customers
- Asia-Pacific (Singapore): New market entry, 2 million customers and growing rapidly
Each region has its own DB2 LUW database serving local operations. However, the bank's executive team has mandated a unified global platform:
- Customer data must be accessible from any region
- Regulatory compliance requires data residency (EU data stays in EU)
- If any region's data center goes offline, the other two must absorb its workload
- Real-time analytics must cover all three regions
The chief architect, Dr. Yuki Tanaka, leads the design of a multi-site replication strategy.
Challenge
Design and implement a replication architecture that:
- Provides low-latency read access to global customer data from any region
- Maintains data residency compliance (GDPR for Europe, MAS for Singapore)
- Enables regional failover with RPO < 10 seconds and RTO < 30 minutes
- Feeds a global analytics platform with real-time data from all regions
- Handles conflict resolution for shared reference data
Architecture Design
Tier 1: Regional HADR (Within Each Data Center)
Each region implements HADR for local high availability:
| Region | Primary | Standby | Sync Mode | Purpose |
|---|---|---|---|---|
| Americas | NY-PROD | NY-STBY | NEARSYNC | Local HA + reads-on-standby |
| Europe | LDN-PROD | LDN-STBY | NEARSYNC | Local HA + reads-on-standby |
| Asia-Pacific | SG-PROD | SG-STBY | NEARSYNC | Local HA + reads-on-standby |
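The per-region setup reduces to the standard HADR configuration commands. A minimal CLP sketch for the Americas pair; the hostnames, service port, instance name, and the NYPROD database alias are illustrative:

```sql
-- On NY-PROD (primary); NY-STBY mirrors this with local/remote reversed
UPDATE DB CFG FOR NYPROD USING
    HADR_LOCAL_HOST   ny-prod.meridian.internal
    HADR_LOCAL_SVC    51012
    HADR_REMOTE_HOST  ny-stby.meridian.internal
    HADR_REMOTE_SVC   51012
    HADR_REMOTE_INST  db2inst1
    HADR_SYNCMODE     NEARSYNC
    HADR_PEER_WINDOW  120;

-- Start the standby first, then the primary
START HADR ON DB NYPROD AS STANDBY;   -- on NY-STBY
START HADR ON DB NYPROD AS PRIMARY;   -- on NY-PROD
```

NEARSYNC commits on the primary once the log data is in the standby's memory, which is appropriate here because both machines share a data center LAN.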
Tier 2: Q Replication (Cross-Region)
Q Replication provides bidirectional replication of shared reference data and unidirectional replication of regional transactional data:
┌────────────────┐
│ New York │
│ (Americas) │
│ NY-PROD │
└───┬──────┬─────┘
│ │
Q Rep │ │ Q Rep
(bidir) │ │ (bidir)
│ │
┌─────────┘ └──────────┐
│ │
▼ ▼
┌────────────────┐ ┌────────────────┐
│ London │ │ Singapore │
│ (Europe) │◄────►│ (Asia-Pac) │
│ LDN-PROD │Q Rep │ SG-PROD │
└────────────────┘(bidir)└────────────────┘
Bidirectional Tables (Reference Data)
These tables are replicated bidirectionally between all three regions:
| Table | Description | Conflict Strategy |
|---|---|---|
| EXCHANGE_RATES | Currency exchange rates | NY wins; otherwise latest timestamp (source of truth: NY) |
| PRODUCT_CATALOG | Banking products | Source wins (products defined centrally) |
| BRANCH_DIRECTORY | Global branch information | Region of origin wins |
| COMPLIANCE_RULES | Regulatory rules | Source wins (compliance team in NY) |
Unidirectional Tables (Regional Data)
Regional transactional data is replicated outward to a global read-only copy:
| Source | Tables | Targets | Purpose |
|---|---|---|---|
| NY-PROD | ACCOUNTS_US, TRANSACTIONS_US | LDN-READONLY, SG-READONLY | Global visibility |
| LDN-PROD | ACCOUNTS_EU, TRANSACTIONS_EU | NY-READONLY, SG-READONLY | Global visibility |
| SG-PROD | ACCOUNTS_AP, TRANSACTIONS_AP | NY-READONLY, LDN-READONLY | Global visibility |
The READONLY databases are separate DB2 instances that receive replicated data for query purposes. They do not participate in the HADR configuration.
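One way to give applications a single query surface over local and replicated data is a UNION ALL view in each region. A sketch for NY, assuming the READONLY copies are reachable from NY-PROD (for example through federated nicknames); the view name and column list are illustrative:

```sql
-- On NY: local US data plus the replicated EU and AP copies.
-- READONLY.* here stands for nicknames over the NY-READONLY instance.
CREATE VIEW MERIDIAN.GLOBAL_ACCOUNTS AS
    SELECT 'US' AS REGION, ACCOUNT_ID, BALANCE FROM MERIDIAN.ACCOUNTS_US
    UNION ALL
    SELECT 'EU' AS REGION, ACCOUNT_ID, BALANCE FROM READONLY.ACCOUNTS_EU
    UNION ALL
    SELECT 'AP' AS REGION, ACCOUNT_ID, BALANCE FROM READONLY.ACCOUNTS_AP;
```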
Tier 3: CDC to Kafka (Global Analytics)
Each region streams changes via CDC to a regional Kafka cluster. A Kafka MirrorMaker 2 setup consolidates all three regional Kafka clusters into a global analytics Kafka cluster in New York:
NY-PROD ──CDC──► Kafka-NY ────┐
│
LDN-PROD ──CDC──► Kafka-LDN ──┼──MirrorMaker──► Kafka-Global
│ │
SG-PROD ──CDC──► Kafka-SG ────┘ │
▼
Global Analytics Platform
(Db2 Warehouse on Cloud)
Implementation
Phase 1: Regional HADR (Month 1)
Each region implements HADR independently. This is straightforward — the same pattern as Case Study 29.1, replicated three times.
Key decisions:
- NEARSYNC mode for all regions (standbys are in the same data center)
- Pacemaker for automatic failover in Americas and Asia-Pacific
- PowerHA for automatic failover in Europe (AIX-based infrastructure)
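The role switch that Pacemaker and PowerHA automate is the standard HADR takeover, issued on the standby. A sketch, again using the illustrative NYPROD alias:

```sql
-- Planned role switch (maintenance): roles swap cleanly
TAKEOVER HADR ON DB NYPROD;

-- Unplanned failover: PEER WINDOW ONLY refuses the takeover if the
-- peer window has expired, guarding against split brain and data loss
TAKEOVER HADR ON DB NYPROD BY FORCE PEER WINDOW ONLY;
```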
Phase 2: Q Replication for Reference Data (Months 2-3)
MQ Infrastructure
Dr. Tanaka deploys IBM MQ queue managers at each site with channels connecting all three:
Queue Managers:
QMNY01 (New York)
QMLDN01 (London)
QMSG01 (Singapore)
Channels:
QMNY01 ←→ QMLDN01 (London-NY dedicated fiber: 70 ms RTT)
QMNY01 ←→ QMSG01 (NY-Singapore via submarine cable: 240 ms RTT)
QMLDN01 ←→ QMSG01 (London-Singapore: 170 ms RTT)
All channels use TLS 1.3 encryption and MQ message authentication.
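The channel pairs are defined in MQSC. A sketch of the NY-to-London sender and its matching receiver; the channel names, connection name, transmission queue, and cipher are illustrative, and TLS_AES_256_GCM_SHA384 assumes an MQ level with TLS 1.3 support:

```
* Sender on QMNY01 toward QMLDN01
DEFINE CHANNEL(QMNY01.QMLDN01) CHLTYPE(SDR) TRPTYPE(TCP) +
       CONNAME('ldn-mq.meridian.internal(1414)') +
       XMITQ(XMIT.QMLDN01) +
       SSLCIPH(TLS_AES_256_GCM_SHA384)

* Matching receiver on QMLDN01
DEFINE CHANNEL(QMNY01.QMLDN01) CHLTYPE(RCVR) TRPTYPE(TCP) +
       SSLCIPH(TLS_AES_256_GCM_SHA384)
```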
Bidirectional Replication Setup
For the EXCHANGE_RATES table (bidirectional, NY-London):
-- On NY-PROD: Q Capture subscription
INSERT INTO ASN.IBMQREP_SUBS (
SUBNAME, SOURCE_OWNER, SOURCE_NAME,
TARGET_OWNER, TARGET_NAME,
SENDQ, RECVQ,
SUB_TYPE, CONFLICT_RULE
) VALUES (
'EXRATE_NY_TO_LDN', 'MERIDIAN', 'EXCHANGE_RATES',
'MERIDIAN', 'EXCHANGE_RATES',
'NY_TO_LDN_Q', 'LDN_FROM_NY_Q',
'B', 'C' -- B=Bidirectional, C=Custom conflict resolution
);
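In practice the ASNCLP command-line program generates these control-table inserts; either way, activation can be verified from the same table. A sketch, assuming the standard ASN control-table layout (verify column names against your Q Replication level):

```sql
-- Check that the exchange-rate subscriptions went active ('A')
SELECT SUBNAME, STATE, STATE_TIME
FROM ASN.IBMQREP_SUBS
WHERE SUBNAME LIKE 'EXRATE%';
```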
Conflict Resolution Procedure
For EXCHANGE_RATES, a custom stored procedure resolves conflicts:
CREATE PROCEDURE MERIDIAN.RESOLVE_EXRATE_CONFLICT (
    IN  p_source_region    VARCHAR(10),
    IN  p_target_region    VARCHAR(10),
    IN  p_source_timestamp TIMESTAMP,
    IN  p_target_timestamp TIMESTAMP,
    IN  p_source_rate      DECIMAL(15,6),
    IN  p_target_rate      DECIMAL(15,6),
    OUT p_winner           VARCHAR(10)
)
LANGUAGE SQL
BEGIN
    -- New York is the authoritative source for exchange rates.
    -- If NY is the source, NY always wins.
    -- Otherwise, the most recent update wins.
    IF p_source_region = 'NY' THEN
        SET p_winner = 'SOURCE';
    ELSEIF p_target_region = 'NY' THEN
        SET p_winner = 'TARGET';
    ELSEIF p_source_timestamp > p_target_timestamp THEN
        SET p_winner = 'SOURCE';
    ELSE
        SET p_winner = 'TARGET';
    END IF;
END;
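The precedence rules can be exercised directly. A hypothetical invocation with illustrative rates and timestamps:

```sql
-- A London update arriving at NY: the NY (target) copy is kept,
-- even though the London change carries the newer timestamp
CALL MERIDIAN.RESOLVE_EXRATE_CONFLICT(
    'LDN', 'NY',
    TIMESTAMP('2026-03-15-14.30.00'), TIMESTAMP('2026-03-15-14.29.00'),
    1.2714, 1.2708, ?);
-- p_winner = 'TARGET'
```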
Phase 3: CDC to Kafka (Month 4)
Each region deploys a CDC capture engine reading the local DB2 transaction log:
Kafka Topic Structure:
Region-specific topics:
meridian.ny.accounts
meridian.ny.transactions
meridian.ldn.accounts
meridian.ldn.transactions
meridian.sg.accounts
meridian.sg.transactions
Global topics (after MirrorMaker):
meridian.global.accounts (merged from all regions)
meridian.global.transactions (merged from all regions)
Message Format (Avro):
{
"operation": "UPDATE",
"region": "NY",
"table": "ACCOUNTS",
"timestamp": "2026-03-15T14:30:00.123Z",
"transaction_id": "0x00000A21F3400000",
"before": {
"account_id": "US-1234567",
"balance": 52340.00,
"last_updated": "2026-03-15T14:29:55.000Z"
},
"after": {
"account_id": "US-1234567",
"balance": 51840.00,
"last_updated": "2026-03-15T14:30:00.123Z"
}
}
Testing: Regional Failover
Scenario: London Data Center Goes Offline
Dr. Tanaka conducts a full disaster recovery test simulating a London data center failure.
Preparation:
- European customers' traffic is served by LDN-PROD via a European load balancer
- Q Replication is active between all three regions
- The global analytics Kafka pipeline is operational
Execution:
- T+0 seconds: London data center simulated offline (network cut)
- T+5 seconds: MQ channels to London disconnect. Q Capture/Apply queues begin buffering.
- T+10 seconds: Monitoring alerts fire: "London region unreachable"
- T+2 minutes: DR declared. Operations team initiates failover.
- T+3 minutes: DNS updated to route European customer traffic to NY-PROD
- T+5 minutes: NY-PROD's READONLY copy of European data is promoted to a writable database (the copy maintained by Q Replication)
- T+8 minutes: Application servers in the European edge locations reconnect to NY
- T+10 minutes: European customers resume banking operations via NY
Results:
- RPO: 6 seconds (the Q Replication lag at the time of failure)
- RTO: 10 minutes (from failure to service restoration)
- 847 European transactions still in flight in the Q Replication queues were lost (within the 10-second RPO tolerance)
- All other transactions were preserved
Recovery (London Restored):
When London comes back online:
1. HADR standby (LDN-STBY) is promoted to primary
2. Q Replication resumes, re-synchronizing London with NY's changes during the outage
3. Conflict resolution handles any dual-region updates during the failover period
4. European traffic is gradually migrated back to London
5. Original LDN-PROD is rebuilt as the new standby
Data Residency Compliance
GDPR Compliance (Europe)
European customer PII (Personally Identifiable Information) resides on LDN-PROD and is replicated only to:
- LDN-STBY (same data center, same jurisdiction) via HADR
- NY-READONLY (anonymized/pseudonymized subset only)
The Q Replication subscription for EU data to NY applies a transformation:
Column mapping in the Q Apply subscription (personal data is hashed before replication to NY):

| Source Column | Target Column | Transformation |
|---|---|---|
| CUSTOMER_NAME | CUSTOMER_NAME_HASH | SHA256(CUSTOMER_NAME) |
| EMAIL | EMAIL_HASH | SHA256(EMAIL) |
| PHONE | PHONE_HASH | SHA256(PHONE) |
| ADDRESS | COUNTRY_CODE | SUBSTR(ADDRESS, 1, 2) |
| ACCOUNT_BALANCE | ACCOUNT_BALANCE | (no transformation) |
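The SHA-256 transformations can be expressed with Db2's built-in HASH scalar function. A sketch, assuming Db2 11.1 or later where HASH(expression, 2) selects SHA2-256 (earlier levels would need an equivalent UDF); the table and column names are illustrative:

```sql
-- Masking expression applied before EU rows leave the jurisdiction
SELECT CUSTOMER_ID,
       HEX(HASH(CUSTOMER_NAME, 2)) AS CUSTOMER_NAME_HASH,  -- SHA2-256
       HEX(HASH(EMAIL, 2))         AS EMAIL_HASH,
       HEX(HASH(PHONE, 2))         AS PHONE_HASH,
       SUBSTR(ADDRESS, 1, 2)       AS COUNTRY_CODE,
       ACCOUNT_BALANCE                                     -- untransformed
FROM MERIDIAN.ACCOUNTS_EU;
```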
MAS Compliance (Singapore)
Singapore's Monetary Authority requires that customer data for Singapore-regulated accounts remains in Singapore. Similar transformations apply for SG-to-NY and SG-to-LDN replication.
Performance Metrics
After three months of production operation:
| Metric | Americas | Europe | Asia-Pacific |
|---|---|---|---|
| HADR log gap (avg) | 0 bytes | 0 bytes | 0 bytes |
| HADR log gap (max) | 12 KB | 8 KB | 15 KB |
| Q Rep latency (to nearest region) | 85 ms | 95 ms | 190 ms |
| Q Rep latency (to farthest region) | 260 ms | 210 ms | 275 ms |
| Q Rep conflict rate | 0.003% | 0.002% | 0.001% |
| CDC-to-Kafka latency | <1 sec | <1 sec | <2 sec |
| Global analytics freshness | <5 sec | <5 sec | <5 sec |
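The Q Replication latency figures can be sampled from the Q Apply monitor table. A sketch, assuming the standard ASN layout where END2END_LATENCY is reported in milliseconds (monitor-table columns vary slightly by version):

```sql
-- Recent end-to-end latency per receive queue
SELECT RECVQ,
       MAX(MONITOR_TIME)    AS LAST_SAMPLE,
       AVG(END2END_LATENCY) AS AVG_LATENCY_MS
FROM ASN.IBMQREP_APPLYMON
WHERE MONITOR_TIME > CURRENT TIMESTAMP - 1 HOUR
GROUP BY RECVQ;
```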
Lessons Learned
- Network latency is the dominant factor in cross-region replication. The NY-Singapore link (240 ms RTT) makes synchronous replication impossible. Q Replication's asynchronous nature is essential for cross-continent scenarios.
- Conflict resolution requires deep domain knowledge. The initial "timestamp wins" strategy for all tables caused subtle issues with exchange rates, where an older NY value should always override a newer London value. Custom conflict resolution procedures were necessary.
- Data residency complicates the architecture significantly. About 30% of the design effort went into ensuring PII was properly handled during cross-region replication. Automated compliance checks validate that no unmasked PII reaches unauthorized regions.
- MQ channel reliability matters. Two MQ channel failures during the first month caused Q Replication to fall behind by several minutes. Implementing MQ channel monitoring with automatic restart and redundant channels resolved this.
- Global analytics requires schema harmonization. Each region had slightly different column names and data types (a legacy of the European acquisition). Significant effort went into creating a unified schema for the global analytics platform.
Discussion Questions
- Why was Q Replication chosen over HADR ASYNC for cross-region replication? What advantages does Q Replication offer for this use case?
- The conflict rate is very low (0.003%). Under what circumstances could it spike? What business process changes might reduce conflicts to zero?
- If Meridian acquires a bank in Brazil, what changes would be needed to extend this architecture to a fourth region?
- The CDC-to-Kafka pipeline feeds a centralized analytics platform in New York. What are the data sovereignty implications? How might a federated analytics approach work instead?
- Calculate the total cost of this architecture in terms of servers, licenses, and network bandwidth. Compare it to a single-site architecture with cross-region backup shipping. What availability improvement justifies the additional cost?