Case Study 31.1: Cloud Migration Strategy for a Regional Bank

Background

Valley National Savings Bank is a regional institution with 1.2 million customers, 120 branch locations, and $18 billion in assets. The bank operates a DB2 LUW 11.1 environment on-premises that supports its digital banking platform, customer relationship management (CRM) system, and operational reporting.

Current Environment

Infrastructure

  • Database server: Two physical servers (primary + standby) running SUSE Linux Enterprise Server 15.
  • DB2 version: 11.1 Fix Pack 6.
  • HADR: Configured between primary (data center A) and standby (data center B), synchronous mode.
  • Storage: SAN-attached with 800 GB allocated (520 GB used).
  • CPU: 16 cores per server.
  • Memory: 128 GB per server.

Database Profile

  • 180 tables across 12 schemas.
  • 450 indexes.
  • 35 stored procedures (all SQL PL — no C or Java stored procedures).
  • 12 triggers.
  • 8 materialized query tables (MQTs) refreshed nightly.
  • Peak concurrent connections: 450.
  • Average daily transaction volume: 2.4 million SQL statements.

Pain Points

  1. Hardware refresh: Both servers are 6 years old and approaching end-of-support. Replacement cost: $480,000.
  2. Operational overhead: Two full-time DBAs spend 60% of their time on infrastructure tasks (patching, HADR monitoring, backup management, storage provisioning).
  3. Scaling limitations: The bank's mobile banking app has grown from 200,000 to 800,000 monthly active users in 3 years. Peak load during payroll periods exceeds the current server capacity.
  4. Disaster recovery: The HADR standby is in the same metropolitan area (30 miles apart). Regulatory auditors have flagged this as insufficient geographic separation.

Migration Assessment

Compatibility Analysis

The DBA team conducted a thorough compatibility assessment:

| Feature | Current Use | Db2 on Cloud Support | Action Required |
|---------|-------------|----------------------|-----------------|
| SQL PL stored procedures | 35 procedures | Fully supported | None |
| Triggers | 12 triggers | Fully supported | None |
| MQTs | 8 MQTs | Supported with limitations | Test refresh scheduling |
| HADR | Synchronous | Managed HA (3-node) | Reconfigure ACR |
| db2audit | Enabled | Activity Tracker integration | Reconfigure audit |
| Row compression | All large tables | Supported | None |
| External routines (C) | None | N/A | None |
| Local file system access | LOAD from local files | Cloud Object Storage | Modify LOAD scripts |
| Federated access | Not used | Available | N/A |

Compatibility score: 95%. Only the LOAD scripts (which reference local file paths) and the audit configuration require modification.
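The LOAD scripts are the one substantive code change. A minimal sketch of the conversion, assuming a remote storage alias is used (the alias, bucket, endpoint, and table names here are hypothetical, and the exact CATALOG STORAGE ACCESS syntax depends on the Db2 fix pack); the script prints the commands as a dry run when the db2 CLI is not on the PATH:

```shell
#!/bin/sh
# Hypothetical names for illustration only.
COS_ALIAS="valleycos"                                      # storage access alias
LOAD_SRC="DB2REMOTE://${COS_ALIAS}//staging/accounts.del"  # replaces a local path

# Execute through the db2 CLI when present; otherwise echo the command
# so the change can be reviewed as a dry run.
run() {
  if command -v db2 >/dev/null 2>&1; then db2 "$1"; else echo "DRY-RUN: $1"; fi
}

# One-time setup: register the Cloud Object Storage bucket as an alias
# (credentials and endpoint are placeholders).
run "CATALOG STORAGE ACCESS ALIAS ${COS_ALIAS} VENDOR S3 SERVER s3.example.cloud USER ACCESS_KEY PASSWORD SECRET_KEY CONTAINER valley-staging"

# The LOAD that formerly read a local file now reads through the alias.
run "LOAD FROM ${LOAD_SRC} OF DEL REPLACE INTO BANK.ACCOUNTS"
```

The same pattern applies to every LOAD script flagged in the assessment; only the source URL changes, not the target table clauses.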

Strategy Selection

The team evaluated three migration strategies:

| Strategy | Time to Complete | Risk Level | Cost | Cloud Benefits |
|----------|------------------|------------|------|----------------|
| Lift and shift (EC2) | 4 weeks | Low | $350K/yr | Minimal |
| Re-platform (Db2 on Cloud) | 10 weeks | Medium | $200K/yr | High |
| Re-architect (microservices) | 12-18 months | High | $800K (project) | Maximum |

Decision: Re-platform to Db2 on Cloud Enterprise HA.

Rationale: The 95% compatibility score makes re-platforming feasible with minimal code changes. The managed service eliminates the hardware refresh cost and reduces DBA infrastructure effort by 60%. The 3-node HA cluster in Db2 on Cloud provides automatic geographic separation across availability zones, addressing the auditor's concern.

Migration Plan

Phase 1: Foundation (Weeks 1-2)

  1. Provision Db2 on Cloud Enterprise HA in the US East region (Washington, DC — 3 availability zones).
  2. Configure IBM Direct Link (10 Gbps) between the bank's primary data center and IBM Cloud.
  3. Set up VPC with private endpoint for the Db2 instance.
  4. Configure IAM users and roles.
  5. Create Key Protect instance and root encryption key.
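The provisioning in steps 1 and 5 can be scripted with the ibmcloud CLI; a sketch with a dry-run guard, where the instance names are hypothetical and the service/plan identifiers are assumptions to be confirmed against the current IBM Cloud catalog:

```shell
#!/bin/sh
# Dry-run wrapper: execute when the ibmcloud CLI is installed, else echo.
ic() {
  if command -v ibmcloud >/dev/null 2>&1; then ibmcloud "$@"; else echo "DRY-RUN: ibmcloud $*"; fi
}

REGION="us-east"   # Washington, DC -- 3 availability zones

# Key Protect instance for the root encryption key ("kms" is the Key
# Protect service name; the plan name is an assumption).
ic resource service-instance-create valley-keyprotect kms tiered-pricing "$REGION"

# Db2 on Cloud instance (service/plan identifiers are assumptions --
# verify with the ibmcloud catalog commands before running).
ic resource service-instance-create valley-db2 dashdb-for-transactions enterprise "$REGION"
```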

Phase 2: Schema Migration (Weeks 3-4)

  1. Extract DDL from the source database using db2look:

     ```bash
     db2look -d VALLEYDB -e -a -l -x -o valley_schema.sql
     ```

  2. Review and modify DDL for cloud compatibility:
     - Remove references to specific storage groups and buffer pools (managed by the service).
     - Update LOAD scripts to reference Cloud Object Storage URLs.
     - Verify all CREATE PROCEDURE statements execute successfully.

  3. Execute DDL on the cloud instance.

  4. Validate schema correctness by comparing SYSCAT catalog views between source and target.
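Step 4's catalog comparison can be scripted by counting objects per type on each side and diffing the results; a sketch, assuming cataloged connections to both databases (the cloud alias VALLEYCLD is hypothetical, and credentials are elided), with a dry-run guard. Per the database profile, the expected counts are roughly 180 tables, 450 indexes, and 12 triggers; SYSCAT.ROUTINES also includes functions, so its count may exceed the 35 procedures.

```shell
#!/bin/sh
# Object-count query run against source and target; any mismatch flags a
# schema-migration gap. SYSCAT views are the standard Db2 catalog views.
COUNT_SQL="
SELECT 'TABLES',   COUNT(*) FROM SYSCAT.TABLES   WHERE TABSCHEMA NOT LIKE 'SYS%' UNION ALL
SELECT 'INDEXES',  COUNT(*) FROM SYSCAT.INDEXES  WHERE INDSCHEMA NOT LIKE 'SYS%' UNION ALL
SELECT 'ROUTINES', COUNT(*) FROM SYSCAT.ROUTINES WHERE ROUTINESCHEMA NOT LIKE 'SYS%' UNION ALL
SELECT 'TRIGGERS', COUNT(*) FROM SYSCAT.TRIGGERS WHERE TRIGSCHEMA NOT LIKE 'SYS%'"

run_sql() {  # $1 = database alias; prints the counts (dry run without db2)
  if command -v db2 >/dev/null 2>&1; then
    db2 connect to "$1" >/dev/null && db2 -x "$COUNT_SQL"
  else
    echo "DRY-RUN on $1: $COUNT_SQL"
  fi
}

run_sql VALLEYDB    # on-premises source
run_sql VALLEYCLD   # Db2 on Cloud target (hypothetical alias)
```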

Phase 3: Data Migration with CDC (Weeks 5-8)

  1. Set up IBM Data Replication (CDC) from the on-premises DB2 to Db2 on Cloud.
  2. Perform initial full load:
     - Export using db2move VALLEYDB EXPORT.
     - Transfer via Direct Link (520 GB at 10 Gbps ≈ 7 minutes).
     - Import using db2move VALLEYDB IMPORT.
  3. Start CDC continuous replication to keep the cloud target synchronized.
  4. Monitor replication lag (target: < 5 seconds during normal operation).
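The ~7-minute transfer estimate in step 2 follows from simple arithmetic (decimal units, and ignoring protocol overhead, so a real transfer runs somewhat longer):

```shell
#!/bin/sh
# 520 GB over a 10 Gbps link: convert GB to gigabits, divide by line rate.
DATA_GB=520
LINK_GBPS=10

SECONDS_NEEDED=$(awk -v gb="$DATA_GB" -v rate="$LINK_GBPS" \
  'BEGIN { printf "%d", gb * 8 / rate }')
MINUTES=$(awk -v s="$SECONDS_NEEDED" 'BEGIN { printf "%.1f", s / 60 }')

echo "Transfer time: ${SECONDS_NEEDED}s (~${MINUTES} min)"   # 416s, ~6.9 min
```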

Phase 4: Application Testing (Weeks 6-9, overlapping with Phase 3)

  1. Deploy a parallel instance of the digital banking application pointing to the cloud database.
  2. Run automated regression tests (2,400 test cases).
  3. Conduct performance testing with simulated peak load (450 concurrent connections, 2.4M daily statements).
  4. Address any compatibility issues discovered during testing.
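Step 3's peak-load simulation can be driven with db2batch, the benchmark tool bundled with Db2; a sketch with a dry-run guard, where the database alias, workload file, and client fan-out are illustrative assumptions:

```shell
#!/bin/sh
# Replay a captured workload against the cloud instance, fanning out
# parallel clients to approximate 450 concurrent connections.
DB=VALLEYCLD              # hypothetical catalog alias for the cloud instance
CLIENTS=45                # illustrative fan-out; scale to the real target
WORKLOAD=peak_workload.sql

run_client() {
  if command -v db2batch >/dev/null 2>&1; then
    db2batch -d "$DB" -f "$WORKLOAD" -r "results_$1.out"
  else
    echo "DRY-RUN client $1: db2batch -d $DB -f $WORKLOAD"
  fi
}

i=1
while [ "$i" -le "$CLIENTS" ]; do
  run_client "$i" &       # background each client for concurrency
  i=$((i + 1))
done
wait                      # collect all clients before summarizing
echo "Launched $CLIENTS clients"
```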

Phase 5: Cutover (Week 10)

  1. Friday 22:00: Begin maintenance window.
  2. 22:05: Quiesce the on-premises database (stop application writes).
  3. 22:10: Verify CDC replication is fully caught up (lag = 0).
  4. 22:15: Update DNS records and application connection strings to point to Db2 on Cloud.
  5. 22:20: Smoke test — verify application connectivity and basic operations.
  6. 22:30: Open the application to internal users (staff only).
  7. Saturday 06:00: Monitor for 8 hours. If no issues, proceed to full cutover.
  8. Saturday 14:00: Open to all customers.
  9. Sunday: Monitor. If stable, decommission the on-premises HADR standby.
  10. Week 11-12: Decommission the on-premises primary after validation period.
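Steps 2-5 of the maintenance window lend themselves to a scripted checklist; a sketch with a dry-run guard (the cloud alias VALLEYCLD is hypothetical, and the lag check is a placeholder because the exact monitoring command depends on the CDC tooling):

```shell
#!/bin/sh
# Execute through the db2 CLI when present; otherwise echo for review.
run() {
  if command -v db2 >/dev/null 2>&1; then db2 "$1"; else echo "DRY-RUN: $1"; fi
}

# 22:05 -- stop application writes on the on-premises primary.
run "CONNECT TO VALLEYDB"
run "QUIESCE DATABASE IMMEDIATE FORCE CONNECTIONS"

# 22:10 -- placeholder: confirm CDC lag is zero before repointing DNS.
echo "CHECK: verify CDC replication lag = 0 in the replication console"

# 22:20 -- smoke test against the cloud target.
run "CONNECT TO VALLEYCLD"
run "SELECT 1 FROM SYSIBM.SYSDUMMY1"
```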

Rollback Plan

If critical issues are discovered during cutover:

  1. Reverse DNS records and connection strings to the on-premises database.
  2. Because CDC replication runs in both directions, the on-premises database receives any changes made during the cloud testing period.
  3. Rollback time: < 15 minutes.

Results

Performance Comparison

| Metric | On-Premises | Db2 on Cloud | Change |
|--------|-------------|--------------|--------|
| Avg query response time | 12 ms | 8 ms | -33% |
| Peak connection capacity | 450 | 1,000+ | +122% |
| HADR failover time | 30-60 sec (manual) | < 15 sec (automatic) | -75% |
| Backup duration | 45 min (DBA-initiated) | Automatic (continuous) | N/A |
| Geographic HA separation | 30 miles | 3 availability zones | Compliant |

Cost Analysis (Annual)

| Cost Category | On-Premises | Db2 on Cloud |
|---------------|-------------|--------------|
| Hardware amortization | $80,000 | $0 |
| Software licenses | $95,000 | Included |
| Data center (power, cooling) | $24,000 | $0 |
| DBA infrastructure effort | $120,000 | $48,000 |
| Direct Link | $0 | $66,000 |
| Db2 on Cloud (reserved) | $0 | $108,000 |
| **Total** | **$319,000** | **$222,000** |
| **Savings** | | **$97,000 (30%)** |
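The totals in the table can be cross-checked with a few lines of arithmetic:

```shell
#!/bin/sh
# Annual cost lines from the table, in dollars.
ONPREM=$((80000 + 95000 + 24000 + 120000))   # hardware, licenses, data center, DBA
CLOUD=$((48000 + 66000 + 108000))            # DBA effort, Direct Link, Db2 reserved
SAVINGS=$((ONPREM - CLOUD))
PCT=$(awk -v s="$SAVINGS" -v o="$ONPREM" 'BEGIN { printf "%.0f", 100 * s / o }')

echo "On-prem: \$$ONPREM  Cloud: \$$CLOUD  Savings: \$$SAVINGS ($PCT%)"
```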

Operational Impact

  • DBA team redirected 60% of infrastructure time to performance tuning and application support.
  • Mobile banking team can now request database scaling in minutes instead of the 8-week hardware procurement cycle.
  • Regulatory audit passed with zero findings on geographic HA separation.

Lessons Learned

  1. Test LOAD scripts early: The LOAD scripts that referenced local file paths failed on the cloud instance. Converting to Cloud Object Storage URLs took 3 days of development and testing. This should have been identified and addressed in Phase 2.

  2. CDC latency under peak load: During performance testing, CDC replication lag spiked to 30 seconds during the simulated payroll period. The team increased the CDC apply process parallelism from 2 to 8 threads, reducing peak lag to under 5 seconds.

  3. MQT refresh scheduling: On-premises, MQTs were refreshed by a cron job running REFRESH TABLE at 02:00. On Db2 on Cloud, the team used the Task Scheduler built into the service to schedule the same refreshes. The Task Scheduler's syntax differed slightly from cron's, requiring documentation updates.

  4. Connection pool tuning: The application's connection pool was configured for a maximum of 100 connections with 5-second timeout. The cloud database, being farther away (3 ms vs. 0.1 ms latency), required increasing the pool to 200 connections and the timeout to 10 seconds to handle peak load without connection errors.
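The refresh described in lesson 3 is one SQL statement per table regardless of which scheduler fires it; a sketch of the job body (schema and MQT names are hypothetical; the bank has 8 such tables), with a dry-run guard for review:

```shell
#!/bin/sh
# Nightly MQT refresh -- the same statements run whether the trigger is
# cron (on-premises) or the Db2 on Cloud Task Scheduler.
run() {
  if command -v db2 >/dev/null 2>&1; then db2 "$1"; else echo "DRY-RUN: $1"; fi
}

run "CONNECT TO VALLEYDB"
# Hypothetical MQT names for illustration.
for MQT in RPT.DAILY_BALANCES RPT.TXN_SUMMARY; do
  run "REFRESH TABLE ${MQT}"
done
```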

Discussion Questions

  1. Valley National chose re-platform over re-architect. Under what circumstances would re-architect have been the better choice?

  2. The migration used CDC for zero-downtime cutover. If CDC were not available, how would you minimize downtime using only export/import tools?

  3. The annual savings are $97,000. Over 3 years, that is $291,000 in savings. If the hardware refresh ($480,000) had been the only driver, would the migration still be justified? What non-financial factors should be considered?

  4. The bank's regulatory auditors were satisfied with 3-availability-zone HA. If they required cross-region DR (e.g., US East + US South), how would the architecture and cost change?