Case Study 1: Continental National Bank's 10-Year Hybrid Architecture Vision

Background

In January 2023, Continental National Bank's board of directors asked Kwame Mensah a question that would define the next decade of CNB's technology strategy: "Where will our technology platform be in 2033?"

The question came during the annual strategic planning cycle. The board had been reading industry analyst reports — the kind that predict mainframe extinction every year with the same confidence and the same track record — and they wanted reassurance, or at least a plan.

Kwame had been preparing for this question for five years. Not because he anticipated it from the board specifically, but because he'd been watching the industry's hybrid evolution with the analytical rigor of an architect who's seen too many migrations fail and too few succeed. He had data. He had a framework. And he had a strong opinion backed by thirty years of production mainframe experience and five years of deliberate cloud learning.

"I told the board three things," Kwame recalls. "First, the mainframe is not going away — I showed them the economics. Second, cloud-only is not an option — I showed them the risk analysis. Third, the right answer is a permanent hybrid architecture designed to last at least ten years, with annual review points. They didn't love hearing 'ten years.' Board members like three-year plans. But I showed them the math, and math is hard to argue with."

The Assessment Phase (January - June 2023)

Kwame's team spent six months assessing CNB's technology portfolio before proposing an architecture. The assessment had four workstreams:

Workstream 1: Application Portfolio Analysis

Lisa Tran led the analysis of CNB's 8 million lines of COBOL, organized into 1,200 programs across 47 applications.

The team categorized every application on two dimensions:

| Category | Business Criticality | Technical Complexity | Count | COBOL Lines |
|---|---|---|---|---|
| Core Transaction | Critical (regulatory, revenue) | High (DB2 data sharing, CICS, complex business rules) | 12 applications | 3.2M lines |
| Operational Support | Important (not revenue-critical) | Medium (batch, reporting, file processing) | 18 applications | 2.8M lines |
| Peripheral | Low (internal tools, legacy interfaces) | Low to Medium | 17 applications | 2.0M lines |

"The core transaction applications — account management, funds transfer, loan origination, general ledger, regulatory reporting — those are the ones where z/OS earns its keep," Lisa said. "Three point two million lines of COBOL processing 500 million transactions a day with sub-millisecond response times and five-nines availability. There is no cloud architecture today that matches this for our transaction volume, and honestly, I don't see one in ten years either."

The operational support applications — batch reporting, data extraction, file-based integration with partners — were candidates for cloud migration or hybrid operation.

The peripheral applications — legacy interfaces to systems that had been decommissioned, internal reporting tools with single-digit users, and test utilities — were candidates for retirement.

Workstream 2: Cost Analysis

Rob Calloway worked with CNB's finance team to build a true total cost of ownership (TCO) model for three scenarios:

Scenario A: Status Quo — Continue running everything on z/OS with incremental modernization (API wrapping, CI/CD).
- Year 1-10 total cost: $890M
- Risk: workforce cliff (mainframe staff retiring), limited elasticity for growth

Scenario B: Full Cloud Migration — Rewrite core banking in cloud-native over 7 years.
- Year 1-10 total cost: $1.7B (including $680M migration cost, $320M in parallel running costs, $700M in cloud operating costs)
- Risk: 40-60% probability of project failure based on industry track record; estimated 18-month period of degraded capability during cutover; regulatory risk during transition

Scenario C: Designed Hybrid — Core transactions stay on z/OS; operational and peripheral workloads migrate to cloud; integration infrastructure built for permanent coexistence.
- Year 1-10 total cost: $1.1B (including $180M integration infrastructure, $120M cloud migration for suitable workloads, $800M in operating costs for both platforms)
- Risk: integration complexity; skills gap; dual-platform operational overhead

"Scenario B was $600 million more expensive than Scenario C, with 40-60% failure probability," Rob noted. "And even if it succeeded perfectly, the cloud operating cost for core banking — when you account for achieving equivalent availability and consistency — was higher than the mainframe operating cost. The only way Scenario B wins economically is if you accept lower availability and weaker consistency. For a Tier-1 bank processing $2 trillion in annual transaction volume, that's not acceptable."
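
The scenario arithmetic above can be sanity-checked in a few lines. A minimal sketch: the component figures come from the cost analysis, but the risk-adjustment model and its $400M failure penalty are illustrative assumptions, not CNB data.

```python
# 10-year TCO comparison for the three scenarios (all amounts in $M),
# using the component figures from CNB's cost analysis.

scenarios = {
    "A: Status Quo": {"components": {"operating": 890}, "p_fail": 0.0},
    "B: Full Cloud": {"components": {"migration": 680, "parallel_run": 320,
                                     "cloud_operating": 700}, "p_fail": 0.5},
    "C: Hybrid":     {"components": {"integration": 180, "cloud_migration": 120,
                                     "dual_operating": 800}, "p_fail": 0.1},
}

def nominal_cost(s):
    return sum(s["components"].values())

def risk_adjusted(s, failure_penalty=400):
    # Assumed: a failed program adds a $400M penalty (write-offs,
    # remediation) on top of nominal cost -- a placeholder, not CNB data.
    return nominal_cost(s) + s["p_fail"] * failure_penalty

for name, s in scenarios.items():
    print(f"{name}: nominal ${nominal_cost(s)}M, "
          f"risk-adjusted ${risk_adjusted(s):.0f}M")
```

Even before any risk adjustment, the nominal gap is the $600M Rob cites; the failure probability only widens it.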

Workstream 3: Risk Analysis

Ahmad Rashidi from Pinnacle — brought in as an external advisor due to his compliance background — conducted the risk analysis. He evaluated three categories:

Regulatory Risk: CNB's OCC examiners had explicitly stated that any technology transformation must maintain continuous compliance with OCC Heightened Standards, PCI-DSS, SOX, and GLBA. A full migration would require re-certification of every compliance control on the new platform — a 2-3 year process running in parallel with the migration. The examiners expressed "significant concern" (their words) about the disruption risk of a full migration for a systemically important bank.

Operational Risk: The batch window (11 PM to 5 AM) processes $1.4 billion in nightly settlement. A failed migration that disrupts the batch window for even one night creates a settlement failure that triggers Federal Reserve intervention. "That's not a risk you manage with a contingency plan," Ahmad said. "That's a risk you avoid entirely."

Knowledge Risk: Marcus Whitfield at FBA (a close professional contact of Kwame's) had just completed a knowledge audit showing that 37% of FBA's critical business rules existed only in the minds of staff within 5 years of retirement. Kwame commissioned a similar audit at CNB: the number was 28%. A full migration would require extracting and reimplementing those rules — rules that the people who understand them won't be around to validate.

Workstream 4: Technology Evaluation

Kwame personally led the technology evaluation, testing five integration approaches:

| Approach | Tested With | Result | Verdict |
|---|---|---|---|
| z/OS Connect REST API wrapping | 3 CICS transactions | 180ms avg response time, functional | Adopted for ACL (anti-corruption layer) |
| MQ-to-Kafka connector | Transaction event stream | 8-second avg lag, reliable | Adopted for event mesh |
| IBM CDC (InfoSphere Data Replication) | ACCOUNTS table replication | 12-second avg lag, < 60s at 99.5th percentile | Adopted for CDC |
| Direct JDBC from cloud to DB2 | Analytics query test | Functional but caused 15% increase in DB2 buffer pool miss rate | Rejected |
| COBOL-to-Java automated conversion | One peripheral program (CNBUTIL03) | Code compiled but was unmaintainable; 3x slower than original | Rejected for core systems |
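
The CDC row's acceptance criteria (12-second average, under 60 seconds at the 99.5th percentile) lend themselves to a simple threshold check. A sketch using synthetic lag samples, since the actual evaluation ran against production-shadow traffic; the helper names are illustrative:

```python
# Acceptance check for the CDC replication trial: average lag and
# 99.5th-percentile lag against the thresholds reported for the
# ACCOUNTS table test. Sample data here is synthetic.
import statistics

def percentile(samples, pct):
    """Nearest-rank percentile (pct in 0..100) of a list of lag seconds."""
    ordered = sorted(samples)
    rank = max(1, round(pct / 100 * len(ordered)))
    return ordered[rank - 1]

def cdc_acceptable(lag_samples, avg_limit=12.0, p995_limit=60.0):
    avg = statistics.mean(lag_samples)
    p995 = percentile(lag_samples, 99.5)
    return avg <= avg_limit and p995 <= p995_limit

# Synthetic lag trace: mostly ~10s with occasional batch-window spikes.
lags = [10.0] * 990 + [45.0] * 9 + [58.0]
print(cdc_acceptable(lags))  # True under the reported thresholds
```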

The Architecture (Proposed July 2023, Approved September 2023)

Based on the six-month assessment, Kwame proposed the hybrid architecture documented in Section 37.7 of this chapter. The board approved it in September 2023 with a 10-year horizon and annual review gates.

Key architectural decisions:

1. Core transaction processing stays on z/OS permanently. The 12 core applications (3.2M lines of COBOL) remain on the Parallel Sysplex. Investment: modernized CI/CD pipeline (Chapter 36), AI-assisted documentation (Chapter 35), API wrapping of all customer-facing transactions (Chapter 21).

2. Operational applications migrate to cloud over 3 years. The 18 operational applications (2.8M lines) are evaluated individually: 7 are migrated to cloud-native (rewritten), 6 are replatformed (COBOL on Linux), and 5 are replaced by SaaS solutions.

3. Peripheral applications are retired. The 17 peripheral applications (2.0M lines) are retired over 18 months. Users are migrated to modern alternatives or the functionality is absorbed into other applications.

4. Integration infrastructure is built to last. $180M over 10 years funds the API gateway, event mesh, CDC pipeline, identity bridge, and unified monitoring platform. These are not "migration tools" — they are permanent production infrastructure.

5. Organization restructured for hybrid. The integration squad (6 people initially, growing to 12 over 3 years) is formed, reporting directly to Kwame. Cross-training programs begin for both mainframe and cloud teams. Shared on-call rotations start within 6 months.

Implementation Timeline (2023-2033)

| Year | Milestone | Status (as of 2025) |
|---|---|---|
| 2023 H2 | Integration squad formed; API gateway deployed; z/OS Connect configured for first 5 APIs | Complete |
| 2024 H1 | First 12 CICS transactions exposed as REST APIs; CDC pipeline for ACCOUNTS and TRANSACTIONS tables operational | Complete |
| 2024 H2 | Event mesh (MQ-Kafka connector) operational; unified monitoring MVP deployed; first 3 operational apps migrated to cloud | Complete |
| 2025 H1 | Identity federation (ISAM) deployed; 25 APIs live; cloud analytics platform consuming CDC data; 5 more operational apps migrated | In progress |
| 2025 H2 | All customer-facing transactions available as APIs; mobile banking app launched on hybrid backend; peripheral app retirement complete | Planned |
| 2026 | Full operational app migration complete; strangler fig active for 2 operational apps; batch modernization begins | Planned |
| 2027 | Batch optimization — parallel batch, checkpoint/restart modernization; cloud analytics fully operational | Planned |
| 2028 | Architecture review gate — assess 10-year plan against actuals; adjust based on technology evolution | Planned |
| 2029-2030 | AI-assisted mainframe development tools mature; next-generation z/OS hardware evaluation | Planned |
| 2031-2033 | Architecture evolution based on 2028 review; potential for quantum-safe cryptography migration; workforce transition (Generation Z mainframe engineers via cross-training pipeline) | Planned |

Results (as of March 2025)

Two years into the 10-year plan, Kwame reports the following results:

Operational: 25 CICS transactions are exposed as REST APIs, serving 12,000 requests per second at peak. The mobile banking app (launched internally in Q4 2024) runs entirely on the hybrid architecture. Average API response time: 210ms end-to-end. CICS-side response time: 38ms. The difference (172ms) is network latency + z/OS Connect transformation + API gateway overhead.

Financial: Year 1-2 actual cost: $205M (vs. $214M budget). MIPS consumption reduced 8% through API-driven workload offloading (queries that used to run as CICS green-screen transactions now run as cloud-side cached reads against the CDC replica).
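
The offloading pattern behind that 8% MIPS reduction can be sketched as a read-through cache in front of the CDC replica. This is an illustrative sketch, not CNB's implementation; the class name and the TTL choice are assumptions, though tying the TTL to the 30-second CDC lag SLA follows the logic of the architecture:

```python
# Read-through cache for balance-style inquiries: reads that used to run
# as CICS green-screen transactions are served cloud-side from the CDC
# replica, touching the mainframe not at all.
import time

class ReadThroughCache:
    def __init__(self, loader, ttl_seconds=30):
        # TTL matches the 30-second CDC lag SLA, so the cache never
        # serves data staler than the replica itself is allowed to be.
        self.loader = loader          # reads from the CDC replica
        self.ttl = ttl_seconds
        self._store = {}              # key -> (value, fetched_at)

    def get(self, key):
        hit = self._store.get(key)
        if hit and time.monotonic() - hit[1] < self.ttl:
            return hit[0]             # served from cache: zero MIPS
        value = self.loader(key)
        self._store[key] = (value, time.monotonic())
        return value
```

Usage would look like `cache = ReadThroughCache(replica.fetch_balance)`; repeated inquiries within the TTL never reach DB2 or CICS.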

Organizational: Integration squad has 8 members (growing to target of 12). 60% of mainframe engineers have completed cloud fundamentals training. 40% of cloud engineers have completed z/OS fundamentals training. Cross-platform incident response time has improved 62% since unified monitoring deployment.

Risk: Zero production incidents caused by hybrid integration in the first 18 months. Two near-misses: (1) CDC lag spiked to 4 minutes during a particularly heavy batch night — resolved by tuning DB2 log buffer allocation; (2) a cloud deployment accidentally removed the correlation ID header propagation, causing 6 hours of monitoring blind spot before detection. Both were addressed through runbook updates and automated testing.
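
The second near-miss is the kind of regression a small deployment-gate test catches. A sketch of the propagation rule and the automated check CNB could have added afterward; the header name and helper names are illustrative:

```python
# Correlation-ID propagation: every hop copies the inbound header to its
# outbound calls (minting a new ID at the edge if absent), so one
# transaction can be traced across gateway, cloud services, and CICS.
import uuid

CORRELATION_HEADER = "X-Correlation-ID"

def propagate_correlation(inbound_headers):
    """Return outbound headers carrying the same correlation ID."""
    cid = inbound_headers.get(CORRELATION_HEADER) or str(uuid.uuid4())
    return {CORRELATION_HEADER: cid}

def check_propagation(service):
    # Deployment gate: fail the release if a service drops the header
    # instead of forwarding it -- the failure mode of the near-miss.
    inbound = {CORRELATION_HEADER: "req-12345"}
    outbound = service(inbound)
    assert outbound.get(CORRELATION_HEADER) == "req-12345", \
        "correlation ID not propagated; tracing would go blind"

check_propagation(propagate_correlation)
```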

Lessons Learned

Kwame distills the first two years into five lessons:

1. The ACL is worth every dollar. "We spent $12M on z/OS Connect licensing, configuration, and custom extensions. Some people thought that was excessive for an 'integration layer.' But the ACL has absorbed 47 COBOL COMMAREA changes and 23 API version updates without any cloud consumer being affected. The alternative — rebuilding cloud integrations every time a COBOL program changes — would have cost far more."
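
The mechanism behind "47 COMMAREA changes, zero consumer impact" is that the ACL owns a single mapping between host-format fields and a stable API schema. A minimal sketch; the field names, offsets, and version history here are invented for illustration:

```python
# Anti-corruption layer mapping: a parsed COMMAREA record is translated
# to the stable API shape. When the copybook renames or reorders a
# field, only this mapping table changes -- no cloud consumer does.

COMMAREA_TO_API = {
    "ACCT-ID": "accountNumber",   # hypothetical rename from "ACCT-NO" in v1
    "CUR-BAL": "currentBalance",
    "AVL-BAL": "availableBalance",
}

def to_api_model(commarea_fields):
    """Translate a parsed COMMAREA into the stable API response shape."""
    return {api: commarea_fields[host] for host, api in COMMAREA_TO_API.items()}

print(to_api_model({"ACCT-ID": "0012345678",
                    "CUR-BAL": "1043.17",
                    "AVL-BAL": "993.17"}))
```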

2. CDC lag is the number you watch. "Our 30-second SLA for CDC lag is the most closely monitored metric in the hybrid architecture. When it spikes, the cloud analytics team is working with stale data, and decisions based on stale data in banking are dangerous. We treat CDC lag the same way we treat CICS response time — with real-time alerting and automatic escalation."
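
The lag watch described above reduces to a simple classification against the SLA. A sketch: the 30-second figure is CNB's; the warning ratio and escalation labels are assumptions standing in for the real alerting configuration:

```python
# CDC lag classification: a breach of the 30s SLA escalates automatically,
# the same treatment given to CICS response time.

SLA_SECONDS = 30

def evaluate_lag(lag_seconds, warn_ratio=0.8):
    """Classify a CDC lag sample against the 30-second SLA."""
    if lag_seconds > SLA_SECONDS:
        return "page"              # automatic escalation
    if lag_seconds > SLA_SECONDS * warn_ratio:
        return "warn"              # approaching SLA: pre-emptive attention
    return "ok"

assert evaluate_lag(10) == "ok"
assert evaluate_lag(26) == "warn"
assert evaluate_lag(240) == "page"   # the 4-minute near-miss would page
```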

3. Cultural change is harder than technical change. "The mainframe team and the cloud team didn't trust each other initially. The mainframe people thought the cloud people were reckless. The cloud people thought the mainframe people were dinosaurs. The integration squad was the bridge — they spoke both languages and earned credibility with both teams. But it took a year. Technical integration took six months. Cultural integration took twelve."

4. Build for permanence from day one. "Every piece of integration infrastructure we've deployed is built to production standards with monitoring, alerting, failover, and documentation. We treated the integration layer as seriously as we treat the mainframe or the cloud platform. That decision to invest in durability, rather than building 'temporary' bridges, has saved us an estimated $8M: the rework we've watched other organizations spend rebuilding integration layers that were never built to last."

5. The board needs metrics, not architecture diagrams. "The board doesn't want to see a reference architecture diagram. They want to see: cost per transaction (trending down), API response time (within SLA), incident count (zero from hybrid integration), and MIPS utilization (trending down through offloading). Translate architecture into business metrics, or the funding conversation gets very difficult."

Discussion Questions

  1. Kwame's assessment rejected direct JDBC from cloud to DB2 after testing showed a 15% increase in buffer pool miss rate. Why does this happen, and how does the CDC-based shared-nothing architecture avoid it?

  2. The cost analysis shows Scenario B (full cloud migration) at $1.7B vs. Scenario C (hybrid) at $1.1B. What assumptions could change to make Scenario B cheaper? How likely are those assumptions to hold?

  3. CNB's integration squad started with 6 people and is growing to 12. For an organization half CNB's size, how would you scale the integration function? Is a dedicated squad still justified, or should integration skills be embedded in platform teams?

  4. The two near-misses (CDC lag spike, correlation ID loss) both involved components in the integration layer. What does this tell you about where to focus testing and monitoring investment in hybrid architectures?

  5. Kwame says "cultural change is harder than technical change." Based on the organizational design principles in Section 37.6, what specific actions could have accelerated the cultural integration between mainframe and cloud teams?