Case Study 2: SecureFirst's Modernization Journey — From Big-Bang to Incremental

Background

SecureFirst Retail Bank is a mid-size institution with 1.8 million customers, a growing mobile banking user base, and a mainframe running approximately 2,000 COBOL programs on an IBM z15 with 3,200 MIPS. Unlike Federal Benefits Administration's 40-year-old IMS-heavy codebase, SecureFirst's mainframe is relatively modern: CICS TS 5.6, DB2 13, Enterprise COBOL 6.4. The code is cleaner, the team is younger, and the integration points are better defined.

But SecureFirst had a problem that FBA didn't: a mobile banking app that was losing the feature race.

Yuki Nakamura, a DevOps lead with a Netflix-pedigree distributed systems background, was hired in 2021 to modernize SecureFirst's development pipeline. Carlos Vega, a mobile API architect with a Java/Kotlin background, joined the same year to build the mobile banking platform. Together, they represented the "new school" at SecureFirst. Their counterparts on the mainframe side, a team of twelve COBOL developers led by 28-year veteran Frank DeLuca, represented the "old school."

The stage was set for a clash that would teach both sides something important.

The Big-Bang Proposal (January 2022)

Carlos's initial proposal was straightforward: rewrite the core banking COBOL in Java microservices on Kubernetes. He'd done the architecture diagram (clean, six microservices, event sourcing, the works). He'd estimated 18 months. He'd benchmarked a prototype balance inquiry service on Kubernetes that handled 3,000 requests per second.

The proposal went to the CTO, who asked Frank DeLuca to review it.

Frank's review was three pages. Here are the key points:

On the timeline: "This proposal rewrites 2,000 programs in 18 months. That's 111 programs per month, roughly 5 per business day. Each program averages 3,200 lines of COBOL with embedded business logic, DB2 SQL, and CICS commands. The proposal assumes perfect requirements knowledge, zero rework, and no production incidents during the transition. In twenty-eight years, I have never seen a project with zero rework."

On the performance claim: "The prototype handles 3,000 requests per second for balance inquiry. Our COBOL balance inquiry handles 12,000 requests per second with a p99 latency of 0.8 milliseconds. The prototype's p99 is 14 milliseconds. For a bank processing real-time card authorizations, 14 milliseconds times 50,000 concurrent requests means we're 680 milliseconds behind our SLA before the business logic even runs."

On the cost: "The proposal estimates $18M for the rewrite. It does not include: running the mainframe during the 18-month migration ($5.6M), retraining operations staff ($800K), re-certifying with the OCC ($400K), PCI DSS re-audit ($350K), performance engineering to match mainframe SLAs on Kubernetes ($2-4M unknown), or fixing the business logic gaps that surface in production (historically 15-25% of original project cost for this type of migration). My estimate: $28-35M total, assuming no major failures."
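Frank's total can be checked by adding the excluded items to the $18M base. The sketch below is a minimal roll-up, assuming the "15-25% of original project cost" item is applied to the $18M estimate; the item labels and the `total_range` helper are illustrative names, not part of Frank's review.

```python
# Cost items from Frank's review, in millions of USD, as (low, high) ranges.
BASE_REWRITE_M = 18.0

excluded_items = {
    "mainframe run cost during migration": (5.6, 5.6),
    "operations retraining": (0.8, 0.8),
    "OCC re-certification": (0.4, 0.4),
    "PCI DSS re-audit": (0.35, 0.35),
    "performance engineering": (2.0, 4.0),
    # Assumption: 15-25% is taken against the $18M base estimate.
    "business logic gap fixes": (0.15 * BASE_REWRITE_M, 0.25 * BASE_REWRITE_M),
}


def total_range(base, items):
    """Return the (low, high) total after adding every excluded item to the base."""
    low = base + sum(lo for lo, _ in items.values())
    high = base + sum(hi for _, hi in items.values())
    return low, high


low, high = total_range(BASE_REWRITE_M, excluded_items)
print(f"Estimated total: ${low:.2f}M to ${high:.2f}M")
```

Under these assumptions the sum lands at roughly $29.9M to $33.7M, inside Frank's rounded range of $28-35M.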

On the risk: "If the rewrite fails or falls behind schedule, our customers can't use mobile banking during the transition — because the new APIs won't be ready and the old system doesn't have APIs. We're betting the bank's digital strategy on a single project with no fallback."

Carlos was frustrated. Yuki was intrigued. The CTO asked both sides to come back with alternatives.

The Turning Point (March 2022)

Yuki arranged a day-long session where Carlos and Frank would walk through the COBOL codebase together. Not to argue — to understand.

Carlos had never read production COBOL before. Frank had never explained it to someone from a microservices background.

Three things changed Carlos's perspective:

Discovery 1: The Account Balance Service. Carlos had assumed that the COBOL balance inquiry was a simple database read. Frank showed him the actual code. The "balance inquiry" transaction:

- Reads the account master record from DB2
- Checks for holds (legal, administrative, pending)
- Applies pending transactions not yet posted (real-time memo posting)
- Calculates available balance (different from ledger balance; it involves holds, pending transactions, and float calculations)
- Checks for overdraft protection and linked accounts
- Returns not one number but seven: ledger balance, available balance, pending debits, pending credits, hold amount, overdraft limit, and last-posted date
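To make the gap between a single SELECT and this transaction concrete, here is a minimal Python sketch of the seven-value result. The formula for available balance (ledger minus holds minus pending debits plus pending credits) is an assumption for illustration; the case study does not give the actual rule, and float calculations and linked-account checks are left out. All names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class BalanceInquiryResult:
    """The seven values the balance inquiry returns."""
    ledger_balance: Decimal
    available_balance: Decimal
    pending_debits: Decimal
    pending_credits: Decimal
    hold_amount: Decimal
    overdraft_limit: Decimal
    last_posted_date: date


def balance_inquiry(ledger, holds, pending_debits, pending_credits,
                    overdraft_limit, last_posted):
    """Assemble the result from the account state.

    Assumed formula: available = ledger - holds - pending debits + pending credits.
    """
    hold_amount = sum(holds, Decimal("0"))
    debits = sum(pending_debits, Decimal("0"))
    credits = sum(pending_credits, Decimal("0"))
    available = ledger - hold_amount - debits + credits
    return BalanceInquiryResult(
        ledger_balance=ledger,
        available_balance=available,
        pending_debits=debits,
        pending_credits=credits,
        hold_amount=hold_amount,
        overdraft_limit=overdraft_limit,
        last_posted_date=last_posted,
    )


result = balance_inquiry(
    ledger=Decimal("1000.00"),
    holds=[Decimal("150.00")],
    pending_debits=[Decimal("42.50"), Decimal("20.00")],
    pending_credits=[Decimal("100.00")],
    overdraft_limit=Decimal("500.00"),
    last_posted=date(2022, 3, 14),
)
print(result.available_balance)
```

Even this simplified version shows why the ledger balance alone is not what a mobile client should display.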

"I thought it was a SELECT statement," Carlos admitted. "It's a financial state machine."

Discovery 2: The Interest Calculation Engine. Frank walked through INTCALC — the interest calculation program. It handles 14 different interest calculation methods (simple, compound, 360-day, 365-day, actual/actual, actual/360, tiered, blended, promotional, penalty, workout, regulatory cap, minimum guaranteed, and variable-rate with caps and floors). Each method has edge cases for leap years, mid-month changes, account transfers, and regulatory exceptions.

Carlos: "How many of these calculation methods are actually used?"

Frank: "All fourteen. Different product types use different methods. Some customers have accounts from the 1990s with calculation methods we don't offer to new customers but are contractually obligated to maintain."
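The day-count conventions alone change the result. The following sketch compares simple interest under two bases over the same period; it is not INTCALC's logic, and the basis labels, function name, and rounding to cents are assumptions chosen for illustration.

```python
from datetime import date
from decimal import Decimal

# Denominator (days per year) for each day-count basis covered by this sketch.
YEAR_DAYS = {"actual/360": 360, "actual/365": 365}


def simple_interest(principal, annual_rate, start, end, basis):
    """Simple interest from start (inclusive) to end (exclusive), rounded to cents."""
    days = (end - start).days  # actual calendar days, so leap years matter
    interest = principal * annual_rate * days / YEAR_DAYS[basis]
    return interest.quantize(Decimal("0.01"))


principal = Decimal("10000")
rate = Decimal("0.05")

# January 1 to March 1 spans 60 days in a leap year and 59 otherwise.
leap_360 = simple_interest(principal, rate, date(2024, 1, 1), date(2024, 3, 1), "actual/360")
leap_365 = simple_interest(principal, rate, date(2024, 1, 1), date(2024, 3, 1), "actual/365")
nonleap_360 = simple_interest(principal, rate, date(2023, 1, 1), date(2023, 3, 1), "actual/360")

print(leap_360, leap_365, nonleap_360)
```

Two bases and one leap-year case already produce three different amounts on the same account. Fourteen methods with mid-month changes and regulatory exceptions multiply that variation.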

Discovery 3: The Error Handling. Frank showed Carlos the transaction recovery logic. If a fund transfer fails halfway through — after debiting the source account but before crediting the destination — the CICS transaction manager triggers automatic compensation. The COBOL code handles 47 distinct failure scenarios, each with a different recovery path. This logic had been refined over 20 years of production incidents.
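The shape of a single recovery path can be sketched in a few lines. This Python illustration is not how CICS implements recovery (the transaction manager does that work on the mainframe); it only shows the debit, credit, and compensation steps for one failure scenario, using a hypothetical in-memory account table.

```python
from decimal import Decimal


class TransferError(Exception):
    """Raised when a transfer cannot be completed."""


def transfer(accounts, source, dest, amount):
    """Debit source and credit dest; restore the debit if the credit fails."""
    if accounts[source] < amount:
        raise TransferError("insufficient funds")
    accounts[source] -= amount  # step 1: debit the source account
    try:
        accounts[dest] += amount  # step 2: credit the destination account
    except Exception as exc:
        accounts[source] += amount  # compensation: undo the debit
        raise TransferError(f"transfer rolled back: {exc!r}") from exc


accounts = {"A": Decimal("100.00"), "B": Decimal("50.00")}
transfer(accounts, "A", "B", Decimal("30.00"))

# Failure after the debit: the destination does not exist, so the debit is undone.
try:
    transfer(accounts, "A", "Z", Decimal("10.00"))
    rolled_back = False
except TransferError:
    rolled_back = True

print(accounts, rolled_back)
```

This covers one scenario. The production code covers 47, each with its own recovery path, which is the part a rewrite would have to rediscover.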

Carlos asked: "How long would it take to replicate all of this in Java?"

Frank's answer: "If you had the requirements documented? Two to three years. But the requirements aren't documented. The code IS the documentation."

That was the turning point. Carlos didn't abandon modernization — he abandoned rewriting.

The Incremental Strategy (April 2022)

Carlos, Yuki, and Frank developed a joint proposal: incremental modernization that preserves the COBOL core while making it accessible to modern consumers.

The Architecture

┌──────────────────────────────────────────────────────────────────┐
│                        Mobile/Web Consumers                       │
│                    (React Native / Angular)                        │
└────────────────────────────┬─────────────────────────────────────┘
                             │ HTTPS/REST
┌────────────────────────────▼─────────────────────────────────────┐
│                      API Gateway (Kong)                            │
│              Rate limiting, auth, versioning                       │
└────────────────────────────┬─────────────────────────────────────┘
                             │
          ┌──────────────────┼──────────────────┐
          │                  │                  │
┌─────────▼──────┐ ┌────────▼───────┐ ┌───────▼────────┐
│ z/OS Connect   │ │ Cloud-Native   │ │ Cloud-Native   │
│ (CICS services │ │ Microservices  │ │ Analytics      │
│  as REST APIs) │ │ (Java/Kotlin)  │ │ (Python/Spark) │
│                │ │                │ │                │
│ • Balance      │ │ • Notifications│ │ • Spending     │
│ • Transfer     │ │ • Budgeting    │ │   insights     │
│ • History      │ │ • Card mgmt    │ │ • Fraud ML     │
│ • Payments     │ │ • P2P pay      │ │ • Reporting    │
└───────┬────────┘ └───────┬────────┘ └───────┬────────┘
        │                  │                  │
        │           ┌──────▼──────┐           │
        │           │ Event Bus   │           │
        │           │ (IBM MQ +   │◄──────────┘
        │           │  Kafka)     │
        │           └──────┬──────┘
        │                  │
┌───────▼──────────────────▼───────────────────────────────────────┐
│                     z/OS Mainframe                                 │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌────────────┐ │
│  │   CICS     │  │    DB2     │  │    MQ      │  │   Batch    │ │
│  │ (core txn  │  │ (accounts, │  │ (events,   │  │ (EOD, stmt │ │
│  │  programs) │  │  ledger)   │  │  feeds)    │  │  reporting)│ │
│  └────────────┘  └────────────┘  └────────────┘  └────────────┘ │
└──────────────────────────────────────────────────────────────────┘

The key insight: the mainframe handles what it's best at (high-throughput transaction processing, ACID transactions, batch processing), and the cloud handles what it's best at (elastic scaling for mobile traffic, ML/analytics, rapid feature iteration for non-critical services).

Execution Timeline

Phase 1 (Q2-Q4 2022): API Foundation — 7 months

The team exposed 12 CICS transactions as REST APIs through z/OS Connect:

- Balance inquiry
- Transaction history (with pagination)
- Fund transfer (internal)
- Bill payment initiation
- Account details
- Statement request
- Hold inquiry
- Pending transaction list
- Overdraft status
- Linked accounts
- Rate inquiry
- Branch/ATM locator (simple DB2 read)
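For the paginated transaction history endpoint, a client typically loops until the service stops returning a continuation token. The sketch below assumes a hypothetical page shape of `{"items": [...], "next_cursor": ...}`; the case study does not describe the real API contract, and `fetch_page` stands in for whatever HTTP call the mobile backend would make.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]


def iter_transactions(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[dict]:
    """Yield every transaction, following next_cursor until it is absent or empty."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield from page["items"]
        cursor = page.get("next_cursor")
        if not cursor:
            break


def fake_fetch(cursor):
    """Stand-in for the REST call: returns two canned pages (hypothetical data)."""
    pages = {
        None: {"items": [{"id": 1}, {"id": 2}], "next_cursor": "p2"},
        "p2": {"items": [{"id": 3}], "next_cursor": None},
    }
    return pages[cursor]


ids = [txn["id"] for txn in iter_transactions(fake_fetch)]
print(ids)
```

Passing the fetch function in as a parameter keeps the paging logic testable without a network connection or a mainframe test region.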

Each API endpoint was load-tested at 2x expected mobile traffic. Performance results:

| Endpoint | Throughput (req/sec) | p50 Latency | p99 Latency |
|---|---|---|---|
| Balance Inquiry | 11,400 | 0.4ms | 1.2ms |
| Fund Transfer | 8,200 | 0.6ms | 2.1ms |
| Transaction History | 6,800 | 0.8ms | 3.4ms |
| Bill Payment | 7,100 | 0.5ms | 1.8ms |
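The p50 and p99 columns are percentiles of the per-request latency samples collected during the load test. A minimal nearest-rank calculation is sketched below; the sample data is synthetic, chosen only so the output mirrors the Balance Inquiry row, and the team's actual tooling is not described in the case study.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies in microseconds: mostly fast, with a small slow tail.
latencies_us = [400] * 98 + [1200, 5000]

p50 = percentile(latencies_us, 50)
p99 = percentile(latencies_us, 99)
print(f"p50={p50 / 1000}ms p99={p99 / 1000}ms")
```

The single 5 ms outlier does not move p99 here, which is why tail percentiles are reported alongside the median rather than a maximum or an average.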

Carlos's reaction to the production performance data: "This is faster than anything I've ever built on Kubernetes. The COBOL isn't the problem. It never was."

Phase 2 (Q1-Q3 2023): Cloud-Native Services — 9 months

With the core banking APIs in place, the mobile team could build new features rapidly:

- Push notifications service (Java/Spring Boot on Kubernetes)
- Budgeting and spending insights (Python analytics on AWS)
- P2P payments (Kotlin microservice; this one genuinely belonged in the cloud because it integrated with external payment networks)
- Card management (freeze/unfreeze and travel notifications, as a Java microservice)

These services consumed the z/OS Connect APIs for account data and published events to a Kafka topic for real-time analytics.

Phase 3 (Q4 2023 - Q2 2024): DevOps Modernization — 9 months

Yuki's domain. The mainframe development pipeline was transformed:

- Source code migrated from CA Endevor to Git (Rocket Software's Git for z/OS, with IBM Dependency Based Build (DBB) for builds)
- Jenkins pipelines for COBOL compilation and link-edit
- Automated unit testing with IBM zUnit
- Integration testing with z/OS Connect API test suites
- Zowe CLI for developer access to z/OS datasets and jobs
- VS Code with IBM Z Open Editor for COBOL development

Frank, the 28-year veteran, initially resisted. His team had used ISPF and Endevor for decades. The transition was difficult. But within three months, the team reported:

- 40% reduction in time from code change to production deployment
- 60% reduction in "works on my screen, fails in production" incidents (because automated testing caught them)
- Measurably improved code review process (Git pull requests vs. verbal walkthroughs)

Frank's eventual verdict: "I wish we'd done this ten years ago. Not the cloud stuff — the Git and testing stuff. That's what actually made us faster."

Phase 4 (Q3 2024 - ongoing): Strangler Fig for Selected Components

With the API layer stable, the team identified 70 programs (3.5% of the portfolio) suitable for incremental replacement via the strangler fig pattern (detailed in Chapter 33). These were bounded contexts where a Java microservice was genuinely superior to the COBOL implementation — primarily services that needed elastic scaling for mobile traffic patterns (traffic spikes during lunch hours and evenings that the mainframe could handle but at unnecessary MIPS cost).

The first strangler fig target: the statement generation service. The batch-only COBOL statement generator was replaced with a cloud-native service that generates real-time digital statements on demand, while the batch statement run continues for paper statement customers.
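The routing idea behind the strangler fig pattern can be sketched as a dispatcher that sends migrated operations to the new service and everything else to the legacy path. The class, operation names, and handlers below are hypothetical; in SecureFirst's architecture this decision would more plausibly live in the API gateway configuration than in application code.

```python
class StranglerRouter:
    """Send each operation to its new handler if one is registered, else to legacy."""

    def __init__(self, legacy_handler):
        self._legacy = legacy_handler
        self._migrated = {}

    def migrate(self, operation, handler):
        """Register a replacement handler for one operation."""
        self._migrated[operation] = handler

    def handle(self, operation, request):
        handler = self._migrated.get(operation)
        if handler is None:
            return self._legacy(operation, request)
        return handler(request)


def legacy_mainframe(operation, request):
    # Stand-in for the existing COBOL path (batch statements, core transactions).
    return f"legacy:{operation}"


def digital_statement_service(request):
    # Stand-in for the cloud-native on-demand statement service.
    return "cloud:statement.digital"


router = StranglerRouter(legacy_mainframe)
router.migrate("statement.digital", digital_statement_service)

digital = router.handle("statement.digital", {"account": "12345"})
paper = router.handle("statement.paper", {"account": "12345"})
print(digital, paper)
```

Because unmigrated operations keep flowing to the legacy handler, each replacement can be added or withdrawn independently, which is what makes the approach incremental.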

The Failed Detour: The Card Processing Experiment

Not everything went smoothly. In Q3 2023, Carlos convinced the team to attempt a more aggressive modernization of the card processing system — replacing the COBOL card authorization program with a Kotlin microservice.

The prototype worked in testing. In production, it failed within 72 hours.

The problem: the COBOL card authorization program used CICS intersystem communication (ISC) to validate transactions against the fraud detection system in real-time. The Kotlin replacement used a REST API call to the fraud service. Under peak load (Black Friday weekend), the REST call added 45ms of latency to each authorization. For a card authorization that must complete in under 100ms to avoid merchant timeouts, the additional 45ms pushed 8% of transactions past the deadline.

The COBOL version's ISC call completed in 2ms because it used CICS's internal transaction routing — no HTTP overhead, no JSON serialization, no TLS handshake. The Kotlin version had to traverse the full HTTP stack for every call.
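A latency budget check of this kind can be run before migration using measured authorization times. The sketch below uses synthetic baseline latencies, picked so that the output reproduces the 8% figure. Only the 2ms, 45ms, and 100ms values come from the incident; the distribution and function name are assumptions.

```python
def fraction_over_deadline(latencies_ms, added_ms, deadline_ms):
    """Share of requests whose latency plus the added call exceeds the deadline."""
    if not latencies_ms:
        raise ValueError("no samples")
    late = sum(1 for t in latencies_ms if t + added_ms > deadline_ms)
    return late / len(latencies_ms)


# Synthetic peak-load authorization times, excluding the fraud check:
# 92% of requests take 30 ms and 8% take 60 ms.
baseline_ms = [30] * 92 + [60] * 8

DEADLINE_MS = 100
with_isc = fraction_over_deadline(baseline_ms, added_ms=2, deadline_ms=DEADLINE_MS)
with_rest = fraction_over_deadline(baseline_ms, added_ms=45, deadline_ms=DEADLINE_MS)

print(f"ISC: {with_isc:.0%} late, REST: {with_rest:.0%} late")
```

The requests in the slow tail are the ones that break: a 45ms call is harmless for the typical request and fatal for any request already using more than 55ms of the budget.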

They rolled back to the COBOL version on Saturday morning. Total cost of the experiment: approximately $180K in development effort, plus one very stressful weekend for the operations team.

Lesson: Not every component can be modernized by moving it off the mainframe. The CICS ISC performance advantage for latency-critical paths is real and measurable. The card authorization system was reclassified from "Replace (strangler fig)" to "Refactor (API + CI/CD)" — keep the COBOL, modernize the development process.

Financial Results (After 2.5 Years)

| Metric | Before (Q1 2022) | After (Q3 2024) | Change |
|---|---|---|---|
| Annual mainframe cost | $18.4M | $15.8M | -14% |
| MIPS consumption | 3,200 | 2,750 | -14% |
| API endpoints (z/OS Connect) | 0 | 42 | New capability |
| Mobile app features | 12 | 47 | +292% |
| Mobile app rating (App Store) | 3.2 stars | 4.6 stars | +1.4 stars |
| Feature deployment frequency | Quarterly | Weekly (cloud), Monthly (mainframe) | 3-13x faster |
| Production incidents (monthly) | 8 | 5 | -38% |
| Developer satisfaction (survey) | 3.1/5 | 4.2/5 | +35% |
| Customer acquisition cost | $142 | $98 | -31% |

Total modernization investment: $8.4M over 2.5 years

Annual mainframe cost savings: $2.6M/year (ongoing)

Revenue from mobile growth: Attributed $12M incremental revenue in 2023-2024 from mobile customer acquisition enabled by the new API-driven app

ROI: The modernization paid for itself in under one year when mobile revenue is included.
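The payback claim can be reproduced with simple arithmetic. The sketch below assumes the $12M of attributed mobile revenue is spread evenly across 2023 and 2024 and ignores discounting; both are simplifying assumptions, not figures from SecureFirst.

```python
INVESTMENT_M = 8.4        # total modernization investment, $M
ANNUAL_SAVINGS_M = 2.6    # mainframe cost savings, $M per year
MOBILE_REVENUE_M = 12.0   # attributed incremental revenue for 2023-2024, $M
REVENUE_YEARS = 2         # assumption: revenue spread evenly over two years


def payback_years(investment, annual_benefit):
    """Years of benefit needed to recover the investment (no discounting)."""
    if annual_benefit <= 0:
        raise ValueError("annual benefit must be positive")
    return investment / annual_benefit


annual_revenue = MOBILE_REVENUE_M / REVENUE_YEARS
with_revenue = payback_years(INVESTMENT_M, ANNUAL_SAVINGS_M + annual_revenue)
savings_only = payback_years(INVESTMENT_M, ANNUAL_SAVINGS_M)

print(f"With mobile revenue: {with_revenue:.2f} years")
print(f"Cost savings only:   {savings_only:.2f} years")
```

On these assumptions the payback is just under one year with mobile revenue and a little over three years on cost savings alone, which is why the revenue attribution matters to the ROI statement.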

Comparative Analysis: Big-Bang vs. Incremental

If SecureFirst had executed Carlos's original big-bang rewrite:

| Factor | Big-Bang Proposal | Actual Incremental Approach |
|---|---|---|
| Timeline | 18 months (estimated), likely 3-4 years | 2.5 years and ongoing |
| Budget | $18M (estimated), likely $28-35M | $8.4M (actual) |
| Business value delivery | Zero until cutover | First API at month 4 |
| Risk | All-or-nothing | Each phase independent |
| Mobile app features during transition | Blocked (no APIs until cutover) | 35 new features deployed during modernization |
| Core banking performance | Degraded (Java < COBOL for this workload) | Preserved (COBOL core unchanged) |
| Team conflict | High (us vs. them) | Resolved (collaborative after Q2 2022) |
| Knowledge preservation | Lost (COBOL expertise discarded) | Preserved and enhanced |

The Cultural Transformation

Perhaps the most significant outcome wasn't technical — it was cultural. Before modernization, SecureFirst had two camps: the "mainframe people" and the "cloud people." Each camp viewed the other with suspicion. The mainframe team saw the cloud team as reckless innovators who didn't understand production systems. The cloud team saw the mainframe team as dinosaurs resistant to change.

The incremental approach forced collaboration. Carlos had to understand COBOL to design API contracts. Frank had to understand REST to validate API behavior. Yuki had to understand both JCL and Jenkins to build the CI/CD pipeline.

By the end of Phase 2, the teams were integrated. Mainframe developers were writing API test suites. Cloud developers were reviewing COBOL pull requests. Frank and Carlos co-presented at an industry conference on "Bridging Mainframe and Cloud" — a talk that would have been impossible 18 months earlier.

Yuki summarized the transformation: "We stopped arguing about platforms and started talking about what the customer needs. Once you do that, the technology decisions make themselves."

Discussion Questions

  1. Carlos's big-bang proposal estimated 18 months and $18M. Frank's analysis estimated 3-4 years and $28-35M. What specific factors account for this roughly 2x difference? Who is more likely to be right, and why?

  2. The card processing experiment failed because CICS ISC was faster than REST for the latency-critical authorization path. How should the team have identified this risk before attempting the migration? What assessment criteria would have flagged this component as "keep on CICS"?

  3. SecureFirst's incremental approach cost $8.4M and delivered continuous business value. The big-bang approach would have cost $28-35M and delivered zero value until cutover. Yet many organizations still choose big-bang approaches. Why? What organizational dynamics favor big-bang over incremental?

  4. Frank initially resisted the DevOps modernization (Git, Jenkins, VS Code). He changed his mind within three months. What made the difference? How should modernization leaders handle resistance from experienced mainframe professionals?

  5. The mobile app went from 3.2 to 4.6 stars and contributed $12M in incremental revenue. Was the mobile API layer (z/OS Connect wrapping CICS transactions) the most important modernization activity, or was it the cloud-native services (notifications, budgeting, P2P)? Which provided more business value?

  6. SecureFirst's modernization was 96.5% refactor-in-place. Only 70 programs (3.5%) were targeted for replacement via strangler fig. Does this ratio surprise you? Would you expect it to be different for a larger organization like FBA or CNB? Why or why not?

  7. Compare SecureFirst's situation with FBA's. Both chose incremental approaches, but their motivations were different (speed-to-market vs. risk management/knowledge preservation). How does the business driver change the modernization strategy? Would Sandra's approach work for SecureFirst? Would Yuki's approach work for FBA?