Case Study 2: Pinnacle Health's Hybrid Success — Dev/Test on Cloud, Production on Mainframe

Background

Pinnacle Health Insurance processes 50 million claims per month for 12 million members across 38 states. The mainframe — an IBM z16 running 7,200 MIPS — hosts the claims adjudication engine, eligibility verification system, provider network management, and EDI 837/835 processing. The annual mainframe operating cost is $26.4 million.

Diane Okoye, systems architect, joined Pinnacle in 2021 after twelve years in distributed systems (Java, Spring, Kubernetes, AWS). She was hired specifically to bridge the gap between the mainframe team and the cloud-native team that was building Pinnacle's member-facing portal.

Ahmad Rashidi, compliance officer turned technical lead, had been at Pinnacle for eighteen years. He knew HIPAA regulations at a depth that made auditors nervous. Every design review started with the same question: "How does this affect our audit trail?"

When the CTO asked Diane to evaluate moving the mainframe to the cloud, Diane's instinct — honed by twelve years of building cloud-native systems — was to say yes. But she'd learned something in her first year at Pinnacle: mainframe people weren't dinosaurs. They were engineers who'd been building systems at a scale and reliability level that most cloud-native engineers had never experienced.

"I spent three months just understanding what CICS actually does," Diane told Ahmad. "It's not an application server. It's an operating system for transactions. You don't replace an operating system with a container."

The Assessment: What to Move, What to Keep

Diane spent four months (January-April 2023) conducting a workload-by-workload assessment using the framework she'd adapted from Chapter 32's modernization assessment approach. She categorized every mainframe workload on three dimensions: cloud fit, mainframe dependency depth, and business risk of migration.

The Workload Map

| Workload | Volume | Cloud Fit | MF Dependency | Business Risk | Verdict |
|---|---|---|---|---|---|
| Claims adjudication | 50M claims/month, 12,000 TPS peak | Low | Deep (CICS + DB2 + MQ, 2-phase commit) | Extreme (processes $8B/month) | Mainframe |
| Eligibility verification | 8,000 TPS | Low | Deep (CICS + DB2, sub-5ms SLA) | High (real-time provider queries) | Mainframe |
| Provider network mgmt | 200 TPS | Medium | Moderate (CICS + DB2, no MQ) | Medium | Evaluate |
| EDI 837/835 processing | 45,000 files/night | Low | Deep (batch + MQ + DB2, complex transforms) | High (HIPAA compliance) | Mainframe |
| Dev/test (4 environments) | ~1,100 MIPS | High | Low (mirrors production but no SLA) | Low | Cloud |
| Claims analytics | Batch, monthly | High | Low (read-only, no real-time) | Low | Cloud |
| Actuarial modeling | Quarterly batch | High | Low (read-only, CPU-intensive calc) | Low | Cloud |
| Regulatory reporting (CMS) | Monthly/quarterly | High | Low (read-only, batch) | Medium (accuracy critical) | Cloud |
| Document generation | 2.4M letters/month | High | Low (batch, output-only) | Low | Cloud |
| Training environments | ~300 MIPS | High | Low | None | Cloud |

Diane's assessment identified 40% of total MIPS consumption as cloud-eligible. But more importantly, it identified the right 40% — the workloads where cloud economics worked, operational risk was low, and the migration path was straightforward.

Phase 1: Dev/Test Migration (May-September 2023)

Implementation

Diane chose Azure as the cloud platform (Pinnacle's enterprise agreement made Azure cost-effective). The target: move four development environments and two training environments from the mainframe to Azure.

Infrastructure design:

  • 6 Azure VMs: Standard_E16s_v5 (16 vCPU, 128 GB RAM) running RHEL 8
  • Micro Focus Enterprise Server licenses: 6 × 16-core = 96 cores licensed
  • Azure DevTest Labs for environment management (auto-shutdown, auto-startup)
  • Azure Managed Disks (Premium SSD) for VSAM-equivalent files
  • Azure Database for PostgreSQL Flexible Server (dev/test equivalents of DB2)
  • Azure DevOps integration for CI/CD pipeline

Data handling:

  • Production data subsetted (HIPAA compliance: no real PHI in dev/test) using IBM InfoSphere Optim
  • Test data generated synthetically for performance testing
  • Refreshed weekly from a mainframe extract (automated job)
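The subsetting-with-masking principle can be sketched in a few lines. This is a minimal illustration of deterministic PHI masking, not how InfoSphere Optim is actually configured; the field names and salt are hypothetical:

```python
import hashlib

# Hypothetical PHI field names for illustration; the real extract layout
# is defined in the InfoSphere Optim subsetting job, not in this script.
PHI_FIELDS = {"member_name", "ssn", "street_address", "date_of_birth"}

def mask_value(field: str, value: str, salt: str = "dev-refresh") -> str:
    """Replace a PHI value with a deterministic, irreversible token.

    Deterministic masking preserves referential integrity across tables:
    the same SSN always maps to the same token, so joins in dev/test
    still work, but the original value cannot be recovered.
    """
    digest = hashlib.sha256((salt + field + value).encode()).hexdigest()
    return f"MASKED-{digest[:12]}"

def mask_record(record: dict) -> dict:
    """Mask only the PHI fields; pass business data through unchanged."""
    return {
        field: mask_value(field, str(value)) if field in PHI_FIELDS else value
        for field, value in record.items()
    }
```

The deterministic property is what lets the weekly refresh replace every environment's data without breaking cross-table test scenarios.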

Key decisions:

  1. Keep DB2 on mainframe for system test. One environment (SIT — System Integration Test) connected directly to a mainframe DB2 subsystem via Micro Focus's DB2 passthrough feature. This allowed testing of actual DB2 SQL execution without converting SQL syntax. The other five environments used PostgreSQL.

  2. Accept imperfect compatibility. Diane explicitly told her team: "Dev/test on cloud will not be 100% identical to production on z/OS. That's fine. We'll maintain a list of known differences and test those scenarios on the mainframe SIT environment."

  3. Automate environment provisioning. Every dev/test environment was defined in Terraform and could be provisioned from scratch in 90 minutes. On the mainframe, provisioning a new LPAR took 3-4 weeks.

Results

| Metric | Before (Mainframe Dev/Test) | After (Azure Dev/Test) |
|---|---|---|
| MIPS consumed | 1,400 | 0 |
| Annual mainframe savings | — | $1.8M (marginal, verified with IBM) |
| Cloud annual cost | — | $340K (compute + storage + licenses) |
| Net annual savings | — | $1.46M |
| Environment provisioning time | 3-4 weeks | 90 minutes |
| Developer onboarding time | 3-4 weeks (learn TSO/ISPF) | 3-5 days (VS Code + MF ext.) |
| Concurrent environments | 4 (constrained by MIPS) | Unlimited (constrained by budget) |
| Known compatibility issues | N/A | 14 documented, all manageable |
| Migration project cost | — | $410K |
| Year 1 net benefit | — | $1.05M |

Ahmad's compliance review confirmed that the dev/test migration met HIPAA requirements because no real PHI was present in the cloud environments (all data was either subsetted with masking or synthetically generated).

Unexpected Benefits

Developer hiring improved. Pinnacle posted a COBOL developer position in Q4 2023. Previous postings (requiring TSO/ISPF experience) received 3-5 applicants. The new posting (describing VS Code, Git, Azure DevOps, and cloud-based development) received 23 applicants. Diane hired two junior developers who'd learned COBOL in university using GnuCOBOL on Linux and transitioned to Micro Focus on Azure with minimal friction.

Testing velocity increased. Before cloud dev/test, Pinnacle ran 3 test cycles per release because environment contention forced teams to share. After cloud dev/test, each team had its own environment. Test cycles increased to 8 per release. Defect escape rate dropped 34%.

Sprint demos worked. The dev teams started doing live sprint demos from their Azure environments — showing stakeholders the actual COBOL application running, not just screenshots. This was impossible with 3270 terminal sessions that required VPN + TN3270 emulator + RACF credentials.

Phase 2: Reporting and Analytics Migration (October 2023 - March 2024)

Claims Analytics Migration

The claims analytics workload was a suite of 67 COBOL batch programs that read claims data, calculated metrics (denial rates, average processing time, provider cost patterns, fraud indicators), and produced reports consumed by the actuarial team, compliance team, and executive dashboard.

Characteristics that made it a good cloud candidate:

  • Read-only: no writes to the system of record
  • Batch: ran monthly and quarterly, not real-time
  • Tolerant of latency: reports that took 4 hours instead of 2.5 hours were acceptable
  • Self-contained: no CICS interaction, no MQ messaging, no two-phase commit
  • High MIPS during peak: unlike CNB's off-peak reporting, Pinnacle's analytics ran during business hours because the actuarial team needed results by end of day. This meant real marginal MIPS savings.

Data synchronization: Diane implemented IBM InfoSphere CDC to replicate claims data from mainframe DB2 to Azure PostgreSQL. The replication ran continuously with 8-15 second lag under normal load.

For the monthly analytics batch, the team added a "consistency checkpoint" — a batch job that ran on the mainframe, captured row counts and hash values for the claims tables, and transmitted the checksums to Azure. The cloud batch would not start until the Azure PostgreSQL checksums matched the mainframe checksums, ensuring data consistency.
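The checkpoint amounts to comparing a row count plus an order-independent hash per table. A minimal sketch of that comparison (table names are hypothetical, and the real job computes the mainframe side on z/OS and transmits the checksums, rather than computing both sides in one process):

```python
import hashlib

def table_checksum(rows):
    """Row count plus an order-independent hash over the rows.

    XOR-combining per-row digests makes the result independent of row
    order, which matters because the mainframe unload and the PostgreSQL
    replica will not return rows in the same sequence.
    """
    count = 0
    combined = 0
    for row in rows:
        digest = hashlib.sha256("|".join(map(str, row)).encode()).digest()
        combined ^= int.from_bytes(digest[:8], "big")
        count += 1
    return count, combined

def batch_may_start(mainframe_checksums: dict, azure_checksums: dict) -> bool:
    """Gate: the cloud batch starts only when every table matches."""
    return all(
        azure_checksums.get(table) == checksum
        for table, checksum in mainframe_checksums.items()
    )
```

A mismatch holds the analytics batch until replication catches up, which is exactly the behavior the consistency checkpoint enforces.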

Results:

  • 67 batch programs migrated to Azure (Micro Focus Enterprise Server)
  • Batch window: 2.5 hours (mainframe) → 4.1 hours (Azure) — acceptable per business requirements
  • Marginal mainframe savings: $680K/year (these programs ran during peak hours)
  • Azure annual cost: $186K
  • Net annual savings: $494K
  • Migration project cost: $320K
  • Break-even: Month 8

Actuarial Modeling Migration

The quarterly actuarial modeling suite — 12 large COBOL programs performing reserve calculations, mortality projections, and premium adequacy analysis — was the most CPU-intensive batch workload on the mainframe. It consumed 800 MIPS for a 14-hour run every quarter.

The packed decimal challenge. Actuarial programs perform enormous volumes of packed decimal arithmetic — millions of multiply/divide operations on COMP-3 fields. On z/OS, this is hardware-accelerated. On x86, it's software-emulated.

Diane benchmarked the actuarial suite on Azure before committing:

| Metric | z/OS (z16 hardware decimal) | Azure (Micro Focus software decimal) |
|---|---|---|
| Total runtime | 14 hours | 23 hours |
| CPU cost per run | $18,000 (MIPS allocation) | $4,200 (E96ds_v5, 96 vCPU, spot pricing) |
| Decimal arithmetic % of runtime | ~60% | ~75% (software emulation is the bottleneck) |

The runtime increase was significant — 14 hours to 23 hours. But because this ran quarterly and the 23-hour window was acceptable (they had a full weekend), and because the cost per run dropped from $18K to $4.2K using spot instances, the economics worked.

Key insight: Diane used Azure spot instances (preemptible VMs) for the actuarial run because the workload could be checkpointed and restarted. If Azure reclaimed the VM mid-run, the COBOL checkpoint/restart logic (Chapter 24 patterns) would resume from the last checkpoint. This reduced compute cost by 65% compared to reserved instances.
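The pattern translates directly. A minimal Python sketch of interval checkpointing with resume-on-restart (the file name and interval are illustrative; in production the logic lives in the COBOL programs' checkpoint/restart code, and processing must tolerate replaying the records since the last checkpoint):

```python
import json
import os

CHECKPOINT = "actuarial.ckpt"  # hypothetical checkpoint file path

def run_batch(records, process, checkpoint_every=1000):
    """Process records, persisting progress so a preempted run can resume.

    Mirrors the COBOL checkpoint/restart pattern: commit progress at
    intervals so a spot-instance eviction loses at most one interval of
    work, not the whole 23-hour run. Because the run resumes from the
    last checkpoint, records after that point may be processed twice,
    so `process` must be idempotent or restartable.
    """
    start = 0
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            start = json.load(f)["next_index"]  # resume point
    for i in range(start, len(records)):
        process(records[i])
        if (i + 1) % checkpoint_every == 0:
            with open(CHECKPOINT, "w") as f:
                json.dump({"next_index": i + 1}, f)
    if os.path.exists(CHECKPOINT):
        os.remove(CHECKPOINT)  # clean completion: clear the checkpoint
```

When Azure evicts the VM, the job simply dies; the next VM picks up the checkpoint file and resumes, which is what makes spot pricing safe for this workload.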

"I never thought COBOL checkpoint/restart would help me save money on cloud," Diane told her team. "But mainframe patterns translate when you understand what they're actually doing."

Regulatory Reporting Migration

Monthly CMS (Centers for Medicare & Medicaid Services) reports and quarterly state regulatory filings — 34 COBOL batch programs reading claims data and producing fixed-format output files.

Straightforward migration: same pattern as claims analytics. The only complication was output format validation — CMS requires specific record layouts with specific field positions, and the EBCDIC-to-ASCII conversion changed some filler characters. Ahmad personally validated every output field position against the CMS specifications.
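A harness for this kind of positional check is small. The sketch below uses a hypothetical record layout (the real layouts are the CMS specifications Ahmad validated against) and flags the filler-character problem that the EBCDIC-to-ASCII conversion introduced:

```python
# Illustrative layout only — field names, offsets, and record length are
# NOT the actual CMS specification.
LAYOUT = [
    ("record_type", 0, 2),
    ("member_id", 2, 13),
    ("claim_amount", 13, 24),
    ("filler", 24, 30),
]
RECORD_LENGTH = 30

def validate_record(line: str) -> list:
    """Return a list of layout violations for one fixed-format record."""
    errors = []
    if len(line) != RECORD_LENGTH:
        errors.append(f"length {len(line)} != {RECORD_LENGTH}")
    for name, start, end in LAYOUT:
        field = line[start:end]
        if name == "filler" and field.strip():
            # EBCDIC→ASCII conversion can turn low-values or padding
            # into non-blank characters here
            errors.append(f"filler not blank: {field!r}")
    return errors
```

Running every output record through checks like these is mechanical; the three weeks Ahmad spent went into confirming that the layout tables themselves matched the CMS specifications field by field.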

"HIPAA violations cost $50,000 per incident," Ahmad reminded the team. "One misaligned field in a CMS submission could affect 12 million member records. That's not a rounding error."

Ahmad's validation process added three weeks to the migration timeline. It also prevented two data format errors that would have been rejected by CMS.

Document Generation Migration

2.4 million member correspondence letters per month — Explanation of Benefits (EOB), claim status notifications, annual benefit summaries. Generated by 8 COBOL batch programs that read claims data and produced print-ready output.

Migrated to Azure with a modern twist: instead of producing print-ready files for the mail house, the cloud version produced PDFs stored in Azure Blob Storage and accessible through the member portal. The COBOL programs were unchanged — they still produced their standard output. A Python post-processing pipeline converted the output to PDF.
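A minimal sketch of the first step of that pipeline, assuming the COBOL output uses ANSI carriage control (a "1" in column 1 starts a new page — an assumption about the format, not a documented detail from the migration). The PDF rendering and Blob Storage upload that follow are omitted:

```python
def split_pages(report_lines):
    """Group print-ready COBOL output into logical pages.

    Assumes ANSI carriage control: a '1' in column 1 signals a page
    break. Each returned page is a list of text lines with the
    carriage-control column stripped; the real pipeline then renders
    each page to PDF and stores it in Azure Blob Storage.
    """
    pages, current = [], []
    for line in report_lines:
        if line[:1] == "1" and current:
            pages.append(current)
            current = []
        current.append(line[1:])  # drop the carriage-control column
    if current:
        pages.append(current)
    return pages
```

Because the COBOL programs are untouched, the same output still drives the mail house when needed; the pipeline is purely additive.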

"That's the hybrid sweet spot," Diane said. "COBOL does the business logic. Cloud does the distribution. Neither has to change to do what the other is good at."

Phase 3: Hybrid Integration (April-September 2024)

With dev/test and reporting on Azure, Diane's next step was building the integration layer that would make the hybrid architecture sustainable long-term.

CDC Optimization

The initial CDC implementation (InfoSphere) worked but had rough edges:

  • Lag spikes during nightly batch. When the nightly batch window ran (6,000+ batch jobs producing heavy DB2 log volume), CDC replication lag spiked from 15 seconds to 3-5 minutes. The analytics team complained that their dashboards showed stale data during the batch window.
  • Solution: Diane implemented a "batch-aware" CDC configuration that increased the CDC agent's read-ahead buffer during the batch window and reduced the commit interval. Lag during batch dropped to 45-90 seconds.

  • Schema change incidents. Two DB2 schema changes (one ALTER TABLE ADD COLUMN, one REORG that changed the table's physical structure) broke CDC replication. Both times, the cloud analytics team discovered the break when their dashboards showed blank data.

  • Solution: Ahmad added a mandatory step to the change management process: every DB2 schema change required notification to the CDC team 48 hours in advance, and the CDC configuration update was included in the change ticket's implementation plan.

Monitoring Integration

Pinnacle built a unified monitoring dashboard using Grafana that displayed:

  • Mainframe health metrics (SMF data exported through an MQ → Event Hubs pipeline)
  • Azure infrastructure metrics (via Azure Monitor)
  • CDC replication lag (custom metric from the CDC agent)
  • Batch job status (mainframe CA7 job status exported via API; Azure batch status from Step Functions)
  • Data consistency metrics (hourly reconciliation checksums)

"For the first time in eighteen years," Ahmad said, "I can see the entire system on one screen."

Cost Tracking

Diane implemented detailed cost tracking from day one — not vendor estimates, not projections, but actual Azure invoices compared against actual mainframe bill changes.

Cumulative results after 18 months (through September 2024):

| Category | Amount |
|---|---|
| Savings | |
| Mainframe MIPS reduction (verified with IBM) | $3.84M |
| Faster developer onboarding (avoided contractor cost) | $280K |
| Reduced defect escape rate (quality improvement) | $120K (estimated) |
| Total savings | $4.24M |
| Costs | |
| Azure infrastructure (18 months) | $756K |
| Micro Focus licenses (18 months) | $504K |
| CDC tooling (InfoSphere) | $180K |
| Migration projects (dev/test + reporting) | $730K |
| Cloud operations (0.5 FTE) | $195K |
| ExpressRoute connectivity | $108K |
| Security and compliance | $84K |
| Monitoring integration | $42K |
| Total costs | $2.599M |
| Net benefit (18 months) | $1.641M |
| Annual run-rate savings | $1.92M/year |

"The mainframe cost went down by $2.56M per year," Diane reported to the CTO. "The cloud costs $1.1M per year to operate. The net is $1.46M per year in steady-state savings. But honestly, the bigger win is what we didn't spend."

"What we didn't spend?"

"We didn't spend $50 million trying to move claims adjudication to the cloud. Because we figured out — with data, not opinion — where the cloud works and where it doesn't. That's the most expensive mistake we didn't make."

What Made Pinnacle Succeed Where MidWest Mutual Failed

1. They Started with an Assessment, Not a Destination

Diane's four-month workload assessment (before spending a dollar on cloud infrastructure) identified which workloads were cloud-eligible and which were not. MidWest Mutual's plan was "migrate everything to cloud." Pinnacle's plan was "move what benefits from cloud, keep what doesn't."

2. They Had Mainframe Expertise on the Team

Ahmad had eighteen years of mainframe experience. He reviewed every data conversion, every JCL modification, and every compliance implication. MidWest Mutual's team had three junior mainframe developers out of 42 people.

3. They Measured Marginal Cost, Not Allocated Cost

Diane worked with IBM's pricing team to calculate the actual mainframe bill impact of removing specific workloads. She knew that off-peak batch wouldn't save money, and she targeted on-peak workloads for migration. MidWest Mutual used allocated MIPS cost and assumed the entire mainframe bill would disappear.
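The distinction can be made concrete with a toy calculation. The $-per-peak-MIPS figure below is illustrative, not Pinnacle's contract terms:

```python
def allocated_saving(workload_mips: float, total_mips: float,
                     annual_bill: float) -> float:
    """Naive estimate: the workload's pro-rata share of the whole bill."""
    return annual_bill * workload_mips / total_mips

def marginal_saving(peak_mips_removed: float,
                    dollars_per_peak_mips: float) -> float:
    """What the bill actually drops by. Mainframe software charges are
    typically driven by peak usage (e.g. a rolling four-hour average),
    so removing off-peak work frees capacity but barely moves the bill.
    """
    return peak_mips_removed * dollars_per_peak_mips

# An off-peak 500-MIPS batch job "saves" ~$1.8M on an allocated basis,
# but removes 0 peak MIPS, so the real bill impact is ~$0.
looks_like = allocated_saving(500, 7_200, 26_400_000)
actually = marginal_saving(0, 3_600)  # 3,600 $/peak-MIPS is illustrative
```

This is the arithmetic behind targeting on-peak workloads: the allocated and marginal numbers only converge when the migrated work actually sits in the peak window.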

4. They Accepted Hybrid as the Architecture

Diane never intended to decommission the mainframe. Her architecture diagram had two boxes — mainframe and cloud — connected by integration arrows, from day one. MidWest Mutual's architecture diagram had one box with a red X through it (the mainframe) and one box with a green checkmark (the cloud).

5. They Moved in Order of Increasing Risk

Dev/test first (lowest risk, highest learning). Then read-only reporting (low risk, real production workload). Then analytics and document generation. Never OLTP. Never claims adjudication. Each phase built on the operational experience of the previous phase.

6. They Measured Everything

Every migration had a detailed before/after comparison: runtime, cost, error rate, operational incidents. Every TCO calculation used actual Azure invoices, not vendor projections. When the numbers didn't work (they evaluated moving EDI processing to cloud and the TCO was unfavorable), they said no.

The Long-Term Architecture

Diane's architecture for Pinnacle as of late 2024:

MAINFRAME (z/OS, z16, 5,800 MIPS — down from 7,200)
├── Claims adjudication (CICS + DB2 + MQ)
├── Eligibility verification (CICS + DB2)
├── EDI 837/835 processing (batch + MQ + DB2)
├── Provider network management (CICS + DB2)
├── Core batch (nightly posting, balancing)
└── System of record (all DB2 master tables)

INTEGRATION LAYER
├── CDC: InfoSphere → Azure PostgreSQL (claims data, 15-sec lag)
├── Batch extract: Weekly full refresh (configuration data)
├── MQ bridge: Mainframe MQ → Azure Event Hubs (real-time events)
└── API: z/OS Connect → Azure API Management (member-facing services)

AZURE
├── Dev/test (6 environments, Micro Focus Enterprise Server)
├── Training (2 environments)
├── Claims analytics (67 COBOL batch programs, monthly)
├── Actuarial modeling (12 COBOL programs, quarterly, spot instances)
├── Regulatory reporting (34 COBOL programs, monthly/quarterly)
├── Document generation (8 COBOL programs + Python PDF pipeline)
├── Analytics database (Azure PostgreSQL, CDC-fed)
├── Member portal (Java/Spring, consuming z/OS Connect APIs)
├── Fraud detection ML (Azure ML, trained on CDC-replicated claims data)
└── Monitoring (Grafana dashboard, unified mainframe + cloud)

MIPS reduction: 7,200 → 5,800 (19.4% reduction)
Annual mainframe savings: $2.56M
Annual cloud cost: $1.1M
Net annual savings: $1.46M
Total migration investment: $730K
Payback period: 6 months

"I'm a cloud architect who learned to love the mainframe," Diane said. "Not because it's nostalgic — because it's the best platform for high-volume transaction processing that exists. My job isn't to replace it. My job is to make the system better by putting each workload on the platform where it runs best."

Discussion Questions

  1. Diane spent four months on the workload assessment before spending any money on cloud infrastructure. Some would argue this was too slow. How do you balance thoroughness with speed? What's the minimum viable assessment?

  2. Ahmad's compliance review added three weeks to the regulatory reporting migration but caught two data format errors. How do you build compliance review into an agile migration process without it becoming a bottleneck?

  3. The actuarial modeling suite runs 64% slower on Azure than on z/OS but costs 77% less per run. At what point does the cost savings justify the performance degradation? What if the business eventually needs the actuarial results faster?

  4. Compare Pinnacle's $730K total migration investment with MidWest Mutual's $43M spend. Both are insurance companies. Pinnacle achieved $1.46M/year in net savings. MidWest Mutual achieved approximately $1.4M/year in net savings (from the workloads that did successfully migrate). What drove the 60:1 difference in migration investment cost?

  5. Diane's fraud detection ML model trains on CDC-replicated claims data. What happens if the CDC replication fails for 48 hours? Does the fraud model become dangerous? Design a mitigation strategy.