
Learning Objectives

  • Evaluate rehosting platforms (Micro Focus, Heirloom, NTT DATA) for running COBOL on cloud infrastructure
  • Analyze the technical limitations of COBOL on cloud (EBCDIC, packed decimal, CICS equivalence)
  • Design hybrid architectures where mainframe COBOL and cloud services collaborate
  • Calculate realistic TCO for cloud-migrated COBOL workloads
  • Decide which HA banking system components belong on cloud vs. mainframe

"I've seen six COBOL-to-cloud migrations. Two succeeded. Both kept the transaction engine on the mainframe." — Kwame Mensah, Chief Architect, Continental National Bank

Chapter Overview

Rob Calloway looked at the AWS bill and felt a familiar sinking feeling.

It was March 2024, and Continental National Bank had been running its regulatory reporting COBOL — 340,000 lines, forty-seven batch programs, two hundred and twelve JCL procs — on Amazon EC2 for exactly ninety days. The proof-of-concept had been deemed a success in December. By February, the operations team was starting to ask uncomfortable questions. By March, the answers were arriving in the form of invoices.

"We're at $127,000 a month," Rob told Kwame during the Tuesday architecture call. "The projection was $84,000."

Kwame didn't sound surprised. "What's the biggest line item?"

"Storage. We're paying for io2 Block Express volumes because the batch programs do random reads against the converted VSAM files. The optimizer in Micro Focus Enterprise Server doesn't batch I/O the way VSAM does on z/OS. So every RANDOM READ is an actual disk I/O. On the mainframe, half of those were served from data space buffers."

"And the CPU?"

"The CPU's fine when it runs. The problem is the batch window. On z/OS, the nightly reporting cycle takes 2 hours 40 minutes because WLM can prioritize it and the I/O subsystem can manage 180,000 IOPS across the FICON channels. On EC2, the same workload takes 6 hours 15 minutes because we can't match the I/O parallelism. So we're running larger instances for longer. That's where the extra $43K comes from."

Kwame was quiet for a moment. Then: "Rob, did anyone from the cloud team read Chapter 1?"

Rob laughed despite himself. "They read a vendor slide deck."

This chapter is about what happens between the vendor slide deck and the AWS bill — the technical realities of running COBOL on cloud infrastructure. We'll cover the rehosting platforms that make it possible, the workloads where it makes sense, the workloads where it doesn't, the hybrid architectures that represent the real sweet spot, and the TCO analysis that separates hope from arithmetic.

By the end of this chapter, you'll be able to evaluate any COBOL-to-cloud proposal and say — with data, not opinion — whether it's a good idea, a bad idea, or a good idea for the wrong workload.


34.1 The Promise and Reality of COBOL on Cloud

The Pitch

Every major cloud provider and every major COBOL tooling vendor has a version of the same pitch: take your COBOL applications, run them on commodity x86 hardware in the cloud, eliminate your mainframe costs, and unlock agility.

The pitch isn't entirely wrong. COBOL can run on Linux. COBOL can run on x86 hardware. COBOL can be deployed to AWS, Azure, and GCP. The question isn't "can it?" — the question is "should it, and for which workloads?"

To answer that honestly, we need to understand what you gain, what you lose, and what changes when COBOL leaves z/OS.

What You Gain

Elastic compute. On z/OS, your MIPS capacity is fixed (or requires a capacity-on-demand activation that costs real money). On cloud, you can spin up additional compute for peak workloads and release it when you're done. For batch workloads with variable volume — think month-end reporting, quarterly regulatory filings — this is genuinely valuable.

Pay-per-use economics. If your COBOL batch only runs 4 hours per night, you're paying for 24 hours of mainframe capacity but using 4. On cloud, you can (in theory) pay for 4. The reality is more nuanced than this — we'll dissect it in Section 34.6 — but the basic economic proposition is real for intermittent workloads.

Modern tooling ecosystem. Cloud platforms come with monitoring (CloudWatch, Azure Monitor, Stackdriver), logging (CloudTrail, Log Analytics), CI/CD (CodePipeline, Azure DevOps, Cloud Build), and infrastructure-as-code (Terraform, CloudFormation, Pulumi). If your mainframe shop is still using SDSF and JES2 output for monitoring, the cloud tooling is a genuine improvement in operational visibility.

Geographic distribution. If you need to run COBOL workloads close to data sources or users in multiple regions — say, regulatory reporting that must stay within EU data residency requirements — cloud regions give you that without buying hardware in Frankfurt.

Developer accessibility. Junior developers who freeze at the sight of TSO can use VS Code, connect to cloud instances, compile and run COBOL with familiar tools. For dev/test environments, this is a meaningful reduction in the learning curve.

What You Lose

And this is where the conversation gets uncomfortable.

I/O architecture. This is the big one. z/OS has dedicated I/O processors (SAPs), FICON channels with sustained throughput measured in gigabytes per second, data-in-memory through hiperbatch and data spaces, and an I/O subsystem that's been optimized for 60 years. Cloud block storage — even io2 Block Express on AWS or Ultra Disk on Azure — is general-purpose. It's good. It's not the same.

Rob's regulatory reporting hit this wall: random-access patterns against converted VSAM files on cloud block storage produced 3-5x the I/O latency compared to z/OS with VSAM LSR buffering. For sequential batch workloads, the gap is smaller. For random-access OLTP, the gap is a canyon.

💡 Practitioner's Note: The I/O gap is the single most underestimated factor in COBOL-to-cloud migrations. Cloud vendors measure IOPS (input/output operations per second) and throughput (MB/s). Mainframe architects measure I/O response time at the microsecond level — and know that the I/O subsystem's ability to overlap I/O with processing is what makes the batch window work. You can't replicate z/OS I/O behavior by buying more IOPS. The architecture is fundamentally different.

Parallel Sysplex capabilities. Coupling Facility-based data sharing, Sysplex-aware workload distribution, and automatic failover between LPARs have no cloud equivalent. You can build high-availability architectures on cloud — active-passive, active-active with database replication — but they're different architectures with different failure modes, different recovery characteristics, and different operational complexity.

WLM-grade workload management. z/OS Workload Manager dynamically adjusts dispatching priority, storage allocation, and I/O priority across thousands of concurrent workloads with goal-based service class definitions (Chapter 5). Cloud resource management uses CPU/memory limits, auto-scaling groups, and priority queues. The granularity and the sophistication are different by an order of magnitude.

CICS transaction management. CICS provides pseudo-conversational programming, two-phase commit coordination, dynamic transaction routing, shared temporary storage, transient data queues, and recoverable resource management — all integrated. Cloud alternatives provide some of these capabilities through separate services (API gateways, message queues, distributed transaction coordinators), but they're not integrated into a single transaction manager. We'll examine what "CICS emulation" actually provides in Section 34.2.

Data locality. On the mainframe, your COBOL programs, your DB2 data, your VSAM files, and your CICS regions all share the same high-speed memory bus and I/O subsystem. On cloud, data moves over networks — even within the same availability zone, you're adding microseconds of latency for every data access. For a program that does 500 database calls per transaction, those microseconds compound.

⚠️ The Latency Tax: A CICS COBOL transaction that makes 12 DB2 calls at 0.3ms each on z/OS (3.6ms total DB2 time) will make the same 12 calls at 1.5-3ms each on cloud (18-36ms total DB2 time). The COBOL code is identical. The SQL is identical. The latency increased 5-10x because the data path changed. This is physics, not tuning.
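The arithmetic behind that callout is worth making explicit, because it scales linearly with call count. A minimal sketch, using the illustrative per-call latencies from the callout (not benchmarks):

```python
# Model of the latency tax: synchronous DB calls compound linearly.
# Per-call latencies below are the illustrative figures from the callout.

def db_time_ms(calls: int, per_call_ms: float) -> float:
    """Total DB wait for a transaction making `calls` synchronous calls."""
    return calls * per_call_ms

zos = db_time_ms(12, 0.3)      # ~3.6 ms total DB time on z/OS
cloud = db_time_ms(12, 1.5)    # ~18 ms total on cloud (best case)
print(f"z/OS ~{zos:.1f} ms vs cloud ~{cloud:.1f} ms: {cloud / zos:.0f}x")
```

The same model explains why chatty transactions suffer most: a program that makes 2 calls pays a small tax; a program that makes 500 pays a crippling one.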

What Changes

Beyond gains and losses, some things simply change — they're neither better nor worse, but they require rearchitecting.

Character encoding. z/OS uses EBCDIC. Cloud (Linux/x86) uses ASCII/UTF-8. Every data file, every copybook with hardcoded character values, every collating sequence assumption, every SORT operation that depends on EBCDIC ordering — all of it changes. This sounds trivial. It isn't. Section 34.2 covers the technical depth of this problem.

Packed decimal arithmetic. COBOL on z/OS uses the hardware decimal arithmetic instructions built into the z/Architecture processor. These handle packed decimal (COMP-3) natively — add, subtract, multiply, divide, compare — in silicon. On x86, packed decimal must be emulated in software by the COBOL runtime. The emulation is correct, but it's slower. For programs that do heavy decimal arithmetic (interest calculations, actuarial computations, financial aggregations), the performance difference is measurable.

Job scheduling. z/OS batch runs under JES2/JES3 with enterprise schedulers like CA7, Control-M, or TWS. Cloud batch runs under... what? Cloud-native schedulers (AWS Step Functions, Azure Logic Apps) don't understand JCL, don't handle GDGs, and don't support the intricate predecessor/successor dependency chains that Rob Calloway manages every night. You'll need a migration strategy for your job scheduling, which is a project in itself.

Security model. RACF on z/OS provides integrated security that covers datasets, CICS transactions, DB2 objects, batch jobs, and system commands — all from a single security database. On cloud, security is assembled from IAM policies, network security groups, database authentication, and application-level controls. The security capabilities are comparable; the operational model is completely different. Ahmad Rashidi at Pinnacle Health spent four months mapping RACF profiles to AWS IAM equivalents for their compliance audit — and still found gaps.


34.2 Rehosting Platforms: Micro Focus, Heirloom, NTT DATA

Rehosting means running your COBOL programs on non-mainframe hardware with minimal code changes. The COBOL source code stays (mostly) the same; the platform underneath changes. Three vendors dominate this space, and understanding their differences matters because they make very different tradeoffs.

Micro Focus Enterprise Server (OpenText)

What it is: A comprehensive mainframe runtime emulator for Windows and Linux. Provides COBOL compilation (compatible with Enterprise COBOL syntax), CICS emulation (called Enterprise Server CICS), JCL interpretation, VSAM file support, IMS DB emulation, and DB2 connectivity (via SQL passthrough or database migration).

Where it's strong:

  • The most mature COBOL-to-x86 rehosting platform, with 30+ years of development
  • Broad Enterprise COBOL compatibility — handles most COBOL syntax including reference modification, intrinsic functions, XML GENERATE/PARSE, and JSON GENERATE/PARSE
  • CICS emulation covers the most commonly used API commands: EXEC CICS READ/WRITE/REWRITE/DELETE, SEND MAP/RECEIVE MAP, LINK/XCTL, START/RETRIEVE, SYNCPOINT, HANDLE CONDITION/ABEND
  • JCL interpreter handles most production JCL: DD statements, PROC invocation, conditional execution (IF/THEN/ELSE/ENDIF), GDG support, SORT integration
  • Integrated Eclipse-based IDE (Enterprise Developer) and VS Code extension

Where it falls short:

  • CICS emulation does not cover the full CICS API surface. Missing or limited: EXEC CICS SIGNAL EVENT, some TS queue recovery semantics, CICS channels across MRO regions, Sysplex-wide CSD operations. If your CICS programs use advanced features, test exhaustively.
  • JCL interpretation has edge cases: DFSORT control statements with complex INCLUDE/OMIT criteria, some IDCAMS REPRO options, certain catalog operations. Production JCL should be regression-tested, not assumed compatible.
  • No Parallel Sysplex equivalent. Data sharing, Sysplex workload distribution, and Coupling Facility structures are not available. Applications that depend on shared VSAM files across regions need architectural changes.
  • Packed decimal arithmetic is software-emulated. Benchmarks show 2-4x slower performance than z/Architecture hardware decimal for arithmetic-heavy programs.

Cloud deployment options: AWS (EC2, EKS containers), Azure (VMs, AKS containers), GCP (Compute Engine, GKE). Micro Focus also offers a "Micro Focus COBOL" container image that can be deployed on any Kubernetes cluster.

Licensing model: Per-core licensing. A 16-core EC2 instance running Enterprise Server costs the software license plus the EC2 compute. The software license is not cheap — expect $15K-30K per core annually, depending on the components licensed (base COBOL runtime, CICS emulation, IMS emulation, JCL interpreter are separately licensed modules).

📊 Real Numbers from CNB's POC: Continental National Bank licensed Micro Focus Enterprise Server for their regulatory reporting POC. 8-core EC2 r6i.2xlarge instances. Software license: $168,000/year. EC2 compute (reserved, 1-year): $43,000/year. Storage (io2, 2TB): $36,000/year. Data transfer: $8,400/year. Total: $255,400/year for a workload that consumed approximately 400 MIPS on their z16 — which they estimated at $2M/year using their blended MIPS cost. The raw numbers looked good. The devil was in the batch window expansion, which we'll revisit in Section 34.6.
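As a sanity check on those figures, the line items sum as stated, and the mainframe side implies a blended rate of $5,000 per MIPS. A quick recomputation of the callout's numbers:

```python
# Recomputing CNB's POC figures from the callout (annual costs, USD).
poc = {
    "Micro Focus license": 168_000,
    "EC2 compute (reserved, 1-year)": 43_000,
    "Storage (io2, 2 TB)": 36_000,
    "Data transfer": 8_400,
}
cloud_total = sum(poc.values())     # 255,400 per year
mainframe_cost = 2_000_000          # CNB's estimate for the 400-MIPS workload
per_mips = mainframe_cost / 400     # implied blended cost per MIPS

print(f"Cloud: ${cloud_total:,}/yr; implied mainframe rate: ${per_mips:,.0f}/MIPS")
```

Note what this comparison omits: the batch window expansion, which forces larger instances for longer and is exactly where the raw numbers mislead.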

Heirloom Computing

What it is: A source-code transformation platform that converts COBOL into cloud-native Java bytecode running on the JVM. Unlike Micro Focus (which compiles COBOL to x86 native code), Heirloom translates COBOL semantics into Java classes that run on standard JVMs — OpenJDK, GraalVM, or cloud-managed Java runtimes.

Where it's strong:

  • Java bytecode output means the resulting applications deploy on any JVM platform — Kubernetes, AWS Lambda (with packaging), Azure App Service, GCP Cloud Run
  • No proprietary COBOL runtime license per core — once converted, the code runs on standard Java infrastructure
  • Leverages JVM garbage collection, JIT compilation, and the JVM's mature ecosystem of monitoring tools (JFR, async-profiler, Micrometer)
  • Conversion handles COBOL data types including packed decimal (via Java BigDecimal), REDEFINES (via union-type patterns), and 88-level condition names

Where it falls short:

  • The converted Java code is not idiomatic Java. It's COBOL semantics expressed in Java syntax — PERFORM paragraphs become method calls with non-obvious control flow, WORKING-STORAGE becomes class-level fields, GOTO becomes labeled breaks. A Java developer reading this code will be confused. A COBOL developer reading this code will be confused. Nobody's happy.
  • CICS emulation is limited. Heirloom provides basic transaction management and screen handling equivalents, but complex CICS patterns (MRO, Sysplex-wide routing, event processing, channels/containers) are not fully supported.
  • The conversion is not 100% automatic. Complex COBOL constructs — ALTER GOTO, PERFORM THRU with complex nesting, certain REDEFINES patterns, CALL BY CONTENT with length manipulation — may require manual intervention.
  • Performance characteristics change unpredictably. Some COBOL operations are faster on the JVM (string manipulation benefits from JIT compilation); others are slower (packed decimal arithmetic through BigDecimal is significantly slower than hardware decimal).
  • Debugging is challenging. When production issues occur, you're debugging Java code that was generated from COBOL. Stack traces reference Java method names derived from COBOL paragraph names. Finding the original COBOL line that caused the problem requires mapping tools.

Cloud deployment options: Any cloud platform that runs Java — which is all of them. The selling point is the absence of proprietary runtime licensing.

Licensing model: Conversion service fee (typically per-line-of-code or per-program) plus ongoing support. No per-core runtime fees. This changes the TCO equation significantly for large deployments.

NTT DATA UniKix

What it is: A CICS and batch emulation environment for UNIX/Linux/Windows. UniKix focuses specifically on the CICS transaction processing environment — it provides a CICS-compatible API layer, screen handling (BMS equivalent), and transaction management on commodity hardware.

Where it's strong:

  • Deep CICS emulation — covers more of the CICS API surface than most competitors, including some advanced features like CICS event processing, basic channel/container support, and multi-region operation emulation
  • Mature batch execution environment with JCL interpretation
  • Strong in the CICS-heavy workload space — if your primary challenge is running CICS COBOL programs off-mainframe, UniKix has the deepest compatibility
  • Integration with NTT DATA's broader managed services — they can operate the cloud environment for you

Where it falls short:

  • Smaller ecosystem than Micro Focus — fewer third-party integrations, smaller community, less documentation
  • DB2 for z/OS compatibility requires database migration to PostgreSQL, Oracle, or another RDBMS. DB2 SQL syntax differences (especially z/OS-specific features like temporal tables, row-level security, native stored procedures) require SQL migration effort.
  • No IMS emulation — IMS-dependent applications need a different approach
  • Coupling Facility, data sharing, and Parallel Sysplex features are not available (same limitation as all rehosting platforms)

Cloud deployment options: AWS and Azure primarily. GCP support varies by configuration.

Licensing model: Per-core or per-instance, with managed service options.

The Compatibility Matrix

Here's what I wish every vendor would put in their slide deck instead of the two-box arrow diagram:

| Capability | z/OS Native | Micro Focus | Heirloom | NTT DATA |
| --- | --- | --- | --- | --- |
| Enterprise COBOL compilation | Full | ~95% syntax compatible | N/A (converts to Java) | ~90% via GnuCOBOL or MF |
| CICS basic (READ/WRITE/LINK) | Full | Good | Limited | Good |
| CICS advanced (MRO, events, channels) | Full | Partial | Minimal | Partial-Good |
| CICS pseudo-conversational | Full | Good | Limited | Good |
| JCL interpretation | Full | Good (with edge cases) | Partial | Good |
| VSAM file access | Full | Good (performance differs) | Via file adapters | Good |
| DB2 for z/OS SQL | Full | Passthrough or migrate | Via JDBC | Migrate required |
| IMS DB | Full | Partial emulation | Not supported | Not supported |
| Packed decimal arithmetic | Hardware native | Software emulated | Java BigDecimal | Software emulated |
| EBCDIC native | Yes | Conversion required | Conversion required | Conversion required |
| Parallel Sysplex | Full | Not available | Not available | Not available |
| WLM workload management | Full | OS-level scheduling | JVM/K8s scheduling | OS-level scheduling |
| Five-nines availability | Proven | Architecture-dependent | Architecture-dependent | Architecture-dependent |

⚠️ The 95% Trap: When a vendor says "95% compatible," remember that the 5% is where your most complex, most critical, and most difficult-to-test code lives. The easy COBOL works everywhere. The COBOL that makes your business unique — the edge cases, the REDEFINES-based data transformations, the deeply nested PERFORM THRU logic, the CICS programs that use HANDLE CONDITION with PUSH/POP — that's the 5% that breaks. Budget your testing accordingly.

The EBCDIC Problem in Detail

Let me be specific about why character encoding is not a trivial conversion problem, because every migration plan I've reviewed treats it as a line item that takes two weeks. It doesn't.

Hardcoded character literals. COBOL programs routinely contain statements like:

       IF WS-STATUS-CODE = X'C1'
           PERFORM 2100-PROCESS-ACTIVE
       END-IF

X'C1' is EBCDIC for 'A'. On ASCII, 'A' is X'41'. Every hardcoded hex literal that represents a character must be found and converted. Automated tools catch most of them. "Most" is not "all."
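You can see the mismatch directly with Python's built-in code page converters. This sketch uses cp037 as a representative EBCDIC code page; your shop's code page may differ:

```python
# The same character literal under EBCDIC (code page 037) and ASCII.
ebcdic_a = "A".encode("cp037")   # cp037 is one common EBCDIC code page
ascii_a = "A".encode("ascii")

assert ebcdic_a == b"\xc1"   # X'C1' -- what the mainframe COBOL compares
assert ascii_a == b"\x41"    # X'41' -- what the rehosted program sees
```

After migration, `IF WS-STATUS-CODE = X'C1'` is comparing against a byte that no longer means 'A' — the test silently stops matching.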

Collating sequence. EBCDIC sorts lowercase before uppercase, letters before numbers. ASCII sorts numbers before uppercase, uppercase before lowercase. If your COBOL programs sort data and then use binary search (SEARCH ALL) or assume a particular record order, the results change. If your batch jobs produce sorted reports and downstream systems parse by position assuming a sort order, those systems break.
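The reordering is easy to demonstrate by sorting the same keys as raw bytes under each encoding (again using cp037 as a stand-in EBCDIC code page):

```python
keys = ["a", "A", "1"]

# Sort by the raw byte values each encoding assigns.
ascii_order = sorted(keys, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(keys, key=lambda s: s.encode("cp037"))

print(ascii_order)   # ['1', 'A', 'a']: digits, then upper, then lower
print(ebcdic_order)  # ['a', 'A', '1']: lower, then upper, then digits
```

A SEARCH ALL against a table loaded in the old order, or a downstream parser that assumes the old report order, fails on exactly this difference.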

Packed decimal with embedded signs. COMP-3 fields store the sign in the low nibble of the last byte. z/OS COBOL treats X'C' (positive), X'D' (negative), and X'F' (unsigned positive) as valid signs. Some x86 COBOL runtimes are stricter about sign representation. Data files created on z/OS with non-standard sign nibbles (which exist in production, because forty years of data creates every edge case) may cause data exceptions on x86.
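A minimal COMP-3 decoder makes the sign-nibble rules concrete. This is a sketch of the strict behavior described above; real runtimes vary in what they tolerate:

```python
def unpack_comp3(data: bytes) -> int:
    """Decode a COMP-3 (packed decimal) field -- a minimal sketch.

    Digits are stored two per byte; the low nibble of the last byte is
    the sign: X'C' positive, X'D' negative, X'F' unsigned (positive).
    Anything else is rejected, modeling the stricter x86 runtimes.
    """
    nibbles = []
    for b in data:
        nibbles.append(b >> 4)
        nibbles.append(b & 0x0F)
    *digits, sign = nibbles
    if any(d > 9 for d in digits):
        raise ValueError("invalid digit nibble")
    if sign not in (0xC, 0xD, 0xF):
        raise ValueError(f"invalid sign nibble: {sign:X}")
    value = int("".join(str(d) for d in digits))
    return -value if sign == 0xD else value

assert unpack_comp3(b"\x12\x3c") == 123    # X'123C' -> +123
assert unpack_comp3(b"\x12\x3d") == -123   # X'123D' -> -123
```

A z/OS-created file containing, say, a sign nibble of X'A' (which some older software emitted) sails through a lenient runtime and raises a data exception in a strict one — decades into the file's life.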

National (DBCS) data. If your COBOL programs handle double-byte character set data (Japanese, Chinese, Korean) using the NATIONAL data type or DBCS literals, the conversion from EBCDIC DBCS to Unicode is well-defined but must be tested exhaustively. Code page mismatches produce garbled data, not errors.


34.3 What Works: Batch, Reporting, Dev/Test

After watching a dozen COBOL-to-cloud migrations — some successful, some catastrophic, most somewhere in between — I can tell you that three workload categories consistently succeed on cloud. Not all COBOL, not all cloud. These three.

Batch Reporting and Analytics

This is the sweet spot. The workload profile is:

  • Runs periodically (nightly, weekly, monthly, quarterly) — not 24/7
  • Read-heavy — reads transaction data, calculates aggregates, produces reports
  • Tolerant of higher latency — a report that takes 3 hours instead of 2 hours is acceptable
  • Independent of CICS — no online transaction dependencies during execution
  • Produces output, doesn't mutate core data — worst case if it fails, you rerun it

CNB's regulatory reporting is the textbook example. Forty-seven COBOL programs read transaction data (replicated from the mainframe nightly), calculate regulatory metrics (capital ratios, risk-weighted assets, liquidity coverage), and produce reports for the OCC, Federal Reserve, and FDIC. These programs don't update account balances. They don't interact with CICS. They run once per reporting period. And they consumed 400 MIPS that Kwame wanted back for growing OLTP workloads.

Why it works on cloud:

  1. The elastic compute model fits: spin up large instances for the reporting window, release when done
  2. The data can be replicated to cloud storage without real-time synchronization requirements
  3. The batch programs run on Micro Focus Enterprise Server with minimal modification — the JCL interpretation handles standard batch patterns well
  4. Report output goes to cloud storage (S3, Blob, GCS) for distribution — no CICS output involved
  5. The acceptable latency window absorbs the I/O performance gap

What CNB had to solve:

  • Data replication: Nightly extract from mainframe DB2, convert from EBCDIC, load to cloud database (PostgreSQL on RDS). Initial implementation took 90 minutes; optimized to 35 minutes.
  • GDG handling: Seventeen of the forty-seven programs used Generation Data Groups. The Micro Focus JCL interpreter supports GDGs, but the behavior differs slightly. Three GDG-related production incidents in the first month.
  • Sort collation: Two report programs produced output sorted by customer name. The EBCDIC-to-ASCII collation change reordered the output. Downstream consumers (Excel macros used by the compliance team) broke. Fixed by explicit SORT collation overrides.
  • Monitoring: Replaced CA7 scheduling and SMF-based monitoring with AWS Step Functions orchestration and CloudWatch dashboards. This took longer than expected — the operations team had to learn an entirely new monitoring paradigm.

Development and Test Environments

If I could recommend exactly one COBOL-to-cloud workload for organizations to start with, it's this one. Move your dev/test environments to the cloud.

Why it's almost always a good idea:

  1. MIPS cost savings are real and immediate. Development and test workloads consume MIPS on the mainframe — often 15-25% of total capacity. Every COMPILE, every BIND, every test execution counts against your MIPS license. Moving dev/test to cloud eliminates that consumption. At $5,000-$8,000 per MIPS per year (blended cost including software), even 500 MIPS of dev/test savings is $2.5M-$4M annually.

  2. Spin-up speed for new environments. On the mainframe, provisioning a new LPAR for a project team takes weeks of system programmer time, capacity planning, and change management. On cloud, you can provision a complete COBOL dev/test environment in hours using infrastructure-as-code templates.

  3. Isolation between teams. Multiple development teams sharing the same mainframe dev/test regions step on each other — shared VSAM files, shared DB2 subsystems, contention for resources. Cloud environments can be fully isolated per team, per project, or per sprint.

  4. Imperfect compatibility is acceptable. The 5% of COBOL that doesn't work perfectly on the rehosting platform? In dev/test, you discover and document those gaps. In production, they cause incidents. Dev/test is where you want to find the incompatibilities.

  5. Junior developers are productive faster. A new COBOL developer can start coding in VS Code on a cloud dev environment in their first week. The same developer facing TSO, ISPF, and SDSF for the first time may not be productive for a month.

What Pinnacle Health did: Diane Okoye moved Pinnacle's COBOL dev/test to Azure. She provisioned three Azure DevTest Lab environments — development, system test, and UAT — running Micro Focus Enterprise Server on Standard_E16s_v5 VMs. The annual cost: $340,000 (compute + storage + Micro Focus licensing for dev/test). The mainframe MIPS savings: approximately $1.8M/year. Net savings in year one (after migration project cost of $400K): approximately $1.06M.

"The savings were nice," Diane told Ahmad during their quarterly review. "The real win was speed. We went from three-week dev environment provisioning to same-day. That changed how we staffed projects."

Low-Volume CICS Transactions (with Caveats)

Some CICS workloads can run on cloud — but the caveats are load-bearing.

Profile of CICS workloads that can work on cloud:

  • Transaction volume under 100 TPS (transactions per second)
  • Latency requirements above 50ms per transaction
  • No cross-region data sharing (no MRO/ISC between the cloud CICS and mainframe CICS)
  • No CICS event processing or business event integration
  • Standard CICS API usage (READ/WRITE/LINK/XCTL, basic SEND MAP/RECEIVE MAP)
  • Database access is self-contained (no distributed two-phase commit with mainframe DB2)
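Those criteria read naturally as a screening checklist. A hypothetical sketch — the function name and thresholds are mine, lifted directly from the profile, not a vendor sizing tool:

```python
# Illustrative screening function for the "can work on cloud" CICS profile.
def cloud_cics_candidate(tps: float, latency_budget_ms: float,
                         uses_mro: bool, uses_events: bool,
                         distributed_2pc: bool) -> bool:
    """True if a CICS workload fits the low-volume cloud profile above."""
    return (tps < 100
            and latency_budget_ms > 50       # can tolerate slower responses
            and not uses_mro                 # no cross-region data sharing
            and not uses_events              # no CICS event processing
            and not distributed_2pc)         # no 2PC back to mainframe DB2

# SecureFirst's admin tools: ~200 executions/day, generous latency budget.
assert cloud_cics_candidate(0.01, 1000, False, False, False) is True
# CNB-style core banking at 5,800 TPS peak: not a candidate.
assert cloud_cics_candidate(5800, 10, True, False, True) is False
```

The point of encoding it this way is that every predicate is an AND: failing any single criterion disqualifies the workload, which is exactly how the caveats are load-bearing.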

Real example: SecureFirst ran their internal admin tools — employee lookup, configuration management, system parameter maintenance — on NTT DATA UniKix on AWS. These were 12 CICS transactions, ~200 executions per day, simple READ/WRITE against a small VSAM-equivalent file. They worked. The response time went from 3ms on z/OS to 25ms on cloud, but nobody cared because these were internal admin screens.

What Yuki learned: "I was tempted to use the admin tools as proof that CICS works on cloud and then propose moving customer-facing transactions. Kwame — I mean, our advisor from the architecture review — talked me out of it. He said, 'Just because your bicycle made it across the creek doesn't mean the truck will.' He was right. The admin tools moved 200 transactions a day. Our customer-facing ATM authorization does 200 transactions per second. That's not a scaling problem — it's a different universe."

🔗 Cross-Reference: Compare the cloud CICS latency (25ms for simple transactions) with the z/OS CICS numbers from Chapter 17 (0.5-2ms for equivalent transactions). This is the latency tax from Section 34.1, manifested in real measurements. The absolute numbers are fine for low-volume internal tools. They're fatal for high-volume customer-facing OLTP.


34.4 What Doesn't Work: High-Volume OLTP, Full CICS, Data Sharing

I'm going to be blunt here because bluntness saves money. Some COBOL workloads should not be moved to cloud. Not "not yet." Not "when the technology matures." Not with the current generation of rehosting platforms, not with the current generation of cloud infrastructure. The architectural gaps are fundamental, not incremental.

High-Volume OLTP (Online Transaction Processing)

Any CICS transaction processing workload that meets any of these criteria should stay on the mainframe:

  1. Volume above 500 TPS sustained. The combination of CICS emulation overhead, network latency between application and database, and the absence of Coupling Facility-based workload distribution means that cloud-hosted COBOL OLTP hits a throughput ceiling well below what the same code achieves on z/OS. CNB's core banking does 5,800 TPS at peak. No rehosting platform comes close.

  2. Latency requirements below 10ms p99. CICS on z/OS achieves sub-millisecond p99 for well-tuned transactions because the I/O subsystem, the DB2 buffer pool, and the CICS region all share the same high-speed memory bus. On cloud, every database call crosses a network — even within the same availability zone. You cannot get below 5ms p99 for any transaction that touches a database, and most real transactions touch the database multiple times.

  3. Two-phase commit requirements. If your CICS transactions coordinate updates across DB2 and MQ (or DB2 and VSAM, or DB2 and IMS), the transaction manager provides two-phase commit to ensure atomicity. Cloud CICS emulators provide basic syncpoint, but distributed two-phase commit across heterogeneous resources is either not supported or not production-tested at scale.

  4. CICS-to-CICS communication (MRO/ISC). If your CICS architecture uses Multi-Region Operation — a TOR routing to AORs, or AORs communicating via function shipping — the rehosting platforms don't fully replicate this. You'd need to rearchitect the region topology, which means you're not "rehosting" anymore — you're redesigning.

The Sandra Chen Story: FBA's Failed Cloud OLTP Pilot

Sandra learned this the hard way — well, cheaply hard, because she ran a six-week pilot before committing.

In 2024, after the success of moving dev/test environments to cloud (Chapter 33 covered the strangler fig context), Sandra's deputy director asked whether the benefits eligibility verification — a CICS/DB2 transaction that processes 2,400 requests per hour during business hours — could move to AWS GovCloud.

"The vendor said yes," Sandra told Marcus. "Their benchmark showed 800 TPS on an r6i.4xlarge."

"Their benchmark used what data set?" Marcus asked.

"A synthetic test database with 10,000 beneficiary records."

Marcus closed his eyes. "We have 22 million."

Sandra ran the pilot on GovCloud with a replica of the production IMS-to-DB2 migrated data (22 million beneficiary records, 180GB database). The results:

| Metric | z/OS Production | AWS GovCloud Pilot |
| --- | --- | --- |
| Throughput | 2,400 req/hour (40/min) | 1,100 req/hour (18/min) |
| p50 latency | 2.1ms | 28ms |
| p99 latency | 8.4ms | 340ms |
| DB buffer hit ratio | 98.7% | 82.3% (PostgreSQL shared_buffers) |
| Error rate | 0.001% | 0.14% (timeout + data conversion) |
| Cost per transaction | $0.003 (allocated) | $0.012 (measured) |

The throughput drop wasn't because the COBOL was slower. It was because the data access pattern — 22 million records with complex eligibility lookups involving date arithmetic, benefit-tier calculations, and cross-reference validation — couldn't achieve the same buffer hit ratio on PostgreSQL that DB2 for z/OS achieves with its hiperpool and group buffer pool. Every cache miss became a 4ms disk read instead of a 0.1ms buffer pool hit.
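Effective access time is a weighted average of buffer hits and disk reads, which is why a 16-point drop in hit ratio is so expensive. A quick model using the figures from the pilot narrative:

```python
# Effective data access time as a function of buffer hit ratio.
# 0.1 ms buffer hit and 4 ms disk read are the pilot's figures.
def effective_access_ms(hit_ratio: float,
                        hit_ms: float = 0.1, miss_ms: float = 4.0) -> float:
    return hit_ratio * hit_ms + (1 - hit_ratio) * miss_ms

zos = effective_access_ms(0.987)    # ~0.15 ms per access on z/OS
cloud = effective_access_ms(0.823)  # ~0.79 ms per access on the pilot
print(f"z/OS {zos:.2f} ms vs cloud {cloud:.2f} ms -> {cloud / zos:.1f}x slower")
```

A roughly 5x penalty per data access, multiplied across every eligibility lookup in a 22-million-record workload, is the throughput cliff in the table above.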

Sandra's one-page summary to the deputy director: "The eligibility transaction works on cloud. It works 55% slower, costs 4x more per transaction, and fails 140x more often. I recommend we keep it on the mainframe and spend the cloud budget on the reporting workloads where cloud economics actually favor us."

The deputy director, to his credit, listened.

Full-Function CICS Environments

A "full-function CICS environment" means an installation that uses the breadth of CICS capabilities — not just basic transaction execution, but:

  • BMS maps with complex screen formatting (colors, attribute bytes, cursor positioning)
  • Temporary Storage Queues (TS) used for pseudo-conversational state management, shared data, and inter-transaction communication
  • Transient Data Queues (TD) for trigger-based processing and asynchronous workflows
  • Event processing for business events and system events
  • Channels and Containers for modern data passing between programs
  • CICS Document Templates for web output generation
  • CICS Liberty for Java/COBOL interoperability within the same CICS region
  • CICSPlex SM for multi-region management, workload management, and real-time monitoring

No rehosting platform supports all of these. Micro Focus Enterprise Server covers the first three reasonably well. NTT DATA UniKix covers the first four. Nobody fully covers the last four.

If your CICS installation uses CICSPlex SM for workload management across twelve AOR regions, you cannot rehost that to cloud without redesigning the entire topology. That's not a rehost — that's a replatform or rearchitect, which is a different cost, a different timeline, and a different risk profile.

Data Sharing and Parallel Sysplex Workloads

This is the hardest "no" to deliver because it's the most absolute.

Parallel Sysplex provides:

  1. DB2 data sharing — multiple DB2 subsystems accessing the same data simultaneously through the Coupling Facility's group buffer pool
  2. CICS Sysplex optimized routing — dynamic transaction routing based on real-time workload across multiple CICS regions on multiple LPARs
  3. Automatic failure recovery — if an LPAR fails, work migrates to surviving LPARs within seconds, with data integrity maintained by the Coupling Facility lock structure

There is no cloud equivalent. Not "limited cloud equivalent" or "partial cloud equivalent." No equivalent. The Coupling Facility is a hardware component with a specialized operating system (CFCC) that provides sub-microsecond lock arbitration and coherent cross-system caching. You cannot replicate this with distributed software locks over a network.

If your HA banking system (like the one you're building in the progressive project) uses DB2 data sharing across a four-member Parallel Sysplex for five-nines availability, that workload cannot be cloud-hosted without a fundamental architectural change — specifically, moving from Sysplex-based HA to active-passive replication (which changes your RTO from seconds to minutes) or active-active with application-level conflict resolution (which introduces complexity that z/OS has spent decades solving in hardware).

🔴 Hard Limit: If your application depends on Coupling Facility data sharing for availability or scalability, cloud rehosting is not an option. This isn't a technology maturity issue — it's a fundamental architectural difference between shared-everything (Sysplex) and shared-nothing (cloud) designs. Chapter 37 will address how to build a hybrid architecture that respects this boundary.


34.5 The Hybrid Sweet Spot

Every successful COBOL-to-cloud story I've been involved with — every one — ended up as a hybrid architecture. Not because hybrid was the plan, but because reality pushed the project there. The smart organizations started with hybrid as the plan and saved themselves eighteen months of failed experiments.

The Hybrid Architecture Pattern

┌─────────────────────────────────────────────────────────────────┐
│                         MAINFRAME (z/OS)                         │
│                                                                   │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐                 │
│  │   CICS     │  │    DB2     │  │   Batch    │                 │
│  │  OLTP      │  │  System of │  │  Core      │                 │
│  │  Core      │  │  Record    │  │  Processing│                 │
│  │  Txns      │  │            │  │            │                 │
│  └─────┬──────┘  └─────┬──────┘  └─────┬──────┘                 │
│        │               │               │                         │
│  ┌─────┴───────────────┴───────────────┴──────┐                 │
│  │        z/OS Connect / MQ / CDC              │                 │
│  │     (Integration Layer — data flows out)    │                 │
│  └─────────────────────┬───────────────────────┘                 │
└────────────────────────┼─────────────────────────────────────────┘
                         │
                    ╔════╧════╗
                    ║ NETWORK ║
                    ╚════╤════╝
                         │
┌────────────────────────┼─────────────────────────────────────────┐
│                     CLOUD (AWS/Azure/GCP)                         │
│                                                                   │
│  ┌────────────┐  ┌────────────┐  ┌────────────┐  ┌───────────┐ │
│  │  Reporting │  │  Dev/Test  │  │  Analytics │  │   API     │ │
│  │  Batch     │  │  Envs      │  │  & ML      │  │  Gateway  │ │
│  │  (COBOL)   │  │  (COBOL)   │  │  (Non-COBOL│  │  (Mobile/ │ │
│  │            │  │            │  │   stack)   │  │   Web)    │ │
│  └────────────┘  └────────────┘  └────────────┘  └───────────┘ │
│                                                                   │
│  ┌─────────────────────────────────────────────────────────────┐ │
│  │     Cloud Database Replica (PostgreSQL/Aurora) — read-only   │ │
│  └─────────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────────┘

What stays on the mainframe: The system of record. The OLTP engine. The core batch processing that mutates account balances. Anything that requires five-nines availability, sub-10ms latency, or two-phase commit coordination.

What moves to cloud: Read-only workloads (reporting, analytics). Development and test environments. API gateway and mobile/web serving layers. Machine learning and AI inference (fraud detection, recommendation engines). Anything that benefits from elasticity, pay-per-use, or modern tooling.

The integration layer: This is the most critical design decision in any hybrid architecture. How does data flow from mainframe to cloud, how fresh does it need to be, and what happens when the link fails?

Data Synchronization Patterns

Three patterns, in order of increasing complexity and decreasing latency:

Pattern 1: Nightly Batch Extract (Latency: 24 hours)

The simplest pattern. A batch job on the mainframe extracts data from DB2/VSAM, converts from EBCDIC to UTF-8, and writes to a file. The file is transferred to cloud storage (S3, Blob) via SFTP, Connect:Direct, or Sterling File Gateway. A cloud-side loader ingests the file into the cloud database.

When to use: Reporting that doesn't need intraday data. Month-end processing. Regulatory snapshots.

CNB implementation: Rob's reporting extract runs at 01:00, completes by 01:35, transfers to S3 by 01:50, loads into Aurora PostgreSQL by 02:15. The regulatory reporting batch starts at 02:30. Total latency from extract start to cloud availability: ~75 minutes.

Gotchas:

  • EBCDIC conversion must handle packed decimal, signed numeric, and redefined fields correctly. A single misalignment in the conversion copybook produces garbage data in every record that follows.
  • File sizes can be enormous. CNB's nightly extract is 12GB compressed. Transfer over the network takes time and bandwidth.
  • Recovery: if the extract fails, the cloud batch has no data. Build a "stale data" detection mechanism into the cloud batch jobs.
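A minimal sketch of what the EBCDIC/packed-decimal gotcha means in practice, assuming a hypothetical copybook layout of a PIC X(10) name followed by a PIC S9(7)V99 COMP-3 balance. Real conversions must follow the actual copybook, REDEFINES and all; the `cp037` code page is one common EBCDIC variant:

```python
def unpack_comp3(raw: bytes, scale: int = 2) -> float:
    """Decode IBM packed decimal (COMP-3): two digits per byte, sign in the last nibble."""
    digits = []
    for b in raw[:-1]:
        digits.extend([b >> 4, b & 0x0F])
    digits.append(raw[-1] >> 4)
    sign = raw[-1] & 0x0F
    value = 0
    for d in digits:
        value = value * 10 + d
    if sign == 0x0D:   # 0xD marks negative; 0xC and 0xF are positive
        value = -value
    return value / (10 ** scale)

def convert_record(record: bytes):
    """Convert one fixed-layout record: PIC X(10) name + PIC S9(7)V99 COMP-3 balance."""
    name = record[0:10].decode("cp037").rstrip()   # EBCDIC text field
    balance = unpack_comp3(record[10:15])          # 5 bytes: 9 digits + sign nibble
    return name, balance

# Build a sample record: "SMITH" in EBCDIC, then +1234567.89 packed as 12 34 56 78 9C
rec = "SMITH".ljust(10).encode("cp037") + bytes([0x12, 0x34, 0x56, 0x78, 0x9C])
print(convert_record(rec))   # ('SMITH', 1234567.89)
```

Note how unforgiving the layout is: if the name field were actually 11 bytes, every byte of the packed balance would shift by one and `unpack_comp3` would decode plausible-looking garbage for every record in the file.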

Pattern 2: Change Data Capture / CDC (Latency: seconds to minutes)

IBM InfoSphere Data Replication, Attunity (now Qlik Replicate), or Debezium (for DB2 log reading) captures changes to mainframe DB2 tables and replicates them to a cloud database in near-real-time.

When to use: Cloud workloads that need intraday data. Dashboards. Real-time analytics. Fraud detection models that need current transaction data.

Pinnacle Health implementation: Diane set up IBM InfoSphere CDC to replicate the claims status table from mainframe DB2 to Azure PostgreSQL. Replication lag: 8-15 seconds under normal load, up to 45 seconds during batch window. The cloud-side analytics team runs fraud detection models against the replicated data.

Gotchas:

  • CDC reads the DB2 recovery log. Log volume during the batch window can overwhelm the CDC agent, increasing lag.
  • Schema changes on the mainframe DB2 (ALTER TABLE) require CDC reconfiguration. If the DBA adds a column on Monday and nobody tells the CDC team, replication breaks on Tuesday.
  • The replicated database is read-only. If cloud applications need to write back to the mainframe, you need a separate write path (API call, MQ message).
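Both the batch-extract and CDC patterns need the freshness guard mentioned above. One way to sketch it, assuming the pipeline maintains a heartbeat timestamp that the loader or CDC apply process updates (the names and threshold here are illustrative, not from any specific product):

```python
from datetime import datetime, timedelta, timezone

def replica_is_fresh(last_applied: datetime, max_lag: timedelta) -> bool:
    """True if the replica's newest applied change is within the allowed lag."""
    return datetime.now(timezone.utc) - last_applied <= max_lag

def guard_batch(last_applied: datetime, max_lag=timedelta(minutes=5)):
    """Fail fast before a cloud job runs against stale replicated data."""
    if not replica_is_fresh(last_applied, max_lag):
        # Running reports against stale data is worse than failing loudly.
        raise RuntimeError(f"replica stale: last applied change at {last_applied.isoformat()}")
```

The point of the guard is operational: during the batch window, when CDC lag can spike from 15 to 45 seconds (or an extract can silently fail), the downstream job refuses to produce numbers that look current but aren't.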

Pattern 3: API-Mediated Real-Time (Latency: milliseconds)

z/OS Connect (Chapter 21) exposes CICS transactions as REST APIs. Cloud applications call these APIs synchronously for real-time data access. No data replication — the cloud calls the mainframe directly.

When to use: Mobile/web applications that need real-time mainframe data. Account balance inquiry, transaction history, eligibility verification — where the mainframe is the system of record and the cloud is the presentation layer.

SecureFirst implementation: Carlos's mobile banking app calls z/OS Connect APIs for balance inquiry, recent transactions, and fund transfer initiation. The API gateway (AWS API Gateway) routes requests to z/OS Connect over a dedicated AWS Direct Connect link. End-to-end latency: 80-120ms (includes network transit, z/OS Connect overhead, CICS transaction execution, and return).

Gotchas:

  • Network dependency: if the Direct Connect link fails, the mobile app cannot show balances. You need a caching strategy (stale data with an "as of" timestamp) or a circuit breaker pattern.
  • Throughput: z/OS Connect has a throughput ceiling based on the CICS region capacity behind it. If the mobile app goes viral and traffic spikes 10x, the mainframe CICS region needs capacity — which means MIPS, which means cost. Cloud elasticity doesn't help because the bottleneck is the mainframe.
  • Cost: every API call consumes mainframe MIPS. High-volume API patterns (polling, chatty APIs with multiple calls per screen) can increase MIPS consumption faster than the cloud saves money.
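A minimal sketch of the caching-plus-circuit-breaker fallback from the first gotcha. The thresholds are invented, and `fetch_live` is a placeholder for whatever calls the mainframe (in this architecture, a z/OS Connect REST request):

```python
import time

class BalanceService:
    def __init__(self, fetch_live, failure_threshold=3, open_seconds=30.0):
        self.fetch_live = fetch_live          # placeholder for the live mainframe call
        self.failure_threshold = failure_threshold
        self.open_seconds = open_seconds
        self.failures = 0
        self.opened_at = None
        self.cache = {}                       # account -> (balance, as_of timestamp)

    def _circuit_open(self):
        if self.opened_at is None:
            return False
        if time.monotonic() - self.opened_at >= self.open_seconds:
            self.opened_at = None             # half-open: allow one retry
            self.failures = 0
            return False
        return True

    def get_balance(self, account):
        if not self._circuit_open():
            try:
                balance = self.fetch_live(account)
                self.cache[account] = (balance, time.time())
                self.failures = 0
                return {"balance": balance, "stale": False}
            except Exception:
                self.failures += 1
                if self.failures >= self.failure_threshold:
                    self.opened_at = time.monotonic()   # trip the breaker
        if account in self.cache:
            balance, as_of = self.cache[account]
            return {"balance": balance, "stale": True, "as_of": as_of}
        raise RuntimeError("mainframe unavailable and no cached balance")
```

The design choice worth noting: the breaker stops the app from hammering a dead link, and the `"stale": True` flag lets the UI show the last known balance with an "as of" timestamp instead of an error screen.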

📊 Decision Matrix for Data Sync Patterns:

Pattern        Latency          Complexity  Cost         Best For
Batch Extract  Hours            Low         Low          Reporting, analytics, regulatory
CDC            Seconds-minutes  Medium      Medium       Dashboards, near-RT analytics, fraud
API Real-Time  Milliseconds     High        High (MIPS)  Mobile/web, account services

Rule of thumb: Start with batch extract. Move to CDC only when business requirements demand it. Use real-time API only for user-facing interactions where stale data is unacceptable.

The CNB Hybrid Architecture

After the regulatory reporting POC (successful, despite the cost overrun), the batch window expansion analysis, and the failed OLTP experiment (which Kwame killed after two weeks of benchmarking), CNB settled on this hybrid architecture:

On z/OS (Parallel Sysplex):

  • All CICS OLTP (account inquiry, fund transfer, ATM authorization, wire transfer)
  • Core batch processing (nightly posting, interest calculation, GL balancing)
  • DB2 system of record (accounts, transactions, balances)
  • MQ messaging hub (inter-system communication)
  • Real-time fraud scoring (CICS program calling external ML model via MQ)

On AWS:

  • Regulatory reporting batch (the 47-program suite that started this journey)
  • Dev/test environments (3 environments, 8 developers)
  • Analytics data warehouse (Aurora PostgreSQL replica, CDC-fed)
  • Mobile banking API gateway (API Gateway → z/OS Connect over Direct Connect)
  • ML fraud model training (SageMaker, trained on CDC-replicated transaction data)
  • Document generation (account statements, compliance reports)

Integration:

  • Nightly batch extract for regulatory reporting data (Pattern 1)
  • CDC for analytics data warehouse and fraud model data (Pattern 2)
  • z/OS Connect APIs for mobile banking (Pattern 3)
  • MQ bridge for fraud scoring (mainframe sends transaction event to cloud ML model, receives score back)

Lisa Tran summarized the architecture in one sentence: "The mainframe does the work. The cloud does the looking."


34.6 TCO Reality Check

This section exists because every COBOL-to-cloud proposal I've reviewed has a TCO slide that's wrong. Not slightly wrong. Wrong in a way that makes the proposal look 40-60% cheaper than it actually is.

I'm going to walk through the math using CNB's regulatory reporting migration as the worked example — real numbers (anonymized but proportionally accurate), real categories, real gotchas.

The Vendor's TCO Slide

Here's what the cloud vendor presented to CNB's CFO:

CURRENT MAINFRAME COST (REPORTING WORKLOAD)
  MIPS allocation (400 MIPS × $5,000/MIPS)    $2,000,000/yr

PROPOSED CLOUD COST
  EC2 compute (reserved)                         $43,000/yr
  Storage (io2 + S3)                             $36,000/yr
  Micro Focus license (8-core)                  $168,000/yr
  Data transfer                                   $8,400/yr
  ─────────────────────────────────────────────────────────
  TOTAL CLOUD                                   $255,400/yr

  ANNUAL SAVINGS: $1,744,600 (87% reduction!)

The CFO liked this slide. Kwame did not.

What the Vendor's Slide Left Out

1. The mainframe cost doesn't go down by $2M when you move 400 MIPS of reporting.

This is the single biggest lie in cloud migration TCO. Software licensing on z/OS is based on the R4HA (rolling four-hour average) peak MIPS consumption. The reporting batch runs at 02:00-04:30. The peak MIPS window is 10:00-14:00 (online transaction processing). Moving the reporting batch to cloud removes 400 MIPS from the 02:00 window — which is not the peak window. The R4HA doesn't change. The software bill doesn't change.

CNB's actual mainframe cost reduction from moving reporting: $0 in Year 1.

Wait, it gets worse. The reporting batch at 02:00 actually helps CNB's pricing. IBM's Tailored Fit Pricing and Country Multiplex Pricing models consider the utilization pattern — workloads that run during off-peak hours spread the MSU consumption and can reduce the per-MSU price for peak workloads. Removing the off-peak batch could theoretically increase the per-MSU cost for the remaining workloads.

Rob ran the numbers with IBM's pricing team. The net mainframe cost change from removing reporting: +$84,000/year (slight increase due to pricing model impact).

⚠️ The MIPS-to-Dollars Fallacy: The "$5,000 per MIPS" number that every vendor uses in their TCO slide is the *fully allocated* cost — total mainframe cost divided by total MIPS. It's meaningful for capacity planning. It's meaningless for migration ROI calculations, because removing 400 MIPS from a 6,000-MIPS installation doesn't save 400 × $5,000. The cost reduction is marginal, not proportional. You need to calculate the marginal MIPS cost — what the mainframe bill actually decreases by when you remove this specific workload. For off-peak batch, the marginal cost is often close to zero.
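The R4HA point can be made with a toy profile. The hourly MIPS numbers here are invented; only the shape matters — an off-peak reporting batch and a daytime online peak:

```python
def r4ha_peak(hourly_mips):
    """Peak rolling four-hour average over a 24-hour MIPS profile."""
    return max(sum(hourly_mips[h:h + 4]) / 4 for h in range(len(hourly_mips) - 3))

# Simplified profile: 200 MIPS baseline, +400 MIPS reporting at hours 02-04,
# 1,500 MIPS online peak at hours 10-13 (which drives the R4HA).
profile = [200] * 24
for h in range(2, 5):
    profile[h] += 400       # off-peak reporting batch
for h in range(10, 14):
    profile[h] = 1500       # online transaction peak

before = r4ha_peak(profile)
after = r4ha_peak([m - (400 if 2 <= h <= 4 else 0) for h, m in enumerate(profile)])
print(before, after)   # peak unchanged: the software bill doesn't move
```

Removing the entire 400-MIPS batch leaves the peak four-hour window untouched, so the software charge that tracks the R4HA doesn't change by a dollar. That is the marginal-cost argument in eight lines.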

2. Migration project costs.

Item                                      Cost
COBOL code analysis and modification      $120,000
JCL conversion and testing                $85,000
Data conversion (EBCDIC, copybooks)       $65,000
Integration development (extract/load)    $95,000
Performance testing and tuning            $110,000
User acceptance testing                   $45,000
Production cutover and parallel run       $60,000
Project management                        $80,000
────────────────────────────────────────────────────
Total migration                           $660,000

The vendor had estimated $250,000. They weren't lying — they were estimating from their experience with simpler migrations. CNB's reporting suite had seventeen GDG-dependent programs, nine programs with complex REDEFINES-based data transformations, and four programs that used a sort exit routine written in Assembler. Each of these required specialized remediation.

3. Ongoing operational costs the vendor didn't mention.

Item                                            Annual Cost
Cloud operations engineer (0.5 FTE)             $85,000
Micro Focus license maintenance (20%)           $33,600
Network (Direct Connect for data replication)   $42,000
Monitoring/alerting (CloudWatch, PagerDuty)     $12,000
Security compliance (quarterly pen test, audit) $24,000
Disaster recovery (cross-region replication)    $31,000
────────────────────────────────────────────────────────
Total ongoing (not in vendor estimate)          $227,600

4. The expanded batch window cost.

The reporting batch that took 2:40 on z/OS takes 6:15 on EC2. The larger instances needed to keep it under 8 hours (Rob's hard deadline because downstream systems need the reports by 08:00) cost $43K more than the original EC2 estimate.

The Honest TCO: 5-Year Comparison

                          Year 1    Year 2    Year 3    Year 4    Year 5    Total
Option A: Keep on Mainframe
Mainframe allocated cost  $2,000K   $2,060K   $2,122K   $2,185K   $2,251K   $10,618K
  (3% annual increase)
Option B: Cloud Migration
Migration project         $660K     —         —         —         —         $660K
Cloud infrastructure      $298K     $307K     $316K     $326K     $336K     $1,583K
Micro Focus license       $168K     $173K     $178K     $184K     $189K     $892K
Ongoing ops (new)         $228K     $235K     $242K     $249K     $256K     $1,210K
Mainframe cost increase   $84K      $87K      $89K      $92K      $95K      $447K
Cloud total               $1,438K   $802K     $825K     $851K     $876K     $4,792K

5-year savings: $10,618K - $4,792K = $5,826K (55% reduction)
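The table's arithmetic can be checked in a few lines (figures in $K, copied from the table above):

```python
# Sanity check of the honest-TCO totals
mainframe = [2000, 2060, 2122, 2185, 2251]   # Option A, Years 1-5
cloud = [1438, 802, 825, 851, 876]           # Option B yearly totals, Years 1-5
savings = sum(mainframe) - sum(cloud)
pct = round(100 * savings / sum(mainframe))
print(savings, pct)   # 5826 55
```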

That's still a significant savings — but it's 55%, not 87%. And Year 1's net savings are only $562K ($2,000K mainframe vs. $1,438K cloud, with the $660K migration project included in the cloud figure), not the $1.74M the vendor promised.

Break-even point: Month 9 of Year 1 (including migration costs).

Risk-adjusted break-even (adding 30% contingency to migration and 15% to cloud ops): Month 14 — early Year 2.

💡 Kwame's Rule of TCO: "If your honest TCO shows less than 30% savings after including every cost you can think of, don't do it. The costs you can't think of will eat the margin. If it shows 50%+ savings honestly, it's probably real. Between 30-50% is the danger zone where you need to ask whether the operational risk is worth the marginal savings."

The Hidden Cost Nobody Budgets For: Organizational Disruption

Rob spent 340 hours in the first year managing the cloud reporting environment — learning AWS, debugging Micro Focus JCL interpretation issues, building CloudWatch dashboards, handling the three GDG-related incidents, and coordinating with the compliance team when the sort-order change broke their macros.

That's 340 hours he wasn't spending on the mainframe batch window optimization project that was supposed to reduce the nightly batch from 4:15 to 3:30. That project slipped six months.

"Opportunity cost is real," Kwame told the architecture review board. "Every hour Rob spends learning cloud is an hour he's not improving the mainframe. That's fine if the cloud work is strategic. It's not fine if the cloud work is operational firefighting because we underestimated the migration complexity."


34.7 Making the Decision

After six sections of technical analysis, platform evaluation, success stories, failure stories, and TCO dissection, let me give you the decision framework I use. It's not complicated. Complicated frameworks don't get used.

The Three Questions

Question 1: What is the workload's primary characteristic?

Characteristic                 Cloud Fit  Mainframe Fit    Verdict
Batch, read-heavy, periodic    High       Medium           Cloud candidate
Dev/test                       High       Low (expensive)  Cloud, almost always
Reporting and analytics        High       Medium           Cloud candidate
OLTP, high volume (>100 TPS)   Low        High             Mainframe
CICS full-function             Low        High             Mainframe
Data sharing (Sysplex)         None       High             Mainframe (non-negotiable)
Real-time API serving          Medium     High             Hybrid (gateway on cloud, backend on MF)

Question 2: What is the marginal mainframe cost savings?

Not the allocated cost. The marginal cost — what the mainframe bill actually decreases by when this workload is removed. If the workload runs off-peak and removing it doesn't change the R4HA, the marginal savings is zero and the cloud migration has to justify itself on non-cost benefits (agility, tooling, geographic distribution).

Question 3: Can you operate it?

This is the question nobody asks and everybody should. Do you have the people who can operate COBOL on cloud? Not develop — operate. Debug production issues at 2am when the Micro Focus JCL interpreter throws an error your team has never seen? Tune PostgreSQL to match the access patterns of a COBOL program designed for DB2? Manage the EBCDIC-to-ASCII data pipeline when a new data field with packed decimal appears?

If the answer is "we'll train them," add 12 months and $200K to your timeline and budget. If the answer is "we'll hire," add 6 months and $300K (cloud + mainframe skills in one person is a unicorn hire).

The Decision Tree

START: "Should this COBOL workload move to cloud?"
│
├─ Does it use Parallel Sysplex data sharing?
│  └─ YES → STOP. Keep on mainframe. (Section 34.4)
│
├─ Is it high-volume OLTP (>100 TPS, <10ms latency)?
│  └─ YES → STOP. Keep on mainframe. (Section 34.4)
│
├─ Is it a dev/test environment?
│  └─ YES → MOVE TO CLOUD. (Section 34.3)
│       Almost always the right answer.
│
├─ Is it batch reporting/analytics (read-heavy, periodic)?
│  └─ YES → EVALUATE:
│       ├─ Marginal mainframe savings > 30%? → Strong candidate
│       ├─ Marginal mainframe savings 10-30%? → Evaluate non-cost benefits
│       └─ Marginal mainframe savings < 10%? → Weak candidate
│            Consider only if agility/tooling benefits are compelling
│
├─ Is it low-volume CICS (<100 TPS, >50ms acceptable)?
│  └─ YES → EVALUATE CAREFULLY:
│       ├─ Standard CICS API only? → Possible
│       ├─ Advanced CICS features? → Unlikely
│       └─ Cross-region communication? → No
│
└─ Everything else → DEFAULT TO MAINFRAME.
    The burden of proof is on the cloud proposal, not on keeping
    the mainframe. "Move to cloud because cloud" is not a strategy.
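For illustration, the tree can be encoded as a small function. The field names, thresholds, and return strings are invented for this sketch; it is a summary of the framework above, not a substitute for the evaluation steps it references:

```python
def placement(workload: dict) -> str:
    """Walk the COBOL-to-cloud decision tree for one workload descriptor."""
    if workload.get("sysplex_data_sharing"):
        return "mainframe (hard limit)"
    if workload.get("tps", 0) > 100 and workload.get("latency_ms", 999) < 10:
        return "mainframe (high-volume OLTP)"
    if workload.get("kind") == "dev_test":
        return "cloud (almost always)"
    if workload.get("kind") == "batch_reporting":
        savings = workload.get("marginal_savings_pct", 0)
        if savings > 30:
            return "cloud (strong candidate)"
        if savings >= 10:
            return "evaluate non-cost benefits"
        return "weak candidate"
    if workload.get("kind") == "cics" and workload.get("tps", 0) < 100:
        if workload.get("cross_region"):
            return "mainframe (MRO/ISC not replicable)"
        if workload.get("advanced_cics"):
            return "unlikely to rehost"
        return "possible cloud candidate"
    return "default to mainframe"

print(placement({"kind": "dev_test"}))            # cloud (almost always)
print(placement({"sysplex_data_sharing": True}))  # mainframe (hard limit)
```

Note that the Sysplex check comes first and short-circuits everything else, mirroring the "hard limit" from Section 34.4.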

Applying the Framework to the Anchor Examples

CNB (Kwame/Lisa/Rob):

  • Regulatory reporting → Cloud (batch, periodic, read-heavy, marginal savings real) ✅
  • Dev/test environments → Cloud (obvious win) ✅
  • Core OLTP (ATM, wire, posting) → Mainframe (Sysplex, 5,800 TPS, sub-ms latency) ✅
  • Analytics warehouse → Cloud (CDC-fed replica, no write-back) ✅
  • Mobile API gateway → Hybrid (cloud gateway → z/OS Connect → CICS) ✅

Pinnacle Health (Diane/Ahmad):

  • Claims adjudication → Mainframe (12,000 TPS, CICS + DB2 + MQ, two-phase commit)
  • Dev/test → Cloud (Azure DevTest Labs, $1.8M MIPS savings)
  • Claims analytics → Cloud (CDC-replicated, fraud detection models on Azure ML)
  • EDI processing → Mainframe (high-volume batch, complex MQ integration)
  • Provider portal (low-volume inquiry) → Evaluate (could be cloud if <100 TPS, standard CICS)

Federal Benefits (Sandra/Marcus):

  • Eligibility verification → Mainframe (Sandra's pilot proved this — Section 34.4)
  • Benefits calculation → Mainframe (22M records, IMS-to-DB2 migrated, complex business rules)
  • Dev/test → Cloud (AWS GovCloud, security compliance requirements add cost)
  • Regulatory reporting → Cloud (quarterly, read-heavy, batch — ideal candidate)
  • Document generation (benefit letters) → Cloud (batch, output-only, no data mutation)

SecureFirst (Yuki/Carlos):

  • Internal admin tools → Cloud (12 transactions, 200/day — low-volume CICS success story)
  • Mobile API gateway → Hybrid (cloud gateway, mainframe backend via z/OS Connect)
  • Core banking OLTP → Mainframe (Carlos's big-bang proposal was killed for good reasons — Chapter 32)
  • Dev/test → Cloud (already moved, integrated with Jenkins/Git CI/CD pipeline)

The Cloud Migration Maturity Model

Organizations don't jump from "everything on mainframe" to "optimal hybrid" in one step. The journey has stages, and trying to skip stages leads to expensive failures.

Stage 1: Cloud Curious (6-12 months)

  • Move dev/test to cloud
  • Build team's cloud skills
  • Establish network connectivity (Direct Connect / ExpressRoute)
  • Learn the cloud's operational model
  • Deliverable: cloud landing zone, first dev/test environments running

Stage 2: Cloud Capable (12-24 months)

  • Migrate first batch workload (reporting or analytics)
  • Implement CDC for near-real-time data replication
  • Build monitoring and incident response for cloud COBOL
  • Measure actual TCO (not projected)
  • Deliverable: first production workload on cloud, validated TCO

Stage 3: Hybrid Optimized (24-48 months)

  • Multiple workloads running on cloud
  • Mature data synchronization (batch + CDC + API)
  • Integrated monitoring across mainframe and cloud
  • Team operates both environments confidently
  • Clear policies for what runs where
  • Deliverable: documented hybrid architecture, operational runbooks for both environments

Stage 4: Strategic Hybrid (48+ months)

  • Workload placement decisions are routine, data-driven
  • New workloads are deployed to the optimal platform from inception
  • Cost optimization is continuous (reserved instances, right-sizing, Tailored Fit Pricing)
  • The organization doesn't think about "mainframe vs. cloud" — it thinks about "where does this workload run best?"
  • Deliverable: mature hybrid operations, optimized TCO, strategic architecture

🔵 Spaced Review — Chapter 1 (z/OS Strengths): Review Section 1.3 on the z/OS I/O subsystem and Parallel Sysplex architecture. Those capabilities — dedicated I/O processors, FICON channels, Coupling Facility, WLM — are exactly the capabilities that cloud cannot replicate. The decision about what stays on the mainframe is fundamentally a question about which workloads need those capabilities.

🔵 Spaced Review — Chapter 23 (Batch Architecture): Review Section 23.2 on the batch window and critical path analysis. Rob's batch window expansion from 2:40 to 6:15 on cloud happened because cloud I/O can't match z/OS I/O throughput for random-access patterns. The batch window isn't just a time question — it's an I/O architecture question.

🔵 Spaced Review — Chapter 32 (Modernization Strategy): Review Section 32.3 on the Rehost strategy. This chapter is the deep dive on what "rehost to cloud" actually means in practice. Sandra's TCO analysis from Chapter 32 should be updated with the detailed cost categories from Section 34.6.


Chapter Summary

Let me give you the practitioner's summary — what I'd tell a colleague over coffee.

Cloud is real, and it works for some COBOL workloads. Dev/test, reporting, analytics, document generation, low-volume admin tools. These workloads benefit from elastic compute, modern tooling, and pay-per-use economics. If you haven't moved your dev/test to cloud yet, that's your first project.

Cloud doesn't work for high-volume OLTP, full CICS, or Sysplex-dependent workloads. Not because the technology isn't mature enough yet. Because the architectural gap is fundamental. z/OS's I/O subsystem, transaction management, and data sharing capabilities exist because of sixty years of hardware/software co-design. You can't replicate that on commodity hardware, and that's not an insult to commodity hardware — it's a different design point.

Hybrid is the answer. Not because it's a compromise, but because it's the architecture that puts each workload on its optimal platform. The mainframe does the heavy lifting — OLTP, core batch, system of record. The cloud does the looking — reporting, analytics, API serving, dev/test. The integration layer (batch extract, CDC, APIs) connects them.

TCO analysis must be honest. Marginal cost, not allocated cost. Migration project cost included. Operational cost included. Organizational disruption included. If it still shows 30%+ savings, proceed. If not, the business case is weak regardless of what the vendor's slide says.

Start with dev/test. Then reporting. Then stop and evaluate. The organizations that succeed at COBOL-to-cloud are the ones that take it one workload at a time, measure everything, and make the next decision based on data from the last one. The organizations that fail are the ones that commit to a three-year, all-workloads cloud migration plan before they've successfully moved a single batch job.

Rob Calloway, nine months after the regulatory reporting migration, was asked by CNB's CFO whether they should move core banking to the cloud.

Rob opened a spreadsheet. "Here's the actual TCO for the reporting migration. Here's the batch window comparison. Here's the cost-per-transaction data from the OLTP benchmark Kwame ran. And here's what I spent 340 hours of my time on this year."

The CFO studied the numbers. "So the reporting move was worth it."

"Yes. Solidly."

"And core banking would not be."

"Not unless you'd like to explain to the board why ATM authorization went from one millisecond to twenty-eight."

The CFO closed the spreadsheet. "Fair enough. What's next?"

Rob had the answer ready: "Analytics warehouse. Same pattern as reporting — read-only, batch-fed, elastic demand. And this time, I know what to budget."

That's what progress looks like. Not a PowerPoint slide with two boxes and an arrow. A spreadsheet with real numbers, a team with real experience, and a decision made with real data.


Looking Ahead

Chapter 35 will explore how AI and large language models can assist with COBOL code analysis, documentation, and modernization — including the capabilities and the significant limitations. Chapter 37 will bring together the hybrid architecture concepts from this chapter with the strangler fig patterns from Chapter 33 into a comprehensive hybrid architecture design for the HA banking system.

But first: the exercises, case studies, and project checkpoint for this chapter will give you hands-on practice evaluating cloud candidacy, calculating honest TCO, and designing the hybrid architecture for your HA banking system's reporting and analytics components.