Case Study 2: When the Mainframe Meets the Cloud — SecureFirst's Architecture Discovery
The Pitch
In early 2024, SecureFirst Retail Bank's executive team approved a mobile-first strategy. The centerpiece: a new mobile banking app that would offer real-time balance checks, instant transfers, bill payments, and loan applications — all with sub-second response times. The project was called "Project Velocity."
Carlos Vega, the mobile API architect, was hired specifically for this initiative. He came from a fintech startup where he'd built REST APIs on Node.js and Kubernetes. His résumé included zero mainframe experience, and frankly, he considered that a feature, not a bug.
"I was told the mainframe was a black box," Carlos remembers. "The API team would call COBOL programs through z/OS Connect. We didn't need to know what happened inside the box. That was the systems team's problem."
Yuki Nakamura, the DevOps lead, had a similar perspective. She'd spent five years building CI/CD pipelines for cloud-native applications. When she was brought onto Project Velocity, she assumed the mainframe would be just another deployment target — another endpoint she could wrap in an abstraction layer and automate.
They were both about to have a very educational year.
The Assumption
The Project Velocity architecture, as initially designed by Carlos and the mobile team:
┌──────────┐ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ Mobile │───▶│ API Gateway │───▶│ z/OS Connect │───▶│ CICS/COBOL │
│ App │◀───│ (AWS) │◀───│ (black box) │◀───│ (black box) │
└──────────┘ └─────────────┘ └──────────────┘ └──────────────┘
The assumptions embedded in this architecture:
- z/OS Connect is a simple pass-through. Just send JSON in, get JSON back. Like calling a microservice.
- CICS/COBOL programs are stateless. Each API call is independent. No session state to worry about.
- Response time is predictable. If the COBOL program completes in 30ms, the API will respond in 30ms plus network latency.
- Scaling is transparent. If you need more throughput, just add more z/OS Connect instances. Like adding Kubernetes pods.
- The mainframe team handles everything below z/OS Connect. The API team doesn't need mainframe knowledge.
Every single one of these assumptions was wrong.
The First Wake-Up Call: Response Time
Three months into development, Carlos's team had the first API endpoints working. Account balance inquiry — the simplest possible use case. A COBOL CICS program reads one DB2 row and returns the balance.
CICS statistics showed the transaction completed in 25ms. Network latency between AWS and the z/OS LPAR (connected via AWS Direct Connect) was measured at 8ms round-trip. Carlos expected a total API response time of about 40ms.
He got 620ms.
"I nearly fell out of my chair," Carlos says. "Six hundred milliseconds for a balance check? My Node.js prototype hit a Postgres database in 12ms."
Yuki started digging. The 620ms broke down as follows:
| Component | Time |
|---|---|
| Mobile app to API Gateway | 45ms |
| API Gateway to z/OS Connect | 8ms |
| z/OS Connect: TLS handshake | 180ms |
| z/OS Connect: JSON to COMMAREA transformation | 35ms |
| z/OS Connect: CICS attachment | 85ms |
| CICS transaction execution | 25ms |
| z/OS Connect: COMMAREA to JSON transformation | 30ms |
| z/OS Connect: TLS response | 12ms |
| Return path | 200ms (cumulative overhead) |
The TLS handshake was the biggest single contributor — 180ms for every request because z/OS Connect was establishing a new TLS connection to CICS for each API call. There was no connection pooling configured.
The CICS attachment overhead — 85ms to create and destroy a CICS task for each API call — was the second largest contributor. This wasn't a simple function call. It was the full CICS task lifecycle: task creation, program load (even from cache, there's dispatcher overhead), DB2 thread allocation, task termination, and thread deallocation.
"That's when I realized z/OS Connect isn't a pass-through," Carlos says. "It's a translation layer that does real work. And every piece of work has overhead that's invisible if you treat it as a black box."
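The budget in the table above can be tallied in a short sketch. The figures are copied from the measurements; the field names are illustrative, not real configuration keys:

```typescript
// Latency budget for the balance-inquiry call, using the measured
// figures from the table above (names are illustrative only).
const budgetMs: Record<string, number> = {
  mobileToGateway: 45,
  gatewayToZosConnect: 8,
  tlsHandshake: 180,   // new TLS session per call: no connection pooling
  jsonToCommarea: 35,  // request payload transformation
  cicsAttachment: 85,  // full CICS task lifecycle per request
  cicsExecution: 25,   // the only part CICS statistics report
  commareaToJson: 30,  // response payload transformation
  tlsResponse: 12,
  returnPath: 200,     // cumulative return-path overhead
};

const totalMs = Object.values(budgetMs).reduce((a, b) => a + b, 0);
console.log(totalMs); // 620 — versus the 25ms visible in CICS statistics
```

The point of writing the budget down this way is that the 25ms CICS number is one line out of nine; everything else is overhead that a black-box view never surfaces.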
The Second Wake-Up Call: WLM and the Lunch Rush
After tuning connection pooling (which brought the average response time down to 180ms — acceptable for a first release), the team went into beta testing with 500 internal employees.
For two weeks, everything was fine. Then they hit a Thursday — end-of-month, payday, and the day CNB's batch scheduler runs the monthly statement generation job. (SecureFirst shares a managed service provider with CNB for some infrastructure, and the provider's scheduling decisions affected SecureFirst's LPAR.)
Between 12:00 PM and 1:00 PM — the lunch hour, when mobile banking usage spiked — API response times jumped from 180ms to 2,400ms. Some requests timed out entirely.
Yuki's first instinct was to check the API Gateway and z/OS Connect. Both looked fine — low CPU, low memory, no errors. The CICS transaction was still completing in 25-30ms. But somehow, end-to-end response time was 13x higher.
She escalated to SecureFirst's z/OS systems programmer — a contractor named Tim who came in two days a week. Tim found the problem in ten minutes.
"Your z/OS Connect server address space is classified as a VELOCITY goal in WLM, which is fine," Tim explained. "But the CICS region that serves your API transactions is classified with the same service class as the batch CICS region that processes statement generation. When the statement job kicked off, WLM saw 800 batch CICS transactions competing with your 500 API requests — and it treated them all equally. Your API requests were waiting in the CICS dispatcher queue behind batch work."
The fix: a separate WLM service class for the API-serving CICS region, with a higher velocity goal than the batch CICS work. Twenty minutes to implement. But it required knowledge that nobody on the Project Velocity team possessed — an understanding of how WLM, CICS, and the z/OS dispatcher interact.
"That was the moment I realized I couldn't treat the mainframe as a black box," Yuki admits. "The performance characteristics of my API depended on z/OS internals that were completely outside my monitoring. I couldn't even see the WLM classification from my side."
The Third Wake-Up Call: Data Sharing and the Funds Transfer
The next feature was funds transfer — the same operation CNB runs as the XFER transaction. Carlos designed the API as two calls:
- POST /transfers — initiate the transfer (debit source account, credit target account)
- GET /transfers/{id} — check transfer status

In beta testing, the transfer worked perfectly — on a single LPAR. Then SecureFirst's systems programmer added a second CICS AOR for capacity (they were preparing for the public launch), and the transfer started failing intermittently.
The symptom: the POST /transfers call would succeed (return HTTP 200), but the GET /transfers/{id} call made one second later would return "transfer not found." Ten seconds later, the same GET would succeed.
The problem: SecureFirst had added the second CICS AOR on the same LPAR but had not configured it to share VSAM files through RLS (Record Level Sharing). The transfer record was written by CICS AOR1 to a local VSAM file. AOR2 had its own copy of the file (via CICS file definition), and the record wasn't there yet — it would only appear after the file was closed and reopened, or after the buffer was flushed.
"In my world — microservices with a shared database — this doesn't happen," Carlos says. "You write to Postgres, and every other service can read it immediately. On the mainframe, VSAM file sharing has rules I didn't know existed."
The solution required understanding VSAM RLS, CICS file control, and (for DB2 data) the data sharing group buffer pool coherence protocol. The team moved the transfer record to DB2 (where data sharing handled coherence automatically) and configured VSAM RLS for the remaining VSAM files.
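Until a coherence fix like RLS or DB2 is in place, the only client-side mitigation is to treat the status read as eventually consistent and poll with backoff. A minimal sketch of that pattern, with a hypothetical helper and injectable fetch function (this is not SecureFirst's actual code):

```typescript
// A GET issued right after POST /transfers may report "not found" until the
// record becomes visible on the AOR that serves the read. Poll with a
// growing delay instead of failing on the first miss.
type StatusFetch = (id: string) => Promise<{ found: boolean; status?: string }>;

async function pollTransferStatus(
  id: string,
  fetchStatus: StatusFetch,
  maxAttempts = 5,
  delayMs = 200,
): Promise<string> {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    const res = await fetchStatus(id);
    if (res.found) return res.status!; // record is visible on this region
    // Linear backoff: 200ms, 400ms, 600ms, ... before the next read
    await new Promise((r) => setTimeout(r, delayMs * attempt));
  }
  throw new Error(`transfer ${id} not visible after ${maxAttempts} attempts`);
}
```

This papers over the symptom rather than fixing it, which is why the team ultimately moved the record to DB2, where data sharing makes the write visible to every member immediately.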
The Realization
Six months into Project Velocity, Yuki called a team retrospective. Her notes from that meeting:
What we assumed:
- The mainframe is a black box. We call it, it responds.
- Performance is deterministic and independent of other workloads.
- Scaling means adding instances.
- We don't need mainframe expertise on the API team.
What we learned:
- The mainframe is an ecosystem. Performance depends on WLM, the dispatcher, the coupling facility, and interactions between address spaces.
- Response time depends on what else is running on the same LPAR (and in a Sysplex, on other LPARs).
- Scaling requires understanding CICS topology, DB2 thread management, and coupling facility capacity — not just adding instances.
- We need at least one person on the team who understands z/OS architecture. Not a mainframe expert — but someone who can read SMF data, understand WLM classifications, and diagnose problems that cross the API-to-mainframe boundary.
The New Architecture
Carlos redesigned the Project Velocity architecture:
┌──────────┐ ┌─────────────┐ ┌──────────────┐ ┌──────────────┐
│ Mobile │───▶│ API Gateway │───▶│ z/OS Connect │───▶│ CICS AOR │
│ App │◀───│ (AWS) │◀───│ (dedicated │◀───│ (API-only, │
└──────────┘ └─────────────┘ │ LPAR, tuned │ │ dedicated │
│ conn pools, │ │ WLM class, │
│ AT-TLS) │ │ DB2 entry │
└──────────────┘ │ threads) │
└──────┬───────┘
│
┌──────┴───────┐
│ DB2 (data │
│ sharing grp) │
└──────────────┘
Key changes:
- Dedicated CICS AOR for API workload. Separate from batch and traditional online CICS regions. Its own WLM service class with a strict velocity goal.
- Connection pooling. z/OS Connect maintains persistent connections to CICS, eliminating per-request TLS handshake and CICS attachment overhead.
- DB2 entry threads. The API CICS region uses DB2 entry threads for its high-volume transactions, pre-allocated and reused.
- VSAM RLS for all shared files. No more local file definitions that can create coherence problems.
- AT-TLS (Application Transparent TLS). TLS is handled by the z/OS TCP/IP stack, not by z/OS Connect, reducing application-level TLS overhead.
- End-to-end monitoring. SMF records from z/OS Connect, CICS, and DB2 are fed to a centralized monitoring dashboard that Yuki's team can access. No more blind spots at the mainframe boundary.
Result: average API response time dropped to 85ms. Peak response time (during heavy batch periods) stays under 150ms. The mobile app launched to 3 million customers in Q4 2024.
The Cultural Shift
The most lasting impact of Project Velocity wasn't technical — it was organizational.
"We used to have a wall between the cloud team and the mainframe team," Yuki says. "The cloud team designed APIs. The mainframe team ran CICS. We communicated through tickets. That doesn't work when your API's performance depends on a WLM policy you've never heard of."
SecureFirst now requires every API developer to complete a one-week z/OS fundamentals course. Not to make them mainframe programmers — but to give them the vocabulary and mental model to participate in architectural discussions that span both worlds.
Carlos summarizes the lesson: "The best architects understand both worlds. I was a cloud-native architect who was blind to the mainframe. Now I'm a hybrid architect. And honestly, some of the things z/OS does — coupling facility locks, data sharing, WLM goal management — are things the cloud world is still trying to figure out with eventually-consistent databases and Kubernetes resource quotas. The mainframe solved these problems decades ago. We just didn't know it."
Kwame Mensah, who consulted informally with Yuki's team during the WLM incident, puts it differently: "The mainframe isn't legacy. It's the foundation. You can build whatever you want on top of it — REST APIs, mobile apps, cloud integration — but if you don't understand the foundation, your building will wobble."
Discussion Questions
1. The Black Box Assumption: Carlos initially treated the mainframe as a black box. In what specific ways did this assumption fail? Are there legitimate scenarios where treating a backend system as a black box is appropriate? Where is the line?
2. Response Time Decomposition: The initial 620ms response time broke down into nine components, with the TLS handshake (180ms) and CICS attachment (85ms) being the largest single contributors. How could the team have discovered this decomposition before going into development? What z/OS tools or techniques would have revealed the overhead?
3. WLM and Multi-Tenant Systems: The "lunch rush" problem occurred because API requests and batch work shared a WLM service class. In a cloud-native architecture, how would you solve the equivalent problem (one workload starving another for resources)? How does the z/OS WLM approach compare to Kubernetes resource quotas and pod priorities?
4. Data Coherence: The VSAM file coherence problem (transfer not found on the second AOR) is analogous to eventual consistency in distributed systems. But Carlos expected immediate consistency because he was used to PostgreSQL. How should API architects think about consistency guarantees when the backend is a z/OS system with VSAM and DB2? What guarantees does DB2 data sharing provide that VSAM without RLS does not?
5. Organizational Design: SecureFirst's "wall between cloud team and mainframe team" is common in enterprises. What organizational structures or processes would you implement to prevent the communication failures described in this case study? Is "every API developer takes a z/OS course" sufficient?
6. Cost of Ignorance: Estimate the total cost of the three problems described in this case study — in development time, delayed launch, and risk. How much of that cost could have been avoided with an upfront z/OS architectural review?
7. Carlos's Observation: Carlos says the mainframe "solved these problems decades ago." Identify three specific technical capabilities from this chapter that the cloud/distributed world is still evolving solutions for. Is Carlos right, or is he oversimplifying? What does the cloud do better than the mainframe?