Case Study 2: SecureFirst's CICS Modernization
Adding Web-Facing Regions for Mobile Banking on a Single-LPAR Environment
Background
SecureFirst Retail Bank is a regional bank with 180 branches, 800 ATMs, and 1.2 million retail customers. Their mainframe environment is modest by enterprise standards: a single z14 with one LPAR, no Parallel Sysplex, and a CICS topology that hasn't changed materially in a decade.
The business is launching a mobile banking application. Carlos Vega (API architect) and Yuki Nakamura (DevOps lead) must add RESTful API access to the existing CICS-based banking applications — without disrupting the 3270 and ATM services that currently process 120,000 transactions per day.
This case study follows their architecture decisions, the constraints imposed by a single-LPAR environment, and the compromises they made between ideal architecture and practical reality.
Starting Point: The Existing Topology
Single LPAR — z14 Model 703
┌─────────────────────────────────┐
│ SFBTORA1 (TOR) │
│ - 800 3270 terminals │
│ - 800 ATM connections │
│ - Static routing to AORs │
│ │
│ SFBAORA1 (AOR) │
│ - All transactions (A–M) │
│ - MAXTASK=100 │
│ - DB2 connection │
│ - 12 local VSAM files │
│ │
│ SFBAORA2 (AOR) │
│ - All transactions (N–Z) │
│ - MAXTASK=100 │
│ - DB2 connection │
│ - 12 local VSAM files │
│ │
│ SFBFORA1 (FOR) │
│ - 8 shared VSAM files │
│ - Customer master │
│ - Account master │
│ │
│ No CICSPlex SM │
│ No Sysplex features │
│ Custom routing program (1997) │
└─────────────────────────────────┘
Immediate Problems
Static transaction routing. Transactions A–M go to SFBAORA1; N–Z go to SFBAORA2. This was implemented in 1997 when the workload was smaller. The alphabetical split was arbitrary and has created a persistent imbalance: AOR1 runs at 65% utilization while AOR2 runs at 40%, because high-volume transactions (ACCT, DPST, BLNC) fall in the A–M range.
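To see why the alphabetical split ignores volume, the rule can be sketched in a few lines. This is a hypothetical Python illustration, not SecureFirst's 1997 routing program (which the case study does not show); the function name `route_static` and the error handling are assumptions made for the example.

```python
# Hypothetical sketch of the static rule; not SecureFirst's actual routing program.
AOR_A_TO_M = "SFBAORA1"
AOR_N_TO_Z = "SFBAORA2"


def route_static(tranid: str) -> str:
    """Pick the target AOR from the first letter of the transaction ID."""
    first = tranid[:1].upper()
    if not ("A" <= first <= "Z"):
        raise ValueError(f"unroutable transaction ID: {tranid!r}")
    # The rule looks only at the letter, never at volume or region health.
    return AOR_A_TO_M if first <= "M" else AOR_N_TO_Z


# The three high-volume transactions named above all land on the same region.
for tranid in ("ACCT", "DPST", "BLNC"):
    print(tranid, "->", route_static(tranid))
```

Because the decision depends only on the transaction name, no amount of spare capacity on SFBAORA2 can relieve SFBAORA1.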
No CICSPlex SM. Health monitoring is manual. If SFBAORA1 goes down, the operator must manually update the routing program to send A–M transactions to SFBAORA2. Mean time to redirect: 8–12 minutes, depending on operator availability.
Local VSAM files on AORs. Each AOR owns 12 VSAM files locally. Some of these files are duplicated between AORs (reference data), some are partitioned (each AOR owns its alphabetical segment of transaction logs). This creates data inconsistency risks and makes AOR failover complex — if SFBAORA1 fails, its local files are unavailable until it restarts.
No Sysplex. No coupling facility, no shared TS, no named counters, no data sharing. Every Sysplex-aware feature discussed in section 13.6 is unavailable.
Requirements for Mobile Banking
Carlos Vega gathered the following requirements from the product team:
- RESTful JSON API for mobile app — balance inquiry, funds transfer, bill payment, transaction history
- Target volume: 500 TPS at launch, growing 40% per year
- Response time: 95% under 300ms
- Availability: 99.9% (about 8.8 hours of unplanned downtime per year)
- Security: OAuth 2.0 token-based authentication, TLS 1.2 minimum, PCI DSS compliance
- No changes to existing COBOL programs — the same business logic must serve 3270, ATM, and mobile
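The availability target translates into a yearly downtime budget by simple arithmetic. The sketch below is illustrative only; it assumes a 365-day year, and the function name `downtime_budget_hours` is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years (an assumption)


def downtime_budget_hours(availability: float) -> float:
    """Return the hours per year a service may be down at a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


# 99.9% leaves roughly 8.76 hours per year; 99.99% leaves under one hour.
for target in (0.999, 0.9999):
    print(f"{target:.2%} -> {downtime_budget_hours(target):.2f} hours/year")
```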
The No-Code-Change Constraint
This constraint is critical. SecureFirst's COBOL programs were written over 25 years by developers who have since retired. The programs work. They are tested by 20 years of production use. Modifying them risks introducing defects in code that nobody fully understands.
The architecture must invoke the existing programs from the mobile channel without modifying them. CICS TS 5.6 supports this through the Liberty JVM server, which can expose COBOL programs as RESTful services using CICS channels and containers or COMMAREA-based DPL (Distributed Program Link).
Architecture Decision Process
Applying the Five-Question Framework
1. What are the workload channels? Three channels: 3270 and ATM (both existing) and mobile API (new). The 3270 and ATM channels are stable; mobile is new and growing rapidly.
2. What are the failure domains? Mobile must not impact 3270/ATM. A code defect in the API layer, a mobile traffic surge, or a security incident on the mobile channel must not degrade branch or ATM operations.
3. What are the performance tiers? 3270/ATM: current target is < 500ms (informal). Mobile API: < 300ms (formal SLA with mobile app team). Different targets suggest different WLM service classes — but without Sysplex, WLM integration with CPSM routing is limited to single-image capabilities.
4. What data is shared? All channels access the same customer/account data. Currently split between DB2 (transactional data) and VSAM on the FOR (customer/account master). Without data sharing, all AORs must access data through the same DB2 instance and the same FOR.
5. What must scale independently? Mobile will grow 40% per year. 3270 is flat. ATM is declining slightly. Mobile must scale without affecting the traditional channel.
The Proposed Topology
Carlos and Yuki's initial proposal:
Single LPAR — z14 Model 703
┌───────────────────────────────────────────┐
│ SFBTORA1 (TOR - 3270/ATM) [EXISTING] │
│ - 3270 terminals │
│ - ATM connections │
│ - Dynamic routing via CPSM │
│ │
│ SFBTORW1 (TOR - Mobile API) [NEW] │
│ - Liberty JVM server │
│ - REST/JSON endpoint │
│ - OAuth 2.0 token validation │
│ - TLS 1.2 termination │
│ │
│ SFBAORA1 (AOR - Core banking) [EXISTING] │
│ - MAXTASK=150 (increased from 100) │
│ - DB2 connection │
│ SFBAORA2 (AOR - Core banking) [EXISTING] │
│ - MAXTASK=150 (increased from 100) │
│ - DB2 connection │
│ │
│ SFBAORW1 (AOR - API transactions) [NEW] │
│ - MAXTASK=200 │
│ - DB2 connection │
│ SFBAORW2 (AOR - API transactions) [NEW] │
│ - MAXTASK=200 │
│ - DB2 connection │
│ │
│ SFBFORA1 (FOR) [EXISTING] │
│ - Customer master VSAM │
│ - Account master VSAM │
│ │
│ SFBCMAS1 (CICSPlex SM CMAS) [NEW] │
│ - Workload management │
│ - Health monitoring │
│ - BAS resource management │
└───────────────────────────────────────────┘
Total: 8 regions on 1 LPAR (up from 4 regions)
Key Design Decisions and Trade-offs
Decision 1: Separate API TOR with Liberty JVM Server
The Liberty JVM server hosts the REST/JSON interface layer. It receives HTTP requests, validates OAuth tokens, extracts parameters, and invokes the COBOL programs via CICS channels/containers or COMMAREA-based EXEC CICS LINK.
Why a separate TOR? The Liberty JVM server consumes significant resources (JVM heap, thread pools, SSL processing). Running it in the same region as the 3270/ATM TOR would create resource contention between the traditional terminal handling (low CPU, low memory) and the API handling (higher CPU for JSON parsing and TLS, higher memory for JVM heap).
Trade-off accepted: Two TORs on one LPAR means both share the same hardware. A CPU-intensive mobile surge will still affect the traditional channel at the hardware level, even though the CICS regions are isolated. True hardware isolation requires a second LPAR (or a second machine), which SecureFirst cannot justify yet.
Carlos documented this trade-off explicitly: "We accept shared hardware risk. If mobile volume grows beyond 2,000 TPS, we will need a second LPAR. The current architecture supports this migration — add a second LPAR, move SFBTORW1 and the API AORs to it, change MRO to IPIC for cross-LPAR connections."
Decision 2: Dedicated API AOR Group
SFBAORW1 and SFBAORW2 are dedicated to mobile API transactions. They run the same COBOL programs as the core banking AORs, but they are separate CICS regions with separate MAXTASK limits, separate DSA allocations, and separate WLM service classes.
Why separate? The same reason CNB separates channels: failure isolation and independent scaling. If a mobile API defect causes SFBAORW1 to abend, SFBAORA1 and SFBAORA2 continue serving 3270/ATM transactions without interruption.
Why MAXTASK=200 for API AORs (vs. 150 for core AORs)? Mobile API transactions are stateless, short-lived, and individually lighter than 3270 pseudo-conversational transactions. Each API request is a single EXEC CICS LINK — no BMS maps, no pseudo-conversational state. More concurrent tasks can run with less per-task resource consumption.
Decision 3: Shared FOR — The Compromise
All four AORs (2 core, 2 API) function-ship to SFBFORA1. This means the FOR is a shared dependency across both channels — a violation of the channel isolation principle.
Why accept this? Adding a second FOR would require either duplicating the VSAM files (creating data consistency challenges without Sysplex data sharing) or splitting the files (customer master on one FOR, account master on another, which doesn't reduce cross-channel dependency).
Mitigation: The team monitors FOR response time separately for core and API function-shipped requests. If FOR latency impacts one channel, they can temporarily drain the other channel's requests to reduce FOR load — a manual intervention, but a documented one.
Strategic direction: Migrate the high-volume VSAM files (customer master, account master) to DB2 within 18 months. Once in DB2, both AOR groups access the data directly through their DB2 connections. The FOR handles only low-volume VSAM files (audit logs, configuration data), reducing its criticality as a shared dependency.
Decision 4: CICSPlex SM — Single CMAS
SFBCMAS1 provides workload management, health monitoring, and BAS for all 7 managed regions. With a single LPAR, only one CMAS is feasible because there is no second LPAR to host a standby.
Risk acknowledged: If the CMAS fails, routing continues on cached data (MAS agents), but health monitoring and new deployments are suspended until the CMAS restarts. The CMAS is configured for automatic restart with a startup time under 60 seconds.
Mitigation: The MAS agents in each region perform local health checks (DSA usage, MAXTASK utilization) and can self-remove from routing pools without CMAS direction. This provides a safety net during the CMAS restart window.
Decision 5: VSAM File Consolidation on AORs
The 12 local VSAM files on each AOR were analyzed:
- 4 files: reference data (product codes, rate tables). Coupling facility data tables would be the usual target, but SecureFirst has no coupling facility, so these were moved to DB2 reference tables accessible from all AORs.
- 4 files: transaction logs partitioned by AOR — consolidated on the FOR to eliminate the AOR-local file dependency and simplify failover.
- 4 files: temporary work files — converted to CICS temporary storage queues (region-local is acceptable since these are intra-transaction, not inter-transaction).
After consolidation, the AORs own no VSAM files. This means an AOR restart is clean — no file recovery, no enqueue resolution, no data consistency checks. Restart time dropped from 90 seconds to 15 seconds.
The API Integration Pattern
The mobile API flow through the new topology:
Mobile App
│
▼ HTTPS (TLS 1.2)
┌──────────────┐
│ SFBTORW1 │ 1. TLS termination
│ (API TOR) │ 2. OAuth 2.0 token validation
│ Liberty JVM │ 3. JSON parsing
│ │ 4. Map to CICS transaction
└──────┬───────┘
│ MRO (transaction routing via CPSM)
▼
┌──────────────┐
│ SFBAORW1 │ 5. EXEC CICS LINK to COBOL program
│ (API AOR) │ 6. COBOL program executes business logic
│ │ 7. DB2 access (direct)
│ │ 8. VSAM access (function shipped to FOR)
└──────┬───────┘
│ MRO (function shipping)
▼
┌──────────────┐
│ SFBFORA1 │ 9. VSAM READ/WRITE
│ (FOR) │ 10. Return data to AOR
└──────────────┘
│
▼ (back through AOR → TOR → mobile app)
The COBOL programs don't know they were invoked from a mobile app. They receive a COMMAREA (or CICS channel/container) with the same structure used by the 3270 interface. The Liberty JVM server in the API TOR handles the translation between REST/JSON and COMMAREA/container.
Example: Balance Inquiry
The mobile app sends:
POST /api/v1/accounts/balance
{
"account_number": "4521-8890-1234",
"auth_token": "eyJhbGciOiJSUzI1NiIs..."
}
The request is then processed as follows:
1. The Liberty JVM server validates the OAuth token (locally, using the OAuth server's public key)
2. Liberty extracts the account number
3. Liberty populates the COMMAREA for program SFBBLNC0
4. Liberty issues EXEC CICS LINK PROGRAM('SFBBLNC0') with the COMMAREA
5. CPSM routes the LINK to SFBAORW1 or SFBAORW2
6. SFBBLNC0 executes: it reads the account balance from DB2 and the hold information from VSAM (function shipped to FOR)
7. SFBBLNC0 returns the COMMAREA with balance data
8. The Liberty JVM server maps the COMMAREA to the JSON response
{
"account_number": "4521-8890-1234",
"available_balance": 4521.83,
"current_balance": 4871.83,
"holds": [
{"amount": 350.00, "description": "Pending deposit", "date": "2025-11-14"}
],
"as_of": "2025-11-15T14:32:01Z"
}
Zero changes to SFBBLNC0. The program that has run correctly for 15 years continues to run correctly — it just has a new caller.
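The translation step can be illustrated with a small sketch of fixed-width COMMAREA packing and unpacking. Everything about the layout here is an assumption: the case study does not show the SFBBLNC0 copybook, so the field widths (a 14-character account number followed by two unsigned 11-digit balances with two implied decimals), the EBCDIC code page `cp037`, and the helper names are hypothetical, and the holds array is left out. In the real topology this mapping runs as Java inside Liberty; Python is used only to keep the example short.

```python
from decimal import Decimal

# Hypothetical layout; the real SFBBLNC0 copybook is not shown in the case study.
CODEC = "cp037"   # assumed EBCDIC code page
ACCT_LEN = 14     # e.g. PIC X(14)
BAL_LEN = 11      # e.g. PIC 9(9)V99, unsigned DISPLAY


def build_request(account_number: str) -> bytes:
    """Pack the account number into a fixed-width EBCDIC COMMAREA."""
    if len(account_number) > ACCT_LEN:
        raise ValueError("account number too long for the COMMAREA field")
    text = account_number.ljust(ACCT_LEN) + "0" * (2 * BAL_LEN)
    return text.encode(CODEC)


def parse_response(commarea: bytes) -> dict:
    """Unpack the returned COMMAREA into values ready for JSON mapping."""
    text = commarea.decode(CODEC)
    avail_end = ACCT_LEN + BAL_LEN
    return {
        "account_number": text[:ACCT_LEN].rstrip(),
        # Two implied decimal places: divide the digit string by 100.
        "available_balance": Decimal(text[ACCT_LEN:avail_end]) / 100,
        "current_balance": Decimal(text[avail_end:avail_end + BAL_LEN]) / 100,
    }
```

Decimal is used instead of float so that currency amounts survive the round trip exactly.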
Security Architecture
PCI DSS Compliance on a Single LPAR
PCI DSS does not strictly mandate network segmentation, but it strongly recommends isolating the cardholder data environment from other systems to limit scope and risk. On a single LPAR, "segmentation" is achieved through:
RACF profiles per region. Each CICS region runs under a unique RACF user ID with specific authorities:
- SFBTORW1: authority to accept TCP/IP connections and route transactions. No DB2 authority. No VSAM authority.
- SFBAORW1/W2: authority to access DB2 tables required for mobile transactions. VSAM access only through function shipping (RACF profile on FOR controls file access).
- SFBFORA1: authority to open specific VSAM files. No DB2 authority. No transaction execution authority.
CICS security domains. Transactions defined in the API AORs are restricted to the programs required for mobile banking. A compromised API AOR cannot execute batch programs, administrative transactions, or programs not explicitly defined in its CSD.
TLS termination at the TOR. The API TOR terminates TLS and validates OAuth tokens before any transaction reaches an AOR. Invalid requests are rejected at the TOR level — they never reach the application.
Audit logging. Every API transaction generates an SMF 110 record with: timestamp, user identity (derived from OAuth token), transaction ID, target AOR, response code, and elapsed time. These records feed into SecureFirst's SIEM for real-time threat detection.
OAuth 2.0 Token Flow
Mobile App ──(1)──▶ OAuth Server (separate system)
◀──(2)── Access token (JWT, 15-min expiry)
Mobile App ──(3)──▶ SFBTORW1 (API TOR)
│
├─ (4) Validate JWT signature (public key in Liberty keystore)
├─ (5) Check token expiry
├─ (6) Extract user identity (subject claim)
├─ (7) Route transaction with user identity
│
▼
SFBAORW1 (AOR) ── (8) Execute under user's RACF identity
The API TOR does not contact the OAuth server for every request — it validates the JWT locally using the OAuth server's public key. Token revocation is handled by short expiry times (15 minutes) and a cached revocation list refreshed every 60 seconds.
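The local checks that follow signature validation (expiry, identity extraction, and the revocation lookup) can be sketched as below. This is a partial illustration under stated assumptions: signature verification (step 4) is deliberately omitted because it needs an RSA library and must always run first in a real deployment; keying the revocation list on the `jti` claim is an assumption, since the case study does not say how revoked tokens are identified; and the 60-second refresh of the list is not modeled. The function names are hypothetical.

```python
import base64
import json


def decode_claims(jwt: str) -> dict:
    """Decode the payload segment of a JWT WITHOUT verifying its signature.

    Only safe to call after the signature has been verified elsewhere.
    """
    _header_b64, payload_b64, _signature_b64 = jwt.split(".")
    # JWT segments are base64url without padding; restore it before decoding.
    padded = payload_b64 + "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(padded))


def accept(claims: dict, now: float, revoked_ids: set) -> bool:
    """Apply the expiry and revocation checks to already-verified claims."""
    if claims["exp"] <= now:               # step 5: token has expired
        return False
    if claims.get("jti") in revoked_ids:   # cached revocation list (assumed keyed on jti)
        return False
    return True


def user_identity(claims: dict) -> str:
    """Step 6: the subject claim carries the user identity."""
    return claims["sub"]
```

Passing `now` in as a parameter, rather than reading the clock inside `accept`, keeps the check deterministic and easy to test.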
Capacity Planning and Growth Strategy
Year 1 Capacity
| Channel | Peak TPS | AOR Capacity | Utilization |
|---|---|---|---|
| 3270/ATM | 300 | 2 AORs × 150 MAXTASK = 300 effective | ~100% at peak |
| Mobile API | 500 | 2 AORs × 200 MAXTASK = 400 effective | ~125% at peak* |
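*The table compares peak TPS with MAXTASK, which overstates utilization: MAXTASK caps concurrent tasks, not throughput. The API AOR MAXTASK of 200 per region gives 400 total, but API transactions are so short-lived (avg 40ms) that the actual number of concurrent tasks at 500 TPS is 500 × 0.04 = 20. The effective limit is CPU, not MAXTASK. Each AOR can handle 500 TPS individually.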
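The footnote's arithmetic is an application of Little's Law (average concurrency equals arrival rate times average time in the system). A minimal sketch, using only the figures quoted in the footnote and hypothetical function names:

```python
def concurrent_tasks(tps: float, avg_response_s: float) -> float:
    """Little's Law: average tasks in flight = arrival rate x time in system."""
    return tps * avg_response_s


def tps_ceiling_from_maxtask(maxtask: int, avg_response_s: float) -> float:
    """Throughput at which MAXTASK itself would become the bottleneck."""
    return maxtask / avg_response_s


in_flight = concurrent_tasks(tps=500, avg_response_s=0.040)
ceiling = tps_ceiling_from_maxtask(maxtask=400, avg_response_s=0.040)
print(f"about {in_flight:.0f} concurrent tasks at 500 TPS")
print(f"MAXTASK would bind only near {ceiling:.0f} TPS")
```

With 40ms transactions the MAXTASK ceiling sits far above the launch volume, which is why CPU is the constraint to watch. The estimate holds only while response time stays near 40ms; if the FOR or DB2 slows down, concurrency rises in proportion.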
Growth Projection
| Year | Mobile TPS | AOR Requirement | Topology Change Needed |
|---|---|---|---|
| 1 | 500 | 2 AORs (current) | None |
| 2 | 700 | 2 AORs (headroom sufficient) | None |
| 3 | 980 | 2 AORs (approaching limit) | Evaluate 3rd API AOR |
| 4 | 1,372 | 3 AORs | Add SFBAORW3 |
| 5 | 1,921 | 3–4 AORs | Consider 2nd LPAR |
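The Mobile TPS column is compound growth at 40% per year from the 500 TPS launch volume, which the sketch below reproduces. The second function adds a simplifying assumption that is not in the table: 3270/ATM traffic is treated as a flat 300 TPS peak, in order to estimate when the combined volume reaches 2,000 TPS. Function names are hypothetical.

```python
def project_tps(base_tps: float, annual_growth: float, years: int) -> list:
    """Return the projected TPS for years 1..years, rounded to whole TPS."""
    return [round(base_tps * (1 + annual_growth) ** (year - 1))
            for year in range(1, years + 1)]


def first_year_at_or_above(mobile_by_year: list, flat_tps: int, threshold: int):
    """Return the first year (1-based) where mobile + flat_tps >= threshold."""
    for year, mobile in enumerate(mobile_by_year, start=1):
        if mobile + flat_tps >= threshold:
            return year
    return None


mobile = project_tps(base_tps=500, annual_growth=0.40, years=5)
print(mobile)
# Assumption: 3270/ATM stays at a flat 300 TPS peak for the whole period.
print(first_year_at_or_above(mobile, flat_tps=300, threshold=2000))
```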
The Sysplex Decision Point
At approximately 2,000 TPS across all channels (projected year 4–5), SecureFirst will face a fundamental constraint: a single LPAR cannot provide 99.99% availability because it is a single point of hardware failure.
At that point, SecureFirst must decide:
1. Add a second LPAR on the same machine. Provides some isolation but shared hardware (power supply, cooling, I/O subsystem). Requires z/OS Sysplex configuration.
2. Add a second machine. Full hardware isolation. Requires Sysplex with coupling facility.
3. Hybrid cloud. Offload the mobile API to a cloud-hosted service that calls back to the mainframe for data. Reduces mainframe load but introduces network latency and a new failure domain.
Carlos and Yuki documented option 1 as the 3-year plan and option 2 as the 5-year plan. The current topology is designed to make both migrations straightforward: the API TOR and AORs move to the second LPAR, MRO connections become IPIC connections, and CICSPlex SM gains a second CMAS on the new LPAR.
Lessons for Smaller Shops
SecureFirst's experience illustrates principles that apply to any organization adding modern channels to a traditional CICS environment:
- Separate the new channel's TOR and AORs from day one. The cost of adding regions is low. The cost of untangling a shared-region problem under production pressure is high.
- Deploy CICSPlex SM even if you think you don't need it. SecureFirst went from "we only have 4 regions, we don't need CPSM" to "we have 8 regions and CPSM is essential for managing dynamic routing, health monitoring, and deployments." The management overhead of 8 or more regions without CPSM is unsustainable.
- Design for the next migration, not just today's deployment. Every decision Carlos and Yuki made was evaluated against the question: "Will this make the eventual Sysplex migration harder or easier?" Naming conventions that encode LPAR identity. MRO connections that can be replaced with IPIC. AOR configurations that are portable across LPARs. These choices cost nothing today and save months later.
- The FOR is your hidden bottleneck. In a VSAM-based environment without data sharing, the FOR is the shared dependency you can't eliminate. Monitor it aggressively. Plan the DB2 migration for your highest-volume files. The FOR should be a shrinking component of your architecture, not a growing one.
- Security is architecture, not a layer. PCI DSS compliance drove several of SecureFirst's topology decisions: TOR separation (so the API entry point has no data access), AOR RACF profiles (so compromised regions have limited blast radius), and audit logging (so every API call is traceable). These weren't afterthoughts; they were design inputs.
Discussion Questions
- Carlos chose to have the Liberty JVM server in the API TOR rather than in the API AORs. What would change if the JVM server were in the AORs instead? What are the trade-offs?
- SecureFirst's shared FOR is a known compromise. Design an alternative architecture that eliminates the shared FOR dependency without Sysplex features. What additional complexity does your alternative introduce?
- The "no code change" constraint forced a specific integration pattern (COMMAREA-based DPL from Liberty). What capabilities would be available if SecureFirst were willing to modify their COBOL programs? Would the modifications be worth the risk?
- Compare SecureFirst's single-LPAR, 8-region topology with CNB's 4-LPAR, 16-region topology. Identify three specific architectural compromises SecureFirst made that CNB avoided, and for each, assess whether the compromise is acceptable given SecureFirst's scale.
- SecureFirst projects needing a Sysplex in 4–5 years. What preparations should they begin now (in the single-LPAR phase) to minimize the future Sysplex migration effort?