In This Chapter
- Regions, Routing, Sysplex Coupling, and Multi-Region Operation (MRO)
- 13.1 Beyond EXEC CICS — Understanding CICS as a Platform
- 13.2 Region Topology — TOR, AOR, FOR and Why You Separate Them
- 13.3 Transaction Routing — Getting Work to the Right Place
- 13.4 MRO and ISC — Cross-Region Communication
- 13.5 CICSPlex SM — Managing the Enterprise
- 13.6 Sysplex-Aware CICS — High Availability at Platform Scale
- 13.7 Designing a CICS Topology — The Architecture Decision Framework
- Project Checkpoint: CICS Region Topology for the HA Banking System
- Production Considerations
- Summary
- Spaced Review
Chapter 13: CICS Architecture for Architects
Regions, Routing, Sysplex Coupling, and Multi-Region Operation (MRO)
"Every architect who's ever redesigned a CICS topology in production at 2 AM will tell you the same thing: the topology you draw on a whiteboard and the topology that survives contact with real workloads are never the same. Understanding why they diverge is the difference between a programmer and an architect." — Kwame Mensah, Chief Mainframe Architect, Continental National Bank
Welcome to Part III. Everything changes here.
Parts I and II built your foundation — Sysplex architecture, DB2 internals, batch optimization, JCL mastery. You wrote code. You tuned queries. You shaved hours off batch windows. All of that matters, and none of it prepares you for what CICS demands.
CICS is the beating heart of online transaction processing on z/OS. Every ATM withdrawal, every insurance claim lookup, every benefits query that happens in real time on a mainframe passes through CICS. At Continental National Bank, that means 500 million transactions per day. Not batch records — individual, real-time, human-initiated transactions where a customer is staring at a screen waiting for an answer.
This chapter is the gateway to CICS mastery. We will not write a single EXEC CICS SEND MAP command. Instead, we will think like architects — understanding the platform that makes those commands possible, the topology decisions that determine whether those commands execute in 50 milliseconds or 5 seconds, and the failure modes that determine whether those commands execute at all.
🚪 GATEWAY CHAPTER — This chapter establishes the architectural foundation for all of Part III. Chapters 14–18 assume you understand region topology, routing, MRO, and CICSPlex SM. If any section here feels unfamiliar after your first read, stop and re-read it before moving forward. The cost of a shaky foundation multiplies with every subsequent chapter.
13.1 Beyond EXEC CICS — Understanding CICS as a Platform
Most COBOL programmers encounter CICS as a set of API commands: EXEC CICS READ, EXEC CICS SEND, EXEC CICS RETURN. They learn to write programs that run "inside CICS." They learn pseudo-conversational design. They learn BMS maps. And they develop a mental model that is fundamentally incomplete: CICS as an application server that runs their programs.
That mental model will fail you at architect level.
🔄 THRESHOLD CONCEPT — CICS is a transaction manager, not an application server. Once you internalize this distinction, everything about region topology, routing, MRO, and recovery design clicks into place. Until you internalize it, every architectural decision will feel arbitrary.
What "Transaction Manager" Actually Means
An application server runs your code. A transaction manager ensures your work either completes entirely or doesn't happen at all — and it does this across multiple resources, multiple regions, and multiple systems simultaneously.
Consider what happens when a CNB customer transfers $5,000 from savings to checking at an ATM:
- The ATM terminal connects to a Terminal-Owning Region (TOR)
- The TOR routes the transaction to an Application-Owning Region (AOR)
- The AOR program reads the savings balance from DB2
- The AOR program decrements savings by $5,000
- The AOR program increments checking by $5,000
- The AOR program writes an audit record to a VSAM file
- The TOR sends the response back to the ATM
Steps 3–6 must be atomic. If the system crashes between steps 4 and 5, the customer has lost $5,000. CICS doesn't merely run steps 3–6 — it manages the transaction that encompasses all of them. It coordinates with DB2's lock manager, with VSAM's recovery log, with the journal, with the CICS system log. If anything goes wrong at any point, CICS rolls back every change across every resource manager.
That is what a transaction manager does. It doesn't just host your program — it guarantees the integrity of work that spans multiple resources, multiple programs, and potentially multiple regions and systems.
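The all-or-nothing behavior described above can be modeled in miniature. The following Python sketch is illustrative only — the `UnitOfWork` class and its methods are invented for this chapter, not CICS APIs — but it shows the essential mechanic: record a before-image of every change so that a failure at any point can restore all of them.

```python
class UnitOfWork:
    """Toy model of a transaction manager's all-or-nothing guarantee.

    Illustrative only: the class, methods, and stores are invented,
    not CICS APIs.
    """
    def __init__(self):
        self.undo_log = []   # (store, key, before-image) entries

    def update(self, store, key, new_value):
        # Record the before-image so the change can be undone, then apply it.
        self.undo_log.append((store, key, store[key]))
        store[key] = new_value

    def commit(self):
        self.undo_log.clear()   # changes become permanent

    def rollback(self):
        # Undo in reverse order, restoring every before-image.
        for store, key, old in reversed(self.undo_log):
            store[key] = old
        self.undo_log.clear()

savings = {"balance": 12000}
checking = {"balance": 3000}

uow = UnitOfWork()
uow.update(savings, "balance", savings["balance"] - 5000)
# Simulate a crash between the debit and the matching credit:
uow.rollback()
assert savings["balance"] == 12000   # the customer lost nothing
```

The real coordination spans DB2, VSAM, and the CICS system log rather than two Python dictionaries, but the invariant is the same: no partial outcome is ever visible.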
The Evolution to Enterprise Scale
CICS began in 1968 as a single-address-space system for interactive computing. For two decades, a "CICS system" meant one region running everything — terminals, applications, files, all in one address space.
That model broke under three pressures:
Storage constraint. A single CICS region runs in a single z/OS address space. Even with 64-bit addressing for data, the region's below-the-bar storage is finite. Pack too many programs, too many terminals, too many open files into one region, and you hit storage limits.
Failure isolation. In a single region, one badly behaved transaction can bring down the entire system. A storage overlay in one program corrupts the address space for every program. A runaway task consumes dispatcher time for every task. One bad apple spoils 500 million transactions.
Scalability. A single region has one dispatcher, one set of system resources, one set of limits. You can tune it, but you cannot make it serve unlimited workload. At some point, you must go horizontal.
The answer to all three pressures is the same: split one region into many, each with a specialized role.
💡 INSIGHT — The history of CICS architecture is a 50-year lesson in the same principle that drives microservices: decompose monoliths into specialized, isolated components. The mainframe community was doing this in the 1980s. The patterns are identical; only the vocabulary differs.
CICS Transaction Server — The Modern Platform
CICS Transaction Server (CICS TS) 5.6, the version running at CNB, is not your grandfather's CICS. It includes:
- Multi-region operation (MRO) for cross-region communication within a z/OS image
- Inter-system communication (ISC) for cross-system communication across LPARs or machines
- CICSPlex System Manager (CICSPlex SM) for enterprise-wide management and workload routing
- Liberty JVM server for running Java alongside COBOL in the same region
- JSON/REST interfaces for API-driven access from web and mobile channels
- Sysplex-aware features including shared temporary storage, named counter servers, and coupling facility integration
CNB uses all of these. Their CICS topology spans 4 LPARs, 16 regions, and processes 500 million transactions per day with sub-second average response time and 99.999% availability. Understanding how that topology works — and why it's designed the way it is — is what this chapter teaches.
The ACID Guarantee Across Boundaries
To truly grasp what "transaction manager" means in practice, consider the ACID properties — Atomicity, Consistency, Isolation, Durability — and how CICS enforces them across region boundaries.
Atomicity across regions. When a funds transfer involves a DB2 update in an AOR and a VSAM journal write function-shipped to a FOR, CICS coordinates both resource managers through its recovery manager. If the VSAM write fails after the DB2 update succeeds, CICS rolls back the DB2 update. This coordination happens through the two-phase commit protocol — CICS acts as the sync-point coordinator, sending PREPARE and COMMIT (or ROLLBACK) signals to every resource manager involved in the unit of work.
Isolation between tasks. CICS's task control manages concurrent execution within a region. Thousands of tasks run simultaneously in a single AOR, each isolated from the others through separate task control areas, separate task-lifetime storage, and separate recovery scopes. When task 47,232 abends, tasks 47,231 and 47,233 are unaffected — their storage, their enqueue positions, their DB2 thread allocations are untouched. This is not operating system process isolation (each task shares the same address space); it's transaction manager isolation enforced by CICS's internal dispatching and storage management.
Durability through logging. CICS writes every recoverable change to its system log before acknowledging the change to the application. The system log is a z/OS logger log stream, which can be duplexed to the coupling facility for Sysplex-wide durability. If a region crashes, the recovery manager replays the system log to determine which units of work completed (committed) and which were in-flight (rolled back). This is the write-ahead logging protocol, and it operates identically whether the recoverable resource is local or remote.
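The replay logic described above can be sketched as follows. The record layout is invented for illustration — real CICS logs are structured z/OS logger records, not tuples — but the decision rule is the one stated: redo units of work that reached COMMIT, discard those that were in-flight.

```python
# Toy write-ahead log replay after a region crash. Record types and layout
# are invented for illustration.
log = [
    ("BEGIN",  "UOW1"),
    ("CHANGE", "UOW1", "acct42", 100, 95),   # (type, uow, key, old, new)
    ("COMMIT", "UOW1"),
    ("BEGIN",  "UOW2"),
    ("CHANGE", "UOW2", "acct7", 50, 40),
    # crash here: UOW2 never reached COMMIT, so it was in-flight
]

def recover(log):
    committed = {rec[1] for rec in log if rec[0] == "COMMIT"}
    data = {}
    for rec in log:
        if rec[0] == "CHANGE":
            _, uow, key, old, new = rec
            # Redo committed work; restore the before-image for in-flight work.
            data[key] = new if uow in committed else old
    return data

print(recover(log))  # {'acct42': 95, 'acct7': 50}
```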
🔍 DEEP DIVE — The two-phase commit protocol is what distinguishes CICS from a simple program executor. In phase 1 (PREPARE), CICS asks every resource manager: "Can you commit your part of this work?" Each resource manager writes its changes to a log and responds YES or NO. In phase 2 (COMMIT or ROLLBACK), CICS issues the final decision based on all responses. If every resource manager said YES, CICS issues COMMIT. If any said NO, CICS issues ROLLBACK to all. This protocol works across MRO and ISC boundaries — a single unit of work can span resource managers in multiple regions on multiple LPARs, and CICS guarantees all-or-nothing.
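The coordinator logic in the deep dive can be sketched directly. This is an illustrative model — `FakeRM` and its methods are invented stand-ins for real resource managers like DB2 and VSAM — but the vote-then-decide structure is exactly the protocol described:

```python
def two_phase_commit(resource_managers):
    """Sketch of the sync-point coordinator described above.

    resource_managers: objects with prepare()/commit()/rollback()
    (invented interface for illustration).
    """
    # Phase 1 (PREPARE): every resource manager must vote YES.
    votes = [rm.prepare() for rm in resource_managers]
    if all(votes):
        for rm in resource_managers:   # Phase 2: COMMIT everywhere
            rm.commit()
        return "COMMITTED"
    for rm in resource_managers:       # Phase 2: ROLLBACK everywhere
        rm.rollback()
    return "ROLLED BACK"

class FakeRM:
    def __init__(self, vote):
        self.vote, self.state = vote, "in-flight"
    def prepare(self):
        return self.vote          # YES/NO vote, logged before responding
    def commit(self):
        self.state = "committed"
    def rollback(self):
        self.state = "rolled-back"

db2, vsam = FakeRM(True), FakeRM(False)
print(two_phase_commit([db2, vsam]))  # ROLLED BACK — one NO vote aborts all
```

Note that a single NO vote in phase 1 forces ROLLBACK on every participant, including those that voted YES — that is the all-or-nothing guarantee across MRO and ISC boundaries.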
This recovery architecture is invisible to most COBOL programmers. You write EXEC CICS SYNCPOINT and trust that it works. As an architect, you need to understand that SYNCPOINT triggers a potentially complex coordination across DB2 lock managers, VSAM recovery logs, CICS system logs, and coupling facility structures — and your topology design determines how efficiently that coordination occurs.
13.2 Region Topology — TOR, AOR, FOR and Why You Separate Them
The fundamental architectural decision in CICS is how to split work across regions. This section covers the three primary region types, why separation matters, and how CNB's production topology implements these principles.
The Three Region Types
Terminal-Owning Region (TOR) — Owns the connection to the end user. Whether that connection is a 3270 terminal, a web service request via CICS Web Services, or a JSON call from a mobile app through CICS TS's Liberty JVM server, the TOR is the front door. TORs do not run application logic. They accept requests, route them, and return responses.
Application-Owning Region (AOR) — Runs your COBOL programs. AORs contain the application logic, access databases through DB2 connections, and perform the business processing. They do not own terminals and ideally do not own files. AORs are the workhorses.
File-Owning Region (FOR) — Owns VSAM files and other CICS-managed resources. FORs handle file I/O requests from AORs via function shipping. They isolate file access so that file contention, enqueue waits, and file recovery do not impact the application regions.
⚠️ PRODUCTION WARNING — In textbook architectures, the three-way TOR/AOR/FOR split is clean. In production, you'll encounter hybrid regions, especially AORs that own some files directly. This is a pragmatic compromise — function shipping adds overhead, and for low-contention files accessed by a single AOR, the overhead isn't justified. The key is to make these compromises deliberately, not accidentally.
Why Separation Matters
Workload isolation. When a surge of web requests doubles the transaction rate, only the TORs and their target AORs feel the pressure. Your batch-supporting CICS regions, your file-owning regions, your administrative regions are unaffected. Without separation, a web surge could degrade everything.
Failure isolation. When an application defect causes an AOR to abend and require a restart, only that AOR's transactions are interrupted. The TOR detects the failure and routes subsequent transactions to a surviving AOR. Terminals stay connected. Users see a brief delay, not a disconnection. In a monolithic region, that same defect would drop every terminal.
Scalability. Need more application capacity? Start another AOR and add it to the routing pool. Need more terminal capacity? Add another TOR. Need more file throughput? Split files across FORs. Each tier scales independently.
Security. TORs that face the internet can run in a restricted security environment with minimal authority. AORs that access sensitive data can run under tightly controlled RACF profiles. FORs that own critical files can restrict access to authorized AORs only. Separation enables defense in depth.
CNB's 16-Region Topology
Let's examine CNB's production CICS topology across their 4 LPARs (SYSA, SYSB, SYSC, SYSD):
LPAR SYSA LPAR SYSB
┌─────────────────────────┐ ┌─────────────────────────┐
│ CNBTORA1 (TOR) │ │ CNBTORB1 (TOR) │
│ - 3270 terminals │ │ - 3270 terminals │
│ - ATM connections │ │ - ATM connections │
│ │ │ │
│ CNBAORA1 (AOR) │ │ CNBAORB1 (AOR) │
│ - Core banking │ │ - Core banking │
│ CNBAORA2 (AOR) │ │ CNBAORB2 (AOR) │
│ - Core banking │ │ - Core banking │
│ │ │ │
│ CNBFORA1 (FOR) │ │ CNBFORB1 (FOR) │
│ - Customer master VSAM │ │ - Customer master VSAM │
│ - Account index VSAM │ │ - Account index VSAM │
└─────────────────────────┘ └─────────────────────────┘
LPAR SYSC LPAR SYSD
┌─────────────────────────┐ ┌─────────────────────────┐
│ CNBTORC1 (TOR - Web) │ │ CNBTORD1 (TOR - API) │
│ - CICS Web Services │ │ - Liberty JVM server │
│ - Online banking portal │ │ - Mobile API gateway │
│ │ │ │
│ CNBAORC1 (AOR) │ │ CNBAORD1 (AOR) │
│ - Web transactions │ │ - API transactions │
│ CNBAORC2 (AOR) │ │ CNBAORD2 (AOR) │
│ - Web transactions │ │ - API transactions │
│ │ │ │
│ CNBFORC1 (FOR) │ │ CNBFORD1 (FOR) │
│ - Audit/journal VSAM │ │ - Session/token VSAM │
│ CNBSMGR (CICSPlex SM) │ │ │
└─────────────────────────┘ └─────────────────────────┘
Sixteen regions. Four LPARs. Notice the design decisions:
Channel separation. Traditional 3270/ATM traffic (SYSA, SYSB) is isolated from web traffic (SYSC) and mobile API traffic (SYSD). A DDoS against the mobile API cannot degrade ATM service.
AOR pairing. Each channel has two AORs. If one AOR fails, its partner absorbs the workload. The CICSPlex SM workload manager distributes transactions across both, and detects when one becomes unavailable.
FOR specialization. Different FORs own different file sets. CNBFORA1 and CNBFORB1 own the same logical files (customer master, account index) but each serves its local AORs, minimizing cross-LPAR function shipping. CNBFORC1 specializes in audit files. CNBFORD1 handles session state for mobile.
CICSPlex SM placement. The CICSPlex SM managing address space (CMAS) runs on SYSC alongside the web TOR. In production, CNB actually runs two CMASs for high availability — the second is on SYSD. We'll cover CICSPlex SM in detail in section 13.5.
🧩 ANCHOR EXAMPLE — Kwame Mensah designed this topology in 2019 when CNB migrated from CICS TS 5.4 to 5.6. The previous topology was 10 regions across 2 LPARs. The migration to 4 LPARs with 16 regions was driven by the addition of mobile banking (SYSD) and the need to isolate web traffic after a 2018 incident where a web application defect brought down an AOR that also served ATM transactions. "After that incident," Kwame says, "channel isolation became non-negotiable."
Region Naming Conventions
CNB's naming convention encodes role and location:
CNB TOR A 1
│ │ │ │
│ │ │ └─ Instance number within LPAR
│ │ └──── LPAR identifier (A=SYSA, B=SYSB, etc.)
│ └───────── Region type (TOR/AOR/FOR)
└────────────── Enterprise prefix
This convention is not cosmetic. When you're troubleshooting a CICS region at 3 AM, a name that immediately tells you the region type, which LPAR it's on, and which instance it is saves minutes — and minutes matter when 500 million daily transactions are flowing through the system.
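Because the convention is positional, decoding it is mechanical. A small sketch (the function is invented for illustration, using the region names from this chapter):

```python
def decode_region(name):
    """Decode CNB's region naming convention: CNB + type + LPAR + instance."""
    assert name.startswith("CNB") and len(name) == 8
    return {
        "prefix": name[0:3],        # enterprise prefix
        "type": name[3:6],          # TOR / AOR / FOR
        "lpar": "SYS" + name[6],    # A=SYSA, B=SYSB, ...
        "instance": int(name[7]),   # instance number within the LPAR
    }

print(decode_region("CNBTORA1"))
# {'prefix': 'CNB', 'type': 'TOR', 'lpar': 'SYSA', 'instance': 1}
```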
💡 INSIGHT — Adopt a naming convention before you build your first region. Every name in your CICS topology — region names, connection IDs, transaction IDs, program names — should encode meaning. Cryptic names are technical debt that compounds with every new hire, every production incident, every audit.
13.3 Transaction Routing — Getting Work to the Right Place
A TOR receives a request. An AOR processes it. The mechanism that connects them is transaction routing — and it is the single most consequential decision in your CICS topology after region separation itself.
Basic Transaction Routing
In its simplest form, transaction routing uses a static definition. You define a remote transaction in the TOR that specifies a target AOR:
CEDA DEFINE TRANSACTION(XFER)
GROUP(CNBROUTE)
PROGRAM(CNBXFER0)
REMOTESYSTEM(AOA1)
REMOTENAME(XFER)
This says: when transaction XFER arrives at this TOR, route it to the remote system AOA1 — the four-character connection name for region CNBAORA1 — and execute it there. Static. Predictable. And completely inadequate for production.
Why? Because if CNBAORA1 goes down, every XFER transaction fails. There is no failover, no load balancing, no awareness of whether CNBAORA1 is overloaded or idle. Static routing is a single point of failure.
Dynamic Transaction Routing
Dynamic routing replaces static definitions with a routing program — a COBOL program that CICS invokes for every routable transaction. The routing program examines the transaction, the available AORs, and the current conditions, then tells CICS where to send the work.
CICS invokes the dynamic routing program by setting the DTRPGM system initialization parameter:
DTRPGM=CNBROUT0
The routing program receives control through a standard COMMAREA containing the transaction ID, the terminal ID, and other routing context. It returns a target system name. The routing program can implement any logic: round-robin, least-loaded, affinity-based, time-of-day, transaction-type based — anything.
⚠️ PRODUCTION WARNING — Your dynamic routing program runs for every routable transaction. At CNB, that's approximately 5,800 transactions per second during peak. A routing program that takes 1 millisecond adds 5.8 seconds of accumulated CPU per second — nearly six CPUs' worth of capacity. Keep routing programs lean. No database calls. No file I/O. Decision logic only, using in-memory data structures refreshed asynchronously.
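The shape of such a lean routing program can be sketched in Python. This is illustrative only — a real routing program is COBOL invoked via DTRPGM, and the round-robin logic and names here are invented — but it shows the discipline the warning demands: the hot path touches nothing but in-memory structures.

```python
import itertools

# In-memory routing table, refreshed asynchronously by some other mechanism.
# Region names are from this chapter; the round-robin policy is an invented
# example, not CNB's actual routing logic.
HEALTHY_AORS = ["CNBAORA1", "CNBAORA2"]
_rr = itertools.cycle(range(len(HEALTHY_AORS)))

def route(tranid):
    # Hot path: decision logic only — no database calls, no file I/O,
    # just a list lookup and a counter.
    return HEALTHY_AORS[next(_rr)]

print([route("XFER") for _ in range(4)])
# ['CNBAORA1', 'CNBAORA2', 'CNBAORA1', 'CNBAORA2']
```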
CICSPlex SM Workload Management
For most enterprise installations, the answer to dynamic routing is CICSPlex SM (CPSM) workload management, not a custom routing program. CPSM provides:
Workload definitions. You define which transactions belong to which workload, and which AOR groups can serve each workload. A workload definition for CNB's core banking might specify that transactions XFER, INQY, DPST, and WDRL should route to any AOR in the CNBCORE group.
Health monitoring. CPSM continuously monitors AOR health — task count, CPU utilization, storage usage, response time. When an AOR becomes overloaded or fails, CPSM automatically stops routing to it.
Routing algorithms. CPSM supports several routing algorithms:
- Queue algorithm — routes to the AOR with the shortest task queue. Best for general-purpose workloads where transactions have similar resource requirements.
- Goal algorithm — routes to the AOR most likely to meet a response-time target. Integrates with z/OS Workload Manager (WLM) service classes. This is what CNB uses for core banking.
Affinity management. Some transactions must route to the same AOR for the duration of a pseudo-conversational sequence. CPSM manages these affinities automatically, creating them when needed and releasing them when the conversational sequence ends.
┌──────────┐ ┌─────────────────┐ ┌──────────┐
│ │ │ CICSPlex SM │ │ CNBAORA1 │
│ CNBTORA1 │─────▶│ Workload Mgr │────▶│ (AOR) │
│ (TOR) │ │ │ └──────────┘
│ │ │ - Health check │ ┌──────────┐
│ │ │ - Queue depth │────▶│ CNBAORA2 │
│ │ │ - WLM goals │ │ (AOR) │
│ │ │ - Affinity │ └──────────┘
└──────────┘ └─────────────────┘ ┌──────────┐
│ ────▶│ CNBAORB1 │
│ │ (AOR) │
▼ └──────────┘
Route to AOR with
best chance of
meeting WLM goal
🔍 DEEP DIVE — The goal algorithm is the crown jewel of CPSM routing. It queries z/OS WLM for each candidate AOR's current velocity — a measure of how much resource the AOR is receiving relative to what WLM says it should receive. An AOR with velocity > 1.0 is getting more resource than its goal requires, meaning it has headroom. CPSM preferentially routes to high-velocity AORs. This creates a feedback loop: WLM adjusts dispatching priority based on response time goals, and CPSM routes work based on WLM's assessment. The result is near-optimal workload distribution with no manual tuning.
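The selection step of the goal algorithm reduces to "route to the candidate with the most headroom." A minimal sketch — the velocity figures are invented, and in reality they come from z/OS WLM, not a dictionary:

```python
# Sketch of the velocity-based selection described in the deep dive.
# Velocity values are invented for illustration; real ones come from WLM.
def pick_aor(velocities):
    """Route to the candidate AOR with the highest velocity (most headroom)."""
    return max(velocities, key=velocities.get)

velocities = {"CNBAORA1": 0.7, "CNBAORA2": 1.3, "CNBAORB1": 1.1}
print(pick_aor(velocities))  # CNBAORA2 — velocity > 1.0 means headroom
```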
Transaction Affinity — The Necessary Evil
Pseudo-conversational CICS programs — programs that EXEC CICS RETURN TRANSID to start a new transaction for the next screen interaction — create a problem for routing. The second transaction in the conversation may need access to data stored in the COMMAREA, a temporary storage queue, or a container created by the first transaction. If the second transaction routes to a different AOR than the first, that data is not there.
This is transaction affinity — the requirement that related transactions route to the same AOR.
Affinities are the enemy of workload balancing. Every affinity reduces your routing options. In the extreme, if every transaction has an affinity, you're back to static routing with no load balancing at all.
CNB manages affinities through three strategies:
- Eliminate where possible. Convert COMMAREA-based pseudo-conversations to use named counters or DB2 for inter-transaction state. DB2 is accessible from any AOR; the affinity disappears.
- Minimize duration. Where affinities are necessary, ensure they last only for the duration of the pseudo-conversational sequence, not longer. CPSM can define affinity lifetime as PSEUDOCONV — automatically released when the user presses a PF key that doesn't trigger a continuation transaction.
- Partition by user population. When affinity is unavoidable and long-lived, partition users across AOR groups so that each group maintains its own affinity pool. This limits the blast radius of an AOR failure to one partition.
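A PSEUDOCONV-style affinity lifetime can be modeled in a few lines. The table, function, and names are invented for illustration — CPSM manages real affinities internally — but the lifecycle is the one described: create on first use, honor until the conversation ends, then release.

```python
# Toy affinity table with PSEUDOCONV-style lifetime (invented sketch).
affinities = {}

def route_with_affinity(user, pick_aor, end_of_conversation=False):
    # Reuse the existing affinity if one exists; otherwise create one.
    aor = affinities.setdefault(user, pick_aor())
    if end_of_conversation:
        del affinities[user]   # release: the user may route anywhere again
    return aor

aor1 = route_with_affinity("USER01", lambda: "CNBAORA1")
aor2 = route_with_affinity("USER01", lambda: "CNBAORA2")  # sticks to A1
assert aor1 == aor2 == "CNBAORA1"
route_with_affinity("USER01", lambda: "CNBAORA1", end_of_conversation=True)
assert "USER01" not in affinities   # affinity released at end of sequence
```

The sketch also makes the cost visible: as long as the entry exists, the workload balancer has exactly one routing option for that user.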
✅ BEST PRACTICE — Every affinity is technical debt. When you design a new CICS application, your first question should be: "Can I avoid creating any transaction affinities?" The answer is almost always yes, if you use external state stores (DB2, coupling facility data tables, shared temporary storage) instead of region-local resources. The performance cost of external state is real, but the operational benefit of affinity-free routing is enormous.
Distributed Routing Programs
In complex topologies, routing decisions may span multiple TORs and AOR groups. CICS supports distributed routing programs (DRP) that run in the AOR rather than the TOR. The TOR's routing program picks an AOR group; the AOR's DRP refines the routing decision within that group.
CNB uses this for their web channel. The TOR (CNBTORC1) routes web transactions to either the web AOR group or the API AOR group based on the incoming URL path. Within each group, CPSM workload management handles the final AOR selection.
13.4 MRO and ISC — Cross-Region Communication
Regions must communicate. A TOR must send transactions to AORs. An AOR must access files on a FOR. A program on one AOR may need to invoke a program on another AOR. The mechanisms for this communication are MRO and ISC.
MRO — Multi-Region Operation
MRO enables communication between CICS regions running on the same z/OS image (LPAR). It uses z/OS inter-region communication (IRC) through cross-memory services — the most efficient communication path available on z/OS.
MRO supports four types of cross-region interaction:
Transaction routing. Already covered in 13.3. The TOR sends a transaction request to a remote AOR, which executes the transaction and returns the response.
Function shipping. An application program issues a CICS command (READ, WRITE, DELETE, etc.) against a resource that is owned by another region. CICS automatically ships the request to the owning region, executes it there, and returns the result. The application program doesn't know the resource is remote — the function shipping is transparent.
AOR (CNBAORA1) FOR (CNBFORA1)
┌──────────────────┐ ┌──────────────────┐
│ EXEC CICS READ │ MRO function │ VSAM KSDS │
│ FILE('CUSTMST')│ ──── ship ─────▶ │ CUSTMST │
│ INTO(WS-REC) │ │ │
│ RIDFLD(WS-KEY) │ ◀─── response ── │ Record returned │
│ │ │ │
└──────────────────┘ └──────────────────┘
Distributed Program Link (DPL). A program in one region explicitly links to a program in another region using EXEC CICS LINK with the SYSID option:
EXEC CICS LINK
PROGRAM('CNBVALID')
SYSID('AOB1')
COMMAREA(WS-COMM-AREA)
LENGTH(WS-COMM-LEN)
END-EXEC
This is the CICS equivalent of a remote procedure call. Unlike function shipping, DPL is explicit — the programmer knows they're invoking a remote program.
Asynchronous processing. EXEC CICS START can initiate a transaction in a remote region. The initiating program continues without waiting for the remote transaction to complete.
MRO Configuration
MRO requires three configuration elements:
- IRC startup. Each participating region must have IRC enabled in its System Initialization Table (SIT):
ISC=YES
IRCSTRT=YES
- Connection definitions. Each region pair requires a CONNECTION definition. The connection name is the four-character SYSID by which the remote region is known:
CEDA DEFINE CONNECTION(AOA1)
GROUP(CNBCONN)
NETNAME(CNBAORA1)
ACCESSMETHOD(IRC)
AUTOCONNECT(YES)
- Session definitions. Each connection requires a SESSIONS definition specifying the number of parallel sessions (conversations) allowed:
CEDA DEFINE SESSIONS(SAOA1)
GROUP(CNBCONN)
CONNECTION(AOA1)
PROTOCOL(LU61)
SENDCOUNT(025)
RECEIVECOUNT(025)
⚠️ PRODUCTION WARNING — The session counts (SENDCOUNT and RECEIVECOUNT) on a SESSIONS definition are among the most commonly misconfigured values in CICS. Too low, and MRO requests queue waiting for sessions, creating artificial bottlenecks. Too high, and you waste storage on session control blocks. The right values depend on peak concurrent cross-region requests. At CNB, Kwame sizes sessions at 2x the observed peak concurrent MRO requests for each connection, reviewed quarterly. "The cost of excess sessions is a few kilobytes of storage," he says. "The cost of insufficient sessions is a queue that turns into an outage."
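Kwame's sizing rule is simple arithmetic, shown here as an invented helper (not a CICS utility) that splits the 2x-peak total evenly between send and receive sessions:

```python
# Sketch of the sizing rule from the warning above: total sessions = 2x the
# observed peak concurrent MRO requests, split between send and receive.
# The function is invented for illustration.
def size_sessions(peak_concurrent):
    total = 2 * peak_concurrent
    return {"SENDCOUNT": total // 2, "RECEIVECOUNT": total - total // 2}

print(size_sessions(25))  # {'SENDCOUNT': 25, 'RECEIVECOUNT': 25}
```

With an observed peak of 25 concurrent requests, the rule yields the 025/025 values in the SESSIONS example above.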
ISC — Inter-System Communication
ISC enables communication between CICS regions on different z/OS images — different LPARs or different physical machines. ISC uses either:
- SNA/APPC (LU 6.2) — Traditional SNA communication. Reliable, well-understood, but requires VTAM definitions and network configuration.
- IP interconnectivity (IPIC) — TCP/IP-based communication introduced in CICS TS 3.2 and extended in subsequent releases. Simpler to configure, lower overhead, and increasingly preferred.
At CNB, MRO handles all intra-LPAR communication, while IPIC handles cross-LPAR communication. The decision is straightforward: MRO is faster within an LPAR (cross-memory vs. TCP/IP), but MRO cannot cross LPAR boundaries.
LPAR SYSA LPAR SYSB
┌─────────────────┐ ┌─────────────────┐
│ CNBTORA1 ──MRO──▶ CNBAORA1 │ CNBTORB1 ──MRO──▶ CNBAORB1
│ MRO──▶ CNBAORA2 │ MRO──▶ CNBAORB2
│ MRO──▶ CNBFORA1 │ MRO──▶ CNBFORB1
│ │ │ │
│ CNBAORA1 ◀──IPIC──────────────────▶ CNBAORB1 │
│ CNBAORA2 ◀──IPIC──────────────────▶ CNBAORB2 │
└─────────────────┘ └─────────────────┘
IPIC Configuration
IPIC connections require TCPIPSERVICE and IPCONN definitions:
CEDA DEFINE TCPIPSERVICE(IPICLSN)
GROUP(CNBIPIC)
PORTNUMBER(1435)
PROTOCOL(IPIC)
BACKLOG(10)
STATUS(OPEN)
CEDA DEFINE IPCONN(AORB1)
GROUP(CNBIPIC)
HOST(10.1.2.100)
PORT(1435)
NETWORKID(CNBNET)
APPLID(CNBAORB1)
AUTOCONNECT(YES)
SENDCOUNT(025)
RECEIVECOUNT(025)
MRO vs. ISC — When to Use Which
| Criterion | MRO | ISC (IPIC) |
|---|---|---|
| Same LPAR | Yes — always prefer | Can, but slower |
| Cross-LPAR | No | Yes |
| Cross-sysplex | No | Yes |
| Performance | Best (cross-memory) | Good (TCP/IP) |
| Configuration | IRC + CONNECTION/SESSIONS | TCPIPSERVICE + IPCONN |
| Function shipping | Full support | Full support |
| DPL | Full support | Full support |
| Transaction routing | Full support | Full support |
| Security | z/OS cross-memory (trusted) | SSL/TLS optional |
💡 INSIGHT — A common architectural mistake is using ISC where MRO would work. If two regions are on the same LPAR, always use MRO. The performance difference is not subtle — MRO uses cross-memory services with zero network stack overhead, while ISC traverses the TCP/IP stack even for local connections. On CNB's workload, Kwame measured MRO at 0.08ms per round-trip vs. IPIC at 0.4ms for same-LPAR communication. At 5,800 transactions per second, that difference matters.
Function Shipping — Transparent but Not Free
Function shipping is CICS's most elegant cross-region mechanism and its most dangerous performance trap. A programmer writes:
EXEC CICS READ
FILE('CUSTMST')
INTO(WS-CUSTOMER-REC)
RIDFLD(WS-CUST-KEY)
RESP(WS-RESP)
END-EXEC
If CUSTMST is defined as a remote file on a FOR, CICS automatically ships the READ request to the FOR, executes it, and returns the record. The programmer doesn't know (or care) that the file is remote.
The danger: one function-shipped request incurs one MRO round-trip. If a program does 50 file operations against remote files, that's 50 MRO round-trips. At 0.08ms each, that's 4ms of pure communication overhead — possibly doubling the transaction's elapsed time.
The mitigation strategies:
- Mirror/replicate files to avoid function shipping for read-only lookups
- Use DPL instead of function shipping when multiple operations target the same FOR — ship the program, not the data
- Batch function-shipped operations using CICS channels and containers to reduce round-trips
- Move high-frequency file access to DB2 where data sharing eliminates the need for function shipping entirely
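The overhead arithmetic behind the DPL recommendation is worth making explicit. A small sketch using the round-trip figure measured at CNB (the functions are invented for illustration):

```python
# Each function-shipped request costs one MRO round-trip (~0.08 ms at CNB).
# Shipping the program once via DPL replaces N data round-trips with one.
MRO_ROUND_TRIP_MS = 0.08

def function_ship_overhead_ms(n_ops):
    """Communication overhead of n function-shipped file operations."""
    return n_ops * MRO_ROUND_TRIP_MS

def dpl_overhead_ms():
    """One LINK round-trip carries all the work to the owning region."""
    return 1 * MRO_ROUND_TRIP_MS

print(function_ship_overhead_ms(50))  # 4.0 ms, as computed in the text
print(dpl_overhead_ms())              # 0.08 ms
```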
🔄 CROSS-REFERENCE — Chapter 1 covered z/OS Parallel Sysplex and CF data sharing. DB2 data sharing through the coupling facility enables any region on any LPAR to access DB2 data with no function shipping and no ISC overhead. This is why CNB moved their highest-volume data from VSAM to DB2 — not for relational features, but for data sharing.
13.5 CICSPlex SM — Managing the Enterprise
When you have 16 regions across 4 LPARs — or 160 regions across 40 LPARs, which is not unusual at the largest banks — managing individual regions is not viable. You need enterprise-level management. That's CICSPlex SM.
Architecture Overview
CICSPlex SM (CPSM) consists of:
CMAS (CICSPlex SM Address Space). A specialized CICS region that manages a group of CICS regions. The CMAS runs the CPSM logic — workload management, monitoring, resource management. CNB runs two CMASs for high availability.
MAS (Managed Application System). Any CICS region managed by a CMAS. CNB's 16 TORs, AORs, and FORs are all MASs.
CICSplex. A logical grouping of managed regions. CNB defines one CICSplex encompassing all 16 regions.
CICS system groups. Logical subsets of a CICSplex. CNB defines groups like CNBTORS (all TORs), CNBAORS (all AORs), CNBCORE (core banking AORs), CNBWEB (web AORs), CNBAPI (API AORs).
┌─────────────────────────────────────────┐
│ CICSplex: CNBPLEX │
│ │
│ CMAS: CNBSMGR (SYSC) ◄──► CNBSMG2 (SYSD)
│ │
│ Group: CNBTORS │
│ CNBTORA1, CNBTORB1, CNBTORC1, │
│ CNBTORD1 │
│ │
│ Group: CNBCORE (AORs - Core Banking) │
│ CNBAORA1, CNBAORA2, │
│ CNBAORB1, CNBAORB2 │
│ │
│ Group: CNBWEB (AORs - Web) │
│ CNBAORC1, CNBAORC2 │
│ │
│ Group: CNBAPI (AORs - Mobile API) │
│ CNBAORD1, CNBAORD2 │
│ │
│ Group: CNBFILES (FORs) │
│ CNBFORA1, CNBFORB1, CNBFORC1, │
│ CNBFORD1 │
└─────────────────────────────────────────┘
Workload Management
CPSM workload management is the enterprise solution for transaction routing. Instead of custom routing programs in each TOR, you define workloads centrally:
Workload definition: Specifies which transactions belong to the workload.
WORKLOAD: CNBCORE-WKL
Transactions: XFER, INQY, DPST, WDRL, STMT, PMNT
Target scope: CNBCORE (system group)
Routing algorithm: GOAL
Affinity: PSEUDOCONV
Workload scope: Specifies which AOR group services the workload. CPSM routes only to healthy AORs within the scope.
Routing algorithm: QUEUE or GOAL, as described in 13.3.
Affinity rules: CPSM can detect and manage affinities automatically based on transaction patterns, or you can define explicit affinity rules.
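The two algorithms can be sketched as a selection function over candidate AORs. This is a simplified Python model of the idea, not CPSM's implementation; the region names and health fields are illustrative:

```python
# Simplified model of CPSM routing algorithms (illustrative only).
# QUEUE: pick the healthy AOR with the fewest active tasks.
# GOAL:  pick the healthy AOR most likely to meet the response-time goal.

def route(aors, algorithm="QUEUE", goal_ms=200):
    candidates = [a for a in aors if a["healthy"]]
    if not candidates:
        raise RuntimeError("no healthy AOR in routing scope")
    if algorithm == "QUEUE":
        return min(candidates, key=lambda a: a["active_tasks"])
    # GOAL: prefer the AOR with the best recent response time relative to the goal
    return min(candidates, key=lambda a: a["avg_response_ms"] / goal_ms)

aors = [
    {"name": "CNBAORA1", "healthy": True,  "active_tasks": 180, "avg_response_ms": 150},
    {"name": "CNBAORB1", "healthy": True,  "active_tasks": 120, "avg_response_ms": 170},
    {"name": "CNBAORA2", "healthy": False, "active_tasks": 0,   "avg_response_ms": 0},
]
print(route(aors, "QUEUE")["name"])  # CNBAORB1 (fewest active tasks)
print(route(aors, "GOAL")["name"])   # CNBAORA1 (best response vs. goal)
```

Note how the unhealthy region drops out of both algorithms before any selection happens; that filtering is what static routing lacks.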
Business Application Services (BAS)
BAS is CPSM's mechanism for managing resource definitions (programs, transactions, files, connections) across the CICSplex. Instead of defining resources individually in each region using CEDA, you define them once in CPSM and install them across groups of regions.
For example, when CNB deploys a new version of the CNBXFER0 program:
- Update the program definition in CPSM BAS
- CPSM installs the new definition across all AORs in the CNBCORE group
- CPSM can perform a phased install — updating one AOR at a time while others continue serving with the old version
This is CICS's version of rolling deployment, and it's critical for zero-downtime updates.
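The phased-install flow can be sketched as a loop that updates one region at a time and stops if a freshly updated region fails its health check. The functions `install_version` and `is_healthy` are hypothetical stand-ins for CPSM BAS actions, not a real API:

```python
# Sketch of a phased (rolling) install across an AOR group.
# install_version() and is_healthy() stand in for CPSM BAS actions;
# both names are hypothetical.

def phased_install(aors, version, install_version, is_healthy):
    """Update one AOR at a time; halt if a freshly updated AOR is unhealthy."""
    updated = []
    for aor in aors:
        install_version(aor, version)
        if not is_healthy(aor):
            return {"status": "halted", "updated": updated, "failed": aor}
        updated.append(aor)
    return {"status": "complete", "updated": updated}

# Usage with stub actions that always succeed:
result = phased_install(
    ["CNBAORA1", "CNBAORA2", "CNBAORB1", "CNBAORB2"],
    "CNBXFER0 v2",
    install_version=lambda aor, v: None,  # pretend the install succeeds
    is_healthy=lambda aor: True,          # pretend health checks pass
)
print(result["status"])  # complete
```

The key property is the early halt: a bad version reaches one AOR, not the whole group, which is the failure-containment argument for rolling deployment.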
💡 INSIGHT — BAS eliminates the CSD (CICS System Definition) management nightmare. Without BAS, every resource change requires updating CSDs across every affected region — a manual, error-prone process. With BAS, you define once, deploy everywhere. At CNB, BAS reduced CICS deployment errors by 85% in its first year.
Real-Time Analysis (RTA) and Monitoring
Beyond workload management and resource deployment, CPSM provides Real-Time Analysis — a monitoring engine that continuously evaluates CICS region health against thresholds you define.
RTA evaluations include:
System availability. Is the MAS active? Is it accepting work? If a region becomes unresponsive, RTA raises an alert and can trigger automated actions — such as removing the region from routing pools or initiating a restart.
Resource thresholds. You define thresholds for DSA usage, MAXTASK utilization, MRO session usage, and other metrics. When a threshold is breached, RTA generates an external notification (WTO message, email, SNMP trap) and can invoke a CICS program for automated remediation.
Transaction health. RTA can monitor individual transaction response times and abend rates. If transaction XFER's abend rate exceeds 1%, RTA flags it immediately — potentially indicating a code defect in a recently deployed version.
At CNB, RTA feeds into their enterprise monitoring infrastructure through three mechanisms:
- WTO messages — captured by the z/OS automation product (IBM System Automation) for immediate operator notification
- SMF 110 records — forwarded to Splunk for historical analysis and trending
- SNMP traps — sent to the network operations center's monitoring dashboard for real-time visibility
Lisa Tran (DBA) worked with Kwame's team to correlate CPSM RTA alerts with DB2 performance data. When an AOR's response time degrades, the team can immediately determine whether the cause is CICS-internal (task queuing, storage pressure) or DB2-related (lock contention, buffer pool misses). This cross-domain correlation — CPSM monitoring plus DB2 instrumentation — is what enables CNB to diagnose production issues in minutes rather than hours.
⚠️ PRODUCTION WARNING — RTA thresholds require tuning. Set them too tight and you generate alert fatigue — operators ignore the alerts because they fire constantly during normal load spikes. Set them too loose and you miss genuine problems. CNB reviews RTA thresholds quarterly, adjusting based on the previous quarter's workload patterns. New thresholds start at 80% of the theoretical maximum and are tightened only after observing actual production behavior.
WLM Integration
CPSM integrates with z/OS Workload Manager to create a closed-loop performance management system:
- WLM service classes define response-time goals for CICS transactions (covered in Chapter 5)
- WLM dispatching adjusts CPU priority to meet those goals
- CPSM workload management routes transactions to AORs where WLM indicates the best chance of meeting goals
- CPSM monitoring detects when goals are missed and raises alerts
At CNB, the core banking workload has a WLM goal of 95% of transactions completing in under 200ms. CPSM monitors actual response times against this goal in real time and adjusts routing to maintain it. When Kwame sees the 95th percentile creeping toward 200ms, he knows it's time to evaluate whether an additional AOR is needed — or whether a code change has introduced a regression.
🔄 SPACED REVIEW: CHAPTER 5 — WLM service classes for CICS. Review how you defined service classes, classification rules, and velocity goals. Those definitions directly feed CPSM's goal-based routing algorithm. The routing is only as good as the WLM configuration.
13.6 Sysplex-Aware CICS — High Availability at Platform Scale
CICS TS 5.6 integrates deeply with z/OS Parallel Sysplex services. This integration transforms CICS from a collection of independent regions into a unified, highly available transaction processing platform.
Shared Temporary Storage
Traditional CICS temporary storage (TS) queues are region-local. Data written to a TS queue in one region is invisible to other regions. This creates affinity — if a pseudo-conversational program stores data in a TS queue, the next transaction must route to the same AOR.
Shared temporary storage uses the coupling facility to provide TS queues accessible from any region in the sysplex. The TS queue data resides in a coupling facility cache structure, not in any region's storage.
Configuration is through a TSMODEL resource definition (the coupling facility pool itself is managed by a temporary storage data sharing server address space):
CEDA DEFINE TSMODEL(CNBSHRD)
GROUP(CNBSHARE)
PREFIX(SH*)
POOLNAME(CNBTSPOOL)
Any TS queue whose name starts with "SH" will be stored in the coupling facility pool CNBTSPOOL, accessible from any CICS region connected to that pool. Note that shared TS queues cannot be defined as recoverable; recoverability applies only to region-local queues.
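The prefix-matching behavior can be modeled as a lookup: a queue whose name matches a TSMODEL prefix inherits that model's location. A minimal sketch, using the pool name from the example above:

```python
# Simplified model of TSMODEL prefix matching: a queue whose name matches
# a model's prefix inherits that model's location (CF pool vs. region-local).

TSMODELS = [
    {"prefix": "SH", "poolname": "CNBTSPOOL"},  # sysplex-shared queues
]

def queue_location(queue_name):
    for model in TSMODELS:
        if queue_name.startswith(model["prefix"]):
            return f"CF pool {model['poolname']}"
    return "region-local"

print(queue_location("SHSESSION"))  # CF pool CNBTSPOOL
print(queue_location("TMPWORK"))    # region-local
```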
✅ BEST PRACTICE — Use shared TS queues for all inter-transaction state in pseudo-conversational programs. This eliminates transaction affinity at the cost of coupling facility access latency (typically 10–20 microseconds). At CNB, converting 40 high-volume transactions to shared TS reduced the affinity count by 60%, dramatically improving workload distribution.
Named Counter Server
CICS named counters provide unique, sequentially numbered identifiers across the CICSplex. A named counter server uses the coupling facility to ensure that no two regions ever issue the same counter value — even under concurrent access from multiple AORs on multiple LPARs.
Use cases:
- Transaction reference numbers
- Audit sequence numbers
- Customer interaction IDs
Without named counters, generating unique IDs across multiple AORs requires either DB2 sequences (higher overhead) or custom coordination logic (error-prone).
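The guarantee a named counter provides, no duplicate values even under concurrent callers, can be illustrated with a thread-safe local counter. This is an analogy for the semantics, not the coupling facility implementation:

```python
import threading

class NamedCounter:
    """Local analogy for a CF named counter: atomic, gap-free increments."""
    def __init__(self, start=1):
        self._next = start
        self._lock = threading.Lock()

    def get_next(self):
        with self._lock:
            value = self._next
            self._next += 1
            return value

counter = NamedCounter()
issued = []
issued_lock = threading.Lock()

def worker(n):
    for _ in range(n):
        v = counter.get_next()
        with issued_lock:
            issued.append(v)

# Four concurrent "AORs" each request 1,000 IDs.
threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(issued), len(set(issued)))  # 4000 4000 (no duplicates)
```

In the real facility, the serialization happens in the coupling facility rather than in a process-local lock, which is what extends the guarantee across LPARs.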
Coupling Facility Data Tables (CFDT)
CICS can store CICS-managed data tables in the coupling facility, making them accessible from any region. CFDTs combine the performance of in-memory tables with the accessibility of shared storage.
CNB uses CFDTs for:
- Rate tables. Current exchange rates, interest rates — updated once per business day, read millions of times. Each AOR needs access; storing in a CFDT eliminates function shipping to a FOR.
- Session state. Web session tokens and state, shared across web AORs so that a user's session survives AOR failover.
- Reference data. Product codes, branch codes, transaction type codes — small, stable, frequently accessed.
A CFDT is accessed through a FILE definition that names the coupling facility pool (the pool itself is owned by a CFDT server address space):
CEDA DEFINE FILE(RATETBL)
GROUP(CNBCFDT)
TABLE(CF)
CFDTPOOL(CNBRATES)
TABLENAME(RATETBL)
MAXNUMRECS(10000)
UPDATEMODEL(LOCKING)
RECOVERY(BACKOUTONLY)
LOAD(NO)
Data Sharing Integration
The deepest Sysplex integration is through DB2 data sharing, covered in Chapter 1. When CICS regions connect to a DB2 data sharing group, every AOR on every LPAR can access the same DB2 data through the coupling facility's lock structure and group buffer pool. No function shipping. No ISC. No data replication.
This is why CNB migrated their highest-volume transactional data from VSAM to DB2: not because relational was better for the data model (in many cases, KSDS was perfectly adequate), but because data sharing eliminated the cross-region communication overhead that VSAM function shipping imposed.
🔍 DEEP DIVE — The performance calculus between VSAM function shipping and DB2 data sharing depends on access patterns. For single-record random access (the ATM use case), DB2 data sharing typically wins because the coupling facility lock/cache path is faster than an MRO round-trip to a FOR. For sequential scans or batch-oriented access, VSAM often wins because local file access avoids coupling facility overhead entirely. CNB maintains both — DB2 for online, VSAM for batch.
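The calculus can be made concrete with rough per-access costs. The microsecond figures below are illustrative assumptions for the comparison, not measurements:

```python
# Illustrative per-record access costs (microseconds). These figures are
# assumptions for the sake of the comparison, not benchmarks.
CF_ACCESS_US = 20        # coupling facility lock/cache path (DB2 data sharing)
MRO_ROUND_TRIP_US = 150  # function-ship round trip to a FOR
LOCAL_VSAM_US = 5        # local VSAM access (batch on the owning system)

def scan_cost_us(records, per_record_us):
    """Total cost of touching every record at the given per-record cost."""
    return records * per_record_us

# Single-record random access: the CF path beats the MRO round trip.
print(CF_ACCESS_US < MRO_ROUND_TRIP_US)  # True
# Sequential scan of 1M records: local VSAM beats paying CF overhead per record.
print(scan_cost_us(1_000_000, LOCAL_VSAM_US)
      < scan_cost_us(1_000_000, CF_ACCESS_US))  # True
```

Swap in your own measured latencies and the same two comparisons tell you which side of the break-even each workload sits on.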
High Availability Design Patterns
Combining Sysplex features with multi-region topology produces several HA patterns:
Pattern 1: Active/Active AOR pairs. Two AORs on different LPARs serve the same workload. CICSPlex SM distributes transactions across both. If one fails, the other absorbs the full workload. DB2 data sharing ensures both AORs see the same data.
This is CNB's primary pattern for core banking. CNBAORA1 (SYSA) and CNBAORB1 (SYSB) are an active/active pair. Under normal conditions, each handles approximately 50% of core banking transactions. If either fails, the survivor handles 100% — a pre-tested capacity scenario that runs during every quarterly DR exercise.
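The failover math behind an active/active pair is worth making explicit: each member must normally run at no more than about 50% of its standalone capacity so the survivor can absorb the full load. A sketch with illustrative numbers:

```python
def survivor_utilization(total_tps, capacity_tps_per_member):
    """Utilization of one member if its partner fails and it carries everything."""
    return total_tps / capacity_tps_per_member

# Pair sized so each member normally runs just under 50% (numbers illustrative):
total_tps = 1500   # combined core-banking load across the pair
capacity = 1600    # standalone capacity of one AOR
normal = (total_tps / 2) / capacity
failover = survivor_utilization(total_tps, capacity)
print(f"normal {normal:.0%}, after failover {failover:.0%}")
```

If `failover` comes out above 100%, the pair is not actually active/active HA; it is two regions that will both be overloaded the moment one fails.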
Pattern 2: Cross-LPAR TOR failover. If the TOR on SYSA fails, VTAM/TCP/IP routes 3270 terminals and ATM connections to the TOR on SYSB. This requires either:
- VTAM generic resources (for 3270/LU2)
- Sysplex Distributor or an external load balancer (for TCP/IP)
Pattern 3: CMAS failover. CNB runs two CMASs. If the primary CMAS fails, the secondary takes over CICSplex management with no interruption to transaction processing. CPSM stores its configuration in a coupling facility data structure, so the secondary CMAS has immediate access to all definitions.
Pattern 4: Region recovery with persistent data. When a CICS region restarts after a failure, it uses its system log and journal to recover in-flight transactions. Shared TS queues survive region failure because they're in the coupling facility. Named counters survive because they're in the coupling facility. The only data at risk is region-local TS queues and transient data.
⚠️ PRODUCTION WARNING — Every HA pattern increases complexity. CNB's topology includes 16 regions, 2 CMASs, coupling facility structures for shared TS/named counters/CFDTs, and DB2 data sharing. Each moving part is another potential failure point. The architect's job is not to maximize HA features — it's to select the HA features that address actual failure scenarios in your environment, and no more. Over-engineering availability is real and costly.
13.7 Designing a CICS Topology — The Architecture Decision Framework
This section synthesizes everything into a practical framework for designing CICS topologies. We'll apply it to two real scenarios: CNB's current topology (validation) and SecureFirst Retail Bank's modernization (design).
The Decision Framework
Every CICS topology decision flows from five questions:
1. What are the workload channels? Identify distinct sources of work: 3270 terminals, ATMs, web browsers, mobile apps, batch initiators, partner APIs. Each channel has different volume, latency requirements, and security posture. Each channel typically needs its own TOR or TOR group.
2. What are the failure domains? What must continue working if something fails? If the mobile app goes down, can ATMs still work? If a code defect corrupts one AOR, do all AORs go down? Define your failure domains, then draw region boundaries around them.
3. What are the performance tiers? Not all transactions need the same response time. A balance inquiry at an ATM needs sub-second response. A quarterly statement generation can take seconds. Different performance tiers may warrant different AOR groups with different WLM service classes.
4. What data is shared? If all AORs need the same data, use DB2 data sharing or coupling facility data tables. If AOR groups need different data, consider FORs or localized VSAM. Data sharing strategy drives the function shipping and ISC topology.
5. What must scale independently? If web traffic is growing 30% per year while 3270 traffic is flat, the web tier must scale without affecting the 3270 tier. Independent scaling requires independent region groups.
Applying the Framework: CNB Topology Validation
Let's validate CNB's existing topology against the framework:
| Question | CNB Answer | Topology Impact |
|---|---|---|
| Channels? | 3270, ATM, Web, Mobile API | 4 TORs, one per channel and per LPAR |
| Failure domains? | ATM/3270 isolated from Web/API | Channel-specific LPARs (SYSA/B vs. SYSC/D) |
| Performance tiers? | Core banking < 200ms; Web < 500ms; API < 300ms | Separate AOR groups with different WLM goals |
| Shared data? | All channels access same customer/account data | DB2 data sharing across all LPARs |
| Independent scale? | API growing 50%/year; 3270 declining 5%/year | API on its own LPAR (SYSD) can scale to 4 AORs without touching others |
CNB's topology aligns cleanly with the framework. This isn't coincidence — Kwame used a similar analysis (less formalized) when he designed it.
Applying the Framework: SecureFirst Retail Bank
SecureFirst is a different challenge. They're a smaller bank adding mobile banking to an existing 3270/ATM CICS environment. Their current topology:
- 1 LPAR, 4 CICS regions (1 TOR, 2 AORs, 1 FOR)
- VSAM-based, no DB2
- No CICSPlex SM
- No Sysplex (single z/OS image)
Carlos Vega (API architect) and Yuki Nakamura (DevOps) need to add mobile API access without disrupting existing operations.
Applying the framework:
Channels: Adding mobile API alongside existing 3270/ATM. Two distinct channels.
Failure domains: Mobile must not impact 3270/ATM operations. Isolation required.
Performance tiers: Mobile API needs < 300ms; existing 3270 has < 500ms target. Different WLM profiles.
Shared data: Mobile and 3270 need the same customer data. Currently in VSAM on the FOR.
Independent scale: Mobile will grow rapidly; 3270 is stable.
Recommended topology:
Single LPAR (no Sysplex available)
┌──────────────────────────────────────────────┐
│ SFBTORA1 (TOR - 3270/ATM) [existing] │
│ SFBTORW1 (TOR - Web/API) [NEW] │
│ - Liberty JVM server for REST/JSON │
│ │
│ SFBAORA1 (AOR - Core banking) [existing] │
│ SFBAORA2 (AOR - Core banking) [existing] │
│ SFBAORW1 (AOR - API transactions) [NEW] │
│ SFBAORW2 (AOR - API transactions) [NEW] │
│ │
│ SFBFORA1 (FOR) [existing] │
│ │
│ SFBCMAS1 (CICSPlex SM CMAS) [NEW] │
└──────────────────────────────────────────────┘
Key decisions:
- Separate TOR for API. Isolates mobile traffic from 3270/ATM traffic at the entry point.
- Separate AOR group for API. Isolates API processing so that mobile load spikes don't impact 3270 response times.
- Shared FOR. Both AOR groups function-ship to the same FOR. This is a compromise — a second FOR would improve isolation, but SecureFirst's VSAM volume doesn't justify it. The FOR becomes a shared dependency.
- CICSPlex SM. Added to manage workload routing and monitoring across 8 regions. Without CPSM, managing dynamic routing and health monitoring for this many regions is operationally burdensome.
- Future: DB2 migration. The FOR is a bottleneck waiting to happen. As mobile volume grows, function shipping from 4 AORs to 1 FOR will create contention. The strategic answer is migrating high-volume data to DB2 — but that's a multi-quarter project. The current design buys time.
🧩 ANCHOR EXAMPLE — Carlos Vega's first instinct was to add the Liberty JVM server to the existing TOR and route API transactions to the existing AORs. "It's less work," he argued. Yuki pushed back: "What happens when we get our first 10,000-request mobile burst? The existing AORs are sized for 200 3270 users, not 10,000 concurrent API calls. We'll blow through the MAXTASK limit and queue 3270 transactions behind mobile requests." Carlos agreed to the separate AOR group. Within six months, they were glad they did — mobile volume exceeded projections by 3x.
Sizing Considerations
Region sizing is part art, part measurement. Key parameters:
MAXTASK. Maximum concurrent tasks in a region. CNB sets MAXTASK to 250 for AORs. The formula:
MAXTASK = (expected peak transactions/second) × (average response time in seconds) × safety_factor
Example for CNB AOR:
Peak TPS per AOR: 750
Average response time: 0.15 seconds
Safety factor: 2.2
MAXTASK = 750 × 0.15 × 2.2 = 247.5 → 250
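The formula lends itself to a small helper. This reproduces the arithmetic above; rounding up to the next multiple of ten is our assumption for operational convenience (the book's example lands on 250 either way):

```python
import math

def maxtask(peak_tps, avg_response_s, safety_factor, round_to=10):
    """MAXTASK = peak TPS x average response time x safety factor, rounded up."""
    raw = peak_tps * avg_response_s * safety_factor
    return math.ceil(raw / round_to) * round_to

print(maxtask(750, 0.15, 2.2))  # 250, matching the CNB AOR setting
```

The product `peak_tps * avg_response_s` is Little's Law: the average number of tasks concurrently in the region. The safety factor covers arrival bursts and response-time variance.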
DSA (Dynamic Storage Area). Below-the-line (24-bit) storage for programs, control blocks, and user data that must reside there. The DSALIM parameter caps it, and its maximum is 16MB, so 24-bit storage is a scarce resource: size it from measured usage and keep 24-bit program load to a minimum.
EDSALIM (Extended DSA Limit). Caps the extended DSAs: 31-bit storage above the 16MB line and below the 2GB bar. CNB sets EDSALIM near its 2GB ceiling for AORs with Liberty JVM servers, and 512MB for traditional AORs. (The JVM heap itself lives in 64-bit storage above the bar, governed by MEMLIMIT rather than EDSALIM.)
MRO session counts. As covered in 13.4, size at 2x observed peak concurrent MRO requests. Under-sizing creates queuing; over-sizing wastes minimal storage.
✅ BEST PRACTICE — Size based on measurement, not theory. Deploy your topology in a performance test environment with synthetic workload at 1.5x expected peak. Measure actual DSA usage, task counts, MRO session utilization, and response times. Adjust. Repeat. Theory gets you to a starting point; measurement gets you to production readiness.
Growth Planning
Design for three horizons:
Horizon 1 (0–12 months). Current workload plus planned projects. Your topology must handle this without changes.
Horizon 2 (1–3 years). Expected growth and known upcoming projects. Your topology should scale to this with parameter tuning and possibly additional AOR instances, but no architectural changes.
Horizon 3 (3–5 years). Strategic direction. If your organization is planning a Sysplex migration, a cloud hybrid strategy, or a major channel expansion, your topology should not preclude it. Avoid designs that create architectural dead-ends.
At CNB, Horizon 3 includes a second physical machine in a separate data center for disaster recovery. Kwame designed the topology so that each LPAR's region set is self-contained — moving SYSC and SYSD to a second machine requires only IPIC configuration changes between the two machines, not a topology redesign.
Project Checkpoint: CICS Region Topology for the HA Banking System
🔄 PROGRESSIVE PROJECT — The HA Banking Transaction Processing System. In Chapter 1, you designed the Sysplex infrastructure. In Chapter 5, you defined WLM service classes. Now you design the CICS topology that will host the banking application.
Requirements
Your HA banking system must support:
- 3270 terminals — 500 branch teller stations
- ATM network — 2,000 ATMs
- Mobile API — RESTful API for mobile banking app
- Core transactions — Balance inquiry, funds transfer, deposit, withdrawal, bill payment
- 99.99% availability — 52 minutes of unplanned downtime per year maximum
- Response time — 95% of core transactions under 200ms
- Peak volume — 5,000 transactions per second across all channels
- Growth — Mobile channel expected to grow 40% per year for 3 years
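The 52-minute figure follows directly from the availability target; a quick arithmetic check:

```python
def downtime_budget_minutes(availability, minutes_per_year=365 * 24 * 60):
    """Unplanned-downtime budget implied by an availability target."""
    return minutes_per_year * (1 - availability)

print(round(downtime_budget_minutes(0.9999), 1))  # 52.6 minutes/year
print(round(downtime_budget_minutes(0.999), 1))   # 525.6: one less nine, ten times the budget
```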
Your Topology Design Must Include
- A region topology diagram showing all regions, their types, and which LPAR they run on
- Channel separation strategy (how you isolate 3270, ATM, and mobile)
- AOR group definitions and sizing rationale
- MRO and ISC connection plan
- CICSPlex SM configuration (CMAS placement, system groups, workload definitions)
- Sysplex features used (shared TS, named counters, CFDTs) and why
- Failure scenario analysis — what happens when each region type fails?
- Growth plan — how does the topology scale for mobile growth?
Design Hints
- Two LPARs minimum for 99.99% availability (single LPAR = single point of failure for hardware)
- Size your AORs for 50% of peak capacity each, so either can absorb full load during failover
- Mobile API will be your largest channel within 2 years — design for that future, not today's volume
- Consider whether VSAM or DB2 is the right data store. Your decision here affects whether you need FORs and function shipping
📋 DELIVERABLE — Document your topology design in the format shown in code/project-checkpoint.md. Bring your design to the Chapter 14 practical lab, where you'll implement the CSD definitions for your chosen topology.
Production Considerations
Security Architecture for Multi-Region CICS
Each CICS region runs under its own z/OS user ID. Security architecture principles:
Principle of least privilege. TORs should have authority to route transactions, not to access data. AORs should have authority to access specific databases and files, not all of them. FORs should have authority to open their files, not to run application programs.
RACF profiles. Define RACF profiles for each CICS region that restrict which transactions, programs, and resources it can use. For MRO, bind-time security (RACF FACILITY-class DFHAPPL.applid profiles) controls which regions may connect to one another.
Surrogate user processing. When a TOR routes a transaction to an AOR, the transaction should run under the end user's security identity, not the TOR's or AOR's identity. For MRO connections, CICS supports this through ATTACHSEC(IDENTIFY), which propagates the already-verified user ID from the TOR so the AOR runs the transaction under that identity. (ATTACHSEC(VERIFY), which requires a password on each attach, applies to ISC/APPC connections, not MRO.)
CEDA DEFINE CONNECTION(TORA)
GROUP(CNBCONN)
NETNAME(CNBTORA1)
ACCESSMETHOD(IRC)
ATTACHSEC(IDENTIFY)
Audit trail. Every routed transaction should produce a security audit record showing: the originating TOR, the user identity, the target AOR, the transaction ID, and the outcome. CICS SMF 110 records capture this data, which feeds into CNB's SIEM (Security Information and Event Management) system.
⚠️ COMPLIANCE NOTE — For banking (PCI DSS, SOX) and healthcare (HIPAA), your CICS security architecture is auditable. Auditors will ask: Who can access which regions? How is user identity propagated across regions? How are privileged transactions (like force-post or override) controlled? Ahmad Rashidi at Pinnacle Health designs CICS security topologies specifically for HIPAA compliance — every region has a documented security profile reviewed quarterly.
Monitoring and Alerting
A 16-region topology requires automated monitoring. Key metrics to monitor:
| Metric | Threshold | Action |
|---|---|---|
| MAXTASK utilization | > 80% | Alert — risk of task queuing |
| DSA utilization | > 70% | Warning — potential short-on-storage |
| MRO session utilization | > 60% | Alert — sessions may exhaust |
| Transaction response time (95th percentile) | > WLM goal | Alert — investigate routing and AOR health |
| AOR health status | NOT ACTIVE | Critical — immediate routing adjustment |
| CMAS status | NOT ACTIVE | Critical — CPSM management impaired |
CICSPlex SM provides real-time monitoring through its WUI (Web User Interface) and API. CNB integrates CPSM monitoring with their enterprise Splunk instance through SMF 110 records and IBM CICS Performance Analyzer exports.
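A threshold sweep like the table above reduces to a data-driven check. A minimal sketch: the thresholds mirror the table, while the severity labels and sample metric values are invented:

```python
# Thresholds from the monitoring table; metric names and severities are
# illustrative, not CPSM RTA syntax.
THRESHOLDS = {
    "maxtask_util": (0.80, "ALERT"),
    "dsa_util": (0.70, "WARNING"),
    "mro_session_util": (0.60, "ALERT"),
}

def evaluate(metrics):
    """Return (metric, severity) for every breached threshold."""
    return [(name, sev) for name, (limit, sev) in THRESHOLDS.items()
            if metrics.get(name, 0.0) > limit]

sample = {"maxtask_util": 0.85, "dsa_util": 0.55, "mro_session_util": 0.62}
print(evaluate(sample))  # [('maxtask_util', 'ALERT'), ('mro_session_util', 'ALERT')]
```

Keeping thresholds in data rather than code is what makes the quarterly tuning the production warning describes a configuration change instead of a code change.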
Operational Runbooks and CICS Topology
A multi-region CICS topology is only as good as the runbooks that support it. At CNB, every region has an associated runbook that documents:
- Startup sequence. Which regions must start first? FORs before AORs (so function shipping targets are available). CMASs before TORs (so workload routing is active before traffic arrives). AORs before TORs (so routing targets exist). The startup sequence is automated through z/OS System Automation, but the runbook documents the dependencies in case manual intervention is required.
- Shutdown sequence. Reverse of startup, with controlled draining. TORs drain first (no new transactions accepted), then AORs (in-flight transactions complete), then FORs (pending I/O completes). Graceful shutdown takes 3–5 minutes; emergency shutdown is immediate but risks in-flight transaction recovery on restart.
- Recovery procedures. For each failure type (region abend, LPAR failure, DB2 connection loss, MRO connection failure), the runbook specifies: symptoms, diagnostic commands, recovery actions, escalation criteria. Kwame insists that every runbook procedure be tested during quarterly DR exercises: "A runbook that hasn't been tested is fiction."
- Capacity thresholds. When to add a region, when to increase MAXTASK, when to escalate to the architecture team. These are documented with specific metrics and thresholds, not vague guidance.
🧩 ANCHOR EXAMPLE — At Pinnacle Health Insurance, Diane Okoye maintains separate runbooks for their claims processing CICS topology and their member services topology. After a 2023 incident where an operator followed the claims runbook for a member services region (different MAXTASK settings, different DB2 connections), Diane color-coded the runbooks and added region-name validation to each procedure. "The runbook is part of the topology," she says. "If operators can't execute it correctly under stress, you don't have high availability — you have high anxiety."
Common Topology Anti-Patterns
The mega-AOR. One AOR that runs everything. Easy to manage, impossible to scale, and any defect takes down all transactions. This is the monolith problem. At Federal Benefits Administration, Sandra Chen inherited a mega-AOR running 200+ transactions. Her first modernization step was splitting it into four AORs by functional domain — benefits calculation, eligibility verification, payment processing, and inquiry/reporting. The split alone reduced incident impact by 75% because a defect in one domain no longer affected the others.
The over-split topology. Too many specialized regions, each handling a handful of transactions. Management overhead explodes. MRO connections multiply quadratically (N regions = N×(N-1)/2 potential connections). Keep specialization to a meaningful level — by channel, by workload class, by security domain — not by individual transaction. A topology with 50 AORs each running 4 transactions is harder to manage than 5 AORs each running 40 transactions, and provides no additional failure isolation benefit.
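The quadratic growth is easy to see numerically, using the formula from the paragraph above:

```python
def potential_mro_connections(regions):
    """Potential region-to-region connections: N x (N-1) / 2."""
    return regions * (regions - 1) // 2

for n in (5, 16, 50):
    print(n, potential_mro_connections(n))
# 5 regions -> 10 connections; 16 -> 120; 50 -> 1225
```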
The forgotten FOR. A FOR that was sized for 1998 volumes and hasn't been revisited. FORs are invisible until they become bottlenecks — they don't serve users directly, so nobody monitors them until response times degrade. Monitor function shipping response times to detect FOR degradation before users notice. At CNB, Kwame added FOR response time to the CICSPlex SM RTA monitoring dashboard after a 2020 incident where a forgotten FOR's VSAM file ran out of CA/CI splits, causing a 10-second response time for customer master lookups.
The static routing holdout. Regions that still use static routing "because it's simpler." Static routing is simpler until the target region fails, and then it's catastrophic. Any region routing more than 100 transactions per second should use dynamic routing. The cost of implementing CICSPlex SM is measured in days; the cost of a static-routing outage is measured in lost revenue and customer trust.
The orphaned CMAS. A CICSPlex SM CMAS running without a standby. When the CMAS fails, workload routing continues on last-known configuration (CPSM agents in each MAS cache their routing data), but new workload definitions can't be deployed, and health monitoring stops. Always run paired CMASs. The cost of a second CMAS is one additional CICS region (roughly 200MB of storage); the cost of losing CPSM management during an incident is the difference between a 5-minute recovery and a 45-minute recovery.
The affinity-laden topology. A topology where every transaction has affinities, effectively reducing CICSPlex SM to an expensive static router. This usually happens incrementally — each new pseudo-conversational transaction adds "just one more" affinity, until the routing pool is so constrained that workload balancing is impossible. Track your affinity count as a metric. If more than 20% of your transactions have affinities, you have an architectural problem that requires a remediation plan.
Summary
This chapter established the architectural foundation for CICS production mastery:
CICS is a transaction manager. Its core job is ensuring work completes atomically across multiple resources — not merely running your programs. This distinction drives every topology decision.
Region separation isolates workloads, failures, and scaling concerns. TOR/AOR/FOR separation is the fundamental CICS architecture pattern. Each region type has a specific role, and mixing roles creates coupling that will hurt you in production.
Transaction routing is the nervous system of your topology. Static routing is a dead end. Dynamic routing through CICSPlex SM workload management, integrated with z/OS WLM, provides intelligent, adaptive, health-aware routing that keeps transactions flowing even during partial failures.
MRO for intra-LPAR, ISC/IPIC for cross-LPAR. The choice is straightforward but the configuration details matter. Session counts, security propagation, and function shipping overhead are the operational realities that determine whether your cross-region communication helps or hurts.
CICSPlex SM is essential for enterprise scale. Managing individual regions doesn't scale. CPSM provides workload management, resource management, and monitoring across the entire CICSplex from a single management point.
Sysplex-aware CICS eliminates affinities and enables true HA. Shared temporary storage, named counters, coupling facility data tables, and DB2 data sharing transform CICS from a collection of independent regions into a unified platform.
Design with a framework. Five questions — channels, failure domains, performance tiers, shared data, independent scale — drive every topology decision. Apply them rigorously and your topology will be sound.
🚪 LOOKING AHEAD — Chapter 14 takes you from architecture to implementation. You'll define CSD resources, configure SIT parameters, establish MRO connections, and bring up a multi-region topology. The design you created in this chapter's project checkpoint becomes the blueprint you build from.
Spaced Review
From Chapter 1: z/OS Parallel Sysplex
- How does the coupling facility enable DB2 data sharing, and why does this matter for CICS multi-region topologies?
- What is the role of XCF (Cross-system Coupling Facility) in enabling ISC communication between CICS regions on different LPARs?
- How does Sysplex Distributor route TCP/IP connections to CICS TORs across multiple LPARs?
From Chapter 5: WLM Service Classes
- How do WLM service class definitions for CICS transactions interact with CICSPlex SM's goal-based routing algorithm?
- If you define a WLM service class with a 200ms response-time goal for core banking transactions, what happens at the WLM level when those transactions are routed to an overloaded AOR?
- How would you configure WLM velocity goals differently for TOR regions versus AOR regions, and why?
From This Chapter
- Explain the threshold concept: why does understanding CICS as a transaction manager (not an application server) change how you design topologies?
- Walk through what happens — at the region level — when a CNB customer performs an ATM withdrawal that requires access to a VSAM file on a FOR.
- Why is transaction affinity the enemy of workload balancing, and what three strategies does CNB use to manage it?
- Compare MRO and ISC: when is each appropriate, and what are the performance implications of choosing incorrectly?
Next chapter: Chapter 14 — CICS Resource Definitions and System Configuration: From CSD to BAS