Learning Objectives
- Synthesize all 37 chapters into a complete system architecture document
- Present an architecture to a review board and defend design decisions
- Create production runbooks, DR procedures, and operational documentation
- Calculate capacity requirements, TCO, and ROI for the complete system
- Deliver a portfolio-quality architecture deliverable
In This Chapter
- 38.1 The Requirements: PinnaclePay National Payment Platform
- 38.2 System Architecture Overview: The Complete Picture
- 38.3 z/OS Environment: The Foundation
- 38.4 Data Architecture: Where the Money Lives
- 38.5 Online Processing: CICS Topology for Payments
- 38.6 Integration Layer: MQ and API Architecture
- 38.7 Batch Architecture: The ACH Processing Engine
- 38.8 Security and Compliance: Protecting the Money
- 38.9 Operational Excellence: Keeping It Running
- 38.10 Modernization Roadmap: The Path Forward
- 38.11 Architecture Review Presentation: Defending Your Design
- 38.12 The Complete Deliverable: Your Portfolio Artifact
- 38.13 Bringing It All Together: The Morning Wire Run
- Chapter Summary
Chapter 38: Capstone — Architecting a High-Availability Payment Processing System from Requirements to Production
"The difference between a collection of components and a system is architecture. The difference between architecture and production is a thousand decisions made well." — Kwame Asante, CNB Chief Architect, addressing the 2024 Architecture Review Board
You have spent thirty-seven chapters learning the individual disciplines of enterprise COBOL development on z/OS. You have studied the operating system, the database, the transaction monitor, the messaging layer, batch processing, security, operations, and modernization. Each chapter gave you a piece. This chapter asks you to assemble the entire puzzle.
This is not an exercise in repetition. This is a synthesis — the hardest intellectual task in systems engineering. You must take everything you have learned and produce a coherent, defensible, production-ready architecture for a high-availability payment processing system. The kind of system that moves real money for real people, where a four-hour outage makes the evening news and a data breach ends careers.
I have sat through hundreds of architecture reviews over twenty-five years. I can tell you exactly what separates the presentations that get approved from the ones that get sent back for rework: completeness, coherence, and honesty. The approved architectures acknowledge their weaknesses. The rejected ones pretend they have none.
By the end of this chapter, you will have produced a portfolio-quality architecture deliverable — the kind of document that gets you hired as a senior architect, the kind of document that gets a $40 million project funded, the kind of document that, most importantly, results in a system that actually works in production.
Let us begin.
38.1 The Requirements: PinnaclePay National Payment Platform
38.1.1 Business Context
Pinnacle Financial Group — the institution where Diane Chen serves as VP of Technology and Ahmad Rashid leads the mainframe modernization effort — has decided to build a new national payment processing platform called PinnaclePay. The platform must handle three payment types that together represent the majority of U.S. electronic payment volume:
ACH (Automated Clearing House): Batch-oriented payments including direct deposit, bill pay, and account-to-account transfers. The Federal Reserve and EPN (Electronic Payments Network) serve as operators. Settlement occurs in batches, typically same-day or next-day. Volume: approximately 3 million transactions per day.
Wire Transfers: High-value, time-critical transfers processed through Fedwire (domestic) and SWIFT (international). Each transaction may represent millions of dollars. Real-time gross settlement — each payment settles individually. Volume: approximately 200,000 transactions per day.
Real-Time Payments (RTP): The newest rail, operated by The Clearing House, with the Federal Reserve's FedNow service as a second operator. Payments settle in seconds, 24/7/365. Volume: approximately 1.8 million transactions per day, growing at 40% annually.
Total combined volume: 5 million transactions per day at launch, scaling to 8 million within three years.
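A quick sanity check of the three-year target, under the simplifying assumption (not stated in the requirements) that ACH and wire volumes hold roughly flat while only RTP grows at its stated 40% annual rate:

```python
# Rough volume projection for PinnaclePay (millions of transactions/day).
# Assumption: ACH and wire stay flat; only RTP compounds at 40%/year.
ach, wire, rtp = 3.0, 0.2, 1.8

launch_total = ach + wire + rtp      # 5.0M/day at launch

rtp_year3 = rtp * 1.40 ** 3          # three years of 40% compound growth
year3_total = ach + wire + rtp_year3

print(f"Launch: {launch_total:.1f}M/day, Year 3: {year3_total:.1f}M/day")
```

RTP alone grows from 1.8M to roughly 4.9M per day, which lands the combined total at about 8.1M, consistent with the 8-million target.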
38.1.2 Non-Functional Requirements
The non-functional requirements are where payment systems live or die. Every architect can draw boxes and arrows. The ones who survive production can meet these numbers:
| Requirement | Target | Rationale |
|---|---|---|
| Availability | 99.999% (five nines) | 5.26 minutes unplanned downtime per year. Federal Reserve operating circular requires it for Fedwire participants. |
| Online Response Time | < 200ms for 99th percentile | RTP requires confirmation within 3 seconds end-to-end; our processing window is 200ms. |
| Batch Window | 4 hours for full EOD cycle | Must complete between 23:00 and 03:00 ET to meet ACH cutoff times. |
| Recovery Time Objective (RTO) | < 15 minutes for Tier 1 services | Wire and RTP processing must fail over within 15 minutes. |
| Recovery Point Objective (RPO) | Zero data loss | No committed transaction may be lost under any failure scenario. Financial regulators accept nothing less. |
| Throughput | 500 TPS sustained, 2,000 TPS peak | Peak occurs during payroll processing windows (first and fifteenth of each month). |
| Security | PCI-DSS Level 1, SOX, FFIEC compliant | Payment card data, financial reporting controls, and banking examination requirements. |
| Audit | 100% transaction traceability | Every transaction must be traceable from origination through settlement with immutable audit trail. |
💡 Why Five Nines Matters for Payments: Five nines (99.999%) means 5.26 minutes of unplanned downtime per year. That sounds extreme until you realize that a wire transfer system going down for one hour could delay billions of dollars in settlements, trigger cascading failures across the financial system, and result in regulatory action. The Federal Reserve does not accept excuses about "planned maintenance windows" for critical payment infrastructure.
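The downtime budget behind each availability tier is simple arithmetic, and worth internalizing before an architecture review asks for it:

```python
# Unplanned-downtime budget implied by each availability tier.
# 99.999% of a 365.25-day year leaves about 5.26 minutes of downtime.
MINUTES_PER_YEAR = 365.25 * 24 * 60

for label, avail in [("three nines", 0.999),
                     ("four nines", 0.9999),
                     ("five nines", 0.99999)]:
    budget = MINUTES_PER_YEAR * (1 - avail)
    print(f"{label}: {budget:8.2f} minutes/year")
```

Each added nine cuts the budget by a factor of ten: roughly 526 minutes, 53 minutes, and 5.26 minutes per year respectively.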
38.1.3 Regulatory Landscape
Payment processing does not exist in a regulatory vacuum. PinnaclePay must comply with:
- Federal Reserve Operating Circulars 4, 6, and 8 — governing ACH, Fedwire funds, and FedNow participation
- NACHA Operating Rules — the rule book for all ACH transactions
- PCI-DSS v4.0 — if any payment card data touches the system
- BSA/AML (Bank Secrecy Act / Anti-Money Laundering) — OFAC screening for every wire transfer
- SOX Section 404 — internal controls over financial reporting
- FFIEC IT Examination Handbook — the standard against which bank examiners evaluate technology
- Regulation E — electronic fund transfer consumer protections
- GDPR — if processing payments involving EU data subjects
Each of these has specific technical implications that ripple through the architecture. You cannot bolt compliance on after the fact. It must be designed in from the beginning. We covered the security fundamentals in Chapters 29–31, but here we must apply them to every layer simultaneously.
38.1.4 Stakeholder Map
Understanding who cares about what is essential for an architecture review. Here are the stakeholders and their primary concerns:
| Stakeholder | Primary Concern | What They Will Ask |
|---|---|---|
| CFO | Total Cost of Ownership, ROI | "What does this cost over five years, and when does it pay for itself?" |
| CISO | Security posture, compliance gaps | "Show me the threat model. Where are the attack surfaces?" |
| CTO (Diane Chen) | Technical feasibility, scalability | "Will this handle 3x volume growth without re-architecture?" |
| Head of Payments | Feature completeness, time to market | "When can we onboard our first RTP customer?" |
| Operations Director | Runbook completeness, on-call burden | "How many people do I need, and what does the 2 AM page look like?" |
| External Auditors | Control evidence, separation of duties | "Show me the access controls and change management process." |
| Federal Reserve Examiners | Resilience, BCP/DR testing results | "When was your last DR test, and what were the results?" |
38.2 System Architecture Overview: The Complete Picture
38.2.1 Architecture Philosophy
Before drawing a single box, we must establish principles. These are not platitudes — they are decision filters that resolve arguments when the team disagrees:
- No single point of failure. Every component must be redundant. If removing any single element causes an outage, the architecture is incomplete.
- Defense in depth. Security is not a layer; it is a property of every layer.
- Fail safe, not fail silent. When something breaks, the system must reject the transaction cleanly, not process it incorrectly.
- Auditability over convenience. Every design decision that trades audit capability for performance or simplicity is wrong.
- Mainframe-first for core processing, distributed for edge services. The mainframe handles the money. Everything else is negotiable.
38.2.2 Logical Architecture
The complete PinnaclePay architecture consists of seven major subsystems, each mapping to a section of this textbook:
┌─────────────────────────────────────────────────────────────────┐
│ EXTERNAL INTERFACES │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ FedACH │ │ Fedwire │ │ FedNow/ │ │ Partner APIs │ │
│ │ Gateway │ │ Gateway │ │ RTP GW │ │ (REST/gRPC) │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └──────┬───────┘ │
│ │ │ │ │ │
├───────┼──────────────┼─────────────┼───────────────┼────────────┤
│ │ INTEGRATION LAYER (MQ + API Gateway) │ │
│ ┌────┴──────────────┴─────────────┴───────────────┴────────┐ │
│ │ WebSphere MQ Cluster (Ch 19-22) │ │
│ │ ┌─────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ ACH QM │ │ Wire QM │ │ RTP QM │ │ │
│ │ └────┬────┘ └────┬─────┘ └────┬─────┘ │ │
│ └────────┼─────────────┼──────────────┼────────────────────┘ │
│ │ │ │ │
├───────────┼─────────────┼──────────────┼────────────────────────┤
│ │ ONLINE PROCESSING (CICS) │ │
│ ┌────────┴─────────────┴──────────────┴──────────────────┐ │
│ │ CICS TS v5.6 Regions (Ch 13-18) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ TOR/WOR │ │ AOR │ │ FOR │ │ │
│ │ │ Routing │ │Processing│ │ File │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ └────────────────────────┬───────────────────────────────┘ │
│ │ │
├───────────────────────────┼──────────────────────────────────────┤
│ DATA LAYER (DB2) │
│ ┌────────────────────────┴──────────────────────────────┐ │
│ │ DB2 v13 Data Sharing Group (Ch 7-12) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────────────┐ │ │
│ │ │ Payment │ │ Customer │ │ Audit/Compliance │ │ │
│ │ │ Tables │ │ Tables │ │ Tables │ │ │
│ │ └──────────┘ └──────────┘ └──────────────────┘ │ │
│ └───────────────────────────────────────────────────────┘ │
│ │
├───────────────────────────────────────────────────────────────────┤
│ z/OS INFRASTRUCTURE │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ z/OS v2.5 — WLM, RACF, SMF, RMF, JES2 (Ch 1-6) │ │
│ │ ┌──────────┐ ┌──────────┐ ┌──────────┐ │ │
│ │ │ LPAR 1 │ │ LPAR 2 │ │ LPAR 3 │ │ │
│ │ │Production│ │Production│ │ Dev/Test │ │ │
│ │ └──────────┘ └──────────┘ └──────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
├───────────────────────────────────────────────────────────────────┤
│ BATCH PROCESSING │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ JCL/JES2 Batch Subsystem (Ch 23-28) │ │
│ │ ┌───────────┐ ┌──────────────┐ ┌──────────────────┐ │ │
│ │ │ ACH Batch │ │ EOD/Settle │ │ Report/Recon │ │ │
│ │ │ Processing│ │ Processing │ │ Generation │ │ │
│ │ └───────────┘ └──────────────┘ └──────────────────┘ │ │
│ └───────────────────────────────────────────────────────────┘ │
│ │
├───────────────────────────────────────────────────────────────────┤
│ SECURITY / COMPLIANCE / OPERATIONS │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────────┐ │
│ │ RACF │ │ zERT/TLS │ │ SMF/Audit│ │ GDPS/DR │ │
│ │ (Ch29-31)│ │ (Ch29-31)│ │ (Ch32-34)│ │ (Ch32-34) │ │
│ └──────────┘ └──────────┘ └──────────┘ └──────────────┘ │
└───────────────────────────────────────────────────────────────────┘
This is the system you are building. Every box maps to chapters you have already studied. What makes this capstone different from any individual chapter is that you must now make all of these boxes work together — and defend the decisions that connect them.
38.2.3 Physical Topology
PinnaclePay runs across two data centers, designated Primary (East) and Secondary (West), separated by approximately 400 kilometers — far enough for geographic diversity, close enough for synchronous replication:
Primary Data Center (East):
- IBM z16 Model A01, 12 CPs configured as 8 GCPs + 4 zIIPs
- Two LPARs: PPAY1 (production) and PPAY2 (production, active-active for online)
- Third LPAR: PPAYDEV (development/test, capped)

Secondary Data Center (West):
- IBM z16 Model A01, 8 CPs configured as 6 GCPs + 2 zIIPs
- Two LPARs: PPAY3 (DR/active for reads) and PPAY4 (DR warm standby)
- GDPS Metro (synchronous PPRC mirroring) for continuous, zero-RPO data replication between sites

Network:
- Dual FICON channels between all LPARs in each site
- Dual ISC (Intersystem Communication) links between sites
- Dedicated coupling facility links for Parallel Sysplex
⚠️ The 400-Kilometer Question: Why 400km and not 4,000km? Because synchronous replication — which is required for zero RPO — has a practical distance limit imposed by the speed of light. At 400km, round-trip latency is approximately 2.7ms, which is acceptable. At 4,000km, it would be 27ms, which would destroy online response times. For payment processing with zero-RPO requirements, you must accept this geographic constraint. Asynchronous replication to a third site (2,000+ km away) provides additional protection against regional disasters.
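The sidebar's latency figures follow directly from propagation delay. Note that they use the free-space speed of light as a lower bound; light in fiber travels roughly 30% slower, so real round trips are somewhat worse than this sketch suggests:

```python
# Propagation delay vs. replication distance (lower bound: free-space c).
C_KM_PER_MS = 300_000 / 1000   # ~300 km of travel per millisecond

def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay for a synchronous write."""
    return 2 * distance_km / C_KM_PER_MS

print(f"  400 km: {round_trip_ms(400):.1f} ms round trip")   # ~2.7 ms
print(f"4,000 km: {round_trip_ms(4000):.1f} ms round trip")  # ~27 ms
```

At 400 km the penalty fits inside a 200ms response-time budget; at 4,000 km it would consume more than a tenth of it on every committed write.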
38.3 z/OS Environment: The Foundation
38.3.1 LPAR Configuration
Recall from Chapters 1–3 that an LPAR (Logical Partition) is an independent instance of z/OS running on shared hardware. PinnaclePay's LPAR design reflects the need for both isolation and resource sharing:
PPAY1 — Primary Production Online:
- 4 GCPs, 2 zIIPs (dedicated)
- Runs: CICS TOR/AOR regions for wire and RTP, DB2 member PPDB1, MQ queue manager PPMQ1
- WLM service class: PAYMENT_ONLINE (response-time goal: 95% of transactions under 100ms)
- This LPAR handles all real-time payment processing — the transactions where milliseconds matter

PPAY2 — Primary Production Batch/Online:
- 4 GCPs, 2 zIIPs (shared with PPAY1 via LPAR weight adjustment)
- Runs: CICS AOR regions for ACH online, DB2 member PPDB2, MQ queue manager PPMQ2, JES2 for batch
- WLM service class: PAYMENT_BATCH (velocity goal tuned so the ACH batch completes within 4 hours)
- During batch windows, WLM shifts CP weight from PPAY1 (reduced online load overnight) to PPAY2
- This is the workhorse LPAR — it processes the 3 million daily ACH transactions

PPAY3 — DR/Read Active:
- 3 GCPs, 1 zIIP
- Runs: DB2 member PPDB3 (data sharing group member, handles read queries), reporting CICS regions
- Offloads read-heavy operations (balance inquiries, status checks) from production
- Also serves as first-line DR takeover if PPAY1 fails

PPAY4 — DR Warm Standby:
- 3 GCPs, 1 zIIP
- Maintains GDPS-managed standby for PPAY2
- Batch DR capability — can assume full batch workload within 15 minutes
38.3.2 WLM Policy Design
Workload Manager policy for PinnaclePay is where Chapter 4's theory becomes life-or-death practice. The policy must enforce strict priorities:
Service Class Hierarchy:
SYSSTC (system tasks) — system-defined, dispatched ahead of all business work
Importance 1 — WIRE_ONLINE: response-time goal, 95% within 50ms
Importance 2 — RTP_ONLINE: response-time goal, 95% within 75ms
Importance 3 — ACH_ONLINE: response-time goal, 90% within 150ms
Importance 4 — PAYMENT_BATCH: velocity goal, tuned so the EOD cycle completes within its 4-hour window
Discretionary — REPORTING: no goal; runs on leftover capacity
Why is WIRE_ONLINE higher priority than RTP_ONLINE? Because wire transfers are individually high-value (often millions of dollars each) and Fedwire has strict timing requirements. A wire delayed by 100ms might cost nothing; a wire delayed by 10 minutes might cost the bank $50,000 in penalties. RTP has a more generous 3-second end-to-end window, so the CICS processing can tolerate slightly lower priority.
38.3.3 Language Environment Configuration
Chapter 5 covered Language Environment (LE) runtime options. For PinnaclePay, the critical LE settings are:
CICS LE Runtime Options:
STORAGE(00,NONE,00,0K) — Zero-fill storage on allocation
(security: no data leakage between transactions)
TRAP(ON,SPIE) — Trap abends, generate diagnostics
ABTERMENC(ABEND) — Propagate abends for clean CICS task management
ALL31(ON) — 31-bit addressing mode (required for >16MB storage)
HEAP(32768,32768,ANYWHERE,KEEP,8192,4096) — Heap tuned for payment message sizes
Batch LE Runtime Options:
RPTOPTS(ON) — Report options at startup for audit trail
RPTSTG(ON) — Report storage usage for capacity planning
STORAGE(00,NONE,00,0K) — Same security requirement as online
TRAP(ON,SPIE) — Same diagnostic requirement
The STORAGE(00,...) option deserves special emphasis for payment processing. When a CICS transaction completes and its storage is freed, that storage might be reallocated to a different transaction processing a different customer's data. If the storage is not zeroed, remnants of one customer's payment data could theoretically be visible to a program processing another customer. In a PCI-DSS audit, this is a finding. In a breach, it is a catastrophe. Zero-fill always.
📊 Capacity Planning Note: The z16 A01 with 8 GCPs delivers approximately 12,000 MIPS. PinnaclePay's projected workload at 5M transactions/day is approximately 4,500 MIPS sustained, with peaks to 8,000 MIPS during payroll cycles. This leaves 33% headroom above the payroll-cycle peak — within the 25–40% range that Chapter 6 identified as the operational sweet spot. Going below 25% headroom means you cannot absorb peak loads; going above 40% means you are overpaying for capacity.
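The headroom arithmetic from the note above, made explicit:

```python
# Capacity headroom check for the z16 configuration in the note.
capacity_mips = 12_000
sustained_mips = 4_500
peak_mips = 8_000

def headroom(used_mips: float, capacity: float) -> float:
    """Fraction of total capacity left unused at a given load."""
    return (capacity - used_mips) / capacity

print(f"Headroom above peak:      {headroom(peak_mips, capacity_mips):.0%}")
print(f"Headroom above sustained: {headroom(sustained_mips, capacity_mips):.0%}")
```

Peak load leaves 33% headroom, squarely inside the 25–40% sweet spot; sustained load leaves far more, which is the spare capacity that absorbs the payroll-cycle peaks.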
38.4 Data Architecture: Where the Money Lives
38.4.1 DB2 Data Sharing Group
PinnaclePay uses a four-member DB2 data sharing group (Chapter 10), one member per LPAR. Data sharing provides both high availability and workload distribution:
DB2 Data Sharing Group: PPAYGRP
Member PPDB1 on PPAY1 — Primary online read/write
Member PPDB2 on PPAY2 — Batch processing, ACH online
Member PPDB3 on PPAY3 — Read offload, reporting
Member PPDB4 on PPAY4 — DR standby (started but idle)
Coupling Facility Structures:
PPAY_LOCK1 — Group lock structure (primary)
PPAY_LOCK2 — Group lock structure (rebuild alternate)
PPAY_SCA — Shared communications area
PPAY_GBP0 — Group buffer pool for BP0 (directory + data pages)
PPAY_GBP1 — Group buffer pool for BP1 (payment tables)
PPAY_GBP2 — Group buffer pool for BP2 (audit/history tables)
The data sharing group is the core of PinnaclePay's zero-RPO, zero-data-loss guarantee. Because all four DB2 members share the same data through the coupling facility, a member failure does not lose any committed data — it is already available to the surviving members. This is fundamentally different from replication, where there is always a window of potential loss. Chapter 10 explained the theory. Here is the practice: when PPDB1 fails, PPDB2 can immediately continue processing wire transfers without waiting for a full log apply — only the specific pages and rows the failed member had locked at the moment of failure are held as retained locks until its restart releases them. For everything else, the coupling facility already has the current data.
38.4.2 The Payment Data Model
The PinnaclePay data model centers on a core payment entity with type-specific extensions. This design reflects the patterns from Chapters 7–9:
-- Core payment table — every payment type starts here
CREATE TABLE PINNACLE.PAYMENT (
PAYMENT_ID CHAR(20) NOT NULL,
PAYMENT_TYPE CHAR(3) NOT NULL, -- ACH, WIR, RTP
ORIGINATOR_ID CHAR(12) NOT NULL,
BENEFICIARY_ID CHAR(12) NOT NULL,
AMOUNT DECIMAL(15,2) NOT NULL,
CURRENCY_CODE CHAR(3) NOT NULL DEFAULT 'USD',
STATUS CHAR(3) NOT NULL, -- PND, ACT, SET, REJ, RET
CREATED_TS TIMESTAMP(12) NOT NULL WITH DEFAULT,
UPDATED_TS TIMESTAMP(12) NOT NULL WITH DEFAULT,
SETTLEMENT_DATE DATE,
PRIORITY SMALLINT NOT NULL DEFAULT 5,
ORIGINATING_SYSTEM CHAR(8) NOT NULL,
CONSTRAINT PK_PAYMENT PRIMARY KEY (PAYMENT_ID)
)
IN PPAYDB.PPAYTS01
PARTITION BY RANGE (CREATED_TS)
(
PARTITION 1 ENDING('2026-04-01-00.00.00.000000000000') INCLUSIVE,
PARTITION 2 ENDING('2026-07-01-00.00.00.000000000000') INCLUSIVE,
PARTITION 3 ENDING('2026-10-01-00.00.00.000000000000') INCLUSIVE,
PARTITION 4 ENDING('2027-01-01-00.00.00.000000000000') INCLUSIVE,
PARTITION 5 ENDING('9999-12-31-00.00.00.000000000000') INCLUSIVE
)
APPEND YES
AUDIT ALL;
-- Temporal history for regulatory compliance (Chapter 12).
-- For FOR SYSTEM_TIME queries to work, the base table must carry
-- row-begin/row-end timestamp columns and a SYSTEM_TIME period
-- (omitted from the PAYMENT DDL above for brevity), and the history
-- table must mirror the base table exactly:
CREATE TABLE PINNACLE.PAYMENT_HISTORY
    LIKE PINNACLE.PAYMENT
    IN PPAYDB.PPAYTS02;

ALTER TABLE PINNACLE.PAYMENT
    ADD VERSIONING USE HISTORY TABLE PINNACLE.PAYMENT_HISTORY;
-- Wire transfer extension
CREATE TABLE PINNACLE.WIRE_DETAIL (
PAYMENT_ID CHAR(20) NOT NULL,
FEDWIRE_IMAD CHAR(22), -- Input Message Accountability Data
SENDER_ABA CHAR(9) NOT NULL,
RECEIVER_ABA CHAR(9) NOT NULL,
BUSINESS_FUNC CHAR(3) NOT NULL, -- CTR, DRW, FFR
OFAC_SCREEN_STATUS CHAR(3) NOT NULL DEFAULT 'PND',
OFAC_SCREEN_TS TIMESTAMP(12),
CONSTRAINT PK_WIRE PRIMARY KEY (PAYMENT_ID),
CONSTRAINT FK_WIRE_PAY FOREIGN KEY (PAYMENT_ID)
REFERENCES PINNACLE.PAYMENT (PAYMENT_ID)
)
IN PPAYDB.PPAYTS03;
-- ACH extension
CREATE TABLE PINNACLE.ACH_DETAIL (
PAYMENT_ID CHAR(20) NOT NULL,
BATCH_ID CHAR(12) NOT NULL,
SEC_CODE CHAR(3) NOT NULL, -- PPD, CCD, WEB, TEL
TRACE_NUMBER CHAR(15) NOT NULL,
EFFECTIVE_DATE DATE NOT NULL,
ADDENDA_COUNT SMALLINT NOT NULL DEFAULT 0,
RETURN_REASON CHAR(3),
CONSTRAINT PK_ACH PRIMARY KEY (PAYMENT_ID),
CONSTRAINT FK_ACH_PAY FOREIGN KEY (PAYMENT_ID)
REFERENCES PINNACLE.PAYMENT (PAYMENT_ID)
)
IN PPAYDB.PPAYTS04
PARTITION BY RANGE (EFFECTIVE_DATE)
(
    -- Illustrative boundaries; rotated forward as dates age out
    PARTITION 1 ENDING('2026-03-31') INCLUSIVE,
    PARTITION 2 ENDING('2026-06-30') INCLUSIVE,
    PARTITION 3 ENDING('9999-12-31') INCLUSIVE
);
-- RTP extension
CREATE TABLE PINNACLE.RTP_DETAIL (
PAYMENT_ID CHAR(20) NOT NULL,
MESSAGE_ID CHAR(35) NOT NULL,
CREDITOR_AGENT CHAR(11) NOT NULL, -- BIC
DEBTOR_AGENT CHAR(11) NOT NULL, -- BIC
END_TO_END_ID CHAR(35) NOT NULL,
ACCEPTANCE_TS TIMESTAMP(12),
CLEARING_SYSTEM CHAR(3) NOT NULL, -- TCH, FDN (FedNow)
CONSTRAINT PK_RTP PRIMARY KEY (PAYMENT_ID),
CONSTRAINT FK_RTP_PAY FOREIGN KEY (PAYMENT_ID)
REFERENCES PINNACLE.PAYMENT (PAYMENT_ID)
)
IN PPAYDB.PPAYTS05;
-- Immutable audit trail
CREATE TABLE PINNACLE.PAYMENT_AUDIT (
AUDIT_ID CHAR(26) NOT NULL, -- Timestamp + sequence
PAYMENT_ID CHAR(20) NOT NULL,
AUDIT_ACTION CHAR(3) NOT NULL, -- CRE, UPD, APR, REJ, SET
AUDIT_TS TIMESTAMP(12) NOT NULL WITH DEFAULT,
AUDIT_USER CHAR(8) NOT NULL,
AUDIT_PROGRAM CHAR(8) NOT NULL,
AUDIT_TERMINAL CHAR(4),
OLD_STATUS CHAR(3),
NEW_STATUS CHAR(3),
AUDIT_DETAIL VARCHAR(500),
CONSTRAINT PK_AUDIT PRIMARY KEY (AUDIT_ID)
)
IN PPAYDB.PPAYTS06
APPEND YES
AUDIT ALL;
38.4.3 Partitioning Strategy
The PAYMENT table is partitioned by CREATED_TS using quarterly boundaries (Chapter 11). This serves three purposes:
- Performance: Range scans on recent payments (the most common query pattern) only touch one or two partitions.
- Maintenance: Old partitions can be rotated to archive without impacting active data. REORG and RUNSTATS operate on individual partitions.
- Recovery: Partition-level recovery is possible — if partition 3 is damaged, you can recover just that partition while partitions 1, 2, 4, and 5 remain online.
ACH_DETAIL is also partitioned by EFFECTIVE_DATE because ACH queries are almost always date-range bound ("show me all ACH credits settling on March 15").
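The partition-pruning mechanism behind purpose 1 can be sketched in a few lines. DB2 does this routing itself at query time; the sketch below, using the quarterly boundaries from the PAYMENT DDL, just makes the mechanism visible:

```python
# Which partition does a row with a given CREATED_TS land in?
from bisect import bisect_left
from datetime import datetime

# Upper (inclusive) boundary of each partition, mirroring the DDL.
BOUNDARIES = [
    datetime(2026, 4, 1),    # partition 1
    datetime(2026, 7, 1),    # partition 2
    datetime(2026, 10, 1),   # partition 3
    datetime(2027, 1, 1),    # partition 4
    datetime(9999, 12, 31),  # partition 5 (catch-all)
]

def partition_for(created_ts: datetime) -> int:
    """First partition whose boundary is >= the timestamp."""
    return bisect_left(BOUNDARIES, created_ts) + 1

print(partition_for(datetime(2026, 3, 15)))  # 1
print(partition_for(datetime(2026, 8, 2)))   # 3
```

A range predicate on recent timestamps resolves to one or two partition numbers, so DB2 never touches the data sets backing the other partitions.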
38.4.4 Indexing Strategy
-- Primary access path for online payment lookup
CREATE UNIQUE INDEX PINNACLE.IX_PAY_ID
ON PINNACLE.PAYMENT (PAYMENT_ID)
USING STOGROUP PPAYSG01
CLUSTER
BUFFERPOOL BP1;
-- Status-based queries (operations dashboard)
CREATE INDEX PINNACLE.IX_PAY_STATUS
ON PINNACLE.PAYMENT (STATUS, PAYMENT_TYPE, CREATED_TS DESC)
USING STOGROUP PPAYSG01
BUFFERPOOL BP1;
-- Originator queries (customer service)
CREATE INDEX PINNACLE.IX_PAY_ORIG
ON PINNACLE.PAYMENT (ORIGINATOR_ID, CREATED_TS DESC)
USING STOGROUP PPAYSG01
BUFFERPOOL BP1;
-- Settlement date queries (batch processing)
CREATE INDEX PINNACLE.IX_PAY_SETTLE
ON PINNACLE.PAYMENT (SETTLEMENT_DATE, STATUS, PAYMENT_TYPE)
USING STOGROUP PPAYSG01
BUFFERPOOL BP1;
🔗 Cross-Reference: The index design follows the principles from Chapter 9 (query optimization). The IX_PAY_STATUS index uses a composite key with STATUS as the leading column because the operations dashboard filters on status first ("show me all pending payments"), then payment type, then orders by timestamp descending. If you reversed the column order, every status query would require a full index scan.
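The leading-column principle is easy to demonstrate with a toy sorted key list standing in for the B-tree. The data here is illustrative, not from the chapter:

```python
# Toy "index" on (STATUS, PAYMENT_TYPE, CREATED_TS), kept sorted the
# way a B-tree keeps its keys.
from bisect import bisect_left, bisect_right

index = sorted([
    ("SET", "RTP", "2026-03-01"),
    ("PND", "WIR", "2026-03-01"),
    ("REJ", "ACH", "2026-03-02"),
    ("PND", "ACH", "2026-03-02"),
    ("SET", "WIR", "2026-03-03"),
])

# With STATUS as the leading column, "all pending payments" is one
# contiguous key range — a probe plus a short scan:
lo = bisect_left(index, ("PND",))
hi = bisect_right(index, ("PND", "\xff"))   # "\xff": toy upper sentinel
pending = index[lo:hi]
print(pending)
```

If PAYMENT_TYPE led instead, the "PND" entries would be scattered across the whole key space, and the same query would have to visit every part of the index.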
38.4.5 Temporal Tables for Regulatory Compliance
Chapter 12 introduced DB2 temporal tables. For PinnaclePay, system-time temporal tables are not optional — they are a regulatory requirement. Federal examiners will ask: "Show me the state of payment P123456 at exactly 14:32:17 on March 3, 2026." Without temporal tables, answering this question requires reconstructing state from audit logs — an expensive, error-prone process that takes hours. With temporal tables:
SELECT * FROM PINNACLE.PAYMENT
FOR SYSTEM_TIME AS OF '2026-03-03-14.32.17.000000'
WHERE PAYMENT_ID = 'P123456';
One query. Instant answer. Examiner satisfied. Career preserved.
38.5 Online Processing: CICS Topology for Payments
38.5.1 CICSplex Design
PinnaclePay's CICS topology (Chapters 13–18) uses the multi-region architecture that Kwame Asante's team at CNB pioneered and that is now considered standard for financial workloads:
CICSplex: PPAYPLEX
CICSPlex SM WUI Region: PPSMWUI (management)
┌─── PPAY1 LPAR ───────────────────────────┐
│ PPTOR1 — Terminal Owning Region │
│ (TCP/IP listener, port 8443) │
│ Routes to AOR based on payment type │
│ │
│ PPWOR1 — Web Owning Region │
│ (CICS Web Services, JSON/REST) │
│ Handles all API-originated payments │
│ │
│ PPAOR1 — Application Owning Region │
│ (Wire transfers, RTP processing) │
│ Connected to PPDB1 (DB2) │
│ Connected to PPMQ1 (MQ) │
│ │
│ PPFOR1 — File Owning Region │
│ (VSAM files, temporary storage) │
└───────────────────────────────────────────┘
┌─── PPAY2 LPAR ───────────────────────────┐
│ PPTOR2 — Terminal Owning Region │
│ PPWOR2 — Web Owning Region │
│ PPAOR2 — Application Owning Region │
│ (ACH online, batch initiation) │
│ PPFOR2 — File Owning Region │
└───────────────────────────────────────────┘
┌─── PPAY3 LPAR ───────────────────────────┐
│ PPTOR3 — TOR (read-only queries) │
│ PPWOR3 — WOR (status inquiries) │
│ PPAOR3 — AOR (read workload) │
└───────────────────────────────────────────┘
38.5.2 Transaction Routing
The TOR regions use dynamic routing (Chapter 16) with CICSPlex SM workload management to distribute transactions:
- Wire transfer programs (PPWIRE*): Routed to PPAOR1 (primary) or PPAOR3 (overflow/read)
- RTP programs (PPRTP*): Routed to PPAOR1 with affinity to maintain session state
- ACH online programs (PPACH*): Routed to PPAOR2
- Inquiry programs (PPINQ*): Routed to PPAOR3 (read-optimized LPAR)
The routing exit program, PPROUTE, implements weighted distribution with health checking:
IDENTIFICATION DIVISION.
PROGRAM-ID. PPROUTE.
*================================================================*
* Payment Processing Dynamic Router *
* CICS invokes this exit with the DFHDYPDS routing parameter *
* list as its commarea. The exit picks a target AOR from the *
* transaction-ID prefix, sets DYRSYSID, and returns a zero *
* return code to accept the route. *
*================================================================*
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-PAYMENT-PREFIX PIC X(2).
LINKAGE SECTION.
COPY DFHDYPDS.
PROCEDURE DIVISION USING DFHDYPDS.
MOVE DYRTRAN(1:2) TO WS-PAYMENT-PREFIX
EVALUATE WS-PAYMENT-PREFIX
WHEN 'PW' MOVE 'AOR1' TO DYRSYSID
WHEN 'PR' MOVE 'AOR1' TO DYRSYSID
WHEN 'PA' MOVE 'AOR2' TO DYRSYSID
WHEN 'PI' MOVE 'AOR3' TO DYRSYSID
WHEN OTHER MOVE 'AOR1' TO DYRSYSID
END-EVALUATE
* Health check — if target AOR is down, reroute
PERFORM CHECK-AOR-HEALTH
MOVE ZERO TO DYRRETC
GOBACK.
CHECK-AOR-HEALTH.
* In production, this queries CICSPlex SM for region
* status and reroutes if the primary target is
* unhealthy. See Chapter 16 for the full pattern.
CONTINUE.
38.5.3 CICS Web Services for API Access
Modern payment integrations arrive via REST APIs, not 3270 terminals. PinnaclePay exposes payment services through CICS Web Services (Chapter 18):
API Endpoints (via CICS WOR regions):
POST /api/v1/payments/wire — Initiate wire transfer
POST /api/v1/payments/ach — Submit ACH payment
POST /api/v1/payments/rtp — Initiate real-time payment
GET /api/v1/payments/{id} — Payment status inquiry
GET /api/v1/payments/{id}/audit — Audit trail for payment
POST /api/v1/payments/{id}/cancel — Request cancellation
The JSON-to-COMMAREA transformation uses CICS channels and containers (Chapter 17), which eliminated the 32KB COMMAREA size limitation that plagued earlier payment systems. A wire transfer message with full SWIFT MT103 data can easily exceed 32KB. Channels handle it without modification.
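What a client-side request to the wire endpoint might look like. The field names below are illustrative stand-ins modeled on the PAYMENT and WIRE_DETAIL columns; the actual JSON schema is defined by the CICS web services bindings, which this chapter does not show:

```python
# Hypothetical request body for POST /api/v1/payments/wire.
import json

def build_wire_request(originator_id: str, beneficiary_id: str,
                       amount: float, sender_aba: str,
                       receiver_aba: str) -> str:
    # Mirror the server's format validation so bad requests fail fast
    assert len(sender_aba) == 9 and sender_aba.isdigit()
    assert len(receiver_aba) == 9 and receiver_aba.isdigit()
    body = {
        "paymentType": "WIR",
        "originatorId": originator_id,
        "beneficiaryId": beneficiary_id,
        "amount": f"{amount:.2f}",   # decimal-as-string, no float rounding
        "currencyCode": "USD",
        "senderAba": sender_aba,
        "receiverAba": receiver_aba,
    }
    return json.dumps(body)

payload = build_wire_request("CUST00000001", "CUST00000002",
                             250000.00, "123456789", "987654321")
print(payload)
```

Sending monetary amounts as fixed-precision strings, not floats, keeps the JSON faithful to the DECIMAL(15,2) column it ultimately populates.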
38.5.4 The Wire Transfer Transaction Flow
Let us trace a single wire transfer through the entire online path — the kind of end-to-end walkthrough that an architecture review board expects:
- Inbound: Fedwire message arrives via TCP/IP at the MQ listener, is placed on queue PPAY.WIRE.INBOUND
- Trigger: MQ trigger monitor fires CICS transaction PWRX (wire receive) in PPAOR1
- Parse: Program PPWRXPRS parses the Fedwire message, validates format
- OFAC Screening: Program PPWROFAC performs real-time OFAC/SDN list screening via MQ request/reply to the compliance service. If flagged, the payment enters HOLD status and is routed to the compliance team's work queue. (This step is not optional. Missing an OFAC hit is a federal crime.)
- Validation: Program PPWRVAL performs business rule validation — sufficient funds, valid ABA routing numbers, within counterparty limits
- Persist: Program PPWRDB2 inserts into PAYMENT, WIRE_DETAIL, and PAYMENT_AUDIT tables within a single unit of work
- Process: Program PPWRPRC updates the general ledger positions and queues the settlement record
- Confirm: Program PPWRCNF sends the Fedwire acknowledgment via MQ to the outbound queue
- Notify: Program PPWRNTF publishes a payment event to the event stream for downstream consumers
Total elapsed time target: under 150ms. Every millisecond of that budget is allocated:
| Step | Program | Time Budget | Notes |
|---|---|---|---|
| Parse | PPWRXPRS | 5ms | CPU-bound message parsing |
| OFAC Screen | PPWROFAC | 30ms | MQ request/reply to compliance |
| Validation | PPWRVAL | 10ms | Business rules, DB2 lookups |
| Persist | PPWRDB2 | 25ms | DB2 insert, 3 tables, 1 commit |
| Process | PPWRPRC | 20ms | GL update, settlement queue |
| Confirm | PPWRCNF | 10ms | MQ put to outbound queue |
| Notify | PPWRNTF | 5ms | Async event publish |
| Overhead | (CICS/MQ/network) | 45ms | Transaction management, routing |
| Total | | 150ms | |
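A review board will check that the per-step allocations actually sum to the stated target. They do:

```python
# The wire-transfer latency budget from the table above, in one place.
budget_ms = {
    "Parse (PPWRXPRS)":    5,
    "OFAC (PPWROFAC)":    30,
    "Validate (PPWRVAL)": 10,
    "Persist (PPWRDB2)":  25,
    "Process (PPWRPRC)":  20,
    "Confirm (PPWRCNF)":  10,
    "Notify (PPWRNTF)":    5,
    "Overhead":           45,
}
total = sum(budget_ms.values())
print(f"Total budget: {total} ms")   # 150 ms
assert total <= 200   # must fit the 99th-percentile response-time NFR
```

Keeping the budget as explicit line items also tells you where to look first when the measured 99th percentile starts creeping up.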
⚠️ The OFAC Screening Budget: The 30ms budget for OFAC screening is the tightest constraint in the wire transfer flow. The SDN (Specially Designated Nationals) list contains over 12,000 entries. Naive string matching would blow the budget. The production approach uses a pre-compiled hash table loaded into a CICS shared data table at region startup, updated daily. This reduces screening to a hash lookup — approximately 0.1ms for the lookup itself, with the remaining 29.9ms as buffer for the MQ round-trip to the compliance logging service.
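The core of the hash-table approach described in the sidebar can be sketched in a few lines. The names below are invented stand-ins, and real OFAC matching also handles aliases and fuzzy variants; this shows only the normalize-once, O(1)-lookup mechanism:

```python
# Sketch: pre-normalized SDN names in a hash set, screened in O(1).
def normalize(name: str) -> str:
    # Collapse case, punctuation, and spacing so trivial variations
    # of the same name hash identically.
    return "".join(ch for ch in name.upper() if ch.isalnum())

# Loaded once at region startup from the daily SDN refresh (stand-in data)
SDN_SET = {normalize(n) for n in ["Acme Front Co.", "J. Q. Badactor"]}

def ofac_hit(party_name: str) -> bool:
    return normalize(party_name) in SDN_SET

print(ofac_hit("ACME FRONT CO"))    # True — flagged, payment goes to HOLD
print(ofac_hit("Pinnacle Trust"))   # False — clean
```

Normalization happens once per list entry at load time and once per screened name at runtime, so the per-payment cost is a hash lookup, which is how the screening step stays inside its 30ms budget.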
38.6 Integration Layer: MQ and API Architecture
38.6.1 MQ Cluster Topology
WebSphere MQ (Chapters 19–22) serves as the nervous system of PinnaclePay, connecting internal components and external payment networks:
MQ Cluster: PPAYCLUSTER
Queue Manager PPMQ1 (PPAY1) — Primary online messaging
Queue Manager PPMQ2 (PPAY2) — Batch messaging, ACH files
Queue Manager PPMQ3 (PPAY3) — Read/reporting messaging
Queue Manager PPMQ4 (PPAY4) — DR standby
Full Repository: PPMQ1, PPMQ3
Partial Repository: PPMQ2, PPMQ4
Channel Configuration:
PPMQ1.TO.PPMQ2 — Sender/Receiver, TLS 1.3, persistent
PPMQ1.TO.PPMQ3 — Sender/Receiver, TLS 1.3, persistent
PPMQ1.TO.FEDWIRE — Sender/Receiver, TLS 1.3, to Federal Reserve
PPMQ2.TO.FEDACH — Sender/Receiver, TLS 1.3, to FedACH
PPMQ1.TO.TCH — Sender/Receiver, TLS 1.3, to The Clearing House (RTP)
38.6.2 Queue Design
Each payment type has a dedicated set of queues following the pattern established in Chapter 20:
Wire Transfer Queues:
PPAY.WIRE.INBOUND — Incoming Fedwire messages
PPAY.WIRE.OUTBOUND — Outgoing Fedwire messages
PPAY.WIRE.OFAC.REQUEST — OFAC screening requests
PPAY.WIRE.OFAC.REPLY — OFAC screening responses
PPAY.WIRE.DLQ — Wire dead-letter queue
PPAY.WIRE.EXCEPTION — Wires requiring manual review
ACH Queues:
PPAY.ACH.INBOUND.FILE — Incoming ACH files (NACHA format)
PPAY.ACH.OUTBOUND.FILE — Outgoing ACH files
PPAY.ACH.RETURN — ACH return processing
PPAY.ACH.NOC — Notification of Change
PPAY.ACH.DLQ — ACH dead-letter queue
RTP Queues:
PPAY.RTP.INBOUND — Incoming RTP messages (ISO 20022)
PPAY.RTP.OUTBOUND — Outgoing RTP messages
PPAY.RTP.STATUS — Payment status updates
PPAY.RTP.DLQ — RTP dead-letter queue
Cross-Cutting Queues:
PPAY.AUDIT.EVENTS — All payment events for audit trail
PPAY.GL.UPDATES — General ledger posting queue
PPAY.NOTIFY.EVENTS — Customer notification events
PPAY.RECON.FEED — Reconciliation data feed
38.6.3 Message Persistence and Recovery
Every message in PinnaclePay is persistent (MQPER_PERSISTENT). This is non-negotiable for payment processing. A lost message is a lost payment. The MQ log configuration ensures:
- Circular logging is NOT used. Linear logging with archiving is mandatory.
- Log file size: 256MB per log extent, 30 primary extents, 30 secondary extents
- Media image frequency: Every 100,000 messages or 1 hour, whichever comes first
- Log replication: MQ logs are replicated synchronously (GDPS/PPRC) alongside DB2 logs
🔴 Non-Negotiable: I have seen three payment system failures in my career caused by MQ circular logging. In each case, the queue manager ran out of log space during a peak period, could not wrap the log because active units of work held the oldest log extent, and the queue manager crashed. With linear logging and proper monitoring, this cannot happen. The slight overhead of managing archived logs is insignificant compared to the cost of losing an hour of wire transfers.
38.6.4 API Gateway Integration
For external partner access and modern channel integration, PinnaclePay uses an API gateway pattern (Chapter 22) that sits in front of the CICS Web Services:
External Client → Load Balancer → API Gateway (z/OS Connect EE)
→ CICS Web Owning Region → CICS Application Owning Region
→ DB2 / MQ
API Gateway Responsibilities:
- TLS termination (mutual TLS for partner APIs)
- OAuth 2.0 token validation
- Rate limiting (per-partner quotas)
- Request/response transformation (JSON ↔ COBOL copybook)
- API versioning (v1, v2 endpoints)
- Circuit breaker pattern for downstream failures
The API gateway is implemented using z/OS Connect Enterprise Edition, which provides native integration between RESTful APIs and CICS/IMS/DB2 resources. This is the bridge that Ahmad Rashid's modernization team at Pinnacle has been building — allowing fintech partners to interact with mainframe-based payment processing through modern APIs without requiring any knowledge of COBOL, CICS, or MQ.
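Of the gateway responsibilities listed above, the circuit breaker is the one least familiar to mainframe teams. The following Python sketch shows the pattern in miniature; in practice the gateway provides this through configuration rather than hand-written code, and the states, thresholds, and names here are invented for illustration:

```python
# Minimal circuit-breaker sketch. After N consecutive downstream
# failures the breaker "opens" and fails fast, protecting the CICS
# regions behind the gateway from a flood of doomed requests.
import time

class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_after=30.0):
        self.failure_threshold = failure_threshold  # trips after N failures
        self.reset_after = reset_after              # seconds before a retry
        self.failures = 0
        self.opened_at = None                       # None => circuit closed

    def call(self, fn):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None                   # half-open: allow one try
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.failure_threshold:
                self.opened_at = time.monotonic()   # trip the breaker
            raise
        self.failures = 0                           # success resets the count
        return result

# Demo: two consecutive failures trip a breaker with threshold 2.
cb = CircuitBreaker(failure_threshold=2, reset_after=60.0)

def failing_backend():
    raise ValueError("backend down")

for _ in range(2):
    try:
        cb.call(failing_backend)
    except ValueError:
        pass                        # backend errors propagate while closed

try:
    cb.call(lambda: "ok")
    state = "closed"
except RuntimeError:
    state = "open"                  # now failing fast, no backend call made
assert state == "open"
```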
38.7 Batch Architecture: The ACH Processing Engine
38.7.1 ACH Batch Processing Design
ACH is fundamentally a batch system. The Federal Reserve delivers ACH files at scheduled windows throughout the day, and PinnaclePay must process them within tight deadlines. The batch architecture (Chapters 23–28) handles this:
ACH Daily Processing Schedule:
06:00 ET — Receive morning delivery (FedACH)
06:15 ET — Parse and validate batch (PPACHPRS)
06:30 ET — OFAC screening batch run (PPACHSDN)
07:00 ET — Post to accounts (PPACHPST)
07:30 ET — Generate return file (PPACHRET)
08:00 ET — Transmit returns to FedACH
(Repeat for each FedACH delivery window:
06:00, 10:00, 14:00, 16:00, 18:00)
23:00 ET — Begin End-of-Day processing
23:15 ET — Settlement reconciliation (PPSETTRC)
23:30 ET — GL posting batch (PPGLPOST)
00:00 ET — Report generation (PPRPTGEN)
01:00 ET — Data archival (PPARCHIV)
02:00 ET — REORG/RUNSTATS window (maintenance)
03:00 ET — End of batch window
38.7.2 JCL for the ACH Master Job
The ACH batch processing is orchestrated by a master JCL job that invokes each step with checkpoint/restart capability (Chapter 26):
//PPACHDAY JOB (PPAY,ACH),'ACH DAILY PROCESSING',
// CLASS=A,MSGCLASS=H,NOTIFY=&SYSUID,
// REGION=0M
//*============================================================*
//* ACH DAILY PROCESSING — MASTER JOB *
//* Checkpoint/restart enabled at each step boundary. *
//* If a step fails, resubmit with RESTART=stepname on the *
//* JOB statement to restart from the failed step, not step 1. *
//*============================================================*
//*
//JOBLIB DD DSN=PPAY.PROD.LOADLIB,DISP=SHR
// DD DSN=CEE.SCEERUN,DISP=SHR
//*
//* DATESTAMP (e.g. D241015) is set here so the job is complete;
//* in production the scheduler substitutes the value at submit time.
// SET DATESTAMP=D241015
//*
//*--- STEP 1: RECEIVE AND PARSE ACH FILE ---*
//PARSE EXEC PGM=PPACHPRS,PARM='WINDOW=MORNING'
//STEPLIB DD DSN=PPAY.PROD.LOADLIB,DISP=SHR
//INFILE DD DSN=PPAY.ACH.INBOUND(&DATESTAMP),DISP=SHR
//PARSED DD DSN=PPAY.ACH.PARSED(&DATESTAMP),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(100,50),RLSE),
// DCB=(RECFM=VB,LRECL=1024,BLKSIZE=0)
//ERRFILE DD DSN=PPAY.ACH.ERRORS(&DATESTAMP),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(10,5),RLSE)
//SYSOUT DD SYSOUT=*
//CHKPT DD DSN=PPAY.ACH.CHKPT.PARSE,DISP=OLD
//*
//*--- STEP 2: OFAC SCREENING ---*
//SCREEN EXEC PGM=PPACHSDN,COND=(0,NE,PARSE)
//INFILE DD DSN=PPAY.ACH.PARSED(&DATESTAMP),DISP=SHR
//SDNLIST DD DSN=PPAY.OFAC.SDN.CURRENT,DISP=SHR
//CLEARED DD DSN=PPAY.ACH.CLEARED(&DATESTAMP),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(100,50),RLSE)
//FLAGGED DD DSN=PPAY.ACH.FLAGGED(&DATESTAMP),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(1,1),RLSE)
//CHKPT DD DSN=PPAY.ACH.CHKPT.SCREEN,DISP=OLD
//*
//*--- STEP 3: POST TO ACCOUNTS ---*
//POST EXEC PGM=PPACHPST,COND=(0,NE,SCREEN)
//INFILE DD DSN=PPAY.ACH.CLEARED(&DATESTAMP),DISP=SHR
//DBRM DD DSN=PPAY.PROD.DBRM(PPACHPST),DISP=SHR
//SYSOUT DD SYSOUT=*
//CHKPT DD DSN=PPAY.ACH.CHKPT.POST,DISP=OLD
//*
//*--- STEP 4: GENERATE RETURNS ---*
//RETURN EXEC PGM=PPACHRET,COND=(0,NE,POST)
//POSTLOG DD DSN=PPAY.ACH.POSTLOG(&DATESTAMP),DISP=SHR
//RETFILE DD DSN=PPAY.ACH.RETURNS(&DATESTAMP),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(10,5),RLSE)
//CHKPT DD DSN=PPAY.ACH.CHKPT.RETURN,DISP=OLD
//*
//*--- STEP 5: TRANSMIT RETURNS ---*
//XMIT EXEC PGM=PPACHXMT,COND=(0,NE,RETURN)
//RETFILE DD DSN=PPAY.ACH.RETURNS(&DATESTAMP),DISP=SHR
//MQOUT DD DSN=PPAY.ACH.MQ.OUTBOUND,DISP=SHR
//SYSOUT DD SYSOUT=*
38.7.3 Checkpoint/Restart for Payment Batch
The most critical batch programs implement checkpoint/restart at the individual transaction level. Program PPACHPST (ACH posting) processes millions of records and cannot afford to restart from the beginning on failure:
IDENTIFICATION DIVISION.
PROGRAM-ID. PPACHPST.
*================================================================*
* ACH Payment Posting — Batch Program with Checkpoint/Restart *
* Commits every 1,000 records. On restart, repositions to the *
* last committed checkpoint and resumes processing. *
*================================================================*
DATA DIVISION.
WORKING-STORAGE SECTION.
01 WS-COMMIT-FREQUENCY PIC S9(8) COMP VALUE 1000.
01 WS-RECORDS-IN-UOW PIC S9(8) COMP VALUE 0.
01 WS-TOTAL-PROCESSED PIC S9(9) COMP VALUE 0.
01 WS-TOTAL-POSTED PIC S9(9) COMP VALUE 0.
01 WS-TOTAL-REJECTED PIC S9(9) COMP VALUE 0.
01 WS-CHECKPOINT-ID PIC X(20).
01 WS-RESTART-FLAG PIC X VALUE 'N'.
01 WS-END-OF-FILE PIC X VALUE 'N'.
01 WS-LAST-PAYMENT-ID PIC X(20).
01 WS-PAYMENT-RECORD.
05 WS-PAY-ID PIC X(20).
05 WS-PAY-TYPE PIC X(3).
05 WS-PAY-AMOUNT PIC S9(13)V99 COMP-3.
05 WS-PAY-ORIG-ACCT PIC X(12).
05 WS-PAY-BENE-ACCT PIC X(12).
05 WS-PAY-STATUS PIC X(3).
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
PERFORM 2000-CHECK-RESTART
PERFORM 3000-PROCESS-PAYMENTS
UNTIL WS-END-OF-FILE = 'Y'
PERFORM 4000-FINAL-COMMIT
PERFORM 9000-TERMINATE
STOP RUN.
1000-INITIALIZE.
EXEC SQL CONNECT TO PPAYGRP END-EXEC
OPEN INPUT ACH-CLEARED-FILE
MOVE 'N' TO WS-END-OF-FILE.
2000-CHECK-RESTART.
* Read checkpoint dataset to determine if this is a restart
READ CHECKPOINT-FILE INTO WS-CHECKPOINT-ID
AT END MOVE 'N' TO WS-RESTART-FLAG
NOT AT END MOVE 'Y' TO WS-RESTART-FLAG
END-READ
IF WS-RESTART-FLAG = 'Y'
DISPLAY 'RESTART DETECTED. REPOSITIONING TO: '
WS-CHECKPOINT-ID
PERFORM 2100-REPOSITION-FILE
END-IF.
2100-REPOSITION-FILE.
* Skip records already processed before the checkpoint.
* Initialize the key so the first comparison is well-defined.
MOVE LOW-VALUES TO WS-PAY-ID
PERFORM UNTIL WS-PAY-ID >= WS-CHECKPOINT-ID
OR WS-END-OF-FILE = 'Y'
READ ACH-CLEARED-FILE INTO WS-PAYMENT-RECORD
AT END MOVE 'Y' TO WS-END-OF-FILE
END-READ
ADD 1 TO WS-TOTAL-PROCESSED
END-PERFORM.
3000-PROCESS-PAYMENTS.
READ ACH-CLEARED-FILE INTO WS-PAYMENT-RECORD
AT END
MOVE 'Y' TO WS-END-OF-FILE
NOT AT END
PERFORM 3100-POST-SINGLE-PAYMENT
ADD 1 TO WS-RECORDS-IN-UOW
ADD 1 TO WS-TOTAL-PROCESSED
IF WS-RECORDS-IN-UOW >= WS-COMMIT-FREQUENCY
PERFORM 3200-COMMIT-CHECKPOINT
END-IF
END-READ.
3100-POST-SINGLE-PAYMENT.
* Insert into PAYMENT and ACH_DETAIL tables
* Update account balances
* Write audit record
* Full implementation per Chapter 8 patterns
EXEC SQL
INSERT INTO PINNACLE.PAYMENT
(PAYMENT_ID, PAYMENT_TYPE, ORIGINATOR_ID,
BENEFICIARY_ID, AMOUNT, STATUS,
ORIGINATING_SYSTEM)
VALUES
(:WS-PAY-ID, :WS-PAY-TYPE, :WS-PAY-ORIG-ACCT,
:WS-PAY-BENE-ACCT, :WS-PAY-AMOUNT, 'ACT',
'PPACHPST')
END-EXEC
EVALUATE SQLCODE
WHEN 0 ADD 1 TO WS-TOTAL-POSTED
WHEN -803 ADD 1 TO WS-TOTAL-REJECTED
WHEN OTHER PERFORM 8000-SQL-ERROR
END-EVALUATE.
3200-COMMIT-CHECKPOINT.
EXEC SQL COMMIT END-EXEC
MOVE WS-PAY-ID TO WS-LAST-PAYMENT-ID
WRITE CHECKPOINT-RECORD FROM WS-LAST-PAYMENT-ID
MOVE 0 TO WS-RECORDS-IN-UOW
DISPLAY 'CHECKPOINT AT: ' WS-LAST-PAYMENT-ID
' PROCESSED: ' WS-TOTAL-PROCESSED
' POSTED: ' WS-TOTAL-POSTED.
4000-FINAL-COMMIT.
IF WS-RECORDS-IN-UOW > 0
EXEC SQL COMMIT END-EXEC
WRITE CHECKPOINT-RECORD FROM WS-LAST-PAYMENT-ID
END-IF
DISPLAY 'BATCH COMPLETE.'
DISPLAY ' TOTAL PROCESSED: ' WS-TOTAL-PROCESSED
DISPLAY ' TOTAL POSTED: ' WS-TOTAL-POSTED
DISPLAY ' TOTAL REJECTED: ' WS-TOTAL-REJECTED.
38.7.4 Parallel Batch Processing
For the 3 million daily ACH transactions, serial processing would be far too slow. Chapter 27 covered parallel batch techniques. PinnaclePay splits the work into partition-aligned parallel job streams:
- The ACH input file is split by originator ABA routing number into 10 parallel streams
- Each stream processes independently with its own commit checkpoint
- A coordinator job waits for all streams to complete, then runs the reconciliation step
- If one stream fails, only that stream restarts — the other 9 are unaffected
This reduces ACH batch processing time from approximately 3.5 hours (serial) to approximately 25 minutes (10-way parallel). That is the difference between barely making the batch window and having comfortable margin for error.
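The split itself is simple and deterministic: each record is routed to a stream by a stable hash of its originator ABA routing number, so a given originator always lands in the same stream. A Python sketch of the idea (the stream count and record layout are illustrative):

```python
# Sketch of the 10-way split. A stable hash of the routing number
# assigns each record to one of 10 streams; stream contents never
# overlap and no record is lost.
from zlib import crc32

STREAMS = 10

def stream_for(aba: str) -> int:
    # Stable 0..9 stream assignment from the routing number.
    return crc32(aba.encode()) % STREAMS

def split(records):
    streams = [[] for _ in range(STREAMS)]
    for rec in records:
        streams[stream_for(rec["aba"])].append(rec)
    return streams

recs = [{"aba": f"{i:09d}", "amt": i} for i in range(1000)]
streams = split(recs)
assert sum(len(s) for s in streams) == len(recs)       # nothing lost
assert all(stream_for(r["aba"]) == i                   # stable routing
           for i, s in enumerate(streams) for r in s)
```

Keying the split on the originator (rather than round-robin) keeps all of one originator's activity in a single stream, which simplifies per-stream checkpointing and reconciliation.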
38.7.5 End-of-Day Settlement Processing
The end-of-day (EOD) cycle is the heartbeat of any payment system. Every day must close cleanly before the next day can begin. PinnaclePay's EOD processing runs between 23:00 and 03:00 ET and includes five critical steps:
Step 1 — Settlement Reconciliation (PPSETTRC): This program reconciles every payment processed during the day against the settlement positions maintained with each counterparty. For wires, each individual transaction should have a corresponding settlement entry. For ACH, batches are reconciled against NACHA file totals. For RTP, the settlement is verified against The Clearing House's end-of-day settlement file. The reconciliation must balance to the penny. If it does not, the program generates a break report and alerts the operations team. In seventeen years of running payment systems, I have never seen a day where everything balanced perfectly on the first pass. There is always at least one exception — a late-arriving wire amendment, an ACH return that crossed with the original, a timezone boundary issue with RTP. The program must handle these exceptions gracefully and produce a report that the reconciliation team can work through the next morning.
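The core of the reconciliation step is arithmetic in integer cents, never floating point. A Python sketch of the balance-to-the-penny comparison and break report (the field names are illustrative):

```python
# Reconciliation-to-the-penny sketch: compare our per-counterparty
# totals against the counterparty's settlement figures and emit a
# break report. Money is held in integer cents -- never floats.
from collections import defaultdict

def totals_cents(payments):
    t = defaultdict(int)
    for p in payments:
        t[p["counterparty"]] += p["amount_cents"]
    return t

def reconcile(ours, theirs):
    # Return a break report: counterparty -> (our total, their total).
    breaks = {}
    for cp in set(ours) | set(theirs):
        if ours.get(cp, 0) != theirs.get(cp, 0):
            breaks[cp] = (ours.get(cp, 0), theirs.get(cp, 0))
    return breaks

ours = totals_cents([
    {"counterparty": "FED", "amount_cents": 1_000_00},
    {"counterparty": "TCH", "amount_cents": 250_50},
])
theirs = {"FED": 1_000_00, "TCH": 250_49}    # one cent off
assert reconcile(ours, theirs) == {"TCH": (250_50, 250_49)}
```

An empty break report means the day balanced; any entry becomes a line on the report the reconciliation team works the next morning.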
Step 2 — General Ledger Posting (PPGLPOST): Throughout the day, GL updates are queued on PPAY.GL.UPDATES. The EOD GL posting batch drains this queue and posts aggregate entries to the general ledger system. The GL entries are grouped by payment type, direction (inbound/outbound), and currency. This is a DB2-intensive step — the program reads millions of queue messages and generates summary journal entries. Checkpoint/restart operates at the GL account level: if the program fails while posting to account 1234, it restarts from that account, not from the beginning.
Step 3 — Regulatory Reporting (PPRPTGEN): Generates the daily regulatory reports required by various agencies. This includes the daily wire transfer activity report (for OFAC compliance), the ACH volume and value summary (for NACHA reporting), the RTP settlement summary, and the daily large-value transaction report (for BSA/AML — any single transaction over $10,000). These reports are generated as both machine-readable files (for submission to regulators) and human-readable PDF reports (for internal management review).
Step 4 — Data Archival (PPARCHIV): Moves data older than the retention threshold from active tables to archive tables. For PinnaclePay, the active retention period is 90 days for transaction detail and 7 years for audit records. The archival process uses DB2 partition rotation — once a quarterly partition ages beyond 90 days, it is detached from the active table, its data is exported to archive storage, and the partition is reused for future data. This approach avoids the performance overhead of row-by-row DELETE operations.
Step 5 — Database Maintenance (PPMAINT): The final step in the EOD cycle runs DB2 REORG and RUNSTATS on partitions that have accumulated sufficient changes during the day. Not every partition needs maintenance every night — the program checks the SYSIBM.SYSTABLESPACESTATS real-time statistics table to identify partitions where the percentage of updated rows exceeds 20%. Only those partitions are reorganized. This targeted approach completes in 15-20 minutes instead of the 2+ hours a full REORG would require.
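The partition-selection logic behind Step 5 amounts to a filter over the real-time statistics. A Python sketch with invented statistics rows (in production the input comes from a query against the DB2 real-time statistics table, not hard-coded dictionaries):

```python
# Sketch of the targeted-maintenance decision: reorganize only the
# partitions whose updated-row percentage exceeds the threshold.
# These rows are invented for illustration.

REORG_THRESHOLD_PCT = 20.0

def partitions_needing_reorg(stats):
    # stats: list of dicts with 'partition', 'total_rows', 'updated_rows'.
    picked = []
    for s in stats:
        if s["total_rows"] == 0:
            continue                      # empty partition, nothing to do
        pct = 100.0 * s["updated_rows"] / s["total_rows"]
        if pct > REORG_THRESHOLD_PCT:
            picked.append(s["partition"])
    return picked

stats = [
    {"partition": 1, "total_rows": 1_000_000, "updated_rows": 300_000},  # 30%
    {"partition": 2, "total_rows": 1_000_000, "updated_rows": 50_000},   # 5%
    {"partition": 3, "total_rows": 0,         "updated_rows": 0},
]
assert partitions_needing_reorg(stats) == [1]
```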
🔗 Cross-Reference: The partition rotation technique for archival comes from Chapter 11 (partitioning strategies). The targeted REORG approach comes from Chapter 28 (batch optimization). The checkpoint/restart pattern for the GL posting comes from Chapter 26. In production, these techniques compound — each one saves minutes, and in a 4-hour batch window, every minute matters.
38.7.6 Batch Monitoring and Alerting
Batch jobs need real-time monitoring just as urgently as online transactions. PinnaclePay's batch monitoring tracks:
- Job duration vs. expected duration: If the ACH posting step is taking 50% longer than normal, something has changed — higher volume, lock contention, I/O bottleneck. Alert at 110% of expected duration; escalate at 150%.
- Record processing rate: The ACH posting step should process approximately 2,000 records per second. If the rate drops below 1,000, investigate immediately.
- Error rate: Rejected records are expected (bad account numbers, insufficient funds). An error rate above 2% suggests a systemic problem — corrupt input file, wrong OFAC list version, DB2 issue.
- Checkpoint frequency: If checkpoints stop being written, the program may be hung. Alert after 5 minutes without a checkpoint.
A batch job that runs to completion but produces incorrect results is worse than a batch job that abends. The abend is visible. The incorrect results may not be discovered until a customer calls, an auditor investigates, or a regulator examines. This is why PinnaclePay's batch monitoring includes reconciliation checks within the batch flow itself — not just at EOD, but at each step boundary.
38.8 Security and Compliance: Protecting the Money
38.8.1 RACF Security Model
PinnaclePay's RACF configuration (Chapters 29–30) implements defense in depth with separation of duties:
RACF Group Structure:
PPAY — Top-level payment group
PPAY.DEV — Developers (no production access)
PPAY.OPS — Operations (execute, no modify)
PPAY.DBA — Database administrators
PPAY.SEC — Security administrators
PPAY.AUDIT — Auditors (read-only everywhere)
PPAY.COMPLIANCE — Compliance team (OFAC queue access)
PPAY.BATCH — Batch job user IDs (non-human)
Separation of Duties Matrix:
┌──────────────┬─────┬─────┬─────┬─────┬─────┬──────┐
│ Action │ DEV │ OPS │ DBA │ SEC │ AUD │ COMP │
├──────────────┼─────┼─────┼─────┼─────┼─────┼──────┤
│ Deploy code │ ✗ │ ✓ │ ✗ │ ✗ │ ✗ │ ✗ │
│ Modify DB2 │ ✗ │ ✗ │ ✓ │ ✗ │ ✗ │ ✗ │
│ Grant access │ ✗ │ ✗ │ ✗ │ ✓ │ ✗ │ ✗ │
│ View audit │ ✗ │ ✗ │ ✗ │ ✗ │ ✓ │ ✓ │
│ Release OFAC │ ✗ │ ✗ │ ✗ │ ✗ │ ✗ │ ✓ │
│ Read source │ ✓ │ ✗ │ ✗ │ ✗ │ ✓ │ ✗ │
│ Submit batch │ ✗ │ ✓ │ ✗ │ ✗ │ ✗ │ ✗ │
│ View logs │ ✓ │ ✓ │ ✓ │ ✓ │ ✓ │ ✓ │
└──────────────┴─────┴─────┴─────┴─────┴─────┴──────┘
The key principle: no single person can both write code and deploy it to production. No single person can both initiate a payment and approve it. No single person can both grant access and use that access. This is not just good security practice — it is a SOX requirement for any publicly traded financial institution.
38.8.2 Encryption Architecture
Encryption at Rest:
- DB2 table spaces: Encrypted with ICSF AES-256
- MQ messages: Encrypted at rest using MQ AMS (Advanced Message Security)
- Batch datasets: Encrypted using z/OS dataset encryption (RACF ICSF key labels)
- Backups: Encrypted before writing to tape/virtual tape
Encryption in Transit:
- CICS web services: TLS 1.3 with mutual authentication
- MQ channels: TLS 1.3 with channel-level authentication
- DB2 DRDA: TLS 1.3 for distributed connections
- Sysplex communication: CF encryption enabled
- Cross-site replication: IPSec tunnel with AES-256
Key Management:
- ICSF (Integrated Cryptographic Service Facility) manages all keys
- Master key ceremony: Dual control, split knowledge
- Key rotation: Every 90 days for data encryption keys
- HSM (Hardware Security Module): Crypto Express7S adapters, FIPS 140-2 Level 4
38.8.3 PCI-DSS Controls Mapping
Although PinnaclePay primarily processes bank-to-bank payments, some payment flows touch card data (debit card-initiated ACH). The PCI-DSS v4.0 controls mapping for the mainframe components:
| PCI-DSS Requirement | PinnaclePay Implementation | Chapter Reference |
|---|---|---|
| 1. Network segmentation | LPAR isolation, VTAM session limits | Ch 2, 3 |
| 2. Secure configurations | LE runtime hardening, CICS security | Ch 5, 15 |
| 3. Protect stored data | DB2 encryption, dataset encryption | Ch 8, 30 |
| 4. Encrypt transmission | TLS 1.3 everywhere | Ch 21, 30 |
| 5. Malware protection | z/OS integrity — no traditional malware vector | Ch 30 |
| 6. Secure development | SDLC, code review, SAST scanning | Ch 35, 36 |
| 7. Restrict access | RACF profiles, need-to-know | Ch 29 |
| 8. Identify/authenticate | RACF user IDs, password policy, MFA | Ch 29 |
| 9. Physical access | Data center controls (not in scope for this architecture) | — |
| 10. Logging/monitoring | SMF records, audit trail, SIEM feed | Ch 32 |
| 11. Security testing | Penetration testing, vulnerability scanning | Ch 31 |
| 12. Security policy | Information security program | Ch 31 |
38.8.4 OFAC/BSA/AML Compliance Architecture
Wire transfers have specific anti-money laundering requirements. PinnaclePay implements:
- Real-time OFAC screening for every wire transfer (online, within 30ms budget)
- Batch OFAC screening for all ACH payments (during batch processing window)
- Transaction monitoring — pattern detection for structuring, unusual velocity, geographic risk
- Currency Transaction Reports (CTRs) — automated generation for transactions over $10,000
- Suspicious Activity Reports (SARs) — workflow for compliance team to file with FinCEN
The OFAC screening engine maintains a hashed SDN list in a CICS shared data table, updated daily from the OFAC website via automated download and load process. Sandra Chen's team at Federal Benefits developed the original pattern for this (Chapter 30), and PinnaclePay adapts it with higher-performance hashing suited to the wire transfer volume.
38.9 Operational Excellence: Keeping It Running
38.9.1 Monitoring Architecture
Monitoring for PinnaclePay follows the three-tier model from Chapter 32:
Tier 1 — Infrastructure Monitoring:
- RMF (Resource Measurement Facility) for z/OS resource utilization
- CICS Statistics for region performance
- DB2 Performance Monitor for database health
- MQ monitoring for queue depths and channel status
- Thresholds trigger automated alerts via Tivoli/NetCool integration
Tier 2 — Application Monitoring:
- Transaction response time tracking (per program, per transaction type)
- Batch job duration and throughput monitoring
- Error rate tracking (DB2 SQLCODEs, CICS abend rates, MQ reason codes)
- Business metrics: payments processed per minute, settlement completion rate
Tier 3 — Business Monitoring:
- Payment volume dashboards (real-time, by type, by status)
- Settlement position monitoring (net position against each counterparty)
- OFAC screening hit rate and false positive rate
- SLA compliance: what percentage of wires settle within target time
Key Alert Thresholds:
CRITICAL (page on-call immediately):
- Wire processing response time > 500ms (3x budget)
- Any CICS region abend rate > 0.1%
- MQ queue depth > 10,000 on any payment queue
- DB2 lock timeout rate > 0.01%
- Any component availability < 99.99%
- OFAC screening failure (unable to screen)
WARNING (notify team, investigate within 1 hour):
- Wire processing response time > 250ms (1.5x budget)
- CICS region CPU utilization > 70%
- Batch job running > 110% of expected duration
- MQ channel in retry state
- DB2 buffer pool hit ratio < 98%
INFORMATIONAL (review next business day):
- Daily volume variance > 20% from forecast
- New OFAC screening false positive patterns
- DB2 RUNSTATS data more than 7 days old
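The threshold lists above translate directly into classification logic. A minimal Python sketch for one metric, wire response time, using the budgets stated above (greater than 500ms is critical at 3x budget, greater than 250ms is warning at 1.5x); the function shape is illustrative, since real alerting lives in the monitoring stack, not application code:

```python
# Alert-tier classification for wire response time, mirroring the
# CRITICAL/WARNING thresholds listed above.

def classify_wire_response(ms: float) -> str:
    if ms > 500:
        return "CRITICAL"   # page on-call immediately (3x the 150ms budget)
    if ms > 250:
        return "WARNING"    # investigate within 1 hour (1.5x budget, rounded)
    return "OK"

assert classify_wire_response(610) == "CRITICAL"
assert classify_wire_response(300) == "WARNING"
assert classify_wire_response(140) == "OK"
```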
38.9.2 Disaster Recovery
PinnaclePay's DR design uses GDPS (Geographically Dispersed Parallel Sysplex) as described in Chapter 33:
Recovery Tiers:
| Tier | Components | RTO | RPO | Recovery Method |
|---|---|---|---|---|
| 1 | Wire/RTP processing | 15 min | Zero | GDPS/PPRC automatic failover |
| 2 | ACH online processing | 30 min | Zero | GDPS/PPRC with manual confirmation |
| 3 | Batch processing | 2 hours | Zero | Restart batch from checkpoint at DR site |
| 4 | Reporting/Analytics | 4 hours | 1 hour | Restore from backup at DR site |
DR Test Schedule:
- Monthly: Component-level failover test (one CICS region, one DB2 member)
- Quarterly: Full application failover (all payment processing to DR site for 4 hours)
- Annually: Full site failover with Federal Reserve connectivity verification
📊 The DR Test That Matters: The quarterly full-application failover is the test that proves you can actually survive a disaster. I have seen too many shops that pass component tests perfectly but fail the full-application test because of a dependency they did not know about — a RACF profile not replicated, an MQ channel authority not configured at the DR site, a batch scheduler entry missing. The quarterly test finds these gaps before a real disaster does.
38.9.3 Production Runbook Structure
Every production system needs runbooks — the step-by-step procedures that the 2 AM on-call person follows when something breaks. PinnaclePay's runbook covers:
Runbook 1 — CICS Region Recovery:
1. Identify which AOR region is unresponsive (CEMT I TASK / CICS SM dashboard)
2. Check if the issue is a long-running task (CEMT S TASK PURGE) or region hang
3. If task purge resolves, investigate the specific transaction and escalate to development
4. If region is hung, initiate controlled shutdown: CEMT P SHUT IMMEDIATE
5. Verify CICSPlex SM has rerouted traffic to alternate AOR
6. Restart the failed region: cold start if data integrity is uncertain, warm start otherwise
7. Verify region health: transaction rate, response time, error rate
8. Document the incident, timestamp every action
Runbook 2 — DB2 Member Recovery:
1. Identify the failed DB2 member from z/OS console messages (DSNR0XXI)
2. If member crash, the data sharing group automatically recovers locks
3. Issue -START DB2 to restart the member
4. Check for indoubt units of work: -DISPLAY THREAD(*) TYPE(INDOUBT)
5. Resolve indoubt threads based on their originator (CICS, MQ, batch)
6. Verify data sharing group health: -DISPLAY GROUP
7. Run verification queries on recent payments to confirm data integrity
8. Monitor for 30 minutes before downgrading the incident
Runbook 3 — MQ Queue Manager Recovery:
1. Identify the failure from the MQ error log (AMQERR01.LOG)
2. If channel failure, restart the affected channel: START CHANNEL(name)
3. If queue manager failure, check for pending messages on the dead-letter queue
4. Restart queue manager: START QMGR
5. Verify cluster status: DISPLAY CLUSQMGR(*)
6. Check for message backlog on payment queues
7. If backlog exists, monitor drain rate and alert downstream systems of delayed processing
Runbook 4 — Batch Job Failure:
1. Check job output in SDSF for the failing step
2. If ABEND code is S0C7 (data exception), check input data quality — likely bad record in ACH file
3. If ABEND code is S878 (storage), increase REGION parameter
4. If SQL error, check DB2 error code and resolve (lock timeout → retry; tablespace full → extend)
5. Resubmit the job with RESTART=stepname on the JOB statement to restart from the failing step
6. Verify checkpoint dataset is intact before restart
7. If checkpoint is corrupted, assess whether re-running from the beginning is safe (idempotency check)
38.9.4 Capacity Planning Model
Capacity planning (Chapter 34) for PinnaclePay uses a growth model based on historical trends and business projections:
Capacity Planning Model — PinnaclePay 3-Year Projection:
Year 1 Year 2 Year 3
ACH Volume/Day 3.0M 3.3M 3.6M (10% YoY growth)
Wire Volume/Day 200K 220K 240K (10% YoY growth)
RTP Volume/Day 1.8M 2.5M 3.5M (40% YoY growth)
Total/Day 5.0M 6.0M 7.3M
Peak TPS 2,000 2,400 3,000
MIPS Required 4,500 5,400 6,750
MIPS Available 12,000 12,000 15,000 (upgrade Year 3)
Utilization 37.5% 45.0% 45.0%
DB2 Storage (TB) 2.5 3.5 5.0 (growth + retention)
MQ Log (GB/day) 50 60 75
The Year 3 upgrade is a planned capacity event — adding 3,000 MIPS via a processor upgrade or additional CPs. This should be budgeted in Year 1 to avoid surprise capital expenditure.
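The projection model behind the table is compound growth per payment rail. A Python sketch that reproduces the totals (note that compounding before rounding gives 7.4M in Year 3; the table shows 7.3M because each rail's figure is rounded to its displayed precision first):

```python
# 3-year volume projection: apply per-rail YoY growth rates from the
# capacity model above. Rounding here is on the total, so Year 3 comes
# out 7.4M versus the table's per-rail-rounded 7.3M.

GROWTH = {"ach": 0.10, "wire": 0.10, "rtp": 0.40}   # YoY growth per rail
YEAR1 = {"ach": 3_000_000, "wire": 200_000, "rtp": 1_800_000}

def project(year):                       # year 1, 2, or 3
    return {rail: vol * (1 + GROWTH[rail]) ** (year - 1)
            for rail, vol in YEAR1.items()}

def total_millions(year):
    return round(sum(project(year).values()) / 1e6, 1)

assert total_millions(1) == 5.0
assert total_millions(2) == 6.0
assert total_millions(3) == 7.4   # table's 7.3M reflects per-rail rounding
```

The same compounding drives the Peak TPS and MIPS rows; carrying the model in a few lines of code makes it easy to rerun when the business revises the RTP growth assumption.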
38.10 Modernization Roadmap: The Path Forward
38.10.1 Hybrid Architecture Vision
PinnaclePay is built on the mainframe because that is where payment processing belongs — but it does not exist in isolation. Chapter 35's modernization patterns apply here. The three-year modernization roadmap:
Year 1 — API Enablement:
- Deploy z/OS Connect EE for RESTful API access to all payment services
- Implement API gateway with OAuth 2.0 for partner authentication
- Build developer portal for fintech partners to onboard
- Result: Mainframe processing power accessible via modern APIs
Year 2 — Event-Driven Architecture:
- Implement event streaming (Kafka on z/OS or MQ Streaming Queues) for real-time payment events
- Build cloud-native analytics platform consuming mainframe events
- Deploy machine learning models for fraud detection (trained in cloud, scoring on z/OS via zIIP)
- Result: Real-time visibility into payment flows for business users
Year 3 — Selective Decomposition:
- Extract read-heavy services (payment status, history queries) to cloud-native microservices backed by replicated data stores
- Mainframe retains: core payment processing, settlement, regulatory compliance
- Implement CQRS (Command Query Responsibility Segregation) pattern — commands on mainframe, queries in cloud
- Result: Optimal workload placement — mainframe for what it does best, cloud for what cloud does best
38.10.2 CI/CD Pipeline
The CI/CD pipeline (Chapter 36) for PinnaclePay follows the enterprise pattern with additional payment-specific gates:
Pipeline Stages:
1. Code Commit (Git → Enterprise GitHub)
2. Static Analysis (COBOL Lint + Security SAST scan)
3. Unit Test (on z/OS dev LPAR, PPAYDEV)
4. Integration Test (CICS region test, DB2 test schema)
5. Payment Protocol Conformance Test
- ACH: NACHA file format validation
- Wire: Fedwire message format validation
- RTP: ISO 20022 message format validation
6. Performance Test (response time regression check)
7. Security Gate (RACF profile verification, encryption validation)
8. Compliance Gate (OFAC screening test, audit trail verification)
9. Staging Deployment (mirror production, synthetic transactions)
10. Production Deployment (blue/green with CICS NEWCOPY)
11. Post-Deployment Verification (smoke tests, 30-minute monitoring)
The payment protocol conformance tests (stage 5) are unique to payment processing. A COBOL program that compiles and passes unit tests can still generate an invalid ACH file that the Federal Reserve rejects. These tests run actual payment messages through the system and validate the output against the NACHA and Fedwire format specifications.
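A conformance test does not need to be elaborate to catch structural breakage early in the pipeline. A Python sketch of the most basic NACHA structural checks: fixed 94-character records, a type-1 file header first, a type-9 file control last, and valid record type codes throughout (the real conformance suite also validates field contents, batch totals, and entry hashes):

```python
# Minimal NACHA structural check. Record type codes: 1 = file header,
# 5 = batch header, 6 = entry detail, 7 = addenda, 8 = batch control,
# 9 = file control. Every record is exactly 94 characters.

VALID_TYPES = {"1", "5", "6", "7", "8", "9"}

def nacha_structure_ok(lines):
    if not lines:
        return False
    if any(len(ln) != 94 or ln[0] not in VALID_TYPES for ln in lines):
        return False
    return lines[0][0] == "1" and lines[-1][0] == "9"

good = ["1" + " " * 93, "5" + " " * 93, "6" + " " * 93,
        "8" + " " * 93, "9" + " " * 93]
assert nacha_structure_ok(good)
assert not nacha_structure_ok(["6" + " " * 93])    # no file header
assert not nacha_structure_ok(["1" + " " * 92])    # wrong record length
```

Even this crude check, run in stage 5 of the pipeline, would stop a build that produces a file the Federal Reserve would reject on receipt.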
38.10.3 What Stays on the Mainframe and Why
This is the question every CTO asks, and here is the honest answer:
Stays on the mainframe:
- Core payment processing (ACID transactions, regulatory requirement)
- Settlement (requires DB2 data sharing for zero RPO)
- OFAC/AML screening (proximity to payment data, latency requirements)
- Audit trail (immutable, regulatory requirement)
- Batch processing (ACH file processing, mainframe batch is unmatched)
Can move to cloud/distributed:
- API gateway (keep z/OS Connect, but add cloud API management layer)
- Payment status inquiries (read replicas in cloud)
- Analytics and reporting (event stream to cloud analytics)
- Developer portal and documentation
- Non-regulatory notification services
The decision framework: If a component touches money, regulatory compliance, or the audit trail, it stays on the mainframe. If it is read-only, customer-facing presentation, or analytical, it can move to the cloud. If it is somewhere in between, keep it on the mainframe until you have proven the distributed alternative meets the same availability and integrity requirements.
38.10.4 The Skills Transition Plan
The hardest part of the modernization roadmap is not the technology — it is the people. PinnaclePay requires a team that can operate across both mainframe and cloud technologies, and that team does not exist today at Pinnacle Financial. The skills transition plan addresses this reality:
Year 1 — Cross-Training Foundation:
- Send 3 experienced Java developers through IBM's Enterprise COBOL and CICS training program (12-week curriculum)
- Pair each new mainframe developer with a senior mentor (Rob Chen at CNB has validated this mentorship model with three successful cohorts)
- Establish "bridge architects" — 2 people who are fluent in both mainframe and cloud and can translate between the teams
- Create a shared code review process where mainframe and cloud developers review each other's work
Year 2 — Specialization and Depth:
- The cross-trained developers take ownership of the z/OS Connect API layer — they understand both sides of the API boundary
- Establish a "mainframe SRE" role — people who bring cloud-native SRE practices (SLOs, error budgets, toil reduction) to mainframe operations
- Begin documenting tribal knowledge — the operational wisdom that exists only in the heads of the senior team. Every runbook, every tuning parameter, every "we learned this the hard way" insight must be written down
Year 3 — Self-Sufficiency:
- The team can independently design, develop, test, deploy, and operate PinnaclePay without relying on external consultants for routine work
- Senior mainframe staff transition to architecture and mentorship roles
- The CI/CD pipeline has reduced the skill barrier for routine changes to the point where a developer with 6 months of mainframe experience can safely deploy a COBOL program change
This plan is not free. The training budget for Year 1 alone is approximately $350,000, and the productivity impact of senior staff spending 20% of their time on mentorship is real. But the alternative — trying to hire experienced COBOL/CICS/DB2 developers in a market where they are retiring faster than they are being replaced — is far more expensive and far less reliable.
🔵 What Sandra Chen's Federal Benefits Team Learned: Sandra's team at Federal Benefits went through a similar skills transition three years ago. Their key insight: the cross-trained developers did not need to become mainframe experts. They needed to become competent mainframe practitioners who understood the architecture well enough to make safe changes and recognize when they needed expert help. The 80/20 rule applied — 80% of routine maintenance work could be done by developers with solid-but-not-expert mainframe skills, freeing the true experts for architecture, troubleshooting, and mentorship.
38.11 Architecture Review Presentation: Defending Your Design
38.11.1 Preparing for the Review Board
At Pinnacle Financial, Diane Chen chairs the Architecture Review Board (ARB). The board includes the CISO, the Head of Payments, the Operations Director, the CFO's representative, and two external consultants (one from a Big Four firm, one from a competing bank who serves as a technical advisor under NDA). Ahmad Rashid presents the PinnaclePay architecture. This is how he structures the presentation.
The 90-Minute Architecture Review — Structure:
- Minutes 0–5: Executive Summary (1 slide). "What are we building, why, and what does it cost?"
- Minutes 5–15: Business Requirements (3 slides). Volume, availability, regulatory drivers. "What happens if we don't build this?"
- Minutes 15–35: Technical Architecture (8 slides). One slide per subsystem, showing how they connect. Each slide covers what was built, why this design, and what alternatives were considered.
- Minutes 35–50: Security and Compliance (4 slides). Threat model, RACF design, encryption, PCI-DSS mapping. The CISO will ask the hard questions here.
- Minutes 50–60: Operations and DR (3 slides). Monitoring, runbooks, DR test results. The Operations Director lives here.
- Minutes 60–70: Cost Model and Timeline (3 slides). TCO, ROI, implementation phases. The CFO's representative pays attention here.
- Minutes 70–85: Q&A. The real review happens in Q&A.
- Minutes 85–90: Decision. Approve / approve with conditions / send back for rework.
38.11.2 Anticipating the Hard Questions
Every architecture review has questions that can derail the presentation. Here are the ones Ahmad must be ready for:
"Why mainframe and not cloud-native?"
Answer: "Three reasons. First, five-nines availability with zero data loss requires synchronous replication and transactional integrity that cloud-native architectures achieve only with significant additional complexity and cost. The mainframe provides this natively through Parallel Sysplex and DB2 data sharing. Second, our Federal Reserve connectivity agreements require that payment processing infrastructure pass FFIEC examination — our mainframe environment has twenty years of clean examination history. A cloud migration would require re-examination and re-certification, a twelve to eighteen month process. Third, total cost of ownership: when you account for the development cost of building cloud-native equivalents of CICS transaction management, DB2 data sharing, and GDPS disaster recovery, the mainframe is actually 30% less expensive over five years."
"What is the single biggest risk?"
Answer: "Skills. The architecture is sound — every component is proven technology running in production at hundreds of banks. The risk is that we cannot hire enough COBOL/CICS/DB2 developers to build and maintain it. Our mitigation: the modernization roadmap progressively reduces the surface area of mainframe-specific code, the CI/CD pipeline reduces the skill barrier for routine changes, and we are investing in a training program. Chapter 37 of our reference architecture addresses this directly."
"What happens during a Fedwire outage?"
Answer: "We have three scenarios documented. If the outage is on our side, GDPS fails over to the DR site within 15 minutes and the Federal Reserve connection is re-established using the DR site's pre-configured Fedwire endpoint. If the outage is on the Federal Reserve's side, we queue all outbound wire transfers in MQ (persistent messages) and process them in order when the connection is restored. We have tested this scenario, and MQ can hold 48 hours of wire transfer volume without queue depth or storage issues. If both sides are down — a scenario that has never occurred in Fedwire's history — we invoke our business continuity plan for manual wire processing via phone, which is documented in Runbook 7."
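The 48-hour queueing claim is worth checking with arithmetic before the board does it for you. The sketch below is a back-of-envelope sizing, not a measurement: the daily volume comes from the chapter's 200,000-wires-per-day figure, but the average message size is an illustrative assumption.

```python
# Back-of-envelope check: can PPAY.WIRE.OUTBOUND absorb 48 hours of
# outbound wire volume during a Fedwire outage? AVG_MSG_BYTES is an
# assumed value for illustration, not a measured one.

WIRES_PER_DAY = 200_000   # daily wire volume from the capacity plan
AVG_MSG_BYTES = 2_048     # assumed average Fedwire message size
OUTAGE_HOURS = 48

queued_messages = WIRES_PER_DAY * OUTAGE_HOURS // 24
queued_bytes = queued_messages * AVG_MSG_BYTES

print(f"Messages queued: {queued_messages:,}")
print(f"Storage needed:  {queued_bytes / 1024**3:.2f} GiB")
```

Even at these assumptions, 48 hours of wires is roughly 400,000 messages and well under a gigabyte of persistent message storage, which is why the queue-depth test in the answer above passes comfortably.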
"What is the cost?"
The Total Cost of Ownership model:

PinnaclePay 5-Year TCO

Capital Expenditure (Year 0):

| Item | Cost |
|---|---|
| z16 hardware (2 machines) | $8.2M |
| Software licenses (DB2, CICS, MQ) | $2.1M/year |
| Network infrastructure | $1.5M |
| Implementation services | $4.8M |
| Total CapEx | $14.6M |

Operating Expenditure (Annual):

| Item | Cost |
|---|---|
| IBM software maintenance | $1.8M |
| z/OS MLC (Monthly License Charge) | $3.2M |
| Staff (12 FTEs: 6 dev, 3 ops, 2 DBA, 1 sec) | $2.4M |
| Data center (power, space, cooling) | $0.8M |
| DR data center | $0.6M |
| Federal Reserve connectivity | $0.3M |
| Total OpEx/year | $9.1M |

5-Year TCO: $14.6M + (5 × $9.1M) = $60.1M

ROI Calculation:

| Item | Value |
|---|---|
| Current payment processing cost (outsourced to third-party processor) | $15.2M/year |
| PinnaclePay annual cost (after Year 1) | $9.1M/year |
| Annual savings | $6.1M/year |
| Payback period | 2.4 years |
| 5-Year ROI | 51% |
| Additional revenue from RTP services (Year 2+) | $3.5M/year |
| Adjusted 5-Year ROI | 122% |
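The headline figures in the cost model follow directly from the CapEx and OpEx totals. A minimal sketch, using only the totals from the table above, reproduces the TCO, savings, and payback numbers the CFO's representative will recompute on the spot:

```python
# Reproduce the headline TCO figures from the model above.
# Uses the summary totals, not the individual line items.

CAPEX_M = 14.6            # one-time, Year 0, in $M
OPEX_M_PER_YEAR = 9.1     # recurring annual cost, in $M
YEARS = 5

tco = CAPEX_M + YEARS * OPEX_M_PER_YEAR           # 5-year TCO
current_cost = 15.2                               # outsourced processing, $M/yr
annual_savings = current_cost - OPEX_M_PER_YEAR   # vs. third-party processor
payback_years = CAPEX_M / annual_savings          # years to recover CapEx

print(f"5-year TCO:     ${tco:.1f}M")
print(f"Annual savings: ${annual_savings:.1f}M/year")
print(f"Payback:        {payback_years:.1f} years")
```

This yields the $60.1M TCO, $6.1M/year savings, and 2.4-year payback shown in the model.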
38.11.3 Common Architecture Review Failures
In twenty-five years, I have seen architectures fail their review for these reasons. Learn from others' mistakes:
- No threat model. Presenting a payment system architecture without a threat model is like presenting a building design without structural calculations. The CISO will send you back immediately.
- Handwaving the DR plan. "We'll use GDPS" is not a DR plan. "We tested GDPS failover on October 15, it completed in 12 minutes 37 seconds, and here are the three issues we found and fixed" is a DR plan.
- Ignoring the batch window. Online processing gets all the attention in architecture reviews, but batch failures cause the most operational pain. If your batch window has less than 25% margin, you will miss it within six months.
- Optimistic capacity planning. If your capacity model shows 80% utilization at launch, you are already in trouble. The first peak load event — Black Friday, tax day, a government stimulus payment — will exceed 100%.
- No operational story. Who gets paged? What do they do? How long does it take? If you cannot answer these questions, your system is not production-ready regardless of how elegant the architecture is.
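Two of these failure modes, thin batch margin and optimistic capacity planning, reduce to arithmetic you can automate. The sketch below shows the checks under illustrative numbers; the 6-hour window, 4.2-hour runtime, and 1.5x peak multiplier are assumptions for the example, not figures from the PinnaclePay model.

```python
# Guardrail checks for two review-failure modes above: batch window
# margin and launch-day peak headroom. All inputs here are illustrative.

def window_margin(window_hours: float, runtime_hours: float) -> float:
    """Fraction of the batch window left unused."""
    return (window_hours - runtime_hours) / window_hours

def survives_peak(avg_utilization: float, peak_multiplier: float) -> bool:
    """True if a peak-load event stays under 100% utilization."""
    return avg_utilization * peak_multiplier < 1.0

# A 6-hour window with 4.2 hours of runtime leaves 30% margin: above
# the 25% threshold, so acceptable.
print(f"Batch margin: {window_margin(6.0, 4.2):.0%}")

# 80% average utilization with a 1.5x peak (Black Friday) exceeds 100%.
print(f"Survives peak at 80%: {survives_peak(0.80, 1.5)}")
print(f"Survives peak at 50%: {survives_peak(0.50, 1.5)}")
```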
38.12 The Complete Deliverable: Your Portfolio Artifact
38.12.1 What You Should Have Produced
If you have been building the progressive project throughout this textbook, you should now have the following artifacts for the HA Banking Transaction Processing System. This checklist is your final deliverable:
Architecture Document (Core):
- [ ] System context diagram showing all external interfaces
- [ ] Logical architecture diagram (the seven-subsystem view from Section 38.2)
- [ ] Physical topology diagram showing LPARs, network, and data centers
- [ ] Component inventory with version numbers and dependencies

z/OS Infrastructure (Part I):
- [ ] LPAR configuration specification
- [ ] WLM policy definition with service classes and goals
- [ ] LE runtime options for CICS and batch
- [ ] Capacity planning model with 3-year projections

Data Architecture (Part II):
- [ ] Logical data model (entity-relationship diagram)
- [ ] Physical data model (DB2 DDL for all tables)
- [ ] Partitioning strategy document
- [ ] Index design with access path analysis
- [ ] Temporal table design for regulatory compliance
- [ ] Data sharing group configuration

Online Processing (Part III):
- [ ] CICSplex topology diagram
- [ ] Transaction routing rules
- [ ] CICS Web Services endpoint catalog
- [ ] Transaction flow diagrams for each payment type
- [ ] Response time budget breakdown

Integration (Part IV):
- [ ] MQ cluster topology
- [ ] Queue inventory with naming conventions
- [ ] Message flow diagrams
- [ ] API catalog with OpenAPI specifications
- [ ] Channel and connectivity matrix

Batch Processing (Part V):
- [ ] Batch schedule (daily, weekly, monthly, quarterly)
- [ ] JCL for all batch jobs
- [ ] Checkpoint/restart specifications
- [ ] Parallel processing design
- [ ] Batch window analysis with margin calculation

Security (Part VI):
- [ ] RACF group and access control design
- [ ] Encryption architecture (at rest, in transit, key management)
- [ ] PCI-DSS controls mapping
- [ ] OFAC/AML screening design
- [ ] Threat model

Operations (Part VI continued):
- [ ] Monitoring architecture (Tier 1/2/3)
- [ ] Alert threshold matrix
- [ ] Production runbooks (at least 6)
- [ ] DR design with RTO/RPO tiers
- [ ] DR test plan and schedule

Modernization (Part VII):
- [ ] 3-year modernization roadmap
- [ ] CI/CD pipeline design
- [ ] API strategy
- [ ] Skills and training plan

Business Case:
- [ ] Total Cost of Ownership (5-year)
- [ ] ROI calculation
- [ ] Risk register with mitigations
- [ ] Implementation timeline
38.12.2 Quality Checklist
Use this checklist to evaluate your architecture before submitting it for review:
Completeness:
- Does every component in the logical architecture appear in the physical topology?
- Is every external interface documented with protocol, format, and SLA?
- Does every online transaction have a documented flow with response time budget?
- Does every batch job have a documented schedule, dependencies, and restart procedure?

Consistency:
- Do the MIPS numbers in the capacity plan match the hardware in the physical topology?
- Do the DB2 table names in the data model match the SQL in the COBOL programs?
- Do the queue names in the MQ topology match the queue names in the application design?
- Do the RACF groups in the security design match the roles in the operations runbooks?

Defensibility:
- For every major design decision, can you articulate: what you chose, what you rejected, and why?
- Have you identified at least three risks and documented mitigations for each?
- Have you tested your DR plan, not just designed it?
- Can you trace any payment from origination to settlement through every component in the architecture?

Honesty:
- Have you documented known limitations and technical debt?
- Have you identified single points of failure and your plan to eliminate them?
- Is your capacity model based on measured data, not estimates?
- Does your ROI calculation use conservative assumptions?
38.12.3 From Architecture to Career
This architecture document — if you have built it with the rigor this textbook demands — is more than a course deliverable. It is a portfolio artifact that demonstrates mastery of enterprise systems architecture. In an industry where the average mainframe professional is approaching retirement age and the demand for these skills is increasing, this document tells a potential employer: "I understand not just the individual technologies, but how they fit together to build systems that move real money."
When Kwame Asante hired Lisa Park at CNB, it was not because she could write a COBOL program. It was because she could explain how a COBOL program running in CICS, connected to DB2 through a data sharing group, receiving messages from MQ, secured by RACF, monitored by SMF, and recoverable through GDPS comes together to process a wire transfer that moves ten million dollars from one bank to another in under 200 milliseconds with five-nines availability.
That is what this textbook has been building toward. That is what this capstone demonstrates. And that is what the industry needs.
38.13 Bringing It All Together: The Morning Wire Run
Let us end this chapter the way a real payment system operates — by walking through the first wire transfer of the business day.
It is 6:00 AM Eastern Time. The Federal Reserve's Fedwire system opens for the day. Within seconds, a message arrives at PinnaclePay.
The TCP/IP listener on PPMQ1 receives the Fedwire message — a $2.3 million funds transfer from JPMorgan Chase to a Pinnacle Financial commercial customer. The message is placed on queue PPAY.WIRE.INBOUND.
The MQ trigger monitor fires CICS transaction PWRX in PPAOR1. The CICS task dispatcher picks it up within 2 milliseconds. Program PPWRXPRS parses the Fedwire IMAD, validates the format, and extracts the critical fields: sender ABA, receiver ABA, amount, business function code.
Program PPWROFAC takes over. It hashes the beneficiary name and checks it against the SDN hash table in the CICS shared data table. No match. The payment is clean. Elapsed time: 0.3 milliseconds for the hash lookup plus 15 milliseconds for the compliance logging MQ round-trip.
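The hash-table screening step can be sketched in a few lines. This is a Python illustration of the technique PPWROFAC uses, not its actual code: the normalization rule (uppercase, collapse whitespace) and the sample names are assumptions, and the in-memory set stands in for the CICS shared data table.

```python
# Sketch of the SDN screening step: normalize the beneficiary name,
# hash it, and probe a shared hash table. The normalization rule and
# sample names are illustrative assumptions.
import hashlib

def name_key(name: str) -> str:
    """Uppercase, collapse whitespace, then SHA-256 the result."""
    normalized = " ".join(name.upper().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

# Loaded once and shared, like the CICS shared data table of SDN hashes.
sdn_hashes = {name_key("BLOCKED TRADING LLC")}

def screen(beneficiary: str) -> bool:
    """True if the payment is clean (no SDN match)."""
    return name_key(beneficiary) not in sdn_hashes

print(screen("Acme Manufacturing Corp"))   # clean, no SDN match
print(screen("blocked   trading llc"))     # matches despite case/spacing
```

Hashing the normalized name keeps the lookup O(1) regardless of list size, which is why the step costs a fraction of a millisecond in the walkthrough.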
Program PPWRVAL checks the receiver ABA against Pinnacle's routing table, verifies the beneficiary account exists and is active, and confirms the wire amount is within the counterparty's limit. Two DB2 queries, each returning in under 3 milliseconds thanks to buffered indexes.
Program PPWRDB2 begins the critical section. Within a single DB2 unit of work, it inserts a row into PINNACLE.PAYMENT with status 'ACT', inserts a row into PINNACLE.WIRE_DETAIL with the Fedwire IMAD, inserts a row into PINNACLE.PAYMENT_AUDIT recording the creation event, and updates the beneficiary's account balance. Four SQL statements. One commit. The DB2 log write takes 1.2 milliseconds to the coupling facility log buffer. The data is now durable across all four members of the data sharing group.
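The shape of that critical section, four statements committed as one unit of work, can be sketched with Python and SQLite standing in for the COBOL program and DB2. The table names follow the chapter's data model, but the simplified columns, the IMAD value, and the account numbers are illustrative assumptions.

```python
# Sketch of the PPWRDB2 unit of work: four SQL statements, one commit,
# all or nothing. SQLite stands in for DB2; the simplified DDL and all
# data values are illustrative.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE PAYMENT       (PAYMENT_ID INTEGER PRIMARY KEY, STATUS TEXT, AMOUNT REAL);
    CREATE TABLE WIRE_DETAIL   (PAYMENT_ID INTEGER, IMAD TEXT);
    CREATE TABLE PAYMENT_AUDIT (PAYMENT_ID INTEGER, EVENT TEXT);
    CREATE TABLE ACCOUNT       (ACCT_ID INTEGER PRIMARY KEY, BALANCE REAL);
    INSERT INTO ACCOUNT VALUES (1001, 50000.00);
""")

with conn:  # one unit of work: commits on success, rolls back on any error
    conn.execute("INSERT INTO PAYMENT VALUES (1, 'ACT', 2300000.00)")
    conn.execute("INSERT INTO WIRE_DETAIL VALUES (1, 'IMAD-EXAMPLE-0001')")
    conn.execute("INSERT INTO PAYMENT_AUDIT VALUES (1, 'CREATED')")
    conn.execute("UPDATE ACCOUNT SET BALANCE = BALANCE + 2300000.00 "
                 "WHERE ACCT_ID = 1001")

balance = conn.execute(
    "SELECT BALANCE FROM ACCOUNT WHERE ACCT_ID = 1001").fetchone()[0]
print(f"Posted balance: {balance:,.2f}")
```

If any of the four statements failed, the context manager would roll back all of them, leaving no partial payment, no orphaned audit row, and an unchanged balance. That atomicity, plus the durable log write, is what "zero data loss" means at the statement level.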
Program PPWRPRC updates the general ledger position for Pinnacle's settlement account with the Federal Reserve. The net position increases by $2.3 million. A message is placed on PPAY.GL.UPDATES for the end-of-day GL posting batch.
Program PPWRCNF generates the Fedwire acknowledgment message and places it on PPAY.WIRE.OUTBOUND. MQ transmits it to the Federal Reserve within seconds.
Program PPWRNTF publishes a payment event to PPAY.NOTIFY.EVENTS. A downstream consumer picks it up and triggers a notification to the beneficiary's relationship manager: "Wire received for your client, $2.3M from JPMorgan Chase."
Total elapsed time: 87 milliseconds. Well within the 150ms budget. The $2.3 million has been received, screened, validated, posted, acknowledged, and logged.
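The per-step latencies quoted in the walkthrough can be tallied against the budget. Note this is a partial accounting: the narrative itemizes only some steps, so the sum falls well short of the measured 87 ms total; the rest is parsing, program control, and MQ puts that the walkthrough does not quote individually.

```python
# Tally the per-step latencies stated in the walkthrough against the
# 150 ms budget. This is a partial accounting of the 87 ms total,
# covering only the steps with quoted timings.

BUDGET_MS = 150.0
MEASURED_TOTAL_MS = 87.0

stated_steps_ms = {
    "CICS dispatch":            2.0,
    "OFAC hash lookup":         0.3,
    "compliance log MQ trip":  15.0,
    "routing/limit query":      3.0,
    "beneficiary query":        3.0,
    "DB2 log write":            1.2,
}
stated_total = sum(stated_steps_ms.values())

print(f"Stated steps:  {stated_total:.1f} ms of the {MEASURED_TOTAL_MS:.0f} ms total")
print(f"Budget headroom: {BUDGET_MS - MEASURED_TOTAL_MS:.0f} ms")
```

Keeping a budget table like this in the architecture document, one row per program in the transaction flow, is what makes the "response time budget breakdown" deliverable in Section 38.12 defensible at review.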
This transaction will be repeated 200,000 times today. Every one of them will be screened against OFAC. Every one of them will be committed with zero data loss. Every one of them will be traceable from origination to settlement, now and for the seven years that federal regulations require.
Tonight, the ACH batch will process 3 million more payments. The end-of-day settlement will reconcile every penny. The DR system will silently replicate every byte to the secondary site. And tomorrow morning, when the Federal Reserve opens Fedwire again at 6:00 AM, PinnaclePay will be ready.
That is what this textbook has taught you to build.
Chapter Summary
This capstone chapter synthesized all seven parts of this textbook into a complete architecture for PinnaclePay, a high-availability national payment processing system. We defined requirements (5M transactions/day, five-nines availability, zero data loss), designed the architecture across all layers (z/OS infrastructure, DB2 data model, CICS topology, MQ messaging, batch processing, security, operations, and modernization), and produced a comprehensive deliverable checklist.
The key lesson of this chapter — and of this entire textbook — is that enterprise systems architecture is not about any single technology. It is about the disciplined integration of many technologies, each one excellent in its domain, into a coherent whole that serves the business reliably, securely, and economically. The mainframe is not a relic; it is the proven foundation on which the financial system runs. And the architects who understand it — all of it, from the WLM policy to the DR runbook — are the architects who build systems that work.
"Architecture is the art of making decisions that are expensive to change. Make them well." — Diane Chen, Pinnacle Financial Group, closing the Architecture Review Board meeting that approved PinnaclePay