Exercises — Chapter 38: Capstone — Architecting a High-Availability Payment Processing System
These exercises form part of the capstone project. Unlike the exercises in earlier chapters, which focused on individual technologies, they require you to integrate multiple subsystems and defend your design decisions. Several are meant to be completed over multiple sessions and can serve as portfolio artifacts.
Exercise 38.1 — Requirements Analysis and Stakeholder Mapping
Difficulty: Intermediate | Time: 45 minutes | Bloom Level: Analyze
You have been given the following one-paragraph brief from the Head of Payments at Pinnacle Financial:
"We need a new payment processing system that handles ACH, wires, and real-time payments. It needs to be highly available, meet all regulatory requirements, and cost less than our current outsourced solution."
Tasks:
- Expand this brief into a structured requirements document with at least 10 functional requirements and 8 non-functional requirements. For each requirement, assign a priority (Must, Should, Could, or Won't, per the MoSCoW method) and a rationale.
- Create a stakeholder analysis matrix identifying at least 6 stakeholders. For each stakeholder, document:
  - Their role and organizational position
  - Their primary concern regarding PinnaclePay
  - The specific questions they will ask during the architecture review
  - The evidence they need to see to be satisfied
- Identify three pairs of requirements that are in tension with each other (for example, "maximum performance" vs. "complete audit trail"). For each tension, describe how PinnaclePay resolves it and what trade-offs were accepted.
Deliverable: A 3–5 page requirements document suitable for attachment to the architecture deliverable.
Exercise 38.2 — LPAR and WLM Policy Design
Difficulty: Advanced | Time: 60 minutes | Bloom Level: Create
Using the PinnaclePay requirements from Section 38.1, design the complete LPAR and WLM configuration.
Tasks:
- Draw the LPAR layout for both data centers (Primary East and Secondary West). For each LPAR, specify:
  - Number of general-purpose CPs and zIIPs
  - LPAR weight (relative and absolute)
  - Memory allocation
  - Which subsystems run on each LPAR
- Write the WLM service definition that implements the priority hierarchy from Section 38.3.2. Include:
  - At least 6 service classes with velocity or response time goals
  - Classification rules that assign work to the correct service class
  - Resource groups that prevent any single workload from consuming all CPs
- Explain what happens during the nightly batch window when WLM needs to shift resources from online to batch processing. How does the LPAR weight adjustment work, and what safeguards prevent batch from starving the RTP workload that runs 24/7?
- A new regulatory requirement mandates that all OFAC screening must complete within 10ms (down from the current 30ms budget). How does this change your WLM policy? What service class changes are needed?
Deliverable: LPAR specification document and WLM policy definition.
Exercise 38.3 — DB2 Data Model Extension
Difficulty: Advanced | Time: 60 minutes | Bloom Level: Create
The PinnaclePay data model in Section 38.4 covers the core payment tables. Extend it to support these additional requirements:
Tasks:
- Design and write DDL for a CUSTOMER table that stores the originator and beneficiary information referenced by the PAYMENT table. Include:
  - Customer identification (multiple ID types: account number, tax ID, LEI)
  - Customer risk rating for AML purposes
  - Customer status and effective dates
  - Temporal table support for regulatory "as-of" queries
- Design and write DDL for a SETTLEMENT_POSITION table that tracks Pinnacle's net position with each counterparty (other banks, the Federal Reserve) throughout the day. Consider:
  - How is the position updated in real-time as wires are processed?
  - How is the position reconciled during end-of-day batch?
  - What happens to the position when a payment is reversed?
- Design the indexing strategy for your new tables. For each index, document:
  - The access pattern it supports
  - Whether it is clustered or non-clustered
  - The buffer pool assignment and why
- Write a DB2 query using temporal tables that answers: "What was our net settlement position with JPMorgan Chase at exactly 14:00 ET on March 15, 2026?" Explain why this query is important for regulatory compliance.
Deliverable: DDL scripts and index design documentation.
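Before writing the SETTLEMENT_POSITION DDL, it can help to pin down the update and reversal semantics in a few lines of code. The following is a minimal Python sketch, not DB2 code and not the design from Section 38.4: the class name, the IN/OUT direction flag, and the sign convention (outbound reduces the position) are all assumptions made for illustration.

```python
from collections import defaultdict
from decimal import Decimal


class SettlementPositions:
    """In-memory stand-in for a SETTLEMENT_POSITION table (hypothetical model)."""

    def __init__(self):
        self.net = defaultdict(Decimal)  # counterparty -> net position
        self.applied = {}                # payment_id -> (counterparty, signed amount)

    def apply(self, payment_id, counterparty, amount, direction):
        # Assumed convention: inbound wires raise our position, outbound lower it.
        if payment_id in self.applied:
            raise ValueError(f"{payment_id} already applied")
        signed = Decimal(amount) if direction == "IN" else -Decimal(amount)
        self.applied[payment_id] = (counterparty, signed)
        self.net[counterparty] += signed

    def reverse(self, payment_id):
        # A reversal backs out exactly what the original payment posted.
        # A real table would keep both rows for audit; this sketch drops the entry.
        counterparty, signed = self.applied.pop(payment_id)
        self.net[counterparty] -= signed


positions = SettlementPositions()
positions.apply("P1", "BANK_A", "1000.00", "OUT")
positions.apply("P2", "BANK_A", "250.00", "IN")
positions.reverse("P1")
```

After the reversal only the inbound payment remains in the net position. The duplicate check in `apply` is the property your DDL and posting logic need to guarantee so that a replayed message cannot move the position twice.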
Exercise 38.4 — CICS Transaction Flow Design
Difficulty: Advanced | Time: 75 minutes | Bloom Level: Create
Section 38.5 described the wire transfer transaction flow. Design the equivalent flow for Real-Time Payments (RTP).
Tasks:
- Document the complete RTP payment flow from message receipt to confirmation. RTP differs from wire in several ways:
  - Messages use ISO 20022 XML format (not the proprietary Fedwire format)
  - The sender expects a response within 3 seconds (not minutes)
  - RTP supports Request for Payment (RfP) messages in addition to credit transfers
  - The Clearing House requires specific acknowledgment message types
- Create a response time budget for the RTP flow, allocating milliseconds to each step. Your total budget is 200ms (the system's share of the 3-second end-to-end RTP SLA). Justify each allocation.
- Design the CICS program structure for RTP processing. For each program, specify:
  - Program name (following the PPRTP* naming convention)
  - Input: what data it receives (COMMAREA, channel/container, MQ message)
  - Processing: what it does
  - Output: what data it produces
  - Error handling: what happens when it fails
- What happens when the RTP message contains a Request for Payment instead of a credit transfer? How does your transaction routing handle this different message type? Draw the branching logic.
- RTP operates 24/7/365, unlike wire transfers, which follow Fedwire hours. How does this affect your CICS region maintenance strategy? You cannot shut down the RTP AOR for maintenance. Describe your approach.
Deliverable: RTP transaction flow document with response time budget and program specifications.
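One way to keep the response time budget honest is to tabulate it and check the total mechanically. The Python sketch below is illustrative only: the step names and millisecond values are hypothetical placeholders, not figures from Section 38.5. Only the 200ms total comes from the exercise.

```python
BUDGET_MS = 200  # system share of the RTP SLA, taken from the exercise

# Hypothetical steps and allocations; replace with your own justified values.
allocations_ms = {
    "mq_get_and_parse_iso20022": 25,
    "validation": 20,
    "ofac_screening": 30,
    "db2_posting": 60,
    "ack_build_and_mq_put": 25,
}


def check_budget(allocations, budget_ms):
    """Sum the per-step allocations and report headroom against the budget."""
    used = sum(allocations.values())
    return {
        "used_ms": used,
        "headroom_ms": budget_ms - used,
        "within_budget": used <= budget_ms,
    }


result = check_budget(allocations_ms, BUDGET_MS)
```

Leaving explicit headroom, rather than allocating every millisecond, gives you something to defend when a reviewer asks what happens on a slow DB2 commit.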
Exercise 38.5 — MQ Queue Design and Dead-Letter Handling
Difficulty: Intermediate | Time: 45 minutes | Bloom Level: Apply
The MQ design in Section 38.6 defines the queue topology. This exercise focuses on error handling — the part that separates production-ready designs from academic exercises.
Tasks:
- For each dead-letter queue (PPAY.WIRE.DLQ, PPAY.ACH.DLQ, PPAY.RTP.DLQ), design a dead-letter handler program. Specify:
  - How the program identifies why the message ended up on the DLQ (examine the dead-letter header)
  - Classification of DLQ reasons into categories: format error, routing error, processing error, timeout
  - Automated recovery actions for each category (retry, reroute, alert, discard)
  - The alert that fires when a message cannot be automatically recovered
- A Fedwire message arrives on PPAY.WIRE.INBOUND but the OFAC screening service is unavailable (PPAY.WIRE.OFAC.REPLY queue has no consumer). What happens? Design the timeout and retry logic:
  - How long should the wire processing program wait for an OFAC response?
  - What happens when the timeout expires?
  - How many retries before the wire is placed on the exception queue?
  - What alert fires, and who responds?
- During a disaster recovery failover, MQ messages in transit between queue managers may be in doubt. Design the procedure for resolving in-doubt messages after a failover:
  - How do you identify in-doubt messages?
  - How do you determine whether the message was already processed?
  - What is the order of operations: resolve DB2 in-doubt first, or MQ in-doubt first? Why?
Deliverable: DLQ handler specifications and in-doubt resolution procedure.
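The classification step of the dead-letter handler can be prototyped as a lookup from the reason value in the dead-letter header to a category and a recovery action. This is a Python sketch of the decision logic only, not an MQ program. The three MQRC values are standard IBM MQ reason codes, but their assignment to the exercise's categories, the application-defined timeout code, and the retry limit are assumptions for illustration.

```python
from dataclasses import dataclass

# Standard IBM MQ reason codes that can appear in the dead-letter header.
MQRC_MSG_TOO_BIG_FOR_Q = 2030
MQRC_Q_FULL = 2053
MQRC_UNKNOWN_OBJECT_NAME = 2085
# Hypothetical application-defined code for an OFAC reply timeout.
APP_OFAC_TIMEOUT = 900001

# Assumed mapping: reason -> (category, automated action).
RULES = {
    MQRC_MSG_TOO_BIG_FOR_Q: ("format error", "alert"),
    MQRC_Q_FULL: ("processing error", "retry"),
    MQRC_UNKNOWN_OBJECT_NAME: ("routing error", "reroute"),
    APP_OFAC_TIMEOUT: ("timeout", "retry"),
}

MAX_RETRIES = 3  # assumed limit before a human is alerted


@dataclass
class DeadLetter:
    reason: int
    retry_count: int = 0


def triage(msg):
    """Return (category, action); unknown reasons and exhausted retries alert."""
    category, action = RULES.get(msg.reason, ("unclassified", "alert"))
    if action == "retry" and msg.retry_count >= MAX_RETRIES:
        action = "alert"
    return category, action
```

For example, `triage(DeadLetter(MQRC_Q_FULL))` yields a retry, while the same reason with `retry_count=3` escalates to an alert. Defaulting unknown reasons to "alert" rather than "discard" is the conservative choice for payment messages.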
Exercise 38.6 — Batch Window Analysis
Difficulty: Advanced | Time: 60 minutes | Bloom Level: Evaluate
Section 38.7 described PinnaclePay's batch architecture. This exercise asks you to stress-test it.
Tasks:
- Calculate the batch processing time for each step of the ACH daily processing schedule under three scenarios:
  - Normal day: 3M ACH transactions, no errors
  - Payroll peak (1st of month): 5.5M ACH transactions, 0.5% error rate
  - Government stimulus event: 8M ACH transactions (direct deposit of stimulus checks), 2% error rate

  For each scenario, determine: total processing time, whether the batch window is sufficient, and what contingency action is needed if the window is exceeded.
- The ACH batch currently uses 10-way parallelism (partitioned by originator ABA). Analyze whether this is optimal:
  - What is the theoretical speedup from 10-way parallelism? (Hint: apply Amdahl's Law. What portion of the batch is inherently serial?)
  - What happens if the data is skewed, with one ABA (say, a large bank) accounting for 30% of the volume?
  - Would 20-way parallelism help? Calculate the expected improvement accounting for DB2 lock contention.
- Design a batch schedule for the end-of-month processing cycle, which includes all daily processing PLUS:
  - Monthly interest calculation (processes all 2M active accounts)
  - Monthly statement generation (produces PDF statements for 2M accounts)
  - Regulatory report generation (Call Report data, HMDA data)
  - Monthly archive and purge (move data older than 90 days to archive tables)

  Show that all processing fits within the available batch windows. Identify the critical path.
Deliverable: Batch window analysis with timing calculations and schedule.
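The parallelism questions can be explored numerically before you commit to an answer. The Python sketch below applies Amdahl's Law, S = 1 / ((1 - p) + p / N), and a simple skew variant in which the parallel phase ends only when the largest partition finishes. The 90% parallel fraction is an assumed value, not a figure from Section 38.7, and the model ignores DB2 lock contention, which you still need to account for separately.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Amdahl's Law for evenly divided work."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / workers)


def skewed_speedup(parallel_fraction, workers, largest_share):
    """Speedup when one partition holds `largest_share` of the parallel work."""
    # The parallel phase cannot finish before its largest partition does.
    slowest = max(largest_share, 1.0 / workers)
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction * slowest)


P = 0.90  # assumed parallelizable fraction of the batch

even_10 = amdahl_speedup(P, 10)
even_20 = amdahl_speedup(P, 20)
skew_10 = skewed_speedup(P, 10, 0.30)
skew_20 = skewed_speedup(P, 20, 0.30)
```

Under these assumptions, even partitioning gives roughly 5.3x at 10-way and 6.9x at 20-way, while a 30% hot partition caps both at about 2.7x. In this model, adding workers does nothing until the skewed ABA is split further.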
Exercise 38.7 — Security Threat Model
Difficulty: Advanced | Time: 75 minutes | Bloom Level: Analyze
Create a threat model for PinnaclePay using the STRIDE methodology (Spoofing, Tampering, Repudiation, Information disclosure, Denial of service, Elevation of privilege).
Tasks:
- For each external interface (FedACH, Fedwire, FedNow/RTP, Partner APIs), identify at least two threats per STRIDE category. For each threat:
  - Describe the attack scenario in specific terms
  - Rate the likelihood (Low/Medium/High) and impact (Low/Medium/High/Critical)
  - Describe the existing mitigation in the PinnaclePay architecture
  - Identify any residual risk
- Identify three insider threat scenarios (a malicious or compromised employee) and describe how PinnaclePay's separation of duties, RACF controls, and audit trail mitigate each one.
- A penetration testing firm reports that they were able to:
  - Enumerate valid payment IDs by observing response time differences between valid and invalid IDs in the status inquiry API
  - Cause a denial of service on the ACH batch by submitting a malformed ACH file that triggered a program abend

  For each finding, design a remediation and explain how to verify the fix.
- PCI-DSS v4.0 Requirement 6.4.2 requires a web application firewall (WAF) or equivalent for public-facing web applications. The CICS Web Services endpoint is technically a web application. How does PinnaclePay address this requirement? If a traditional WAF cannot be deployed on z/OS, what compensating control would you propose?
Deliverable: STRIDE threat model document and remediation recommendations.
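For the malformed ACH file finding, one plausible remediation is a structural pre-validation step that rejects a file before the batch program ever reads it. The Python sketch below checks only two properties of the NACHA layout: every record is 94 characters long, and the first character is a known record type code (1, 5, 6, 7, 8, or 9). The function name and the sample data are hypothetical, and a production validator would also check batch structure, counts, and hash totals.

```python
RECORD_LENGTH = 94
VALID_RECORD_TYPES = {"1", "5", "6", "7", "8", "9"}


def validate_ach_structure(lines):
    """Return (line_number, problem) tuples; an empty list means no structural errors."""
    problems = []
    for number, line in enumerate(lines, start=1):
        record = line.rstrip("\r\n")
        if len(record) != RECORD_LENGTH:
            problems.append((number, f"length {len(record)}, expected {RECORD_LENGTH}"))
        elif record[0] not in VALID_RECORD_TYPES:
            problems.append((number, f"unknown record type {record[0]!r}"))
    return problems


# Stand-in records that satisfy only the two checks above, not a full NACHA file.
good = ["1" + " " * 93, "9" + " " * 93]
bad = ["1" + " " * 93, "6" + "X" * 40, "Z" + " " * 93]

good_result = validate_ach_structure(good)
bad_result = validate_ach_structure(bad)
```

Verifying the fix then becomes a regression test: resubmit the penetration tester's file and confirm it is rejected with a diagnostic instead of an abend.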
Exercise 38.8 — DR Test Plan and Execution
Difficulty: Advanced | Time: 60 minutes | Bloom Level: Create
Design a complete DR test plan for the quarterly full-application failover test described in Section 38.9.2.
Tasks:
- Write the DR test plan document including:
  - Test objectives (what you are proving)
  - Scope (what is being failed over, what is excluded)
  - Pre-test checklist (what must be verified before starting)
  - Step-by-step test procedure (minute-by-minute timeline)
  - Success criteria (quantitative: RTO met, RPO met, transaction integrity verified)
  - Rollback procedure (how to return to normal operations)
  - Communication plan (who is notified at each stage)
- Design the transaction integrity verification procedure. After failover, how do you prove that:
  - No committed transactions were lost?
  - No transactions were duplicated?
  - All in-flight transactions at the moment of failover were either completed or cleanly rolled back?
  - The audit trail is complete and consistent?
- The DR test on July 15 revealed that three CICS regions at the DR site failed to start because their CSD (CICS System Definition) file had not been replicated since the last configuration change on July 10. Design a process to prevent this from happening again. What should be replicated, how often, and how do you verify replication completeness?
- Federal Reserve examiners want to see the last three DR test reports. Design the report template that documents test results, issues found, remediation actions, and the sign-off process.
Deliverable: DR test plan, integrity verification procedure, and report template.
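The core of the integrity verification procedure is a reconciliation between the transactions known to be committed before failover and those found at the DR site afterwards. The Python sketch below shows that comparison on plain lists of IDs. The function name and inputs are hypothetical; in practice the two lists would come from sources you identify in your procedure, such as the pre-failover audit trail and the post-failover PAYMENT table.

```python
from collections import Counter


def verify_failover(committed_before, found_after):
    """Compare pre-failover committed IDs against IDs found after failover."""
    after_counts = Counter(found_after)
    before = set(committed_before)
    return {
        # Committed at the primary but missing at the DR site.
        "lost": sorted(before - set(after_counts)),
        # Present more than once after failover.
        "duplicated": sorted(t for t, n in after_counts.items() if n > 1),
        # Not in the committed snapshot: candidates for in-flight review.
        "unexpected": sorted(set(after_counts) - before),
    }


report = verify_failover(["T1", "T2", "T3"], ["T1", "T2", "T2", "T4"])
```

A passing test has empty "lost" and "duplicated" lists, and every "unexpected" entry must be explained as an in-flight transaction that completed during failover.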
Exercise 38.9 — TCO and ROI Analysis
Difficulty: Intermediate | Time: 45 minutes | Bloom Level: Evaluate
Section 38.11.2 presented a TCO and ROI model. This exercise asks you to challenge it.
Tasks:
- The CFO asks: "What happens to the ROI if RTP adoption grows at 20% instead of 40%?" Recalculate the 5-year TCO and ROI under these conservative assumptions. At what RTP growth rate does the project break even in Year 3 instead of Year 2.4?
- Identify three cost categories that the Section 38.11.2 model may underestimate:
  - For each category, estimate the realistic cost
  - Recalculate the 5-year TCO with your adjustments
  - Is the project still financially justified?
- A cloud vendor proposes an alternative: migrate all payment processing to their cloud platform at a cost of $4.5M/year (all-inclusive). Compare this to the mainframe TCO:
  - What are the hidden costs the cloud proposal likely does not include?
  - What are the transition costs of migrating from mainframe to cloud?
  - What regulatory risks exist in cloud migration for payment processing?
  - At what annual volume does the cloud option become more expensive per transaction?
- The Head of Payments wants to add FedNow support (in addition to The Clearing House's RTP) in Year 2. Estimate the incremental cost: new development, Federal Reserve certification testing, additional MQ channels, modified CICS programs. How does this change the TCO?
Deliverable: Revised TCO/ROI analysis with sensitivity analysis and cloud comparison.
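The growth-rate sensitivity question is easier to answer with a small model you can rerun. The Python sketch below computes a fractional break-even year from an up-front investment, a fixed annual benefit, and an RTP-driven benefit that compounds yearly. Every number in the example calls is a made-up placeholder in $M, not a figure from Section 38.11.2, so the results will not match the chapter's Year 2.4. Substitute the chapter's inputs to use it.

```python
def break_even_year(investment, base_benefit, rtp_benefit_year1, rtp_growth, horizon=10):
    """Return the fractional year in which cumulative benefit covers the investment."""
    cumulative = 0.0
    for year in range(1, horizon + 1):
        # RTP benefit compounds from its Year 1 value at the given growth rate.
        yearly = base_benefit + rtp_benefit_year1 * (1 + rtp_growth) ** (year - 1)
        if cumulative + yearly >= investment:
            # Interpolate linearly within the break-even year.
            return (year - 1) + (investment - cumulative) / yearly
        cumulative += yearly
    return None  # does not break even within the horizon


# Placeholder inputs in $M, for illustration only.
at_40_percent = break_even_year(60, 20, 5, 0.40)
at_20_percent = break_even_year(60, 20, 5, 0.20)
```

To find the growth rate that pushes break-even to a target year, call the function across a range of rates and look for the crossover.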
Exercise 38.10 — Modernization Roadmap Detail
Difficulty: Advanced | Time: 60 minutes | Bloom Level: Create
The modernization roadmap in Section 38.10 is intentionally high-level. This exercise asks you to fill in the details.
Tasks:
- For Year 1 (API Enablement), create a detailed project plan:
  - Which payment services are exposed as APIs first, and why?
  - What is the API versioning strategy?
  - How do you handle API authentication for different partner types (banks vs. fintechs)?
  - What rate limiting policy protects the mainframe from API-driven overload?
  - Define the contract testing strategy between the API gateway and CICS programs
- For Year 2 (Event-Driven Architecture), design the event schema:
  - What events does PinnaclePay publish? (List at least 10 event types)
  - What is the event format? (Design a JSON schema for a payment.completed event)
  - How do you guarantee event ordering for payments that must be processed in sequence?
  - What happens when an event consumer is down? How long are events retained?
- For Year 3 (Selective Decomposition), design the CQRS split:
  - Which queries move to the cloud-native read model?
  - How is the read model kept in sync with the mainframe source of truth?
  - What consistency model does the read model use (eventual consistency with what SLA)?
  - How do you handle the transition period when some queries go to the mainframe and others to the cloud?
- Identify two risks for each modernization year and describe mitigations.
Deliverable: Detailed modernization roadmap with project plans and risk analysis.
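As a starting point for the payment.completed event format, it can be useful to write the payload as a typed structure and serialize it, then derive the JSON schema from what you end up needing. The Python sketch below is one possible shape, not the chapter's design: the field names, the envelope layout, and the per-payment `sequence` field used for ordering are all assumptions.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class PaymentCompleted:
    event_id: str      # unique per event, lets consumers deduplicate
    payment_id: str
    sequence: int      # assumed per-payment counter for ordering
    rail: str          # "ACH", "WIRE", or "RTP"
    amount: str        # decimal carried as a string to avoid float rounding
    currency: str
    completed_at: str  # ISO 8601 timestamp in UTC

    def to_json(self):
        # Versioned envelope so the schema can evolve without breaking consumers.
        envelope = {"type": "payment.completed", "version": 1, "data": asdict(self)}
        return json.dumps(envelope, sort_keys=True)


event = PaymentCompleted(
    event_id="E-0001",
    payment_id="P20260315143217001",
    sequence=3,
    rail="WIRE",
    amount="1250.00",
    currency="USD",
    completed_at="2026-03-15T18:32:17Z",
)
parsed = json.loads(event.to_json())
```

Carrying the amount as a string and versioning the envelope are deliberate choices here; your own design should state whether it makes the same ones and why.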
Exercise 38.11 — Architecture Review Simulation
Difficulty: Advanced | Time: 90 minutes | Bloom Level: Evaluate
This exercise simulates the Architecture Review Board presentation.
Tasks:
- Prepare a 12-slide presentation deck (outline format, not actual slides) for the PinnaclePay architecture. Each slide should include:
  - Title
  - Key message (one sentence)
  - Content outline (bullet points)
  - Supporting data or diagram reference
  - Anticipated question from the review board
- Role-play the Q&A session. For each of the following questions, write a 2–3 paragraph response that an experienced architect would give:
  - The CISO asks: "Your OFAC screening uses a hash table that is updated daily. What happens if a new SDN entry is added at 10 AM and a wire transfer for that person is processed at 11 AM, before the next daily update?"
  - The Operations Director asks: "Your monitoring shows 30+ alert thresholds. How many alerts per day do you expect in steady state? How do you prevent alert fatigue?"
  - An external consultant asks: "You have chosen DB2 data sharing over active-passive replication. What happens if the coupling facility fails?"
  - The CFO's representative asks: "Other banks are building cloud-native payment systems. Why are we investing $60M in mainframe technology?"
- Based on the hard questions in the Q&A task above, identify two areas where the PinnaclePay architecture should be strengthened. Propose specific design changes.
Deliverable: Presentation outline and Q&A response document.
Exercise 38.12 — Operational Runbook Development
Difficulty: Intermediate | Time: 45 minutes | Bloom Level: Create
Section 38.9.3 provided runbook outlines for four scenarios. Develop two additional complete runbooks.
Tasks:
- Write Runbook 5 (ACH File Rejection) for the scenario where the Federal Reserve rejects an outbound ACH return file due to format errors. Include:
  - How the error is detected (return code from FedACH transmission)
  - Diagnostic steps to identify the format error
  - Corrective actions (fix the file, resubmit within the NACHA deadline)
  - Escalation path if the deadline cannot be met
  - Post-incident review checklist
- Write Runbook 6 (Performance Degradation) for the scenario where wire transfer response times have gradually increased from 87ms to 300ms over the past hour. Include:
  - How the degradation is detected (monitoring alert)
  - Systematic diagnostic procedure (check CICS, DB2, MQ, WLM in order)
  - Common root causes and their specific remediation:
    - DB2 lock contention
    - CICS MaxTask limit reached
    - MQ channel congestion
    - WLM goal not being met due to competing workload
  - Decision criteria for when to engage IBM support
  - Communication template for notifying downstream systems of degraded performance
- For both runbooks, estimate the Mean Time to Resolve (MTTR) for each root cause scenario. How does MTTR change based on whether the on-call person is a junior operator (1 year experience) vs. a senior system programmer (15 years experience)?
Deliverable: Two complete production runbooks with MTTR estimates.
Exercise 38.13 — End-to-End Payment Tracing
Difficulty: Intermediate | Time: 45 minutes | Bloom Level: Apply
Regulatory examiners require the ability to trace any payment end-to-end. Design the tracing capability.
Tasks:
- A Federal Reserve examiner asks: "Trace payment P20260315143217001 from origination to settlement." Document every data source you would query:
  - Which DB2 tables?
  - Which MQ logs?
  - Which SMF records?
  - Which CICS transaction logs?
  - Which audit trail entries?
  - In what order would you query them, and why?
- Design a payment tracing tool: a CICS inquiry transaction (PPTR) that, given a payment ID, retrieves and displays the complete lifecycle of a payment on a single screen. Specify:
  - The BMS map layout
  - The DB2 queries executed
  - How the tool handles payments that are still in-flight vs. completed vs. rejected
  - Performance requirements (the examiner should not wait more than 3 seconds)
- The same examiner asks: "Show me all payments from originator account 1234567890 between March 1 and March 15, 2026, where the payment was initially rejected and then manually approved." Write the DB2 query using temporal tables and the audit trail.
- PinnaclePay processes 5 million payments per day. After 7 years of retention (regulatory requirement), there will be approximately 12.8 billion audit records. How do you ensure the tracing tool still meets the 3-second response time requirement with this volume?
Deliverable: Tracing architecture design, tool specification, and long-term performance strategy.
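The 12.8 billion figure in the last task can be reproduced with simple arithmetic, which also shows why partitioning matters for the 3-second target. The Python sketch below assumes one audit record per payment and 365-day years. The monthly partitioning is a hypothetical design choice used only to illustrate how much data a single-partition lookup would touch.

```python
PAYMENTS_PER_DAY = 5_000_000
RETENTION_YEARS = 7

# Assumes one audit record per payment and ignores leap days.
total_records = PAYMENTS_PER_DAY * 365 * RETENTION_YEARS

# Hypothetical scheme: one partition per calendar month of retention.
monthly_partitions = RETENTION_YEARS * 12
records_per_partition = total_records // monthly_partitions
```

This gives 12,775,000,000 records in total and roughly 152 million per monthly partition. If a payment generates several audit records, scale both numbers accordingly. A trace that can be confined to one partition scans about 1/84 of the data.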
Exercise 38.14 — Capacity Stress Testing
Difficulty: Advanced | Time: 60 minutes | Bloom Level: Evaluate
The capacity model in Section 38.9.4 projects normal growth. This exercise asks you to plan for abnormal events.
Tasks:
- Model the "Black Friday ACH Event": a major retailer processes payroll for 500,000 seasonal employees via ACH on the same day that regular monthly payroll runs. This creates a one-day spike of 7M ACH transactions (vs. normal 3M). Analyze:
  - Can the batch window accommodate this volume?
  - What is the DB2 log generation rate during peak processing?
  - Do the MQ queues have sufficient depth?
  - What WLM adjustments are needed for the day?
- Model the "Federal Reserve Outage Recovery": Fedwire is down for 2 hours (this has happened historically). When it comes back up, all delayed wire transfers flood in simultaneously. Estimate:
  - The backlog size (number and dollar value of delayed wires)
  - The processing time to clear the backlog
  - The impact on other payment types (ACH, RTP) during the backlog processing
  - The MQ queue depths during the flood
- Model the "Regulatory Stress Test": the Federal Reserve mandates an immediate re-screening of all wire transfers from the past 30 days against an updated SDN list (this happens when a major sanctions action occurs). Calculate:
  - The number of wire transfers to re-screen (30 days x 200K/day = 6M)
  - The processing time using the existing batch OFAC screening program
  - The impact on regular batch processing
  - Whether you need to request additional MIPS from IBM (on-demand capacity)
- For each scenario, write a one-page operational response plan that the Operations Director can approve in advance.
Deliverable: Three stress scenario analyses with operational response plans.
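For the Fedwire outage scenario, the backlog and the time to clear it follow from a simple queueing argument: the backlog is the arrival rate times the outage length, and it drains at the rate by which processing capacity exceeds ongoing arrivals. The Python sketch below encodes that. The hourly rates in the example are assumed placeholders, not figures from Section 38.9.4.

```python
def backlog_drain_hours(arrival_per_hour, capacity_per_hour, outage_hours):
    """Return (backlog size, hours to clear it) after an outage."""
    backlog = arrival_per_hour * outage_hours
    # New wires keep arriving while the backlog is worked off.
    spare = capacity_per_hour - arrival_per_hour
    if spare <= 0:
        return backlog, None  # backlog never clears at this capacity
    return backlog, backlog / spare


# Assumed rates: 20K wires/hour arriving, 30K wires/hour processing capacity.
backlog, hours_to_clear = backlog_drain_hours(20_000, 30_000, outage_hours=2)
```

With these placeholder rates, a 2-hour outage produces a 40,000-wire backlog that takes 4 hours to clear. The peak queue depth is at least the backlog itself, which is what your MQ depth estimate has to cover.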
Exercise 38.15 — Final Portfolio Assembly
Difficulty: Advanced | Time: 120 minutes | Bloom Level: Create
This is the final exercise of the textbook. Assemble your complete architecture deliverable.
Tasks:
- Using the checklist from Section 38.12.1, assemble all artifacts you have created throughout this textbook's progressive project into a single architecture document. Include:
  - Title page with version, author, date, and approval signatures
  - Table of contents
  - Executive summary (1 page, readable by a non-technical executive)
  - All technical sections organized by subsystem
  - Appendices: glossary, acronym list, reference architecture standards
- Conduct a self-review using the quality checklist from Section 38.12.2. For each checklist item, document:
  - Whether it passes (with evidence)
  - If it fails, what work remains
- Write a one-page "Architecture Decision Record" (ADR) for the single most important design decision in PinnaclePay. The ADR should include:
  - Context: what problem were you solving?
  - Decision: what did you choose?
  - Alternatives considered: what did you reject?
  - Consequences: what are the trade-offs?
  - Status: accepted, with approval date
- Write a one-page cover letter to accompany the architecture document if you were submitting it as part of a job application for a Senior Mainframe Architect position. Focus on what the document demonstrates about your skills and knowledge.
Deliverable: Complete architecture portfolio package.