Chapter 37 Exercises: The Hybrid Architecture
Part A: Conceptual Questions
A1. Explain the threshold concept "hybrid is the destination, not a waypoint." How does this mental shift change the way an organization invests in integration infrastructure? Provide three specific examples of decisions that change when hybrid becomes the permanent target state.
A2. Define the anti-corruption layer (ACL) in the context of hybrid COBOL-cloud architecture. Why is it critical that the ACL "owns the contract" rather than the COBOL program or the cloud consumer? What happens when an organization skips the ACL and lets cloud microservices interact directly with CICS programs?
A3. Explain the difference between the event mesh and the ACL in terms of synchronous vs. asynchronous integration. Under what circumstances would you use the event mesh instead of the ACL for a given data flow? Under what circumstances would you use both?
A4. Describe the four data consistency patterns covered in Section 37.3 (eventual consistency with CDC, saga pattern, event sourcing at the boundary, conflict resolution for bidirectional sync). For each pattern, identify: (a) the consistency guarantee it provides, (b) the tradeoff it makes, and (c) one use case where it is the best choice.
A5. Why is "dual write" — updating both the mainframe database and the cloud database in the same operation — dangerous in hybrid architecture? What mechanism would you use instead, and how does it preserve data consistency?
A6. Explain why the correlation ID is described as "the key" to unified monitoring in hybrid architecture. How is the correlation ID generated, propagated, and used during incident diagnosis? What happens if correlation ID propagation breaks at any point in the chain?
A7. Define "data sovereignty" in the context of hybrid architecture. Why does Pinnacle Health Insurance tokenize PHI on z/OS before transmitting it to cloud analytics? What regulatory framework drives this decision?
A8. Describe the three organizational models for hybrid teams (platform teams with integration squad, product teams with embedded expertise, federated with governance). For each model, identify the type and size of organization it best serves and one significant limitation.
Part B: Applied Analysis
B1. Architecture Pattern Selection
You are designing a hybrid architecture for a mid-size insurance company (similar to Pinnacle) with the following requirements:
- Claims adjudication engine on z/OS (COBOL/DB2/CICS) processes 30 million claims per month
- New cloud-based member portal allows members to check claim status in real time
- New cloud-based analytics platform performs fraud detection on claims data
- Mobile app sends push notifications when claim status changes
- Regulatory requirement (HIPAA): all PHI access must be audited, and all PII access must be logged
For each of the following data flows, identify which architecture pattern(s) from Section 37.2 apply (ACL, event mesh, shared-nothing, API gateway) and which data consistency pattern from Section 37.3 applies:
a) Member checks claim status via mobile app
b) Fraud detection model scores a new claim
c) Member updates their mailing address via the web portal
d) Nightly batch generates claims summary for state regulatory filing
e) Claim status changes trigger push notification to member's mobile device
B2. Saga Pattern Design
Design a saga for the following cross-platform business process at CNB:
Loan Origination: A customer applies for a personal loan through the mobile app. The process involves:
1. Creating a loan application record (mainframe DB2)
2. Running a credit check (cloud-based credit bureau API)
3. Calculating the loan terms (mainframe COBOL program using proprietary interest rate model)
4. Creating the loan account (mainframe DB2)
5. Sending approval notification (cloud notification service)
6. Enabling loan dashboard in mobile app (cloud customer profile service)
For each step, define:
- The local transaction
- The compensating transaction
- What happens if this step fails (which prior steps must be compensated?)
- How idempotency is ensured
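One way to start the design is to model the saga as an ordered list of steps, each pairing a local transaction with its compensating transaction. The sketch below is a minimal in-memory orchestrator, offered as a starting point rather than a solution. The names `SagaStep` and `run_saga`, and the set of idempotency keys built from a saga ID plus step name, are assumptions made for illustration. A real design would persist the saga log and invoke the CICS, DB2, and cloud services instead of plain Python functions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set


@dataclass
class SagaStep:
    name: str
    action: Callable[[Dict], None]       # the local transaction
    compensate: Callable[[Dict], None]   # undoes the local transaction


def run_saga(steps: List[SagaStep], ctx: Dict, done: Set[str]) -> bool:
    """Run steps in order; on failure, compensate completed steps in reverse.

    `done` holds idempotency keys (saga id + step name) so that a retried
    saga skips steps that already committed.
    """
    completed: List[SagaStep] = []
    for step in steps:
        key = f"{ctx['saga_id']}:{step.name}"
        if key in done:
            completed.append(step)   # committed by an earlier attempt
            continue
        try:
            step.action(ctx)
        except Exception:
            # Undo everything that committed, most recent first.
            for prior in reversed(completed):
                prior.compensate(ctx)
                done.discard(f"{ctx['saga_id']}:{prior.name}")
            return False
        done.add(key)
        completed.append(step)
    return True
```

If step 2 (the credit check) raises, `run_saga` calls the compensating transaction for step 1 and returns `False`. Extending the list to all six steps makes the "which prior steps must be compensated" question concrete for each failure point.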
B3. Conflict Resolution Design
A government agency (similar to FBA) has a hybrid system where:
- Benefit eligibility records are maintained on the mainframe (COBOL/IMS)
- A cloud-based case management portal allows caseworkers to update beneficiary information
- A mainframe batch process updates beneficiary information based on annual re-certification mailings
Case: A caseworker updates a beneficiary's address via the cloud portal at 2:15 PM. At 2:17 PM, the mainframe batch process updates the same beneficiary's address based on a returned-mail notification, using the forwarding address that USPS supplied on the "return to sender" notice.
a) Both updates are valid but contain different addresses. Design a conflict resolution strategy for this scenario.
b) How would you detect this conflict? What metadata is needed?
c) What is the resolution workflow? Who reviews the conflict?
d) How do you prevent this class of conflict from recurring?
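For part (b), one common detection approach is optimistic versioning: each writer records the version of the record it read, and an update based on a stale version is flagged instead of applied. The sketch below illustrates that idea only. The `Record` and `AddressUpdate` types, the `base_version` field, and the review queue are hypothetical, and the sketch assumes the IMS record can carry a version counter, which the exercise does not state.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass
class AddressUpdate:
    beneficiary_id: str
    new_address: str
    source: str            # e.g. "cloud-portal" or "mainframe-batch"
    base_version: int      # record version the writer read before updating
    updated_at: datetime


@dataclass
class Record:
    beneficiary_id: str
    address: str
    version: int


def apply_update(rec: Record, upd: AddressUpdate,
                 review_queue: List[AddressUpdate]) -> bool:
    """Apply the update only if it was based on the current version.

    A stale base_version with a different address means another writer
    changed the record in between, so the update is parked for human
    review instead of silently overwriting.
    """
    if upd.base_version == rec.version:
        rec.address = upd.new_address
        rec.version += 1
        return True
    if upd.new_address == rec.address:
        return True   # both writers agree; nothing to resolve
    review_queue.append(upd)
    return False
```

In the scenario above, both updates would carry the same `base_version`. The 2:15 PM portal update applies and bumps the version; the 2:17 PM batch update then arrives with a stale version and a different address, so it lands in the review queue with its source and timestamp intact for the reviewer.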
B4. Unified Monitoring Design
Design the unified monitoring dashboard for SecureFirst's hybrid architecture. The dashboard must provide a single view of system health across both z/OS and cloud platforms.
Specifications:
- SecureFirst has 1 LPAR, 2 CICS AORs, 1 TOR, DB2 single instance, MQ single queue manager
- Cloud: 3 Kubernetes clusters (dev, staging, prod), API gateway, Kafka, PostgreSQL
- Business services: Balance inquiry, Funds transfer, Bill payment, Mobile authentication
For each business service:
a) Define the golden signals (latency, traffic, errors, saturation) with specific data sources on both platforms
b) Define the alerting thresholds (warning and critical)
c) Define the correlation ID propagation path through all components
d) Design one example alert rule that requires data from both platforms to trigger (e.g., "high latency on balance inquiry detected when mainframe CICS response time is normal but MQ queue depth is elevated")
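The example rule in part (d) can be expressed as a predicate over one snapshot of metrics drawn from both platforms. The following is a sketch under stated assumptions: the `Snapshot` fields, the function name, and all three default thresholds are placeholders chosen for illustration, not values from the chapter or from any monitoring product.

```python
from dataclasses import dataclass


@dataclass
class Snapshot:
    api_p99_ms: float      # cloud side: API gateway p99 latency
    cics_resp_ms: float    # z/OS side: CICS transaction response time
    mq_queue_depth: int    # z/OS side: MQ queue depth


def integration_layer_alert(s: Snapshot,
                            api_limit_ms: float = 500.0,    # placeholder
                            cics_limit_ms: float = 100.0,   # placeholder
                            mq_depth_limit: int = 1000      # placeholder
                            ) -> bool:
    """Fire only when the API is slow, CICS is healthy, and MQ is backed up.

    No single platform's metrics can trigger this rule; it needs the
    cloud-side latency and both z/OS-side signals together.
    """
    return (s.api_p99_ms > api_limit_ms
            and s.cics_resp_ms <= cics_limit_ms
            and s.mq_queue_depth > mq_depth_limit)
```

A snapshot with slow API latency and slow CICS does not fire this rule, because that pattern points at the mainframe rather than the integration layer and belongs to a separate alert.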
B5. Security Architecture Analysis
Analyze the following security incident at a fictional hybrid-architecture bank:
Incident: A cloud-based analytics service was able to access a z/OS Connect API endpoint that returns full customer account details (including SSN and account numbers) without proper authorization. The root cause was that the API gateway's rate limiting was configured per-consumer, but the analytics service was using a shared service account with broad permissions.
a) Identify every control failure in this scenario.
b) For each control failure, identify which hybrid security pattern from Section 37.5 was violated.
c) Design a remediation plan that addresses each failure.
d) What monitoring/alerting would have detected this before it became an incident?
Part C: Design Exercises
C1. Reference Architecture Customization
Adapt the CNB reference architecture from Section 37.7 for the following organization:
GreenField Credit Union is a small credit union with:
- 200,000 members
- Single LPAR, no Parallel Sysplex
- COBOL/CICS/DB2 core banking (purchased package: Fiserv DNA on z/OS)
- Budget: $500K/year for modernization
- Staff: 2 mainframe operators, 3 cloud developers, no dedicated integration team
- Goal: Enable mobile banking within 12 months
Your design must include:
a) Simplified hybrid architecture diagram (fewer components than CNB)
b) Data flow for the three most critical member-facing services
c) Data consistency pattern for each data flow
d) Operational model (who monitors what, given the small team?)
e) 3-year roadmap with quarterly milestones
C2. ADR Writing Exercise
Write a complete Architecture Decision Record (ADR) for the following decision:
Context: Your organization (FBA) currently uses batch file transfer (FTP) to move data from the mainframe to a cloud analytics platform. The analytics team wants real-time data. You are considering three options:
1. Continue with batch FTP but increase frequency (hourly instead of daily)
2. Implement CDC (Change Data Capture) from DB2 to cloud database
3. Implement event sourcing: mainframe emits events for every state change
Your ADR must include: Title, Status, Context, Decision, Rationale (with quantified tradeoffs), Consequences (positive and negative), and Review Date.
C3. Organizational Design Exercise
You are the CTO of a mid-size bank (5M customers, 2 LPARs, 50-person IT department) that has just adopted a hybrid-permanent strategy. Currently, the mainframe team (15 people) and the cloud team (20 people) operate independently with no formal integration function.
Design an organizational transformation plan:
a) Define the target team structure (choose and adapt one of the three models from Section 37.6)
b) Identify the roles needed for the integration function
c) Design a 6-month skills development plan for existing staff
d) Define the communication patterns (meetings, tools, shared processes) that bridge the teams
e) Address the likely resistance points and how you'll overcome them
C4. 20-Year Horizon Planning
Your organization's board of directors asks for a 20-year technology roadmap. You are the chief architect. Design a roadmap that:
a) Identifies the architectural constants (things that won't change in 20 years; hint: data integrity, security, business rules)
b) Identifies the architectural variables (things that will change; hint: specific products, hardware platforms, programming languages)
c) Designs the architecture around the constants, with pluggable interfaces for the variables
d) Includes an "Assumptions Register" (per Sandra Chen's practice) with at least 10 assumptions and the criteria for when each one breaks
e) Defines the annual review process for the roadmap
C5. Hybrid Incident Simulation
Design a tabletop incident simulation exercise for a hybrid team. The scenario:
It is 3:47 AM. The on-call engineer receives an alert: "Funds Transfer API — p99 latency exceeded 5,000ms (threshold: 500ms)." All other API endpoints are healthy. CICS response times are normal. Cloud services are normal.
Design the investigation playbook:
a) What are the first five diagnostic steps?
b) What data sources are checked on each platform?
c) What is the most likely root cause based on the symptoms? (Hint: consider the integration layer)
d) What is the resolution procedure?
e) What post-incident actions prevent recurrence?
Part D: Research and Reflection
D1. Research one real-world hybrid mainframe-cloud architecture (examples: Commonwealth Bank of Australia, JPMorgan Chase, Social Security Administration, UK's HMRC). Describe their architecture, identify which patterns from this chapter they use, and evaluate their approach against the principles in Section 37.7.
D2. The "20-year horizon" claim in Section 37.1 is bold. Research the counter-arguments: what technologies or trends could accelerate full mainframe retirement? Evaluate each counter-argument against the four numbers presented at the start of Section 37.1. Write a 500-word position paper defending or challenging the 20-year timeline.
D3. Interview (or research) an architect who works in a hybrid mainframe-cloud environment. What patterns from this chapter do they recognize? What challenges do they face that this chapter doesn't address? What would they add to the reference architecture?
D4. Compare the saga pattern (Section 37.3) with XA two-phase commit. Why doesn't XA work across the mainframe-cloud boundary? What would need to change in cloud infrastructure to make distributed ACID transactions practical across WAN boundaries?
D5. The reference architecture uses Kafka for cloud-side event streaming and MQ for mainframe-side messaging. Research alternatives: AWS EventBridge, Azure Event Grid, Google Cloud Pub/Sub. Could any of these replace Kafka in the reference architecture? What would change? What would break?