Chapter 37 Key Takeaways: The Hybrid Architecture
Threshold Concept
Hybrid is the destination, not a waypoint. The economic reality (240 billion lines of active COBOL, $1.2B+ to rewrite a large system), technical reality (z/OS five-nines availability vs. cloud's four-nines), and practical reality (6,000+ developer-years to rewrite a Tier-1 bank's core banking) all point to permanent coexistence of mainframe and cloud. Design for permanence, not transition.
The Four Architecture Patterns
| Pattern | Purpose | Key Component | Critical Design Decision |
|---|---|---|---|
| Anti-Corruption Layer (ACL) | Synchronous request-response integration | z/OS Connect + custom extensions | ACL owns the contract; absorbs impedance mismatch (packed decimal → string, EBCDIC → UTF-8, CICS RESP → HTTP status) |
| Event Mesh | Asynchronous event-driven integration | MQ + MQ-Kafka connector + Kafka | Events flow out of mainframe easily (MQ); events flowing in as state changes require synchronous API calls |
| Shared-Nothing Data | Platform isolation with controlled replication | CDC (InfoSphere), batch transfer, API sync | Single system of record per entity; never dual-write; CDC for analytics, API for real-time |
| Hybrid API Gateway | Single entry point for all consumers | Kong / AWS API Gateway / Apigee | Consumers don't know which platform serves their request; enables transparent strangler fig migration |
The Four Data Consistency Patterns
| Pattern | Guarantee | Tradeoff | Best For |
|---|---|---|---|
| Eventual Consistency (CDC) | Cloud replica converges within SLA (e.g., 60 sec) | Cloud data may be stale during replication lag | Analytics, dashboards, reporting |
| Saga Pattern | Multi-step operations complete or fully compensate | Not atomic across platforms; compensating ≠ rollback | Cross-platform business processes (loan origination, account opening) |
| Event Sourcing at Boundary | Complete, immutable audit trail; temporal queries | Added complexity; high data volumes | Compliance-sensitive domains (claims, regulatory) |
| Conflict Resolution | Bidirectional updates reconciled per policy | Requires explicit conflict detection and resolution strategy | Multi-platform writes to same entity (avoid if possible) |
Critical Rule: Never use dual write. Always use a single system of record with replication.
Operational Model
Unified Monitoring — Four Golden Signals
| Signal | Mainframe Source | Cloud Source | Linked By |
|---|---|---|---|
| Latency | CICS response time (SMF 110) | API gateway response time | Correlation ID |
| Traffic | CICS task count, MQ queue depth | HTTP request count | Time window alignment |
| Errors | CICS abend rate, DB2 SQL errors | HTTP 5xx rate | Correlation ID |
| Saturation | CPU (RMF), DB2 threads, MQ channels | Pod CPU/memory, connection pools | Capacity model |
Key insight: The correlation ID (UUID propagated from API gateway through z/OS Connect into CICS TCTUA and through to all downstream components) is the single most valuable operational tool in hybrid architecture. It reduces mean-time-to-diagnose by 80%+ for cross-platform incidents.
Incident Response
- Single incident commander trained on both platforms
- Runbooks organized per business service, not per platform
- Shared war room / channel for cross-platform incidents
- Post-incident review covers the full transaction path across both platforms
Capacity Planning
Cloud growth impacts mainframe capacity. Mainframe optimization impacts cloud data freshness. Plan capacity as a coupled system, not two independent platforms.
Security Architecture
| Principle | Implementation |
|---|---|
| Identity Federation | Cloud IdP (OAuth 2.0) → Identity Bridge → RACF; service-to-service via mTLS + API keys |
| Zero Trust | Encrypt in transit (TLS 1.3, AT-TLS, IPSec); encrypt at rest (DFSMS, cloud KMS); authenticate every request; authorize at finest grain; log everything |
| Data Sovereignty | Tokenize/anonymize regulated data at the mainframe boundary before CDC replication; controlled, audited API access for identifiable data exceptions |
Critical Rule: Never store RACF passwords in cloud configuration. Use PassTickets or token-based authentication via the identity bridge.
Organizational Design
| Model | Best For | Key Strength | Key Limitation |
|---|---|---|---|
| Platform Teams + Integration Squad | Large orgs (CNB scale) | Deep platform expertise + dedicated bridge function | Expensive (12-person squad) |
| Product Teams + Embedded Expertise | Small-mid orgs (SecureFirst scale) | Aligned to business capabilities | Requires distributed mainframe knowledge |
| Federated + Architecture Governance | Orgs with rigid boundaries (FBA/government) | Works within immovable org constraints | Slower decision-making |
Universal requirements regardless of model: - Shared on-call rotation with cross-platform capability - Joint architecture reviews for all cross-platform features - Shared dashboards visible to both teams - Blameless post-incident reviews - Single documentation repository
Reference Architecture Components
Cloud Platform Integration Layer z/OS Platform
───────────── ───────────────── ────────────
Mobile/Web/Partner Apps API Gateway (routing, auth) z/OS Connect (ACL)
Cloud Microservices (K8s) Identity Bridge (OAuth↔RACF) CICS TS (transactions)
Cloud Data Platform Unified Monitoring DB2 (system of record)
Kafka (event streaming) MQ-Kafka Connector IBM MQ (messaging)
CDC Engine Batch/JES2
20-Year Design Principles
- Contracts, not implementations. Integrations use defined API specs and event schemas. Implementations behind contracts are replaceable.
- Replaceable components. API gateway, event platform, CDC engine — all isolated behind interfaces. No vendor lock-in at the integration layer.
- Data as the durable asset. Hardware, software, and languages change. Data structures and business rules endure.
- Organizational learning. Architecture Decision Records (ADRs) document not just what was decided but why, so future architects can evolve the architecture intelligently.
Rules of Thumb
- Build for permanence from day one. "Temporary" integration layers accumulate debt faster than purpose-built ones and always become permanent.
- The ACL is worth every dollar. It absorbs COMMAREA changes and API version updates without breaking consumers.
- CDC lag is the number you watch. Stale data in banking is dangerous. Monitor replication lag with the same urgency as CICS response time.
- Cultural change is harder than technical change. Budget 2x the time you think you need for team integration.
- Network redundancy is non-negotiable. Dual VPN tunnels (or Direct Connect + VPN) between platforms. A single network link is a single point of failure.
- Monetary values are strings, not JSON numbers. IEEE 754 doubles cannot represent all decimal values exactly. Use strings with precision contracts for financial data.
- Translate architecture into business metrics for the board. Cost per transaction, API response time, incident count, MIPS utilization — not architecture diagrams.