Chapter 37 Key Takeaways: The Hybrid Architecture

Threshold Concept

Hybrid is the destination, not a waypoint. The economic reality (240 billion lines of active COBOL, $1.2B+ to rewrite a large system), the technical reality (z/OS five-nines availability vs. cloud's four-nines), and the practical reality (6,000+ developer-years to rewrite a Tier-1 bank's core banking system) all point to permanent coexistence of mainframe and cloud. Design for permanence, not transition.


The Four Architecture Patterns

| Pattern | Purpose | Key Component | Critical Design Decision |
|---|---|---|---|
| Anti-Corruption Layer (ACL) | Synchronous request-response integration | z/OS Connect + custom extensions | ACL owns the contract; absorbs impedance mismatch (packed decimal → string, EBCDIC → UTF-8, CICS RESP → HTTP status) |
| Event Mesh | Asynchronous event-driven integration | MQ + MQ-Kafka connector + Kafka | Events flow out of the mainframe easily (MQ); events flowing in that change mainframe state require synchronous API calls |
| Shared-Nothing Data | Platform isolation with controlled replication | CDC (InfoSphere), batch transfer, API sync | Single system of record per entity; never dual-write; CDC for analytics, API for real-time |
| Hybrid API Gateway | Single entry point for all consumers | Kong / AWS API Gateway / Apigee | Consumers don't know which platform serves their request; enables transparent strangler fig migration |
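
The impedance-mismatch translations listed for the ACL can be sketched in a few lines. This is a minimal illustration, not how z/OS Connect implements its data transformation. The function names, the choice of code page 037 for EBCDIC, and the RESP-to-HTTP mapping are all assumptions made for the example; the real mapping is a contract decision owned by the ACL.

```python
from decimal import Decimal

# Hypothetical mapping from CICS RESP condition names to HTTP status codes.
# Which status each condition maps to is an assumption for illustration.
RESP_TO_HTTP = {
    "NORMAL": 200,
    "NOTFND": 404,
    "NOTAUTH": 403,
    "INVREQ": 400,
    "LENGERR": 400,
}


def unpack_comp3(raw: bytes, scale: int) -> str:
    """Decode a COMP-3 (packed decimal) field into a decimal string."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)        # high nibble is one digit
        digits.append(byte & 0x0F)      # low nibble is the next digit
    digits.append(raw[-1] >> 4)         # last byte: one digit plus sign nibble
    sign_nibble = raw[-1] & 0x0F
    unscaled = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:             # 0xD marks a negative value
        unscaled = -unscaled
    # Shift the implied decimal point; the result stays an exact decimal.
    return str(Decimal(unscaled).scaleb(-scale))


def ebcdic_to_text(raw: bytes, codepage: str = "cp037") -> str:
    """Decode a fixed-width EBCDIC field and drop trailing pad spaces."""
    return raw.decode(codepage).rstrip()


def resp_to_http(resp: str) -> int:
    """Map a CICS RESP condition to an HTTP status; unknown conditions become 500."""
    return RESP_TO_HTTP.get(resp, 500)
```

For example, `unpack_comp3(bytes([0x12, 0x34, 0x5C]), scale=2)` yields the string `"123.45"`, which the ACL can place in a JSON response without passing through a binary float.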

The Four Data Consistency Patterns

| Pattern | Guarantee | Tradeoff | Best For |
|---|---|---|---|
| Eventual Consistency (CDC) | Cloud replica converges within SLA (e.g., 60 sec) | Cloud data may be stale during replication lag | Analytics, dashboards, reporting |
| Saga Pattern | Multi-step operations complete or fully compensate | Not atomic across platforms; compensating ≠ rollback | Cross-platform business processes (loan origination, account opening) |
| Event Sourcing at Boundary | Complete, immutable audit trail; temporal queries | Added complexity; high data volumes | Compliance-sensitive domains (claims, regulatory) |
| Conflict Resolution | Bidirectional updates reconciled per policy | Requires explicit conflict detection and resolution strategy | Multi-platform writes to same entity (avoid if possible) |

Critical Rule: Never use dual write. Always use a single system of record with replication.
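
The Saga Pattern's "complete or fully compensate" guarantee can be illustrated with a small orchestrator. This is a sketch under simplifying assumptions: steps run in-process, compensations never fail, and the names `run_saga` and `Step` are hypothetical. A production saga would also persist its progress so it can resume after a crash.

```python
from typing import Callable, List, Tuple

# A step is (name, action, compensation). Both callables take no arguments.
Step = Tuple[str, Callable[[], None], Callable[[], None]]


def run_saga(steps: List[Step]) -> Tuple[bool, List[str]]:
    """Run steps in order; on failure, compensate completed steps in reverse."""
    log: List[str] = []
    completed: List[Tuple[str, Callable[[], None]]] = []
    for name, action, compensate in steps:
        try:
            action()
        except Exception:
            log.append(f"failed:{name}")
            # Compensation is a new forward action that undoes the business
            # effect. It is not a database rollback, so intermediate states
            # may already have been visible to other systems.
            for done_name, undo in reversed(completed):
                undo()
                log.append(f"compensated:{done_name}")
            return False, log
        log.append(f"done:{name}")
        completed.append((name, compensate))
    return True, log
```

In an account-opening flow, for instance, a step that creates the account on the mainframe would pair with a compensation that closes it; if a later cloud-side step fails, the orchestrator closes the account rather than leaving a half-opened customer.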


Operational Model

Unified Monitoring — Four Golden Signals

| Signal | Mainframe Source | Cloud Source | Linked By |
|---|---|---|---|
| Latency | CICS response time (SMF 110) | API gateway response time | Correlation ID |
| Traffic | CICS task count, MQ queue depth | HTTP request count | Time window alignment |
| Errors | CICS abend rate, DB2 SQL errors | HTTP 5xx rate | Correlation ID |
| Saturation | CPU (RMF), DB2 threads, MQ channels | Pod CPU/memory, connection pools | Capacity model |

Key insight: The correlation ID (UUID propagated from API gateway through z/OS Connect into CICS TCTUA and through to all downstream components) is the single most valuable operational tool in hybrid architecture. It reduces mean-time-to-diagnose by 80%+ for cross-platform incidents.
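
A rough sketch of how a correlation ID ties the two platforms together: the gateway guarantees that every request carries an ID, and the monitoring side groups log records from any platform by that ID. The header name `X-Correlation-ID` and the record fields `correlation_id` and `ts` are assumptions for the example, not names taken from a specific product.

```python
import uuid
from collections import defaultdict
from typing import Dict, Iterable, List

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name


def ensure_correlation_id(headers: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the headers that is guaranteed to carry a correlation ID."""
    out = dict(headers)
    if not out.get(CORRELATION_HEADER):
        # Generate the ID once at the edge; every downstream hop reuses it.
        out[CORRELATION_HEADER] = str(uuid.uuid4())
    return out


def stitch_traces(records: Iterable[dict]) -> Dict[str, List[dict]]:
    """Group log records from all platforms by correlation ID, ordered by time."""
    traces: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        traces[rec["correlation_id"]].append(rec)
    return {cid: sorted(recs, key=lambda r: r["ts"]) for cid, recs in traces.items()}
```

With records from the gateway, z/OS Connect, and CICS all keyed this way, one lookup returns the full cross-platform path of a single transaction in time order.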

Incident Response

  • Single incident commander trained on both platforms
  • Runbooks organized per business service, not per platform
  • Shared war room / channel for cross-platform incidents
  • Post-incident review covers the full transaction path across both platforms

Capacity Planning

Cloud growth impacts mainframe capacity. Mainframe optimization impacts cloud data freshness. Plan capacity as a coupled system, not two independent platforms.


Security Architecture

| Principle | Implementation |
|---|---|
| Identity Federation | Cloud IdP (OAuth 2.0) → Identity Bridge → RACF; service-to-service via mTLS + API keys |
| Zero Trust | Encrypt in transit (TLS 1.3, AT-TLS, IPSec); encrypt at rest (DFSMS, cloud KMS); authenticate every request; authorize at finest grain; log everything |
| Data Sovereignty | Tokenize/anonymize regulated data at the mainframe boundary before CDC replication; controlled, audited API access for identifiable data exceptions |

Critical Rule: Never store RACF passwords in cloud configuration. Use PassTickets or token-based authentication via the identity bridge.
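
The Data Sovereignty principle of tokenizing regulated fields before replication can be sketched as follows. This example swaps in a keyed hash (HMAC-SHA256) as the tokenization technique, which gives deterministic, non-reversible tokens; a vault-based scheme that supports controlled detokenization would look different. The field names in `REGULATED_FIELDS` are hypothetical, and key management is out of scope here.

```python
import hashlib
import hmac

# Hypothetical set of regulated field names; a real list comes from the
# data classification policy.
REGULATED_FIELDS = {"ssn", "account_number"}


def tokenize(value: str, key: bytes) -> str:
    """Return a deterministic, non-reversible token for a regulated value."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def tokenize_record(record: dict, key: bytes) -> dict:
    """Replace regulated fields with tokens; pass other fields through unchanged."""
    return {
        field: tokenize(str(value), key) if field in REGULATED_FIELDS else value
        for field, value in record.items()
    }
```

Because the same input and key always produce the same token, cloud analytics can still join and count by customer without ever holding the identifiable value.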


Organizational Design

| Model | Best For | Key Strength | Key Limitation |
|---|---|---|---|
| Platform Teams + Integration Squad | Large orgs (CNB scale) | Deep platform expertise + dedicated bridge function | Expensive (12-person squad) |
| Product Teams + Embedded Expertise | Small-mid orgs (SecureFirst scale) | Aligned to business capabilities | Requires distributed mainframe knowledge |
| Federated + Architecture Governance | Orgs with rigid boundaries (FBA/government) | Works within immovable org constraints | Slower decision-making |

Universal requirements regardless of model:

  • Shared on-call rotation with cross-platform capability
  • Joint architecture reviews for all cross-platform features
  • Shared dashboards visible to both teams
  • Blameless post-incident reviews
  • Single documentation repository


Reference Architecture Components

Cloud Platform                 Integration Layer              z/OS Platform
─────────────                 ─────────────────              ────────────
Mobile/Web/Partner Apps       API Gateway (routing, auth)     z/OS Connect (ACL)
Cloud Microservices (K8s)     Identity Bridge (OAuth↔RACF)   CICS TS (transactions)
Cloud Data Platform           Unified Monitoring              DB2 (system of record)
Kafka (event streaming)       MQ-Kafka Connector             IBM MQ (messaging)
                              CDC Engine                      Batch/JES2

20-Year Design Principles

  1. Contracts, not implementations. Integrations use defined API specs and event schemas. Implementations behind contracts are replaceable.
  2. Replaceable components. API gateway, event platform, CDC engine — all isolated behind interfaces. No vendor lock-in at the integration layer.
  3. Data as the durable asset. Hardware, software, and languages change. Data structures and business rules endure.
  4. Organizational learning. Architecture Decision Records (ADRs) document not just what was decided but why, so future architects can evolve the architecture intelligently.

Rules of Thumb

  • Build for permanence from day one. "Temporary" integration layers accumulate debt faster than purpose-built ones and always become permanent.
  • The ACL is worth every dollar. It absorbs COMMAREA changes and API version updates without breaking consumers.
  • CDC lag is the number you watch. Stale data in banking is dangerous. Monitor replication lag with the same urgency as CICS response time.
  • Cultural change is harder than technical change. Budget 2x the time you think you need for team integration.
  • Network redundancy is non-negotiable. Dual VPN tunnels (or Direct Connect + VPN) between platforms. A single network link is a single point of failure.
  • Monetary values are strings, not JSON numbers. IEEE 754 doubles cannot represent most decimal fractions exactly (0.1, for example, has no finite binary representation). Use strings with precision contracts for financial data.
  • Translate architecture into business metrics for the board. Cost per transaction, API response time, incident count, MIPS utilization — not architecture diagrams.
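
The rule of thumb about monetary values can be made concrete with a short sketch. It assumes a precision contract of two decimal places, and the helper names `encode_amount` and `decode_amount` are hypothetical. Amounts travel as JSON strings and are handled as `Decimal` on both sides, never as floats.

```python
import json
from decimal import Decimal


def encode_amount(amount: Decimal, scale: int = 2) -> str:
    """Serialize a monetary amount as a string with exactly `scale` decimals."""
    quantum = Decimal(1).scaleb(-scale)          # e.g. Decimal("0.01")
    quantized = amount.quantize(quantum)
    if quantized != amount:
        # Refuse to round silently: excess precision violates the contract.
        raise ValueError(f"{amount} exceeds {scale} decimal places")
    return str(quantized)


def decode_amount(text: str, scale: int = 2) -> Decimal:
    """Parse a monetary string, rejecting JSON numbers and excess precision."""
    if not isinstance(text, str):
        raise TypeError("monetary amounts must arrive as JSON strings")
    value = Decimal(text)
    quantum = Decimal(1).scaleb(-scale)
    quantized = value.quantize(quantum)
    if quantized != value:
        raise ValueError(f"{text} exceeds {scale} decimal places")
    return quantized


# Binary floats drift; decimals parsed from strings do not.
float_sum_is_exact = (0.1 + 0.2 == 0.3)                                        # False
decimal_sum_is_exact = (Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True

payload = json.dumps({"amount": encode_amount(Decimal("1234.5"))})
```

Here `payload` carries the amount as the string `"1234.50"`, so a consumer that parses it with `decode_amount` recovers the exact value the mainframe held in packed decimal.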