Chapter 33 Key Takeaways: Strangler Fig Pattern for Mainframes

Core Principle

The strangler fig pattern transforms mainframe modernization from a single, catastrophic bet into a series of small, reversible experiments. There is no "go-live day." There is no "cutover weekend." There is a gradual, measurable, reversible transfer of traffic from legacy to modern, one service at a time, over months or years. If a new service has a problem, you route traffic back to the legacy system in seconds.


The Three Architectural Components

| Component | Role | Key Property |
| --- | --- | --- |
| Facade | Single entry point for all consumers; routes to legacy or modern | Transparent, stateless, observable, reversible |
| Routing Engine | Decides which service handles each request | Configurable (path, percentage, user, time, feature toggle) |
| Data Synchronization | Keeps legacy and modern data consistent during coexistence | The hardest part — source of most production incidents |
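A minimal sketch of a percentage-based routing engine behind the facade. The routing table, service names, and percentages are illustrative assumptions, not from the chapter; the point is that rollback is a configuration change, not a deployment:

```python
import hashlib

# Hypothetical routing table: service name -> percent of traffic sent to
# the modern service. In production this lives in a config store so a
# change takes effect without a redeploy.
ROUTING_TABLE = {
    "balance-inquiry": 25,   # 25% of requests go to the modern service
    "fund-transfer": 0,      # still 100% legacy
}

def route(service: str, correlation_id: str) -> str:
    """Decide 'modern' or 'legacy' for one request.

    Hashing the correlation ID makes routing sticky: the same request
    always lands on the same side, which keeps comparisons meaningful.
    """
    percent = ROUTING_TABLE.get(service, 0)
    bucket = int(hashlib.sha256(correlation_id.encode()).hexdigest(), 16) % 100
    return "modern" if bucket < percent else "legacy"

# Reversibility in practice: route everything back to legacy in seconds.
ROUTING_TABLE["balance-inquiry"] = 0
```

Because unknown services default to 0 percent, a misconfigured route fails safe to the legacy side.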

Extraction Scorecard

Priority = (Business Value × 3) + (Change Frequency × 2) + Risk Tolerance
           ─────────────────────────────────────────────────────────────────
                        Technical Complexity + Data Coupling

Extraction order rule: Start with read-only services that have high business value, low complexity, and low data coupling. Balance inquiry is almost always the first candidate. Fund transfer is almost always the last.
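The scorecard formula above can be applied directly. A small sketch, with illustrative 1-10 scores (the numbers are assumptions, but they show why balance inquiry outranks fund transfer):

```python
def priority_score(business_value, change_frequency, risk_tolerance,
                   technical_complexity, data_coupling):
    # Scorecard formula from above; inputs assumed to be on a 1-10 scale.
    return ((business_value * 3) + (change_frequency * 2) + risk_tolerance) / \
           (technical_complexity + data_coupling)

# Balance inquiry: high value, read-only, low complexity and coupling.
balance_inquiry = priority_score(9, 7, 8, 3, 2)   # -> 9.8
# Fund transfer: high value, but write-heavy and deeply coupled.
fund_transfer = priority_score(10, 5, 2, 9, 9)    # -> 2.33...
```

Note that raw business value (10 vs. 9) would pick fund transfer first; the denominator is what pushes it to the back of the queue.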


Data Synchronization Patterns

| Pattern | When to Use | Data Consistency | Complexity |
| --- | --- | --- | --- |
| Legacy as Source of Truth | Phase 1-3, read-only services | Eventual (CDC lag) | Low |
| Dual-Write with Legacy Priority | Phase 3+, write services | Eventual + compensating transactions | High |
| Event Sourcing | Greenfield only (not brownfield) | Event-log-based | Very high |
| Shared Database | Transitional only | Immediate (same DB) | Low, but limits evolution |

Start with Pattern 1. Move to Pattern 2 only when you need write capability in the modern service, and only after the CDC pipeline has been battle-tested.
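The core of Pattern 2 can be sketched in a few lines. The store objects and their methods are hypothetical stand-ins for real database clients; what matters is the ordering and the compensating action:

```python
def dual_write(record, legacy_db, modern_db):
    """Pattern 2 sketch: dual-write with legacy priority.

    Write legacy first; it is the source of truth, so if it fails the
    whole request fails. If the modern write then fails, do NOT fail the
    request -- legacy already holds the truth. Queue a repair so a
    reconciliation job can replay the record into the modern store.
    """
    legacy_db.write(record)          # authoritative write; let errors propagate
    try:
        modern_db.write(record)
    except Exception:
        # Compensating action instead of a distributed transaction.
        modern_db.queue_repair(record)
```

The asymmetry is deliberate: the modern store is allowed to drift briefly and be repaired, the legacy store never is.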


The Extraction Pipeline Lifecycle

IDENTIFY → UNDERSTAND → BUILD → SHADOW → PARALLEL → CANARY → MIGRATE → DECOMMISSION

Critical phases:

  • UNDERSTAND: Map every business rule, every edge case, every undocumented behavior. This is the knowledge capture phase. It may be the most valuable output of the entire project.
  • SHADOW: Route copies of production traffic to the modern service; compare responses. Non-negotiable — it finds edge cases that no test suite can.
  • DECOMMISSION: Only after 90+ days of standby with no issues. Archive source code; never delete it.

Five Natural Seams in COBOL/CICS

  1. CICS transaction boundaries — route the transaction elsewhere
  2. COMMAREA / channel boundaries — the service contract
  3. Copybook boundaries — isolated data structures = clean extraction
  4. DB2 view boundaries — change what's behind the view
  5. MQ queue boundaries — change the consumer

If you can't find clean seams, create them through pre-extraction refactoring.


Shadow Mode and Parallel Running

| Phase | What Happens | Duration |
| --- | --- | --- |
| Shadow mode | Modern service gets copies of traffic; responses discarded; comparison engine logs results | 2-4 weeks minimum (one full monthly cycle) |
| Parallel running | Both services handle real traffic; primary's response goes to consumer; comparison continues | 4-8 weeks with gradual traffic shift |
| Canary deployment | Controlled subset of real users see modern service's response | 4-12 weeks across escalating rings |

Comparison rule for financial data: Exact to the penny. No tolerance. A one-cent rounding difference means the systems disagree about account state, and that disagreement compounds into regulatory findings.
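The comparison predicate for monetary fields is correspondingly trivial: parse both sides as exact decimals and demand equality. A sketch (the string-in, bool-out signature is an assumption about how the comparison engine sees amounts):

```python
from decimal import Decimal

def amounts_match(legacy_amount: str, modern_amount: str) -> bool:
    """Exact to the penny, no tolerance band.

    Compare as Decimal, never float: Decimal parses the wire string
    exactly, so "disagree by one cent" is detectable as plain inequality.
    """
    return Decimal(legacy_amount) == Decimal(modern_amount)

amounts_match("1234.56", "1234.56")   # True
amounts_match("1234.56", "1234.57")   # False: one cent off is a defect
```

There is deliberately no `abs(a - b) < epsilon` here; a tolerance band would hide exactly the disagreements this rule exists to surface.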


Critical Technical Lessons

COMP-3 vs. Floating-Point

COBOL COMP-3 (packed decimal) stores monetary values exactly. IEEE 754 floating-point does not. Use BigDecimal (Java/Kotlin) or Decimal (Python) for all monetary calculations in the modern service. This is a guaranteed production incident if you use float or double.
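The failure mode is easy to demonstrate. Ten cents plus twenty cents, as IEEE 754 sees it versus as `Decimal` (and COMP-3) sees it:

```python
from decimal import Decimal

# The same addition, on both representations of money.
as_float = 0.1 + 0.2                            # 0.30000000000000004
as_decimal = Decimal("0.10") + Decimal("0.20")  # Decimal('0.30'), exact

as_float == 0.3                  # False: binary floats cannot hold 0.1 or 0.2
as_decimal == Decimal("0.30")    # True: decimal arithmetic, like COMP-3
```

Note the `Decimal` values are constructed from strings, not floats; `Decimal(0.1)` would faithfully capture the float's error instead of the intended amount.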

CDC Latency During Batch Windows

CDC from DB2 works perfectly until the batch window opens and the recovery log generates gigabytes per hour. Plan for 30-60 second latency spikes during batch processing. Options: freshness indicators, time-based routing to legacy during batch, or accepting the latency.
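Two of those mitigations (time-based routing and a freshness indicator) can be combined in one read-routing check. A sketch with assumed parameters; the 5-second staleness SLA is illustrative:

```python
from datetime import datetime, timedelta, timezone

MAX_STALENESS = timedelta(seconds=5)   # assumed SLA; tune per service

def route_read(now, last_applied_cdc_event, batch_window_open):
    """Route a read to 'modern' only when its replica is trustworthy.

    - Time-based routing: during the batch window, go straight to legacy.
    - Freshness indicator: if CDC lag exceeds the SLA, fall back to legacy.
    """
    if batch_window_open:
        return "legacy"                # expect 30-60s CDC spikes; don't risk it
    if now - last_applied_cdc_event > MAX_STALENESS:
        return "legacy"                # replica too stale to answer from
    return "modern"
```

The third option from above, simply accepting the latency, is the degenerate case: drop both checks and always return `"modern"`.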

Correlation ID Propagation

Every request through the facade must carry a correlation ID that flows through to both legacy and modern services. Without it, you can't match responses in the comparison engine.
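At the facade this is a few lines: mint an ID if the consumer didn't send one, then hand the same headers to both sides. The `X-Correlation-ID` header name and the callable signatures are assumptions for the sketch:

```python
import uuid

def facade_handle(request_headers, call_legacy, call_modern):
    """Assign a correlation ID at the facade (if the consumer didn't send
    one) and propagate it to both services, so the comparison engine can
    pair up the two responses for the same request."""
    headers = dict(request_headers)                      # don't mutate caller's dict
    headers.setdefault("X-Correlation-ID", str(uuid.uuid4()))
    cid = headers["X-Correlation-ID"]
    legacy_resp = call_legacy(headers)
    modern_resp = call_modern(headers)
    return cid, legacy_resp, modern_resp
```

`setdefault` matters: a consumer-supplied ID is preserved end-to-end, which lets you trace a single complaint from the consumer's logs through both backends.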


When to Stop Strangling

  • Define success in business terms (API response time, development velocity, cost reduction, staff flexibility) — not as "mainframe off"
  • The hybrid steady state is the realistic endpoint: mainframe for high-throughput OLTP and complex batch; modern services for consumer-facing APIs and high-change-frequency capabilities
  • The asymptotic problem: the first 50% of services take 30% of the effort; the last 30% would take 70% of the effort and may not be worth extracting
  • Standby before decommission: 90-180 days minimum, covering all periodic cycles (monthly, quarterly, annual)

Patterns That Work

  1. Start with the API, not the service (expose everything through a facade first)
  2. Extract read services first, write services later
  3. Keep the legacy system funded and maintained (it's your safety net)
  4. Celebrate each extraction as a milestone
  5. Pair mainframe and modern developers on every extraction

Anti-Patterns That Kill

  1. Extracting multiple services simultaneously
  2. Skipping shadow mode
  3. Building the facade as a monolith (with business logic, transformations, orchestration)
  4. Defining success as "mainframe off"
  5. Picking the highest-value service first instead of the highest-priority-score service

The One Sentence Summary

The strangler fig doesn't kill the tree — it creates a hybrid where each component runs on the platform that serves it best, and that's not a compromise; it's good architecture.