Chapter 33 Quiz: Strangler Fig Pattern for Mainframes
Section 1: Multiple Choice
1. What is the primary advantage of the strangler fig pattern over a big-bang migration?
a) The strangler fig pattern is always cheaper than big-bang migration
b) The strangler fig pattern eliminates the single "go-live day," replacing it with incremental, reversible traffic shifts
c) The strangler fig pattern guarantees that all services will eventually be extracted from the mainframe
d) The strangler fig pattern does not require any changes to the legacy system
Answer: b) The strangler fig pattern eliminates the single "go-live day," replacing it with incremental, reversible traffic shifts
Explanation: The strangler fig's fundamental advantage is that there is no single high-risk event — no "cutover weekend" where the entire system switches from old to new. Instead, traffic is gradually and reversibly shifted one service at a time. If a new service has a problem, traffic is routed back to the legacy service in seconds. This is not necessarily cheaper (it can be more expensive due to the coexistence period), it doesn't guarantee full extraction, and it does require changes to the legacy system (at minimum, wrapping services as APIs). The key value is risk reduction through incremental, reversible migration.
2. In the strangler fig architecture, what is the role of the facade layer?
a) To rewrite legacy COBOL business logic in a modern language
b) To act as a single entry point that routes requests to either legacy or modern services based on configurable rules
c) To permanently replace the CICS Transaction Server
d) To synchronize data between the legacy and modern databases
Answer: b) To act as a single entry point that routes requests to either legacy or modern services based on configurable rules
Explanation: The facade is the strangler. It presents a single, stable interface to consumers and internally routes each request to either the legacy CICS system or the modern service, based on routing rules (path-based, percentage-based, user-based, etc.). The facade does not contain business logic, does not replace CICS, and does not handle data synchronization — those are separate architectural concerns. The facade's four requirements are: transparent to consumers, stateless, observable, and reversible.
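The routing behavior described above can be sketched in a few lines. This is a minimal illustration, not the chapter's implementation; the paths and the `ROUTING_RULES` table are hypothetical, and a production facade would sit in an API gateway rather than application code:

```python
# Config-driven routing table: each rule maps a request path to a backend.
# Flipping a rule back to "legacy" is the rollback mechanism, which is why
# the facade must stay stateless and free of business logic.
ROUTING_RULES = {
    "/accounts/balance": "modern",    # already extracted
    "/accounts/transfer": "legacy",   # still served by CICS
}

def route(path: str) -> str:
    """Return the backend for a request; anything unmatched defaults to legacy."""
    return ROUTING_RULES.get(path, "legacy")
```

Defaulting unmatched paths to legacy is what makes the facade safe to deploy before any service has been extracted.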
3. When selecting the first service to extract using the strangler fig pattern, which characteristic is MOST important?
a) Highest business value
b) Highest extraction priority score (business value and change frequency divided by technical complexity and data coupling)
c) Largest codebase (most lines of COBOL)
d) Most users
Answer: b) Highest extraction priority score (business value and change frequency divided by technical complexity and data coupling)
Explanation: Picking the service with the highest business value alone is a common mistake — it often means picking the most complex, riskiest service (like fund transfer). The extraction priority score balances business value and change frequency (the benefits) against technical complexity and data coupling (the costs). The first extraction must succeed because it sets the pattern, builds organizational confidence, and trains the team. A failed first extraction can derail the entire strangler fig program. Balance inquiry typically scores highest because it combines high business value with low complexity and low data coupling.
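The scoring formula can be expressed directly. The 1-10 ratings below are illustrative, not from the chapter, but they show why balance inquiry outscores fund transfer despite the latter's higher business value:

```python
def extraction_priority(business_value: float, change_frequency: float,
                        technical_complexity: float, data_coupling: float) -> float:
    """Score = (business value x change frequency) / (complexity x coupling).
    Higher scores make better first extraction candidates."""
    return (business_value * change_frequency) / (technical_complexity * data_coupling)

# Hypothetical ratings on a 1-10 scale:
balance_inquiry = extraction_priority(8, 6, 2, 2)    # high value, simple, loosely coupled
fund_transfer = extraction_priority(10, 7, 9, 9)     # highest value, but complex and coupled
```

With these inputs, balance inquiry scores 12.0 against fund transfer's roughly 0.86, which is the point of the formula: complexity and coupling in the denominator penalize the risky candidates.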
4. Carlos Vega's 4 AM incident at SecureFirst was caused by which data synchronization issue?
a) The modern service used a different database schema than the legacy system
b) The CDC pipeline had a 47-second lag during peak batch processing, causing the modern service to show pre-posting balances
c) The modern service had a bug in its balance calculation logic
d) The API gateway routed requests to the wrong backend
Answer: b) The CDC pipeline had a 47-second lag during peak batch processing, causing the modern service to show pre-posting balances
Explanation: During peak batch processing, the DB2 recovery log generates a high volume of changes, and the CDC pipeline's throughput couldn't keep up. The result: the modern service's PostgreSQL replica was 47 seconds behind the DB2 master. When a customer checked their balance on the mobile app (served by the modern service) and a teller checked it on the green screen (served by CICS/DB2 directly), they saw different numbers — because a nightly mortgage payment had posted in DB2 but hadn't yet been replicated to PostgreSQL. This is a CDC latency issue, not a logic bug.
5. Which data synchronization pattern is recommended for Phase 1-3 of a strangler fig migration?
a) Event Sourcing — both systems produce and consume events
b) Dual-Write with Legacy Priority — writes go to both databases
c) Legacy as Source of Truth — the modern service reads from a CDC-fed replica
d) Shared Database — both services connect to the same DB2 instance
Answer: c) Legacy as Source of Truth — the modern service reads from a CDC-fed replica
Explanation: Pattern 1 (Legacy as Source of Truth) is recommended for early phases because it's the simplest and most reliable. The legacy DB2 database remains the single source of truth. The modern service reads from a replica kept in sync via CDC. No conflict resolution is needed because the modern service doesn't write. The only downside — the modern service can't handle writes — is acceptable in early phases because the first extractions should target read-only services (balance inquiry, transaction history). Dual-write comes later, for write-capable services.
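The shape of Pattern 1 is simple enough to sketch. This is an illustrative skeleton with hypothetical names (`replica`, `get_balance`), assuming the replica is a CDC-fed copy of the DB2 data:

```python
class ReadOnlyModernService:
    """Pattern 1 sketch: the modern service reads from the CDC-fed replica
    and never writes. Writes still flow through the legacy path, so no
    conflict resolution is needed."""

    def __init__(self, replica: dict):
        self.replica = replica  # stands in for the CDC-fed PostgreSQL copy

    def get_balance(self, account_id: str):
        return self.replica.get(account_id)

    def post_transaction(self, account_id: str, amount: int):
        # Deliberately unsupported in Phase 1-3: the legacy system owns writes.
        raise NotImplementedError("writes remain on the legacy system in Phase 1-3")
```

Making the write path raise rather than silently succeed is the design choice that keeps DB2 the single source of truth.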
6. What is a "seam" in the context of extracting COBOL services?
a) A bug in the legacy code that creates an opportunity for improvement
b) A place in the code where you can alter behavior without editing the code itself — such as a CICS transaction boundary, a COMMAREA contract, or a DB2 view
c) The boundary between the mainframe network and the external network
d) The point at which the strangler fig migration is complete
Answer: b) A place in the code where you can alter behavior without editing the code itself — such as a CICS transaction boundary, a COMMAREA contract, or a DB2 view
Explanation: Michael Feathers defined a "seam" as a place where you can alter behavior without editing the code. In COBOL/CICS systems, the natural seams are: CICS transaction boundaries (you can route a transaction to a different program), COMMAREA/channel boundaries (the data contract between programs), copybook boundaries (data structure definitions), DB2 view boundaries (you can change what's behind a view), and MQ queue boundaries (you can change the consumer of a queue). Clean seams make extraction easy; tightly coupled code with no seams requires pre-extraction refactoring to create them.
7. During shadow mode testing, SecureFirst discovered seven accounts with a specific combination of hold types that the modern service hadn't implemented. What does this demonstrate?
a) The modern service was not properly unit tested
b) Only real production traffic with its full diversity of account states can reveal all edge cases
c) Shadow mode testing is unreliable and should be skipped
d) The legacy COBOL program has bugs that the modern service correctly avoids
Answer: b) Only real production traffic with its full diversity of account states can reveal all edge cases
Explanation: The seven accounts used an obscure hold type (judicial garnishment hold with partial release) that the extraction team hadn't encountered during analysis. No unit test, integration test, or synthetic test data would have covered this case because no one knew it existed. Shadow mode testing with production traffic is non-negotiable precisely because it exposes edge cases that emerge only from the full diversity of real-world data — hold types, fee structures, regulatory flags, and account configurations that accumulate over decades of production use. This is why the chapter states shadow mode should run for at least one full monthly cycle.
8. What is the recommended minimum standby period before decommissioning a legacy COBOL module?
a) 30 days
b) 60 days
c) 90-180 days (covering the longest periodic cycle plus a buffer)
d) 1 year
Answer: c) 90-180 days (covering the longest periodic cycle plus a buffer)
Explanation: The minimum standby period must cover all periodic cycles: monthly close, quarterly reporting, annual regulatory submissions, and fiscal year-end processing. Sandra Chen's team at FBA learned this when a decommissioned module was needed by a quarterly regulatory batch job after only 88 days of standby — two days before the quarter-end run. The 90-day minimum covers monthly and quarterly cycles. For modules involved in annual processing, 180 days (or even a full year) may be necessary. During standby, the module is still deployed and available but receives no production traffic.
9. The chapter identifies a specific technical error that occurred during SecureFirst's parallel running: a 0.01 discrepancy on 3.2% of accounts. What was the root cause?
a) A rounding error in the DB2 SQL query
b) The modern Kotlin service used IEEE 754 floating-point for monetary calculations instead of packed decimal equivalents
c) A CDC pipeline that dropped decimal places during replication
d) A timezone difference that affected interest calculation cutoff times
Answer: b) The modern Kotlin service used IEEE 754 floating-point for monetary calculations instead of packed decimal equivalents
Explanation: IEEE 754 double-precision floating-point cannot exactly represent most decimal fractions — the double closest to $1234.56 differs from it by a tiny binary rounding error. When enough arithmetic operations accumulate (balance + deposit - fee + interest), the rounding error manifests as a visible penny discrepancy. COBOL's COMP-3 (packed decimal) stores decimal values exactly, which is why financial systems have used packed decimal for 60 years. The fix: use BigDecimal (Java/Kotlin), Decimal (Python), or equivalent fixed-point types for all monetary calculations in the modern service. This is a well-known problem, documented extensively, and teams still make the mistake.
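The drift is easy to demonstrate. The snippet below uses Python's `Decimal` (one of the fixed-point types the explanation names) against plain floats:

```python
from decimal import Decimal

# Float accumulation drifts: ten additions of 0.1 do not sum to exactly 1.0,
# because 0.1 has no exact binary representation.
total_float = sum([0.1] * 10)
print(total_float == 1.0)            # False

# Fixed-point decimal stays exact, mirroring COBOL's COMP-3 semantics.
total_dec = sum([Decimal("0.10")] * 10, Decimal("0.00"))
print(total_dec == Decimal("1.00"))  # True
```

Note that the decimal values are constructed from strings (`Decimal("0.10")`, not `Decimal(0.10)`) — constructing from a float would bake the binary rounding error into the "exact" type.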
10. Diane Okoye at Pinnacle Health recommended stopping the strangler fig migration after extracting services around the claims adjudication engine, leaving the core engine on CICS. What principle does this illustrate?
a) The strangler fig pattern always fails for complex services
b) COBOL is always better than Java for claims processing
c) The realistic endpoint of most strangler fig migrations is a hybrid architecture, not full mainframe decommission
d) Health insurance systems should never be modernized
Answer: c) The realistic endpoint of most strangler fig migrations is a hybrid architecture, not full mainframe decommission
Explanation: The asymptotic problem: the first 20% of services (by transaction volume) are extracted quickly, but the last 50% — the complex, tightly coupled, mission-critical services — would take disproportionately long and cost disproportionately more. Diane's recommendation reflects the chapter's core principle: define success as achieving business outcomes, not decommissioning the mainframe. The adjudication engine works, it's fast, it's compliant, and extracting it would cost $45 million with no guarantee of success. The hybrid steady state — modern services for consumer-facing APIs and analytics, mainframe for high-throughput OLTP and complex batch — is the realistic and desirable endpoint for most organizations.
11. Which of the following is NOT a valid routing strategy for the strangler fig's routing engine?
a) Path-based — route by API endpoint path
b) Percentage-based — route N% of traffic to modern
c) Database-based — route based on which database has the most recent data
d) User-based — route specific users or accounts to modern
Answer: c) Database-based — route based on which database has the most recent data
Explanation: The facade routes based on the request's metadata (path, headers, user identity, percentage) — not based on database state. A database-based routing strategy would require the facade to query both databases before routing, which would add latency, introduce database coupling into the facade layer, and violate the principle that the facade should be stateless and thin. The valid routing strategies are: path-based, header-based, percentage-based, user-based, time-based, and feature-toggle-based.
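Percentage-based routing illustrates the principle: the decision uses only request metadata. A common implementation detail, sketched here as an assumption rather than the chapter's code, is to hash the user ID into a bucket so the same user always lands on the same backend:

```python
import hashlib

def route_percentage(user_id: str, modern_percent: int) -> str:
    """Deterministic percentage routing: hash the user ID into a 0-99 bucket.
    The decision depends only on request metadata, never on database state,
    so the facade stays stateless."""
    bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
    return "modern" if bucket < modern_percent else "legacy"
```

Hashing rather than random sampling matters during parallel running: a user who flip-flopped between backends on every request would see inconsistent behavior and generate spurious discrepancy reports.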
12. In the canary deployment strategy, SecureFirst chose lowest-balance accounts for Ring 1. Why?
a) Lowest-balance accounts are the least active and generate the least traffic
b) Lowest-balance accounts have the simplest data structures
c) If there is a display error, the dollar impact is smallest for lowest-balance accounts
d) Regulatory requirements mandate testing with lowest-balance accounts first
Answer: c) If there is a display error, the dollar impact is smallest for lowest-balance accounts
Explanation: Canary rings are about controlling blast radius. Showing a customer with $47 an incorrect balance of $46.99 is a problem, but showing a customer with $470,000 a balance of $469,999.99 is a much bigger problem and a much bigger regulatory exposure. By starting with lowest-balance accounts, the potential dollar impact of any error is minimized. This also means that complex accounts (trusts, joint accounts, business accounts with sub-accounts) — which are more likely to expose edge cases and are more likely to have high balances — are tested in later rings after confidence has been built.
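A ring-assignment rule along these lines keeps the blast radius proportional to confidence. The thresholds below (in cents) are illustrative, not SecureFirst's actual ring boundaries:

```python
def canary_ring(balance_cents: int,
                thresholds=(10_000, 100_000, 1_000_000)) -> int:
    """Assign an account to a canary ring by balance: lower balances enter
    earlier rings, so any display error has the smallest dollar impact
    while confidence is still being built. Thresholds are hypothetical."""
    for ring, limit in enumerate(thresholds, start=1):
        if balance_cents < limit:
            return ring
    return len(thresholds) + 1  # highest balances wait for the final ring
```

A $47 account lands in Ring 1; a $470,000 account waits for Ring 4, after the earlier rings have validated the service.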
13. What is the purpose of the correlation ID in the CICS web service wrapper?
a) To uniquely identify the CICS region handling the request
b) To match legacy and modern responses for the same request during parallel running
c) To authenticate the consumer with RACF
d) To track the request through the DB2 recovery log
Answer: b) To match legacy and modern responses for the same request during parallel running
Explanation: During parallel running and shadow mode, the comparison engine needs to match the legacy response with the modern response for the same request. The correlation ID — a UUID generated by the facade and passed through to both services — enables this matching. Without a correlation ID, you can't reliably compare responses when requests arrive out of order or when the two services have different processing times. The correlation ID also aids in debugging: when a discrepancy is found, you can trace the entire request lifecycle through both systems using a single identifier.
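The tagging and matching described above can be sketched as follows. The dictionary shapes and function names are illustrative assumptions, not the chapter's comparison engine:

```python
import uuid

def tag_request(request: dict) -> dict:
    """Facade-side step: attach one correlation ID that both backends echo
    back in their responses."""
    request["correlation_id"] = str(uuid.uuid4())
    return request

def match_responses(legacy_responses: list, modern_responses: list) -> list:
    """Comparison-engine step: pair legacy and modern responses by
    correlation ID, which works even when responses arrive out of order."""
    modern_by_id = {r["correlation_id"]: r for r in modern_responses}
    return [(r, modern_by_id.get(r["correlation_id"])) for r in legacy_responses]
```

Indexing the modern responses by ID is what makes matching order-independent: the two services can have wildly different latencies and the pairs still line up.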
14. The chapter presents four layers of the strangler fig architecture. Which layer is described as "the part that nobody thinks about until it's too late"?
a) The facade / API gateway
b) The routing engine
c) The modern services
d) The data synchronization layer
Answer: d) The data synchronization layer
Explanation: Data synchronization is the hardest part of the strangler fig pattern and the source of most production incidents. The facade and routing engine are well-understood infrastructure problems with mature tooling. Building modern services is straightforward software engineering. But keeping two data stores in sync — with consistent data, acceptable latency, no conflicts, and reliable failure handling — is a distributed systems problem that has no simple solution. Carlos's 4 AM incident, the COMP-3 vs. floating-point issue, and the CDC pipeline failure scenarios all trace back to data synchronization.
15. According to the chapter, what is the most important anti-pattern to avoid in a strangler fig migration?
a) Building the facade as a monolith
b) Defining success as "mainframe off"
c) Extracting multiple services simultaneously
d) Skipping shadow mode testing
Answer: b) Defining success as "mainframe off"
Explanation: The chapter states this anti-pattern "kills more projects than all the technical anti-patterns combined." If success requires the mainframe to be decommissioned, you will either fail (because the last 30% of services resist extraction) or succeed at catastrophic cost. The correct approach is to define success in business terms: API response times, development velocity, operational cost reduction, and staff flexibility. The architecture should be whatever achieves those outcomes — which for most organizations is a hybrid where the mainframe handles what it does best and modern services handle the rest.
Section 2: Short Answer
16. Name the eight stages of the extraction pipeline lifecycle in order.
Answer: Identify, Understand, Build, Shadow, Parallel, Canary, Migrate, Decommission. Identify: score extraction candidates. Understand: map all business rules, edge cases, and data dependencies. Build: implement the modern service. Shadow: modern service receives copies of production traffic, responses are discarded. Parallel: both services handle real traffic, responses are compared. Canary: gradually shift real users to the modern service in controlled rings. Migrate: 100% traffic to modern, legacy on standby. Decommission: remove legacy code after 90+ days of standby with no issues.
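The strict ordering of the pipeline can be captured as a tiny state machine. This is a sketch for illustration; the stage names come from the answer above, the function is hypothetical:

```python
STAGES = ["Identify", "Understand", "Build", "Shadow",
          "Parallel", "Canary", "Migrate", "Decommission"]

def next_stage(current: str) -> str:
    """Advance an extraction by exactly one stage; the pipeline never skips
    ahead (e.g., straight from Build to Canary)."""
    i = STAGES.index(current)  # raises ValueError for an unknown stage
    if i == len(STAGES) - 1:
        raise ValueError("already decommissioned: nothing follows the final stage")
    return STAGES[i + 1]
```

Encoding the lifecycle this way makes it easy to enforce, for instance, that no service enters Canary without having passed through Shadow and Parallel first.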
17. List the five natural seams in a COBOL/CICS system that can serve as extraction points.
Answer: (1) CICS transaction boundaries — each transaction (EXEC CICS LINK, EXEC CICS XCTL) is a seam where routing can be changed. (2) COMMAREA / channel boundaries — the data passed between CICS programs defines the service contract. (3) Copybook boundaries — shared data structure definitions; isolated copybooks indicate clean data boundaries. (4) DB2 view boundaries — if a service reads through a view, the view is a seam (you can change what's behind it). (5) MQ queue boundaries — services that communicate via MQ have explicit message contracts, and the queue consumer can be replaced.
18. Explain why distributed two-phase commit between DB2 on z/OS and PostgreSQL is impractical for the dual-write pattern, and describe the recommended alternative.
Answer: DB2 on z/OS and PostgreSQL on Linux cannot participate in the same two-phase commit without a distributed transaction coordinator (such as CICS UOW with XA), and even with one, the performance penalty of cross-platform two-phase commit would violate typical SLA requirements (e.g., 200ms response time). The round-trip latency alone — z/OS to Linux network hop, PostgreSQL prepare, PostgreSQL commit acknowledgment — would dominate the response time budget. The recommended alternative is asynchronous replication with compensating transactions: write to the primary store (e.g., PostgreSQL for the modern service), send the write instruction to the legacy system via MQ, and if the legacy processing fails, execute a compensating transaction to reverse the primary write.
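The compensating-transaction flow can be sketched as below. The `primary_db` and `legacy_queue` interfaces are hypothetical stand-ins for the modern PostgreSQL store and the MQ channel to the legacy system:

```python
def dual_write(primary_db, legacy_queue, account_id: str, amount: int):
    """Asynchronous dual-write with compensation: commit to the primary
    store first, then hand the write to the legacy system via a queue.
    If enqueueing fails, reverse the primary write rather than attempting
    a cross-platform two-phase commit."""
    primary_db.credit(account_id, amount)
    try:
        legacy_queue.send({"account": account_id, "amount": amount})
    except Exception:
        # Compensating transaction: undo the primary write, then surface
        # the failure to the caller.
        primary_db.credit(account_id, -amount)
        raise
```

The trade-off is eventual rather than atomic consistency: between the primary commit and the legacy apply there is a window where the two stores disagree, which is exactly why this pattern pairs with CDC lag monitoring.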
19. The chapter describes three implementation options for the facade layer: (A) external API gateway, (B) z/OS Connect as facade, and (C) hybrid. Which does the chapter recommend for most enterprise implementations, and why?
Answer: Option C — Hybrid (external API gateway + z/OS Connect). The external API gateway (Kong, Apigee, etc.) handles consumer-facing routing decisions, traffic splitting, and canary management using mature tooling. z/OS Connect handles the mainframe-side mediation, transforming REST requests into CICS LINK or IMS transactions using native z/OS integration. This provides a clean separation of concerns: the gateway gets sophisticated traffic management, and z/OS Connect gets native mainframe protocol handling without requiring custom code. The tradeoff — two components to manage and monitor — is acceptable for the architectural benefits.
20. What three CDC monitoring metrics does the chapter recommend, and what are the alert thresholds?
Answer: (1) Replication lag (seconds behind source) — alert if greater than 30 seconds. This measures how far behind the target database is relative to the source. (2) Apply throughput (rows per second) — alert if it drops to zero. This measures whether the CDC pipeline is actively applying changes; a zero throughput indicates the pipeline has stopped. (3) Error count — alert on any non-zero value. Any error in the CDC pipeline (schema mismatch, connection failure, data type conversion error) could result in data loss or inconsistency and must be investigated immediately. CDC pipelines fail silently, so monitoring is essential.
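The three checks translate directly into an alerting function. This is a minimal sketch using the thresholds from the answer; the function name and return shape are assumptions:

```python
def cdc_alerts(lag_seconds: float, rows_per_sec: float, error_count: int) -> list:
    """Evaluate the three CDC health metrics against their thresholds and
    return the alerts to raise. CDC pipelines fail silently, so all three
    must be checked continuously."""
    alerts = []
    if lag_seconds > 30:
        alerts.append(f"replication lag {lag_seconds}s exceeds 30s threshold")
    if rows_per_sec == 0:
        alerts.append("apply throughput is zero: pipeline may have stopped")
    if error_count > 0:
        alerts.append(f"{error_count} CDC errors: investigate immediately")
    return alerts
```

A healthy pipeline returns an empty list; Carlos's 4 AM incident, with its 47-second lag, would have tripped the first check long before a customer noticed the stale balance.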