Case Study 2: Federal Benefits Administration's IMS Extraction

Background

Federal Benefits Administration (FBA) administers benefits for 22 million Americans. The system — 15.2 million lines of COBOL, 47 IMS databases, 23 DB2 subsystems, and 8,200 batch jobs — has been running since the early 1980s. In Chapter 32, Sandra Chen (modernization lead) completed a portfolio assessment that recommended retiring 38% of the codebase, refactoring the core on z/OS, and selectively extracting services through the strangler fig pattern.

This case study focuses on the strangler fig extraction of the Beneficiary Eligibility Lookup — the highest-priority extraction candidate in FBA's modernization roadmap.

The team:

  • Sandra Chen — Modernization lead. PhD in CS. Fifteen years of government service. Fights vendor "rip and replace" proposals with data. The architect of FBA's strangler fig strategy.
  • Marcus Whitfield — Legacy SME, 18 months from retirement. A walking encyclopedia of undocumented business rules. His departure creates the urgency that drives the knowledge transfer embedded in every extraction.

The Challenge: IMS, Not Just CICS

SecureFirst's strangler fig (Case Study 1) extracted CICS/DB2 services — a well-understood pattern with mature tooling. FBA's extraction is harder because the Beneficiary Eligibility Lookup reads from IMS databases, not DB2.

IMS (Information Management System) is a hierarchical database and transaction manager that predates DB2 by fifteen years. FBA's IMS databases contain the master beneficiary records — name, SSN, benefit tier, enrollment history, eligibility status, dependency information, and payment history. These records are stored in DL/I (Data Language/Interface) segments organized in parent-child hierarchies, not relational tables.

The implications for the strangler fig:

  1. No SQL. IMS databases are navigated with DL/I calls (GU, GN, GNP, ISRT, DLET, REPL), not SQL. You can't point a CDC tool at an IMS database the way you can at DB2.
  2. Hierarchical, not relational. Beneficiary records are organized in segments: a root segment (BENEFICIARY) with child segments (ENROLLMENT, DEPENDENTS, PAYMENTS, HISTORY). Mapping this to a relational schema requires flattening the hierarchy — a non-trivial data modeling exercise.
  3. IMS Connect, not CICS web services. IMS transactions are accessed via IMS Connect (TCP/IP) or through CICS-IMS cross-region calls, not through CICS web services pipelines.
  4. Limited CDC options. IBM's IMS Solution Packs include IMS CDC capabilities, but the tooling is less mature and less widely deployed than DB2 CDC. Many organizations resort to periodic batch extracts instead of real-time CDC.
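Implication 2 — flattening the hierarchy — is worth making concrete. A minimal sketch (in Python, with invented segment and field names; FBA's actual layouts differ) of turning one hierarchical beneficiary record into per-table relational rows:

```python
# Hedged sketch: flattening an IMS-style hierarchical record (a root
# segment with child-segment lists) into rows for relational tables.
# Segment and field names here are illustrative, not FBA's real layouts.

def flatten_beneficiary(record):
    """Turn one root-plus-children record into per-table row lists."""
    root = {"ssn": record["ssn"], "name": record["name"]}
    # Each child segment becomes a row carrying the root's key (SSN)
    # as a foreign key back to the beneficiary table.
    enrollments = [
        {"ssn": record["ssn"], **seg} for seg in record.get("ENROLLMENT", [])
    ]
    dependents = [
        {"ssn": record["ssn"], **seg} for seg in record.get("DEPENDENTS", [])
    ]
    return {"beneficiary": [root],
            "enrollment": enrollments,
            "dependent": dependents}

rows = flatten_beneficiary({
    "ssn": "123-45-6789", "name": "DOE, JANE",
    "ENROLLMENT": [{"enroll_date": "1986-03-01"}],
    "DEPENDENTS": [{"birth_date": "2001-09-14"}],
})
```

The non-trivial part in practice is not the mechanics but the modeling decisions: which segment keys become foreign keys, and how twin segments (repeating children) map to rows.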

Marcus looked at Sandra's strangler fig plan and said: "Everything in that plan assumes the data lives in DB2. Our data lives in IMS. The vine's going to have a harder time growing on this tree."

The Eligibility Lookup Service

What it does: Given a beneficiary's SSN and a date, the Eligibility Lookup determines whether the beneficiary is eligible for benefits as of that date, what tier of benefits they qualify for, and what their enrollment status is. It's the most frequently called service in FBA's system — 180,000 inquiries per day from call center agents, self-service portals, and partner agencies.

The IMS transaction: ELIG001, running in an IMS MPP (Message Processing Program) region. The program (ELIGCHK, 4,800 lines of COBOL) issues DL/I calls to navigate the BENEFICIARY segment hierarchy:

GU  BENEFICIARY (SSA: SSN = input)
    GN  ENROLLMENT (SSA: ENROLL-DATE <= input-date)
    GNP DEPENDENTS (unqualified — get all)
    GN  HISTORY    (SSA: EFF-DATE <= input-date,
                        EXP-DATE >= input-date)

The program then applies 23 eligibility rules — some encoded in the COBOL logic, some derived from reference tables in a DB2 subsystem, and at least four that Marcus says "aren't written down anywhere" and live only in his understanding of how the benefits law has been interpreted over forty years.

Those four undocumented rules are the ticking clock. When Marcus retires, FBA loses the knowledge of why certain eligibility edge cases are handled the way they are. The strangler fig extraction is not just a modernization exercise — it's a knowledge capture exercise.

Sandra's Strangler Fig Plan for IMS Extraction

Phase 0: Knowledge Capture (Months 1-3)

Before writing any code, Sandra assigned two junior analysts to shadow Marcus for three months. Their job: document every eligibility rule, every edge case, every "that's just how we've always done it" decision in ELIGCHK. The output: a 147-page Eligibility Rules Specification that captured all 23 documented rules and, after extensive sessions with Marcus, the 4 undocumented rules.

The undocumented rules:

  1. The 1987 Exception. Beneficiaries enrolled before a specific date in 1987 are exempt from a waiting period that applies to all subsequent enrollees. The exemption is determined by checking whether the ENROLLMENT segment's ENROLL-DATE is before 1987-07-01. This rule is not in any regulation — it was a one-time congressional directive that was never codified in the benefits manual. Marcus knows about it because he implemented it.

  2. The Dependent Aging Rule. When a dependent turns 26, they age out of eligibility. But the aging is calculated from the dependent's birth date in the DEPENDENTS segment, not from the current date — it's calculated from the inquiry date parameter. This matters because partner agencies sometimes query historical eligibility (e.g., "Was this person eligible on March 15, 2021?"). The aging calculation must use the inquiry date, not the current date.

  3. The Dual Enrollment Override. If a beneficiary has two active ENROLLMENT segments (which shouldn't happen but does for approximately 340 beneficiaries due to a data entry error in 2003), the program uses the enrollment with the later effective date. This isn't documented because it's considered a data quality issue, not a business rule — but the COBOL program handles it, and any replacement must handle it identically.

  4. The Retroactive Eligibility Window. Beneficiaries who lose eligibility and re-enroll within 60 days retain their original enrollment date for benefits calculation purposes. This is loosely derived from a regulation, but the 60-day window is not in the code — ELIGCHK uses 63 days (Marcus's interpretation of "60 calendar days including the termination date and the re-enrollment date, with a 1-day grace period for system processing delays"). Any replacement must use 63 days, not 60, to match the legacy system's behavior.
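Two of these captured rules lend themselves to a short sketch. The following Python (illustrative only — the actual service is Java, and field names are invented) encodes the 1987 Exception and the Dependent Aging Rule, with the aging calculated from the inquiry date rather than the current date:

```python
# Hedged sketch of two captured rules, using the facts stated in the
# case study; all names and structures here are illustrative.
from datetime import date

CUTOFF_1987 = date(1987, 7, 1)   # the 1987 Exception cutoff date
AGE_OUT = 26                      # Dependent Aging Rule threshold

def exempt_from_waiting_period(enroll_date: date) -> bool:
    """1987 Exception: pre-cutoff enrollees skip the waiting period."""
    return enroll_date < CUTOFF_1987

def dependent_age(birth_date: date, inquiry_date: date) -> int:
    """Age as of the *inquiry* date, not today, so that historical
    queries ("Was this person eligible on March 15, 2021?") work."""
    years = inquiry_date.year - birth_date.year
    # Subtract one if the birthday hasn't yet occurred in the inquiry year.
    if (inquiry_date.month, inquiry_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years

def dependent_eligible(birth_date: date, inquiry_date: date) -> bool:
    return dependent_age(birth_date, inquiry_date) < AGE_OUT
```

Passing `date.today()` instead of the inquiry date is exactly the bug the Dependent Aging Rule warns against: it silently breaks every historical eligibility query.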

💡 Key Insight: The strangler fig pattern's knowledge capture phase is often its most valuable output. Even if the extraction fails, the documented rules survive. Sandra told Marcus: "If this entire project gets canceled tomorrow, the 147-page spec we wrote with you is worth more than everything else we've done. It captures forty years of institutional knowledge that was about to walk out the door with your retirement."

Phase 1: IMS-to-API Bridge (Months 3-5)

Unlike CICS, IMS doesn't have a built-in web service pipeline. Sandra's team needed to build a bridge between IMS and the REST world. The options:

Option A: IMS Connect + z/OS Connect EE. IMS Connect provides TCP/IP access to IMS transactions. z/OS Connect EE can mediate IMS Connect calls, transforming REST requests into IMS input messages and IMS output messages into REST responses. This is IBM's recommended approach and has the best integration with IMS security (RACF) and monitoring (IMS Performance Analyzer).

Option B: CICS-to-IMS Bridge. Deploy a CICS program that receives a web service request, issues an EXEC DLI call (CICS-managed DL/I) to access the IMS database, and returns the result. This approach leverages CICS web services (Chapter 14) for the API layer and uses CICS's built-in IMS database access capability.

Option C: Batch Extract + API on Extracted Data. Periodically extract the eligibility-relevant IMS segments to a flat file or DB2 table, and build the modern service against the extracted data. This avoids real-time IMS integration but introduces data staleness.

Sandra chose Option A (IMS Connect + z/OS Connect EE) for the facade layer and Option C as a fallback for the data layer. The reasoning:

  • Option A provides real-time access to IMS data through the facade — essential for the shadow mode and parallel running phases where the comparison engine needs both legacy and modern responses for the same real-time request.
  • Option C provides the data feed for the modern service's own data store. Because eligibility data changes relatively slowly (enrollments and status changes happen daily, not per-second like account balances), a batch extract every 15 minutes provides acceptable freshness for a read-only eligibility lookup.

Phase 2: The Modern Service (Months 5-8)

The modern eligibility lookup service was built in Java on Red Hat OpenShift:

  • Data store: PostgreSQL with a relational schema that flattens the IMS hierarchy:
      • beneficiary table (root segment)
      • enrollment table (child, FK to beneficiary)
      • dependent table (child, FK to beneficiary)
      • eligibility_history table (child, FK to beneficiary)
  • Data feed: A batch extract job (ELIGEXT, new COBOL program) runs every 15 minutes, reading IMS segments and writing to a staging dataset. A Python script loads the staging data into PostgreSQL. Total pipeline latency: 15-20 minutes.
  • Business logic: All 27 eligibility rules (23 documented + 4 previously undocumented) implemented in Java, with each rule as a separate class implementing an EligibilityRule interface. This design means rules can be independently tested, modified, and audited — a significant improvement over the monolithic COBOL implementation.
  • Rule traceability: Each rule has a unique identifier (RULE-001 through RULE-027) that appears in the service's audit log and maps to the Eligibility Rules Specification. Auditors can trace any eligibility decision back to the specific rule that produced it.
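The rule-per-class design with audit traceability can be sketched as follows. This is a Python analogue of the Java design described above (the interface name `EligibilityRule` is from the text; the concrete rule and its `RULE-024` identifier are hypothetical):

```python
# Hedged sketch of the rule-per-class design: each rule is a separate,
# independently testable class carrying the ID that appears in the
# audit log and maps back to the Eligibility Rules Specification.
from abc import ABC, abstractmethod

class EligibilityRule(ABC):
    rule_id: str  # e.g. "RULE-001", mapped to a spec page

    @abstractmethod
    def evaluate(self, context: dict) -> bool: ...

class WaitingPeriodExemption(EligibilityRule):
    rule_id = "RULE-024"  # hypothetical ID for the 1987 Exception

    def evaluate(self, context: dict) -> bool:
        return context["enroll_year"] < 1987 or (
            context["enroll_year"] == 1987 and context["enroll_month"] < 7)

def run_rules(rules, context, audit_log):
    """Evaluate every rule, logging (rule_id, outcome) for auditors."""
    results = {}
    for rule in rules:
        outcome = rule.evaluate(context)
        audit_log.append((rule.rule_id, outcome))  # decision traceability
        results[rule.rule_id] = outcome
    return results
```

Because each rule is its own class, a change to one rule touches one file and one test suite — the source of the velocity improvement noted in the Outcomes table.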

Phase 3: Shadow Mode (Months 8-10)

The facade — Kong as the routing layer, in front of z/OS Connect EE (which exposes the IMS ELIG001 transaction) and the Java service — was configured to route all production traffic to IMS (legacy) while shadowing every request to the Java service. The comparison engine compared eligibility determinations field by field.
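The field-by-field comparison can be sketched simply. This Python sketch (field names are invented) shows the core of such a comparison engine: per-request field matches, aggregated into the per-field match rates reported in the table below:

```python
# Hedged sketch of a shadow-mode comparison engine: compare each
# shadowed response field by field, then aggregate match rates.
def compare(legacy: dict, modern: dict, fields):
    """Per-field match result for one shadowed request."""
    return {f: legacy.get(f) == modern.get(f) for f in fields}

def match_rates(pairs, fields):
    """Fraction of requests where each field matched, across many pairs."""
    totals = {f: 0 for f in fields}
    for legacy, modern in pairs:
        for f, ok in compare(legacy, modern, fields).items():
            totals[f] += ok  # bool adds as 0/1
    n = len(pairs)
    return {f: totals[f] / n for f in fields}
```

In production such an engine also records the mismatching request payloads, since the mismatch *examples* (not the rates) are what led the team to the staleness, dual-enrollment, and 63-day findings below.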

Results after eight weeks of shadow mode:

Week   Requests    Eligibility Match   Tier Match   Enrollment Match
1-2    2,520,000   97.3%               99.1%        99.8%
3-4    2,490,000   99.92%              99.98%       99.99%
5-6    2,580,000   99.997%             99.999%      100%
7-8    2,510,000   99.999%             100%         100%

Week 1-2 issues:

The 2.7% eligibility mismatch was caused by the 15-minute data staleness. When an enrollment status changed in IMS, the legacy service reflected it immediately, but the Java service wouldn't see the change until the next batch extract. Fix: reduced the batch extract interval to 5 minutes for the shadow period. Residual staleness (0.08%) was confined to the 5-minute window after a status change — acceptable for the production service, where a 5-minute delay in reflecting an enrollment change is within the SLA.

The 0.9% tier mismatch was caused by a subtle difference in how the Java service and the COBOL program handled the Dual Enrollment Override (Rule 25). The COBOL program compared enrollment effective dates using a packed decimal comparison (COMP-3), while the Java service compared them as LocalDate objects. For enrollments on the same day (the 340 dual-enrollment cases), the COBOL program used the enrollment with the higher sequence number in the IMS database (a side effect of DL/I navigation order), while the Java service used the one inserted most recently into PostgreSQL (a side effect of batch load order). Fix: added a secondary sort on IMS sequence number to the Java service's dual-enrollment resolution logic.
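The corrected resolution logic amounts to a two-key sort. A minimal Python sketch (the real fix was in the Java service; field names are illustrative):

```python
# Hedged sketch of the fixed dual-enrollment resolution: latest
# effective date wins, with the IMS segment sequence number as the
# tie-breaker so the replacement matches the COBOL program's
# DL/I navigation order for same-day enrollments.
from datetime import date

def resolve_dual_enrollment(enrollments):
    """Pick one enrollment: max effective date, then max IMS seq number."""
    return max(enrollments, key=lambda e: (e["effective_date"], e["ims_seq"]))

picked = resolve_dual_enrollment([
    {"effective_date": date(2003, 5, 1), "ims_seq": 1, "tier": "A"},
    {"effective_date": date(2003, 5, 1), "ims_seq": 2, "tier": "B"},  # same day
])
# The same-day tie resolves to the higher sequence number (tier "B").
```

Note what the fix preserves: not a documented rule, but an accidental side effect of DL/I navigation order — exactly the kind of behavior parallel running exists to surface.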

Week 5-6: The 0.003% residual mismatch (approximately 8 requests per week) was caused by the Retroactive Eligibility Window. The COBOL program used 63 days. The initial Java implementation used 60 days. Sandra went back to Marcus, who explained the history. The Java implementation was corrected to 63 days.
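The corrected window check is a one-line date computation. A sketch (Python, illustrative; the constant's derivation is Marcus's, as described in the Retroactive Eligibility Window rule above):

```python
# Hedged sketch of the Retroactive Eligibility Window as captured from
# Marcus: 63 days, not the regulation's 60, to match ELIGCHK's behavior
# (60 calendar days including both endpoints, plus a 1-day grace period).
from datetime import date

RETRO_WINDOW_DAYS = 63

def retains_original_enrollment(term_date: date, reenroll_date: date) -> bool:
    """Re-enrollment within the window keeps the original enrollment date."""
    return (reenroll_date - term_date).days <= RETRO_WINDOW_DAYS
```

Encoding the constant by name, with its provenance in a comment, is what keeps the 63-versus-60 question answerable later (see Discussion Question 5).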

Phase 4: Canary Deployment (Months 10-13)

Ring     Users                                            Duration   Result
Ring 0   FBA internal staff (call center test accounts)   3 weeks    Zero discrepancies
Ring 1   5% of call center queries (routed by agent ID)   2 weeks    Zero discrepancies
Ring 2   25% of all queries                               3 weeks    1 discrepancy (timing, during batch extract window)
Ring 3   50% of all queries                               3 weeks    Zero discrepancies
Ring 4   100% of all queries                              Ongoing    IMS on standby

The canary duration was longer than SecureFirst's (13 weeks vs. 8 weeks) because FBA is a government agency with stricter audit requirements. Each ring transition required sign-off from Sandra, Marcus, the compliance officer, and the deputy director. The compliance officer required a written attestation that the modern service produced identical results for a random sample of 10,000 eligibility determinations.

Phase 5: Steady State

The Java eligibility service now handles 100% of production eligibility lookups. The IMS ELIG001 transaction remains deployed and available for rollback. The batch extract from IMS continues (it feeds not only the eligibility service but also several reporting systems).

The Marcus Factor

Three months after the eligibility service reached 100% modern traffic, Marcus Whitfield retired.

At his farewell lunch, Sandra gave a brief speech. She mentioned the 147-page Eligibility Rules Specification. She mentioned the four undocumented rules that are now documented, tested, and traceable. She mentioned that Marcus's knowledge — accumulated over 38 years — is now encoded in 27 Java rule classes, each with its own unit test suite, each mapped to a page in the specification.

Marcus's contribution to the strangler fig wasn't code. It was knowledge. The extraction pipeline's "Understand" phase — the second step in the lifecycle — was really "Talk to Marcus until he's told us everything." That phase took three months and produced the project's most valuable artifact.

💡 Key Insight: In organizations with legacy COBOL systems, the strangler fig pattern is often justified as a modernization exercise. But its most important function may be as a knowledge transfer exercise. The structured process of extracting a service — mapping every rule, documenting every edge case, proving equivalence through parallel running — captures institutional knowledge that would otherwise be lost when the people who built the system retire. The modern service is the output. The documented rules are the real deliverable.

The Remaining IMS Challenge

FBA's eligibility lookup was a read-only IMS extraction — the hardest part was the data modeling (hierarchical to relational) and the knowledge capture. But FBA's IMS databases also contain write-side services: enrollment updates, benefit tier changes, and payment processing.

Sandra's assessment: write-side IMS extraction is a Phase 3 initiative (12-24 months out), and it may never happen. The IMS write services are deeply integrated — an enrollment update triggers cascading changes across BENEFICIARY, ENROLLMENT, DEPENDENTS, and HISTORY segments in a single IMS UOW. Extracting this to a microservice would require:

  1. Implementing IMS-equivalent hierarchical consistency guarantees in a relational database
  2. Handling the 47 IMS database triggers (exit routines) that fire on segment changes
  3. Replicating the IMS HALDB (High Availability Large Database) partition structure for concurrent access
  4. Maintaining the IMS log-based recovery model that the disaster recovery plan depends on

Sandra's honest assessment: "The read side was hard. The write side might be impossible without rebuilding IMS in Java, and if we're going to rebuild IMS in Java, we should probably just keep using IMS."

Outcomes

Metric                                   Before                                       After
Eligibility lookup response time (p95)   340ms (IMS Connect)                          45ms (Java service)
Rule traceability                        23 rules in COBOL; 4 undocumented            27 rules, each with ID, test suite, and spec page reference
Development velocity for rule changes    6-8 weeks (COBOL change + regression test)   1-2 weeks (Java rule class + automated test)
Knowledge dependency on Marcus           Critical (single point of failure)           Eliminated (knowledge in specification + code + tests)
IMS MIPS consumption                     Baseline                                     12% reduction (eligibility queries no longer hit the IMS MPP region)
Audit finding risk                       High (undocumented rules)                    Low (full traceability)

Discussion Questions

  1. Sandra chose a 15-minute batch extract instead of real-time CDC from IMS. Under what circumstances would real-time IMS CDC be justified? What would the architecture look like?

  2. The four undocumented rules required extensive sessions with Marcus to capture. If Marcus had already retired before the strangler fig project began, how would the team have discovered these rules? What tools and techniques are available for extracting business rules from COBOL source code?

  3. The Dual Enrollment Override (Rule 25) is caused by a data quality issue — 340 beneficiaries with duplicate enrollment records from a 2003 data entry error. Should the modern service fix the data or replicate the workaround? What are the implications of each choice?

  4. Sandra describes the write-side IMS extraction as "might be impossible." Is this a failure of the strangler fig pattern, or is it the pattern working correctly (by identifying what should and shouldn't be extracted)?

  5. The 63-day Retroactive Eligibility Window (instead of the regulation's 60-day window) is Marcus's interpretation. Now that it's documented and the modern service replicates it, is it "correct"? Who has the authority to change it to 60 days? What process should be followed?

  6. Compare the IMS strangler fig (this case study) with the CICS/DB2 strangler fig (Case Study 1). What are the three biggest differences in complexity? Which pattern decisions from Case Study 1 apply directly to IMS extraction, and which need modification?