Case Study 1: Federal Benefits' COMMAREA-to-Channels Migration

Background

Federal Benefits Administration processes benefit eligibility determinations for 47 million beneficiaries across 12 program types. Their CICS application portfolio — 340 programs, built between 1988 and 2014 — runs on three CICS regions: a TOR handling 3270 terminal traffic, a primary AOR processing benefit calculations, and a secondary AOR running compliance and audit functions.

Sandra Chen, the lead systems programmer, has been managing this portfolio for 18 years. Marcus Washington, a senior application developer with 12 years on the team, handles most of the new feature work. Together, they've kept the system running at 99.97% availability — but they know the architecture is reaching its limits.

The Problem

In January, the legislative team delivered a new mandate: the Comprehensive Benefits Modernization Act requires that every eligibility determination include a detailed audit trail documenting every data source consulted, every rule applied, and every intermediate calculation — all in a single, traceable transaction. The estimated data volume for a complete audit trail: 85KB to 200KB per determination, depending on the program type.

The existing eligibility determination flow uses a 31,500-byte COMMAREA, close to the 32,763-byte CICS maximum, with the following structure:

01  DFHCOMMAREA.
    05  CA-HEADER.
        10  CA-TRANS-TYPE          PIC X(4).
        10  CA-BENEFICIARY-ID      PIC X(11).
        10  CA-PROGRAM-CODE        PIC X(4).
        10  CA-REQUEST-DATE        PIC X(10).
        10  CA-RESPONSE-CODE       PIC X(4).
    05  CA-BENEFICIARY-DATA.
        10  CA-BEN-NAME            PIC X(60).
        10  CA-BEN-DOB             PIC X(10).
        10  CA-BEN-SSN             PIC X(9).
        10  CA-BEN-ADDR            PIC X(200).
        10  CA-BEN-INCOME-DATA     PIC X(500).
    05  CA-ELIGIBILITY-FACTORS.
        10  CA-FACTOR-COUNT        PIC S9(4) COMP.
        10  CA-FACTOR-ENTRY OCCURS 40 TIMES.
            15  CA-FACTOR-CODE     PIC X(6).
            15  CA-FACTOR-VALUE    PIC S9(9)V99 COMP-3.
            15  CA-FACTOR-SOURCE   PIC X(4).
            15  CA-FACTOR-DATE     PIC X(10).
    05  CA-DETERMINATION-RESULT.
        10  CA-ELIG-STATUS         PIC X(2).
        10  CA-BENEFIT-AMT         PIC S9(9)V99 COMP-3.
        10  CA-EFF-DATE            PIC X(10).
        10  CA-REVIEW-DATE         PIC X(10).
    05  CA-OVERFLOW-QUEUE          PIC X(16).
    05  FILLER                     PIC X(136).

The CA-OVERFLOW-QUEUE field is telling. It's been there since 2009, when the team added TS queue overflow to handle cases where the eligibility factors exceeded the 40-entry limit. About 15% of determinations use the overflow queue.

The audit trail requirement makes the overflow approach untenable. Sandra's back-of-envelope calculation: an audit trail with 200 rule applications, each logging the rule ID, input values, output values, and timestamp, would require approximately 120KB of structured data. Plus the existing 31.5KB. Plus the new compliance metadata the Act requires.

"We can't duct-tape this one," Sandra told Marcus. "It's time to do channels properly."

Analysis Phase

Marcus spent two weeks analyzing the existing flow. The eligibility determination involves seven programs:

Program   Function                              COMMAREA Usage
--------  ------------------------------------  -------------------------------------------------------
ELIG0000  Terminal handler / web service entry  Creates COMMAREA, reads response
ELIG0010  Input validation                      Reads/writes header, reads beneficiary data
ELIG0020  Data enrichment (remote, AOR2)        Reads beneficiary data, writes income data via TS queue
ELIG0030  Eligibility calculation               Reads all, writes factors and result
ELIG0040  Compliance check (remote, AOR2)       Reads factors and result, writes compliance status
ELIG0050  Audit logging                         Reads overflow queue, writes audit record
ELIG0060  Response formatting                   Reads result, formats for terminal/web service

Key findings from Marcus's analysis:

  1. ELIG0010 and ELIG0060 use less than 2KB of the COMMAREA. They're including the full 31.5KB copybook because that's how COMMAREA works — everyone includes everything.

  2. ELIG0020 (data enrichment) runs in AOR2 via DPL. It uses a separate 8KB TS queue to return income data that doesn't fit in the COMMAREA. Both regions use CCSID 037, so code page conversion isn't an issue today — but the agency plans to consolidate with a Canadian office running CCSID 500 next year.

  3. ELIG0050 (audit logging) reads the TS overflow queue. If the transaction abends between the queue write and the audit read, the audit data is lost. This has happened 23 times in the past year, each requiring manual reconstruction.

  4. The TS queue cleanup is fragile. A DELETEQ TS at the end of ELIG0060 is supposed to clean up the overflow queue. If ELIG0060 doesn't execute (abend in ELIG0050, for example), the queue persists until the next auxiliary TS cleanup cycle — which runs daily at 2 AM. This has caused queue name collisions when the same beneficiary is processed twice in one day.
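The fragility in findings 3 and 4 is easier to see in code. The fragment below is a minimal sketch of the legacy overflow pattern, not the agency's actual source. The queue-name scheme (a fixed prefix plus the beneficiary ID), the WS- fields, and the paragraph names are assumptions made for illustration, and the COMMAREA copybook shown earlier is assumed to be in the LINKAGE SECTION.

```cobol
       WORKING-STORAGE SECTION.
      * Assumed naming scheme: 5-byte prefix + 11-byte beneficiary
      * ID = 16-byte TS queue name (matches CA-OVERFLOW-QUEUE).
       01  WS-QUEUE-NAME.
           05  FILLER             PIC X(5)  VALUE 'ELIGQ'.
           05  WS-Q-BENEFICIARY   PIC X(11).
       01  WS-OVERFLOW-DATA       PIC X(8000).
       01  WS-OVERFLOW-LEN        PIC S9(4) COMP VALUE +8000.
       01  WS-RESP                PIC S9(8) COMP.

       PROCEDURE DIVISION.
      * Producer side: write the overflow data to the queue.
      * If an earlier run left a queue with the same name behind,
      * WRITEQ TS appends a new item to it instead of failing.
       WRITE-OVERFLOW.
           MOVE CA-BENEFICIARY-ID TO WS-Q-BENEFICIARY
           EXEC CICS WRITEQ TS
                QNAME(WS-QUEUE-NAME)
                FROM(WS-OVERFLOW-DATA)
                LENGTH(WS-OVERFLOW-LEN)
                RESP(WS-RESP)
           END-EXEC
           MOVE WS-QUEUE-NAME TO CA-OVERFLOW-QUEUE.

      * Cleanup side (ELIG0060): runs only if the chain gets here.
       DELETE-OVERFLOW.
           EXEC CICS DELETEQ TS
                QNAME(CA-OVERFLOW-QUEUE)
                RESP(WS-RESP)
           END-EXEC
           IF WS-RESP NOT = DFHRESP(NORMAL)
              AND WS-RESP NOT = DFHRESP(QIDERR)
      *        LOG-CLEANUP-FAILURE is a hypothetical paragraph.
               PERFORM LOG-CLEANUP-FAILURE
           END-IF.
```

Under this assumed naming scheme, a second determination for the same beneficiary would build the same queue name, and its WRITEQ TS would land in whatever the earlier run left behind. That is one plausible reading of the collisions described in finding 4.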

Design Decision

Sandra and Marcus designed a channel architecture based on the patterns in Chapter 15:

Channel: ELIG-CHAN

Container         Type  Max Size                Created By          Read By
----------------  ----  ----------------------  ------------------  ------------------------------
ELIG-META         BIT   200 bytes               ELIG0000            All programs
ELIG-REQUEST      BIT   400 bytes               ELIG0000            ELIG0010, ELIG0030
ELIG-BENEFICIARY  BIT   800 bytes               ELIG0000            ELIG0010, ELIG0020, ELIG0030
ELIG-INCOME       BIT   8,000 bytes             ELIG0020            ELIG0030, ELIG0040
ELIG-FACTORS      BIT   Variable (up to 50KB)   ELIG0030            ELIG0040, ELIG0050
ELIG-RESULT       BIT   500 bytes               ELIG0030            ELIG0040, ELIG0050, ELIG0060
ELIG-COMPLIANCE   BIT   2,000 bytes             ELIG0040            ELIG0050, ELIG0060
ELIG-AUDIT        BIT   Variable (up to 200KB)  ELIG0030, ELIG0040  ELIG0050
ELIG-ERROR        BIT   350 bytes               Any program         All subsequent programs

Design rationale:

  • BIT for all containers. Several structures contain binary or packed-decimal fields (COMP counts, COMP-3 amounts) that code page conversion would corrupt, and Sandra chose to apply BIT uniformly, including to the text-only containers. Even though the Canadian office consolidation will bring CCSID 500 into play, BIT containers ensure binary field integrity. Text fields within BIT containers will need explicit conversion at the application level for the Canadian office. Sandra noted this as a Phase 2 consideration, where selected containers might be split into CHAR text containers and BIT numeric containers.

  • ELIG-AUDIT is the big win. The audit trail container starts empty. ELIG0030 PUTs its rule application log into the container. ELIG0040 GETs the existing audit data, appends its compliance check log, and PUTs the updated container. ELIG0050 GETs the final audit trail and writes it to the audit database. No TS queue. No overflow. No cleanup. No data loss on abend (the channel persists for the duration of the LINK chain).

  • ELIG-FACTORS uses variable length. The 40-entry limit in the old COMMAREA is gone. A determination can now log as many factors as the rules engine produces. The FLENGTH on GET tells the consuming program exactly how much data is present.

  • ELIG-ERROR follows the standard pattern. Any program that encounters an error populates the error container. Subsequent programs check for the error container on entry and short-circuit if an error has already occurred.
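The ELIG-ERROR short-circuit described above can be sketched as an entry check that probes for the container without copying its data. This is a minimal illustration under stated assumptions: the WS- field names and the two PERFORMed paragraphs are hypothetical, and the case study does not show how the team actually coded the check.

```cobol
       WORKING-STORAGE SECTION.
      * Names are held in 16-byte fields so they are blank-padded.
       01  WS-CHANNEL             PIC X(16) VALUE 'ELIG-CHAN'.
       01  WS-ERROR-CTR           PIC X(16) VALUE 'ELIG-ERROR'.
       01  WS-ERROR-LEN           PIC S9(8) COMP.
       01  WS-RESP                PIC S9(8) COMP.

       PROCEDURE DIVISION.
       CHECK-FOR-PRIOR-ERROR.
      *    NODATA returns only the length, so this is a cheap
      *    existence test for the error container.
           EXEC CICS GET CONTAINER(WS-ERROR-CTR)
                CHANNEL(WS-CHANNEL)
                NODATA
                FLENGTH(WS-ERROR-LEN)
                RESP(WS-RESP)
           END-EXEC
           EVALUATE WS-RESP
               WHEN DFHRESP(NORMAL)
      *            An earlier program reported an error: do no
      *            work and hand control back up the LINK chain.
                   EXEC CICS RETURN END-EXEC
               WHEN DFHRESP(CONTAINERERR)
      *            No error container yet: carry on as normal.
                   PERFORM DO-NORMAL-PROCESSING
               WHEN OTHER
                   PERFORM HANDLE-UNEXPECTED-RESP
           END-EVALUATE.
```

The same GET with INTO instead of NODATA is how a consumer of ELIG-FACTORS would learn the actual data length: FLENGTH is set to the buffer size on input and holds the number of bytes present on return.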

Migration Execution

Phase 1: Wrapper (Weeks 1-3)

Marcus wrote a wrapper program, ELIGWRAP, that:

  1. Creates the ELIG-CHAN channel from a COMMAREA passed by legacy callers

  2. Maps the COMMAREA sections into the appropriate containers

  3. LINKs to the new channel-based ELIG0000

  4. On return, maps the response containers back to the COMMAREA

  5. Handles the case where legacy callers still pass the overflow queue name

This allowed the old 3270 terminal interface to continue working while the programs were migrated one at a time.
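A wrapper of this kind might look roughly like the sketch below. It is not Marcus's ELIGWRAP. The mapping of CA-HEADER to ELIG-REQUEST, CA-BENEFICIARY-DATA to ELIG-BENEFICIARY, and ELIG-RESULT back to CA-DETERMINATION-RESULT is an assumption, as are the WS- names and the abend code. The COMMAREA is a stand-in whose group sizes follow from the fields listed in the copybook excerpt. Overflow-queue handling is omitted.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ELIGWRAP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CHANNEL             PIC X(16) VALUE 'ELIG-CHAN'.
       01  WS-REQUEST-CTR         PIC X(16) VALUE 'ELIG-REQUEST'.
       01  WS-BENEF-CTR           PIC X(16) VALUE 'ELIG-BENEFICIARY'.
       01  WS-RESULT-CTR          PIC X(16) VALUE 'ELIG-RESULT'.
       01  WS-LEN                 PIC S9(8) COMP.
       01  WS-RESP                PIC S9(8) COMP.
       LINKAGE SECTION.
      * Stand-in for the legacy copybook (assumed group sizes).
       01  DFHCOMMAREA.
           05  CA-HEADER               PIC X(33).
           05  CA-BENEFICIARY-DATA     PIC X(779).
           05  CA-ELIGIBILITY-FACTORS  PIC X(1042).
           05  CA-DETERMINATION-RESULT PIC X(28).
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *    The first PUT naming the channel creates it.
           MOVE LENGTH OF CA-HEADER TO WS-LEN
           EXEC CICS PUT CONTAINER(WS-REQUEST-CTR)
                CHANNEL(WS-CHANNEL)
                FROM(CA-HEADER) FLENGTH(WS-LEN)
                BIT RESP(WS-RESP)
           END-EXEC
           PERFORM CHECK-RESP
           MOVE LENGTH OF CA-BENEFICIARY-DATA TO WS-LEN
           EXEC CICS PUT CONTAINER(WS-BENEF-CTR)
                CHANNEL(WS-CHANNEL)
                FROM(CA-BENEFICIARY-DATA) FLENGTH(WS-LEN)
                BIT RESP(WS-RESP)
           END-EXEC
           PERFORM CHECK-RESP
      *    Hand the channel to the channel-based entry program.
           EXEC CICS LINK PROGRAM('ELIG0000')
                CHANNEL(WS-CHANNEL)
                RESP(WS-RESP)
           END-EXEC
           PERFORM CHECK-RESP
      *    Copy the result back for the legacy caller. LENGERR
      *    here means the container is longer than the old field.
           MOVE LENGTH OF CA-DETERMINATION-RESULT TO WS-LEN
           EXEC CICS GET CONTAINER(WS-RESULT-CTR)
                CHANNEL(WS-CHANNEL)
                INTO(CA-DETERMINATION-RESULT) FLENGTH(WS-LEN)
                RESP(WS-RESP)
           END-EXEC
           PERFORM CHECK-RESP
           EXEC CICS RETURN END-EXEC.
       CHECK-RESP.
           IF WS-RESP NOT = DFHRESP(NORMAL)
      *        'EWRP' is a hypothetical abend code.
               EXEC CICS ABEND ABCODE('EWRP') END-EXEC
           END-IF.
```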

Phase 2: Core Programs (Weeks 4-8)

Marcus migrated programs in dependency order:

  • Week 4, ELIG0010 (validation): simplest, reads two containers, writes nothing new

  • Week 5, ELIG0060 (formatting): reads one container, no write

  • Week 6, ELIG0030 (calculation): the core engine, reads three containers, writes three

  • Week 7, ELIG0020 (data enrichment, remote): DPL with channel, eliminated TS queue

  • Week 8, ELIG0040 (compliance, remote) and ELIG0050 (audit): completed the chain
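The Week 7 change, where the channel replaces the TS queue on the DPL call, could look something like the fragment below. It is a sketch under assumptions: the SYSID value 'AOR2' simply reuses the region label from the case study, the WS- names are hypothetical, and in many installations the remote routing would come from the PROGRAM resource definition rather than from a SYSID coded on the LINK.

```cobol
       WORKING-STORAGE SECTION.
       01  WS-CHANNEL             PIC X(16) VALUE 'ELIG-CHAN'.
       01  WS-INCOME-CTR          PIC X(16) VALUE 'ELIG-INCOME'.
       01  WS-INCOME-DATA         PIC X(8000).
       01  WS-INCOME-LEN          PIC S9(8) COMP.
       01  WS-RESP                PIC S9(8) COMP.

       PROCEDURE DIVISION.
       ENRICH-REMOTE.
      *    The channel and its containers travel with the DPL
      *    request; containers created remotely come back with it.
           EXEC CICS LINK PROGRAM('ELIG0020')
                CHANNEL(WS-CHANNEL)
                SYSID('AOR2')
                RESP(WS-RESP)
           END-EXEC
           IF WS-RESP = DFHRESP(NORMAL)
      *        FLENGTH: buffer size going in, actual length out.
               MOVE LENGTH OF WS-INCOME-DATA TO WS-INCOME-LEN
               EXEC CICS GET CONTAINER(WS-INCOME-CTR)
                    CHANNEL(WS-CHANNEL)
                    INTO(WS-INCOME-DATA)
                    FLENGTH(WS-INCOME-LEN)
                    RESP(WS-RESP)
               END-EXEC
           END-IF.
```

With the income data returned in ELIG-INCOME, the caller has no queue to name, read, or delete afterwards.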

Each program was implemented as a dual-interface program during migration. The COMMAREA path remained active as a fallback. Marcus used EXEC CICS ASSIGN CHANNEL to detect the invocation method.
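The detection step can be sketched as follows. ASSIGN CHANNEL returns the name of the current channel, or spaces when the program was not invoked with one. The program name ELIGDUAL, the paragraph names, and the order of the checks are illustrative assumptions rather than the team's actual code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ELIGDUAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CHANNEL-NAME        PIC X(16).
       LINKAGE SECTION.
       01  DFHCOMMAREA            PIC X(31500).
       PROCEDURE DIVISION.
       MAIN-LOGIC.
      *    Spaces in WS-CHANNEL-NAME mean "no current channel".
           EXEC CICS ASSIGN
                CHANNEL(WS-CHANNEL-NAME)
           END-EXEC
           EVALUATE TRUE
               WHEN WS-CHANNEL-NAME NOT = SPACES
                   PERFORM PROCESS-VIA-CHANNEL
               WHEN EIBCALEN > 0
      *            Legacy caller: a COMMAREA was passed instead.
                   PERFORM PROCESS-VIA-COMMAREA
               WHEN OTHER
                   PERFORM REJECT-NO-INPUT
           END-EVALUATE
           EXEC CICS RETURN END-EXEC.
      * Placeholder paragraphs; the real logic would go here.
       PROCESS-VIA-CHANNEL.
           CONTINUE.
       PROCESS-VIA-COMMAREA.
           CONTINUE.
       REJECT-NO-INPUT.
           CONTINUE.
```

Checking the channel first means a caller that has already migrated never falls into the COMMAREA path by accident.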

Phase 3: Audit Trail Implementation (Weeks 9-10)

With the channel infrastructure in place, Marcus implemented the full audit trail. The ELIG-AUDIT container starts at ELIG0030, which writes a structured audit record:

01  WS-AUDIT-RECORD.
    05  AR-ENTRY-COUNT        PIC S9(8) COMP.
    05  AR-ENTRY OCCURS 1 TO 500 TIMES
        DEPENDING ON AR-ENTRY-COUNT.
        10  AR-RULE-ID        PIC X(10).
        10  AR-RULE-DESC      PIC X(40).
        10  AR-INPUT-VALUES   PIC X(100).
        10  AR-OUTPUT-VALUE   PIC X(20).
        10  AR-TIMESTAMP      PIC X(26).
        10  AR-SOURCE-PGM     PIC X(8).

Each downstream program GETs the existing audit data, increments the entry count, adds its entries, and PUTs the expanded container. The variable-length support of containers makes this seamless — the container grows as audit entries accumulate.
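The GET, append, PUT cycle might be coded along these lines. With the layout above, one AR-ENTRY is 10 + 40 + 100 + 20 + 26 + 8 = 204 bytes and the count field is 4 bytes, so a full 500-entry record is 4 + 500 × 204 = 102,004 bytes. The sketch is an assumption-laden illustration: WS-NEW-ENTRY, the WS- control fields, and HANDLE-AUDIT-FAILURE are hypothetical, and it appends a single entry.

```cobol
       WORKING-STORAGE SECTION.
       01  WS-CHANNEL             PIC X(16) VALUE 'ELIG-CHAN'.
       01  WS-AUDIT-CTR           PIC X(16) VALUE 'ELIG-AUDIT'.
       01  WS-AUDIT-LEN           PIC S9(8) COMP.
       01  WS-RESP                PIC S9(8) COMP.
      * Hypothetical staging area, filled in by the caller; it
      * has the same 204-byte layout as one AR-ENTRY.
       01  WS-NEW-ENTRY           PIC X(204).
       01  WS-AUDIT-RECORD.
           05  AR-ENTRY-COUNT     PIC S9(8) COMP.
           05  AR-ENTRY OCCURS 1 TO 500 TIMES
               DEPENDING ON AR-ENTRY-COUNT.
               10  AR-RULE-ID       PIC X(10).
               10  AR-RULE-DESC     PIC X(40).
               10  AR-INPUT-VALUES  PIC X(100).
               10  AR-OUTPUT-VALUE  PIC X(20).
               10  AR-TIMESTAMP     PIC X(26).
               10  AR-SOURCE-PGM    PIC X(8).

       PROCEDURE DIVISION.
       APPEND-AUDIT-ENTRY.
      *    Set the count to its maximum so LENGTH OF yields the
      *    full buffer size (102,004 bytes) for the GET.
           MOVE 500 TO AR-ENTRY-COUNT
           MOVE LENGTH OF WS-AUDIT-RECORD TO WS-AUDIT-LEN
           EXEC CICS GET CONTAINER(WS-AUDIT-CTR)
                CHANNEL(WS-CHANNEL)
                INTO(WS-AUDIT-RECORD)
                FLENGTH(WS-AUDIT-LEN)
                RESP(WS-RESP)
           END-EXEC
      *    After a good GET, AR-ENTRY-COUNT holds the stored count.
           IF WS-RESP = DFHRESP(NORMAL)
              AND AR-ENTRY-COUNT < 500
               ADD 1 TO AR-ENTRY-COUNT
               MOVE WS-NEW-ENTRY TO AR-ENTRY(AR-ENTRY-COUNT)
      *        LENGTH OF now reflects the incremented count.
               MOVE LENGTH OF WS-AUDIT-RECORD TO WS-AUDIT-LEN
               EXEC CICS PUT CONTAINER(WS-AUDIT-CTR)
                    CHANNEL(WS-CHANNEL)
                    FROM(WS-AUDIT-RECORD)
                    FLENGTH(WS-AUDIT-LEN)
                    BIT
                    RESP(WS-RESP)
               END-EXEC
           ELSE
               PERFORM HANDLE-AUDIT-FAILURE
           END-IF.
```

In this layout the OCCURS limit of 500, not the container, caps the audit trail at about 100KB, which is below the 200KB ceiling listed for ELIG-AUDIT.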

Phase 4: Cleanup (Weeks 11-12)

After two weeks of parallel running (both COMMAREA and channel paths active), Sandra verified:

  • Zero data discrepancies between COMMAREA-path and channel-path results

  • TS queue overflow incidents: 0 (down from 23/year)

  • Audit trail completeness: 100% (up from 97.8%)

  • DPL performance: 12% improvement (channel transmission vs. COMMAREA + TS queue reads)

The COMMAREA path was disabled (but not removed — Sandra believes in leaving the safety net for six months).

Lessons Learned

1. Migration order matters. Starting with the simplest, most isolated programs (validation and formatting) built confidence and established the patterns that the more complex programs followed.

2. Dual-interface programs are essential. The ability to fall back to COMMAREA during migration eliminated the "all or nothing" risk that had blocked previous modernization attempts.

3. TS queue elimination was the biggest operational win. The 23 annual overflow incidents weren't just data quality problems — each one triggered a manual investigation that averaged 4 hours of analyst time. That's 92 hours/year of incident response eliminated.

4. BIT vs. CHAR requires forward planning. Sandra's decision to use BIT everywhere was pragmatic for the current topology but created a technical debt item for the Canadian office integration. She documented it, assigned it a future sprint, and moved on. Perfect is the enemy of done.

5. Container naming conventions prevent bugs. Marcus established the convention [FLOW]-[CONTENT] (e.g., ELIG-FACTORS, ELIG-AUDIT) from day one. When a junior developer tried FACTORS without the prefix, the container collided with a different application's container of the same name on the shared AOR. The prefix convention was made mandatory.

Discussion Questions

  1. Sandra chose BIT for all containers, even those containing only text data. Under what circumstances would this decision need to be revisited? What would the migration to mixed CHAR/BIT containers look like?

  2. The audit trail container is written by multiple programs (ELIG0030 and ELIG0040 both add entries). What happens if ELIG0040 abends after GET but before PUT of the updated audit trail? How would you design for this failure mode?

  3. Marcus migrated ELIG0020 (data enrichment) in Week 7, which eliminated the TS queue overflow for DPL data. If the DPL link had been unavailable during the migration window, what fallback strategy could Marcus have used?

  4. The ELIG-FACTORS container uses variable length. How should ELIG0040 (compliance check) handle the case where the factors container is unexpectedly empty? Should it treat this as an error or a valid state?

  5. Federal Benefits plans to expose the eligibility determination as a REST API next year. How does the channel architecture support this transition? Which containers would map to the REST request and response payloads?