Case Study 2: Federal Benefits' Regulatory Reporting Pipeline

Background

Federal Benefits Administration processes benefits for 4.2 million beneficiaries across twelve programs. By federal statute, FBA must submit quarterly activity reports to three oversight agencies:

  • Office of Management and Budget (OMB) — Aggregate spending by program, demographic category, and geographic region. Fixed-width ASCII format, SFTP delivery.
  • Government Accountability Office (GAO) — Individual-level transaction records for sampled beneficiaries (approximately 420,000 per quarter). XML format, Connect:Direct delivery.
  • Congressional Budget Office (CBO) — Actuarial projection data with historical trends. CSV format, API upload to CBO's data portal.

Each agency has different format requirements, different delivery mechanisms, and different validation rules. A late or rejected submission triggers a formal deficiency notice that goes to the FBA administrator — Sandra Chen's boss's boss. Missing two consecutive quarters triggers a congressional inquiry.

Sandra, the chief systems architect, and Marcus Johnson, the lead COBOL developer, are responsible for ensuring these reports are complete, accurate, and delivered on time. Every quarter.

The Regulatory Calendar

The quarterly reporting cycle is brutally compressed:

  Day              Activity
  Q+1              Quarter-end batch processing completes
                   (Q+1 = first business day after quarter end)
  Q+2              Extract and validation begin
  Q+3 through Q+5  Data correction window (fix any issues found during validation)
  Q+6              Final extract, transformation, and control report generation
  Q+7              Internal review and sign-off
  Q+8              Submission to all three agencies
  Q+9              Confirmation of receipt and acceptance
  Q+10             Remediation window (if any agency rejects)

Ten business days. That's all Sandra has from quarter-close to final delivery. Any delay in earlier steps compresses the remediation window — and if the remediation window hits zero, there's no buffer for rejections.

The Pipeline Architecture

Marcus designed the reporting pipeline as a five-stage process. Each stage has explicit inputs, outputs, validation gates, and restart capabilities.

Stage 1: Canonical Extract

A single COBOL batch job extracts all data needed for all three reports from the production DB2 tables and VSAM files. This canonical extract contains every field needed by any agency, in a superset layout:

       01  WS-CANONICAL-RECORD.
           05  WS-CAN-BENE-ID           PIC X(11).
           05  WS-CAN-PROGRAM-CD        PIC X(04).
           05  WS-CAN-ACTIVITY-TYPE     PIC X(03).
           05  WS-CAN-ACTIVITY-DATE     PIC 9(08).
           05  WS-CAN-AMOUNT            PIC S9(11)V99 COMP-3.
           05  WS-CAN-STATE-CD          PIC X(02).
           05  WS-CAN-COUNTY-FIPS       PIC X(05).
           05  WS-CAN-DEMOG-DATA.
               10  WS-CAN-AGE-GROUP     PIC X(02).
               10  WS-CAN-GENDER        PIC X(01).
               10  WS-CAN-ETHNICITY     PIC X(02).
           05  WS-CAN-ELIGIBILITY.
               10  WS-CAN-ELIG-START    PIC 9(08).
               10  WS-CAN-ELIG-END      PIC 9(08).
               10  WS-CAN-ELIG-STATUS   PIC X(02).
           05  WS-CAN-PAYMENT-DATA.
               10  WS-CAN-PAY-METHOD    PIC X(02).
               10  WS-CAN-PAY-DATE      PIC 9(08).
               10  WS-CAN-PAY-AMOUNT    PIC S9(09)V99 COMP-3.
           05  WS-CAN-AUDIT-FIELDS.
               10  WS-CAN-LAST-UPD-PGM  PIC X(08).
               10  WS-CAN-LAST-UPD-TS   PIC X(26).
               10  WS-CAN-LAST-UPD-USER PIC X(08).
           05  WS-CAN-SAMPLE-FLAG       PIC X(01).
           05  FILLER                   PIC X(40).

The extract writes to a GDG base: FBA.PROD.REG.CANONICAL.QnYYYY where n is the quarter number and YYYY is the year.

Why a canonical extract? Marcus's predecessor built three separate extract programs — one per agency. When a DB2 table changed, all three programs needed updating. When a new field was required by one agency, the other two extract programs had to be regression-tested. The canonical approach concentrates all source data access in one program. The three agency-specific formats are produced by transformation programs that read the canonical extract.
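One practical consequence of the superset layout: any tool that reads the canonical extract off-platform must decode the COMP-3 (packed decimal) amount fields, where each byte holds two decimal digits and the final nibble is the sign. A hedged Python sketch of that decoding (the helper name and the example values are mine, not FBA's):

```python
def unpack_comp3(raw: bytes, scale: int = 2) -> float:
    """Decode an IBM COMP-3 (packed decimal) field.

    Every byte holds two decimal digits except the last, which holds
    one digit plus a sign nibble (0xC or 0xF = positive, 0xD = negative).
    `scale` is the number of implied decimal places (V99 -> 2).
    """
    digits = []
    sign = 1
    for i, byte in enumerate(raw):
        hi, lo = byte >> 4, byte & 0x0F
        if i < len(raw) - 1:
            digits.extend([hi, lo])
        else:
            digits.append(hi)                  # last byte: one digit...
            sign = -1 if lo == 0x0D else 1     # ...plus the sign nibble
    value = 0
    for d in digits:
        value = value * 10 + d
    return sign * value / (10 ** scale)

# PIC S9(11)V99 COMP-3 occupies 7 bytes: 13 digits plus the sign nibble.
amount = unpack_comp3(bytes.fromhex("0000000012345c"))   # +123.45
```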

Stage 2: Validation

The canonical extract passes through a validation program that checks every record against 87 validation rules:

       PERFORM-VALIDATION.
           PERFORM VALIDATE-BENE-ID
           PERFORM VALIDATE-PROGRAM-CODE
           PERFORM VALIDATE-DATES
           PERFORM VALIDATE-AMOUNTS
           PERFORM VALIDATE-GEOGRAPHIC
           PERFORM VALIDATE-DEMOGRAPHIC
           PERFORM VALIDATE-ELIGIBILITY-LOGIC
           PERFORM VALIDATE-CROSS-FIELD-RULES
           .

       VALIDATE-DATES.
           IF WS-CAN-ACTIVITY-DATE < WS-QUARTER-START
           OR WS-CAN-ACTIVITY-DATE > WS-QUARTER-END
               PERFORM WRITE-VALIDATION-ERROR
               ADD 1 TO WS-DATE-ERROR-COUNT
           END-IF

           IF WS-CAN-PAY-DATE NOT = ZEROS
           AND WS-CAN-PAY-DATE < WS-CAN-ACTIVITY-DATE
               PERFORM WRITE-VALIDATION-ERROR
               ADD 1 TO WS-DATE-SEQUENCE-ERROR
           END-IF
           .

Validation produces three outputs:

  1. Clean records — passed all 87 rules, ready for transformation
  2. Error records — failed one or more rules, written to an error file with error codes
  3. Validation summary report — counts by error type, severity, and program code

Sandra's rule: if critical errors exceed 0.1% of total records (approximately 180 records for a typical quarter), the pipeline stops for investigation. Non-critical errors (formatting issues, minor inconsistencies) are flagged but don't stop processing. The data correction window (Q+3 through Q+5) exists specifically to address error records.
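Sandra's halt rule is a simple threshold gate. A minimal Python sketch of the logic (function name and the zero-record behavior are my assumptions):

```python
CRITICAL_ERROR_THRESHOLD = 0.001  # Sandra's rule: 0.1% of total records

def pipeline_may_proceed(total_records: int, critical_errors: int) -> bool:
    """Halt the pipeline for investigation when critical errors EXCEED
    0.1% of the extract; non-critical errors never stop processing."""
    if total_records == 0:
        return False  # an empty extract is itself worth investigating
    return critical_errors / total_records <= CRITICAL_ERROR_THRESHOLD
```

At exactly 0.1% the pipeline still proceeds; one record more and it stops for investigation, pushing the fix into the Q+3 through Q+5 correction window.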

Stage 3: Agency-Specific Transformation

Three transformation programs read the validated canonical extract and produce agency-specific output files.

OMB Transformation (OMBXFORM)

OMB requires aggregate data — no individual beneficiary records. OMBXFORM reads the canonical extract and produces summary records:

       PRODUCE-OMB-RECORD.
           PERFORM VARYING WS-PGM-IDX FROM 1 BY 1
               UNTIL WS-PGM-IDX > WS-PROGRAM-COUNT

               PERFORM VARYING WS-STATE-IDX FROM 1 BY 1
                   UNTIL WS-STATE-IDX > 56

                   IF WS-AGG-COUNT(WS-PGM-IDX, WS-STATE-IDX) > 0
                       MOVE WS-PROGRAM-TABLE(WS-PGM-IDX)
                           TO OMB-PROGRAM-CODE
                       MOVE WS-STATE-TABLE(WS-STATE-IDX)
                           TO OMB-STATE-CODE
                       MOVE WS-AGG-COUNT(WS-PGM-IDX, WS-STATE-IDX)
                           TO OMB-BENE-COUNT
                       MOVE WS-AGG-AMOUNT(WS-PGM-IDX, WS-STATE-IDX)
                           TO OMB-TOTAL-AMOUNT

                       PERFORM CONVERT-AMOUNT-TO-DISPLAY
                       PERFORM CONVERT-EBCDIC-TO-ASCII

                       WRITE OMB-OUTPUT-RECORD
                       ADD 1 TO WS-OMB-REC-COUNT
                   END-IF
               END-PERFORM
           END-PERFORM
           .

The OMB output is a fixed-width ASCII file with 350-byte records. Every field is converted from EBCDIC, packed decimals are converted to display format, and dates are reformatted from YYYYMMDD to MM/DD/YYYY (OMB's required format — yes, they still require MM/DD/YYYY in 2026).
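The two conversions that bite most often are the date reformat and the space-padding of fixed-width fields. A Python sketch of both (helper names are mine; the truncate-and-pad behavior mirrors a COBOL MOVE to a PIC X(n) field):

```python
def to_omb_date(yyyymmdd: str) -> str:
    """Reformat an internal YYYYMMDD date to OMB's required MM/DD/YYYY."""
    return f"{yyyymmdd[4:6]}/{yyyymmdd[6:8]}/{yyyymmdd[0:4]}"

def fixed_width(value: str, width: int) -> str:
    """Left-justify and space-pad to the field width, truncating anything
    longer, the same way a COBOL alphanumeric MOVE behaves."""
    return value[:width].ljust(width)
```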

GAO Transformation (GAOXFORM)

GAO receives individual-level records for a statistical sample. The sample flag in the canonical record determines inclusion. The output is XML — and generating well-formed XML from COBOL is exactly as painful as you'd imagine:

       WRITE-GAO-XML-RECORD.
           *> Clear the buffer first: STRING does not space-fill,
           *> so residue from a longer previous record would remain.
           MOVE SPACES TO WS-XML-BUFFER
           STRING '<beneficiary>' DELIMITED BY SIZE
               '<id>' DELIMITED BY SIZE
               WS-CAN-BENE-ID DELIMITED BY SPACES
               '</id>' DELIMITED BY SIZE
               '<program>' DELIMITED BY SIZE
               WS-CAN-PROGRAM-CD DELIMITED BY SPACES
               '</program>' DELIMITED BY SIZE
               '<activity>' DELIMITED BY SIZE
               '<type>' DELIMITED BY SIZE
               WS-CAN-ACTIVITY-TYPE DELIMITED BY SPACES
               '</type>' DELIMITED BY SIZE
               '<date>' DELIMITED BY SIZE
               WS-ISO-DATE DELIMITED BY SIZE
               '</date>' DELIMITED BY SIZE
               '<amount>' DELIMITED BY SIZE
               WS-DISPLAY-AMOUNT DELIMITED BY SIZE
               '</amount>' DELIMITED BY SIZE
               '</activity>' DELIMITED BY SIZE
               '</beneficiary>' DELIMITED BY SIZE
           INTO WS-XML-BUFFER
           END-STRING

           WRITE GAO-OUTPUT-RECORD FROM WS-XML-BUFFER
           .

Marcus considered using IBM's XML GENERATE statement (available in Enterprise COBOL since V3.1) but found the output didn't match GAO's exact schema requirements. The STRING-based approach gives precise control over the output format.
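One hazard of hand-built XML that applies in any language: data values containing &, <, or > must be entity-escaped, or the file is not well-formed and will fail schema validation on GAO's side. A Python sketch of the same record shape with escaping (element names follow the COBOL example above; the function itself is my illustration, not FBA code):

```python
from xml.sax.saxutils import escape

def gao_beneficiary_xml(bene_id: str, program: str, act_type: str,
                        iso_date: str, amount: str) -> str:
    """Build one <beneficiary> element, escaping &, <, > in data values."""
    return (
        "<beneficiary>"
        f"<id>{escape(bene_id)}</id>"
        f"<program>{escape(program)}</program>"
        "<activity>"
        f"<type>{escape(act_type)}</type>"
        f"<date>{iso_date}</date>"
        f"<amount>{amount}</amount>"
        "</activity>"
        "</beneficiary>"
    )
```

In the COBOL version, the equivalent protection is an escaping paragraph applied to each data field before the STRING statement.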

CBO Transformation (CBOXFORM)

CBO requires CSV with a header row. This is the simplest transformation — but CSV has its own gotchas. Any field containing commas must be quoted. Beneficiary descriptions occasionally contain commas. Marcus's program handles this:

       FORMAT-CSV-FIELD.
           *> Reset the tally first: INSPECT ... TALLYING adds to the
           *> counter, it does not initialize it.
           MOVE ZERO TO WS-COMMA-COUNT
           INSPECT WS-FIELD-VALUE TALLYING WS-COMMA-COUNT
               FOR ALL ','
           IF WS-COMMA-COUNT > 0
           *> DELIMITED BY SIZE, not SPACES: a description containing
           *> a comma almost always contains spaces too, and SPACES
           *> would truncate it at the first one.
               STRING '"' DELIMITED BY SIZE
                      WS-FIELD-VALUE DELIMITED BY SIZE
                      '"' DELIMITED BY SIZE
                   INTO WS-CSV-FIELD
               END-STRING
           ELSE
               MOVE WS-FIELD-VALUE TO WS-CSV-FIELD
           END-IF
           .

Stage 4: Delivery

Each agency receives its file through a different mechanism.

OMB — SFTP:

//OMBSFTP  EXEC PGM=BPXBATCH,
//         PARM='SH /opt/fba/scripts/sftp_omb.sh'
//STDIN    DD DUMMY
//STDOUT   DD SYSOUT=*
//STDERR   DD SYSOUT=*

The shell script uses SFTP with key-based authentication to upload to OMB's secure file server. Marcus wraps this in BPXBATCH to call it from JCL, with return code checking in a subsequent step.

GAO — Connect:Direct:

Connect:Direct process transfers the XML file to GAO's mainframe (yes, GAO also runs a mainframe). This is the most reliable transfer path — mainframe-to-mainframe Connect:Direct with checkpoint/restart and guaranteed delivery.

CBO — API Upload:

CBO modernized their intake process in 2024 and now accepts submissions via a REST API. Marcus wrote a COBOL program that uses z/OS Connect to POST the CSV file to CBO's upload endpoint:

POST https://data.cbo.gov/api/v2/submissions/quarterly
Content-Type: multipart/form-data

agency_id: FBA
report_type: QUARTERLY_ACTUARY
quarter: Q1-2026
file: [CSV content]

This was the most challenging integration because it's synchronous — the COBOL program must handle HTTP response codes, timeouts, and retry logic. Marcus implemented exponential backoff with three retries and a circuit breaker that halts submission attempts if three consecutive calls fail.
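The retry-with-backoff logic can be sketched compactly in Python (the function shape, delay values, and exception type are my illustration of the pattern, not Marcus's COBOL implementation):

```python
import time

class CircuitOpen(Exception):
    """Raised when consecutive failures exhaust the retry budget and
    further submission attempts are halted."""

def submit_with_retries(post, payload, retries=3, base_delay=2.0):
    """Exponential backoff: wait base_delay * 2**attempt between tries.
    After `retries` consecutive failures the circuit opens and no
    further attempts are made, mirroring the three-strike rule."""
    for attempt in range(retries):
        try:
            return post(payload)            # post() is the HTTP call
        except Exception:
            if attempt == retries - 1:
                raise CircuitOpen("consecutive failures exhausted retries")
            time.sleep(base_delay * 2 ** attempt)
```

Once the circuit opens, the right move in this pipeline is to alert an operator rather than keep hammering a CBO endpoint that is clearly down.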

Stage 5: Confirmation and Reconciliation

After delivery, each agency provides confirmation:

  • OMB sends an acknowledgment file via SFTP within 24 hours
  • GAO returns an acknowledgment via a Connect:Direct process
  • CBO's API returns a submission ID and status in the HTTP response

A monitoring job runs every two hours after submission, checking for confirmations. If no confirmation arrives within 48 hours, Sandra is notified. If any agency rejects the submission, the remediation process begins immediately.

Rejection handling:

Each agency provides rejection reasons in their confirmation/rejection response. Common rejection reasons:

  • Record count mismatch (file truncated during transfer)
  • Schema validation failure (field in wrong position or wrong format)
  • Business rule violation (amounts don't sum to expected totals)
  • Duplicate submission (same quarter submitted twice)

For each rejection type, Marcus has a pre-built remediation runbook:

  1. Record count mismatch: Retransmit from the GDG. The file is still available.
  2. Schema validation: Compare the rejected file against the agency's published schema. Fix the transformation program. Regenerate from the canonical extract.
  3. Business rule violation: Investigate the specific rule. Fix data in the canonical extract if necessary. Regenerate the agency file.
  4. Duplicate submission: Contact the agency to clarify. Usually a false positive from their system.

The Q3-2025 Incident

In October 2025, FBA experienced its most significant reporting failure in five years. The Q3-2025 canonical extract completed normally. Validation passed with 0.03% error rate. All three transformations completed. All three deliveries succeeded. All three confirmations were received.

Then GAO called. Their analysis showed that FBA's Q3 submission contained only 389,000 beneficiary records in the statistical sample, not the expected 420,000. GAO's rules require the sample to include every beneficiary who received benefits at any point during the quarter. FBA's sample flag logic only included beneficiaries active at quarter-end, missing 31,000 who were active during the quarter but whose benefits ended before September 30.

The root cause was a change to the eligibility determination program made in August 2025. The change modified how the eligibility end date was populated, which affected the sample selection logic in the canonical extract program. The extract program's sample flag logic had an implicit dependency on the eligibility end date format that wasn't documented.

What went wrong:

  1. The eligibility program change was tested for eligibility correctness but not for downstream reporting impact.
  2. The validation rules checked individual record validity but not population completeness.
  3. No validation rule compared the sample count against historical baselines.
  4. The GAO transformation had no count validation against expected sample sizes.

Remediation:

Marcus rebuilt the canonical extract with corrected sample logic and resubmitted within three days. GAO accepted the corrected file.

Preventive measures:

  1. Baseline monitoring. Every key metric (record counts by program, state, sample flag) is now compared against the prior quarter's values. Deviations greater than 5% trigger a warning; deviations greater than 15% halt the pipeline.
  2. Impact analysis for upstream changes. Any change to programs that feed the canonical extract now requires sign-off from the reporting team.
  3. End-to-end test with production data. Each quarter, a dry-run extract is produced one week before quarter-end using production data through the prior month. This catches logic errors before the real deadline.
  4. Sample population reconciliation. A new validation step independently calculates the expected sample population from the eligibility database and compares it against the canonical extract's sample flag counts.

Architecture Diagram

┌─────────────────────────────────────────────────────┐
│                   STAGE 1: EXTRACT                   │
│  DB2 Tables ─┐                                       │
│  VSAM Files ─┼──► CANEXTR Program ──► Canonical GDG  │
│  IMS Segments┘                                       │
└─────────────────────┬───────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────┐
│                  STAGE 2: VALIDATE                    │
│  Canonical GDG ──► VALREG Program ──┬─► Clean File   │
│                                     ├─► Error File   │
│                                     └─► Summary Rpt  │
└─────────────────────┬───────────────────────────────┘
                      │
┌─────────────────────▼───────────────────────────────┐
│               STAGE 3: TRANSFORM                     │
│  Clean File ──┬──► OMBXFORM ──► Fixed-Width ASCII    │
│               ├──► GAOXFORM ──► XML                  │
│               └──► CBOXFORM ──► CSV                  │
└──────┬──────────────┬──────────────┬────────────────┘
       │              │              │
┌──────▼──────┐ ┌─────▼─────┐ ┌─────▼──────┐
│  STAGE 4A   │ │ STAGE 4B  │ │ STAGE 4C   │
│  OMB: SFTP  │ │ GAO: C:D  │ │ CBO: API   │
└──────┬──────┘ └─────┬─────┘ └─────┬──────┘
       │              │              │
┌──────▼──────────────▼──────────────▼────────────────┐
│             STAGE 5: CONFIRM & RECONCILE             │
│  Monitor confirmations, handle rejections            │
└─────────────────────────────────────────────────────┘

Key Design Principles

  1. Single canonical extract. One program, one extract, three transformations. Changes to source data affect one program. Changes to agency formats affect one transformation each.

  2. GDG everything. Every intermediate file uses GDGs. When the Q3-2025 GAO submission had to be corrected and Sandra needed to regenerate, the canonical extract was still available in the GDG. No need to re-extract from production.

  3. Validation before transformation. Bad data is caught once, in one place, before it propagates to three different output formats.

  4. Independent delivery paths. Each agency's delivery is a separate job step. If SFTP to OMB fails, Connect:Direct to GAO still proceeds. Failures are isolated.

  5. Baseline monitoring. The Q3-2025 incident taught Sandra that correctness validation alone isn't enough. Completeness validation — comparing against expected baselines — catches a class of errors that record-level rules miss.

Discussion Questions

  1. Marcus chose STRING-based XML generation over IBM's XML GENERATE. Under what circumstances would XML GENERATE be the better choice? What are the tradeoffs?

  2. The CBO API integration requires synchronous HTTP calls from COBOL. What are the failure modes of this approach, and how would you design for resilience? Consider: CBO API downtime, network timeouts, rate limiting, and authentication token expiration.

  3. Sandra's team has ten business days from quarter-close to final delivery. If you could redesign the pipeline to produce all three reports in five business days, what changes would you make?

  4. The Q3-2025 incident was caused by an upstream change impacting downstream reporting. How would you formalize the dependency management between the eligibility system and the reporting pipeline? Consider both technical and process solutions.

  5. FBA serves 4.2 million beneficiaries. If that number doubles to 8.4 million over the next decade, which stages of the pipeline will hit scalability limits first? What architecture changes would you make proactively?