Case Study 2: MedClaim Claims Pipeline Reconciliation Failure

The Situation

MedClaim's monthly claims processing volumes had been steadily increasing — from 420,000 claims per month two years ago to 510,000 claims in the current month. The nightly batch pipeline processed between 15,000 and 25,000 claims each night, depending on submission patterns.

One Thursday morning, James Okafor arrived to find an unusual message in the overnight batch log:

CLM-RECON: PIPELINE RECONCILIATION FAILED
  INTAKE HASH TOTAL:    892,441,337
  ADJUDICATION HASH:    862,113,448
  REJECTED HASH:         29,019,554
  COMBINED HASH:        891,133,002
  DIFFERENCE:             1,308,335
  RECORDS MISSING:               47
CLM-RECON: RETURN CODE 16

The reconciliation job — the final step in the pipeline — detected that 47 claims had entered the intake step but were not accounted for at the end of adjudication. They had not been adjudicated, not been rejected, and not been pended. They had simply vanished.
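
The balance test behind that report is simple arithmetic: add the hash totals of everything that left the pipeline and compare the sum with what came in. A minimal sketch using the figures from the log above follows. The program name RECONCHK and the WS- field names are hypothetical, and the values are hard-coded here only for illustration, where a real reconciliation step would read them from the control total files.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RECONCHK.
      * Hypothetical sketch of the balance test. Field names are
      * illustrative and are not taken from CLM-RECON.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INTAKE-HASH       PIC S9(15) COMP-3 VALUE 892441337.
       01  WS-ADJUDICATED-HASH  PIC S9(15) COMP-3 VALUE 862113448.
       01  WS-REJECTED-HASH     PIC S9(15) COMP-3 VALUE 29019554.
       01  WS-COMBINED-HASH     PIC S9(15) COMP-3.
       01  WS-DIFFERENCE        PIC S9(15) COMP-3.
       01  WS-EDITED            PIC ZZZ,ZZZ,ZZZ,ZZZ,ZZ9-.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Everything that left the pipeline, by either exit.
           COMPUTE WS-COMBINED-HASH =
               WS-ADJUDICATED-HASH + WS-REJECTED-HASH
      *    Any nonzero result means records were lost or altered.
           COMPUTE WS-DIFFERENCE =
               WS-INTAKE-HASH - WS-COMBINED-HASH
           MOVE WS-COMBINED-HASH TO WS-EDITED
           DISPLAY "COMBINED HASH: " WS-EDITED
           MOVE WS-DIFFERENCE TO WS-EDITED
           DISPLAY "DIFFERENCE:    " WS-EDITED
           IF WS-DIFFERENCE NOT = ZERO
               DISPLAY "CLM-RECON: PIPELINE RECONCILIATION FAILED"
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
```

With the log's figures the combined hash comes to 891,133,002 and the difference to 1,308,335, so the sketch takes the failure branch and sets return code 16.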

The Investigation

Day 1: Narrowing the Scope

James first checked each pipeline step's control total reports:

Intake (CLM-INT): 18,247 claims received, 17,891 accepted, 356 rejected. Hash total of accepted claims: 862,441,337. Record counts balanced. No anomalies.

Validation (CLM-VAL): 17,891 claims input (matches intake output). 17,612 validated, 279 failed validation. Combined hash: 862,441,337. Balanced.

Wait. The reconciliation report showed the intake hash total as 892,441,337, but that figure appeared on neither step's report; intake and validation agreed on 862,441,337 for the accepted claims. James re-examined the intake control total report:

CLM-INT: CONTROL TOTALS
  CLAIMS RECEIVED:     18,247
  CLAIMS ACCEPTED:     17,891
  CLAIMS REJECTED:        356
  HASH TOTAL (ACCEPTED): 862,441,337
  HASH TOTAL (REJECTED):  30,327,889
  HASH TOTAL (COMBINED):  892,768,226

The combined hash was 892,768,226, not 892,441,337. But the reconciliation report showed 892,441,337. Where did that number come from?

Day 2: Finding the Bug

James traced the reconciliation program's logic. It read the intake control total file to get the "input" hash total. He examined the control total file format:

       01  CTL-INTAKE-RECORD.
           05  CTL-INT-CLAIMS-RECEIVED PIC 9(06).
           05  CTL-INT-CLAIMS-ACCEPTED PIC 9(06).
           05  CTL-INT-CLAIMS-REJECTED PIC 9(06).
           05  CTL-INT-HASH-ACCEPTED   PIC S9(15) COMP-3.
           05  CTL-INT-HASH-REJECTED   PIC S9(15) COMP-3.
           05  CTL-INT-HASH-COMBINED   PIC S9(15) COMP-3.

The reconciliation program was reading CTL-INT-HASH-COMBINED as the total hash to verify against. But there was a layout mismatch. Three months earlier, Sarah Kim had requested an additional field — CTL-INT-FINANCIAL-TOTAL — inserted between the rejected hash and the combined hash. The intake program's copybook had been updated:

       01  CTL-INTAKE-RECORD.
           05  CTL-INT-CLAIMS-RECEIVED PIC 9(06).
           05  CTL-INT-CLAIMS-ACCEPTED PIC 9(06).
           05  CTL-INT-CLAIMS-REJECTED PIC 9(06).
           05  CTL-INT-HASH-ACCEPTED   PIC S9(15) COMP-3.
           05  CTL-INT-HASH-REJECTED   PIC S9(15) COMP-3.
           05  CTL-INT-FINANCIAL-TOTAL PIC S9(13)V99 COMP-3.
           05  CTL-INT-HASH-COMBINED   PIC S9(15) COMP-3.

But the reconciliation program was still using the old copybook. What it read as the combined hash total was actually the financial total: both fields are 8-byte packed decimals, so the last field of the old layout lined up exactly with the newly inserted one. The "892,441,337" was not a hash total at all. It was a dollar amount, $8,924,413.37, read without its implied decimal point.
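
The effect is easy to reproduce in isolation. The sketch below, an illustration rather than MedClaim's code, stores an amount in a field declared like CTL-INT-FINANCIAL-TOTAL and reads the same 8 bytes back through a field declared like CTL-INT-HASH-COMBINED. The program name, the WS- names, and the amount itself are assumptions; the amount was picked so that its digits match the figure in the failed report.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MISREAD.
      * Illustration only. The amount is an assumption chosen so
      * that its digits match the bogus "hash" in the report.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SLOT.
      *    What the intake program wrote (new layout).
           05  WS-FINANCIAL-TOTAL  PIC S9(13)V99 COMP-3.
      *    What the stale copybook expected at the same offset.
           05  WS-HASH-COMBINED REDEFINES WS-FINANCIAL-TOTAL
                                   PIC S9(15) COMP-3.
       01  WS-EDITED               PIC ZZZ,ZZZ,ZZZ,ZZZ,ZZ9.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 8924413.37 TO WS-FINANCIAL-TOTAL
      *    Same bytes, different PICTURE: the decimal point is lost.
           MOVE WS-HASH-COMBINED TO WS-EDITED
           DISPLAY "READ AS HASH: " WS-EDITED
           STOP RUN.
```

Packed decimal stores only digits and a sign, so the decimal point exists solely in the PICTURE clause. Read through the stale definition, 8,924,413.37 comes back as 892,441,337.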

The Real Question

Once the copybook mismatch was fixed, the reconciliation program reported:

CLM-RECON: PIPELINE RECONCILIATION
  INTAKE HASH TOTAL:    892,768,226
  ADJUDICATION HASH:    862,113,448
  REJECTED HASH:         30,327,889
  COMBINED HASH:        892,441,337
  DIFFERENCE:               326,889
  RECORDS MISSING:               12

Now the hash totals were real, and the actual discrepancy was 12 claims (326,889 in hash difference), not 47. But 12 claims were still missing. James dug deeper into the validation step.

The bug was in the validation program's handling of special characters in the provider name field. Twelve claims from a provider named "O'Brien & Associates" contained an apostrophe and an ampersand, which caused the validation program to reject them with a format error. The rejection counter was never incremented, because the error was raised in a paragraph that ran before the counter logic. The claims left the pipeline without being counted as either validated or rejected.

The Fix

Three changes were needed:

1. Copybook synchronization: All programs in the pipeline were recompiled with the same version of the control total copybook. A new build procedure was implemented requiring all pipeline programs to compile from a shared copybook library.
2. Validation fix: The format validation paragraph was restructured so that rejections were counted regardless of where in the validation process they occurred.
3. Reconciliation enhancement: The reconciliation program was enhanced to compare not just the combined hash but also the component hashes (accepted + rejected should equal combined from the prior step).
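
The validation fix can be sketched as a single exit point for counting: each check only sets a switch, and one paragraph updates the counters after all checks have run. This is a hypothetical reconstruction, not CLM-VAL's source. The paragraph names, the WS- fields, the three sample provider names, and the format rule itself are invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. VALCOUNT.
      * Hypothetical sketch. Names and rules are illustrative and
      * are not CLM-VAL's actual code.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PROVIDER-DATA.
           05  FILLER  PIC X(24) VALUE "NORTHSIDE CLINIC".
           05  FILLER  PIC X(24) VALUE "O'BRIEN & ASSOCIATES".
           05  FILLER  PIC X(24) VALUE SPACES.
       01  WS-PROVIDER-TABLE REDEFINES WS-PROVIDER-DATA.
           05  WS-PROVIDER-NAME  PIC X(24) OCCURS 3 TIMES.
       01  WS-IDX             PIC 9(04) COMP.
       01  WS-SPECIAL-COUNT   PIC 9(04) COMP.
       01  WS-CLAIMS-INPUT    PIC 9(06) VALUE ZERO.
       01  WS-CLAIMS-VALID    PIC 9(06) VALUE ZERO.
       01  WS-CLAIMS-REJECTED PIC 9(06) VALUE ZERO.
       01  WS-REJECT-SWITCH   PIC X.
           88  CLAIM-REJECTED VALUE "Y".
           88  CLAIM-OK       VALUE "N".
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-VALIDATE-CLAIM
               VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 3
           DISPLAY "INPUT:    " WS-CLAIMS-INPUT
           DISPLAY "VALID:    " WS-CLAIMS-VALID
           DISPLAY "REJECTED: " WS-CLAIMS-REJECTED
           IF WS-CLAIMS-INPUT NOT =
               WS-CLAIMS-VALID + WS-CLAIMS-REJECTED
               DISPLAY "COUNTS OUT OF BALANCE"
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
       1000-VALIDATE-CLAIM.
           ADD 1 TO WS-CLAIMS-INPUT
           SET CLAIM-OK TO TRUE
           PERFORM 2000-CHECK-FORMAT
           IF CLAIM-OK
               PERFORM 3000-CHECK-REQUIRED
           END-IF
      *    The only place either counter changes: every claim
      *    leaves this paragraph counted exactly once.
           IF CLAIM-REJECTED
               ADD 1 TO WS-CLAIMS-REJECTED
           ELSE
               ADD 1 TO WS-CLAIMS-VALID
           END-IF.
       2000-CHECK-FORMAT.
      *    Stand-in for the format rule that tripped on ' and &.
           MOVE ZERO TO WS-SPECIAL-COUNT
           INSPECT WS-PROVIDER-NAME (WS-IDX)
               TALLYING WS-SPECIAL-COUNT FOR ALL "'" ALL "&"
           IF WS-SPECIAL-COUNT > ZERO
               SET CLAIM-REJECTED TO TRUE
           END-IF.
       3000-CHECK-REQUIRED.
           IF WS-PROVIDER-NAME (WS-IDX) = SPACES
               SET CLAIM-REJECTED TO TRUE
           END-IF.
```

Because no check touches a counter directly, a rejection raised early in validation is counted the same way as one raised late. With the three sample names the run counts three claims in, one valid, and two rejected, so the closing balance test passes.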

Key Lessons

1. Copybook version mismatches are insidious. The reconciliation program compiled and ran without errors; it simply interpreted the data incorrectly. No compiler can catch this class of bug, because each program is consistent with the copybook it was compiled against.
2. Control totals only work if the control total mechanism itself is correct. The reconciliation was designed to catch missing claims, but a copybook mismatch undermined the mechanism. Testing the control total logic independently is essential.
3. Multiple levels of verification catch more errors. If the reconciliation had compared component hashes (not just the combined total), the copybook mismatch would have been detected immediately: the accepted and rejected hashes, which the old layout still read correctly, would not have summed to the value it read as the combined hash.
4. Change management across multi-program pipelines requires discipline. Adding a field to a shared record layout requires recompiling every program that uses that layout, not just the ones that use the new field.
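
The component check from the lesson on multiple levels of verification needs only one addition and one comparison. In this sketch the hash values are small made-up numbers, and COMPCHK and the WS- names are hypothetical; the combined value is deliberately wrong to stand in for a field read through a stale layout.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COMPCHK.
      * Hypothetical sketch. The hash values are made up and stand
      * in for a control record read with a stale layout.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-HASH-ACCEPTED   PIC S9(15) COMP-3 VALUE 5000.
       01  WS-HASH-REJECTED   PIC S9(15) COMP-3 VALUE 700.
      *    A correctly read record would carry 5700 here.
       01  WS-HASH-COMBINED   PIC S9(15) COMP-3 VALUE 5400.
       01  WS-HASH-EXPECTED   PIC S9(15) COMP-3.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Recompute the combined hash from its components.
           COMPUTE WS-HASH-EXPECTED =
               WS-HASH-ACCEPTED + WS-HASH-REJECTED
           IF WS-HASH-EXPECTED = WS-HASH-COMBINED
               DISPLAY "COMPONENT HASHES BALANCE"
           ELSE
               DISPLAY "COMPONENT HASHES DO NOT SUM TO COMBINED"
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
```

The check runs against a single control record, before any cross-step comparison, so a misread field surfaces as a failure of the record's own internal arithmetic rather than as an apparent loss of claims.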

Discussion Questions

1. How could the copybook mismatch have been prevented by the build process? What automated checks could catch this type of error?
2. The 12 missing claims went undetected for three months (since the copybook change). What processes should be in place to detect silent data loss earlier?
3. How would using a shared COPY member in a central copybook library have helped or not helped in this situation?
4. Should the control total file use a self-describing format (like including field names or a version number in the record)? What are the trade-offs?