Case Study 2: MedClaim Claims Adjudication Pipeline

DataField.Dev

Background

MedClaim Health Services processes 500,000 insurance claims per month — approximately 25,000 per business day. Each claim must be validated against multiple reference files before payment can be authorized. The claims adjudication pipeline is a multi-step batch process that James Okafor's team has maintained and evolved over 12 years.

"Claims adjudication is the hardest multi-file problem I've worked on," says James. "A single claim touches the provider file, the member file, the procedure code table, the fee schedule, and the accumulator file. If any one of those lookups fails or returns stale data, we either pay a claim we shouldn't or deny a claim we should pay. Both are expensive mistakes."

The Pipeline Architecture

Step 1: CLAIM-SORT     — Sort incoming claims by Claim ID
Step 2: CLAIM-MERGE    — Merge claims from 3 sources (EDI, paper, resubmission)
Step 3: CLAIM-VALIDATE — Validate against provider + member files
Step 4: CLAIM-PRICE    — Apply fee schedule and benefit rules
Step 5: CLAIM-ACCUM    — Update member accumulators (deductible, out-of-pocket)
Step 6: CLAIM-SPLIT    — Split into approved, denied, pending files
Step 7: CLAIM-REPORT   — Control break report by provider, member, denial reason

Step 2: Three-Way Merge

Claims arrive from three sources daily:

Source	Volume	Format
EDI (Electronic Data Interchange)	~18,000/day	837P/837I transactions
Paper claims (keyed by data entry)	~5,000/day	Internal flat file
Resubmissions (previously denied)	~2,000/day	Internal flat file

Each source produces a file sorted by Claim ID. The merge step combines all three into a single adjudication input file:

       MERGE-CLAIM-SOURCES.
      *    Prime one record per source. Each READ paragraph must
      *    move HIGH-VALUES to its claim ID at end of file.
           PERFORM READ-EDI
           PERFORM READ-PAPER
           PERFORM READ-RESUB

           PERFORM UNTIL EDI-CLAIM-ID = HIGH-VALUES
               AND PAPER-CLAIM-ID = HIGH-VALUES
               AND RESUB-CLAIM-ID = HIGH-VALUES

      *        Write the lowest claim ID; ties go to EDI, then
      *        paper. The source code is set after the record is
      *        copied so that the copy cannot overwrite it.
               EVALUATE TRUE
                   WHEN EDI-CLAIM-ID <= PAPER-CLAIM-ID
                       AND EDI-CLAIM-ID <= RESUB-CLAIM-ID
                       MOVE EDI-RECORD TO MERGED-RECORD
                       MOVE 'E' TO MG-SOURCE-CODE
                       WRITE MERGED-RECORD
                       PERFORM READ-EDI
                   WHEN PAPER-CLAIM-ID <= EDI-CLAIM-ID
                       AND PAPER-CLAIM-ID <= RESUB-CLAIM-ID
                       MOVE PAPER-RECORD TO MERGED-RECORD
                       MOVE 'P' TO MG-SOURCE-CODE
                       WRITE MERGED-RECORD
                       PERFORM READ-PAPER
                   WHEN OTHER
                       MOVE RESUB-RECORD TO MERGED-RECORD
                       MOVE 'R' TO MG-SOURCE-CODE
                       WRITE MERGED-RECORD
                       PERFORM READ-RESUB
               END-EVALUATE
           END-PERFORM.
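
The merge loop only terminates if each READ paragraph moves HIGH-VALUES into its claim ID at end of file. The fragment below is a minimal sketch of one such paragraph, not the team's actual code. The file name EDI-FILE and the use of READ ... INTO with EDI-RECORD held in WORKING-STORAGE are assumptions, as is the claim ID being alphanumeric (PIC X), since HIGH-VALUES cannot be moved to a numeric item. READ-PAPER and READ-RESUB would follow the same shape.

```cobol
       READ-EDI.
      *    Assumed names: EDI-FILE is the sorted EDI input file and
      *    EDI-RECORD is a WORKING-STORAGE copy of its record that
      *    contains EDI-CLAIM-ID (PIC X).
           READ EDI-FILE INTO EDI-RECORD
               AT END
      *            Sentinel value: it compares higher than any
      *            real claim ID, so the merge never selects this
      *            source while another source still has records.
                   MOVE HIGH-VALUES TO EDI-CLAIM-ID
           END-READ.
```

Holding the record in WORKING-STORAGE keeps the sentinel out of the file's record area, whose contents are not dependable after end of file.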

Step 3: Multi-File Validation

This is the core of the pipeline. Each claim is matched against the PROVIDER-FILE (VSAM KSDS) and MEMBER-FILE (VSAM KSDS) using random access, while the claims themselves are processed sequentially:

Claims (sequential) ──┐
                       ├── CLAIM-VALIDATE ──> Validated claims
Provider (VSAM KSDS) ─┤                  ──> Denied claims
Member (VSAM KSDS) ───┘                  ──> Exception report

The validation performs six checks for each claim:

1. Provider exists: Look up PRV-PROVIDER-ID in PROVIDER-FILE
2. Provider is active: Check PRV-CONTRACT-STATUS = 'IN'
3. Member exists: Look up MBR-MEMBER-ID in MEMBER-FILE
4. Member is active: Check MBR-STATUS = 'AC' and service date within coverage dates
5. Provider-member match: Verify provider is in the member's plan network
6. Procedure code valid: Look up CLM-PROC-CODE in the procedure code table

Each check can result in a specific denial reason code, and the first failure stops further validation (fail-fast pattern).
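
The fail-fast pattern can be sketched as a chain of guarded checks, where each check runs only while no denial reason has been recorded. This is an illustrative fragment, not the production paragraph. WS-DENIAL-REASON, CLM-PROVIDER-ID, CLM-MEMBER-ID as the claim's lookup field, the paragraph names, and the reason codes 'PRVNF' and 'PRVIN' are assumptions ('MBRNF' is the only code named in this case study). It also assumes PRV-PROVIDER-ID and MBR-MEMBER-ID are the record keys of files opened for random access. CHECK-MEMBER-ACTIVE, CHECK-NETWORK-MATCH, and CHECK-PROC-CODE are hypothetical paragraphs that are not shown.

```cobol
       VALIDATE-CLAIM.
      *    Fail-fast: a check runs only while no reason is set.
           MOVE SPACES TO WS-DENIAL-REASON
           PERFORM CHECK-PROVIDER-EXISTS
           IF WS-DENIAL-REASON = SPACES
               PERFORM CHECK-PROVIDER-ACTIVE
           END-IF
           IF WS-DENIAL-REASON = SPACES
               PERFORM CHECK-MEMBER-EXISTS
           END-IF
           IF WS-DENIAL-REASON = SPACES
               PERFORM CHECK-MEMBER-ACTIVE
           END-IF
           IF WS-DENIAL-REASON = SPACES
               PERFORM CHECK-NETWORK-MATCH
           END-IF
           IF WS-DENIAL-REASON = SPACES
               PERFORM CHECK-PROC-CODE
           END-IF.

       CHECK-PROVIDER-EXISTS.
      *    Random read by key; INVALID KEY means no such provider.
           MOVE CLM-PROVIDER-ID TO PRV-PROVIDER-ID
           READ PROVIDER-FILE
               INVALID KEY
                   MOVE 'PRVNF' TO WS-DENIAL-REASON
           END-READ.

       CHECK-PROVIDER-ACTIVE.
      *    Uses the provider record read by the previous check.
           IF PRV-CONTRACT-STATUS NOT = 'IN'
               MOVE 'PRVIN' TO WS-DENIAL-REASON
           END-IF.

       CHECK-MEMBER-EXISTS.
      *    Reached only if both provider checks passed, so a
      *    denied provider never costs a MEMBER-FILE read.
           MOVE CLM-MEMBER-ID TO MBR-MEMBER-ID
           READ MEMBER-FILE
               INVALID KEY
                   MOVE 'MBRNF' TO WS-DENIAL-REASON
           END-READ.
```

After VALIDATE-CLAIM returns, a claim with WS-DENIAL-REASON still equal to SPACES has passed all six checks.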

Step 7: Multi-Level Control Break Report

The daily adjudication report has three levels: control breaks on denial reason code and on provider ID, with individual claims as the detail level:

Level 1: Denial Reason Code
  Level 2: Provider ID
    Level 3: Detail (individual claims)

For each denial reason:
  Count of claims denied for that reason
  Total billed amount denied
  For each provider within that reason:
    Count of denials from that provider
    Total denied amount from that provider
    Individual claim details

Grand total: Total claims processed, approved, denied, pending
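
The two break levels can be sketched with a small self-contained program. It is an illustration only: the in-memory sample data, the field layout, and every name in it are hypothetical, and it prints each group's count and total as a footer after the group's detail lines rather than reproducing the real report layout. The input is assumed to be sorted by denial reason and then by provider ID, which is what makes a single sequential pass sufficient.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BRKDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Hypothetical sample: denied claims already sorted by
      *    denial reason, then provider ID. Each 18-byte entry is
      *    reason(5) + provider(4) + claim ID(5) + billed amount(4).
       01  WS-SAMPLE-DATA.
           05  FILLER          PIC X(18) VALUE 'MBRNFP001C00010150'.
           05  FILLER          PIC X(18) VALUE 'MBRNFP001C00020200'.
           05  FILLER          PIC X(18) VALUE 'MBRNFP002C00030075'.
           05  FILLER          PIC X(18) VALUE 'PRVNFP003C00040300'.
       01  WS-CLAIM-TABLE REDEFINES WS-SAMPLE-DATA.
           05  WS-CLAIM        OCCURS 4 TIMES.
               10  WS-REASON   PIC X(5).
               10  WS-PROVIDER PIC X(4).
               10  WS-CLAIM-ID PIC X(5).
               10  WS-AMOUNT   PIC 9(4).
       01  WS-IDX              PIC 9(2).
       01  WS-PREV-REASON      PIC X(5).
       01  WS-PREV-PROVIDER    PIC X(4).
       01  WS-PROV-COUNT       PIC 9(4) VALUE ZERO.
       01  WS-PROV-TOTAL       PIC 9(6) VALUE ZERO.
       01  WS-RSN-COUNT        PIC 9(4) VALUE ZERO.
       01  WS-RSN-TOTAL        PIC 9(6) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE WS-REASON(1)   TO WS-PREV-REASON
           MOVE WS-PROVIDER(1) TO WS-PREV-PROVIDER
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 4
      *        A reason break forces a provider break first, so
      *        lower-level totals print before higher-level ones.
               IF WS-REASON(WS-IDX) NOT = WS-PREV-REASON
                   PERFORM PROVIDER-BREAK
                   PERFORM REASON-BREAK
               ELSE
                   IF WS-PROVIDER(WS-IDX) NOT = WS-PREV-PROVIDER
                       PERFORM PROVIDER-BREAK
                   END-IF
               END-IF
               MOVE WS-REASON(WS-IDX)   TO WS-PREV-REASON
               MOVE WS-PROVIDER(WS-IDX) TO WS-PREV-PROVIDER
               DISPLAY '    CLAIM ' WS-CLAIM-ID(WS-IDX)
                       ' AMOUNT ' WS-AMOUNT(WS-IDX)
               ADD 1 TO WS-PROV-COUNT WS-RSN-COUNT
               ADD WS-AMOUNT(WS-IDX) TO WS-PROV-TOTAL WS-RSN-TOTAL
           END-PERFORM
      *    Flush the totals for the final provider and reason.
           PERFORM PROVIDER-BREAK
           PERFORM REASON-BREAK
           STOP RUN.

       PROVIDER-BREAK.
           DISPLAY '  PROVIDER ' WS-PREV-PROVIDER
                   ' COUNT ' WS-PROV-COUNT
                   ' TOTAL ' WS-PROV-TOTAL
           MOVE ZERO TO WS-PROV-COUNT WS-PROV-TOTAL.

       REASON-BREAK.
           DISPLAY 'REASON ' WS-PREV-REASON
                   ' COUNT ' WS-RSN-COUNT
                   ' TOTAL ' WS-RSN-TOTAL
           MOVE ZERO TO WS-RSN-COUNT WS-RSN-TOTAL.
```

The two PERFORMs after the loop matter: without them the last provider and the last denial reason would never print their totals.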

The Data Quality Crisis of 2023

In September 2023, MedClaim's claims denial rate suddenly spiked from the normal 8% to 23%. The exception report showed thousands of claims denied with reason code 'MBRNF' (member not found).

Investigation

James Okafor's team traced the problem through the pipeline:

Step 3 (CLAIM-VALIDATE) was correctly reporting member-not-found
MEMBER-FILE showed the members existed with active status
The lookup was failing because the claim's MEMBER-ID field was padded with trailing spaces to a different width than the MEMBER-FILE's key

Root cause: The EDI translation program had been updated to use a new field mapping, which left-justified the member ID in a 15-byte field (previously 12 bytes). The extra 3 bytes of trailing spaces caused the key lookup to fail: 'MBR12345' padded with spaces to 15 bytes did not match 'MBR12345' padded to the 12-byte width of the MEMBER-FILE key.

The Fix

Sarah Kim designed a key normalization step that runs before validation:

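       NORMALIZE-MEMBER-ID.
      *    Keep the characters before the first space, then let
      *    MOVE re-pad with spaces to the length of
      *    WS-NORMALIZED-ID (the exact key length).
      *    An ID that starts with a space yields a count of zero
      *    and is replaced by spaces.
           MOVE SPACES TO WS-NORMALIZED-ID
           MOVE ZERO TO WS-CHAR-POS
           INSPECT CLM-MEMBER-ID
               TALLYING WS-CHAR-POS
               FOR CHARACTERS BEFORE INITIAL SPACE
           IF WS-CHAR-POS > ZERO
               MOVE CLM-MEMBER-ID(1:WS-CHAR-POS)
                   TO WS-NORMALIZED-ID
           END-IF
           MOVE WS-NORMALIZED-ID TO CLM-MEMBER-ID.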

Prevention

The team added a reconciliation step that compares daily claim volumes by source and the daily denial rate against 30-day moving averages. A denial rate more than 2 percentage points above its 30-day average triggers an alert:

       CHECK-DENIAL-RATE.
      *    Skip the check on an empty run to avoid dividing by zero.
           IF WS-TOTAL-COUNT > ZERO
               COMPUTE WS-CURRENT-RATE =
                   (WS-DENIED-COUNT * 100) / WS-TOTAL-COUNT
               IF WS-CURRENT-RATE >
                   (WS-30DAY-AVG-RATE + 2)
                   DISPLAY '*** ALERT: Denial rate '
                           WS-CURRENT-RATE
                           '% exceeds 30-day average '
                           WS-30DAY-AVG-RATE '% by > 2 points'
                   PERFORM SEND-OPERATIONS-ALERT
               END-IF
           END-IF.
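
How the 30-day baseline is maintained is not described in the case study. One plausible approach, sketched below as a small self-contained program, keeps the last 30 daily rates in a table, averages them, and overwrites the oldest entry once the day's comparison is done. The table, the slot pointer, the paragraph names, and the field sizes are all assumptions; the sample values reuse the 8% normal rate and the 23% spike from the incident. In a real job the history would have to be loaded from and saved to a file between runs, which is not shown.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. AVGDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Hypothetical layout: one denial rate per day, 30 days.
       01  WS-RATE-HISTORY.
           05  WS-DAILY-RATE      PIC 9(3)V99 OCCURS 30 TIMES.
       01  WS-NEXT-SLOT           PIC 9(2) VALUE 1.
       01  WS-DAY-IDX             PIC 9(2).
       01  WS-RATE-SUM            PIC 9(5)V99.
       01  WS-30DAY-AVG-RATE      PIC 9(3)V99.
       01  WS-CURRENT-RATE        PIC 9(3)V99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Sample history: 30 days at the normal 8% denial rate.
           PERFORM VARYING WS-DAY-IDX FROM 1 BY 1
                   UNTIL WS-DAY-IDX > 30
               MOVE 8 TO WS-DAILY-RATE(WS-DAY-IDX)
           END-PERFORM
      *    Today's rate: the 23% seen during the incident.
           MOVE 23 TO WS-CURRENT-RATE
           PERFORM COMPUTE-30DAY-AVG
           IF WS-CURRENT-RATE > (WS-30DAY-AVG-RATE + 2)
               DISPLAY 'ALERT: DENIAL RATE ABOVE BASELINE'
           END-IF
           PERFORM RECORD-TODAYS-RATE
           STOP RUN.

       COMPUTE-30DAY-AVG.
      *    Simple mean of the 30 stored rates.
           MOVE ZERO TO WS-RATE-SUM
           PERFORM VARYING WS-DAY-IDX FROM 1 BY 1
                   UNTIL WS-DAY-IDX > 30
               ADD WS-DAILY-RATE(WS-DAY-IDX) TO WS-RATE-SUM
           END-PERFORM
           COMPUTE WS-30DAY-AVG-RATE ROUNDED = WS-RATE-SUM / 30.

       RECORD-TODAYS-RATE.
      *    Runs after the comparison so that today's rate does
      *    not dilute the baseline it is measured against.
      *    The slot pointer wraps, overwriting the oldest entry.
           MOVE WS-CURRENT-RATE TO WS-DAILY-RATE(WS-NEXT-SLOT)
           ADD 1 TO WS-NEXT-SLOT
           IF WS-NEXT-SLOT > 30
               MOVE 1 TO WS-NEXT-SLOT
           END-IF.
```

With this sample data the average is 8.00, so a current rate of 23 exceeds the threshold of 10 and the alert line is displayed.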

Performance Metrics

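Pipeline Step	Elapsed Time	Records/Hour
CLAIM-SORT	3 min	N/A (sort utility)
CLAIM-MERGE	5 min	300,000
CLAIM-VALIDATE	28 min	53,571
CLAIM-PRICE	12 min	125,000
CLAIM-ACCUM	15 min	100,000
CLAIM-SPLIT	4 min	375,000
CLAIM-REPORT	8 min	N/A (report generation)
Total	75 min	N/A

Throughput is the day's 25,000 claims divided by the step's elapsed time, expressed per hour. For CLAIM-VALIDATE, 25,000 / 28 min × 60 ≈ 53,571 records per hour.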

CLAIM-VALIDATE is the bottleneck because each claim requires two random VSAM reads (provider and member). The team uses VSAM Local Shared Resources (LSR) buffering to keep frequently accessed provider and member records in memory, reducing physical I/O.

Checkpoint/Restart Strategy

The CLAIM-VALIDATE step checkpoints every 5,000 claims:

Checkpoint record:
  - Last claim ID processed
  - Counts: approved, denied, pending, error
  - Running totals: billed amount, allowed amount
  - Timestamp

On restart:
  - Read checkpoint file
  - Position claim input at the first claim after the last claim ID processed
  - Restore all counters and totals
  - Resume processing

The checkpoint file is a simple sequential file with one record, overwritten at each checkpoint. On restart, the program reads it once, then processes normally.
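
The checkpoint and restart logic described above can be sketched as the fragment below. It is not a complete program, and every name and field size in it is an assumption: CHECKPOINT-FILE and the CKP- fields, the WS- counters and totals, CLM-CLAIM-ID, WS-RESTART-CLAIM-ID, and a hypothetical READ-CLAIM paragraph that moves HIGH-VALUES to CLM-CLAIM-ID at end of file. It also assumes that the claim input is sorted by a unique, alphanumeric claim ID and that the checkpoint file exists when a restart is attempted. Repositioning the step's output files is not covered.

```cobol
      *    FILE SECTION entry (assumed names and field sizes).
       FD  CHECKPOINT-FILE.
       01  CHECKPOINT-RECORD.
           05  CKP-LAST-CLAIM-ID    PIC X(15).
           05  CKP-APPROVED-COUNT   PIC 9(7).
           05  CKP-DENIED-COUNT     PIC 9(7).
           05  CKP-PENDING-COUNT    PIC 9(7).
           05  CKP-ERROR-COUNT      PIC 9(7).
           05  CKP-BILLED-TOTAL     PIC 9(11)V99.
           05  CKP-ALLOWED-TOTAL    PIC 9(11)V99.
           05  CKP-TIMESTAMP        PIC X(21).

      *    PROCEDURE DIVISION paragraphs.
       WRITE-CHECKPOINT.
      *    OPEN OUTPUT rewrites the file from the start, so it
      *    holds only the record written here.
           OPEN OUTPUT CHECKPOINT-FILE
           MOVE CLM-CLAIM-ID          TO CKP-LAST-CLAIM-ID
           MOVE WS-APPROVED-COUNT     TO CKP-APPROVED-COUNT
           MOVE WS-DENIED-COUNT       TO CKP-DENIED-COUNT
           MOVE WS-PENDING-COUNT      TO CKP-PENDING-COUNT
           MOVE WS-ERROR-COUNT        TO CKP-ERROR-COUNT
           MOVE WS-BILLED-TOTAL       TO CKP-BILLED-TOTAL
           MOVE WS-ALLOWED-TOTAL      TO CKP-ALLOWED-TOTAL
           MOVE FUNCTION CURRENT-DATE TO CKP-TIMESTAMP
           WRITE CHECKPOINT-RECORD
           CLOSE CHECKPOINT-FILE.

       RESTART-FROM-CHECKPOINT.
           OPEN INPUT CHECKPOINT-FILE
           READ CHECKPOINT-FILE
               AT END
      *            Empty file: start from the first claim.
                   MOVE LOW-VALUES TO WS-RESTART-CLAIM-ID
               NOT AT END
                   MOVE CKP-LAST-CLAIM-ID  TO WS-RESTART-CLAIM-ID
                   MOVE CKP-APPROVED-COUNT TO WS-APPROVED-COUNT
                   MOVE CKP-DENIED-COUNT   TO WS-DENIED-COUNT
                   MOVE CKP-PENDING-COUNT  TO WS-PENDING-COUNT
                   MOVE CKP-ERROR-COUNT    TO WS-ERROR-COUNT
                   MOVE CKP-BILLED-TOTAL   TO WS-BILLED-TOTAL
                   MOVE CKP-ALLOWED-TOTAL  TO WS-ALLOWED-TOTAL
           END-READ
           CLOSE CHECKPOINT-FILE
      *    Skip forward past the claims already processed.
           PERFORM READ-CLAIM
           PERFORM UNTIL CLM-CLAIM-ID > WS-RESTART-CLAIM-ID
               PERFORM READ-CLAIM
           END-PERFORM.
```

When RESTART-FROM-CHECKPOINT finishes, the current claim is the first unprocessed one, or the end-of-file sentinel if the failure happened after the final checkpoint.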

Lessons Learned

Key normalization is not optional: Any field used for cross-file matching must be normalized — trailing spaces, leading zeros, case, and encoding must be consistent across all sources.
Fail-fast validation saves time: Checking provider before member (and stopping on first failure) avoids unnecessary VSAM reads for claims that will be denied anyway.
Monitoring beats testing: The 30-day moving average alert caught the September 2023 spike within 4 hours. Regression testing of the EDI change had not caught the field-width issue because test data used short member IDs that fit within both the old and new field sizes.
Multi-file processing amplifies data quality issues: When one file has clean data and another has dirty data, the matching step exposes every inconsistency. Data quality in the source files is the single biggest factor in adjudication accuracy.
Control break reports are operational tools: The denial-by-provider-by-reason report is not just documentation — it is the primary tool operations uses to identify billing problems, provider issues, and system errors.

Discussion Questions

The pipeline has 7 sequential steps. Which steps could be parallelized? What dependencies prevent full parallelization?
CLAIM-VALIDATE uses random VSAM access for provider and member lookups. Under what circumstances would loading these reference files into COBOL tables (in-memory) be preferable? What are the limits?
The September 2023 incident was caused by a field-width change in an upstream system. How would you design an interface contract that prevents this type of issue?
James Okafor's team processes 25,000 claims daily. If volume grew to 250,000 daily, which pipeline steps would hit bottlenecks first? How would you scale?
Compare this batch pipeline approach with a real-time adjudication model (processing each claim as it arrives via a CICS transaction). What are the trade-offs in accuracy, performance, and operational complexity?