Case Study 2: MedClaim Claims Adjudication Pipeline
Background
MedClaim Health Services processes 500,000 insurance claims per month — approximately 25,000 per business day. Each claim must be validated against multiple reference files before payment can be authorized. The claims adjudication pipeline is a multi-step batch process that James Okafor's team has maintained and evolved over 12 years.
"Claims adjudication is the hardest multi-file problem I've worked on," says James. "A single claim touches the provider file, the member file, the procedure code table, the fee schedule, and the accumulator file. If any one of those lookups fails or returns stale data, we either pay a claim we shouldn't or deny a claim we should pay. Both are expensive mistakes."
The Pipeline Architecture
Step 1: CLAIM-SORT — Sort incoming claims by Claim ID
Step 2: CLAIM-MERGE — Merge claims from 3 sources (EDI, paper, resubmission)
Step 3: CLAIM-VALIDATE — Validate against provider + member files
Step 4: CLAIM-PRICE — Apply fee schedule and benefit rules
Step 5: CLAIM-ACCUM — Update member accumulators (deductible, out-of-pocket)
Step 6: CLAIM-SPLIT — Split into approved, denied, pending files
Step 7: CLAIM-REPORT — Control break report by provider, member, denial reason
Step 2: Three-Way Merge
Claims arrive from three sources daily:
| Source | Volume | Format |
|---|---|---|
| EDI (Electronic Data Interchange) | ~18,000/day | 837P/837I transactions |
| Paper claims (keyed by data entry) | ~5,000/day | Internal flat file |
| Resubmissions (previously denied) | ~2,000/day | Internal flat file |
Each source produces a file sorted by Claim ID. The merge step combines all three into a single adjudication input file:
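```cobol
       MERGE-CLAIM-SOURCES.
      * Each READ-xxx paragraph is assumed to move HIGH-VALUES to
      * its claim ID at end of file, so an exhausted source never
      * compares low and the loop ends once all three are done.
           PERFORM READ-EDI
           PERFORM READ-PAPER
           PERFORM READ-RESUB
           PERFORM UNTIL EDI-CLAIM-ID = HIGH-VALUES
                     AND PAPER-CLAIM-ID = HIGH-VALUES
                     AND RESUB-CLAIM-ID = HIGH-VALUES
      * Assuming MG-SOURCE-CODE is a field of MERGED-RECORD, the
      * record is copied first and tagged second: WRITE ... FROM
      * would overwrite a source code moved in beforehand.
               EVALUATE TRUE
                   WHEN EDI-CLAIM-ID <= PAPER-CLAIM-ID
                    AND EDI-CLAIM-ID <= RESUB-CLAIM-ID
                       MOVE EDI-RECORD TO MERGED-RECORD
                       MOVE 'E' TO MG-SOURCE-CODE
                       WRITE MERGED-RECORD
                       PERFORM READ-EDI
                   WHEN PAPER-CLAIM-ID <= EDI-CLAIM-ID
                    AND PAPER-CLAIM-ID <= RESUB-CLAIM-ID
                       MOVE PAPER-RECORD TO MERGED-RECORD
                       MOVE 'P' TO MG-SOURCE-CODE
                       WRITE MERGED-RECORD
                       PERFORM READ-PAPER
                   WHEN OTHER
                       MOVE RESUB-RECORD TO MERGED-RECORD
                       MOVE 'R' TO MG-SOURCE-CODE
                       WRITE MERGED-RECORD
                       PERFORM READ-RESUB
               END-EVALUATE
           END-PERFORM.
```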
Step 3: Multi-File Validation
This is the core of the pipeline. Each claim is matched against the PROVIDER-FILE (VSAM KSDS) and MEMBER-FILE (VSAM KSDS) using random access, while the claims themselves are processed sequentially:
```
Claims (sequential) ──┐
                      ├── CLAIM-VALIDATE ──> Validated claims
Provider (VSAM KSDS) ─┤                  ──> Denied claims
Member (VSAM KSDS) ───┘                  ──> Exception report
```
The validation performs six checks for each claim:
- Provider exists: Look up PRV-PROVIDER-ID in PROVIDER-FILE
- Provider is active: Check PRV-CONTRACT-STATUS = 'IN'
- Member exists: Look up MBR-MEMBER-ID in MEMBER-FILE
- Member is active: Check MBR-STATUS = 'AC' and service date within coverage dates
- Provider-member match: Verify provider is in the member's plan network
- Procedure code valid: Look up CLM-PROC-CODE in the procedure code table
Each check can result in a specific denial reason code, and the first failure stops further validation (fail-fast pattern).
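The fail-fast pattern can be sketched as a chain of checks in which each later check runs only if no denial code has been set. The program below is a minimal illustration and not MedClaim's actual code: the flag fields stand in for the real file lookups, only the first three of the six checks are shown, and the denial codes PRVNF and PRVIN are invented for the example (only MBRNF appears in this case study).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FAILFAST.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical stand-ins for the results of the file lookups.
       01  WS-PROVIDER-FOUND     PIC X     VALUE 'Y'.
       01  WS-CONTRACT-STATUS    PIC XX    VALUE 'IN'.
       01  WS-MEMBER-FOUND       PIC X     VALUE 'N'.
       01  WS-LOOKUP-COUNT       PIC 9     VALUE 0.
      * MBRNF is from the case study; PRVNF and PRVIN are invented.
       01  WS-DENIAL-CODE        PIC X(5)  VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VALIDATE-CLAIM
           DISPLAY 'DENIAL CODE: [' WS-DENIAL-CODE ']'
           DISPLAY 'LOOKUPS PERFORMED: ' WS-LOOKUP-COUNT
           STOP RUN.
       VALIDATE-CLAIM.
           MOVE SPACES TO WS-DENIAL-CODE
           PERFORM CHECK-PROVIDER
      * Each later check runs only if no denial code is set yet.
           IF WS-DENIAL-CODE = SPACES
               PERFORM CHECK-MEMBER
           END-IF.
       CHECK-PROVIDER.
      * A real program would READ PROVIDER-FILE by key here.
           ADD 1 TO WS-LOOKUP-COUNT
           IF WS-PROVIDER-FOUND NOT = 'Y'
               MOVE 'PRVNF' TO WS-DENIAL-CODE
           ELSE
               IF WS-CONTRACT-STATUS NOT = 'IN'
                   MOVE 'PRVIN' TO WS-DENIAL-CODE
               END-IF
           END-IF.
       CHECK-MEMBER.
      * A real program would READ MEMBER-FILE by key here.
           ADD 1 TO WS-LOOKUP-COUNT
           IF WS-MEMBER-FOUND NOT = 'Y'
               MOVE 'MBRNF' TO WS-DENIAL-CODE
           END-IF.
```

With the sample flags, the provider check passes and the member check sets MBRNF. Changing WS-PROVIDER-FOUND to 'N' leaves the member lookup unperformed, which is the saving the pattern is meant to deliver.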
Step 7: Multi-Level Control Break Report
The daily adjudication report uses three control break levels:
```
Level 1: Denial Reason Code
  Level 2: Provider ID
    Level 3: Detail (individual claims)

For each denial reason:
  Count of claims denied for that reason
  Total billed amount denied
  For each provider within that reason:
    Count of denials from that provider
    Total denied amount from that provider
    Individual claim details

Grand total: Total claims processed, approved, denied, pending
```
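The two-level break logic behind this report structure can be sketched as follows. This is a simplified illustration under stated assumptions rather than the production CLAIM-REPORT program: the denied claims live in a small hard-coded table instead of a file, the record layout and sample values are invented, and the grand totals are omitted. The input is assumed to be sorted by denial reason and then by provider ID.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CTLBREAK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical denied claims, already sorted by reason code
      * and then provider ID. Layout: reason(5) provider(4) amt(5).
       01  WS-SAMPLE-DATA.
           05  FILLER  PIC X(14) VALUE 'MBRNFP00100150'.
           05  FILLER  PIC X(14) VALUE 'MBRNFP00100200'.
           05  FILLER  PIC X(14) VALUE 'MBRNFP00200075'.
           05  FILLER  PIC X(14) VALUE 'PRVNFP00100300'.
       01  WS-CLAIM-TABLE REDEFINES WS-SAMPLE-DATA.
           05  WS-CLAIM OCCURS 4 TIMES.
               10  WS-REASON     PIC X(5).
               10  WS-PROVIDER   PIC X(4).
               10  WS-AMOUNT     PIC 9(5).
       01  WS-IDX              PIC 9(2) VALUE 1.
       01  WS-PREV-REASON      PIC X(5) VALUE SPACES.
       01  WS-PREV-PROVIDER    PIC X(4) VALUE SPACES.
       01  WS-PROV-COUNT       PIC 9(5) VALUE 0.
       01  WS-PROV-TOTAL       PIC 9(7) VALUE 0.
       01  WS-REASON-COUNT     PIC 9(5) VALUE 0.
       01  WS-REASON-TOTAL     PIC 9(7) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Prime the saved keys so the first record causes no break.
           MOVE WS-REASON(1) TO WS-PREV-REASON
           MOVE WS-PROVIDER(1) TO WS-PREV-PROVIDER
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 4
      * A change in the major key forces the minor break first.
               IF WS-REASON(WS-IDX) NOT = WS-PREV-REASON
                   PERFORM PROVIDER-BREAK
                   PERFORM REASON-BREAK
               ELSE
                   IF WS-PROVIDER(WS-IDX) NOT = WS-PREV-PROVIDER
                       PERFORM PROVIDER-BREAK
                   END-IF
               END-IF
               MOVE WS-REASON(WS-IDX) TO WS-PREV-REASON
               MOVE WS-PROVIDER(WS-IDX) TO WS-PREV-PROVIDER
               DISPLAY '    CLAIM AMOUNT ' WS-AMOUNT(WS-IDX)
               ADD 1 TO WS-PROV-COUNT WS-REASON-COUNT
               ADD WS-AMOUNT(WS-IDX)
                   TO WS-PROV-TOTAL WS-REASON-TOTAL
           END-PERFORM
      * Flush the totals for the final provider and reason groups.
           PERFORM PROVIDER-BREAK
           PERFORM REASON-BREAK
           STOP RUN.
       PROVIDER-BREAK.
           DISPLAY '  PROVIDER ' WS-PREV-PROVIDER
               ' COUNT ' WS-PROV-COUNT
               ' TOTAL ' WS-PROV-TOTAL
           MOVE 0 TO WS-PROV-COUNT WS-PROV-TOTAL.
       REASON-BREAK.
           DISPLAY 'REASON ' WS-PREV-REASON
               ' COUNT ' WS-REASON-COUNT
               ' TOTAL ' WS-REASON-TOTAL
           MOVE 0 TO WS-REASON-COUNT WS-REASON-TOTAL.
```

Two details tend to cause bugs in this pattern: a break on the higher-level key must trigger the lower-level break first, and the last group's totals must be flushed after the loop ends.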
The Data Quality Crisis of 2023
In September 2023, MedClaim's claims denial rate suddenly spiked from the normal 8% to 23%. The exception report showed thousands of claims denied with reason code 'MBRNF' (member not found).
Investigation
James Okafor's team traced the problem through the pipeline:
- Step 3 (CLAIM-VALIDATE) was correctly reporting member-not-found
- MEMBER-FILE showed the members existed with active status
- The lookup was failing because the claim's MEMBER-ID field was padded with trailing spaces differently from the MEMBER-FILE key
Root cause: The EDI translation program had been updated to use a new field mapping that left-justified the member ID in a 15-byte field (previously 12 bytes). The 3 extra bytes of space padding caused the key lookup to fail: an ID such as 'MBR12345' padded out to 15 bytes did not match the same ID padded to the 12-byte key in MEMBER-FILE.
The Fix
Sarah Kim designed a key normalization step that runs before validation:
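```cobol
       NORMALIZE-MEMBER-ID.
      * Strip trailing spaces and re-pad to exact key length.
      * WS-NORMALIZED-ID is assumed to be defined at the key length.
           MOVE SPACES TO WS-NORMALIZED-ID
      * TALLYING adds to the counter, so it must start at zero.
           MOVE ZERO TO WS-CHAR-POS
      * Count the characters before the first space. This assumes
      * IDs are left-justified and contain no embedded spaces.
           INSPECT CLM-MEMBER-ID
               TALLYING WS-CHAR-POS
               FOR CHARACTERS BEFORE INITIAL SPACE
           IF WS-CHAR-POS > ZERO
               MOVE CLM-MEMBER-ID(1:WS-CHAR-POS)
                   TO WS-NORMALIZED-ID
           END-IF
           MOVE WS-NORMALIZED-ID TO CLM-MEMBER-ID.
```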
Prevention
The team added a reconciliation step that compares daily claim volumes by source and the daily denial rate against 30-day moving averages. A denial rate more than 2 percentage points above the 30-day average triggers an alert:
```cobol
       CHECK-DENIAL-RATE.
      * Skip the check on an empty run to avoid dividing by zero.
           IF WS-TOTAL-COUNT > ZERO
               COMPUTE WS-CURRENT-RATE =
                   (WS-DENIED-COUNT * 100) / WS-TOTAL-COUNT
               IF WS-CURRENT-RATE >
                       (WS-30DAY-AVG-RATE + 2)
                   DISPLAY '*** ALERT: Denial rate '
                       WS-CURRENT-RATE
                       '% exceeds 30-day average '
                       WS-30DAY-AVG-RATE '% by > 2 points'
                   PERFORM SEND-OPERATIONS-ALERT
               END-IF
           END-IF.
```
Performance Metrics
| Pipeline Step | Elapsed Time | Records/Minute |
|---|---|---|
| CLAIM-SORT | 3 min | N/A (sort utility) |
| CLAIM-MERGE | 5 min | 150,000 |
| CLAIM-VALIDATE | 28 min | 53,571 |
| CLAIM-PRICE | 12 min | 125,000 |
| CLAIM-ACCUM | 15 min | 100,000 |
| CLAIM-SPLIT | 4 min | 375,000 |
| CLAIM-REPORT | 8 min | N/A (report generation) |
| Total | 75 min | — |
CLAIM-VALIDATE is the bottleneck because each claim requires two random VSAM reads (provider and member). The team uses VSAM Local Shared Resources (LSR) buffering to keep frequently accessed provider and member records in memory, reducing physical I/O.
Checkpoint/Restart Strategy
The CLAIM-VALIDATE step checkpoints every 5,000 claims:
Checkpoint record:
- Last claim ID processed
- Counts: approved, denied, pending, error
- Running totals: billed amount, allowed amount
- Timestamp
On restart:
- Read checkpoint file
- Position claim input to last claim + 1
- Restore all counters and totals
- Resume processing
The checkpoint file is a simple sequential file with one record, overwritten at each checkpoint. On restart, the program reads it once, then processes normally.
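The checkpoint cycle described above can be sketched as a small program. It is an illustration under several assumptions and not the CLAIM-VALIDATE code itself: claims are simulated by a counter instead of an input file (so repositioning the input is not shown), the file name CHKPT.DAT and all field names are hypothetical, only some of the listed checkpoint fields are included, and the run stops at 12,000 claims so that two checkpoints are taken.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CHKPTDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * OPTIONAL lets the first run start without a checkpoint file.
           SELECT OPTIONAL CHKPT-FILE ASSIGN TO 'CHKPT.DAT'
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-CHKPT-STATUS.
       DATA DIVISION.
       FILE SECTION.
       FD  CHKPT-FILE.
       01  CHKPT-RECORD.
           05  CK-LAST-CLAIM-NO    PIC 9(9).
           05  CK-APPROVED-COUNT   PIC 9(9).
           05  CK-DENIED-COUNT     PIC 9(9).
       WORKING-STORAGE SECTION.
       01  WS-CHKPT-STATUS         PIC XX.
       01  WS-CLAIM-NO             PIC 9(9) VALUE 0.
       01  WS-APPROVED-COUNT       PIC 9(9) VALUE 0.
       01  WS-DENIED-COUNT         PIC 9(9) VALUE 0.
       01  WS-SINCE-CHKPT          PIC 9(5) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM RESTORE-CHECKPOINT
           DISPLAY 'STARTING AFTER CLAIM: ' WS-CLAIM-NO
      * Simulated processing: every claim is counted as approved.
           PERFORM UNTIL WS-CLAIM-NO >= 12000
               ADD 1 TO WS-CLAIM-NO
               ADD 1 TO WS-APPROVED-COUNT
               ADD 1 TO WS-SINCE-CHKPT
               IF WS-SINCE-CHKPT >= 5000
                   PERFORM TAKE-CHECKPOINT
               END-IF
           END-PERFORM
           DISPLAY 'CLAIMS PROCESSED: ' WS-CLAIM-NO
           STOP RUN.
       RESTORE-CHECKPOINT.
      * Status 00 means the file exists; restore the saved state.
           OPEN INPUT CHKPT-FILE
           IF WS-CHKPT-STATUS = '00'
               READ CHKPT-FILE
                   AT END
                       CONTINUE
                   NOT AT END
                       MOVE CK-LAST-CLAIM-NO TO WS-CLAIM-NO
                       MOVE CK-APPROVED-COUNT TO WS-APPROVED-COUNT
                       MOVE CK-DENIED-COUNT TO WS-DENIED-COUNT
               END-READ
           END-IF
           CLOSE CHKPT-FILE.
       TAKE-CHECKPOINT.
      * OPEN OUTPUT replaces the file, so it always holds one record.
           OPEN OUTPUT CHKPT-FILE
           MOVE WS-CLAIM-NO TO CK-LAST-CLAIM-NO
           MOVE WS-APPROVED-COUNT TO CK-APPROVED-COUNT
           MOVE WS-DENIED-COUNT TO CK-DENIED-COUNT
           WRITE CHKPT-RECORD
           CLOSE CHKPT-FILE
           MOVE 0 TO WS-SINCE-CHKPT.
```

Running the sketch a second time without deleting CHKPT.DAT resumes after claim 10,000, the last checkpoint taken, instead of starting over. A production job would presumably also clear the checkpoint after a successful run so that the next day's run starts from the beginning.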
Lessons Learned
- Key normalization is not optional: Any field used for cross-file matching must be normalized. Trailing spaces, leading zeros, case, and encoding must be consistent across all sources.
- Fail-fast validation saves time: Checking provider before member (and stopping on first failure) avoids unnecessary VSAM reads for claims that will be denied anyway.
- Monitoring beats testing: The 30-day moving average alert caught the September 2023 spike within 4 hours. Regression testing of the EDI change had not caught the field-width issue because test data used short member IDs that fit within both the old and new field sizes.
- Multi-file processing amplifies data quality issues: When one file has clean data and another has dirty data, the matching step exposes every inconsistency. Data quality in the source files is the single biggest factor in adjudication accuracy.
- Control break reports are operational tools: The denial-by-provider-by-reason report is not just documentation; it is the primary tool operations uses to identify billing problems, provider issues, and system errors.
Discussion Questions
- The pipeline has 7 sequential steps. Which steps could be parallelized? What dependencies prevent full parallelization?
- CLAIM-VALIDATE uses random VSAM access for provider and member lookups. Under what circumstances would loading these reference files into COBOL tables (in-memory) be preferable? What are the limits?
- The September 2023 incident was caused by a field-width change in an upstream system. How would you design an interface contract that prevents this type of issue?
- James Okafor's team processes 25,000 claims daily. If volume grew to 250,000 daily, which pipeline steps would hit bottlenecks first? How would you scale?
- Compare this batch pipeline approach with a real-time adjudication model (processing each claim as it arrives via a CICS transaction). What are the trade-offs in accuracy, performance, and operational complexity?