Case Study 10.2: MedClaim's Malformed Claims Defense

Background

MedClaim Health Services receives medical claims electronically from over 600 healthcare providers. Each provider uses different billing software, resulting in wide variation in data quality. In 2022, before James Okafor overhauled the intake process, approximately 3% of claims entering the system contained data quality issues — ranging from minor formatting problems to completely garbled records.

At 500,000 claims per month, 3% meant 15,000 problematic claims. Some caused ABENDs in downstream programs. Others passed through validation but produced incorrect payments. The worst cases corrupted the CLAIM-MASTER file when a malformed packed-decimal field was written to VSAM.

The Incident That Sparked the Overhaul

On September 7, 2022, Provider #4472 (a large hospital system) upgraded their billing software. The upgrade changed the format of the CLM-BILLED-AMT field from EBCDIC display numeric to ASCII display numeric — but only for claims submitted after 3 PM that day.

The CLM-INTAKE program had no validation on numeric fields. It accepted the ASCII data and wrote it to the CLAIM-MASTER file. When CLM-ADJUD attempted to use CLM-BILLED-AMT in a COMPUTE statement, the ASCII digits, which are not valid numeric data on an EBCDIC system, caused an S0C7 ABEND. Worse, 1,247 claims with corrupted financial data had already been written to CLAIM-MASTER before the ABEND stopped processing.
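
The byte values show why the ASCII digits were fatal. In EBCDIC the digits 0 through 9 are X'F0' through X'F9'; in ASCII they are X'30' through X'39'. The fragment below is a minimal sketch, not code from CLM-INTAKE: the WS-DEMO names are hypothetical, and it assumes an EBCDIC platform such as z/OS.

       01  WS-DEMO-BYTES            PIC X(04).
       01  WS-DEMO-AMT REDEFINES WS-DEMO-BYTES
                                    PIC 9(04).

      *    EBCDIC digits '1234': the class test passes.
           MOVE X'F1F2F3F4' TO WS-DEMO-BYTES
           IF WS-DEMO-AMT IS NUMERIC
               DISPLAY 'EBCDIC DIGITS: NUMERIC'
           END-IF

      *    ASCII digits '1234': on an EBCDIC system these bytes are
      *    not valid zoned decimal, so the class test fails.
           MOVE X'31323334' TO WS-DEMO-BYTES
           IF WS-DEMO-AMT IS NOT NUMERIC
               DISPLAY 'ASCII DIGITS: NOT NUMERIC'
           END-IF

Using the second value in arithmetic without the class test is what produces the S0C7 data exception.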

Recovery took 14 hours. The team had to:

1. Identify all 1,247 corrupted claims
2. Restore the CLAIM-MASTER VSAM cluster from backup
3. Re-apply all valid claims from the day
4. Contact Provider #4472 to resubmit the affected claims in the correct format

James's Multi-Layered Defense

James designed a three-layer validation pipeline in CLM-INTAKE that he calls the "airlock" — nothing gets through to CLAIM-MASTER without passing all three layers.

Layer 1: Field-Level Validation

Every field in every claim is individually validated:

       3100-VALIDATE-FIELDS.
           PERFORM 3110-VAL-CLAIM-ID
           PERFORM 3120-VAL-CLAIM-TYPE
           PERFORM 3130-VAL-MEMBER-ID
           PERFORM 3140-VAL-PROVIDER-NPI
           PERFORM 3150-VAL-BILLED-AMOUNT
           PERFORM 3160-VAL-DIAGNOSIS-CODES
           PERFORM 3170-VAL-PROCEDURE-CODE
           PERFORM 3180-VAL-SERVICE-DATE
           PERFORM 3190-VAL-PROVIDER-TAX-ID.

Key technique — validate numeric fields before moving them to COMP-3:

       3150-VAL-BILLED-AMOUNT.
      *    WHEN clauses are tested in order, so the range checks
      *    run only after the field has passed the NUMERIC test.
           EVALUATE TRUE
               WHEN WS-RAW-BILLED-AMT IS NOT NUMERIC
                   PERFORM 3900-ADD-ERROR
                   MOVE 'BILLED AMOUNT NOT NUMERIC'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
               WHEN WS-RAW-BILLED-AMT NOT > ZERO
                   PERFORM 3900-ADD-ERROR
                   MOVE 'BILLED AMOUNT NOT POSITIVE'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
               WHEN WS-RAW-BILLED-AMT > 9999999.99
                   PERFORM 3900-ADD-ERROR
                   MOVE 'BILLED AMOUNT EXCEEDS MAXIMUM'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
           END-EVALUATE.
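
The paragraph above relies on 3900-ADD-ERROR to advance WS-VAL-ERROR-COUNT before each message is stored, and the payoff comes when a clean value is finally moved to packed decimal. The case study shows neither piece, so what follows is a sketch under assumptions: the PICTURE clauses, the table size of 20, the COMP-3 target WS-BILLED-AMT-PACKED, and the paragraph 3200-CONVERT-BILLED-AMT are all hypothetical.

       01  WS-VAL-ERROR-COUNT       PIC 9(02) VALUE ZERO.
       01  WS-VAL-ERRORS.
           05  WS-VAL-ERR-MSG       PIC X(40) OCCURS 20 TIMES.
       01  WS-BILLED-AMT-PACKED     PIC S9(7)V99 COMP-3.

       3900-ADD-ERROR.
      *    Advance the count, but never past the table size, so the
      *    subscripted MOVE that follows stays in bounds. Once the
      *    table is full, later messages overwrite the last entry.
           IF WS-VAL-ERROR-COUNT < 20
               ADD 1 TO WS-VAL-ERROR-COUNT
           END-IF.

       3200-CONVERT-BILLED-AMT.
      *    Convert to packed decimal only when field validation
      *    recorded no errors for this claim.
           IF WS-VAL-ERROR-COUNT = ZERO
               MOVE WS-RAW-BILLED-AMT TO WS-BILLED-AMT-PACKED
           END-IF.

Keeping the conversion behind the error count means a value that failed the NUMERIC test never reaches a COMP-3 field.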

Layer 2: Cross-Field Validation

Relationships between fields are checked:

Pharmacy claims (CLM-TYPE = 'RX') must not have surgical procedure codes
Service date must be on or before the received date
If CLM-TYPE is 'DN' (dental), the diagnosis code must be in the dental range
The billed service must be consistent with the specialties registered for the provider's NPI
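
Two of these rules might be coded as follows. This is a sketch, not MedClaim's code: the field names, the paragraph numbers, and the procedure-code range treated as "surgical" are assumptions. It reuses the 3900-ADD-ERROR pattern from Layer 1 and assumes both dates are stored as YYYYMMDD, so a simple comparison orders them correctly.

       01  WS-CLM-TYPE              PIC X(02).
           88  CLM-IS-PHARMACY          VALUE 'RX'.
       01  WS-PROC-CODE             PIC X(05).
      *    Illustrative range only; confirm against the code set
      *    actually in use.
           88  PROC-IS-SURGICAL         VALUE '10021' THRU '69990'.
       01  WS-SERVICE-DATE          PIC 9(08).
       01  WS-RECEIVED-DATE         PIC 9(08).

       3210-XVAL-TYPE-VS-PROC.
      *    The NUMERIC test keeps alphanumeric codes out of the
      *    range comparison.
           IF CLM-IS-PHARMACY
              AND WS-PROC-CODE IS NUMERIC
              AND PROC-IS-SURGICAL
               PERFORM 3900-ADD-ERROR
               MOVE 'RX CLAIM HAS SURGICAL PROCEDURE'
                   TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
           END-IF.

       3220-XVAL-SERVICE-DATE.
           IF WS-SERVICE-DATE > WS-RECEIVED-DATE
               PERFORM 3900-ADD-ERROR
               MOVE 'SERVICE DATE AFTER RECEIVED DATE'
                   TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
           END-IF.

Cross-field checks like these only make sense after Layer 1 has passed, since they compare fields whose individual formats are already trusted.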

Layer 3: Referential Validation

Claims are checked against reference files:

Member ID must exist in MEMBER-FILE and have active coverage
Provider NPI must exist in PROVIDER-FILE and be in active status
Group number must be valid for the member
Procedure code must exist in the procedure code table
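
A referential check is typically a keyed read against the reference file. The sketch below assumes MEMBER-FILE is an indexed (VSAM KSDS) file opened for input with random or dynamic access. The record layout, the key field MBR-ID, the coverage dates, and the working-storage names WS-MEMBER-ID and WS-SERVICE-DATE are hypothetical, and "active coverage" is interpreted here as coverage in force on the service date.

       FD  MEMBER-FILE.
       01  MEMBER-RECORD.
           05  MBR-ID               PIC X(10).
           05  MBR-COV-START        PIC 9(08).
           05  MBR-COV-END          PIC 9(08).

       3310-REF-MEMBER.
           MOVE WS-MEMBER-ID TO MBR-ID
           READ MEMBER-FILE
               INVALID KEY
                   PERFORM 3900-ADD-ERROR
                   MOVE 'MEMBER ID NOT ON FILE'
                       TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
               NOT INVALID KEY
      *            Member exists; confirm coverage on the service
      *            date (dates assumed to be YYYYMMDD).
                   IF WS-SERVICE-DATE < MBR-COV-START
                      OR WS-SERVICE-DATE > MBR-COV-END
                       PERFORM 3900-ADD-ERROR
                       MOVE 'NO ACTIVE COVERAGE ON SERVICE DATE'
                           TO WS-VAL-ERR-MSG(WS-VAL-ERROR-COUNT)
                   END-IF
           END-READ.

The provider lookup against PROVIDER-FILE would follow the same shape, with a status check in place of the date comparison.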

The Consecutive Error Circuit Breaker

James's most innovative defense is the consecutive error counter:

       01  WS-CONSEC-ERRORS         PIC 9(03) VALUE ZERO.
       01  WS-MAX-CONSEC            PIC 9(03) VALUE 10.

When 10 claims in a row fail validation, CLM-INTAKE stops processing and issues a diagnostic:

*** CIRCUIT BREAKER TRIPPED ***
10 CONSECUTIVE VALIDATION FAILURES
LAST PROVIDER: 4472
LAST ERROR: BILLED AMOUNT NOT NUMERIC
POSSIBLE CAUSE: FORMAT CHANGE BY PROVIDER
ACTION: CONTACT PROVIDER, VERIFY FILE FORMAT

This diagnostic would have caught the Provider #4472 incident within the first 10 records instead of after 1,247.
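
The counter logic itself is small. The following is a sketch, assuming a per-claim flag WS-CLAIM-VALID-SW set by the three validation layers; that flag, the breaker switch, and the paragraph names are hypothetical. The key detail is that any valid claim resets the count, so only an unbroken run of failures trips the breaker.

       01  WS-CLAIM-VALID-SW        PIC X(01).
           88  CLAIM-IS-VALID           VALUE 'Y'.
       01  WS-BREAKER-SW            PIC X(01) VALUE 'N'.
           88  BREAKER-TRIPPED          VALUE 'Y'.

       3950-CHECK-CIRCUIT-BREAKER.
           IF CLAIM-IS-VALID
      *        One good claim breaks the run.
               MOVE ZERO TO WS-CONSEC-ERRORS
           ELSE
               ADD 1 TO WS-CONSEC-ERRORS
               IF WS-CONSEC-ERRORS >= WS-MAX-CONSEC
                   SET BREAKER-TRIPPED TO TRUE
                   PERFORM 3960-WRITE-BREAKER-DIAGNOSTIC
               END-IF
           END-IF.

The main read loop would then run until end of file or BREAKER-TRIPPED, whichever comes first.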

Results

Metric                                          Before (2022)  After (2023)
----------------------------------------------  -------------  ---------------------------------------
Claims causing downstream ABENDs                50-80/month    0
Claims with corrupted financial data in MASTER  200-400/month  0
Incorrect payments due to data quality          15-25/month    1-2/month
Overall rejection rate                          3.0%           2.8% (same rejects, now caught earlier)
Time to detect provider format changes          Hours to days  Minutes (circuit breaker)
CLAIM-MASTER backup restores needed             3-4/year       0

The Reject Analysis Dashboard

Sarah Kim, the business analyst, built a report from the reject file that categorizes errors by provider, error type, and frequency. This report helps MedClaim proactively contact providers with recurring quality issues.

Sample output:

PROVIDER  ERROR TYPE            COUNT  % OF THEIR CLAIMS
--------  --------------------  -----  -----------------
4472      NON-NUMERIC AMOUNT      847  12.3%
2891      INVALID MEMBER ID       234   4.1%
5544      MISSING DIAGNOSIS CODE  189   2.8%
3217      EXPIRED PROVIDER NPI    156   3.4%

Lessons Learned

Validate before you store. Once bad data reaches the master file, recovery is expensive. Validate at the intake boundary.
The NUMERIC test is your S0C7 vaccine. Testing every numeric field with IS NUMERIC before arithmetic or COMP-3 conversion prevents S0C7, one of the most common mainframe ABENDs.
Consecutive errors reveal systemic problems. Scattered individual errors might be normal variation. Ten in a row means the input is fundamentally broken.
Reject analysis turns defense into offense. By analyzing reject patterns, MedClaim can work with providers to improve data quality at the source.
Layer your defenses. No single check catches everything. Field validation catches format errors, cross-field catches logical errors, referential catches integration errors.

Discussion Questions

James chose 10 as the consecutive error threshold. How would you determine the optimal threshold? What factors should influence this number?
The NUMERIC test catches ASCII-vs-EBCDIC issues. What other character encoding problems can occur when receiving data from external systems?
Layer 3 (referential validation) requires reading from MEMBER-FILE and PROVIDER-FILE for every claim. What is the performance impact, and how would you mitigate it?
Should MedClaim automatically block a provider after repeated data quality failures, or is human intervention always required? What are the business and ethical considerations?
How would this validation pipeline need to change if MedClaim migrated from file-based batch processing to real-time API-based claim submission?