Case Study 2: The Null Character Mystery
The Problem
Sarah Kim, MedClaim's business analyst, discovered during a weekly reconciliation that claims for procedure code "99214" (office visit, established patient) were consistently being paid at 80% of the contracted rate instead of 100%. The discrepancy affected approximately 2,800 claims over a two-month period, totaling $340,000 in underpayments to providers.
The puzzling part: the adjudication program (CLM-ADJUD) had not changed in over a year. And 99214 was classified as "PREVENTIVE" in the procedure code table, which should receive 100% coverage.
The Investigation
Phase 1: Verify the Business Rules
James Okafor first confirmed the expected behavior:
SELECT PROC_CODE, PROC_CATEGORY, COVERAGE_PCT
FROM PROCEDURE_TABLE
WHERE PROC_CODE = '99214';
Result: 99214 PREVENTIVE 1.00
The table was correct. The bug was in the program's behavior, not the reference data.
Phase 2: Add Targeted Debug Output
James compiled CLM-ADJUD with debug level 2 and processed a test claim for procedure 99214:
DBG2: CLAIM=CLM000098765
DBG2: PROC-CODE=99214
DBG2: PROC-CATEGORY=>PREVENTIVE <
DBG2: COVERAGE-PCT=0.80
The category showed "PREVENTIVE" but the coverage was 0.80 (the WHEN OTHER default). The EVALUATE was not matching "PREVENTIVE."
Phase 3: Inspect the Data Closely
James added hex display logic for the category field:
DBG3: PROC-CATEGORY HEX=D7D9C5E5C5D5E3C9E5C500
DBG3: PROC-CATEGORY LEN=11
The last byte was X'00' — a null character. The field contained PREVENTIVE followed by a null, making it PREVENTIVE\0 (11 characters with a null terminator).
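The hex display logic James added to CLM-ADJUD is not shown in the source. As a rough illustration of the same idea, the Java sketch below dumps the 11 bytes reported in the DBG3 line and locates the first X'00'. The class and method names are hypothetical.

```java
// Sketch only: hex-dump a fixed-width field so non-printable bytes become visible.
final class FieldHexDump {
    /** Renders each byte as two uppercase hex digits, for example 0x00 becomes "00". */
    static String toHex(byte[] field) {
        StringBuilder sb = new StringBuilder(field.length * 2);
        for (byte b : field) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns the offset of the first X'00' byte, or -1 if there is none. */
    static int firstLowValue(byte[] field) {
        for (int i = 0; i < field.length; i++) {
            if (field[i] == 0x00) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // The 11 bytes from the DBG3 line: EBCDIC "PREVENTIVE" followed by X'00'.
        byte[] category = {
            (byte) 0xD7, (byte) 0xD9, (byte) 0xC5, (byte) 0xE5, (byte) 0xC5,
            (byte) 0xD5, (byte) 0xE3, (byte) 0xC9, (byte) 0xE5, (byte) 0xC5, 0x00
        };
        System.out.println("HEX=" + toHex(category));
        System.out.println("LEN=" + category.length);
        System.out.println("first X'00' at offset " + firstLowValue(category));
    }
}
```

A character display cannot distinguish the last byte from padding, while the hex form shows the trailing 00 where a space would appear as 40.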
The EVALUATE comparison:
- COBOL EVALUATE compares PREVENTIVE\0 (from the field) against 'PREVENTIVE' (literal, 10 chars + space padding)
- PREVENTIVE\0 != PREVENTIVE because X'00' != X'40' (null != space)
- Falls through to WHEN OTHER, which sets 0.80
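The comparison rule in the list above can be modelled outside COBOL. The Java sketch below is an illustration under stated assumptions, not the CLM-ADJUD code: it treats both operands as raw EBCDIC bytes and pads the shorter one on the right with X'40', which is how COBOL compares alphanumeric operands of unequal length. `CobolCompareModel` and `cobolEquals` are hypothetical names.

```java
import java.util.Arrays;

// Sketch only: models COBOL alphanumeric equality on raw EBCDIC bytes.
final class CobolCompareModel {
    static final byte EBCDIC_SPACE = 0x40;

    /** Byte at position i, or an EBCDIC space once i runs past the operand. */
    private static byte at(byte[] operand, int i) {
        return i < operand.length ? operand[i] : EBCDIC_SPACE;
    }

    /** True when the operands match after the shorter one is space-padded on the right. */
    static boolean cobolEquals(byte[] left, byte[] right) {
        int n = Math.max(left.length, right.length);
        for (int i = 0; i < n; i++) {
            if (at(left, i) != at(right, i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // EBCDIC bytes for the 10-character literal 'PREVENTIVE'.
        byte[] literal = {
            (byte) 0xD7, (byte) 0xD9, (byte) 0xC5, (byte) 0xE5, (byte) 0xC5,
            (byte) 0xD5, (byte) 0xE3, (byte) 0xC9, (byte) 0xE5, (byte) 0xC5
        };
        // Arrays.copyOf pads with zero bytes, which reproduces the trailing X'00'.
        byte[] contaminated = Arrays.copyOf(literal, 11);
        byte[] clean = Arrays.copyOf(literal, 11);
        clean[10] = EBCDIC_SPACE;

        System.out.println("contaminated matches: " + cobolEquals(contaminated, literal)); // false
        System.out.println("clean matches: " + cobolEquals(clean, literal));               // true
    }
}
```

Only the eleventh byte differs between the two fields, and that single byte decides whether the WHEN branch or WHEN OTHER is taken.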
Phase 4: Find the Source of the Null
James traced CLM-PROC-CATEGORY upstream. It came from a message received via IBM MQ from MedClaim's new Java-based provider portal. The Java application:
// Java code sending claim data
String category = procedureLookup.getCategory(); // "PREVENTIVE"
message.setStringProperty("PROC_CATEGORY", category);
Java strings are not null-terminated (a String object stores its length explicitly), so the portal code itself did not add the null. The terminator most likely entered during serialization in the MQ message bridge, which then carried it through character conversion into the EBCDIC payload. The COBOL copybook defined CLM-PROC-CATEGORY as PIC X(11), which held exactly "PREVENTIVE" + X'00'.
Phase 5: Determine the Scope
James ran a diagnostic query:
SELECT COUNT(*) FROM CLAIM_MASTER
WHERE CLAIM_SOURCE = 'PORTAL'
AND PROC_CATEGORY LIKE '%' || X'00' || '%';
Result: 14,200 claims had null-contaminated category fields. Of those, 2,800 had procedure code 99214 (the ones Sarah caught).
Other procedure categories were affected too: "DIAGNOSTIC\0" did not match "DIAGNOSTIC" either. However, the WHEN OTHER default of 0.80 happened to equal the intended coverage for diagnostic procedures, so those claims were paid correctly by coincidence. Only the "PREVENTIVE" claims, which should have received 1.00 but got 0.80, showed a visible discrepancy.
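The masking effect can be made explicit with a small model. The Java sketch below assumes only the two categories and rates named in this case study (the real EVALUATE may list more) and asks, for each category, whether falling through to the default produces a payment that differs from the intended one. All names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Sketch only: which null-contaminated categories produce a visibly wrong payment?
final class DefaultMasking {
    // Coverage percentages named in the case study; other categories are not modelled.
    static final Map<String, Integer> INTENDED_PCT = Map.of("PREVENTIVE", 100, "DIAGNOSTIC", 80);
    static final int WHEN_OTHER_PCT = 80;

    /** True when the default rate differs from the rate the category should have received. */
    static boolean visiblyWrong(String category) {
        Integer intended = INTENDED_PCT.get(category);
        if (intended == null) {
            throw new IllegalArgumentException("unknown category: " + category);
        }
        return intended != WHEN_OTHER_PCT;
    }

    public static void main(String[] args) {
        for (String category : List.of("PREVENTIVE", "DIAGNOSTIC")) {
            String verdict = visiblyWrong(category)
                    ? "underpaid, shows up in reconciliation"
                    : "paid correctly by coincidence";
            System.out.println(category + ": " + verdict);
        }
    }
}
```

Both categories take the wrong branch, but only one of them yields a different number, which is why reconciliation caught just the PREVENTIVE claims.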
Phase 6: Apply Fixes
Immediate fix — clean input data in CLM-ADJUD:
       1500-CLEAN-INPUT-FIELDS.
           INSPECT CLM-PROC-CODE
               REPLACING ALL LOW-VALUES BY SPACES
           INSPECT CLM-PROC-CATEGORY
               REPLACING ALL LOW-VALUES BY SPACES
           INSPECT CLM-DIAG-CODE
               REPLACING ALL LOW-VALUES BY SPACES
           INSPECT CLM-MODIFIER
               REPLACING ALL LOW-VALUES BY SPACES.
Root fix — update the MQ bridge configuration to strip null terminators during character conversion.
Remediation — reprocess 2,800 underpaid claims with corrected adjudication logic, generating supplemental payments.
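The source does not show the bridge configuration used for the root fix, so no MQ setting is given here. As a stand-in, the Java sketch below shows the kind of normalization a sending component could apply before a value reaches a fixed-width PIC X(11) field: replace nulls with spaces, then pad to the field width. `FixedWidthField` and `toFixedWidth` are hypothetical helpers, not part of any MQ or JMS API, and the code assumes Java 11 or later for `stripTrailing`.

```java
// Hypothetical helper, not part of any MQ or JMS API.
final class FixedWidthField {
    /** Replaces NUL characters with spaces and right-pads the value to exactly `width` characters. */
    static String toFixedWidth(String value, int width) {
        // Mirror INSPECT ... REPLACING ALL LOW-VALUES BY SPACES, then drop trailing padding.
        String cleaned = value.replace('\0', ' ').stripTrailing();
        if (cleaned.length() > width) {
            throw new IllegalArgumentException(
                    "value longer than " + width + " characters: " + cleaned);
        }
        StringBuilder sb = new StringBuilder(cleaned);
        while (sb.length() < width) {
            sb.append(' ');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String field = toFixedWidth("PREVENTIVE\0", 11);
        System.out.println(">" + field + "< length=" + field.length());
    }
}
```

Rejecting over-long values instead of truncating them is a design choice made for this sketch; a silent truncation would be another way to produce a category that no WHEN clause matches.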
Timeline
| Day | Action |
|---|---|
| Day 1 | Sarah Kim identifies discrepancy in weekly report |
| Day 1 | James adds debug output, identifies null character |
| Day 2 | James traces source to Java/MQ interface |
| Day 2 | INSPECT fix applied to CLM-ADJUD |
| Day 3 | MQ bridge reconfigured |
| Day 3-5 | 2,800 claims reprocessed for supplemental payment |
| Day 7 | $340,000 in supplemental payments issued |
The Broader Lesson
This bug existed for two months and affected 14,200 claims, but was only detected for one specific category/coverage combination. The other affected claims were "silently wrong in a way that happened to produce the right answer" — the most dangerous type of bug.
After this incident, James implemented a data quality validation step at the beginning of every claim processing program:
       1000-VALIDATE-INPUT.
           PERFORM 1500-CLEAN-INPUT-FIELDS
           PERFORM 1600-VERIFY-CATEGORY-VALUES
           PERFORM 1700-LOG-DATA-QUALITY-METRICS.
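The bodies of 1600-VERIFY-CATEGORY-VALUES and 1700-LOG-DATA-QUALITY-METRICS are not given in the source. The Java sketch below models one plausible reading of the clean, verify, and log sequence: clean the field, check it against a list of known categories, and count each outcome. The category list, the `Outcome` values, and the decision to reject unknown values instead of defaulting are all assumptions made for illustration. It assumes Java 11 or later.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the clean / verify / log sequence; not the actual COBOL paragraphs.
final class InputValidation {
    enum Outcome { CLEAN, REPAIRED, REJECTED }

    // Assumption: only the two categories named in the case study are listed here.
    static final Set<String> KNOWN_CATEGORIES = Set.of("PREVENTIVE", "DIAGNOSTIC");

    private final Map<Outcome, Integer> metrics = new EnumMap<>(Outcome.class);

    /** Cleans one category field, checks it against the known values, and records the outcome. */
    Outcome validate(String rawField) {
        // Same effect as INSPECT ... REPLACING ALL LOW-VALUES BY SPACES, then drop the padding.
        String cleaned = rawField.replace('\0', ' ').stripTrailing();
        Outcome outcome;
        if (!KNOWN_CATEGORIES.contains(cleaned)) {
            outcome = Outcome.REJECTED;   // fail loudly instead of taking a default rate
        } else if (rawField.indexOf('\0') >= 0) {
            outcome = Outcome.REPAIRED;   // valid only after cleaning
        } else {
            outcome = Outcome.CLEAN;
        }
        metrics.merge(outcome, 1, Integer::sum);
        return outcome;
    }

    /** Number of fields seen so far with the given outcome. */
    int count(Outcome outcome) {
        return metrics.getOrDefault(outcome, 0);
    }

    public static void main(String[] args) {
        InputValidation v = new InputValidation();
        System.out.println(v.validate("PREVENTIVE\0")); // REPAIRED
        System.out.println(v.validate("DIAGNOSTIC "));  // CLEAN
        System.out.println(v.validate("PREVENTIVX "));  // REJECTED
        System.out.println("repaired so far: " + v.count(Outcome.REPAIRED));
    }
}
```

Counting REPAIRED separately from CLEAN matters here: a rising repair count would have flagged the portal data on the first day, even for categories where the payment happened to come out right.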
Discussion Questions
- Why was this bug particularly hard to detect? How did the coincidence of the WHEN OTHER default masking the error for most categories delay discovery?
- The Java developer's code was correct from a Java perspective. The COBOL developer's code was correct from a COBOL perspective. Where did the actual bug lie? If it lay in the interface between the two systems, what organizational practices could prevent interface bugs?
- Should the EVALUATE have had an explicit WHEN clause for every valid category instead of using WHEN OTHER as a catch-all? What are the trade-offs?
- James used angle brackets (> and <) around the DISPLAY of the category field but still did not initially spot the null. Why are null characters invisible in SYSOUT output? What technique finally revealed the problem?
- If MedClaim had implemented the data cleaning as part of the MQ message handler (rather than in each COBOL program), what advantages and disadvantages would that approach have?
Key Takeaway
Cross-platform interface bugs are among the most subtle and dangerous in enterprise systems. Different platforms have different assumptions about data representation — null-terminated vs. space-padded strings, ASCII vs. EBCDIC encoding, big-endian vs. little-endian integers. When systems communicate across these boundaries, defensive data validation at the receiving end is essential. Never assume that data from another platform conforms to your platform's conventions.