Case Study 2: MedClaim EDI 837 Variable-Format Claim Processing
Background
MedClaim receives approximately 200,000 electronic claims daily in ANSI X12 EDI 837 Professional format. Each claim is a stream of variable-length segments separated by tildes (~), with data elements separated by asterisks (*) and sub-elements by colons (:). The system must parse these streams into structured COBOL records for adjudication.
The Problem
EDI 837 data arrives as continuous text with no fixed field positions:
ISA*00*          *00*          *ZZ*SENDER         *ZZ*RECEIVER       *240615*1200*^*00501*000000001*0*P*:~
GS*HC*SENDER*RECEIVER*20240615*1200*1*X*005010X222A1~
ST*837*0001~
BHT*0019*00*CLAIM001*20240615*1200*CH~
CLM*PAT12345*1500.00*11:B:1*Y*A~
SV1*HC:99213:25*125.00*UN*1***1~
DTP*472*D8*20240610~
SE*6*0001~
Key challenges:
- Every segment has different element counts and meanings
- Sub-elements within elements require secondary parsing
- Element positions are logical (1st, 2nd, 3rd element), not physical byte positions
- The same parsing engine must handle hundreds of different segment types
Solution Architecture
James Okafor's team built a three-layer parsing architecture:
- Segment splitter: Uses reference modification to scan for ~ terminators and extract individual segments
- Element parser: Uses reference modification to scan for * separators within a segment and extract elements into an array
- Sub-element parser: Uses reference modification to scan for : separators within an element
Each layer uses the same core pattern: a position pointer that scans forward one byte at a time using reference modification, recording field boundaries.
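The core pattern can be sketched as a single scanning paragraph that all three layers share. This is a minimal illustration, not MedClaim's actual code: the program name, data names, table size (20 entries of 40 bytes), and the hard-coded sample data are all assumptions made for the example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EDISCAN.
      * Sketch only: every name and size here is hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SOURCE           PIC X(80).
       01  WS-SRC-LEN          PIC 9(4) COMP.
       01  WS-DELIM            PIC X.
       01  WS-POS              PIC 9(4) COMP.
       01  WS-START            PIC 9(4) COMP.
       01  WS-LEN              PIC 9(4) COMP.
       01  WS-COUNT            PIC 9(4) COMP.
       01  WS-IDX              PIC 9(4) COMP.
       01  WS-PART-TABLE.
           05  WS-PART-ENTRY   OCCURS 20 TIMES.
               10  WS-PART     PIC X(40).
               10  WS-PART-LEN PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Layer 1: split the stream into segments on "~".
           MOVE "CLM*PAT12345*1500.00*11:B:1*Y*A~DTP*472*D8*20240610~"
               TO WS-SOURCE
           MOVE 52 TO WS-SRC-LEN
           MOVE "~" TO WS-DELIM
           PERFORM SPLIT-SOURCE
           PERFORM SHOW-PARTS
      * Layer 2: split the first segment into elements on "*".
           MOVE WS-PART-LEN(1) TO WS-SRC-LEN
           MOVE WS-PART(1) TO WS-SOURCE
           MOVE "*" TO WS-DELIM
           PERFORM SPLIT-SOURCE
           PERFORM SHOW-PARTS
      * Layer 3: split table entry 4 (11:B:1) on ":".
           MOVE WS-PART-LEN(4) TO WS-SRC-LEN
           MOVE WS-PART(4) TO WS-SOURCE
           MOVE ":" TO WS-DELIM
           PERFORM SPLIT-SOURCE
           PERFORM SHOW-PARTS
           STOP RUN.
       SPLIT-SOURCE.
           MOVE 0 TO WS-COUNT
           MOVE 1 TO WS-START
      * Advance one byte at a time, testing a one-byte reference.
           PERFORM VARYING WS-POS FROM 1 BY 1
                   UNTIL WS-POS > WS-SRC-LEN
               IF WS-SOURCE(WS-POS:1) = WS-DELIM
                   PERFORM STORE-PART
               END-IF
           END-PERFORM
      * Data after the last delimiter is the final part.
           IF WS-POS > WS-START
               PERFORM STORE-PART
           END-IF.
       STORE-PART.
           COMPUTE WS-LEN = WS-POS - WS-START
           IF WS-LEN > 40
               MOVE 40 TO WS-LEN
           END-IF
      * Ignore parts beyond the table size instead of overrunning it.
           IF WS-COUNT < 20
               ADD 1 TO WS-COUNT
               MOVE SPACES TO WS-PART(WS-COUNT)
               MOVE WS-LEN TO WS-PART-LEN(WS-COUNT)
               IF WS-LEN > 0
                   MOVE WS-SOURCE(WS-START:WS-LEN)
                       TO WS-PART(WS-COUNT)
               END-IF
           END-IF
           COMPUTE WS-START = WS-POS + 1.
       SHOW-PARTS.
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-COUNT
               DISPLAY WS-IDX " [" WS-PART(WS-IDX) "]"
           END-PERFORM.
```

Run against the sample data, the first pass yields the two segments, the second yields the six parts of the CLM segment (the segment ID occupies table entry 1, so the third data element sits in entry 4), and the third yields the sub-elements 11, B, and 1. Only WS-DELIM changes between passes; SPLIT-SOURCE itself is untouched.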
Key Technical Decisions
- Reference modification over UNSTRING: UNSTRING would require multiple calls with different delimiters. Reference modification handles all three delimiter types (~ * :) with the same scanning loop, changing only the comparison character.
- Boundary validation: Every reference modification operation validates position + length against the buffer size. Sarah Kim's testing found that 0.02% of incoming EDI data contained malformed segments that would have caused storage overlays without validation.
- Performance: The full run over the 200,000 daily claims takes 12 minutes, of which parsing itself accounts for only 3 minutes.
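The boundary validation described above might look like the following sketch, in which a start position and length are checked against the buffer size before any reference modification takes place. The program name, data names, and the 40-byte buffer are assumptions for illustration; the case study does not show the team's actual validation routine.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BNDCHECK.
      * Sketch only: every name and size here is hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-BUFFER           PIC X(40) VALUE
           "CLM*PAT12345*1500.00*11:B:1*Y*A~".
       01  WS-BUF-SIZE         PIC 9(4) COMP VALUE 40.
       01  WS-START            PIC 9(4) COMP.
       01  WS-LEN              PIC 9(4) COMP.
       01  WS-FIELD            PIC X(20).
       01  WS-VALID-SW         PIC X.
           88  REF-IS-VALID    VALUE "Y".
           88  REF-IS-INVALID  VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * In range: bytes 5 through 12 hold PAT12345.
           MOVE 5 TO WS-START
           MOVE 8 TO WS-LEN
           PERFORM SAFE-EXTRACT
      * Out of range: 35 + 10 - 1 = 44 exceeds the 40-byte buffer.
           MOVE 35 TO WS-START
           MOVE 10 TO WS-LEN
           PERFORM SAFE-EXTRACT
           STOP RUN.
       SAFE-EXTRACT.
           SET REF-IS-VALID TO TRUE
      * A reference needs a start of at least 1 and a length of 1.
           IF WS-START < 1 OR WS-LEN < 1
               SET REF-IS-INVALID TO TRUE
           END-IF
      * The last byte referenced must lie inside the buffer.
           IF WS-START + WS-LEN - 1 > WS-BUF-SIZE
               SET REF-IS-INVALID TO TRUE
           END-IF
           IF REF-IS-VALID
               MOVE SPACES TO WS-FIELD
               MOVE WS-BUFFER(WS-START:WS-LEN) TO WS-FIELD
               DISPLAY "EXTRACTED: " WS-FIELD
           ELSE
               DISPLAY "REJECTED: START " WS-START " LEN " WS-LEN
           END-IF.
```

The first call extracts the claim identifier; the second is rejected before the buffer is touched, which is the case a malformed segment would otherwise turn into a storage overlay.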
Results
The reference modification-based parser replaced an older UNSTRING-based parser that was:
- 40% slower (UNSTRING overhead on variable-length segments)
- Unable to handle sub-elements without a second UNSTRING pass
- Prone to failures when element counts exceeded expectations
Discussion Questions
- Why is a three-layer architecture preferable to parsing the entire EDI stream in one pass?
- How would you handle an EDI segment that contains an escaped delimiter (e.g., a colon that is part of a data value rather than a sub-element separator)?
- What testing strategy would you use to verify the parser handles all 837 segment types correctly?
- How does this parsing approach compare to using an XML parser for the equivalent data in X12/XML format?