Case Study 2: MedClaim EDI 837 Variable-Format Claim Processing

Background

MedClaim receives approximately 200,000 electronic claims daily in ANSI X12 EDI 837 Professional format. Each claim arrives as a stream of variable-length segments terminated by tildes (~), with data elements separated by asterisks (*) and sub-elements by colons (:). The system must parse these streams into structured COBOL records for adjudication.

The Problem

EDI 837 data arrives as continuous text with no fixed field positions:

ISA*00*          *00*          *ZZ*SENDER         *ZZ*RECEIVER       *240615*1200*^*00501*000000001*0*P*:~
GS*HC*SENDER*RECEIVER*20240615*1200*1*X*005010X222A1~
ST*837*0001~
BHT*0019*00*CLAIM001*20240615*1200*CH~
CLM*PAT12345*1500.00*11:B:1*Y*A~
SV1*HC:99213:25*125.00*UN*1***1~
DTP*472*D8*20240610~
SE*6*0001~

Key challenges:

  • Every segment has a different element count and meaning
  • Sub-elements within elements require secondary parsing
  • Element positions are logical (1st, 2nd, 3rd element), not physical byte positions
  • The same parsing engine must handle hundreds of different segment types

Solution Architecture

James Okafor's team built a three-layer parsing architecture:

  1. Segment splitter: Uses reference modification to scan for ~ terminators and extract individual segments
  2. Element parser: Uses reference modification to scan for * separators within a segment and extract elements into an array
  3. Sub-element parser: Uses reference modification to scan for : separators within an element

Each layer uses the same core pattern: a position pointer that scans forward one byte at a time using reference modification, recording field boundaries.
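
The shared scan loop might look like the following minimal sketch. It is not the team's code: the program name, the data names (WS-BUF, WS-PART, and so on), and the 20-entry, 30-byte table sizes are all assumptions chosen for illustration. The same paragraph splits a segment on * and then splits one of the resulting elements on :, changing only WS-DELIM.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SCANDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names and sizes, for illustration only.
       01  WS-BUF              PIC X(80).
       01  WS-BUF-LEN          PIC 9(4) COMP.
       01  WS-DELIM            PIC X.
       01  WS-POS              PIC 9(4) COMP.
       01  WS-START            PIC 9(4) COMP.
       01  WS-LEN              PIC 9(4) COMP.
       01  WS-IDX              PIC 9(4) COMP.
       01  WS-PART-COUNT       PIC 9(4) COMP.
       01  WS-PART-TABLE.
           05  WS-PART-ENTRY   OCCURS 20 TIMES.
               10  WS-PART     PIC X(30).
               10  WS-PART-LEN PIC 9(4) COMP.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Layer 2: split one segment into elements on "*".
           MOVE "SV1*HC:99213:25*125.00*UN*1***1" TO WS-BUF
           MOVE 31  TO WS-BUF-LEN
           MOVE "*" TO WS-DELIM
           PERFORM SCAN-BUFFER
           PERFORM SHOW-PARTS
      * Layer 3: same loop on the second element, split on ":".
           MOVE WS-PART-LEN(2) TO WS-BUF-LEN
           MOVE WS-PART(2)     TO WS-BUF
           MOVE ":" TO WS-DELIM
           PERFORM SCAN-BUFFER
           PERFORM SHOW-PARTS
           STOP RUN.

       SCAN-BUFFER.
           MOVE 0 TO WS-PART-COUNT
           MOVE 1 TO WS-START
           PERFORM VARYING WS-POS FROM 1 BY 1
                   UNTIL WS-POS > WS-BUF-LEN
               IF WS-BUF(WS-POS:1) = WS-DELIM
                   PERFORM STORE-PART
                   COMPUTE WS-START = WS-POS + 1
               END-IF
           END-PERFORM
      * The final part has no trailing delimiter.
           PERFORM STORE-PART.

       STORE-PART.
           COMPUTE WS-LEN = WS-POS - WS-START
      * Cap at the table entry width; longer data is truncated.
           IF WS-LEN > 30
               MOVE 30 TO WS-LEN
           END-IF
      * Parts beyond the 20th are dropped in this sketch.
           IF WS-PART-COUNT < 20
               ADD 1 TO WS-PART-COUNT
               MOVE SPACES TO WS-PART(WS-PART-COUNT)
               MOVE WS-LEN TO WS-PART-LEN(WS-PART-COUNT)
               IF WS-LEN > 0
                   MOVE WS-BUF(WS-START:WS-LEN)
                     TO WS-PART(WS-PART-COUNT)
               END-IF
           END-IF.

       SHOW-PARTS.
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-PART-COUNT
               DISPLAY WS-IDX ": [" WS-PART(WS-IDX) "]"
           END-PERFORM.
```

Run against the SV1 segment from the sample, the first pass should yield eight parts (the segment ID, five populated elements, and two empty ones) and the second pass three parts (HC, 99213, 25). The WS-LEN > 0 guard matters because a reference modifier with a length of zero is not valid, and empty elements such as the two between 1 and 1 in SV1 would otherwise produce one.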

Key Technical Decisions

  • Reference modification over UNSTRING: UNSTRING would need a separate statement, each with its own delimiter and INTO list, at every level of the hierarchy. Reference modification handles all three delimiter types (~ * :) with the same scanning loop, changing only the comparison character.
  • Boundary validation: Before every reference modification operation, the parser checks that position + length - 1 does not exceed the buffer size. Sarah Kim's testing found that 0.02% of incoming EDI data contained malformed segments that, without this check, would have caused storage overlays.
  • Performance: The full run over the 200,000 daily claims takes 12 minutes, of which parsing accounts for only 3 minutes.
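
A boundary check of the kind described above could be sketched as follows. The paragraph name SAFE-EXTRACT, the data names, and the 40-byte buffer are hypothetical; the source does not show the team's actual validation code. The sketch also rejects a start below 1 and a length below 1, which is an added assumption beyond the position + length test the text mentions.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAFEREF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical names and sizes, for illustration only.
       01  WS-BUF              PIC X(40)
               VALUE "DTP*472*D8*20240610".
       01  WS-BUF-SIZE         PIC 9(4) COMP VALUE 40.
       01  WS-START            PIC S9(4) COMP.
       01  WS-LEN              PIC S9(4) COMP.
       01  WS-END              PIC S9(9) COMP.
       01  WS-FIELD            PIC X(20).
       01  WS-STATUS           PIC X.
           88  EXTRACT-OK      VALUE "Y".
           88  EXTRACT-BAD     VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
      * In range: bytes 12 through 19 hold the service date.
           MOVE 12 TO WS-START
           MOVE 8  TO WS-LEN
           PERFORM SAFE-EXTRACT
           PERFORM SHOW-RESULT
      * Out of range: 35 + 10 - 1 = 44, past the 40-byte buffer.
           MOVE 35 TO WS-START
           MOVE 10 TO WS-LEN
           PERFORM SAFE-EXTRACT
           PERFORM SHOW-RESULT
      * A zero length is rejected too; it is not a valid reference.
           MOVE 5 TO WS-START
           MOVE 0 TO WS-LEN
           PERFORM SAFE-EXTRACT
           PERFORM SHOW-RESULT
           STOP RUN.

       SAFE-EXTRACT.
           MOVE SPACES TO WS-FIELD
           SET EXTRACT-BAD TO TRUE
           COMPUTE WS-END = WS-START + WS-LEN - 1
      * Only touch the buffer once start and end are proven valid.
           IF WS-START >= 1 AND WS-LEN >= 1
                   AND WS-END <= WS-BUF-SIZE
               MOVE WS-BUF(WS-START:WS-LEN) TO WS-FIELD
               SET EXTRACT-OK TO TRUE
           END-IF.

       SHOW-RESULT.
           IF EXTRACT-OK
               DISPLAY "OK  [" WS-FIELD "]"
           ELSE
               DISPLAY "REJECTED: START/LENGTH OUT OF RANGE"
           END-IF.
```

Only the first of the three calls should report OK. Routing every extraction through one guarded paragraph keeps the check in a single place, at the cost of a PERFORM per field; a production parser might instead validate against the actual data length rather than the declared buffer size.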

Results

The reference modification-based parser replaced an older UNSTRING-based parser that was:

  • 40% slower (UNSTRING overhead on variable-length segments)
  • Unable to handle sub-elements without a second UNSTRING pass
  • Prone to failures when element counts exceeded expectations
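
The last point can be illustrated with a generic UNSTRING whose INTO list is fixed at compile time. This is not the old MedClaim parser, which the case study does not show; the four receiving fields and all data names are assumptions. When the segment holds more elements than there are receiving fields, the ON OVERFLOW condition is raised and the remaining elements are never extracted.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSTROVF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical layout: the INTO list is fixed at four fields.
       01  WS-SEGMENT          PIC X(40)
               VALUE "SV1*HC:99213:25*125.00*UN*1***1".
       01  WS-E1               PIC X(15).
       01  WS-E2               PIC X(15).
       01  WS-E3               PIC X(15).
       01  WS-E4               PIC X(15).
       01  WS-FILLED           PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Eight elements arrive, but only four fields can receive.
           UNSTRING WS-SEGMENT DELIMITED BY "*"
               INTO WS-E1 WS-E2 WS-E3 WS-E4
               TALLYING IN WS-FILLED
               ON OVERFLOW
                   DISPLAY "OVERFLOW: MORE ELEMENTS THAN FIELDS"
           END-UNSTRING
           DISPLAY "FIELDS FILLED: " WS-FILLED
      * The composite element is still unsplit after this pass.
           DISPLAY "E2 = [" WS-E2 "]"
           STOP RUN.
```

Here WS-E2 still contains the whole composite HC:99213:25, so a second UNSTRING delimited by ":" would be needed to reach the procedure code, which matches the second limitation in the list above.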

Discussion Questions

  1. Why is a three-layer architecture preferable to parsing the entire EDI stream in one pass?
  2. How would you handle an EDI segment that contains an escaped delimiter (e.g., a colon that is part of a data value rather than a sub-element separator)?
  3. What testing strategy would you use to verify the parser handles all 837 segment types correctly?
  4. How does this parsing approach compare to using an XML parser for the equivalent data in X12/XML format?