Key Takeaways — Chapter 14: Advanced File Techniques

The Balanced-Line Algorithm

  1. The balanced-line algorithm is the gold standard for master/transaction file updates. Two sorted files are processed in a single pass, with three-way key comparison: master < transaction (master-only), master = transaction (matched), master > transaction (transaction-only).

  2. HIGH-VALUES is the sentinel that makes the algorithm work. When a file reaches EOF, its key becomes HIGH-VALUES, forcing all remaining records from the other file to be processed. Processing ends when both keys equal HIGH-VALUES.

  3. Write to a new master file, not the old one. This provides recovery (old master is intact if the job fails), audit trail (compare old and new), and effectively free reorganization.

  4. Handle all transaction types in all paths. An "Add" transaction in the matched path means a duplicate key — handle it as an error, not silently. Every edge case will eventually occur in production.
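The four points above fit together in a single loop. The chapter's examples are presumably COBOL; the following is a minimal Python sketch of the same balanced-line logic, with HIGH-VALUES modeled as a string that sorts after every real key. The record layout (`key`, `op`, `data`) and the one-transaction-per-key assumption are illustrative simplifications, not the book's exact design.

```python
SENTINEL = "\uffff"  # stands in for COBOL HIGH-VALUES: sorts after any real key

def read_or_sentinel(it):
    """Return the next record, or a sentinel-keyed dummy at end of file."""
    return next(it, {"key": SENTINEL})

def balanced_line(masters, transactions):
    """One pass over two key-sorted streams; returns (new_master, errors).
    Transaction ops: 'A' = add, 'C' = change, 'D' = delete."""
    m_it, t_it = iter(masters), iter(transactions)
    m, t = read_or_sentinel(m_it), read_or_sentinel(t_it)
    new_master, errors = [], []
    # Loop ends only when BOTH keys have reached the sentinel.
    while not (m["key"] == SENTINEL and t["key"] == SENTINEL):
        if m["key"] < t["key"]:          # master-only: carry record forward
            new_master.append(m)
            m = read_or_sentinel(m_it)
        elif m["key"] > t["key"]:        # transaction-only
            if t["op"] == "A":
                new_master.append({"key": t["key"], **t["data"]})
            else:                        # change/delete with no matching master
                errors.append(("no-match", t))
            t = read_or_sentinel(t_it)
        else:                            # matched keys
            if t["op"] == "D":
                pass                     # drop the master record
            elif t["op"] == "C":
                new_master.append({**m, **t["data"]})
            else:                        # 'A' on an existing key = duplicate
                errors.append(("duplicate-add", t))
                new_master.append(m)     # keep the original, flag the error
            m = read_or_sentinel(m_it)
            t = read_or_sentinel(t_it)
    return new_master, errors
```

Note how the "Add" of an existing key lands in the matched path and is reported, not silently applied, and how the new master is built as a separate output rather than rewriting the old one.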

Multi-File Patterns

  1. All input files must be sorted on the same key for merge and balanced-line processing. Verify sort order at the start of processing — unsorted input produces incorrect results with no error message.

  2. Multi-file merge extends the balanced-line concept to three or more files. Compare all keys to find the lowest, write from that source, and read the next record from that source.

  3. Sequential input with random lookups is the most common multi-file pattern in claims processing, order fulfillment, and similar systems: read transactions sequentially, look up reference data by key.
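The multi-file merge in point 2 can be sketched briefly. A heap is a standard refinement of "compare all keys to find the lowest" once there are many files; the record shape here is an assumption for illustration.

```python
import heapq

def multi_merge(*sorted_files):
    """Merge any number of key-sorted record streams into one sorted stream.
    The heap holds the current record from each source; the lowest key is
    written, then the next record is read from that same source."""
    iters = [iter(f) for f in sorted_files]
    heap = []
    for i, it in enumerate(iters):
        rec = next(it, None)
        if rec is not None:
            heapq.heappush(heap, (rec["key"], i, rec))  # i breaks key ties
    while heap:
        _key, i, rec = heapq.heappop(heap)
        yield rec
        nxt = next(iters[i], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt["key"], i, nxt))
```

With two files this degenerates to the balanced-line comparison; with three or more, the heap avoids re-scanning every current key on each step.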

Control Break Processing

  1. Control breaks require sorted data and produce subtotals at each change in grouping fields. Higher-level breaks must trigger all lower-level breaks — skipping a level produces incorrect subtotals.

  2. Always process the final break after the main loop ends. The last group's subtotals and the grand totals must be printed explicitly — they are not triggered by a key change.
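Both rules above appear in this sketch of a two-level control break (region over branch). The grouping fields and report wording are illustrative, not the chapter's exact example.

```python
def control_break_report(records):
    """Two-level control break over records sorted on (region, branch).
    Returns the subtotal and grand-total lines as a list of strings."""
    lines = []
    cur_region = cur_branch = None
    branch_tot = region_tot = grand_tot = 0.0

    def break_branch():
        nonlocal branch_tot
        lines.append(f"  Branch {cur_branch} total: {branch_tot:.2f}")
        branch_tot = 0.0

    def break_region():
        nonlocal region_tot
        break_branch()        # a higher-level break triggers the lower one first
        lines.append(f"Region {cur_region} total: {region_tot:.2f}")
        region_tot = 0.0

    for rec in records:
        if cur_region is not None and rec["region"] != cur_region:
            break_region()
        elif cur_branch is not None and rec["branch"] != cur_branch:
            break_branch()
        cur_region, cur_branch = rec["region"], rec["branch"]
        branch_tot += rec["amount"]
        region_tot += rec["amount"]
        grand_tot += rec["amount"]

    if cur_region is not None:
        break_region()        # final break: no key change fires this one
    lines.append(f"Grand total: {grand_tot:.2f}")
    return lines
```

Skipping the `break_branch()` call inside `break_region()`, or omitting the final break after the loop, reproduces exactly the two bugs the takeaways warn about.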

Checkpoint/Restart

  1. Checkpoint/restart saves program state periodically so long-running jobs can resume from the last checkpoint instead of restarting from scratch. Save ALL state that affects the final output — counters, totals, flags, and the last key processed.

  2. Frequency trade-off: More frequent checkpoints mean less rework on restart but add overhead during normal processing. Target every 5,000-50,000 records or every 5-15 minutes.
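A minimal checkpoint/restart loop might look like the following Python sketch. The file name, interval, and state fields are assumptions; the atomic write (temp file plus rename) and the skip-on-restart test are the points being illustrated, and the skip test assumes key-sorted input.

```python
import json
import os

CHECKPOINT = "update.chk"      # hypothetical checkpoint file name
CHECKPOINT_EVERY = 10_000      # records between checkpoints (inside the
                               # 5,000-50,000 guidance above)

def save_checkpoint(state):
    """Write state atomically: a crash mid-write can never leave a
    half-written checkpoint behind."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "w") as f:
        json.dump(state, f)
    os.replace(tmp, CHECKPOINT)

def load_checkpoint():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"last_key": None, "records_in": 0, "total_amount": 0.0}

def process(records):
    """Resume from the last checkpoint; save every counter and total that
    affects the final output, plus the last key processed."""
    state = load_checkpoint()
    for rec in records:
        if state["last_key"] is not None and rec["key"] <= state["last_key"]:
            continue                     # already processed before the restart
        state["records_in"] += 1
        state["total_amount"] += rec["amount"]
        state["last_key"] = rec["key"]
        if state["records_in"] % CHECKPOINT_EVERY == 0:
            save_checkpoint(state)
    save_checkpoint(state)               # final state for reconciliation
    return state
```

Leaving any output-affecting field out of `state` (a counter, a total, a flag) makes a restarted run disagree with an uninterrupted one, which is the failure mode point 1 warns about.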

Defensive Programming and The Human Factor

  1. Every record must be accounted for: No record silently disappears. Write exceptions to an exception file with reason codes. Track counters for every processing path.

  2. Reconciliation arithmetic validates results: New master count = Old master + Adds - Deletes. If the math does not work, investigate immediately.

  3. File processing programs are long-lived: Clear paragraph naming, comprehensive counters, exception files, and careful documentation make programs maintainable across decades and developer generations.

  4. Data quality in source files is the single biggest factor in multi-file processing accuracy. Key normalization (trimming spaces, standardizing case, zero-padding) is essential before cross-file matching.
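The reconciliation arithmetic of point 2 and the key normalization of point 4 are both one-liners worth making explicit. The pad width and uppercase convention below are assumptions, not the book's fixed rules.

```python
def normalize_key(raw, width=10):
    """Normalize a key before cross-file matching: trim spaces, standardize
    case, zero-pad numeric keys to a fixed width (width is an assumed
    shop convention)."""
    k = raw.strip().upper()
    return k.zfill(width) if k.isdigit() else k

def reconcile(old_count, adds, deletes, new_count):
    """New master count must equal old master + adds - deletes.
    Returns the discrepancy; zero means the run reconciles."""
    expected = old_count + adds - deletes
    return new_count - expected
```

A nonzero `reconcile` result at end of job is the cue to stop and investigate immediately rather than release the new master.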