Key Takeaways — Chapter 14: Advanced File Techniques
The Balanced-Line Algorithm
- The balanced-line algorithm is the gold standard for master-transaction file updates. Two sorted files are processed in a single pass, with three-way key comparison: master < transaction (master-only), master = transaction (matched), master > transaction (transaction-only).
- HIGH-VALUES is the sentinel that makes the algorithm work. When a file reaches EOF, its key becomes HIGH-VALUES, forcing all remaining records from the other file to be processed. Processing ends when both keys equal HIGH-VALUES.
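The two bullets above can be sketched in Python (the chapter's setting is COBOL; the record layout, the `ADD`/`CHG`/`DEL` transaction codes, and the function names here are illustrative assumptions, not the book's code). The sentinel plays the role of HIGH-VALUES: at EOF a stream's key becomes a value that collates after every real key, so the loop drains the other file and stops only when both keys hit the sentinel.

```python
HIGH = "\uffff"  # stand-in for HIGH-VALUES: sorts after any real key

def next_rec(it):
    """Advance a stream; at EOF, return the sentinel key."""
    rec = next(it, None)
    return (HIGH, None) if rec is None else (rec[0], rec)

def balanced_line(master, trans, errors):
    """One pass over two key-sorted streams; returns the new master.

    master: iterable of (key, value); trans: iterable of (key, op, value).
    """
    m_it, t_it = iter(master), iter(trans)
    m_key, m = next_rec(m_it)
    t_key, t = next_rec(t_it)
    out = []
    while m_key != HIGH or t_key != HIGH:   # stop only when BOTH hit EOF
        if m_key < t_key:                   # master-only: carry forward
            out.append(m)
            m_key, m = next_rec(m_it)
        elif m_key > t_key:                 # transaction-only
            _, op, value = t
            if op == "ADD":
                out.append((t_key, value))
            else:                           # CHG/DEL with no master = error
                errors.append(t)
            t_key, t = next_rec(t_it)
        else:                               # matched
            _, op, value = t
            if op == "CHG":
                out.append((m_key, value))
            elif op == "DEL":
                pass                        # drop the master record
            else:                           # ADD on existing key = duplicate
                errors.append(t)            # report it, keep the old record
                out.append(m)
            m_key, m = next_rec(m_it)
            t_key, t = next_rec(t_it)
    return out
```

Note that the duplicate-key "Add" in the matched path is routed to an error list rather than dropped silently, per the bullet on handling all transaction types in all paths.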
- Write to a new master file, not the old one. This provides recovery (old master is intact if the job fails), audit trail (compare old and new), and effectively free reorganization.
- Handle all transaction types in all paths. An "Add" transaction in the matched path means a duplicate key — handle it as an error, not silently. Every edge case will eventually occur in production.
Multi-File Patterns
- All input files must be sorted on the same key for merge and balanced-line processing. Verify sort order at the start of processing — unsorted input produces incorrect results with no error message.
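A sequence check can be wrapped around any input stream so that out-of-order data fails loudly instead of silently corrupting the update. A minimal Python sketch (the generator-wrapper shape is my choice, not the book's):

```python
def check_sorted(records, key=lambda r: r[0]):
    """Yield records unchanged, but abort on the first out-of-sequence key."""
    prev = None
    for i, rec in enumerate(records):
        k = key(rec)
        if prev is not None and k < prev:
            raise ValueError(f"out of sequence at record {i}: {k!r} < {prev!r}")
        prev = k
        yield rec
```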
- Multi-file merge extends the balanced-line concept to three or more files. Compare all keys to find the lowest, write from that source, and read the next record from that source.
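The "lowest key wins, then advance only that source" rule generalizes to any number of files; a heap makes the lowest-key comparison efficient. A sketch (using Python's `heapq`; in practice the stdlib's `heapq.merge` does the same job):

```python
import heapq

def multi_merge(*streams):
    """Merge any number of key-sorted (key, ...) record streams."""
    iters = [iter(s) for s in streams]
    heap = []
    for i, it in enumerate(iters):
        rec = next(it, None)
        if rec is not None:
            heapq.heappush(heap, (rec[0], i, rec))  # (key, source, record)
    while heap:
        key, i, rec = heapq.heappop(heap)   # lowest key across all sources
        yield rec
        nxt = next(iters[i], None)          # read next from that source only
        if nxt is not None:
            heapq.heappush(heap, (nxt[0], i, nxt))
```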
- Sequential input with random lookups is the most common multi-file pattern in claims processing, order fulfillment, and similar systems: read transactions sequentially, look up reference data by key.
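In Python the keyed reference file can be modeled as a dict (standing in for an indexed/VSAM-style lookup); the claim fields and the `NO-MATCH` reason code below are illustrative:

```python
def process_claims(transactions, reference):
    """Read transactions sequentially; look up reference data by key.
    Unmatched records go to an exception list with a reason code,
    so every record is accounted for."""
    results, exceptions = [], []
    for key, amount in transactions:
        ref = reference.get(key)            # random lookup by key
        if ref is None:
            exceptions.append((key, "NO-MATCH"))
        else:
            results.append((key, amount, ref))
    return results, exceptions
```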
Control Break Processing
- Control breaks require sorted data and produce subtotals at each change in grouping fields. Higher-level breaks must trigger all lower-level breaks — skipping a level produces incorrect subtotals.
- Always process the final break after the main loop ends. The last group's subtotals and the grand totals must be printed explicitly — they are not triggered by a key change.
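A two-level sketch in Python (the region/dept fields are illustrative). Note the two rules from the bullets: a region break flushes the pending dept subtotal first, and the final break after the loop emits the last group's subtotals plus the grand total:

```python
def control_break_report(rows):
    """rows: (region, dept, amount) tuples, sorted by (region, dept)."""
    lines = []
    cur_region = cur_dept = None
    dept_tot = region_tot = grand_tot = 0
    for region, dept, amount in rows:
        if cur_region is not None and region != cur_region:
            # higher-level break triggers the lower-level break first
            lines.append(("DEPT", cur_region, cur_dept, dept_tot))
            lines.append(("REGION", cur_region, region_tot))
            dept_tot = region_tot = 0
        elif cur_dept is not None and dept != cur_dept:
            lines.append(("DEPT", cur_region, cur_dept, dept_tot))
            dept_tot = 0
        cur_region, cur_dept = region, dept
        dept_tot += amount
        region_tot += amount
        grand_tot += amount
    if cur_region is not None:              # final break: not key-triggered
        lines.append(("DEPT", cur_region, cur_dept, dept_tot))
        lines.append(("REGION", cur_region, region_tot))
    lines.append(("GRAND", grand_tot))
    return lines
```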
Checkpoint/Restart
- Checkpoint/restart saves program state periodically so long-running jobs can resume from the last checkpoint instead of restarting from scratch. Save ALL state that affects the final output — counters, totals, flags, and the last key processed.
- Frequency trade-off: More frequent checkpoints mean less rework on restart but add overhead during normal processing. Target every 5,000-50,000 records or every 5-15 minutes.
Defensive Programming and The Human Factor
- Every record must be accounted for: No record silently disappears. Write exceptions to an exception file with reason codes. Track counters for every processing path.
- Reconciliation arithmetic validates results: New master count = Old master + Adds - Deletes. If the math does not work, investigate immediately.
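The reconciliation check is a one-liner worth making explicit and fatal; a sketch (the function name and error wording are mine):

```python
def reconcile(old_count, adds, deletes, new_count):
    """Enforce: new master count = old master + adds - deletes."""
    expected = old_count + adds - deletes
    if new_count != expected:
        raise RuntimeError(
            f"reconciliation failed: expected {expected}, wrote {new_count}")
    return expected
```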
- File processing programs are long-lived: Clear paragraph naming, comprehensive counters, exception files, and careful documentation make programs maintainable across decades and developer generations.
- Data quality in source files is the single biggest factor in multi-file processing accuracy. Key normalization (trimming spaces, standardizing case, zero-padding) is essential before cross-file matching.
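The three normalization steps named above can be sketched as one helper (the 10-character width and the digits-only zero-padding rule are illustrative assumptions; real layouts dictate both):

```python
def normalize_key(raw, width=10):
    """Normalize a key before cross-file matching:
    trim surrounding spaces, standardize case, zero-pad numeric keys."""
    k = raw.strip().upper()
    if k.isdigit():
        k = k.zfill(width)  # "42" and "0000000042" must match
    return k
```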