Case Study 1: GlobalBank Daily Master File Update
Background
Every business day at 6:00 PM Eastern, GlobalBank's online banking systems close for the day. The accounts receivable, teller systems, ATM network, and wire transfer systems have generated 2.3 million transactions stored in the TXN-DAILY file. These transactions must be applied to the ACCT-MASTER file (4.7 million accounts) before the next business day begins at 6:00 AM — a 12-hour batch window that also includes report generation, regulatory feeds, and backups.
The DAILY-UPDATE program has been the anchor of this batch cycle since 1992. Maria Chen has maintained it for the last 15 years. "This program is the most important 1,200 lines of COBOL at GlobalBank," she says. "If it produces incorrect results, every account balance in the bank is wrong."
Architecture
The Batch Flow
Step 1: SORT-TXN — Sort TXN-DAILY by account number
Step 2: DAILY-UPDATE — Balanced-line: Old Master + Txns = New Master
Step 3: VERIFY-UPDATE — Compare Old Master + Txns against New Master
Step 4: RENAME — Rename New Master to become tomorrow's Old Master
Step 5: BRANCH-RPT — Control break report by branch
Step 6: EXCEPTION-RPT — Process exception file
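The heart of Step 2 is the balanced-line merge. As a minimal sketch only: the Python below mirrors the three-path control logic (master-only, matched, transaction-only) on toy records. The field names and the transaction codes 'A' (add), 'D' (deposit), 'W' (withdrawal), and 'C' (close) are illustrative assumptions, not GlobalBank's copybooks, and the matched-path rejection of a duplicate Add reflects the fix described later in this case study.

```python
HIGH = "~"  # sentinel key that sorts after any digit-string account number,
            # standing in for COBOL's HIGH-VALUES

def balanced_line(old_master, txns):
    """Balanced-line merge of two inputs sorted by account number."""
    new_master, exceptions = [], []
    mi, ti = iter(old_master), iter(txns)
    m, t = next(mi, None), next(ti, None)
    while m is not None or t is not None:
        mk = m["acct"] if m is not None else HIGH
        tk = t["acct"] if t is not None else HIGH
        key = min(mk, tk)
        if mk == key:                                # existing account is active
            cur, m = dict(m), next(mi, None)
        elif t["type"] == "A":                       # transaction only: Add creates it
            cur, t = {"acct": key, "bal": t["amt"]}, next(ti, None)
        else:                                        # txn for a missing account
            exceptions.append(t)
            t = next(ti, None)
            continue
        while t is not None and t["acct"] == key:    # apply every matched txn
            if cur is None or t["type"] == "A":      # txn after close / duplicate Add
                exceptions.append(t)
            elif t["type"] == "D":
                cur["bal"] += t["amt"]
            elif t["type"] == "W":
                cur["bal"] -= t["amt"]
            elif t["type"] == "C":
                cur = None                           # close: drop from new master
            else:
                exceptions.append(t)                 # unknown transaction type
            t = next(ti, None)
        if cur is not None:
            new_master.append(cur)
    return new_master, exceptions
```

A real implementation reads each file with sequential READ ... AT END and moves HIGH-VALUES into the key at end-of-file, which is exactly what the `HIGH` sentinel stands in for here.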
Why Balanced-Line Over Random Update
Derek Washington asked the obvious question: "Why not just open the VSAM master file for I-O and REWRITE each account as we process transactions? Why create a whole new master file every day?"
Maria's answer covered five points:
- Recovery: If DAILY-UPDATE fails at record 3 million, the old master is untouched. Restart from scratch. With in-place updates, the file is in an unknown state.
- Audit trail: Operations keeps the old master, the transaction file, and the new master for 7 business days. Any discrepancy can be traced by replaying the update.
- Performance: Sequential read + sequential write is faster than 2.3 million random REWRITEs against a VSAM KSDS. The KSDS would require 3-4 I/Os per update (index traversal + data read + data write + index update). Sequential processing: roughly 1 I/O per record (buffered).
- Simplicity: The balanced-line algorithm is straightforward to code, test, and maintain. Random update logic with multi-transaction handling, rollback on error, and concurrent access control is significantly more complex.
- Reorganization: Creating a new master file nightly is effectively a free VSAM reorganization — no CI splits, no fragmentation, optimal performance every morning.
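Maria's performance point can be roughed out numerically. This is a back-of-envelope sketch only: the 3-4 I/Os per KSDS update come from her argument above, while the blocking factor of 50 records per physical block is an assumption (real numbers depend on CI size and buffer pools).

```python
txns = 2_300_000
random_ios = txns * 3.5                  # midpoint of 3-4 I/Os per KSDS update

master = 4_700_000
logical_seq = master * 2 + txns          # read old master, write new, read txns
blocking = 50                            # assumed records per physical block
physical_seq = logical_seq / blocking    # sequential I/O amortized by blocking

print(f"random KSDS: {random_ios:,.0f} physical I/Os")
print(f"sequential : {physical_seq:,.0f} physical I/Os")
```

Beyond the raw counts, the random-update path scatters its I/Os across the KSDS, so each one is likely a separate disk access, while the two sequential streams prefetch cleanly.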
The 2021 Incident
In March 2021, the DAILY-UPDATE program produced a new master with 4,700,118 records — 3 more than expected. The reconciliation step (VERIFY-UPDATE) flagged the discrepancy:
    Expected: 4,700,000 (old master) + 127 (adds) - 12 (closes) = 4,700,115
    Actual:   4,700,118
    Variance: 3 records
Root Cause
Investigation revealed that 3 "Add" transactions had duplicate account numbers — different customers assigned the same account number by the branch system. The balanced-line algorithm correctly handled each one: the first Add created the master record, and the second Add (matching the now-existing master) was processed as a matched transaction. But the transaction type 'A' (Add) in the matched path should have been rejected as a duplicate. Instead, it was silently processed as an update, overwriting the first customer's data with the second customer's data.
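The failure mode is easy to reproduce in miniature. Below is a hypothetical sketch of the matched-path dispatch, with invented field names: when there is no explicit branch for type 'A', a duplicate Add falls through to the catch-all update branch and replaces the first customer's data; with the fix in place it is routed to the exception file instead.

```python
def apply_matched(rec, txn, exceptions, fixed=False):
    """Matched-path dispatch (illustrative only). With fixed=False, an
    Add for an existing account falls through to the update branch --
    the pre-2021 bug. With fixed=True, it is rejected as a duplicate."""
    if txn["type"] == "D":
        rec["bal"] += txn["amt"]
    elif txn["type"] == "W":
        rec["bal"] -= txn["amt"]
    elif fixed and txn["type"] == "A":
        exceptions.append({**txn, "reason": "DUPE"})  # reject duplicate Add
    else:
        # catch-all "update": replaces customer data wholesale
        rec["name"], rec["bal"] = txn["name"], txn["amt"]

rec = {"acct": "1001", "name": "First Customer", "bal": 500}
apply_matched(rec, {"type": "A", "acct": "1001",
                    "name": "Second Customer", "amt": 0}, [])
# rec now holds the second customer's data; the first is silently gone
```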
The Fix
Maria added explicit handling for Add transactions in the matched path:
           WHEN TD-ADD
          *    ADD transaction for existing account - ERROR
          *    This means a duplicate account number was assigned
               MOVE TD-ACCT-NUMBER TO EX-ACCT-NUMBER
               MOVE TD-TXN-TYPE    TO EX-TXN-TYPE
               MOVE 'DUPE'         TO EX-REASON-CODE
               STRING 'Add for existing account - '
                      'POSSIBLE DUPLICATE ACCOUNT NUMBER'
                   DELIMITED BY SIZE INTO EX-DESCRIPTION
               END-STRING
               WRITE EXCEPTION-RECORD
               ADD 1 TO WS-EXCEPTIONS
She also added a DISPLAY warning for the operations team and an alert to the branch systems team.
Preventive Measures
After the incident, the team added three safeguards:
- Pre-validation: A new batch step before DAILY-UPDATE checks all Add transactions against the current master to detect duplicates before they reach the balanced-line
- Reconciliation precision: The VERIFY-UPDATE program now checks record counts AND total balances, catching data-level discrepancies even when counts match
- Sort-order verification: The DAILY-UPDATE program now verifies that each key is >= the previous key for both files, aborting immediately on sequence errors
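The third safeguard is a simple monotonic-key check. A sketch of the idea (the COBOL version compares each key against a saved previous key and abends on a sequence error; equal keys are legal because one account can have many transactions):

```python
class SequenceError(Exception):
    """Raised when an input file is not in ascending key order."""

def check_ascending(prev_key, key, filename):
    """Verify each key is >= the previous one; fail fast otherwise."""
    if prev_key is not None and key < prev_key:
        raise SequenceError(f"{filename}: key {key!r} after {prev_key!r}")
    return key

# usage inside the read loop
prev = None
for rec in ({"acct": "0001"}, {"acct": "0002"}, {"acct": "0002"}):
    prev = check_ascending(prev, rec["acct"], "TXN-DAILY")
```

Aborting immediately matters because a balanced-line merge run against out-of-sequence input produces plausible-looking but wrong output rather than an obvious failure.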
Performance Profile
| Metric | Value |
|---|---|
| Old master records | 4,700,000 |
| Daily transactions | 2,300,000 |
| Sort time (TXN-DAILY) | 8 minutes |
| DAILY-UPDATE elapsed | 42 minutes |
| VERIFY-UPDATE elapsed | 35 minutes |
| Total batch window | 12 hours |
| DAILY-UPDATE as % of window | 5.8% |
The DAILY-UPDATE program processes approximately 280,000 records per minute (old-master reads, transaction reads, and new-master writes combined). This is efficient but not exceptional — the program's value is in its reliability, not its speed.
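The throughput figure can be sanity-checked from the table. Counting old-master reads, transaction reads, and new-master writes (using the March 2021 add/close counts for the new-master size) over the 42-minute run:

```python
old_master = 4_700_000
txns = 2_300_000
new_master = old_master + 127 - 12    # adds minus closes, March 2021 figures
elapsed_min = 42

records = old_master + txns + new_master
rate = records / elapsed_min
print(f"{rate:,.0f} records/minute")  # roughly 280,000
```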
Lessons Learned
- Every edge case will eventually occur: Duplicate account numbers from the branch system were thought to be "impossible." After 29 years, they happened.
- Reconciliation is not optional: The 3-record variance was caught within minutes because VERIFY-UPDATE runs immediately after DAILY-UPDATE. Without reconciliation, the error might have gone undetected for days.
- The balanced-line algorithm handles complexity through simplicity: Three paths (master-only, matched, transaction-only) cover all scenarios. But each path must handle every possible transaction type, including the ones you think "can't happen."
- Defensive programming means handling the impossible: Maria now has a saying: "If the transaction type is 'A' and we're in the matched path, something upstream is broken. Our job is to catch it, not ignore it."
Discussion Questions
- Why does the bank keep 7 days of old master files? What regulatory or business reasons drive this retention?
- The VERIFY-UPDATE program takes 35 minutes — almost as long as the update itself. Is this time justified? Could verification be done more efficiently?
- Derek suggested using a VSAM KSDS with I-O mode instead of the old-master/new-master pattern. Under what circumstances might this approach be preferable? What safeguards would be needed?
- The program processes transactions in account-number order. What would happen if two transactions for the same account arrived in the wrong order (e.g., a withdrawal of $500 followed by a deposit of $1,000, but sorted in reverse)? Does the algorithm handle this correctly?
- How would you modify the program to support a checkpoint/restart capability without losing the benefits of the sequential old-master/new-master approach?