Case Study 2: Merging Branch Transaction Files
Background
Midwest Community Bank operates 24 branches across three states. Each branch runs its own local transaction processing system during business hours, capturing deposits, withdrawals, loan payments, and account maintenance transactions. At the close of business, each branch transmits its daily transaction file to the central data center for consolidated processing.
The central batch system must merge all 24 branch files into a single consolidated transaction file before the posting program can run. The merger is not a simple concatenation -- the output must be in strict account-number order, and duplicate transactions must be detected and flagged. Duplicate transactions occur when a customer performs a transaction at one branch that is simultaneously posted by an automated system (such as ACH or wire transfer) at the central office, resulting in two records for the same logical transaction.
This case study demonstrates COBOL's MERGE statement for combining pre-sorted files, supplemented by an OUTPUT PROCEDURE that detects and handles duplicates. While the bank has 24 branches, the COBOL MERGE statement has a practical limit on the number of input files that can be specified in a single MERGE statement (typically 12 to 16, depending on the compiler). The solution uses a two-phase merge: first merging branches into regional files, then merging regional files into the consolidated output.
Data Design
The Branch Transaction File
Each branch produces a file sorted by account number. The record layout is standardized across all branches:
SELECT BRANCH-FILE-01
ASSIGN TO BRNTXN01
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-BRN01-STATUS.
SELECT BRANCH-FILE-02
ASSIGN TO BRNTXN02
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-BRN02-STATUS.
SELECT BRANCH-FILE-03
ASSIGN TO BRNTXN03
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-BRN03-STATUS.
FD BRANCH-FILE-01
RECORDING MODE IS F
RECORD CONTAINS 180 CHARACTERS.
01 BRANCH-RECORD-01 PIC X(180).
FD BRANCH-FILE-02
RECORDING MODE IS F
RECORD CONTAINS 180 CHARACTERS.
01 BRANCH-RECORD-02 PIC X(180).
FD BRANCH-FILE-03
RECORDING MODE IS F
RECORD CONTAINS 180 CHARACTERS.
01 BRANCH-RECORD-03 PIC X(180).
The common record layout (used for the sort/merge description and all branch files):
01 TRANSACTION-RECORD.
05 TXR-ACCOUNT-NUMBER PIC X(10).
05 TXR-TXN-DATE PIC 9(8).
05 TXR-TXN-TIME PIC 9(6).
05 TXR-TXN-ID PIC X(12).
05 TXR-TXN-TYPE PIC X(2).
88 TXR-DEPOSIT VALUE "DP".
88 TXR-WITHDRAWAL VALUE "WD".
88 TXR-LOAN-PAYMENT VALUE "LP".
88 TXR-TRANSFER-IN VALUE "TI".
88 TXR-TRANSFER-OUT VALUE "TO".
88 TXR-FEE VALUE "FE".
88 TXR-INTEREST VALUE "IN".
88 TXR-ACH-CREDIT VALUE "AC".
88 TXR-ACH-DEBIT VALUE "AD".
88 TXR-WIRE VALUE "WR".
05 TXR-TXN-AMOUNT PIC S9(11)V99.
05 TXR-BRANCH-CODE PIC X(4).
05 TXR-TELLER-ID PIC X(6).
05 TXR-REFERENCE PIC X(20).
05 TXR-DESCRIPTION PIC X(30).
05 TXR-FILLER PIC X(67).
The Merge Work File
SELECT MERGE-WORK-FILE
ASSIGN TO MRGEWK01.
SD MERGE-WORK-FILE
RECORD CONTAINS 180 CHARACTERS.
01 MERGE-RECORD.
05 MR-ACCOUNT-NUMBER PIC X(10).
05 MR-TXN-DATE PIC 9(8).
05 MR-TXN-TIME PIC 9(6).
05 MR-TXN-ID PIC X(12).
05 MR-TXN-TYPE PIC X(2).
05 MR-TXN-AMOUNT PIC S9(11)V99.
05 MR-BRANCH-CODE PIC X(4).
05 MR-TELLER-ID PIC X(6).
05 MR-REFERENCE PIC X(20).
05 MR-DESCRIPTION PIC X(30).
05 MR-FILLER PIC X(67).
Phase 1: Regional Merge
The first phase merges branch files within each region. The bank is organized into three regions (East with 8 branches, Central with 9 branches, West with 7 branches). Each regional merge combines its branches into a single sorted regional file.
The MERGE statement combines the pre-sorted branch files without requiring an INPUT PROCEDURE, because the branch files are already sorted by account number:
IDENTIFICATION DIVISION.
PROGRAM-ID. RGNMERGE.
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
* Merge East Region (8 branches)
MERGE MERGE-WORK-FILE
ON ASCENDING KEY MR-ACCOUNT-NUMBER
ON ASCENDING KEY MR-TXN-DATE
ON ASCENDING KEY MR-TXN-TIME
USING BRANCH-FILE-01
BRANCH-FILE-02
BRANCH-FILE-03
BRANCH-FILE-04
BRANCH-FILE-05
BRANCH-FILE-06
BRANCH-FILE-07
BRANCH-FILE-08
GIVING REGIONAL-FILE-EAST
DISPLAY "EAST REGION MERGE COMPLETE"
PERFORM 9000-TERMINATE
STOP RUN
.
The MERGE statement handles all the complexity of interleaving records from eight input files. Internally, it performs an n-way merge: it reads the first record from each input file, selects the one with the lowest key, writes it to the output, reads the next record from that file, and repeats. This produces a correctly ordered output file in a single pass through all input files.
Critical requirement: Each input file must already be sorted on the same keys specified in the MERGE statement. If a branch file is out of order, the MERGE produces incorrect output with no error message. The branch systems are responsible for sorting their files before transmission.
Phase 2: Consolidated Merge with Duplicate Detection
The second phase merges the three regional files into the final consolidated file. This phase uses an OUTPUT PROCEDURE to detect duplicate transactions:
IDENTIFICATION DIVISION.
PROGRAM-ID. CONSMRGE.
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
MERGE MERGE-WORK-FILE
ON ASCENDING KEY MR-ACCOUNT-NUMBER
ON ASCENDING KEY MR-TXN-DATE
ON ASCENDING KEY MR-TXN-TIME
USING REGIONAL-FILE-EAST
REGIONAL-FILE-CENTRAL
REGIONAL-FILE-WEST
OUTPUT PROCEDURE IS 3000-DETECT-DUPLICATES
PERFORM 4000-PRODUCE-REPORT
PERFORM 9000-TERMINATE
STOP RUN
.
The Duplicate Detection Logic
Two transactions are considered duplicates when they have the same account number, the same transaction date, the same amount, and the same reference number. The time and branch may differ (because the same logical transaction was recorded at different locations at different times). The OUTPUT PROCEDURE compares each record against the previous record to identify potential duplicates:
3000-DETECT-DUPLICATES.
OPEN OUTPUT CONSOLIDATED-FILE
OPEN OUTPUT DUPLICATE-FILE
MOVE 0 TO WS-CONSOL-COUNT
MOVE 0 TO WS-DUPLICATE-COUNT
MOVE SPACES TO WS-PREV-RECORD
PERFORM 3100-RETURN-MERGED
PERFORM UNTIL WS-MERGE-EOF
PERFORM 3200-CHECK-DUPLICATE
PERFORM 3100-RETURN-MERGED
END-PERFORM
CLOSE CONSOLIDATED-FILE
CLOSE DUPLICATE-FILE
.
3100-RETURN-MERGED.
RETURN MERGE-WORK-FILE INTO MERGE-RECORD
AT END SET WS-MERGE-EOF TO TRUE
END-RETURN
.
3200-CHECK-DUPLICATE.
IF MR-ACCOUNT-NUMBER = WS-PREV-ACCOUNT
AND MR-TXN-DATE = WS-PREV-DATE
AND MR-TXN-AMOUNT = WS-PREV-AMOUNT
AND MR-REFERENCE = WS-PREV-REFERENCE
AND MR-REFERENCE NOT = SPACES
* Potential duplicate found
PERFORM 3300-HANDLE-DUPLICATE
ELSE
* Normal record -- write to consolidated file
MOVE MERGE-RECORD TO CONSOLIDATED-RECORD
WRITE CONSOLIDATED-RECORD
ADD 1 TO WS-CONSOL-COUNT
END-IF
* Save current record fields for next comparison
MOVE MR-ACCOUNT-NUMBER TO WS-PREV-ACCOUNT
MOVE MR-TXN-DATE TO WS-PREV-DATE
MOVE MR-TXN-AMOUNT TO WS-PREV-AMOUNT
MOVE MR-REFERENCE TO WS-PREV-REFERENCE
MOVE MR-BRANCH-CODE TO WS-PREV-BRANCH
MOVE MERGE-RECORD TO WS-PREV-RECORD
.
Handling Detected Duplicates
When a duplicate is detected, the program must decide which record to keep and which to flag. The rule is: keep the record from the originating branch (indicated by the branch code matching the first two characters of the reference number) and flag the other as the duplicate:
3300-HANDLE-DUPLICATE.
ADD 1 TO WS-DUPLICATE-COUNT
* Build duplicate record with both copies for review
INITIALIZE DUPLICATE-REPORT-RECORD
MOVE "DUP" TO DRR-FLAG
MOVE MR-ACCOUNT-NUMBER TO DRR-ACCOUNT
MOVE MR-TXN-DATE TO DRR-DATE
MOVE MR-TXN-AMOUNT TO DRR-AMOUNT
MOVE MR-REFERENCE TO DRR-REFERENCE
MOVE MR-BRANCH-CODE TO DRR-BRANCH-CURRENT
MOVE WS-PREV-BRANCH TO DRR-BRANCH-PREVIOUS
MOVE MR-TXN-ID TO DRR-TXN-ID-CURRENT
WRITE DUPLICATE-REPORT-RECORD
* Determine which record is the original
* Keep the one whose branch matches the reference prefix
IF MR-BRANCH-CODE(1:2) = MR-REFERENCE(1:2)
* Current record is the original -- already written
* in previous iteration (from WS-PREV-RECORD)
* The duplicate is the current record
DISPLAY "DUP DETECTED: ACCT="
MR-ACCOUNT-NUMBER
" REF=" MR-REFERENCE
" KEEPING BRANCH " WS-PREV-BRANCH
ELSE
* Previous record was the original
* Current record is the automated duplicate
DISPLAY "DUP DETECTED: ACCT="
MR-ACCOUNT-NUMBER
" REF=" MR-REFERENCE
" KEEPING BRANCH " MR-BRANCH-CODE
* Write current record since it is the original
MOVE MERGE-RECORD TO CONSOLIDATED-RECORD
WRITE CONSOLIDATED-RECORD
ADD 1 TO WS-CONSOL-COUNT
END-IF
.
JCL for the Two-Phase Merge
//BRNMERGE JOB (ACCT),'BRANCH MERGE',
// CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//*
//*============================================================
//* PHASE 1: MERGE BRANCHES INTO REGIONAL FILES
//*============================================================
//*
//* EAST REGION MERGE (BRANCHES 01-08)
//*
//EAST EXEC PGM=RGNMERGE
//STEPLIB DD DSN=PROD.LOADLIB,DISP=SHR
//BRNTXN01 DD DSN=BRANCH.B01.DAILY.TRANS,DISP=SHR
//BRNTXN02 DD DSN=BRANCH.B02.DAILY.TRANS,DISP=SHR
//BRNTXN03 DD DSN=BRANCH.B03.DAILY.TRANS,DISP=SHR
//BRNTXN04 DD DSN=BRANCH.B04.DAILY.TRANS,DISP=SHR
//BRNTXN05 DD DSN=BRANCH.B05.DAILY.TRANS,DISP=SHR
//BRNTXN06 DD DSN=BRANCH.B06.DAILY.TRANS,DISP=SHR
//BRNTXN07 DD DSN=BRANCH.B07.DAILY.TRANS,DISP=SHR
//BRNTXN08 DD DSN=BRANCH.B08.DAILY.TRANS,DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK1,DISP=(NEW,DELETE),
// SPACE=(CYL,(50,25)),UNIT=SYSDA
//RGNEAST DD DSN=REGION.EAST.MERGED(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(30,10)),
// DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//SYSOUT DD SYSOUT=*
//*
//* CENTRAL REGION MERGE (BRANCHES 09-17)
//*
//CENTRAL EXEC PGM=RGNMERGE
//STEPLIB DD DSN=PROD.LOADLIB,DISP=SHR
//BRNTXN01 DD DSN=BRANCH.B09.DAILY.TRANS,DISP=SHR
//BRNTXN02 DD DSN=BRANCH.B10.DAILY.TRANS,DISP=SHR
//BRNTXN03 DD DSN=BRANCH.B11.DAILY.TRANS,DISP=SHR
//BRNTXN04 DD DSN=BRANCH.B12.DAILY.TRANS,DISP=SHR
//BRNTXN05 DD DSN=BRANCH.B13.DAILY.TRANS,DISP=SHR
//BRNTXN06 DD DSN=BRANCH.B14.DAILY.TRANS,DISP=SHR
//BRNTXN07 DD DSN=BRANCH.B15.DAILY.TRANS,DISP=SHR
//BRNTXN08 DD DSN=BRANCH.B16.DAILY.TRANS,DISP=SHR
//BRNTXN09 DD DSN=BRANCH.B17.DAILY.TRANS,DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK2,DISP=(NEW,DELETE),
// SPACE=(CYL,(50,25)),UNIT=SYSDA
//RGNCENTR DD DSN=REGION.CENTRAL.MERGED(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(30,10)),
// DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//SYSOUT DD SYSOUT=*
//*
//* WEST REGION MERGE (BRANCHES 18-24)
//*
//WEST EXEC PGM=RGNMERGE
//STEPLIB DD DSN=PROD.LOADLIB,DISP=SHR
//BRNTXN01 DD DSN=BRANCH.B18.DAILY.TRANS,DISP=SHR
//BRNTXN02 DD DSN=BRANCH.B19.DAILY.TRANS,DISP=SHR
//BRNTXN03 DD DSN=BRANCH.B20.DAILY.TRANS,DISP=SHR
//BRNTXN04 DD DSN=BRANCH.B21.DAILY.TRANS,DISP=SHR
//BRNTXN05 DD DSN=BRANCH.B22.DAILY.TRANS,DISP=SHR
//BRNTXN06 DD DSN=BRANCH.B23.DAILY.TRANS,DISP=SHR
//BRNTXN07 DD DSN=BRANCH.B24.DAILY.TRANS,DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK3,DISP=(NEW,DELETE),
// SPACE=(CYL,(50,25)),UNIT=SYSDA
//RGNWEST DD DSN=REGION.WEST.MERGED(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(30,10)),
// DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//SYSOUT DD SYSOUT=*
//*
//*============================================================
//* PHASE 2: CONSOLIDATED MERGE WITH DUPLICATE DETECTION
//*============================================================
//*
//CONSOL EXEC PGM=CONSMRGE,COND=(0,NE)
//STEPLIB DD DSN=PROD.LOADLIB,DISP=SHR
//RGNEAST DD DSN=REGION.EAST.MERGED(0),DISP=SHR
//RGNCENTR DD DSN=REGION.CENTRAL.MERGED(0),DISP=SHR
//RGNWEST DD DSN=REGION.WEST.MERGED(0),DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK4,DISP=(NEW,DELETE),
// SPACE=(CYL,(100,50)),UNIT=SYSDA
//CONSOUT DD DSN=BANK.CONSOLIDATED.TRANS(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(80,30)),
// DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//DUPFILE DD DSN=BANK.DUPLICATE.TRANS(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(5,2)),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=0)
//MERGERPT DD SYSOUT=*
//SYSOUT DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
The Merge Statistics Report
================================================================
MIDWEST COMMUNITY BANK
BRANCH TRANSACTION MERGE REPORT
RUN DATE: 2026-02-10 TIME: 19:15:22
================================================================
PHASE 1: REGIONAL MERGE RESULTS
--------------------------------------------------------
East Region:
Branches Merged: 8
Total Records: 342,871
Central Region:
Branches Merged: 9
Total Records: 398,204
West Region:
Branches Merged: 7
Total Records: 261,553
--------------------------------------------------------
Total Input Records: 1,002,628
PHASE 2: CONSOLIDATED MERGE
--------------------------------------------------------
Records to Consolidated File: 1,002,491
Duplicate Records Detected: 137
Duplicate Pairs by Type:
Branch vs. ACH: 89
Branch vs. Wire: 31
Branch vs. Branch: 17
BRANCH VOLUME SUMMARY
--------------------------------------------------------
Branch Records Amount Debits Amount Credits
------ --------- --------------- ---------------
B01 41,223 $ 23,445,102 $ 28,991,387
B02 38,917 $ 21,008,776 $ 25,332,109
B03 44,501 $ 31,220,445 $ 34,887,223
...
B24 32,118 $ 18,776,321 $ 22,109,556
------ --------- --------------- ---------------
TOTAL 1,002,628 $ 587,231,904 $ 692,108,445
================================================================
MERGE vs. SORT: When to Use Each
This case study illustrates the key distinction between MERGE and SORT:
Use MERGE when: - Input files are already sorted on the merge keys - You want to combine multiple sorted sources into a single sorted output - You do not need an INPUT PROCEDURE (MERGE does not support INPUT PROCEDURE) - The combined output should preserve the sort order of the inputs
Use SORT when: - Input data is unsorted or needs reordering - You need INPUT PROCEDURE to filter or transform records before sorting - You have a single input source (or want to treat multiple sources as one via concatenation in JCL)
The MERGE statement is significantly more efficient than SORT when the input files are pre-sorted. The SORT utility must compare every record against potentially every other record (using an efficient algorithm like quicksort or merge-sort internally), while the MERGE utility only needs to compare the current front record from each input file, selecting the minimum each time. For k input files with a total of n records, MERGE performs O(n log k) comparisons, while SORT performs O(n log n) comparisons. When k is small (3 regional files), the MERGE is much faster.
A critical limitation: the MERGE statement does not support INPUT PROCEDURE. If you need to filter or transform records before merging, you must either pre-process the files in a prior step or use SORT with JCL concatenation instead.
Handling Missing Branch Files
In production, a branch file may be missing due to transmission failure. The program must handle this gracefully rather than abending:
1000-INITIALIZE.
MOVE 0 TO WS-MISSING-BRANCH-COUNT
OPEN INPUT BRANCH-FILE-01
IF WS-BRN01-STATUS NOT = "00"
DISPLAY "WARNING: BRANCH 01 FILE MISSING"
ADD 1 TO WS-MISSING-BRANCH-COUNT
CLOSE BRANCH-FILE-01
END-IF
OPEN INPUT BRANCH-FILE-02
IF WS-BRN02-STATUS NOT = "00"
DISPLAY "WARNING: BRANCH 02 FILE MISSING"
ADD 1 TO WS-MISSING-BRANCH-COUNT
CLOSE BRANCH-FILE-02
END-IF
* ... repeat for each branch file
* If more than 3 branches are missing, abort
IF WS-MISSING-BRANCH-COUNT > 3
DISPLAY "CRITICAL: MORE THAN 3 BRANCHES MISSING"
DISPLAY "MERGE ABORTED - MANUAL REVIEW REQUIRED"
MOVE 16 TO RETURN-CODE
STOP RUN
END-IF
.
The threshold of 3 missing branches was chosen based on operational experience: one or two missing files usually means a transmission delay and can be handled by a re-run. More than three suggests a systemic problem that requires human intervention.
Testing the Merge Process
Test Scenario 1: Basic Three-Way Merge
- Create three small test files, each sorted by account number
- Run the merge program
- Verify the consolidated output is in correct account-number order
- Verify the record count equals the sum of all input records (minus duplicates)
Test Scenario 2: Duplicate Detection
- Create two branch files with an identical transaction (same account, date, amount, reference)
- Run the merge
- Verify one copy appears in the consolidated file and one in the duplicate file
- Verify the duplicate report identifies both branches
Test Scenario 3: Empty Input File
- Create one normal branch file and one empty branch file
- Run the merge
- Verify the program handles the empty file without error
- Verify the output contains all records from the non-empty file
Test Scenario 4: All Records for Same Account
- Create three branch files where every record has the same account number
- Run the merge
- Verify records are ordered by date and time within the account
Discussion Questions
-
The MERGE statement does not support INPUT PROCEDURE. If you needed to filter records during the merge (for example, excluding transactions older than 30 days), how would you restructure the job to accomplish this?
-
The duplicate detection logic compares the current record against only the previous record. Is this sufficient? Could a duplicate be separated by a non-duplicate record in the merged output? Under what circumstances?
-
The two-phase merge approach was chosen because of compiler limits on the number of MERGE input files. An alternative would be to concatenate all 24 branch files in the JCL and use SORT instead of MERGE. Compare the advantages and disadvantages of each approach.
-
What happens if one of the branch files is not sorted correctly? How would you detect this condition before the merge, and what should the program do if it detects an out-of-order input file?
-
The bank is considering adding real-time transaction feeds from online and mobile channels, which would arrive as a continuous stream rather than a batch file. How would this change the merge architecture?