Case Study 2: Merging Branch Transaction Files

Background

Midwest Community Bank operates 24 branches across three states. Each branch runs its own local transaction processing system during business hours, capturing deposits, withdrawals, loan payments, and account maintenance transactions. At the close of business, each branch transmits its daily transaction file to the central data center for consolidated processing.

The central batch system must merge all 24 branch files into a single consolidated transaction file before the posting program can run. The merger is not a simple concatenation -- the output must be in strict account-number order, and duplicate transactions must be detected and flagged. Duplicate transactions occur when a customer performs a transaction at one branch that is simultaneously posted by an automated system (such as ACH or wire transfer) at the central office, resulting in two records for the same logical transaction.

This case study demonstrates COBOL's MERGE statement for combining pre-sorted files, supplemented by an OUTPUT PROCEDURE that detects and handles duplicates. While the bank has 24 branches, the COBOL MERGE statement has a practical limit on the number of input files that can be specified in a single MERGE statement (typically 12 to 16, depending on the compiler). The solution uses a two-phase merge: first merging branches into regional files, then merging regional files into the consolidated output.


Data Design

The Branch Transaction File

Each branch produces a file sorted by account number. The record layout is standardized across all branches:

       SELECT BRANCH-FILE-01
           ASSIGN TO BRNTXN01
           ORGANIZATION IS SEQUENTIAL
           FILE STATUS IS WS-BRN01-STATUS.

       SELECT BRANCH-FILE-02
           ASSIGN TO BRNTXN02
           ORGANIZATION IS SEQUENTIAL
           FILE STATUS IS WS-BRN02-STATUS.

       SELECT BRANCH-FILE-03
           ASSIGN TO BRNTXN03
           ORGANIZATION IS SEQUENTIAL
           FILE STATUS IS WS-BRN03-STATUS.

       FD  BRANCH-FILE-01
           RECORDING MODE IS F
           RECORD CONTAINS 180 CHARACTERS.
       01  BRANCH-RECORD-01            PIC X(180).

       FD  BRANCH-FILE-02
           RECORDING MODE IS F
           RECORD CONTAINS 180 CHARACTERS.
       01  BRANCH-RECORD-02            PIC X(180).

       FD  BRANCH-FILE-03
           RECORDING MODE IS F
           RECORD CONTAINS 180 CHARACTERS.
       01  BRANCH-RECORD-03            PIC X(180).

The common record layout (used for the sort/merge description and all branch files):

       01  TRANSACTION-RECORD.
           05  TXR-ACCOUNT-NUMBER      PIC X(10).
           05  TXR-TXN-DATE            PIC 9(8).
           05  TXR-TXN-TIME            PIC 9(6).
           05  TXR-TXN-ID              PIC X(12).
           05  TXR-TXN-TYPE            PIC X(2).
               88  TXR-DEPOSIT         VALUE "DP".
               88  TXR-WITHDRAWAL      VALUE "WD".
               88  TXR-LOAN-PAYMENT    VALUE "LP".
               88  TXR-TRANSFER-IN     VALUE "TI".
               88  TXR-TRANSFER-OUT    VALUE "TO".
               88  TXR-FEE             VALUE "FE".
               88  TXR-INTEREST        VALUE "IN".
               88  TXR-ACH-CREDIT      VALUE "AC".
               88  TXR-ACH-DEBIT       VALUE "AD".
               88  TXR-WIRE            VALUE "WR".
           05  TXR-TXN-AMOUNT          PIC S9(11)V99.
           05  TXR-BRANCH-CODE         PIC X(4).
           05  TXR-TELLER-ID           PIC X(6).
           05  TXR-REFERENCE           PIC X(20).
           05  TXR-DESCRIPTION         PIC X(30).
           05  TXR-FILLER              PIC X(67).

The Merge Work File

       SELECT MERGE-WORK-FILE
           ASSIGN TO MRGEWK01.

       SD  MERGE-WORK-FILE
           RECORD CONTAINS 180 CHARACTERS.

       01  MERGE-RECORD.
           05  MR-ACCOUNT-NUMBER       PIC X(10).
           05  MR-TXN-DATE             PIC 9(8).
           05  MR-TXN-TIME             PIC 9(6).
           05  MR-TXN-ID               PIC X(12).
           05  MR-TXN-TYPE             PIC X(2).
           05  MR-TXN-AMOUNT           PIC S9(11)V99.
           05  MR-BRANCH-CODE          PIC X(4).
           05  MR-TELLER-ID            PIC X(6).
           05  MR-REFERENCE            PIC X(20).
           05  MR-DESCRIPTION          PIC X(30).
           05  MR-FILLER               PIC X(67).

Phase 1: Regional Merge

The first phase merges branch files within each region. The bank is organized into three regions (East with 8 branches, Central with 9 branches, West with 7 branches). Each regional merge combines its branches into a single sorted regional file.

The MERGE statement combines the pre-sorted branch files without requiring an INPUT PROCEDURE, because the branch files are already sorted by account number:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RGNMERGE.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-INITIALIZE

      *    Merge East Region (8 branches)
           MERGE MERGE-WORK-FILE
               ON ASCENDING KEY MR-ACCOUNT-NUMBER
               ON ASCENDING KEY MR-TXN-DATE
               ON ASCENDING KEY MR-TXN-TIME
               USING BRANCH-FILE-01
                     BRANCH-FILE-02
                     BRANCH-FILE-03
                     BRANCH-FILE-04
                     BRANCH-FILE-05
                     BRANCH-FILE-06
                     BRANCH-FILE-07
                     BRANCH-FILE-08
               GIVING REGIONAL-FILE-EAST

           DISPLAY "EAST REGION MERGE COMPLETE"
           PERFORM 9000-TERMINATE
           STOP RUN
           .

The MERGE statement handles all the complexity of interleaving records from eight input files. Internally, it performs an n-way merge: it reads the first record from each input file, selects the one with the lowest key, writes it to the output, reads the next record from that file, and repeats. This produces a correctly ordered output file in a single pass through all input files.

Critical requirement: Each input file must already be sorted on the same keys specified in the MERGE statement. If a branch file is out of order, the MERGE produces incorrect output with no error message. The branch systems are responsible for sorting their files before transmission.


Phase 2: Consolidated Merge with Duplicate Detection

The second phase merges the three regional files into the final consolidated file. This phase uses an OUTPUT PROCEDURE to detect duplicate transactions:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONSMRGE.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-INITIALIZE

           MERGE MERGE-WORK-FILE
               ON ASCENDING KEY MR-ACCOUNT-NUMBER
               ON ASCENDING KEY MR-TXN-DATE
               ON ASCENDING KEY MR-TXN-TIME
               USING REGIONAL-FILE-EAST
                     REGIONAL-FILE-CENTRAL
                     REGIONAL-FILE-WEST
               OUTPUT PROCEDURE IS 3000-DETECT-DUPLICATES

           PERFORM 4000-PRODUCE-REPORT
           PERFORM 9000-TERMINATE
           STOP RUN
           .

The Duplicate Detection Logic

Two transactions are considered duplicates when they have the same account number, the same transaction date, the same amount, and the same reference number. The time and branch may differ (because the same logical transaction was recorded at different locations at different times). The OUTPUT PROCEDURE compares each record against the previous record to identify potential duplicates:

       3000-DETECT-DUPLICATES.
           OPEN OUTPUT CONSOLIDATED-FILE
           OPEN OUTPUT DUPLICATE-FILE

           MOVE 0 TO WS-CONSOL-COUNT
           MOVE 0 TO WS-DUPLICATE-COUNT
           MOVE SPACES TO WS-PREV-RECORD

           PERFORM 3100-RETURN-MERGED
           PERFORM UNTIL WS-MERGE-EOF
               PERFORM 3200-CHECK-DUPLICATE
               PERFORM 3100-RETURN-MERGED
           END-PERFORM

           CLOSE CONSOLIDATED-FILE
           CLOSE DUPLICATE-FILE
           .

       3100-RETURN-MERGED.
           RETURN MERGE-WORK-FILE INTO MERGE-RECORD
               AT END SET WS-MERGE-EOF TO TRUE
           END-RETURN
           .

       3200-CHECK-DUPLICATE.
           IF MR-ACCOUNT-NUMBER = WS-PREV-ACCOUNT
           AND MR-TXN-DATE      = WS-PREV-DATE
           AND MR-TXN-AMOUNT    = WS-PREV-AMOUNT
           AND MR-REFERENCE     = WS-PREV-REFERENCE
           AND MR-REFERENCE NOT = SPACES
      *        Potential duplicate found
               PERFORM 3300-HANDLE-DUPLICATE
           ELSE
      *        Normal record -- write to consolidated file
               MOVE MERGE-RECORD TO CONSOLIDATED-RECORD
               WRITE CONSOLIDATED-RECORD
               ADD 1 TO WS-CONSOL-COUNT
           END-IF

      *    Save current record fields for next comparison
           MOVE MR-ACCOUNT-NUMBER TO WS-PREV-ACCOUNT
           MOVE MR-TXN-DATE       TO WS-PREV-DATE
           MOVE MR-TXN-AMOUNT     TO WS-PREV-AMOUNT
           MOVE MR-REFERENCE      TO WS-PREV-REFERENCE
           MOVE MR-BRANCH-CODE    TO WS-PREV-BRANCH
           MOVE MERGE-RECORD      TO WS-PREV-RECORD
           .

Handling Detected Duplicates

When a duplicate is detected, the program must decide which record to keep and which to flag. The rule is: keep the record from the originating branch (indicated by the branch code matching the first two characters of the reference number) and flag the other as the duplicate:

       3300-HANDLE-DUPLICATE.
           ADD 1 TO WS-DUPLICATE-COUNT

      *    Build duplicate record with both copies for review
           INITIALIZE DUPLICATE-REPORT-RECORD
           MOVE "DUP" TO DRR-FLAG
           MOVE MR-ACCOUNT-NUMBER TO DRR-ACCOUNT
           MOVE MR-TXN-DATE       TO DRR-DATE
           MOVE MR-TXN-AMOUNT     TO DRR-AMOUNT
           MOVE MR-REFERENCE      TO DRR-REFERENCE
           MOVE MR-BRANCH-CODE    TO DRR-BRANCH-CURRENT
           MOVE WS-PREV-BRANCH    TO DRR-BRANCH-PREVIOUS
           MOVE MR-TXN-ID         TO DRR-TXN-ID-CURRENT

           WRITE DUPLICATE-REPORT-RECORD

      *    Determine which record is the original
      *    Keep the one whose branch matches the reference prefix
           IF MR-BRANCH-CODE(1:2) = MR-REFERENCE(1:2)
      *        Current record is the original -- already written
      *        in previous iteration (from WS-PREV-RECORD)
      *        The duplicate is the current record
               DISPLAY "DUP DETECTED: ACCT="
                   MR-ACCOUNT-NUMBER
                   " REF=" MR-REFERENCE
                   " KEEPING BRANCH " WS-PREV-BRANCH
           ELSE
      *        Previous record was the original
      *        Current record is the automated duplicate
               DISPLAY "DUP DETECTED: ACCT="
                   MR-ACCOUNT-NUMBER
                   " REF=" MR-REFERENCE
                   " KEEPING BRANCH " MR-BRANCH-CODE
      *        Write current record since it is the original
               MOVE MERGE-RECORD TO CONSOLIDATED-RECORD
               WRITE CONSOLIDATED-RECORD
               ADD 1 TO WS-CONSOL-COUNT
           END-IF
           .

JCL for the Two-Phase Merge

//BRNMERGE JOB (ACCT),'BRANCH MERGE',
//         CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//*
//*============================================================
//* PHASE 1: MERGE BRANCHES INTO REGIONAL FILES
//*============================================================
//*
//* EAST REGION MERGE (BRANCHES 01-08)
//*
//EAST     EXEC PGM=RGNMERGE
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//BRNTXN01 DD DSN=BRANCH.B01.DAILY.TRANS,DISP=SHR
//BRNTXN02 DD DSN=BRANCH.B02.DAILY.TRANS,DISP=SHR
//BRNTXN03 DD DSN=BRANCH.B03.DAILY.TRANS,DISP=SHR
//BRNTXN04 DD DSN=BRANCH.B04.DAILY.TRANS,DISP=SHR
//BRNTXN05 DD DSN=BRANCH.B05.DAILY.TRANS,DISP=SHR
//BRNTXN06 DD DSN=BRANCH.B06.DAILY.TRANS,DISP=SHR
//BRNTXN07 DD DSN=BRANCH.B07.DAILY.TRANS,DISP=SHR
//BRNTXN08 DD DSN=BRANCH.B08.DAILY.TRANS,DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK1,DISP=(NEW,DELETE),
//         SPACE=(CYL,(50,25)),UNIT=SYSDA
//RGNEAST  DD DSN=REGION.EAST.MERGED(+1),
//         DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(30,10)),
//         DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//SYSOUT   DD SYSOUT=*
//*
//* CENTRAL REGION MERGE (BRANCHES 09-17)
//*
//CENTRAL  EXEC PGM=RGNMERGE
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//BRNTXN01 DD DSN=BRANCH.B09.DAILY.TRANS,DISP=SHR
//BRNTXN02 DD DSN=BRANCH.B10.DAILY.TRANS,DISP=SHR
//BRNTXN03 DD DSN=BRANCH.B11.DAILY.TRANS,DISP=SHR
//BRNTXN04 DD DSN=BRANCH.B12.DAILY.TRANS,DISP=SHR
//BRNTXN05 DD DSN=BRANCH.B13.DAILY.TRANS,DISP=SHR
//BRNTXN06 DD DSN=BRANCH.B14.DAILY.TRANS,DISP=SHR
//BRNTXN07 DD DSN=BRANCH.B15.DAILY.TRANS,DISP=SHR
//BRNTXN08 DD DSN=BRANCH.B16.DAILY.TRANS,DISP=SHR
//BRNTXN09 DD DSN=BRANCH.B17.DAILY.TRANS,DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK2,DISP=(NEW,DELETE),
//         SPACE=(CYL,(50,25)),UNIT=SYSDA
//RGNCENTR DD DSN=REGION.CENTRAL.MERGED(+1),
//         DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(30,10)),
//         DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//SYSOUT   DD SYSOUT=*
//*
//* WEST REGION MERGE (BRANCHES 18-24)
//*
//WEST     EXEC PGM=RGNMERGE
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//BRNTXN01 DD DSN=BRANCH.B18.DAILY.TRANS,DISP=SHR
//BRNTXN02 DD DSN=BRANCH.B19.DAILY.TRANS,DISP=SHR
//BRNTXN03 DD DSN=BRANCH.B20.DAILY.TRANS,DISP=SHR
//BRNTXN04 DD DSN=BRANCH.B21.DAILY.TRANS,DISP=SHR
//BRNTXN05 DD DSN=BRANCH.B22.DAILY.TRANS,DISP=SHR
//BRNTXN06 DD DSN=BRANCH.B23.DAILY.TRANS,DISP=SHR
//BRNTXN07 DD DSN=BRANCH.B24.DAILY.TRANS,DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK3,DISP=(NEW,DELETE),
//         SPACE=(CYL,(50,25)),UNIT=SYSDA
//RGNWEST  DD DSN=REGION.WEST.MERGED(+1),
//         DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(30,10)),
//         DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//SYSOUT   DD SYSOUT=*
//*
//*============================================================
//* PHASE 2: CONSOLIDATED MERGE WITH DUPLICATE DETECTION
//*============================================================
//*
//CONSOL   EXEC PGM=CONSMRGE,COND=(0,NE)
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//RGNEAST  DD DSN=REGION.EAST.MERGED(0),DISP=SHR
//RGNCENTR DD DSN=REGION.CENTRAL.MERGED(0),DISP=SHR
//RGNWEST  DD DSN=REGION.WEST.MERGED(0),DISP=SHR
//MRGEWK01 DD DSN=&&MRGWK4,DISP=(NEW,DELETE),
//         SPACE=(CYL,(100,50)),UNIT=SYSDA
//CONSOUT  DD DSN=BANK.CONSOLIDATED.TRANS(+1),
//         DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(80,30)),
//         DCB=(RECFM=FB,LRECL=180,BLKSIZE=0)
//DUPFILE  DD DSN=BANK.DUPLICATE.TRANS(+1),
//         DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(5,2)),
//         DCB=(RECFM=FB,LRECL=250,BLKSIZE=0)
//MERGERPT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*

The Merge Statistics Report

================================================================
  MIDWEST COMMUNITY BANK
  BRANCH TRANSACTION MERGE REPORT
  RUN DATE: 2026-02-10    TIME: 19:15:22
================================================================

  PHASE 1: REGIONAL MERGE RESULTS
  --------------------------------------------------------
  East Region:
    Branches Merged:                        8
    Total Records:                    342,871
  Central Region:
    Branches Merged:                        9
    Total Records:                    398,204
  West Region:
    Branches Merged:                        7
    Total Records:                    261,553
  --------------------------------------------------------
  Total Input Records:              1,002,628

  PHASE 2: CONSOLIDATED MERGE
  --------------------------------------------------------
  Records to Consolidated File:    1,002,491
  Duplicate Records Detected:            137
  Duplicate Pairs by Type:
    Branch vs. ACH:                       89
    Branch vs. Wire:                      31
    Branch vs. Branch:                    17

  BRANCH VOLUME SUMMARY
  --------------------------------------------------------
  Branch    Records    Amount Debits    Amount Credits
  ------  ---------  ---------------  ---------------
  B01        41,223   $  23,445,102    $  28,991,387
  B02        38,917   $  21,008,776    $  25,332,109
  B03        44,501   $  31,220,445    $  34,887,223
  ...
  B24        32,118   $  18,776,321    $  22,109,556
  ------  ---------  ---------------  ---------------
  TOTAL   1,002,628   $ 587,231,904    $ 692,108,445

================================================================

MERGE vs. SORT: When to Use Each

This case study illustrates the key distinction between MERGE and SORT:

Use MERGE when: - Input files are already sorted on the merge keys - You want to combine multiple sorted sources into a single sorted output - You do not need an INPUT PROCEDURE (MERGE does not support INPUT PROCEDURE) - The combined output should preserve the sort order of the inputs

Use SORT when: - Input data is unsorted or needs reordering - You need INPUT PROCEDURE to filter or transform records before sorting - You have a single input source (or want to treat multiple sources as one via concatenation in JCL)

The MERGE statement is significantly more efficient than SORT when the input files are pre-sorted. The SORT utility must compare every record against potentially every other record (using an efficient algorithm like quicksort or merge-sort internally), while the MERGE utility only needs to compare the current front record from each input file, selecting the minimum each time. For k input files with a total of n records, MERGE performs O(n log k) comparisons, while SORT performs O(n log n) comparisons. When k is small (3 regional files), the MERGE is much faster.

A critical limitation: the MERGE statement does not support INPUT PROCEDURE. If you need to filter or transform records before merging, you must either pre-process the files in a prior step or use SORT with JCL concatenation instead.


Handling Missing Branch Files

In production, a branch file may be missing due to transmission failure. The program must handle this gracefully rather than abending:

       1000-INITIALIZE.
           MOVE 0 TO WS-MISSING-BRANCH-COUNT

           OPEN INPUT BRANCH-FILE-01
           IF WS-BRN01-STATUS NOT = "00"
               DISPLAY "WARNING: BRANCH 01 FILE MISSING"
               ADD 1 TO WS-MISSING-BRANCH-COUNT
               CLOSE BRANCH-FILE-01
           END-IF

           OPEN INPUT BRANCH-FILE-02
           IF WS-BRN02-STATUS NOT = "00"
               DISPLAY "WARNING: BRANCH 02 FILE MISSING"
               ADD 1 TO WS-MISSING-BRANCH-COUNT
               CLOSE BRANCH-FILE-02
           END-IF

      *    ... repeat for each branch file

      *    If more than 3 branches are missing, abort
           IF WS-MISSING-BRANCH-COUNT > 3
               DISPLAY "CRITICAL: MORE THAN 3 BRANCHES MISSING"
               DISPLAY "MERGE ABORTED - MANUAL REVIEW REQUIRED"
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF
           .

The threshold of 3 missing branches was chosen based on operational experience: one or two missing files usually means a transmission delay and can be handled by a re-run. More than three suggests a systemic problem that requires human intervention.


Testing the Merge Process

Test Scenario 1: Basic Three-Way Merge

  1. Create three small test files, each sorted by account number
  2. Run the merge program
  3. Verify the consolidated output is in correct account-number order
  4. Verify the record count equals the sum of all input records (minus duplicates)

Test Scenario 2: Duplicate Detection

  1. Create two branch files with an identical transaction (same account, date, amount, reference)
  2. Run the merge
  3. Verify one copy appears in the consolidated file and one in the duplicate file
  4. Verify the duplicate report identifies both branches

Test Scenario 3: Empty Input File

  1. Create one normal branch file and one empty branch file
  2. Run the merge
  3. Verify the program handles the empty file without error
  4. Verify the output contains all records from the non-empty file

Test Scenario 4: All Records for Same Account

  1. Create three branch files where every record has the same account number
  2. Run the merge
  3. Verify records are ordered by date and time within the account

Discussion Questions

  1. The MERGE statement does not support INPUT PROCEDURE. If you needed to filter records during the merge (for example, excluding transactions older than 30 days), how would you restructure the job to accomplish this?

  2. The duplicate detection logic compares the current record against only the previous record. Is this sufficient? Could a duplicate be separated by a non-duplicate record in the merged output? Under what circumstances?

  3. The two-phase merge approach was chosen because of compiler limits on the number of MERGE input files. An alternative would be to concatenate all 24 branch files in the JCL and use SORT instead of MERGE. Compare the advantages and disadvantages of each approach.

  4. What happens if one of the branch files is not sorted correctly? How would you detect this condition before the merge, and what should the program do if it detects an out-of-order input file?

  5. The bank is considering adding real-time transaction feeds from online and mobile channels, which would arrive as a continuous stream rather than a batch file. How would this change the merge architecture?