Case Study 2: Hash-Based Account Lookup

Background

Pacific Federal Credit Union serves 450,000 members with accounts ranging from basic savings to complex commercial lending facilities. The credit union's core banking system processes approximately 800,000 transactions daily during the overnight batch cycle. Each transaction references an account number, and the posting program must look up the account's master record to validate the transaction and apply the update.

The existing account lookup uses a VSAM KSDS (Key-Sequenced Data Set) with the 10-digit account number as the primary key. While VSAM KSDS provides excellent performance for most workloads, the batch posting program's access pattern creates a specific problem: the transactions arrive in chronological order (not sorted by account), causing the VSAM index to thrash between widely separated areas of the file. During peak batch windows, the index buffer pool overflows and VSAM must perform physical disk I/O for nearly every random read. The batch window is tightening, and the DBA team has asked for an alternative access method.

The solution is a hash-based account lookup using a COBOL relative file. Instead of maintaining an index structure, the program computes the record's slot number directly from the account number using a hashing function. This eliminates the index entirely -- every lookup requires exactly one disk I/O (or two, in the case of a collision). The relative file acts as a hash table stored on disk.


Hash Table Design

The Hashing Concept

A hash function converts a key value (the account number) into a slot number within a fixed range. The ideal hash function distributes keys uniformly across all available slots, minimizing collisions (cases where two different keys map to the same slot).

For the credit union's 450,000 accounts, the team allocates a relative file with 600,000 slots -- approximately 1.33 times the number of accounts. This load factor of 0.75 (450,000 / 600,000) provides a good balance between space efficiency and collision avoidance. A lower load factor reduces collisions but wastes disk space; a higher load factor saves space but increases collision chains.

The Hash Function

The hash function uses a division-remainder method, which is simple and effective for numeric keys:

       WORKING-STORAGE SECTION.
       01  WS-HASH-WORK.
           05  WS-HASH-INPUT          PIC 9(10).
           05  WS-HASH-PRIME          PIC 9(6) VALUE 599993.
           05  WS-HASH-RESULT         PIC 9(6).
           05  WS-TABLE-SIZE          PIC 9(6) VALUE 600000.

       01  WS-REL-KEY                  PIC 9(6).
       5000-COMPUTE-HASH.
      *    Division-remainder hashing with a prime divisor
      *    The prime number reduces clustering for sequential keys
           DIVIDE WS-HASH-INPUT BY WS-HASH-PRIME
               GIVING WS-HASH-WORK
               REMAINDER WS-HASH-RESULT

      *    Ensure slot is between 1 and TABLE-SIZE
      *    (relative files start at slot 1, not slot 0)
           ADD 1 TO WS-HASH-RESULT
           MOVE WS-HASH-RESULT TO WS-REL-KEY
           .

The prime number 599,993 (the largest prime less than 600,000) is used as the divisor because prime divisors produce better distribution than composite numbers, especially when account numbers contain patterns (such as sequential assignment or branch-prefix schemes).

Collision Resolution: Linear Probing

When two account numbers hash to the same slot, a collision occurs. The program resolves collisions using linear probing: it examines consecutive slots starting from the hash position until it finds either the target account or an empty slot.

       01  WS-COLLISION-FIELDS.
           05  WS-PROBE-COUNT         PIC 9(4).
           05  WS-MAX-PROBES          PIC 9(4) VALUE 100.
           05  WS-ORIGINAL-HASH       PIC 9(6).
           05  WS-PROBE-SLOT          PIC 9(6).
           05  WS-PROBE-RESULT        PIC X(1).
               88  PROBE-FOUND        VALUE "F".
               88  PROBE-EMPTY        VALUE "E".
               88  PROBE-COLLISION    VALUE "C".
               88  PROBE-EXHAUSTED    VALUE "X".

File Definitions

The Hash Table File (Relative Organization)

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT HASH-TABLE-FILE
               ASSIGN TO HASHTBL
               ORGANIZATION IS RELATIVE
               ACCESS MODE IS RANDOM
               RELATIVE KEY IS WS-REL-KEY
               FILE STATUS IS WS-HT-STATUS.

       DATA DIVISION.
       FILE SECTION.
       FD  HASH-TABLE-FILE
           RECORD CONTAINS 150 CHARACTERS.

       01  HASH-TABLE-RECORD.
           05  HTR-SLOT-STATUS        PIC X(1).
               88  HTR-OCCUPIED       VALUE "O".
               88  HTR-EMPTY          VALUE "E".
               88  HTR-DELETED        VALUE "D".
           05  HTR-ACCOUNT-NUMBER     PIC X(10).
           05  HTR-MEMBER-NAME        PIC X(30).
           05  HTR-ACCOUNT-TYPE       PIC X(2).
               88  HTR-SAVINGS        VALUE "SV".
               88  HTR-CHECKING       VALUE "CK".
               88  HTR-MONEY-MARKET   VALUE "MM".
               88  HTR-CERTIFICATE    VALUE "CD".
               88  HTR-LOAN           VALUE "LN".
               88  HTR-CREDIT-LINE    VALUE "CL".
           05  HTR-BRANCH-CODE        PIC X(4).
           05  HTR-CURRENT-BALANCE    PIC S9(11)V99.
           05  HTR-AVAILABLE-BALANCE  PIC S9(11)V99.
           05  HTR-LAST-TXN-DATE      PIC 9(8).
           05  HTR-OPEN-DATE          PIC 9(8).
           05  HTR-STATUS-CODE        PIC X(1).
               88  HTR-ACTIVE         VALUE "A".
               88  HTR-FROZEN         VALUE "F".
               88  HTR-CLOSED         VALUE "C".
               88  HTR-DORMANT        VALUE "D".
           05  HTR-FILLER             PIC X(48).

The HTR-SLOT-STATUS field is critical for the hash table algorithm: - "O" (Occupied): The slot contains a valid account record. - "E" (Empty): The slot has never been used. During a lookup probe, an empty slot means the target account does not exist in the table. - "D" (Deleted): The slot previously held a record that has been logically deleted. During a lookup, the probe must continue past deleted slots (they might have been part of a collision chain). During an insert, deleted slots can be reused.

The Source Account Master File

       SELECT ACCOUNT-MASTER-FILE
           ASSIGN TO ACCTMAST
           ORGANIZATION IS INDEXED
           ACCESS MODE IS SEQUENTIAL
           RECORD KEY IS AMR-ACCOUNT-NUMBER
           FILE STATUS IS WS-AM-STATUS.

       FD  ACCOUNT-MASTER-FILE
           RECORD CONTAINS 150 CHARACTERS.

       01  ACCOUNT-MASTER-RECORD.
           05  AMR-ACCOUNT-NUMBER     PIC X(10).
           05  AMR-MEMBER-NAME        PIC X(30).
           05  AMR-ACCOUNT-TYPE       PIC X(2).
           05  AMR-BRANCH-CODE        PIC X(4).
           05  AMR-CURRENT-BALANCE    PIC S9(11)V99.
           05  AMR-AVAILABLE-BALANCE  PIC S9(11)V99.
           05  AMR-LAST-TXN-DATE      PIC 9(8).
           05  AMR-OPEN-DATE          PIC 9(8).
           05  AMR-STATUS-CODE        PIC X(1).
           05  AMR-FILLER             PIC X(48).

Building the Hash Table

The hash table is built from the account master file during a nightly initialization step. This program reads every account record, computes its hash, and inserts it into the relative file:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. HTBUILD.

       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-PROCESS-ACCOUNTS
               UNTIL WS-AM-EOF
           PERFORM 3000-REPORT-STATISTICS
           PERFORM 9000-TERMINATE
           STOP RUN
           .

       1000-INITIALIZE.
           OPEN INPUT  ACCOUNT-MASTER-FILE
           OPEN OUTPUT HASH-TABLE-FILE
           OPEN OUTPUT BUILD-REPORT-FILE

      *    Pre-allocate all slots as empty
           INITIALIZE HASH-TABLE-RECORD
           SET HTR-EMPTY TO TRUE
           PERFORM VARYING WS-REL-KEY
               FROM 1 BY 1
               UNTIL WS-REL-KEY > WS-TABLE-SIZE
               WRITE HASH-TABLE-RECORD
                   INVALID KEY
                       DISPLAY "INIT ERROR AT SLOT " WS-REL-KEY
                       STOP RUN
               END-WRITE
           END-PERFORM

           MOVE 0 TO WS-RECORDS-LOADED
           MOVE 0 TO WS-TOTAL-COLLISIONS
           MOVE 0 TO WS-MAX-PROBE-DEPTH
           MOVE 0 TO WS-INSERT-FAILURES

           PERFORM 1100-READ-ACCOUNT-MASTER
           .

       2000-PROCESS-ACCOUNTS.
           PERFORM 5000-COMPUTE-HASH
           PERFORM 5100-INSERT-WITH-PROBING
           ADD 1 TO WS-RECORDS-LOADED
           PERFORM 1100-READ-ACCOUNT-MASTER
           .

The Insert Logic with Linear Probing

       5100-INSERT-WITH-PROBING.
           MOVE WS-REL-KEY TO WS-ORIGINAL-HASH
           MOVE 0 TO WS-PROBE-COUNT

           PERFORM 5110-TRY-INSERT
               UNTIL PROBE-FOUND
               OR    PROBE-EMPTY
               OR    PROBE-EXHAUSTED

           IF PROBE-EXHAUSTED
               ADD 1 TO WS-INSERT-FAILURES
               DISPLAY "HASH TABLE FULL: CANNOT INSERT "
                   AMR-ACCOUNT-NUMBER
           END-IF
           .

       5110-TRY-INSERT.
           READ HASH-TABLE-FILE INTO HASH-TABLE-RECORD
               INVALID KEY
                   SET PROBE-EMPTY TO TRUE
                   PERFORM 5120-WRITE-RECORD
           END-READ

           IF WS-HT-STATUS = "00"
               EVALUATE TRUE
                   WHEN HTR-EMPTY
                   WHEN HTR-DELETED
      *                Slot is available -- insert here
                       SET PROBE-EMPTY TO TRUE
                       PERFORM 5120-WRITE-RECORD
                   WHEN HTR-OCCUPIED
      *                Slot is taken -- check if same account
                       IF HTR-ACCOUNT-NUMBER =
                           AMR-ACCOUNT-NUMBER
                           SET PROBE-FOUND TO TRUE
                           DISPLAY "DUPLICATE ACCOUNT: "
                               AMR-ACCOUNT-NUMBER
                       ELSE
      *                    Collision -- try next slot
                           ADD 1 TO WS-PROBE-COUNT
                           ADD 1 TO WS-TOTAL-COLLISIONS
                           IF WS-PROBE-COUNT > WS-MAX-PROBES
                               SET PROBE-EXHAUSTED TO TRUE
                           ELSE
                               ADD 1 TO WS-REL-KEY
                               IF WS-REL-KEY > WS-TABLE-SIZE
                                   MOVE 1 TO WS-REL-KEY
                               END-IF
                           END-IF
                       END-IF
               END-EVALUATE
           END-IF
           .

       5120-WRITE-RECORD.
      *    Transfer account data into hash table record
           SET HTR-OCCUPIED TO TRUE
           MOVE AMR-ACCOUNT-NUMBER  TO HTR-ACCOUNT-NUMBER
           MOVE AMR-MEMBER-NAME     TO HTR-MEMBER-NAME
           MOVE AMR-ACCOUNT-TYPE    TO HTR-ACCOUNT-TYPE
           MOVE AMR-BRANCH-CODE     TO HTR-BRANCH-CODE
           MOVE AMR-CURRENT-BALANCE TO HTR-CURRENT-BALANCE
           MOVE AMR-AVAILABLE-BALANCE
                                    TO HTR-AVAILABLE-BALANCE
           MOVE AMR-LAST-TXN-DATE   TO HTR-LAST-TXN-DATE
           MOVE AMR-OPEN-DATE       TO HTR-OPEN-DATE
           MOVE AMR-STATUS-CODE     TO HTR-STATUS-CODE

           REWRITE HASH-TABLE-RECORD
               INVALID KEY
                   DISPLAY "REWRITE ERROR AT SLOT "
                       WS-REL-KEY
           END-REWRITE

           IF WS-PROBE-COUNT > WS-MAX-PROBE-DEPTH
               MOVE WS-PROBE-COUNT TO WS-MAX-PROBE-DEPTH
           END-IF
           .

Looking Up Accounts During Transaction Processing

The batch posting program uses the hash table for fast account lookups:

       6000-LOOKUP-ACCOUNT.
           MOVE TXN-ACCOUNT-NUMBER TO WS-HASH-INPUT
           PERFORM 5000-COMPUTE-HASH
           MOVE WS-REL-KEY TO WS-ORIGINAL-HASH
           MOVE 0 TO WS-PROBE-COUNT
           SET PROBE-COLLISION TO TRUE

           PERFORM 6100-PROBE-FOR-ACCOUNT
               UNTIL PROBE-FOUND
               OR    PROBE-EMPTY
               OR    PROBE-EXHAUSTED

           IF PROBE-FOUND
               MOVE HTR-CURRENT-BALANCE
                                    TO WS-ACCOUNT-BALANCE
               MOVE HTR-AVAILABLE-BALANCE
                                    TO WS-AVAILABLE-BALANCE
               MOVE HTR-STATUS-CODE TO WS-ACCOUNT-STATUS
               SET WS-LOOKUP-OK     TO TRUE
           ELSE
               SET WS-LOOKUP-NOT-FOUND TO TRUE
               ADD 1 TO WS-LOOKUP-MISS-COUNT
           END-IF
           .

       6100-PROBE-FOR-ACCOUNT.
           READ HASH-TABLE-FILE INTO HASH-TABLE-RECORD
               INVALID KEY
                   SET PROBE-EMPTY TO TRUE
           END-READ

           IF WS-HT-STATUS = "00"
               EVALUATE TRUE
                   WHEN HTR-EMPTY
      *                Empty slot reached -- account not found
                       SET PROBE-EMPTY TO TRUE
                   WHEN HTR-DELETED
      *                Deleted slot -- must continue probing
                       ADD 1 TO WS-PROBE-COUNT
                       IF WS-PROBE-COUNT > WS-MAX-PROBES
                           SET PROBE-EXHAUSTED TO TRUE
                       ELSE
                           ADD 1 TO WS-REL-KEY
                           IF WS-REL-KEY > WS-TABLE-SIZE
                               MOVE 1 TO WS-REL-KEY
                           END-IF
                       END-IF
                   WHEN HTR-OCCUPIED
                       IF HTR-ACCOUNT-NUMBER =
                           TXN-ACCOUNT-NUMBER
                           SET PROBE-FOUND TO TRUE
                       ELSE
                           ADD 1 TO WS-PROBE-COUNT
                           IF WS-PROBE-COUNT > WS-MAX-PROBES
                               SET PROBE-EXHAUSTED TO TRUE
                           ELSE
                               ADD 1 TO WS-REL-KEY
                               IF WS-REL-KEY > WS-TABLE-SIZE
                                   MOVE 1 TO WS-REL-KEY
                               END-IF
                           END-IF
                       END-IF
               END-EVALUATE
           END-IF
           .

Updating Balances After Posting

After a transaction is validated and applied, the program writes the updated balance back to the hash table:

       7000-UPDATE-ACCOUNT-BALANCE.
      *    WS-REL-KEY still points to the record from lookup
           MOVE WS-NEW-BALANCE     TO HTR-CURRENT-BALANCE
           MOVE WS-NEW-AVAILABLE   TO HTR-AVAILABLE-BALANCE
           MOVE WS-CURRENT-DATE    TO HTR-LAST-TXN-DATE

           REWRITE HASH-TABLE-RECORD
               INVALID KEY
                   PERFORM 9300-REWRITE-ERROR
           END-REWRITE
           .

JCL for the Hash Table Build and Posting Jobs

//HTBUILD  JOB (ACCT),'HASH TABLE BUILD',
//         CLASS=A,MSGCLASS=X,NOTIFY=&SYSUID
//*
//* STEP 1: BUILD HASH TABLE FROM ACCOUNT MASTER
//*
//BUILD    EXEC PGM=HTBUILD
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//ACCTMAST DD DSN=CREDIT.ACCOUNT.MASTER,DISP=SHR
//HASHTBL  DD DSN=CREDIT.HASH.TABLE,DISP=OLD
//BUILDRPT DD SYSOUT=*
//SYSOUT   DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*
//*
//* STEP 2: POST TRANSACTIONS USING HASH LOOKUPS
//*
//POST     EXEC PGM=HTPOST,COND=(0,NE)
//STEPLIB  DD DSN=PROD.LOADLIB,DISP=SHR
//HASHTBL  DD DSN=CREDIT.HASH.TABLE,DISP=OLD
//TXNFILE  DD DSN=CREDIT.DAILY.TRANSACTIONS(0),DISP=SHR
//POSTRPT  DD SYSOUT=*
//REJECTS  DD DSN=CREDIT.POST.REJECTS(+1),
//         DISP=(NEW,CATLG,DELETE),
//         SPACE=(CYL,(5,2)),
//         DCB=(RECFM=FB,LRECL=200,BLKSIZE=0)
//SYSOUT   DD SYSOUT=*
//SYSUDUMP DD SYSOUT=*

Build Statistics Report

After building the hash table, the program produces a report that helps the DBA team monitor the table's health:

       3000-REPORT-STATISTICS.
           DISPLAY "========================================"
           DISPLAY "  HASH TABLE BUILD STATISTICS"
           DISPLAY "========================================"
           DISPLAY "  TABLE SIZE:          " WS-TABLE-SIZE
           DISPLAY "  RECORDS LOADED:      " WS-RECORDS-LOADED
           COMPUTE WS-LOAD-FACTOR ROUNDED =
               WS-RECORDS-LOADED / WS-TABLE-SIZE * 100
           DISPLAY "  LOAD FACTOR:         " WS-LOAD-FACTOR "%"
           DISPLAY "  TOTAL COLLISIONS:    " WS-TOTAL-COLLISIONS
           COMPUTE WS-AVG-PROBES ROUNDED =
               WS-TOTAL-COLLISIONS / WS-RECORDS-LOADED
           DISPLAY "  AVG PROBES/INSERT:   " WS-AVG-PROBES
           DISPLAY "  MAX PROBE DEPTH:     " WS-MAX-PROBE-DEPTH
           DISPLAY "  INSERT FAILURES:     " WS-INSERT-FAILURES
           DISPLAY "========================================"
           .

A healthy hash table will show an average probe depth below 2.0 and a maximum probe depth below 15. If the statistics reveal clustering problems, the DBA team can adjust the table size or switch to a different hash function.


Performance Comparison

The team measured lookup performance over a batch run of 800,000 transactions:

Metric VSAM KSDS (Indexed) VSAM RRDS (Hash)
Avg. I/Os per lookup 3.7 1.2
Max I/Os per lookup 6 4
Batch elapsed time 47 min 19 min
CPU time 12 min 11 min
Buffer pool hit ratio 68% N/A (no index)

The hash-based approach reduced elapsed time by 60 percent. The CPU time was similar because the hash computation is trivial compared to disk I/O. The primary savings came from eliminating index I/O operations.


Limitations and Trade-Offs

What Hash Tables Cannot Do

  1. Sequential access by key: A hash table scatters accounts randomly across the file. You cannot read accounts in account-number order without sorting the output. For reports that need sorted output, the VSAM KSDS remains necessary.

  2. Range queries: Finding all accounts between 5000000000 and 5000001000 requires probing every possible account number. An indexed file handles this with a START and sequential read.

  3. Growth: When the file reaches 80 percent capacity, collision chains lengthen rapidly. The table must be rebuilt with a larger size, which requires an outage.

  4. Deletion complexity: The three-state slot status (Occupied, Empty, Deleted) is necessary because simply emptying a slot could break collision chains for other records that probed past that slot during insertion.

When to Choose Hash-Based Relative Files

Hash-based relative files are most appropriate when: - The workload is predominantly random read/write by a single key - Sequential access by key order is not required during the batch window - The number of records is predictable and does not grow rapidly - Minimizing I/O response time is critical


Discussion Questions

  1. Why is a prime number used as the hash divisor rather than the table size itself? What distribution problem would arise if the divisor were 600,000 (a highly composite number)?

  2. The three-state slot status (Occupied, Empty, Deleted) introduces complexity. Explain why simply marking a deleted slot as Empty would corrupt the hash table. Provide a concrete example with two account numbers that hash to the same slot.

  3. At what load factor would you recommend rebuilding the hash table with more slots? What observable metrics would trigger this decision?

  4. The current hash function uses simple division-remainder. Research and describe one alternative hash function (such as multiplicative hashing or folding) and explain whether it would be better suited to 10-digit account numbers.

  5. How could the design be extended to support a secondary lookup key (such as member Social Security Number) without duplicating the account data? Consider what additional files or structures would be needed.