> "When you know the slot number, you don't need to search the parking garage. You drive straight to it."
In This Chapter
- 13.1 Relative File Concepts
- 13.2 COBOL SELECT Statement for Relative Files
- 13.3 Record Description for Relative Files
- 13.4 CRUD Operations on Relative Files
- 13.5 The START Statement with Relative Files
- 13.6 Hashing: Mapping Business Keys to Slot Numbers
- 13.7 GlobalBank Case: Hash-Based Quick Lookup Table
- 13.8 MedClaim Case: Provider Code Direct Access Table
- 13.9 Monitoring and Tuning Relative File Performance
- 13.10 Understanding RRDS Internals
- 13.11 Trade-Off Analysis: Relative vs. Indexed
- 13.12 Sparse File Handling
- 13.13 Advanced Hashing: Double Hashing and Bucket Strategies
- 13.14 Try It Yourself: Hands-On Exercises
- 13.15 Common Mistakes and Debugging
- 13.16 Practical Considerations for Production RRDS Files
- 13.17 Practical Walkthrough: Building a Hash Table from Scratch
- 13.18 When NOT to Use Relative Files
- 13.19 File Status Codes for Relative Files
- 13.20 Chapter Summary
Chapter 13: Relative Files and RRDS
"When you know the slot number, you don't need to search the parking garage. You drive straight to it." — Priya Kapoor, explaining relative file access to Derek Washington
In Chapter 12, we explored indexed files, where a B+ tree index maps key values to physical record locations. That index is powerful but introduces overhead — extra I/O operations to traverse the tree, extra storage for the index component, and extra processing for index maintenance during updates. For certain access patterns, there is a faster alternative: relative files.
A relative file assigns each record a relative record number (RRN), which is its position within the file. Record 1 is at position 1, record 2 at position 2, and so on. When your program knows the record number, access is direct: VSAM calculates the record's location from the RRN, performs a single I/O, and delivers the record. No index traversal. No key comparison. Just arithmetic and one disk read.
This chapter covers VSAM Relative Record Data Sets (RRDS), COBOL relative file processing, hashing strategies that map business keys to record numbers, and the critical question of when relative files are the right choice — and when they are not.
13.1 Relative File Concepts
13.1.1 What Is a Relative File?
A relative file is conceptually an array on disk. Each "slot" holds one record (or is empty), and each slot is identified by its position number — the relative record number.
Slot 1: ┌─────────────────┐
│ Record data │ ← Relative Record Number 1
└─────────────────┘
Slot 2: ┌─────────────────┐
│ Record data │ ← Relative Record Number 2
└─────────────────┘
Slot 3: ┌─────────────────┐
│ (empty) │ ← Slot exists but no record
└─────────────────┘
Slot 4: ┌─────────────────┐
│ Record data │ ← Relative Record Number 4
└─────────────────┘
Slot 5: ┌─────────────────┐
│ Record data │ ← Relative Record Number 5
└─────────────────┘
Key characteristics:
- Fixed-length records: Every slot is the same size, determined when the file is defined
- Direct calculation: VSAM computes the record's disk location as offset = (RRN - 1) * record_length
- Sparse files allowed: Slots can be empty (no record written to that position)
- No index overhead: There is no index structure — access is purely positional
- Single I/O: A random read or write requires exactly one disk I/O (assuming no buffer contention)
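The positional arithmetic in the list above can be sketched as a tiny standalone program. This is only the conceptual model: the program name and data names are made up for illustration, and a real RRDS also keeps control information inside each control interval, so the physical byte address is not exactly this value.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RRNOFFS.
      * Hypothetical demo of the conceptual slot arithmetic.
      * Real VSAM also accounts for control-interval overhead.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RRN              PIC 9(07) VALUE 500.
       01  WS-RECORD-LENGTH    PIC 9(05) VALUE 100.
       01  WS-BYTE-OFFSET      PIC 9(12).
       PROCEDURE DIVISION.
       0000-MAIN.
      * offset = (RRN - 1) * record_length
           COMPUTE WS-BYTE-OFFSET =
               (WS-RRN - 1) * WS-RECORD-LENGTH
           DISPLAY 'RRN ' WS-RRN ' starts at offset '
               WS-BYTE-OFFSET
           STOP RUN.
```

With 100-byte records, slot 500 begins 49,900 bytes into the data under this model. No search is involved, only one multiplication.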
13.1.2 VSAM RRDS
In VSAM terms, a Relative Record Data Set (RRDS) stores fixed-length records in numbered slots. Unlike KSDS, an RRDS has no index component — only a data component. The VSAM catalog records the file's attributes (record size, slot count), and VSAM calculates positions on the fly.
RRDS files are defined via IDCAMS:
//DEFRRDS EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE CLUSTER -
(NAME(GLOBANK.ACCT.QUICKLOOK) -
NUMBERED -
RECORDSIZE(100 100) -
CYLINDERS(5 2) -
SHAREOPTIONS(2 3)) -
DATA -
(NAME(GLOBANK.ACCT.QUICKLOOK.DATA) -
CISZ(4096))
/*
Note the key difference from KSDS: NUMBERED replaces INDEXED, and there is no KEYS parameter (because access is by position, not by key). There is also no INDEX component definition.
📊 Performance Comparison: For a random read, a KSDS typically requires 3-4 I/Os (index traversal + data read) when the index records are not already in buffers. An RRDS requires 1 I/O (data read only). For applications that perform millions of random lookups, this difference is significant, potentially a 3x-4x improvement in throughput.
13.1.3 When to Use Relative Files
Relative files excel when:
- Records have a natural numeric identifier that can serve as the RRN (e.g., employee numbers 1-10000, product codes that are sequential integers)
- The key space is dense — most slots between 1 and the maximum RRN are occupied
- Maximum performance for random access is the primary requirement
- The data set is relatively static — not many inserts or deletes
- The file serves as a lookup table: code tables, translation tables, rate tables
Relative files are a poor choice when:
- The key is alphanumeric (e.g., account numbers like "ACC1234567") — these cannot directly serve as RRNs
- The key space is sparse — if your keys range from 1 to 10,000,000 but only 50,000 slots are occupied, 99.5% of the file is empty
- You need alternate access paths — RRDS does not support alternate keys
- You need key-sequenced browsing — sequential reads return records in slot order, which may not be meaningful
- Frequent inserts and deletes change the active record count substantially
💡 The Modernization Spectrum: The choice between relative and indexed files is a design decision with long-term consequences. "I've seen shops stick with relative files for decades because the original developer chose them in 1985," says Priya Kapoor. "Sometimes that choice still makes sense. Sometimes the application has evolved beyond what relative files handle well. Know the trade-offs so you can make the right call."
13.2 COBOL SELECT Statement for Relative Files
13.2.1 Basic Syntax
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT QUICK-LOOKUP-FILE
ASSIGN TO QUICKLK
ORGANIZATION IS RELATIVE
ACCESS MODE IS DYNAMIC
RELATIVE KEY IS WS-RELATIVE-KEY
FILE STATUS IS WS-QL-STATUS.
The key differences from indexed file SELECT:
| Clause | Indexed (KSDS) | Relative (RRDS) |
|---|---|---|
| ORGANIZATION | IS INDEXED | IS RELATIVE |
| Key clause | RECORD KEY IS field-in-record | RELATIVE KEY IS field-in-working-storage |
| Key type | Part of the record | External to the record (in WORKING-STORAGE) |
| Key data type | PIC X or PIC 9 | Must be numeric integer |
⚠️ Critical Difference: The RELATIVE KEY field is defined in WORKING-STORAGE, not in the record description. This is because the relative record number is a file-system concept (the slot position), not part of the business data. The record itself does not "know" its position in the file.
13.2.2 The RELATIVE KEY Field
WORKING-STORAGE SECTION.
01 WS-RELATIVE-KEY PIC 9(07).
The RELATIVE KEY must be a numeric integer field. Its value determines which slot is accessed:
- MOVE 1 TO WS-RELATIVE-KEY → access slot 1
- MOVE 500 TO WS-RELATIVE-KEY → access slot 500
- MOVE 0 TO WS-RELATIVE-KEY → invalid (slots start at 1)
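Because slot 0 is never valid, it can help to range-check the key before issuing any I/O. The following is a minimal sketch of such a guard. The program name, the flag, and the upper bound of 10,000 slots are assumptions made for illustration and do not come from any real file definition.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RRNCHECK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RELATIVE-KEY     PIC 9(07).
      * Hypothetical upper bound for this sketch
       01  WS-MAX-SLOT-NUMBER  PIC 9(07) VALUE 10000.
       01  WS-KEY-FLAG         PIC X.
           88  WS-KEY-VALID    VALUE 'Y'.
           88  WS-KEY-INVALID  VALUE 'N'.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Slot 0 is rejected; slot 500 is accepted
           MOVE 0 TO WS-RELATIVE-KEY
           PERFORM 1000-VALIDATE-KEY
           MOVE 500 TO WS-RELATIVE-KEY
           PERFORM 1000-VALIDATE-KEY
           STOP RUN.
       1000-VALIDATE-KEY.
           IF WS-RELATIVE-KEY >= 1
              AND WS-RELATIVE-KEY <= WS-MAX-SLOT-NUMBER
               SET WS-KEY-VALID TO TRUE
               DISPLAY 'Slot ' WS-RELATIVE-KEY ' is in range'
           ELSE
               SET WS-KEY-INVALID TO TRUE
               DISPLAY 'Slot ' WS-RELATIVE-KEY ' is out of range'
           END-IF.
```

Checking in the program gives a clearer diagnostic than waiting for the file system to reject the request.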
13.2.3 Access Modes for Relative Files
Like indexed files, relative files support three access modes:
SEQUENTIAL: Records are read or written in slot order (1, 2, 3, ...). Empty slots are skipped during reads. Writes go to the next available slot.
RANDOM: Records are accessed by specifying the RELATIVE KEY value. This is the high-performance direct access mode.
DYNAMIC: Both sequential and random access in the same program.
*--- Sequential access — reading all occupied slots
SELECT SEQ-REL-FILE
ASSIGN TO SEQREL
ORGANIZATION IS RELATIVE
ACCESS MODE IS SEQUENTIAL
RELATIVE KEY IS WS-SEQ-KEY
FILE STATUS IS WS-SEQ-STATUS.
*--- Random access — direct slot lookup
SELECT RND-REL-FILE
ASSIGN TO RNDREL
ORGANIZATION IS RELATIVE
ACCESS MODE IS RANDOM
RELATIVE KEY IS WS-RND-KEY
FILE STATUS IS WS-RND-STATUS.
*--- Dynamic access — both modes
SELECT DYN-REL-FILE
ASSIGN TO DYNREL
ORGANIZATION IS RELATIVE
ACCESS MODE IS DYNAMIC
RELATIVE KEY IS WS-DYN-KEY
FILE STATUS IS WS-DYN-STATUS.
13.3 Record Description for Relative Files
The FD entry and record description for a relative file look the same as any other file, with the important exception that the record does not contain the relative key:
DATA DIVISION.
FILE SECTION.
FD QUICK-LOOKUP-FILE.
01 QUICK-LOOKUP-RECORD.
05 QL-ACCT-NUMBER PIC X(10).
05 QL-HOLDER-NAME PIC X(30).
05 QL-BALANCE PIC S9(11)V99 COMP-3.
05 QL-STATUS-CODE PIC X(02).
05 QL-LAST-ACCESS-DATE PIC 9(08).
05 FILLER PIC X(43).
Notice: there is no "slot number" field in the record. The relative record number exists only in the WORKING-STORAGE field WS-RELATIVE-KEY. The record contains pure business data.
13.4 CRUD Operations on Relative Files
13.4.1 WRITE — Adding Records
Writing to a relative file places a record in a specific slot:
ADD-TO-QUICK-LOOKUP.
MOVE WS-TARGET-SLOT TO WS-RELATIVE-KEY
MOVE WS-ACCT-DATA TO QUICK-LOOKUP-RECORD
WRITE QUICK-LOOKUP-RECORD
INVALID KEY
EVALUATE TRUE
WHEN WS-QL-DUP-KEY
DISPLAY 'Slot already occupied: '
WS-RELATIVE-KEY
WHEN OTHER
DISPLAY 'WRITE error: '
WS-QL-STATUS
END-EVALUATE
NOT INVALID KEY
ADD 1 TO WS-RECORDS-WRITTEN
END-WRITE.
In RANDOM or DYNAMIC mode, you specify which slot to write to by setting the RELATIVE KEY. If the slot is already occupied, you get a status '22' (duplicate key — the slot is already taken).
In SEQUENTIAL mode, the record goes into the next available slot, and the RELATIVE KEY field is set to the slot number used. This is useful for loading a file without caring about specific slot positions.
13.4.2 READ — Retrieving Records
Random READ:
LOOKUP-QUICK-RECORD.
MOVE WS-SLOT-NUMBER TO WS-RELATIVE-KEY
READ QUICK-LOOKUP-FILE
INVALID KEY
IF WS-QL-NOT-FOUND
DISPLAY 'Slot ' WS-RELATIVE-KEY
' is empty'
ELSE
DISPLAY 'READ error: '
WS-QL-STATUS
END-IF
NOT INVALID KEY
PERFORM DISPLAY-QUICK-RECORD
END-READ.
A random READ on a relative file is the fastest possible file access in VSAM: one calculation, one I/O, one record. Status '23' means the slot is empty (no record has been written to that position).
Sequential READ:
READ-ALL-RECORDS.
SET WS-NOT-EOF TO TRUE
PERFORM UNTIL WS-EOF
READ QUICK-LOOKUP-FILE NEXT
AT END
SET WS-EOF TO TRUE
NOT AT END
DISPLAY 'Slot ' WS-RELATIVE-KEY ': '
QL-ACCT-NUMBER ' '
QL-HOLDER-NAME
END-READ
END-PERFORM.
During sequential reading, VSAM automatically skips empty slots. The RELATIVE KEY field is updated with the slot number of each record returned.
13.4.3 REWRITE — Updating Records
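In sequential access mode, a REWRITE must follow a successful READ of the same record. In random or dynamic mode the RELATIVE KEY identifies the slot, so a prior READ is not strictly required. Reading first is still the usual pattern, as with indexed files, because it confirms the slot is occupied and lets you change only the fields you need: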
UPDATE-QUICK-RECORD.
MOVE WS-SLOT-NUMBER TO WS-RELATIVE-KEY
READ QUICK-LOOKUP-FILE
INVALID KEY
DISPLAY 'Slot empty — cannot update'
GO TO UPDATE-QUICK-EXIT
END-READ
* Modify the record in the buffer
MOVE WS-NEW-BALANCE TO QL-BALANCE
MOVE WS-TODAYS-DATE TO QL-LAST-ACCESS-DATE
REWRITE QUICK-LOOKUP-RECORD
INVALID KEY
DISPLAY 'REWRITE error: ' WS-QL-STATUS
NOT INVALID KEY
ADD 1 TO WS-RECORDS-UPDATED
END-REWRITE.
UPDATE-QUICK-EXIT.
EXIT.
13.4.4 DELETE — Removing Records
DELETE empties a slot, making it available for future writes:
REMOVE-QUICK-RECORD.
MOVE WS-SLOT-NUMBER TO WS-RELATIVE-KEY
DELETE QUICK-LOOKUP-FILE
INVALID KEY
DISPLAY 'Slot already empty: '
WS-RELATIVE-KEY
NOT INVALID KEY
ADD 1 TO WS-RECORDS-DELETED
END-DELETE.
Deleting a relative record truly empties the slot; there is no "logical delete" at the VSAM level. The slot returns to empty status and a later WRITE can reuse it. (Of course, your application can still implement logical deletes by setting a status field in the record, which is often advisable for audit purposes.)
13.5 The START Statement with Relative Files
Like indexed files, relative files in DYNAMIC or SEQUENTIAL access mode support the START statement. However, its semantics differ:
START-RELATIVE-AT-POSITION.
* Position the file at slot 500
MOVE 500 TO WS-RELATIVE-KEY
START MY-REL-FILE
KEY IS NOT LESS THAN WS-RELATIVE-KEY
INVALID KEY
DISPLAY 'No records at or beyond slot 500'
SET WS-EOF TO TRUE
END-START
* Now READ NEXT will return the first occupied slot
* at or after slot 500
PERFORM UNTIL WS-EOF
READ MY-REL-FILE NEXT
AT END
SET WS-EOF TO TRUE
NOT AT END
DISPLAY 'Slot ' WS-RELATIVE-KEY
': ' MY-REC-DATA
END-READ
END-PERFORM.
The START on a relative file positions by slot number, not by record content. The KEY IS NOT LESS THAN phrase means "position at slot N or the next occupied slot beyond N." This is useful for resuming a sequential scan from a specific position — for example, in a checkpoint/restart scenario.
13.5.1 Combining Random and Sequential Access
With DYNAMIC access mode, you can mix random reads with sequential browsing, just as with indexed files:
MIXED-ACCESS-DEMO.
* Random read of slot 42
MOVE 42 TO WS-RELATIVE-KEY
READ MY-REL-FILE
INVALID KEY
DISPLAY 'Slot 42 is empty'
NOT INVALID KEY
DISPLAY 'Slot 42: ' MY-REC-DATA
END-READ
* Now browse sequentially from slot 100
MOVE 100 TO WS-RELATIVE-KEY
START MY-REL-FILE
KEY IS NOT LESS THAN WS-RELATIVE-KEY
END-START
PERFORM 5 TIMES
READ MY-REL-FILE NEXT
AT END
SET WS-EOF TO TRUE
NOT AT END
DISPLAY 'Slot ' WS-RELATIVE-KEY
': ' MY-REC-DATA
END-READ
END-PERFORM.
This flexibility makes DYNAMIC the preferred access mode for most relative file programs.
13.6 Hashing: Mapping Business Keys to Slot Numbers
Here is the central challenge of relative files: business data rarely comes with neat sequential integers as identifiers. Account numbers might be "ACC1234567". Provider codes might be "PRV00042". Product SKUs might be "WDG-7842-BLU". None of these can directly serve as a relative record number.
The solution is hashing — a function that converts a business key into an integer suitable for use as an RRN.
13.6.1 What Is a Hash Function?
A hash function takes an input value (the business key) and produces a numeric output (the slot number):
Hash("ACC1234567") → 4823
Hash("ACC9876543") → 7291
Hash("ACC5555555") → 1156
The ideal hash function:
- Distributes keys uniformly across the available slots
- Is fast to compute
- Produces the same output for the same input (deterministic)
- Minimizes collisions (different keys mapping to the same slot)
13.6.2 Simple Hashing Techniques
Division/Remainder Method: The most common hash technique. Divide the numeric portion of the key by a prime number close to the file size, and use the remainder (+1) as the RRN:
COMPUTE-HASH-DIVISION.
* Extract numeric portion of account number
* For ACC1234567, extract 1234567
MOVE ACCT-NUMBER(4:7) TO WS-NUMERIC-KEY
* Divide by a prime near the file size
* File has 10,000 slots — use prime 9973
DIVIDE WS-NUMERIC-KEY BY 9973
GIVING WS-QUOTIENT
REMAINDER WS-HASH-REMAINDER
* Add 1 because RRNs start at 1, not 0
ADD 1 TO WS-HASH-REMAINDER
MOVE WS-HASH-REMAINDER TO WS-RELATIVE-KEY.
Why a prime number? Primes distribute remainders more uniformly than non-primes, reducing clustering. Common primes used in practice: 97, 997, 9973, 99991.
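To make the division method concrete, here is a small standalone sketch that hashes the sample key ACC1234567 with the prime 9973. The program name and the hard-coded account number are illustrative assumptions; the arithmetic follows the paragraph above.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HASHDIV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample key for illustration
       01  WS-ACCT-NUMBER      PIC X(10) VALUE 'ACC1234567'.
       01  WS-NUMERIC-KEY      PIC 9(07).
       01  WS-QUOTIENT         PIC 9(07).
       01  WS-HASH-REMAINDER   PIC 9(07).
       01  WS-RELATIVE-KEY     PIC 9(07).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Digits 4-10 of ACC1234567 are 1234567
           MOVE WS-ACCT-NUMBER(4:7) TO WS-NUMERIC-KEY
      * 1234567 = 123 * 9973 + 7888
           DIVIDE WS-NUMERIC-KEY BY 9973
               GIVING WS-QUOTIENT
               REMAINDER WS-HASH-REMAINDER
      * Shift the 0-9972 remainder into the 1-9973 RRN range
           ADD 1 TO WS-HASH-REMAINDER
           MOVE WS-HASH-REMAINDER TO WS-RELATIVE-KEY
           DISPLAY 'Account ' WS-ACCT-NUMBER
               ' hashes to slot ' WS-RELATIVE-KEY
           STOP RUN.
```

The remainder is 7888, so the home slot is 7889. One consequence of dividing by 9973 in a 10,000-slot file is that slots 9974 through 10000 are never produced as home slots; they can only be reached by collision probing.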
Folding Method: Split the key into equal-sized pieces and add them together:
COMPUTE-HASH-FOLDING.
* Key: 1234567890 → split into 12|34|56|78|90
MOVE ACCT-NUMBER-NUMERIC(1:2) TO WS-FOLD-1
MOVE ACCT-NUMBER-NUMERIC(3:2) TO WS-FOLD-2
MOVE ACCT-NUMBER-NUMERIC(5:2) TO WS-FOLD-3
MOVE ACCT-NUMBER-NUMERIC(7:2) TO WS-FOLD-4
MOVE ACCT-NUMBER-NUMERIC(9:2) TO WS-FOLD-5
ADD WS-FOLD-1 WS-FOLD-2 WS-FOLD-3
WS-FOLD-4 WS-FOLD-5
GIVING WS-FOLD-SUM
* Apply modulo to fit within file size
DIVIDE WS-FOLD-SUM BY 9973
GIVING WS-QUOTIENT
REMAINDER WS-HASH-REMAINDER
ADD 1 TO WS-HASH-REMAINDER
MOVE WS-HASH-REMAINDER TO WS-RELATIVE-KEY.
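As a worked instance of folding, the sketch below sums the five two-digit pieces of the key 1234567890. It uses a REDEFINES table rather than the reference modification shown above, purely to keep the example short; the program name and data names are assumptions for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HASHFOLD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample key, viewed as five 2-digit pieces
       01  WS-KEY-DIGITS       PIC 9(10) VALUE 1234567890.
       01  WS-KEY-PIECES REDEFINES WS-KEY-DIGITS.
           05  WS-FOLD         PIC 9(02) OCCURS 5 TIMES.
       01  WS-IDX              PIC 9(02).
       01  WS-FOLD-SUM         PIC 9(05) VALUE ZERO.
       01  WS-QUOTIENT         PIC 9(05).
       01  WS-HASH-REMAINDER   PIC 9(05).
       PROCEDURE DIVISION.
       0000-MAIN.
      * 12 + 34 + 56 + 78 + 90 = 270
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 5
               ADD WS-FOLD(WS-IDX) TO WS-FOLD-SUM
           END-PERFORM
           DIVIDE WS-FOLD-SUM BY 9973
               GIVING WS-QUOTIENT
               REMAINDER WS-HASH-REMAINDER
           ADD 1 TO WS-HASH-REMAINDER
           DISPLAY 'Fold sum ' WS-FOLD-SUM
               ' maps to slot ' WS-HASH-REMAINDER
           STOP RUN.
```

The pieces sum to 270, giving slot 271. Note that five two-digit pieces can never sum to more than 495, so with this piece size only the first 496 slots are ever chosen as home slots. Larger pieces (for example three or four digits each) spread keys across more of the file.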
Mid-Square Method: Square the key (or a portion of it) and extract the middle digits:
COMPUTE-HASH-MIDSQUARE.
MOVE ACCT-NUMBER-NUMERIC TO WS-SQUARE-INPUT
MULTIPLY WS-SQUARE-INPUT BY WS-SQUARE-INPUT
GIVING WS-SQUARED
* Extract middle 4 digits of the squared value
MOVE WS-SQUARED TO WS-SQUARED-DISPLAY
MOVE WS-SQUARED-DISPLAY(5:4) TO WS-MID-DIGITS
* Apply modulo
DIVIDE WS-MID-DIGITS BY 9973
GIVING WS-QUOTIENT
REMAINDER WS-HASH-REMAINDER
ADD 1 TO WS-HASH-REMAINDER
MOVE WS-HASH-REMAINDER TO WS-RELATIVE-KEY.
13.6.3 Handling Collisions
No hash function is perfect. Eventually, two different keys will produce the same slot number — a collision. There are several strategies for handling collisions:
Linear Probing: If slot N is occupied, try slot N+1, then N+2, and so on:
HASH-WITH-LINEAR-PROBE.
PERFORM COMPUTE-HASH-DIVISION
MOVE WS-RELATIVE-KEY TO WS-ORIGINAL-SLOT
SET WS-SLOT-NOT-FOUND TO TRUE
MOVE ZERO TO WS-PROBE-COUNT
PERFORM UNTIL WS-SLOT-FOUND
OR WS-PROBE-COUNT > WS-MAX-PROBES
READ QUICK-LOOKUP-FILE
INVALID KEY
* Slot is empty — use it
SET WS-SLOT-FOUND TO TRUE
NOT INVALID KEY
* Slot occupied — check if it's our key
IF QL-ACCT-NUMBER = WS-SEARCH-ACCT
SET WS-SLOT-FOUND TO TRUE
ELSE
* Collision — try next slot
ADD 1 TO WS-RELATIVE-KEY
ADD 1 TO WS-PROBE-COUNT
IF WS-RELATIVE-KEY >
WS-MAX-SLOT-NUMBER
MOVE 1 TO WS-RELATIVE-KEY
END-IF
END-IF
END-READ
END-PERFORM
IF WS-PROBE-COUNT > WS-MAX-PROBES
DISPLAY 'HASH OVERFLOW — too many collisions'
PERFORM HASH-OVERFLOW-HANDLER
END-IF.
Quadratic Probing: Instead of incrementing by 1, probe at offsets that are squares: 1, 4, 9, 16, 25... This reduces clustering:
HASH-WITH-QUADRATIC-PROBE.
PERFORM COMPUTE-HASH-DIVISION
* Remember the home slot; every probe is offset from it
MOVE WS-RELATIVE-KEY TO WS-ORIGINAL-SLOT
SET WS-SLOT-NOT-FOUND TO TRUE
MOVE 1 TO WS-PROBE-OFFSET
PERFORM UNTIL WS-SLOT-FOUND
OR WS-PROBE-OFFSET > WS-MAX-PROBES
READ QUICK-LOOKUP-FILE
INVALID KEY
SET WS-SLOT-FOUND TO TRUE
NOT INVALID KEY
IF QL-ACCT-NUMBER = WS-SEARCH-ACCT
SET WS-SLOT-FOUND TO TRUE
ELSE
* Quadratic probe: home slot plus offset squared,
* wrapped into the range 1 through WS-MAX-SLOT-NUMBER
COMPUTE WS-RELATIVE-KEY =
WS-ORIGINAL-SLOT - 1 +
(WS-PROBE-OFFSET *
WS-PROBE-OFFSET)
COMPUTE WS-RELATIVE-KEY =
FUNCTION MOD(
WS-RELATIVE-KEY
WS-MAX-SLOT-NUMBER) + 1
ADD 1 TO WS-PROBE-OFFSET
END-IF
END-IF
END-READ
END-PERFORM.
Separate Overflow Area: Maintain a sequential overflow area for collisions. The primary hash area stores the first record for each slot, and an overflow flag points to the overflow area for additional records.
📊 Collision Statistics: With a good hash function and a load factor of 70% (70% of slots occupied), the average number of probes for a successful search is about 2.2 with linear probing and about 1.7 with quadratic probing. At 90% load factor, these jump to about 5.5 and 2.6 respectively. This is why relative files work best when they are not too full.
13.6.4 Load Factor and File Sizing
The load factor is the ratio of occupied slots to total slots:
Load Factor = Number of Records / Total Slots
| Load Factor | Avg Probes (Linear) | Avg Probes (Quadratic) | Recommendation |
|---|---|---|---|
| 50% | 1.5 | 1.3 | Excellent — lots of room |
| 70% | 2.2 | 1.7 | Good — recommended target |
| 80% | 3.0 | 2.0 | Acceptable but watch performance |
| 90% | 5.5 | 2.6 | Degrading — consider reorganization |
| 95% | 10.5 | 3.2 | Poor — reorganize immediately |
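The chapter does not say how these probe counts were derived. One plausible source, assumed here, is the classic textbook approximations for open addressing, where α is the load factor and S is the expected number of probes in a successful search. The second formula is strictly the uniform-hashing estimate, which is commonly used as a stand-in for quadratic probing.

```latex
\[
  S_{\text{linear}}(\alpha) \approx \frac{1}{2}\left(1 + \frac{1}{1-\alpha}\right),
  \qquad
  S_{\text{quadratic}}(\alpha) \approx \frac{1}{\alpha}\,\ln\frac{1}{1-\alpha}
\]
\[
  S_{\text{linear}}(0.9) \approx \tfrac{1}{2}\,(1 + 10) = 5.5,
  \qquad
  S_{\text{quadratic}}(0.9) \approx \frac{\ln 10}{0.9} \approx 2.6
\]
```

Both expressions grow without bound as α approaches 1, which is the mathematical reason a nearly full hashed file degrades so sharply.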
💡 Sizing Rule of Thumb: Allocate 130-150% of the expected record count as slots. For 10,000 records, define a file with 13,000-15,000 slots. This keeps the load factor at 67-77%, where performance is excellent.
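The sizing rule is simply the load-factor formula solved for the slot count. A minimal sketch, assuming a 70% target and using made-up data names, might look like this:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RRDSSIZE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-EXPECTED-RECORDS PIC 9(07) VALUE 10000.
      * Hypothetical target load factor, in percent
       01  WS-TARGET-LOAD-PCT  PIC 9(03) VALUE 70.
       01  WS-SLOTS-NEEDED     PIC 9(07).
       01  WS-ACTUAL-LOAD-PCT  PIC ZZ9.9.
       PROCEDURE DIVISION.
       0000-MAIN.
      * slots = records / load factor, rounded to a whole slot
           COMPUTE WS-SLOTS-NEEDED ROUNDED =
               WS-EXPECTED-RECORDS * 100 / WS-TARGET-LOAD-PCT
      * Recompute the load factor the chosen size will give
           COMPUTE WS-ACTUAL-LOAD-PCT ROUNDED =
               WS-EXPECTED-RECORDS * 100 / WS-SLOTS-NEEDED
           DISPLAY 'Slots to allocate: ' WS-SLOTS-NEEDED
           DISPLAY 'Resulting load %:  ' WS-ACTUAL-LOAD-PCT
           STOP RUN.
```

For 10,000 records at a 70% target, 10,000 / 0.70 is roughly 14,286 slots, which falls inside the 13,000-15,000 range given above. In practice you would then pick a prime at or just below that figure as the hash divisor.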
13.7 GlobalBank Case: Hash-Based Quick Lookup Table
At GlobalBank, the overnight batch cycle processes 2.3 million transactions against the ACCT-MASTER VSAM KSDS file. Maria Chen noticed that 60% of those transactions hit only 15% of the accounts — the "hot accounts" (high-activity checking accounts, corporate accounts with multiple daily transactions).
Priya Kapoor designed a quick lookup table using a relative file. Before the batch cycle, a prep program identifies the top 50,000 most-accessed accounts and loads their key data (account number, current balance, status) into an RRDS using a hash function. During batch processing, the first lookup attempt goes to the quick lookup table. Only if the account is not found there does the program fall back to the full KSDS.
13.7.1 The Quick Lookup Loader
IDENTIFICATION DIVISION.
PROGRAM-ID. QUICK-LOAD.
*=============================================================*
* QUICK-LOAD: Build the quick lookup table from hot accounts *
* GlobalBank Core Banking System *
* *
* Input: Hot account extract (sequential, sorted by freq) *
* Output: Quick lookup table (RRDS, 64997 slots) *
*=============================================================*
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT HOT-ACCT-FILE
ASSIGN TO HOTACCT
ORGANIZATION IS SEQUENTIAL
FILE STATUS IS WS-HOT-STATUS.
SELECT QUICK-LOOKUP-FILE
ASSIGN TO QUICKLK
ORGANIZATION IS RELATIVE
ACCESS MODE IS RANDOM
RELATIVE KEY IS WS-RELATIVE-KEY
FILE STATUS IS WS-QL-STATUS.
DATA DIVISION.
FILE SECTION.
FD HOT-ACCT-FILE.
01 HOT-ACCT-RECORD.
05 HA-ACCT-NUMBER PIC X(10).
05 HA-HOLDER-NAME PIC X(30).
05 HA-BALANCE PIC S9(11)V99 COMP-3.
05 HA-STATUS-CODE PIC X(02).
05 HA-FREQUENCY PIC 9(05).
FD QUICK-LOOKUP-FILE.
01 QUICK-LOOKUP-RECORD.
05 QL-ACCT-NUMBER PIC X(10).
05 QL-HOLDER-NAME PIC X(30).
05 QL-BALANCE PIC S9(11)V99 COMP-3.
05 QL-STATUS-CODE PIC X(02).
05 QL-LAST-ACCESS-DATE PIC 9(08).
05 FILLER PIC X(43).
WORKING-STORAGE SECTION.
01 WS-HOT-STATUS PIC XX.
01 WS-QL-STATUS PIC XX.
88 WS-QL-SUCCESS VALUE '00'.
88 WS-QL-DUP-KEY VALUE '22'.
88 WS-QL-NOT-FOUND VALUE '23'.
01 WS-RELATIVE-KEY PIC 9(07).
01 WS-ORIGINAL-SLOT PIC 9(07).
01 WS-NUMERIC-KEY PIC 9(07).
01 WS-QUOTIENT PIC 9(07).
01 WS-HASH-REMAINDER PIC 9(07).
01 WS-MAX-SLOTS PIC 9(07) VALUE 64997.
01 WS-PROBE-COUNT PIC 9(03).
01 WS-MAX-PROBES PIC 9(03) VALUE 50.
01 WS-RECORDS-LOADED PIC 9(07) VALUE ZERO.
01 WS-COLLISIONS PIC 9(07) VALUE ZERO.
01 WS-OVERFLOWS PIC 9(05) VALUE ZERO.
01 WS-TODAYS-DATE PIC 9(08).
01 WS-EOF-FLAG PIC X VALUE 'N'.
88 WS-EOF VALUE 'Y'.
88 WS-NOT-EOF VALUE 'N'.
01 WS-SLOT-FOUND-FLAG PIC X VALUE 'N'.
88 WS-SLOT-FOUND VALUE 'Y'.
88 WS-SLOT-NOT-FOUND VALUE 'N'.
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
PERFORM 2000-LOAD-TABLE
UNTIL WS-EOF
PERFORM 9000-TERMINATE
STOP RUN.
1000-INITIALIZE.
ACCEPT WS-TODAYS-DATE FROM DATE YYYYMMDD
OPEN INPUT HOT-ACCT-FILE
OPEN OUTPUT QUICK-LOOKUP-FILE
CLOSE QUICK-LOOKUP-FILE
OPEN I-O QUICK-LOOKUP-FILE
IF WS-QL-STATUS NOT = '00'
DISPLAY 'FATAL: Cannot open QUICK-LOOKUP: '
WS-QL-STATUS
STOP RUN
END-IF
SET WS-NOT-EOF TO TRUE
PERFORM 2100-READ-HOT-ACCT.
2000-LOAD-TABLE.
PERFORM 3000-COMPUTE-HASH
PERFORM 4000-FIND-SLOT
IF WS-SLOT-FOUND
INITIALIZE QUICK-LOOKUP-RECORD
MOVE HA-ACCT-NUMBER TO QL-ACCT-NUMBER
MOVE HA-HOLDER-NAME TO QL-HOLDER-NAME
MOVE HA-BALANCE TO QL-BALANCE
MOVE HA-STATUS-CODE TO QL-STATUS-CODE
MOVE WS-TODAYS-DATE TO QL-LAST-ACCESS-DATE
WRITE QUICK-LOOKUP-RECORD
INVALID KEY
DISPLAY 'WRITE ERROR slot '
WS-RELATIVE-KEY ': '
WS-QL-STATUS
ADD 1 TO WS-OVERFLOWS
NOT INVALID KEY
ADD 1 TO WS-RECORDS-LOADED
END-WRITE
ELSE
DISPLAY 'OVERFLOW for account '
HA-ACCT-NUMBER
ADD 1 TO WS-OVERFLOWS
END-IF
PERFORM 2100-READ-HOT-ACCT.
2100-READ-HOT-ACCT.
READ HOT-ACCT-FILE
AT END
SET WS-EOF TO TRUE
NOT AT END
CONTINUE
END-READ.
3000-COMPUTE-HASH.
* Extract numeric portion of account number
* Account format: ACC1234567 — take digits 4-10
MOVE HA-ACCT-NUMBER(4:7) TO WS-NUMERIC-KEY
IF WS-NUMERIC-KEY IS NOT NUMERIC
MOVE ZERO TO WS-NUMERIC-KEY
END-IF
* Division method with prime 64997
DIVIDE WS-NUMERIC-KEY BY WS-MAX-SLOTS
GIVING WS-QUOTIENT
REMAINDER WS-HASH-REMAINDER
ADD 1 TO WS-HASH-REMAINDER
MOVE WS-HASH-REMAINDER TO WS-RELATIVE-KEY
MOVE WS-RELATIVE-KEY TO WS-ORIGINAL-SLOT.
4000-FIND-SLOT.
* Linear probing to find an empty slot
SET WS-SLOT-NOT-FOUND TO TRUE
MOVE ZERO TO WS-PROBE-COUNT
PERFORM UNTIL WS-SLOT-FOUND
OR WS-PROBE-COUNT > WS-MAX-PROBES
READ QUICK-LOOKUP-FILE
INVALID KEY
* Status 23 = empty slot — use it
IF WS-QL-NOT-FOUND
SET WS-SLOT-FOUND TO TRUE
ELSE
DISPLAY 'Unexpected status: '
WS-QL-STATUS
ADD 1 TO WS-PROBE-COUNT
END-IF
NOT INVALID KEY
* Slot occupied — probe next
ADD 1 TO WS-PROBE-COUNT
ADD 1 TO WS-COLLISIONS
ADD 1 TO WS-RELATIVE-KEY
IF WS-RELATIVE-KEY > WS-MAX-SLOTS
MOVE 1 TO WS-RELATIVE-KEY
END-IF
END-READ
END-PERFORM.
9000-TERMINATE.
CLOSE HOT-ACCT-FILE
QUICK-LOOKUP-FILE
DISPLAY '=== QUICK LOOKUP LOAD COMPLETE ==='
DISPLAY 'Records loaded: ' WS-RECORDS-LOADED
DISPLAY 'Collisions: ' WS-COLLISIONS
DISPLAY 'Overflows: ' WS-OVERFLOWS
COMPUTE WS-NUMERIC-KEY =
(WS-RECORDS-LOADED * 100) / WS-MAX-SLOTS
DISPLAY 'Load factor: ' WS-NUMERIC-KEY '%'.
13.7.2 The Quick Lookup Reader
QUICK-ACCOUNT-LOOKUP.
* Called from transaction processing program.
* Try quick lookup first; fall back to KSDS.
PERFORM COMPUTE-HASH-FOR-LOOKUP
MOVE WS-HASH-RESULT TO WS-RELATIVE-KEY
MOVE WS-RELATIVE-KEY TO WS-ORIGINAL-SLOT
MOVE ZERO TO WS-PROBE-COUNT
SET WS-SLOT-NOT-FOUND TO TRUE
PERFORM UNTIL WS-SLOT-FOUND
OR WS-PROBE-COUNT > WS-MAX-PROBES
READ QUICK-LOOKUP-FILE
INVALID KEY
* Empty slot — account not in table
SET WS-ACCT-NOT-IN-CACHE TO TRUE
SET WS-SLOT-FOUND TO TRUE
NOT INVALID KEY
IF QL-ACCT-NUMBER =
WS-SEARCH-ACCT-NUMBER
* Found it in the cache
SET WS-ACCT-IN-CACHE TO TRUE
SET WS-SLOT-FOUND TO TRUE
MOVE QL-BALANCE TO
WS-FOUND-BALANCE
MOVE QL-STATUS-CODE TO
WS-FOUND-STATUS
ELSE
* Different account — keep probing
ADD 1 TO WS-PROBE-COUNT
ADD 1 TO WS-RELATIVE-KEY
IF WS-RELATIVE-KEY >
WS-MAX-SLOTS
MOVE 1 TO WS-RELATIVE-KEY
END-IF
END-IF
END-READ
END-PERFORM
* If not found in cache, fall back to KSDS
IF WS-ACCT-NOT-IN-CACHE OR
WS-PROBE-COUNT > WS-MAX-PROBES
PERFORM KSDS-ACCOUNT-LOOKUP
END-IF.
📊 Results: The quick lookup table reduced average I/O for the hot accounts from 3.5 (KSDS) to 1.3 (RRDS with occasional probing). For the daily batch of 2.3 million transactions, this saved approximately 4.6 million I/O operations — cutting the batch window by 22 minutes.
13.8 MedClaim Case: Provider Code Direct Access Table
MedClaim uses a similar pattern for provider lookup. Provider IDs are sequentially assigned numeric codes (PRV00001 through PRV85000). Since the numeric portion is a natural sequential integer, the RRDS mapping is trivial — no hashing needed.
13.8.1 Direct Mapping Without Hashing
PROVIDER-QUICK-LOOKUP.
* Provider ID format: PRVnnnnn
* Extract numeric portion as RRN directly
MOVE PRV-PROVIDER-ID(4:5) TO WS-PROVIDER-NUM
IF WS-PROVIDER-NUM IS NUMERIC
AND WS-PROVIDER-NUM > ZERO
MOVE WS-PROVIDER-NUM TO WS-RELATIVE-KEY
READ PROVIDER-RRDS-FILE
INVALID KEY
SET WS-PROVIDER-NOT-CACHED TO TRUE
NOT INVALID KEY
SET WS-PROVIDER-CACHED TO TRUE
END-READ
ELSE
SET WS-PROVIDER-NOT-CACHED TO TRUE
END-IF.
This is the ideal case for relative files: a natural numeric key that maps directly to slot numbers with no collisions. Every access is a single I/O.
"When your key is already a sequential number," says James Okafor, "relative files are a no-brainer. You get the simplicity of array access with the persistence of a file. No hash function, no collisions, no wasted slots."
13.8.2 Provider Code Table Complete Program
IDENTIFICATION DIVISION.
PROGRAM-ID. PRV-RRDS-LOAD.
*=============================================================*
* PRV-RRDS-LOAD: Load provider data into RRDS table *
* MedClaim Insurance Processing *
*=============================================================*
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT PROVIDER-KSDS-FILE
ASSIGN TO PROVKSDS
ORGANIZATION IS INDEXED
ACCESS MODE IS SEQUENTIAL
RECORD KEY IS PK-PROVIDER-ID
FILE STATUS IS WS-PK-STATUS.
SELECT PROVIDER-RRDS-FILE
ASSIGN TO PROVRRDS
ORGANIZATION IS RELATIVE
ACCESS MODE IS RANDOM
RELATIVE KEY IS WS-RELATIVE-KEY
FILE STATUS IS WS-PR-STATUS.
DATA DIVISION.
FILE SECTION.
FD PROVIDER-KSDS-FILE.
01 PROVIDER-KSDS-RECORD.
05 PK-PROVIDER-ID PIC X(10).
05 PK-NPI-NUMBER PIC 9(10).
05 PK-NAME PIC X(50).
05 PK-SPECIALTY-CODE PIC X(03).
05 PK-CONTRACT-STATUS PIC X(02).
05 PK-MAX-ALLOWED PIC S9(07)V99 COMP-3.
05 FILLER PIC X(220).
FD PROVIDER-RRDS-FILE.
01 PROVIDER-RRDS-RECORD.
05 PR-PROVIDER-ID PIC X(10).
05 PR-NPI-NUMBER PIC 9(10).
05 PR-NAME PIC X(50).
05 PR-SPECIALTY-CODE PIC X(03).
05 PR-CONTRACT-STATUS PIC X(02).
05 PR-MAX-ALLOWED PIC S9(07)V99 COMP-3.
05 FILLER PIC X(20).
WORKING-STORAGE SECTION.
01 WS-PK-STATUS PIC XX.
01 WS-PR-STATUS PIC XX.
88 WS-PR-SUCCESS VALUE '00'.
88 WS-PR-DUP-KEY VALUE '22'.
01 WS-RELATIVE-KEY PIC 9(07).
01 WS-PROVIDER-NUM PIC 9(05).
01 WS-RECORDS-LOADED PIC 9(07) VALUE ZERO.
01 WS-ERRORS PIC 9(05) VALUE ZERO.
01 WS-EOF-FLAG PIC X VALUE 'N'.
88 WS-EOF VALUE 'Y'.
PROCEDURE DIVISION.
0000-MAIN.
OPEN INPUT PROVIDER-KSDS-FILE
OPEN OUTPUT PROVIDER-RRDS-FILE
CLOSE PROVIDER-RRDS-FILE
OPEN I-O PROVIDER-RRDS-FILE
MOVE 'N' TO WS-EOF-FLAG
PERFORM UNTIL WS-EOF
READ PROVIDER-KSDS-FILE
AT END
SET WS-EOF TO TRUE
NOT AT END
PERFORM 2000-LOAD-RECORD
END-READ
END-PERFORM
CLOSE PROVIDER-KSDS-FILE
PROVIDER-RRDS-FILE
DISPLAY 'Provider RRDS Load Complete'
DISPLAY 'Records loaded: ' WS-RECORDS-LOADED
DISPLAY 'Errors: ' WS-ERRORS
STOP RUN.
2000-LOAD-RECORD.
* Extract numeric portion: PRV00042 → 42
MOVE PK-PROVIDER-ID(4:5) TO WS-PROVIDER-NUM
IF WS-PROVIDER-NUM IS NOT NUMERIC
OR WS-PROVIDER-NUM = ZERO
DISPLAY 'Invalid provider ID: '
PK-PROVIDER-ID
ADD 1 TO WS-ERRORS
ELSE
MOVE WS-PROVIDER-NUM TO WS-RELATIVE-KEY
INITIALIZE PROVIDER-RRDS-RECORD
MOVE PK-PROVIDER-ID TO PR-PROVIDER-ID
MOVE PK-NPI-NUMBER TO PR-NPI-NUMBER
MOVE PK-NAME TO PR-NAME
MOVE PK-SPECIALTY-CODE TO PR-SPECIALTY-CODE
MOVE PK-CONTRACT-STATUS TO PR-CONTRACT-STATUS
MOVE PK-MAX-ALLOWED TO PR-MAX-ALLOWED
WRITE PROVIDER-RRDS-RECORD
INVALID KEY
DISPLAY 'Write error slot '
WS-RELATIVE-KEY ': '
WS-PR-STATUS
ADD 1 TO WS-ERRORS
NOT INVALID KEY
ADD 1 TO WS-RECORDS-LOADED
END-WRITE
END-IF.
13.8.3 Performance Comparison: Direct Mapping vs. KSDS Lookup
James Okafor measured the performance difference between the direct-mapped provider lookup (RRDS) and the original KSDS approach:
| Metric | KSDS Lookup | RRDS Direct Map | Improvement |
|---|---|---|---|
| Average I/Os per lookup | 3.2 | 1.0 | -69% |
| Average response time | 4.1 ms | 1.2 ms | -71% |
| Batch adjudication time (25K claims) | 18 min | 11 min | -39% |
| Memory required | BUFNI + BUFND | 1 data buffer | Less |
The RRDS approach was dramatically faster because every lookup was a single I/O — no index traversal, no key comparison, just a direct position calculation. The batch adjudication improvement was not 69% (as the per-lookup improvement would suggest) because many other operations in the adjudication pipeline are not I/O-bound.
Sarah Kim observed another advantage: "The RRDS lookup code is simpler to understand and maintain. There are no INVALID KEY conditions for wrong-key-type situations, no alternate index complications. You compute the slot number, you read. That's it."
13.8.4 Maintaining the RRDS Cache
The provider RRDS cache must be refreshed whenever the provider master changes. MedClaim rebuilds the cache nightly as part of the batch cycle:
//STEP1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DELETE MEDCLM.PROVIDER.RRDS.CACHE PURGE
SET MAXCC = 0
/*
//STEP2 EXEC PGM=PRRDSLD
//PROVKSDS DD DSN=MEDCLM.PROVIDER.MASTER,DISP=SHR
//PROVRRDS DD DSN=MEDCLM.PROVIDER.RRDS.CACHE,DISP=(NEW,CATLG),...
The cache is deleted and rebuilt rather than updated incrementally. This is simpler and ensures consistency — there is no risk of the cache getting out of sync with the master. The rebuild takes under 2 minutes for 85,000 records.
Tomás Rivera notes an important operational detail: "We sequence the cache rebuild before the claims adjudication step. If the rebuild fails, the adjudication step uses the previous day's cache — which may have stale data for a few providers, but it's better than failing the entire batch cycle."
13.9 Monitoring and Tuning Relative File Performance
Once a relative file is in production, ongoing monitoring ensures it continues to perform well. As data volumes grow and key distributions shift, a hash function that worked well initially may begin to degrade.
13.9.1 Key Metrics to Track
Every batch cycle that accesses an RRDS file should capture and log these metrics:
01 WS-RRDS-STATS.
05 WS-TOTAL-LOOKUPS PIC 9(09) VALUE ZERO.
05 WS-DIRECT-HITS PIC 9(09) VALUE ZERO.
05 WS-PROBES-NEEDED PIC 9(09) VALUE ZERO.
05 WS-TOTAL-PROBES PIC 9(09) VALUE ZERO.
05 WS-MAX-PROBES PIC 9(03) VALUE ZERO.
05 WS-NOT-FOUND PIC 9(09) VALUE ZERO.
05 WS-AVG-PROBES PIC 9(03)V99.
RRDS-LOOKUP-WITH-STATS.
ADD 1 TO WS-TOTAL-LOOKUPS
MOVE ZERO TO WS-PROBE-COUNT
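PERFORM HASH-AND-PROBE
* Assumes HASH-AND-PROBE counts every READ it issues,
* so a count of 1 means the first slot held the record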
IF WS-PROBE-COUNT = 1
ADD 1 TO WS-DIRECT-HITS
ELSE
ADD 1 TO WS-PROBES-NEEDED
ADD WS-PROBE-COUNT TO WS-TOTAL-PROBES
IF WS-PROBE-COUNT > WS-MAX-PROBES
MOVE WS-PROBE-COUNT TO WS-MAX-PROBES
END-IF
END-IF.
PRINT-RRDS-STATS.
IF WS-PROBES-NEEDED > ZERO
COMPUTE WS-AVG-PROBES =
WS-TOTAL-PROBES / WS-PROBES-NEEDED
END-IF
DISPLAY 'RRDS PERFORMANCE STATISTICS'
DISPLAY ' Total lookups: ' WS-TOTAL-LOOKUPS
DISPLAY ' Direct hits: ' WS-DIRECT-HITS
DISPLAY ' Probes needed: ' WS-PROBES-NEEDED
DISPLAY ' Avg probes/miss: ' WS-AVG-PROBES
DISPLAY ' Max probes: ' WS-MAX-PROBES
DISPLAY ' Not found: ' WS-NOT-FOUND.
The direct hit rate is the most important metric. If more than 80% of lookups resolve on the first probe, the hash function and file sizing are performing well. A direct hit rate below 60% signals trouble — either the load factor is too high, the hash function distributes poorly for the current data, or key clustering has developed.
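The thresholds above can be turned into a small check that runs at the end of each batch cycle. The following is a minimal sketch, not production code: the sample counts are invented, and the middle "watch" band between 60% and 80% is an assumption added here to cover the range the text does not classify.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HITRATE.
      * Sketch: classify RRDS health from the direct hit rate.
      * The sample counts and the middle "watch" band are
      * assumptions made for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TOTAL-LOOKUPS    PIC 9(09)    VALUE 200000.
       01  WS-DIRECT-HITS      PIC 9(09)    VALUE 150000.
       01  WS-HIT-RATE         PIC 9(03)V99 VALUE ZERO.
       01  WS-HIT-RATE-OUT     PIC ZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Guard against dividing by zero on a cycle with no lookups
           IF WS-TOTAL-LOOKUPS > ZERO
               COMPUTE WS-HIT-RATE ROUNDED =
                   (WS-DIRECT-HITS * 100) / WS-TOTAL-LOOKUPS
           END-IF
           MOVE WS-HIT-RATE TO WS-HIT-RATE-OUT
           DISPLAY 'Direct hit rate: ' WS-HIT-RATE-OUT '%'
           EVALUATE TRUE
               WHEN WS-HIT-RATE > 80
                   DISPLAY 'Healthy: hashing and sizing are fine'
               WHEN WS-HIT-RATE >= 60
                   DISPLAY 'Watch: trending toward trouble'
               WHEN OTHER
                   DISPLAY 'Investigate: load factor or hash'
           END-EVALUATE
           STOP RUN.
```

With these sample counts the rate works out to 75%, which lands in the middle band. In a real job the two counters would come from WS-RRDS-STATS rather than VALUE clauses.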
13.9.2 When to Rebuild
Derek Washington established a monitoring threshold at GlobalBank: "We track RRDS performance weekly. If the direct hit rate drops below 70% or the maximum probe count exceeds 15, we schedule a rebuild for the next maintenance window."
A rebuild involves:
- Increasing the file size — If the load factor has crept above 80%, allocate a larger file (typically 150% of the current record count)
- Evaluating the hash function — If the data distribution has shifted (e.g., a new branch series was added with account numbers clustered in a narrow range), the hash function may need adjustment
- Rebuilding from the master — Delete and recreate the RRDS from the authoritative KSDS source, using the new file size and hash function
* Rebuild pattern: sequential read from master,
* hash and write to new RRDS
REBUILD-RRDS.
OPEN INPUT MASTER-KSDS-FILE
OPEN OUTPUT NEW-RRDS-FILE
MOVE ZERO TO WS-LOAD-COUNT
WS-COLLISION-COUNT
PERFORM UNTIL WS-MASTER-EOF
READ MASTER-KSDS-FILE
AT END
SET WS-MASTER-EOF TO TRUE
NOT AT END
PERFORM HASH-MASTER-KEY
PERFORM PROBE-AND-WRITE
ADD 1 TO WS-LOAD-COUNT
END-READ
END-PERFORM
DISPLAY 'RRDS rebuilt: ' WS-LOAD-COUNT
' records, ' WS-COLLISION-COUNT
' collisions'
CLOSE MASTER-KSDS-FILE
NEW-RRDS-FILE.
13.9.3 Seasonal Patterns and Pre-emptive Sizing
At MedClaim, Sarah Kim discovered that RRDS performance followed a seasonal pattern. During open enrollment periods (October-December), the provider file grew by 8-12% as new providers joined the network. The RRDS cache, sized at 130% of the September provider count, started the season at a load factor of about 77% (1 / 1.30) and climbed to roughly 83-86% by November, causing probe chains to lengthen dramatically.
The fix was simple but important: size the RRDS based on the peak expected count rather than the current count. "We now allocate for the December maximum plus 20% growth margin," says Kim. "The wasted space in July costs almost nothing. The performance degradation in November was costing us 40 minutes in the nightly batch window."
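The sizing arithmetic behind peak-based allocation can be sketched as follows. This is one possible reading, under stated assumptions: the 85,000 starting count is the provider volume mentioned earlier, the 12% seasonal growth and 20% margin come from the discussion above, and applying a 70% target load factor on top of the planning count is an assumption of this sketch rather than something MedClaim's procedure is known to do.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PEAKSIZE.
      * Sketch: size an RRDS for the seasonal peak, not for
      * today's count. All figures are illustrative assumptions.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CURRENT-COUNT    PIC 9(09)   VALUE 85000.
       01  WS-SEASONAL-GROWTH  PIC 9V99    VALUE 1.12.
       01  WS-GROWTH-MARGIN    PIC 9V99    VALUE 1.20.
       01  WS-TARGET-LOAD      PIC 9V99    VALUE 0.70.
       01  WS-PEAK-COUNT       PIC 9(09)   VALUE ZERO.
       01  WS-PLANNING-COUNT   PIC 9(09)   VALUE ZERO.
       01  WS-SLOTS-NEEDED     PIC 9(09)   VALUE ZERO.
       01  WS-LOAD-PCT         PIC 9(03)V9 VALUE ZERO.
       01  WS-COUNT-OUT        PIC ZZZ,ZZZ,ZZ9.
       01  WS-PCT-OUT          PIC ZZ9.9.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Expected December maximum
           COMPUTE WS-PEAK-COUNT ROUNDED =
               WS-CURRENT-COUNT * WS-SEASONAL-GROWTH
      * Add the growth margin on top of the peak
           COMPUTE WS-PLANNING-COUNT ROUNDED =
               WS-PEAK-COUNT * WS-GROWTH-MARGIN
      * Slots required to hold the planning count at target load
           COMPUTE WS-SLOTS-NEEDED ROUNDED =
               WS-PLANNING-COUNT / WS-TARGET-LOAD
           MOVE WS-SLOTS-NEEDED TO WS-COUNT-OUT
           DISPLAY 'Slots to allocate:   ' WS-COUNT-OUT
      * Load factor today and at the seasonal peak
           COMPUTE WS-LOAD-PCT ROUNDED =
               (WS-CURRENT-COUNT * 100) / WS-SLOTS-NEEDED
           MOVE WS-LOAD-PCT TO WS-PCT-OUT
           DISPLAY 'Load factor today:   ' WS-PCT-OUT '%'
           COMPUTE WS-LOAD-PCT ROUNDED =
               (WS-PEAK-COUNT * 100) / WS-SLOTS-NEEDED
           MOVE WS-LOAD-PCT TO WS-PCT-OUT
           DISPLAY 'Load factor at peak: ' WS-PCT-OUT '%'
           STOP RUN.
```

With these inputs the program allocates 163,200 slots, giving a load factor of about 52% today and about 58% at the seasonal peak, so the file stays well under the target all year.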
📊 Performance Benchmarks: On a typical z15 mainframe, RRDS lookup with load factor 70% averages 1.05 I/Os per lookup. At 85% load factor, this rises to 1.8 I/Os. At 95% load factor, lookups average 3.2 I/Os, more than three times the ideal of one. The relationship is non-linear: performance degrades slowly up to about 75%, then accelerates sharply. This is why the 70% target is so widely used in practice.
⚠️ Trap to Avoid: Never tune an RRDS file by looking only at average performance. A system with 95% direct hits and 5% of lookups requiring 20+ probes can appear healthy on average but cause unpredictable latency for specific records. Track the maximum probe count and a high percentile such as the 99th, not just the average; in this example the 95th percentile would still report a single probe.
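One way to capture tail behaviour is to keep a histogram of probe counts and read a percentile from it. The sketch below is an illustration under assumptions: the table WS-HIST-COUNT, the 50-probe ceiling, and the sample distribution are all invented here, and the percentile is taken as the smallest probe count that covers the requested share of lookups.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PROBEPCT.
      * Sketch: average, percentile and maximum from a histogram
      * of probe counts. Entry n holds the number of lookups
      * that needed exactly n probes. Sample data is invented.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-HISTOGRAM.
           05  WS-HIST-COUNT   PIC 9(09) OCCURS 50 TIMES.
       01  WS-IDX              PIC 9(03)    VALUE ZERO.
       01  WS-TOTAL            PIC 9(09)    VALUE ZERO.
       01  WS-WEIGHTED         PIC 9(11)    VALUE ZERO.
       01  WS-CUMULATIVE       PIC 9(09)    VALUE ZERO.
       01  WS-THRESHOLD        PIC 9(09)    VALUE ZERO.
       01  WS-PCT-WANTED       PIC 9(03)    VALUE 99.
       01  WS-PCT-PROBES       PIC 9(03)    VALUE ZERO.
       01  WS-MAX-SEEN         PIC 9(03)    VALUE ZERO.
       01  WS-AVG              PIC 9(03)V99 VALUE ZERO.
       01  WS-AVG-OUT          PIC ZZ9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           INITIALIZE WS-HISTOGRAM
           MOVE 950 TO WS-HIST-COUNT(1)
           MOVE 30  TO WS-HIST-COUNT(2)
           MOVE 15  TO WS-HIST-COUNT(3)
           MOVE 5   TO WS-HIST-COUNT(25)
      * Pass 1: total lookups, weighted sum, largest probe count
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 50
               ADD WS-HIST-COUNT(WS-IDX) TO WS-TOTAL
               COMPUTE WS-WEIGHTED = WS-WEIGHTED
                   + WS-IDX * WS-HIST-COUNT(WS-IDX)
               IF WS-HIST-COUNT(WS-IDX) > ZERO
                   MOVE WS-IDX TO WS-MAX-SEEN
               END-IF
           END-PERFORM
           COMPUTE WS-AVG ROUNDED = WS-WEIGHTED / WS-TOTAL
           COMPUTE WS-THRESHOLD ROUNDED =
               WS-TOTAL * WS-PCT-WANTED / 100
      * Pass 2: first probe count whose cumulative share
      * reaches the requested percentile
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 50 OR WS-PCT-PROBES > ZERO
               ADD WS-HIST-COUNT(WS-IDX) TO WS-CUMULATIVE
               IF WS-CUMULATIVE >= WS-THRESHOLD
                   MOVE WS-IDX TO WS-PCT-PROBES
               END-IF
           END-PERFORM
           MOVE WS-AVG TO WS-AVG-OUT
           DISPLAY 'Average probes:  ' WS-AVG-OUT
           DISPLAY 'Percentile probes: ' WS-PCT-PROBES
           DISPLAY 'Maximum probes:    ' WS-MAX-SEEN
           STOP RUN.
```

For this sample the average is 1.18 probes, the 99th percentile is 3, and the maximum is 25: the average alone hides the five lookups that needed 25 probes. In a real lookup paragraph the histogram would be fed with a single `ADD 1 TO WS-HIST-COUNT(WS-PROBE-COUNT)` after each search, with counts above 50 clamped to the last entry.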
13.10 Understanding RRDS Internals
13.10.1 RRDS Control Interval Structure
An RRDS control interval looks different from a KSDS CI. Each CI contains a fixed number of equal-sized slots, and each slot has a control byte indicating whether it is occupied or empty:
┌───────────────────────────────────────────────────┐
│ RRDS Control Interval │
│ ┌──┬────────┐ ┌──┬────────┐ ┌──┬────────┐ ┌────┐│
│ │CB│Record 1│ │CB│Record 2│ │CB│(empty) │ │CIDF││
│ └──┴────────┘ └──┴────────┘ └──┴────────┘ └────┘│
└───────────────────────────────────────────────────┘
CB = Control Byte (occupied/empty flag)
CIDF = CI Definition Field
In this simplified picture, the control byte is a one-byte flag in front of each slot. X'00' marks the slot as empty, and any other value marks it as occupied. (In an actual VSAM control interval the per-slot indicator is kept in a record definition field in the CI's control area rather than in front of the record, but the effect for the COBOL programmer is the same.) This is how VSAM knows to return status '23' for empty slots and to skip empty slots during sequential reads.
13.10.2 Fixed-Length Record Requirement
RRDS requires fixed-length records. This is fundamental to the direct-calculation access method: VSAM computes a record's position as:
Byte offset = (RRN - 1) * (record_length + control_byte_length)
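As a worked instance of this simplified formula, the sketch below computes the offset of slot 5 for 100-byte records with a one-byte control field. The field names are hypothetical, and the calculation ignores CI boundaries and control fields, exactly as the formula does.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SLOTOFF.
      * Sketch: the simplified slot-offset formula from the text.
      * Ignores CI/CA boundaries; names are illustrative only.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RRN              PIC 9(09) VALUE 5.
       01  WS-RECORD-LENGTH    PIC 9(05) VALUE 100.
       01  WS-CONTROL-LENGTH   PIC 9(01) VALUE 1.
       01  WS-BYTE-OFFSET      PIC 9(12) VALUE ZERO.
       01  WS-OFFSET-OUT       PIC Z(11)9.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Four full slots of 101 bytes precede slot 5
           COMPUTE WS-BYTE-OFFSET =
               (WS-RRN - 1) * (WS-RECORD-LENGTH + WS-CONTROL-LENGTH)
           MOVE WS-BYTE-OFFSET TO WS-OFFSET-OUT
           DISPLAY 'Offset of slot ' WS-RRN ': ' WS-OFFSET-OUT
           STOP RUN.
```

The result is 404 bytes: (5 - 1) * (100 + 1). No search is involved, which is the whole point of relative organization.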
If records were variable-length, VSAM could not calculate positions — it would need an index, which would defeat the purpose of relative organization. If your data naturally has variable-length content, you have two options:
- Pad to the maximum length: Define the record with the maximum possible size. Shorter records waste space but maintain direct access.
- Use a KSDS instead: If the size variation is large (e.g., some records are 50 bytes and others are 500), the space waste from padding may be unacceptable. Use an indexed file.
13.10.3 Fixed-Length RRDS vs. Variable-Length RRDS
IBM VSAM actually supports two types of relative files:
- RRDS (traditional): Fixed-length records with control bytes, as described above. Slots are numbered from 1. This is what the COBOL ORGANIZATION IS RELATIVE maps to.
- Variable-Length RRDS (VRRDS): Introduced in later versions of DFSMS, this allows variable-length records in an RRDS. However, COBOL's ORGANIZATION IS RELATIVE maps to the traditional fixed-length RRDS. VRRDS is typically accessed through Assembler or C programs.
For COBOL programming purposes, always think of RRDS as fixed-length.
13.11 Trade-Off Analysis: Relative vs. Indexed
This section addresses the theme of The Modernization Spectrum — understanding when file organization choices matter and how they evolve over time.
13.11.1 Comparison Matrix
| Criterion | Relative (RRDS) | Indexed (KSDS) |
|---|---|---|
| Random access speed | 1 I/O (fastest) | 3-4 I/Os |
| Sequential access | Slot order (may skip empties) | Key order (meaningful) |
| Key flexibility | Numeric integer only | Any data type |
| Alternate access paths | None | Alternate keys supported |
| Disk utilization | Poor if sparse (wasted slots) | Good (records packed) |
| Insert flexibility | Must hash or have numeric key | Insert anywhere by key |
| Delete handling | True slot emptying | Physical or logical delete |
| Maintenance complexity | Hash function management | Index maintenance (automatic) |
| Collision handling | Application responsibility | None needed |
| Reorganization | Rebuild hash table | REPRO/reload |
| Browse capability | Limited (slot order) | Full (key order, alt keys) |
13.11.2 Decision Framework
Use this decision tree when choosing between relative and indexed files:
1. Does the key naturally map to a sequential integer?
→ YES: Relative file is ideal
→ NO: Go to question 2
2. Is maximum random-access speed critical?
→ YES: Go to question 3
→ NO: Use indexed file
3. Is the key space reasonably dense (>50% occupancy)?
→ YES: Go to question 4
→ NO: Use indexed file (sparse relative file wastes space)
4. Do you need alternate access paths (browse by different keys)?
→ YES: Use indexed file
→ NO: Go to question 5 (relative file with hashing is viable)
5. Can you tolerate collision handling complexity?
→ YES: Use relative file
→ NO: Use indexed file
13.11.3 Hybrid Approaches
The GlobalBank quick lookup table (Section 13.7) demonstrates a hybrid approach: the primary data store is a KSDS (for full functionality), and an RRDS serves as a performance cache for hot records. This pattern is common in high-volume systems:
- KSDS: Master file, complete data, all access paths
- RRDS: Lookup cache, subset of data, maximum speed
The cache is rebuilt periodically (nightly, hourly, or on-demand) from the master file. This gives you the best of both worlds at the cost of maintaining two files and managing cache coherence.
⚖️ The Real-World Tension: "In theory, you should always use the file organization that best fits your access pattern," says Priya Kapoor. "In practice, most shops standardize on KSDS for everything because it's the most flexible and the operations team knows how to manage it. We use RRDS only when we can demonstrate a measurable performance benefit that justifies the added complexity."
13.12 Sparse File Handling
When a relative file has many empty slots (high sparsity), several issues arise:
13.12.1 The Sparsity Problem
Consider a file with slots 1 through 100,000, but only 5,000 records:
- Disk waste: 95% of allocated space holds nothing
- Sequential scan: Must check 100,000 slots to find 5,000 records
- Backup/restore: Entire file must be backed up even though mostly empty
13.12.2 Strategies for Managing Sparsity
Strategy 1: Compact the key space
Instead of using the raw key value as the RRN (which might range from 1 to 10,000,000), use a hash function to map keys into a smaller range:
*--- Map large key space to smaller RRDS
* 100,000 possible keys, 5,000 records → 7,499 slots (load factor ~67%)
DIVIDE WS-RAW-KEY BY 7499
GIVING WS-QUOTIENT
REMAINDER WS-HASH-REMAINDER
ADD 1 TO WS-HASH-REMAINDER
MOVE WS-HASH-REMAINDER TO WS-RELATIVE-KEY.
Strategy 2: Periodic reorganization
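When the file becomes too sparse (due to deletes), rebuild it. Compaction renumbers every record, so it suits files whose slot numbers carry no meaning of their own; a hashed file must be re-hashed into a smaller table instead: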
REORG-RELATIVE-FILE.
* Read all records from old file
* Write to new file with consecutive slots
MOVE 1 TO WS-NEW-SLOT
SET WS-NOT-EOF TO TRUE
PERFORM UNTIL WS-EOF
READ OLD-RRDS-FILE NEXT
AT END
SET WS-EOF TO TRUE
NOT AT END
MOVE WS-NEW-SLOT TO WS-NEW-REL-KEY
WRITE NEW-RRDS-RECORD
FROM OLD-RRDS-RECORD
ADD 1 TO WS-NEW-SLOT
END-READ
END-PERFORM.
Strategy 3: Accept the sparsity
If disk space is not a constraint and sequential access is rare, sparse files work fine for random access. The unused slots cost nothing in terms of random I/O performance.
13.13 Advanced Hashing: Double Hashing and Bucket Strategies
While linear and quadratic probing are the most commonly implemented collision strategies in COBOL relative files, there are more sophisticated approaches that deserve attention.
13.13.1 Double Hashing
Double hashing uses a second hash function to determine the probe step size. Instead of probing at offsets 1, 2, 3 (linear) or 1, 4, 9 (quadratic), double hashing probes at offsets h2, 2h2, 3h2, where h2 is the result of a second hash function:
       DOUBLE-HASH-LOOKUP.
      * Primary hash: division by prime p1 (the table size)
           DIVIDE WS-NUMERIC-KEY BY 64997
               GIVING WS-QUOTIENT
               REMAINDER WS-HASH-1
           ADD 1 TO WS-HASH-1
           MOVE WS-HASH-1 TO WS-RELATIVE-KEY
      * Secondary hash: division by a smaller divisor p2.
      * The step must lie between 1 and p1 - 1. Because p1 is
      * prime, every such step is coprime to it, so the probe
      * sequence can reach every slot.
           DIVIDE WS-NUMERIC-KEY BY 64993
               GIVING WS-QUOTIENT
               REMAINDER WS-HASH-2
           ADD 1 TO WS-HASH-2
      * Probe: slot = h1 + i * h2 (mod table_size)
           SET WS-SLOT-NOT-FOUND TO TRUE
           MOVE ZERO TO WS-PROBE-COUNT
           PERFORM UNTIL WS-SLOT-FOUND
                   OR WS-PROBE-COUNT > WS-MAX-PROBES
               READ QUICK-LOOKUP-FILE
                   INVALID KEY
      * An empty slot (status 23) ends the search: the key is
      * absent. Any other status is an I/O problem, so report
      * it and stop probing rather than loop forever.
                       IF NOT WS-QL-NOT-FOUND
                           PERFORM HANDLE-ERROR
                       END-IF
                       SET WS-SLOT-FOUND TO TRUE
                   NOT INVALID KEY
                       IF QL-ACCT-NUMBER =
                               WS-SEARCH-ACCT-NUMBER
                           SET WS-SLOT-FOUND TO TRUE
                       ELSE
                           ADD 1 TO WS-PROBE-COUNT
                           ADD WS-HASH-2
                               TO WS-RELATIVE-KEY
                           IF WS-RELATIVE-KEY > WS-MAX-SLOTS
                               SUBTRACT WS-MAX-SLOTS
                                   FROM WS-RELATIVE-KEY
                           END-IF
                       END-IF
               END-READ
           END-PERFORM.
Double hashing eliminates the clustering problem of linear probing (where occupied slots tend to form contiguous runs) and provides better distribution than quadratic probing. The trade-off: slightly more complex code and a second hash computation.
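To see the probe sequence concretely, the sketch below runs double hashing on a deliberately tiny table. Everything in it is an assumption chosen for readability: a table of 7 slots, a step divisor of 5, and the key 23. It touches no file and only prints the slots that would be visited.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DBLHASH.
      * Sketch: print the double-hashing probe sequence for one
      * key in a 7-slot table. Sizes and key are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-KEY              PIC 9(05) VALUE 23.
       01  WS-TABLE-SIZE       PIC 9(05) VALUE 7.
       01  WS-STEP-DIVISOR     PIC 9(05) VALUE 5.
       01  WS-QUOTIENT         PIC 9(05) VALUE ZERO.
       01  WS-SLOT             PIC 9(05) VALUE ZERO.
       01  WS-STEP             PIC 9(05) VALUE ZERO.
       01  WS-PROBE            PIC 9(03) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * h1: home slot in the range 1 to table size
           DIVIDE WS-KEY BY WS-TABLE-SIZE
               GIVING WS-QUOTIENT REMAINDER WS-SLOT
           ADD 1 TO WS-SLOT
      * h2: step in the range 1 to 5, always below table size
           DIVIDE WS-KEY BY WS-STEP-DIVISOR
               GIVING WS-QUOTIENT REMAINDER WS-STEP
           ADD 1 TO WS-STEP
      * Visit as many slots as the table has, wrapping as needed
           PERFORM VARYING WS-PROBE FROM 1 BY 1
                   UNTIL WS-PROBE > WS-TABLE-SIZE
               DISPLAY 'Probe ' WS-PROBE ' -> slot ' WS-SLOT
               ADD WS-STEP TO WS-SLOT
               IF WS-SLOT > WS-TABLE-SIZE
                   SUBTRACT WS-TABLE-SIZE FROM WS-SLOT
               END-IF
           END-PERFORM
           STOP RUN.
```

For key 23 the home slot is 3 and the step is 4, so the slots visited are 3, 7, 4, 1, 5, 2, 6. Every slot appears exactly once, which is the property a prime table size guarantees for any step smaller than the table.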
13.13.2 Bucket Approach
Instead of storing one record per slot, some designs use "buckets" — groups of slots that share the same hash value. If the hash function maps a key to bucket 42, the record goes in the first available slot within that bucket (slots 421-430, for example):
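      * Bucket size: 10 slots per bucket
      * Total buckets: 6,500
      * Total slots: 65,000
       BUCKET-HASH-WRITE.
      * COMPUTE-HASH-DIVISION is assumed to divide by the bucket
      * count (6,500), leaving WS-HASH-REMAINDER in 0-6499
           PERFORM COMPUTE-HASH-DIVISION
      * Convert hash to bucket start position
           COMPUTE WS-BUCKET-START =
               (WS-HASH-REMAINDER * 10) + 1
      * Search for empty slot within the bucket
           SET WS-SLOT-NOT-FOUND TO TRUE
           PERFORM VARYING WS-RELATIVE-KEY
                   FROM WS-BUCKET-START BY 1
                   UNTIL WS-RELATIVE-KEY >=
                         WS-BUCKET-START + 10
                      OR WS-SLOT-FOUND
               READ REL-FILE
                   INVALID KEY
                       IF WS-QL-NOT-FOUND
      * Save the slot number. PERFORM VARYING increments
      * WS-RELATIVE-KEY once more before the loop exits.
      * WS-EMPTY-SLOT is a working-storage field added
      * here for that purpose (hypothetical name).
                           MOVE WS-RELATIVE-KEY
                               TO WS-EMPTY-SLOT
                           SET WS-SLOT-FOUND TO TRUE
                       END-IF
               END-READ
           END-PERFORM
           IF WS-SLOT-FOUND
      * Point back at the empty slot for the caller's WRITE.
      * The READs above overwrite the record area, so the
      * caller should WRITE ... FROM a working-storage copy.
               MOVE WS-EMPTY-SLOT TO WS-RELATIVE-KEY
           ELSE
      * Bucket full: overflow
               PERFORM HANDLE-BUCKET-OVERFLOW
           END-IF.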
Bucket strategies are particularly effective when: (a) you can predict that certain hash values will be more popular than others, and (b) you want to limit the maximum number of I/Os for any single lookup to the bucket size. The downside: more wasted space (partially full buckets) and more complex slot management.
13.13.3 Choosing the Right Strategy
| Strategy | Avg Probes at 70% | Clustering | Code Complexity | Best For |
|---|---|---|---|---|
| Linear probing | 2.2 | Severe | Simple | Small tables, low load |
| Quadratic probing | 1.7 | Moderate | Moderate | General purpose |
| Double hashing | 1.5 | None | Higher | Large tables, high load |
| Bucket (size 10) | 1-10 | None | Moderate | Predictable max lookup |
For most COBOL applications, linear probing with a load factor below 77% is sufficient. The performance difference between probing strategies is measurable but rarely significant compared to other factors (disk speed, buffer allocation, overall system load). Choose linear probing unless you have a specific reason to use something more complex.
13.14 Try It Yourself: Hands-On Exercises
Exercise 13.1 — Basic Relative File
Create a program that:
1. Creates a relative file with 100 slots
2. Writes records to slots 1, 5, 10, 15, ..., 100
3. Reads back specific slots (both occupied and empty)
4. Sequentially reads and displays all occupied slots
5. Reports the total records found and the sparsity percentage
Exercise 13.2 — Hash Table Implementation
Build a hash table using a relative file:
1. Define 1,000 slots
2. Implement the division/remainder hash function
3. Load 500 records with alphanumeric keys
4. Implement linear probing for collision handling
5. Track and report collision statistics
Exercise 13.3 — Performance Comparison
If you have access to a system that supports both KSDS and RRDS:
1. Load 10,000 identical records into both a KSDS and an RRDS
2. Perform 1,000 random lookups on each
3. Use ACCEPT FROM TIME to measure elapsed time
4. Compare and analyze the results
🧪 GnuCOBOL Note: GnuCOBOL supports relative file organization natively, so the code in this chapter should run with little or no modification. The underlying storage engine is not VSAM, and details such as file allocation and some extended status values can differ, but the COBOL-level interface is the same.
13.15 Common Mistakes and Debugging
Mistake 1: RELATIVE KEY in the Record
*--- WRONG: RELATIVE KEY must be in WORKING-STORAGE
FD MY-REL-FILE.
01 MY-REL-RECORD.
05 MY-REL-KEY PIC 9(05). *> Cannot be here
05 MY-DATA PIC X(95).
*--- RIGHT: RELATIVE KEY in WORKING-STORAGE
WORKING-STORAGE SECTION.
01 WS-REL-KEY PIC 9(05). *> Correct location
Mistake 2: RRN of Zero
*--- WRONG: Relative record numbers start at 1
MOVE 0 TO WS-RELATIVE-KEY
READ MY-REL-FILE *> Fails: no slot 0 exists (invalid key)
*--- RIGHT: First record is at position 1
MOVE 1 TO WS-RELATIVE-KEY
READ MY-REL-FILE
Mistake 3: Not Handling Empty Slots
*--- WRONG: Assumes every slot is occupied
PERFORM VARYING WS-REL-KEY FROM 1 BY 1
UNTIL WS-REL-KEY > 1000
READ MY-REL-FILE
END-READ
PERFORM PROCESS-RECORD
END-PERFORM
*--- RIGHT: Check for empty slots (status 23)
PERFORM VARYING WS-REL-KEY FROM 1 BY 1
UNTIL WS-REL-KEY > 1000
READ MY-REL-FILE
INVALID KEY
IF WS-NOT-FOUND
CONTINUE *> Empty slot, skip
ELSE
PERFORM HANDLE-ERROR
END-IF
NOT INVALID KEY
PERFORM PROCESS-RECORD
END-READ
END-PERFORM
Mistake 4: Forgetting to Handle Hash Overflow
If your probing loop has no maximum, a full file causes an infinite loop:
*--- WRONG: No probe limit
PERFORM UNTIL WS-SLOT-FOUND
READ ...
ADD 1 TO WS-RELATIVE-KEY
END-PERFORM
*--- RIGHT: Always limit probes
PERFORM UNTIL WS-SLOT-FOUND
OR WS-PROBE-COUNT > WS-MAX-PROBES
READ ...
ADD 1 TO WS-PROBE-COUNT
ADD 1 TO WS-RELATIVE-KEY
END-PERFORM
IF WS-PROBE-COUNT > WS-MAX-PROBES
PERFORM HANDLE-OVERFLOW
END-IF
Mistake 5: Not Verifying the Key After READ
When using hashing with collision handling, a successful READ does not mean you found the right record — it means you found ANY record in that slot. You must verify the business key:
      *--- WRONG: Assumes successful READ is the right record
           PERFORM COMPUTE-HASH
           READ REL-FILE
               INVALID KEY
                   DISPLAY 'Not found'
               NOT INVALID KEY
                   PERFORM PROCESS-RECORD *> Wrong record!
           END-READ
      *--- RIGHT: Verify the business key matches
           PERFORM COMPUTE-HASH
           MOVE ZERO TO WS-PROBE-COUNT
           SET WS-SLOT-NOT-FOUND TO TRUE
           PERFORM UNTIL WS-SLOT-FOUND
                   OR WS-PROBE-COUNT > WS-MAX-PROBES
               READ REL-FILE
                   INVALID KEY
      * Empty slot: the key is not in the table
                       DISPLAY 'Not found'
                       SET WS-SLOT-FOUND TO TRUE
                   NOT INVALID KEY
                       IF QL-ACCT-NUMBER = WS-SEARCH-KEY
                           PERFORM PROCESS-RECORD
                           SET WS-SLOT-FOUND TO TRUE
                       ELSE
      * Collision: probe the next slot, wrapping at the end
                           ADD 1 TO WS-PROBE-COUNT
                           ADD 1 TO WS-RELATIVE-KEY
                           IF WS-RELATIVE-KEY > WS-MAX-SLOTS
                               MOVE 1 TO WS-RELATIVE-KEY
                           END-IF
                       END-IF
               END-READ
           END-PERFORM
This mistake is insidious because it appears to work during testing with sparse data (few collisions), then produces wrong results in production when the load factor increases and collisions become common.
Mistake 6: Forgetting to Handle the Wrap-Around
When probing past the end of the file, the relative key must wrap around to slot 1:
*--- WRONG: No wrap-around
ADD 1 TO WS-RELATIVE-KEY
* If WS-RELATIVE-KEY was at MAX, now it's MAX+1
* → Status '23' (no record at that RRN) on next READ
*--- RIGHT: Wrap around
ADD 1 TO WS-RELATIVE-KEY
IF WS-RELATIVE-KEY > WS-MAX-SLOTS
MOVE 1 TO WS-RELATIVE-KEY
END-IF
13.16 Practical Considerations for Production RRDS Files
13.16.1 RRDS File Sizing on the Mainframe
When defining an RRDS cluster on the mainframe, you must allocate enough space for all slots — including empty ones. The space calculation:
Total bytes = Number_of_slots * (Record_length + control_byte)
+ CI/CA overhead
Example for 65,000 slots, 100-byte records:
65,000 * (100 + 1) = 6,565,000 bytes ≈ 6.3 MB
This is a trivial allocation on modern DASD. Even sparse RRDS files rarely pose storage concerns unless the record size is very large or the slot count is in the millions.
13.16.2 Recovery and Backup
Because RRDS files are often used as caches that can be rebuilt from source data, many shops do not back them up separately. Instead, they document the rebuild procedure and include it in the disaster recovery plan. The IDCAMS REPRO command can back up an RRDS:
REPRO INFILE(RRDS-DD) OUTFILE(BACKUP-DD)
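However, REPRO copies only occupied slots, and the backup is a sequential file, not an RRDS. To restore, you must redefine the RRDS cluster and reload it. Note that a REPRO from the sequential backup into a new RRDS loads the records into consecutive slots starting at 1, so the original slot numbers are lost; for a hashed file, restore by rerunning the loader program against the backup instead.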
13.16.3 Monitoring RRDS Health
For production RRDS files that use hashing, monitoring the load factor and collision rate over time is essential:
COMPUTE-RRDS-STATISTICS.
* Sequential scan to count occupied slots
MOVE ZERO TO WS-OCCUPIED-COUNT
MOVE ZERO TO WS-EMPTY-COUNT
SET WS-NOT-EOF TO TRUE
PERFORM UNTIL WS-EOF
READ MY-REL-FILE NEXT
AT END
SET WS-EOF TO TRUE
NOT AT END
ADD 1 TO WS-OCCUPIED-COUNT
END-READ
END-PERFORM
COMPUTE WS-LOAD-FACTOR =
(WS-OCCUPIED-COUNT * 100) / WS-MAX-SLOTS
DISPLAY 'Occupied slots: ' WS-OCCUPIED-COUNT
DISPLAY 'Total slots: ' WS-MAX-SLOTS
DISPLAY 'Load factor: ' WS-LOAD-FACTOR '%'
IF WS-LOAD-FACTOR > 85
DISPLAY '*** WARNING: Load factor above 85%'
DISPLAY '*** Consider expanding the file'
END-IF.
Run this monitoring utility weekly or monthly to detect load factor creep before it degrades performance.
13.17 Practical Walkthrough: Building a Hash Table from Scratch
Let us walk through the complete process of building and using a hash-based relative file, step by step.
Step 1: Analyze the Data
Before choosing a hash function, analyze your keys:
Sample keys from GlobalBank:
ACC0001234 ACC0005678 ACC0009012
ACC0003456 ACC0007890 ACC0002345
ACC0006789 ACC0000123 ACC0004567
Observations:
- All start with "ACC" (constant prefix — ignore for hashing)
- Numeric portion: 7 digits, ranging from 0000123 to 0009012
- No obvious clustering pattern
- Expected record count: 50,000
Step 2: Choose File Size and Hash Function
Target load factor: 77%. File size: 50,000 / 0.77 ≈ 64,935. A nearby prime above that figure: 64,997.
Hash function: Division/remainder by 64,997.
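Finding a prime table size by hand is tedious, so a small utility helps. The sketch below is one possible approach, using plain trial division, which is fast enough for table sizes in this range. The program and field names are hypothetical. It prints the first prime at or above the minimum slot count, which need not be the same prime used elsewhere in this chapter; any prime at or above the minimum serves.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NXTPRIME.
      * Sketch: smallest prime slot count that keeps the load
      * factor at or below the target. Trial division only.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD-COUNT     PIC 9(09) VALUE 50000.
       01  WS-TARGET-LOAD      PIC 9V99  VALUE 0.77.
       01  WS-CANDIDATE        PIC 9(09) VALUE ZERO.
       01  WS-DIVISOR          PIC 9(09) VALUE ZERO.
       01  WS-QUOTIENT         PIC 9(09) VALUE ZERO.
       01  WS-REMAINDER        PIC 9(09) VALUE ZERO.
       01  WS-PRIME-SWITCH     PIC X     VALUE 'N'.
           88  WS-IS-PRIME               VALUE 'Y'.
           88  WS-NOT-PRIME              VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * COMPUTE truncates the fraction, so add 1 to round up
           COMPUTE WS-CANDIDATE =
               WS-RECORD-COUNT / WS-TARGET-LOAD
           ADD 1 TO WS-CANDIDATE
           PERFORM UNTIL WS-IS-PRIME
               PERFORM TEST-CANDIDATE
               IF WS-NOT-PRIME
                   ADD 1 TO WS-CANDIDATE
               END-IF
           END-PERFORM
           DISPLAY 'Prime slot count: ' WS-CANDIDATE
           STOP RUN.
       TEST-CANDIDATE.
      * Assume prime until some divisor up to the square root
      * divides the candidate evenly
           SET WS-IS-PRIME TO TRUE
           PERFORM VARYING WS-DIVISOR FROM 2 BY 1
                   UNTIL WS-DIVISOR * WS-DIVISOR > WS-CANDIDATE
                      OR WS-NOT-PRIME
               DIVIDE WS-CANDIDATE BY WS-DIVISOR
                   GIVING WS-QUOTIENT REMAINDER WS-REMAINDER
               IF WS-REMAINDER = ZERO
                   SET WS-NOT-PRIME TO TRUE
               END-IF
           END-PERFORM.
```

Change WS-RECORD-COUNT and WS-TARGET-LOAD to size a different file. The printed value is then used both as the DIVIDE divisor in the hash routine and as the slot count when the cluster is defined.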
Step 3: Estimate Collision Rate
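With 50,000 records in 64,997 slots (77% load factor) and linear probing, the expected average probe count for a successful lookup is approximately 2.7, using the standard approximation ½(1 + 1/(1 − α)) with α = 0.77 (the same formula gives the 2.2 shown for 70% in Section 13.13.3). Roughly 60% of records will hash directly to an empty slot during the load, and the rest will need one or more additional probes, occasionally many more.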
Step 4: Write the Loader
The loader program reads input records, computes the hash for each key, probes for an empty slot, and writes the record. It tracks collision statistics to verify that the hash function performs as expected.
Key design decisions:
- Maximum probe count: 50 (safety limit to prevent infinite loops)
- Overflow handling: Write overflows to a separate sequential file for investigation
- Statistics: Track total collisions, maximum chain length, and distribution
Step 5: Write the Lookup Module
The lookup module computes the same hash, reads the slot, and either finds the matching record or probes forward until it finds the record, hits an empty slot (record not in table), or exceeds the probe limit.
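The loader and lookup logic from Steps 4 and 5 can be tried out without any file at all. The sketch below is a stand-in under clear assumptions: an 11-entry WORKING-STORAGE table plays the role of the relative file, a key of zero marks an empty slot, and the five keys are the numeric portions of sample account numbers from Step 1. All paragraph and field names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. HASHDEMO.
      * Sketch: linear probing in an in-memory table that stands
      * in for the relative file. Zero marks an empty slot.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-MAX-SLOTS        PIC 9(03) VALUE 11.
       01  WS-PROBE-LIMIT      PIC 9(03) VALUE 11.
       01  WS-SLOT-TABLE.
           05  WS-SLOT-KEY     PIC 9(07) OCCURS 11 TIMES.
       01  WS-SAMPLE-KEYS.
           05  FILLER          PIC 9(07) VALUE 0001234.
           05  FILLER          PIC 9(07) VALUE 0005678.
           05  FILLER          PIC 9(07) VALUE 0009012.
           05  FILLER          PIC 9(07) VALUE 0003456.
           05  FILLER          PIC 9(07) VALUE 0007890.
       01  WS-SAMPLE-TABLE REDEFINES WS-SAMPLE-KEYS.
           05  WS-SAMPLE-KEY   PIC 9(07) OCCURS 5 TIMES.
       01  WS-KEY              PIC 9(07) VALUE ZERO.
       01  WS-QUOTIENT         PIC 9(07) VALUE ZERO.
       01  WS-SLOT             PIC 9(03) VALUE ZERO.
       01  WS-PROBE-COUNT      PIC 9(03) VALUE ZERO.
       01  WS-I                PIC 9(03) VALUE ZERO.
       01  WS-RESULT           PIC X     VALUE 'N'.
           88  WS-DONE                   VALUE 'Y'.
           88  WS-SEARCHING              VALUE 'N'.
       PROCEDURE DIVISION.
       MAIN-PARA.
           INITIALIZE WS-SLOT-TABLE
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 5
               MOVE WS-SAMPLE-KEY(WS-I) TO WS-KEY
               PERFORM INSERT-KEY
           END-PERFORM
      * One key that was loaded, one that was not
           MOVE 0009012 TO WS-KEY
           PERFORM LOOKUP-KEY
           MOVE 0004567 TO WS-KEY
           PERFORM LOOKUP-KEY
           STOP RUN.
       HASH-KEY.
      * Division/remainder hash; the first probe counts as 1
           DIVIDE WS-KEY BY WS-MAX-SLOTS
               GIVING WS-QUOTIENT REMAINDER WS-SLOT
           ADD 1 TO WS-SLOT
           MOVE 1 TO WS-PROBE-COUNT
           SET WS-SEARCHING TO TRUE.
       NEXT-SLOT.
      * Linear probe with wrap-around to slot 1
           ADD 1 TO WS-SLOT
           IF WS-SLOT > WS-MAX-SLOTS
               MOVE 1 TO WS-SLOT
           END-IF
           ADD 1 TO WS-PROBE-COUNT.
       INSERT-KEY.
           PERFORM HASH-KEY
           PERFORM UNTIL WS-DONE
                   OR WS-PROBE-COUNT > WS-PROBE-LIMIT
               IF WS-SLOT-KEY(WS-SLOT) = ZERO
                   MOVE WS-KEY TO WS-SLOT-KEY(WS-SLOT)
                   DISPLAY 'Stored ' WS-KEY ' in slot ' WS-SLOT
                       ' after ' WS-PROBE-COUNT ' probe(s)'
                   SET WS-DONE TO TRUE
               ELSE
                   PERFORM NEXT-SLOT
               END-IF
           END-PERFORM.
       LOOKUP-KEY.
           PERFORM HASH-KEY
           PERFORM UNTIL WS-DONE
                   OR WS-PROBE-COUNT > WS-PROBE-LIMIT
               EVALUATE TRUE
                   WHEN WS-SLOT-KEY(WS-SLOT) = WS-KEY
                       DISPLAY 'Found ' WS-KEY ' in slot ' WS-SLOT
                       SET WS-DONE TO TRUE
                   WHEN WS-SLOT-KEY(WS-SLOT) = ZERO
                       DISPLAY WS-KEY ' is not in the table'
                       SET WS-DONE TO TRUE
                   WHEN OTHER
                       PERFORM NEXT-SLOT
               END-EVALUATE
           END-PERFORM.
```

These five keys happen to collide heavily in an 11-slot table: they hash to slots 3 and 4 only, so they end up in the contiguous run of slots 3 through 7. Key 9012 is found in slot 5 on its second probe, and the search for the absent key 4567 walks the whole run before reaching the empty slot 8. That is primary clustering in miniature, and it is why the walkthrough insists on testing with realistic volumes.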
Step 6: Test with Production-Volume Data
Testing with 10 records will not reveal collision patterns. Test with full production volume, or at least 70% of it, to verify:
- Collision rate matches predictions
- Maximum chain length is acceptable (< 20)
- No overflows occurred
- Lookup performance meets requirements
Step 7: Monitor and Maintain
After deployment, monitor collision statistics over time. If new key patterns emerge (e.g., a new branch assigns account numbers in a different range), the hash function's distribution may degrade. Periodic review of collision statistics catches this before performance degrades noticeably.
💡 Design Principle: The hash function and the data are co-dependent. A hash function that works perfectly for one set of keys may perform poorly for a different set. Always test with representative data, and monitor after deployment.
13.18 When NOT to Use Relative Files
It is equally important to know when relative files are the wrong choice. Here are common scenarios where developers reach for RRDS and should instead use KSDS:
Scenario 1: Alphanumeric Keys with Large Key Space
A product catalog with SKUs like "WDG-7842-BLU" has a key space that does not map naturally to integers. You could hash, but the complexity of collision handling and the inability to browse by SKU order makes KSDS the better choice.
Scenario 2: Need for Multiple Access Paths
If you need to look up records by more than one field (e.g., by customer ID and by customer name), RRDS cannot help — it has no alternate key support. Use KSDS with alternate indexes.
Scenario 3: Highly Dynamic Data
If records are frequently inserted and deleted, the hash table's load factor fluctuates, and collision rates become unpredictable. KSDS handles dynamic data much more gracefully because the B+ tree index adjusts automatically.
Scenario 4: Sequential Reporting
If you need to produce reports in key order, RRDS is awkward — sequential reads return records in slot order, which is hash order (essentially random). You would need to read all records, sort them, and then produce the report. KSDS returns records in key order naturally.
Scenario 5: Small Files
For files with fewer than 1,000 records, the performance difference between RRDS (1 I/O) and KSDS (3-4 I/Os) is negligible — we are talking about milliseconds either way. Use KSDS for its flexibility and simplicity.
🔵 Derek's Question: "If KSDS is more flexible and RRDS only wins on raw speed, why does RRDS still exist?" Priya Kapoor's answer: "Because in high-volume batch processing, those extra I/Os add up. When you process 2.3 million transactions and each one requires a lookup, saving 2-3 I/Os per lookup saves millions of I/Os total. That translates to real time — 20 minutes, 30 minutes — in a batch window where every minute counts."
13.19 File Status Codes for Relative Files
Relative files share most status codes with indexed files, with a few differences:
| Status | Meaning for Relative Files |
|---|---|
| 00 | Successful completion |
| 10 | End of file (sequential read past last slot) |
| 14 | Sequential READ: relative record number has more digits than the RELATIVE KEY item can hold |
| 22 | WRITE — slot already occupied |
| 23 | READ/DELETE — slot is empty (no record at that RRN) |
| 24 | WRITE — RRN exceeds file boundary |
| 30 | Permanent I/O error |
| 44 | Record length mismatch |
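Status '14' applies only to relative files: it is raised on a sequential READ when the relative record number of the record just read has more significant digits than the RELATIVE KEY data item can hold. A random READ of a slot beyond the end of the file is reported as '23', the same status returned for a slot that exists but is empty.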
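A common way to use these codes is a single paragraph that translates the status into a message for logs and abend reports. The sketch below is illustrative only: the program name, field names, and message wording are assumptions, and the status is preset with a VALUE clause instead of coming from a real file operation.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. RELSTAT.
      * Sketch: turn a relative-file status code into a message.
      * In a real program WS-FILE-STATUS is the FILE STATUS item
      * named in the SELECT; here it is preset for illustration.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-FILE-STATUS      PIC XX    VALUE '23'.
       01  WS-STATUS-TEXT      PIC X(40) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM DESCRIBE-STATUS
           DISPLAY 'Status ' WS-FILE-STATUS ': ' WS-STATUS-TEXT
           STOP RUN.
       DESCRIBE-STATUS.
           EVALUATE WS-FILE-STATUS
               WHEN '00'
                   MOVE 'Successful completion'
                       TO WS-STATUS-TEXT
               WHEN '10'
                   MOVE 'End of file'
                       TO WS-STATUS-TEXT
               WHEN '14'
                   MOVE 'Relative record number too large'
                       TO WS-STATUS-TEXT
               WHEN '22'
                   MOVE 'Slot already occupied'
                       TO WS-STATUS-TEXT
               WHEN '23'
                   MOVE 'Slot is empty (no record at that RRN)'
                       TO WS-STATUS-TEXT
               WHEN '24'
                   MOVE 'RRN exceeds file boundary'
                       TO WS-STATUS-TEXT
               WHEN '30'
                   MOVE 'Permanent I/O error'
                       TO WS-STATUS-TEXT
               WHEN '44'
                   MOVE 'Record length mismatch'
                       TO WS-STATUS-TEXT
               WHEN OTHER
                   MOVE 'Unexpected status'
                       TO WS-STATUS-TEXT
           END-EVALUATE.
```

As written, the program reports the message for status '23'. In hashing code, '22' and '23' are normal events that drive the probe loop, so only the remaining codes would usually be routed to an error handler.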
13.20 Chapter Summary
Relative files offer the fastest possible random access in VSAM — a single I/O per record, with no index overhead. This chapter has covered:
- Relative file concepts: Records stored in numbered slots, accessed by position rather than by key value. VSAM RRDS stores fixed-length records with direct positional calculation.
- COBOL syntax: ORGANIZATION IS RELATIVE, RELATIVE KEY in WORKING-STORAGE (not in the record), and the same three access modes as indexed files.
- CRUD operations: Similar to indexed files but with slot-based semantics. Empty slots return status '23'. DELETE truly empties a slot.
- Hashing strategies: Division/remainder, folding, and mid-square methods for converting business keys to slot numbers. Collision handling via linear probing, quadratic probing, or overflow areas.
- Load factor management: The ratio of occupied slots to total slots determines collision frequency. Target 67-77% for best performance.
- Trade-offs: Relative files excel for direct numeric access but lack alternate keys, meaningful sequential order, and efficient sparse storage. Indexed files are more versatile but slower for random access.
- Hybrid patterns: Using an RRDS as a performance cache in front of a KSDS master file — the GlobalBank quick lookup pattern.
The choice between relative and indexed files is a design decision on the modernization spectrum. Understanding both options — and their trade-offs — makes you a more effective COBOL developer. In Chapter 14, we bring all file types together in advanced multi-file processing patterns.
🔗 Connections: This chapter connects to Chapter 12 (Indexed files — the primary alternative), Chapter 14 (Multi-file processing patterns), Chapter 18 (Table Handling — in-memory equivalents of relative files), Chapter 36 (Performance Tuning — file access optimization), and Appendix E (VSAM reference).