In This Chapter
- Introduction
- 13.1 Relative File Organization
- 13.2 When to Use Relative Files vs. Indexed Files vs. Sequential Files
- 13.3 VSAM RRDS on z/OS
- 13.4 Creating, Reading, Updating, and Deleting Relative Records
- 13.5 Dynamic Access Mode with Relative Files
- 13.6 File Status Codes for Relative Files
- 13.7 Handling Empty Slots (Deleted Records)
- 13.8 Advanced File Techniques
- 13.9 VSAM Dataset Types: RRDS vs. KSDS vs. ESDS
- 13.10 Line Sequential Files (GnuCOBOL)
- 13.11 EBCDIC vs. ASCII Sort Order
- 13.12 JCL Techniques for File Processing
- 13.13 Error Handling and Recovery Patterns
- 13.14 Performance Comparison: Sequential vs. Indexed vs. Relative
- 13.15 Complete Examples Reference
- Summary
Chapter 13: Relative File Processing and Advanced File Techniques
Introduction
In the preceding chapters, you learned to process sequential files (reading records one after another) and indexed files (accessing records by alphanumeric keys through an index structure). This chapter introduces the third file organization supported by COBOL: relative file organization, where each record occupies a numbered slot and can be accessed directly by its slot number.
Relative files offer a unique capability: when you know a record's position number, you can read or write it with a single I/O operation; no index traversal is required. This makes relative files the fastest possible access method when the record's position can be calculated from its key value.
Beyond relative files, this chapter covers a set of advanced file processing techniques that every production COBOL programmer must master. These include coordinating reads across multiple files simultaneously, the balanced line algorithm for file comparison and merging, master file update patterns, OPEN EXTEND for appending, header and trailer records for data integrity, and checkpoint/restart mechanisms for long-running batch jobs.
We also examine the z/OS VSAM landscape more broadly, comparing RRDS (Relative Record Data Set), KSDS (Key-Sequenced Data Set), and ESDS (Entry-Sequenced Data Set) to help you choose the right file organization for each situation.
13.1 Relative File Organization
What Is a Relative File?
A relative file is a collection of fixed-length records where each record is identified by its relative record number -- its ordinal position within the file. The first record is at position 1, the second at position 2, and so on. Unlike indexed files, there is no embedded key field in the record itself; the record's identity is entirely determined by its position.
Think of a relative file as an array on disk. Just as you access array element EMPLOYEE-TABLE(47) in working storage, you access relative record 47 in a relative file. The system calculates the physical disk location using simple arithmetic:
byte-offset = (record-number - 1) * record-length
This calculation takes constant time regardless of file size, giving relative files O(1) access characteristics.
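The same arithmetic can be checked in a few lines of Python (the 80-byte record length is illustrative):

```python
def slot_offset(record_number: int, record_length: int) -> int:
    """Byte offset of a relative record slot; numbering starts at 1."""
    if record_number < 1:
        raise ValueError("there is no record 0 in a relative file")
    return (record_number - 1) * record_length

# With 80-byte records, record 1 starts at offset 0 and record 47 at 3680.
print(slot_offset(1, 80))
print(slot_offset(47, 80))
```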
COBOL Syntax for Relative Files
To declare a relative file in COBOL, you specify three elements in the FILE-CONTROL paragraph:
SELECT EMPLOYEE-FILE
ASSIGN TO RELFILE
ORGANIZATION IS RELATIVE
ACCESS MODE IS RANDOM
RELATIVE KEY IS WS-RELATIVE-KEY
FILE STATUS IS WS-FILE-STATUS.
The key clauses are:
- ORGANIZATION IS RELATIVE -- Declares this as a relative file (as opposed to SEQUENTIAL or INDEXED).
- RELATIVE KEY IS data-name -- Identifies the working-storage field that holds the relative record number. This field must be an unsigned integer defined in WORKING-STORAGE (not in the file's record description). The system uses this field to determine which slot to read from or write to.
- ACCESS MODE -- Can be SEQUENTIAL, RANDOM, or DYNAMIC, just as with indexed files.
The RELATIVE KEY field is the bridge between your program logic and the physical file. Before each READ or WRITE, you set this field to the desired record number. After a sequential READ, the system updates this field with the number of the record just read.
RELATIVE KEY: The Record Number as Key
The RELATIVE KEY must be defined in WORKING-STORAGE SECTION as an unsigned integer:
WORKING-STORAGE SECTION.
01 WS-RELATIVE-KEY PIC 9(4).
The size should be large enough to hold the maximum record number you expect. For a file with up to 9,999 records, PIC 9(4) suffices. For larger files, use PIC 9(8) or larger.
Important rules about the RELATIVE KEY:
- It must NOT be defined within the file's FD record description
- It must be an unsigned integer (no sign, no decimal places)
- The minimum valid value is 1 (there is no record 0)
- You must set it before every random READ, WRITE, or DELETE
- After a sequential READ, it contains the number of the record just read
Access Modes for Relative Files
COBOL supports three access modes for relative files, each suited to different processing patterns.
Sequential Access (ACCESS MODE IS SEQUENTIAL)
Records are read or written in order of their relative record number, starting from record 1 (or from a position established by START). When reading sequentially, the system automatically advances to the next record, skipping empty slots. This is the only access mode allowed when opening a relative file with OPEN EXTEND.
SELECT REL-FILE
ASSIGN TO RELFILE
ORGANIZATION IS RELATIVE
ACCESS MODE IS SEQUENTIAL
RELATIVE KEY IS WS-REL-KEY
FILE STATUS IS WS-STATUS.
Random Access (ACCESS MODE IS RANDOM)
Each operation targets the specific record identified by the current value of the RELATIVE KEY. There is no concept of "current position" -- every operation is independent. This is the mode for direct lookups when you know the record number.
SELECT REL-FILE
ASSIGN TO RELFILE
ORGANIZATION IS RELATIVE
ACCESS MODE IS RANDOM
RELATIVE KEY IS WS-REL-KEY
FILE STATUS IS WS-STATUS.
Dynamic Access (ACCESS MODE IS DYNAMIC)
Combines random and sequential access within a single file opening. You can perform random reads using READ file-name and sequential reads using READ file-name NEXT RECORD. The START statement positions the file for subsequent sequential reads. This is ideal for programs that need both direct lookup and range scanning.
SELECT REL-FILE
ASSIGN TO RELFILE
ORGANIZATION IS RELATIVE
ACCESS MODE IS DYNAMIC
RELATIVE KEY IS WS-REL-KEY
FILE STATUS IS WS-STATUS.
13.2 When to Use Relative Files vs. Indexed Files vs. Sequential Files
Choosing the right file organization is a critical design decision. Here is a framework for making that choice:
Use Relative Files When:
- The key is numeric and falls within a predictable, bounded range
- You need the fastest possible direct access (O(1) per lookup)
- The key can serve directly as the record number (e.g., employee numbers 1001-9999)
- You can tolerate wasted space for gaps in the key range
- You are implementing a hash table for constant-time lookups
- Record numbers are assigned sequentially (no gaps expected)

Use Indexed Files (VSAM KSDS) When:
- The key is alphanumeric (names, mixed codes)
- The key space is sparse or unpredictable (e.g., Social Security numbers where only a tiny fraction of possible values exist)
- You need alternate keys for multiple access paths
- You require sequential processing in key order with no empty slots

Use Sequential Files When:
- Records are always processed in order (beginning to end)
- Batch processing reads every record (reports, end-of-day processing)
- The file is used as input to the SORT utility
- Simplicity and portability are paramount
- No direct access by key is required
Performance Comparison:
| Operation | Sequential | Indexed (KSDS) | Relative (RRDS) |
|---|---|---|---|
| Sequential read (all records) | Fastest | Moderate | Moderate |
| Direct read by key | Not possible | 3-4 I/Os (index levels + data) | 1 I/O |
| Insert (append) | Fast | Moderate (index maintenance) | Fast (if slot known) |
| Update in place | Rewrite entire file | 1-2 I/Os | 1 I/O |
| Delete | Not supported | Mark + reclaim | Mark slot empty |
| Disk space efficiency | Best | Good | May waste space (empty slots) |
The key trade-off with relative files is space versus speed. If your key values range from 1 to 1,000,000 but you only have 10,000 active records, the file wastes space for 990,000 empty slots. Indexed files handle sparse key spaces far more efficiently.
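To make the space-versus-speed trade-off concrete, here is that arithmetic as a small Python sketch (the 80-byte record length is an assumption for illustration):

```python
# Space cost of a sparse relative file: 1,000,000 slots, 10,000 active records.
slots = 1_000_000
active_records = 10_000
record_length = 80  # bytes per slot (illustrative)

wasted_bytes = (slots - active_records) * record_length
utilization = active_records / slots
print(f"wasted space: {wasted_bytes / 1_048_576:.1f} MiB")  # about 75.5 MiB
print(f"slot utilization: {utilization:.1%}")               # 1.0%
```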
13.3 VSAM RRDS on z/OS
VSAM Relative Record Data Set
On IBM z/OS, relative files are implemented as VSAM RRDS (Relative Record Data Set). Each RRDS consists of fixed-length slots, each identified by a relative record number starting at 1. The VSAM access method manages the physical layout, including control intervals and control areas, just as it does for KSDS and ESDS.
Defining a VSAM RRDS requires IDCAMS (Access Method Services):
DEFINE CLUSTER (
NAME(USERID.EMPLOYEE.RRDS)
NUMBERED
RECORDS(9999)
RECORDSIZE(80 80)
SHAREOPTIONS(2 3)
SPEED )
DATA (
NAME(USERID.EMPLOYEE.RRDS.DATA)
CONTROLINTERVALSIZE(4096) )
Key parameters:
- NUMBERED -- This keyword makes it an RRDS (as opposed to INDEXED for KSDS or NONINDEXED for ESDS)
- RECORDS(9999) -- Allocates space for 9,999 record slots
- RECORDSIZE(80 80) -- Fixed-length records of 80 bytes (average and maximum must be equal for RRDS)
- CONTROLINTERVALSIZE -- The CI size affects how many records fit per physical I/O; 4096 is a common choice
Note that RRDS has no INDEX component -- only a DATA component. This is because there is no index; records are located by direct calculation.
Empty Slots in RRDS
When you define an RRDS with RECORDS(9999), all 9,999 slots initially exist but are marked as empty. VSAM tracks which slots are occupied using control information within each control interval.
When you DELETE a record from an RRDS, the slot is marked as empty but its space is not physically reclaimed. The slot can be reused by a subsequent WRITE to that same relative record number.
When you READ sequentially through an RRDS, the system automatically skips empty slots -- your program never sees them. However, if you attempt a random READ of an empty slot, you receive file status 23 (record not found).
13.4 Creating, Reading, Updating, and Deleting Relative Records
Creating a Relative File (WRITE)
To populate a relative file, open it for OUTPUT and write records. In random access mode, set the RELATIVE KEY before each WRITE:
MOVE 1042 TO WS-RELATIVE-KEY
MOVE employee-data TO RELATIVE-RECORD
WRITE RELATIVE-RECORD
INVALID KEY
DISPLAY 'SLOT ALREADY OCCUPIED: ' WS-RELATIVE-KEY
NOT INVALID KEY
ADD 1 TO WS-RECORDS-WRITTEN
END-WRITE
The INVALID KEY condition triggers if:
- The slot is already occupied (file status 22)
- The record number is outside the file's boundaries (file status 24)
See Example 01 (example-01-create-relative.cob) for a complete program that reads a sequential employee file and loads it into an RRDS using employee numbers as relative keys.
Reading a Relative Record (READ)
Random read -- set the key and read:
MOVE 1042 TO WS-RELATIVE-KEY
READ RELATIVE-FILE
INVALID KEY
DISPLAY 'NOT FOUND: ' WS-RELATIVE-KEY
NOT INVALID KEY
DISPLAY 'FOUND: ' EMP-NAME
END-READ
Sequential read -- read the next occupied record:
READ RELATIVE-FILE NEXT RECORD
AT END
SET END-OF-FILE TO TRUE
NOT AT END
DISPLAY 'RECORD ' WS-RELATIVE-KEY ': ' EMP-NAME
END-READ
After a sequential read, WS-RELATIVE-KEY contains the record number of the record just read.
Updating a Record (REWRITE)
To update a relative record, you must first READ it, then REWRITE:
MOVE 1042 TO WS-RELATIVE-KEY
READ RELATIVE-FILE
INVALID KEY
DISPLAY 'NOT FOUND'
NOT INVALID KEY
MOVE 75000.00 TO EMP-SALARY
REWRITE EMP-RECORD
INVALID KEY
DISPLAY 'REWRITE ERROR'
END-REWRITE
END-READ
The READ-before-REWRITE requirement exists because REWRITE replaces the record most recently read. Without a prior READ, the system does not know which record to replace.
Deleting a Record (DELETE)
With random access, set the key and delete:
MOVE 1042 TO WS-RELATIVE-KEY
DELETE RELATIVE-FILE
INVALID KEY
DISPLAY 'NOT FOUND: ' WS-RELATIVE-KEY
NOT INVALID KEY
DISPLAY 'DELETED RECORD ' WS-RELATIVE-KEY
END-DELETE
DELETE marks the slot as empty. A subsequent random READ of that slot will return status 23. However, a new record can be written to that slot with WRITE.
See Example 02 (example-02-random-access.cob) for a complete program demonstrating all four operations driven by a transaction file.
13.5 Dynamic Access Mode with Relative Files
Dynamic access mode is the most powerful access mode because it lets you combine random lookups with sequential scanning in a single file opening. This is demonstrated in Example 03 (example-03-dynamic-access.cob).
The START Statement
The START statement positions the file pointer for subsequent sequential reads without actually reading a record:
MOVE 2000 TO WS-RELATIVE-KEY
START RELATIVE-FILE
KEY IS NOT LESS THAN WS-RELATIVE-KEY
INVALID KEY
DISPLAY 'NO RECORDS AT OR ABOVE: ' WS-RELATIVE-KEY
NOT INVALID KEY
DISPLAY 'POSITIONED FOR SEQUENTIAL READ'
END-START
Valid KEY conditions for START:
- KEY IS EQUAL TO -- Position at the exact record
- KEY IS GREATER THAN -- Position at the first record after the specified key
- KEY IS NOT LESS THAN (or >=) -- Position at or after the specified key
After a successful START, use READ file-name NEXT RECORD to retrieve records sequentially from that position.
Practical Example: Range Query
A common pattern is to look up a range of records:
* Position at the start of the range
MOVE 2000 TO WS-RELATIVE-KEY
START EMPLOYEE-FILE
KEY IS NOT LESS THAN WS-RELATIVE-KEY
END-START
* Read sequentially until we pass the end of range
PERFORM UNTIL WS-RELATIVE-KEY > 2999
OR WS-FILE-STATUS = '10'
READ EMPLOYEE-FILE NEXT RECORD
AT END
CONTINUE
NOT AT END
PERFORM PROCESS-EMPLOYEE
END-READ
END-PERFORM
13.6 File Status Codes for Relative Files
File status codes are essential for robust error handling. Here are the status codes most relevant to relative file processing:
| Status | Meaning | When It Occurs |
|---|---|---|
| 00 | Successful operation | Any operation completed normally |
| 02 | Duplicate key (non-fatal) | Not applicable to relative files |
| 10 | End of file | Sequential READ reaches end |
| 22 | Duplicate key | WRITE to an occupied slot |
| 23 | Record not found | READ or DELETE of empty/nonexistent slot |
| 24 | Boundary violation | Key exceeds file size; disk full |
| 30 | Permanent I/O error | Hardware or VSAM error |
| 35 | File not found | OPEN fails because file does not exist |
| 37 | File type conflict | File opened with wrong mode for its organization |
| 39 | File attribute conflict | Record length or organization mismatch |
| 41 | File already open | OPEN on an already-open file |
| 42 | File not open | Operation on a file that is not open |
| 43 | READ required before REWRITE/DELETE | No prior READ for sequential access |
| 44 | Record length error | Record too large or too small |
| 46 | Read past end of file | Sequential read attempted after AT END |
| 47 | READ on file not opened INPUT or I-O | Wrong open mode |
| 48 | WRITE on file not opened OUTPUT/I-O/EXTEND | Wrong open mode |
| 49 | DELETE/REWRITE on file not opened I-O | Wrong open mode |
Always check file status after every file operation. A robust pattern:
READ RELATIVE-FILE
EVALUATE WS-FILE-STATUS
WHEN '00'
PERFORM PROCESS-RECORD
WHEN '10'
SET END-OF-FILE TO TRUE
WHEN '23'
PERFORM RECORD-NOT-FOUND
WHEN OTHER
PERFORM UNEXPECTED-FILE-ERROR
END-EVALUATE
13.7 Handling Empty Slots (Deleted Records)
Relative files inherently contain gaps. A file with capacity for 10,000 records might have only 6,000 occupied slots. This happens because:
- Sparse key values -- Employee numbers may not be contiguous (1001, 1003, 1007...)
- Deleted records -- Slots that once held data are now marked empty
- Intentional gaps -- Space reserved for future records
When reading sequentially, the system automatically skips empty slots. Your program receives only occupied records, and WS-RELATIVE-KEY tells you which slot each came from.
When reading randomly, an empty slot returns status 23. Your program must handle this gracefully:
READ RELATIVE-FILE
INVALID KEY
IF WS-FILE-STATUS = '23'
DISPLAY 'SLOT ' WS-RELATIVE-KEY ' IS EMPTY'
ELSE
DISPLAY 'ERROR: STATUS ' WS-FILE-STATUS
END-IF
END-READ
To scan for empty slots (useful for space utilization reports), attempt a random READ of every slot number and tally the results by file status:
* Visit every slot, counting occupied slots and empty slots
* (a random READ of an empty slot returns file status 23)
PERFORM VARYING WS-RELATIVE-KEY FROM 1 BY 1
UNTIL WS-RELATIVE-KEY > 10000
READ RELATIVE-FILE
INVALID KEY
IF WS-FILE-STATUS = '23'
ADD 1 TO WS-EMPTY-COUNT
END-IF
NOT INVALID KEY
ADD 1 TO WS-OCCUPIED-COUNT
END-READ
END-PERFORM
13.8 Advanced File Techniques
The remainder of this chapter covers file processing techniques that apply across all file organizations. These are the patterns that distinguish production-quality COBOL from student exercises.
13.8.1 Multiple File Processing
Real-world batch programs routinely have five, ten, or even twenty files open simultaneously. Example 04 (example-04-multi-file.cob) demonstrates processing with five files: three input (a relative employee master, a sequential department reference, and a sequential transaction file) and two output (a payroll report and an error file).
Key principles for multi-file programs:
Open all files in a controlled sequence. Open each file and check its status before proceeding. If any file fails to open, display a diagnostic message and terminate cleanly:
OPEN INPUT EMPLOYEE-FILE
IF WS-EMP-STATUS NOT = '00'
DISPLAY 'CANNOT OPEN EMPLOYEE FILE: ' WS-EMP-STATUS
STOP RUN
END-IF
Close all files in the finalization paragraph. Even if processing ends early due to an error, close every file that was successfully opened. Unclosed VSAM files can be left in an inconsistent state.
Use separate file status fields. Each file must have its own status field. Using a shared status variable leads to bugs that are extremely hard to diagnose.
Load reference data into tables first. Small reference files (departments, codes, states) should be read into WORKING-STORAGE tables during initialization. This avoids repeated file I/O during the main processing loop.
13.8.2 File Matching and Merging Algorithms
File matching is the process of comparing records from two or more sorted files based on a common key. It is foundational to batch processing and forms the basis of master file update logic.
The balanced line algorithm (also called the "match-merge" algorithm) is the classic technique. It works on two sorted files simultaneously:
WHILE both files have records remaining:
IF key-A < key-B:
Process record from file A only (no match)
Read next from file A
ELSE IF key-A > key-B:
Process record from file B only (no match)
Read next from file B
ELSE (keys equal):
Process the matched pair
Read next from both files
The critical implementation detail is end-of-file handling. When one file is exhausted, you must continue processing the remaining records from the other file. The standard technique is to use HIGH-VALUES as a sentinel:
READ FILE-A
AT END MOVE HIGH-VALUES TO KEY-A
END-READ
When a file reaches EOF, its key becomes HIGH-VALUES (the highest possible value). Since no real key can exceed HIGH-VALUES, the algorithm naturally processes all remaining records from the other file.
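Because the algorithm is language-independent, it can be sketched compactly outside COBOL. This Python version (the key values are illustrative) uses an out-of-range sentinel exactly the way HIGH-VALUES is used above:

```python
HIGH_VALUES = "\uffff"  # sentinel greater than any real key, like HIGH-VALUES

def read_next(it):
    """Return the next key from a stream, or the sentinel at end of file."""
    return next(it, HIGH_VALUES)

def balanced_line(keys_a, keys_b):
    """One pass over two sorted key streams; classify every key."""
    a_only, b_only, matched = [], [], []
    it_a, it_b = iter(keys_a), iter(keys_b)
    key_a, key_b = read_next(it_a), read_next(it_b)
    while key_a != HIGH_VALUES or key_b != HIGH_VALUES:
        if key_a < key_b:                 # record exists only in file A
            a_only.append(key_a)
            key_a = read_next(it_a)
        elif key_a > key_b:               # record exists only in file B
            b_only.append(key_b)
            key_b = read_next(it_b)
        else:                             # keys equal: a matched pair
            matched.append(key_a)
            key_a = read_next(it_a)
            key_b = read_next(it_b)
    return a_only, b_only, matched

print(balanced_line(["1001", "1003", "1007"], ["1003", "1008"]))
```

Once a stream is exhausted its key becomes the sentinel, so the loop drains the other stream with no special-case code, exactly as described above.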
Example 06 (example-06-file-comparison.cob) implements a complete file comparison using this algorithm, producing a difference report that identifies additions, deletions, and changes.
13.8.3 Master File Update Patterns
The master file update is the quintessential COBOL batch program. A master file (containing current state) is updated with a transaction file (containing changes). Transactions typically include:
- Add -- Insert a new record
- Change -- Modify fields of an existing record
- Delete -- Remove a record (logically or physically)
The classic sequential master file update creates a new master by merging the old master with sorted transactions:
READ old-master
READ transaction
PERFORM UNTIL both files exhausted:
IF old-key < tran-key:
WRITE old-master-record TO new-master (unchanged)
READ old-master
ELSE IF old-key > tran-key:
IF transaction is ADD
WRITE transaction-record TO new-master
ELSE
ERROR: transaction for nonexistent record
READ transaction
ELSE (keys match):
IF transaction is CHANGE
Apply changes, WRITE updated record TO new-master
ELSE IF transaction is DELETE
Skip (do not write to new-master)
ELSE IF transaction is ADD
ERROR: duplicate record
READ old-master
READ transaction
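The same update logic can be sketched in Python. The record layouts are hypothetical — each master record is a (key, data) pair and each transaction a (key, action, data) triple:

```python
HIGH_KEY = float("inf")  # numeric stand-in for the HIGH-VALUES sentinel

def update_master(old_master, transactions):
    """Merge a sorted old master with sorted transactions into a new master."""
    new_master, errors = [], []
    masters, trans = iter(old_master), iter(transactions)
    m, t = next(masters, None), next(trans, None)
    while m is not None or t is not None:
        m_key = m[0] if m is not None else HIGH_KEY
        t_key = t[0] if t is not None else HIGH_KEY
        if m_key < t_key:                       # no transaction: copy unchanged
            new_master.append(m)
            m = next(masters, None)
        elif m_key > t_key:                     # transaction with no master record
            if t[1] == "ADD":
                new_master.append((t[0], t[2]))
            else:
                errors.append(f"{t[1]} for nonexistent key {t[0]}")
            t = next(trans, None)
        else:                                   # keys match
            if t[1] == "CHANGE":
                new_master.append((m[0], t[2]))
            elif t[1] == "DELETE":
                pass                            # skip: do not write to new master
            else:
                errors.append(f"duplicate ADD for key {t[0]}")
            m = next(masters, None)
            t = next(trans, None)
    return new_master, errors

old = [(1, "a"), (2, "b"), (3, "c")]
trans = [(2, "CHANGE", "B"), (3, "DELETE", None), (4, "ADD", "d")]
print(update_master(old, trans))  # new master: [(1, 'a'), (2, 'B'), (4, 'd')]
```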
For relative and indexed files, updates can be done in place using I-O mode (READ + REWRITE) rather than creating a new file. This is demonstrated in Example 02.
13.8.4 The Balanced Line Algorithm for File Comparison
The balanced line algorithm deserves special attention because it appears in so many production programs. The name comes from the idea of keeping the two files "in balance" -- always comparing records at the same logical position.
The algorithm's elegance is its simplicity: you only ever compare the current record from each file, and you advance whichever file had the smaller key (or both, if equal). This guarantees:
- Every record from both files is examined exactly once
- Processing requires only a single pass through each file
- Memory usage is constant (only two records in memory at a time)
The sentinel technique eliminates special-case code for end-of-file. For numeric keys, a field of all nines plays the role of HIGH-VALUES:
01 WS-HIGH-KEY PIC 9(10) VALUE 9999999999.
2100-READ-OLD-MASTER.
READ OLD-MASTER-FILE
AT END
MOVE WS-HIGH-KEY TO WS-OLD-KEY
NOT AT END
MOVE OLD-ACCOUNT-NUM TO WS-OLD-KEY
END-READ.
When a file hits EOF, its key becomes a value guaranteed to be higher than any real key. The main comparison logic then naturally drains the remaining records from the other file without any special-case code.
13.8.5 OPEN EXTEND for Appending
OPEN EXTEND positions the file pointer after the last existing record, allowing new records to be appended without overwriting existing data. This is critical for audit trails, logs, and any file that accumulates data over time.
OPEN EXTEND AUDIT-FILE
WRITE AUDIT-RECORD
CLOSE AUDIT-FILE
Compare with OPEN OUTPUT, which erases all existing data:
| Operation | Existing Records | Write Position |
|---|---|---|
| OPEN OUTPUT | Destroyed | Beginning of file |
| OPEN EXTEND | Preserved | After last record |
| OPEN I-O | Preserved | As determined by READ/WRITE |
For sequential files, OPEN EXTEND always writes after the last record. For relative files with sequential access, new records are written after the highest existing relative record number.
OPEN EXTEND is not valid for files opened with RANDOM access mode. To append to a relative file using random access, simply set the RELATIVE KEY to the next available slot number and WRITE.
Example 05 (example-05-extend-mode.cob) demonstrates OPEN EXTEND for both a sequential audit trail and a relative file.
In JCL, use DISP=MOD for non-VSAM sequential files to match the OPEN EXTEND behavior:
//AUDFILE DD DSN=USERID.AUDIT.TRAIL,DISP=MOD
13.8.6 I-O Mode for In-Place Updates
Opening a file for I-O (OPEN I-O) allows reading, rewriting, and deleting records within the same open:
OPEN I-O RELATIVE-FILE
MOVE 1042 TO WS-RELATIVE-KEY
READ RELATIVE-FILE
MOVE 82000.00 TO EMP-SALARY
REWRITE EMP-RECORD
CLOSE RELATIVE-FILE
I-O mode is available for relative files and indexed files. It is the standard mode for transaction processing programs that need to update records in place rather than creating new output files.
Rules for I-O mode:
- REWRITE must be preceded by a successful READ of the same record
- The RELATIVE KEY must not change between READ and REWRITE
- DELETE can be used with or without a prior READ (random access)
- WRITE can add new records to empty slots (random access)
13.8.7 Multiple Record Types in One File (Using REDEFINES)
Some files contain different types of records identified by a record-type code. For example, a transaction file might contain header, detail, and trailer records. COBOL handles this with REDEFINES in the FD record description:
FD MULTI-TYPE-FILE ...
01 GENERIC-RECORD.
05 REC-TYPE PIC X(1).
05 REC-DATA PIC X(79).
01 HEADER-RECORD REDEFINES GENERIC-RECORD.
05 HR-TYPE PIC X(1).
88 IS-HEADER VALUE 'H'.
05 HR-FILE-DATE PIC 9(8).
05 HR-RECORD-COUNT PIC 9(7).
05 HR-DESCRIPTION PIC X(64).
01 DETAIL-RECORD REDEFINES GENERIC-RECORD.
05 DR-TYPE PIC X(1).
88 IS-DETAIL VALUE 'D'.
05 DR-ACCOUNT-NUM PIC 9(10).
05 DR-AMOUNT PIC S9(9)V99.
05 DR-DESCRIPTION PIC X(57).
01 TRAILER-RECORD REDEFINES GENERIC-RECORD.
05 TR-TYPE PIC X(1).
88 IS-TRAILER VALUE 'T'.
05 TR-RECORD-COUNT PIC 9(7).
05 TR-TOTAL-AMOUNT PIC S9(13)V99.
05 TR-HASH-TOTAL PIC 9(15).
05 FILLER PIC X(41).
After reading a record, examine the type code and use the appropriate REDEFINES view:
READ MULTI-TYPE-FILE
EVALUATE TRUE
WHEN IS-HEADER
PERFORM PROCESS-HEADER
WHEN IS-DETAIL
PERFORM PROCESS-DETAIL
WHEN IS-TRAILER
PERFORM VALIDATE-TRAILER
END-EVALUATE
13.8.8 Header and Trailer Records
Production files typically include header and trailer records for data integrity:
Header records contain:
- File creation date and time
- Expected record count
- File description or version identifier
- Processing period (e.g., "PAYROLL 2024-01-15")

Trailer records contain:
- Actual record count (for cross-checking)
- Control totals (sum of key financial fields)
- Hash totals (sum of key fields, used solely for verification)
The validation pattern:
* Read and validate header
READ MASTER-FILE
IF NOT IS-HEADER
DISPLAY 'ERROR: FIRST RECORD IS NOT A HEADER'
STOP RUN
END-IF
MOVE HR-RECORD-COUNT TO WS-EXPECTED-COUNT
* Process details, accumulating counts and totals
PERFORM UNTIL END-OF-FILE
READ MASTER-FILE
EVALUATE TRUE
WHEN IS-DETAIL
ADD 1 TO WS-ACTUAL-COUNT
ADD DR-AMOUNT TO WS-ACTUAL-TOTAL
WHEN IS-TRAILER
PERFORM VALIDATE-TRAILER
SET END-OF-FILE TO TRUE
END-EVALUATE
END-PERFORM
* Validate trailer
IF WS-ACTUAL-COUNT NOT = TR-RECORD-COUNT
DISPLAY 'COUNT MISMATCH: EXPECTED=' TR-RECORD-COUNT
' ACTUAL=' WS-ACTUAL-COUNT
END-IF
13.8.9 Checkpoint/Restart Patterns for Long-Running Batch Jobs
Batch jobs that process millions of records can run for hours. If a job fails at record 5,000,000 of 10,000,000, you do not want to reprocess the first 5,000,000 records. Checkpoint/restart solves this.
The pattern:
1. Every N records (e.g., 10,000), write a checkpoint record containing the current position, running totals, and other state
2. On restart, read the checkpoint to determine where to resume
3. Position the input file to the checkpoint position
4. Continue processing from that point, initializing counters from the checkpoint values
01 WS-CHECKPOINT-INTERVAL PIC 9(7) VALUE 10000.
01 WS-SINCE-CHECKPOINT PIC 9(7) VALUE ZERO.
2000-PROCESS-RECORD.
(normal processing)
ADD 1 TO WS-SINCE-CHECKPOINT
IF WS-SINCE-CHECKPOINT >= WS-CHECKPOINT-INTERVAL
PERFORM 2500-WRITE-CHECKPOINT
MOVE ZERO TO WS-SINCE-CHECKPOINT
END-IF.
2500-WRITE-CHECKPOINT.
OPEN OUTPUT CHECKPOINT-FILE
MOVE current-key TO CK-LAST-KEY
MOVE record-count TO CK-RECORDS-PROCESSED
MOVE running-total TO CK-RUNNING-TOTAL
MOVE 'P' TO CK-STATUS
WRITE CHECKPOINT-RECORD
CLOSE CHECKPOINT-FILE
DISPLAY 'CHECKPOINT AT KEY: ' CK-LAST-KEY.
On z/OS, you can also use the system CHKPT macro for automatic checkpointing, but the application-level approach gives you more control and is portable across platforms.
13.9 VSAM Dataset Types: RRDS vs. KSDS vs. ESDS
VSAM ESDS (Entry-Sequenced Data Set)
An ESDS stores records in the order they are written, with no key-based access. Records are identified by their RBA (Relative Byte Address) -- the byte offset from the beginning of the file. ESDS is the VSAM equivalent of a sequential file, but with the added capability of direct access by RBA.
ESDS characteristics:
- Records are added only at the end (no insertion in the middle)
- Records cannot be deleted (only logically marked as inactive)
- Records can be read sequentially or by RBA
- Variable-length records are supported
- Commonly used for log files and audit trails
IDCAMS definition:
DEFINE CLUSTER (
NAME(USERID.AUDIT.ESDS)
NONINDEXED
RECORDSIZE(100 200)
SHAREOPTIONS(2 3) )
Choosing the Right VSAM Organization
| Feature | ESDS | KSDS | RRDS |
|---|---|---|---|
| Record access | Sequential or by RBA | By key, sequentially, or both | By relative record number |
| Key type | None (RBA only) | Alphanumeric embedded key | Numeric slot number |
| Duplicate keys | N/A | Optional (with AIX) | N/A |
| Record deletion | Logical only | Physical (space reclaimed) | Physical (slot marked empty) |
| Alternate keys | No | Yes (via AIX) | No |
| Variable-length records | Yes | Yes | No (fixed only) |
| Insertion order | Append only | Key order | Any slot |
| Best for | Logs, audit trails | General-purpose keyed access | Direct numeric lookups |
13.10 Line Sequential Files (GnuCOBOL)
GnuCOBOL (and some other COBOL implementations outside the mainframe) supports an additional file organization: LINE SEQUENTIAL. This organization uses operating system text file conventions, with records delimited by newline characters rather than fixed-length slots.
SELECT TEXT-FILE
ASSIGN TO 'output.txt'
ORGANIZATION IS LINE SEQUENTIAL
FILE STATUS IS WS-STATUS.
Key differences from standard sequential files:
- Records are terminated by newline characters (LF on Unix, CR+LF on Windows)
- Trailing spaces are stripped on write; on read, short lines are padded with spaces to the record length
- No BLOCK CONTAINS clause (blocking is handled by the OS)
- Not available on z/OS (mainframe COBOL does not support this organization)

LINE SEQUENTIAL is useful for:
- Reading and writing CSV files and text reports
- Interfacing with other languages and tools that expect text files
- Development and testing on desktop systems
13.11 EBCDIC vs. ASCII Sort Order
When files are processed across platforms, the difference in character encoding can cause subtle bugs in sort order and file comparison.
EBCDIC (z/OS mainframe):
- Lowercase letters: a-z (hex 81-A9)
- Uppercase letters: A-Z (hex C1-E9)
- Digits: 0-9 (hex F0-F9)
- Sort order: spaces < lowercase < uppercase < digits

ASCII (Unix, Windows, GnuCOBOL):
- Digits: 0-9 (hex 30-39)
- Uppercase letters: A-Z (hex 41-5A)
- Lowercase letters: a-z (hex 61-7A)
- Sort order: spaces < digits < uppercase < lowercase
This means that the same data sorted on a mainframe and on a PC will be in different order. A file containing "SMITH", "smith", and "123" would sort as:
| EBCDIC order | ASCII order |
|---|---|
| smith | 123 |
| SMITH | SMITH |
| 123 | smith |
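The divergence is easy to reproduce. This Python sketch sorts the same three values under native ASCII ordering and under a simplified EBCDIC rank function (class ordering only, not a full code-page table):

```python
def ebcdic_rank(ch: str) -> tuple:
    """Simplified EBCDIC collation: space < lowercase < uppercase < digits."""
    if ch == " ":
        return (0, ch)
    if ch.islower():
        return (1, ch)
    if ch.isupper():
        return (2, ch)
    if ch.isdigit():
        return (3, ch)
    return (4, ch)

values = ["SMITH", "smith", "123"]
ascii_order = sorted(values)  # Python string order matches ASCII byte order here
ebcdic_order = sorted(values, key=lambda s: [ebcdic_rank(c) for c in s])
print(ascii_order)   # digits sort first under ASCII
print(ebcdic_order)  # lowercase sorts before uppercase under EBCDIC
```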
Impact on file processing:
- Files sorted on one platform may not be correctly sorted on another
- The balanced line algorithm assumes both files are sorted in the same order
- MERGE operations require consistent collating sequences
- When migrating files between platforms, re-sort after transfer
COBOL provides the PROGRAM COLLATING SEQUENCE IS clause and the ALPHABET clause in SPECIAL-NAMES to override the default collating sequence, but the safest approach is to sort files on the same platform where they will be processed.
13.12 JCL Techniques for File Processing
File Concatenation
JCL allows multiple datasets to be concatenated under a single DD name. The COBOL program sees them as one continuous file:
//INPUT DD DSN=USERID.DAILY.TRANS.MON,DISP=SHR
// DD DSN=USERID.DAILY.TRANS.TUE,DISP=SHR
// DD DSN=USERID.DAILY.TRANS.WED,DISP=SHR
// DD DSN=USERID.DAILY.TRANS.THU,DISP=SHR
// DD DSN=USERID.DAILY.TRANS.FRI,DISP=SHR
The program reads through all five files as if they were one: when each dataset reaches end-of-file, the system automatically continues with the next. Important rules:
- All datasets must have compatible DCB attributes (RECFM, LRECL)
- The first DD sets the block size; subsequent datasets should have equal or smaller block sizes
- Concatenation works only for sequential (non-VSAM) input files
Generation Data Groups (GDG)
GDGs provide automatic file versioning. Each time you create a new "generation," the previous generation is preserved with its version number:
//* Define the GDG base
//DEFGDG EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
DEFINE GDG (NAME(USERID.MASTER.FILE) LIMIT(7) SCRATCH)
/*
//* Create a new generation
//NEWGEN DD DSN=USERID.MASTER.FILE(+1),
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,SPACE=(CYL,(10,5)),
// DCB=(RECFM=FB,LRECL=100,BLKSIZE=0)
//* Read the current (most recent) generation
//CURGEN DD DSN=USERID.MASTER.FILE(0),DISP=SHR
//* Read the previous generation
//PREVGEN DD DSN=USERID.MASTER.FILE(-1),DISP=SHR
GDGs are heavily used for master file updates:
- Read current master: (0)
- Write new master: (+1)
- If the job fails, the current master (0) is still intact
- Previous versions can be used for recovery
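Behind the scenes, each relative reference resolves to an absolute dataset name of the form GnnnnV00 appended to the GDG base. The naming rule below matches standard z/OS cataloging, but the function itself is a hypothetical illustration:

```python
def resolve_generation(base, current_gen, relative):
    """Map a relative GDG reference to an absolute dataset name.

    base        -- GDG base name, e.g. 'USERID.MASTER.FILE'
    current_gen -- absolute generation number of the current (0) generation
    relative    -- 0 for current, +1 for a new generation, -1 for previous
    """
    absolute = current_gen + relative
    return f"{base}.G{absolute:04d}V00"

print(resolve_generation("USERID.MASTER.FILE", 17, 0))   # USERID.MASTER.FILE.G0017V00
print(resolve_generation("USERID.MASTER.FILE", 17, +1))  # USERID.MASTER.FILE.G0018V00
print(resolve_generation("USERID.MASTER.FILE", 17, -1))  # USERID.MASTER.FILE.G0016V00
```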
Temporary Datasets
Temporary datasets exist only for the duration of the job. They are identified by a double ampersand (&&) prefix:
//SORTOUT DD DSN=&&SORTED,DISP=(NEW,PASS),
// UNIT=SYSDA,SPACE=(CYL,(5,2)),
// DCB=(RECFM=FB,LRECL=100,BLKSIZE=0)
DISP=(NEW,PASS) creates the dataset and passes it to subsequent steps. DISP=(OLD,DELETE) in a later step uses and then deletes it.
Temporary datasets are ideal for intermediate results in multi-step jobs (e.g., sort output that feeds into a comparison program).
13.13 Error Handling and Recovery Patterns
Defensive File Processing
Production programs must handle every possible file error gracefully:
01 WS-FILE-STATUS PIC XX.
88 FS-OK VALUE '00'.
88 FS-EOF VALUE '10'.
88 FS-DUP-KEY VALUE '22'.
88 FS-NOT-FOUND VALUE '23'.
88 FS-BOUNDARY VALUE '24'.
88 FS-PERM-ERROR VALUE '30'.
9100-CHECK-FILE-STATUS.
EVALUATE TRUE
WHEN FS-OK
CONTINUE
WHEN FS-EOF
SET END-OF-FILE TO TRUE
WHEN FS-DUP-KEY
ADD 1 TO WS-DUP-KEY-CT
PERFORM 9200-LOG-ERROR
WHEN FS-NOT-FOUND
ADD 1 TO WS-NOT-FOUND-CT
PERFORM 9200-LOG-ERROR
WHEN FS-PERM-ERROR
DISPLAY 'PERMANENT I/O ERROR - STATUS: '
WS-FILE-STATUS
PERFORM 9900-ABNORMAL-END
WHEN OTHER
DISPLAY 'UNEXPECTED FILE STATUS: '
WS-FILE-STATUS
PERFORM 9900-ABNORMAL-END
END-EVALUATE.
Recovery Pattern: Count and Continue
For non-fatal errors (duplicate keys, records not found), the standard pattern is to:
1. Count the error
2. Log it to an error file
3. Continue processing
Set a threshold: if errors exceed N% of records processed, terminate the job. This prevents runaway processing with bad data:
IF WS-ERROR-CT > (WS-RECORDS-READ * 0.05)
DISPLAY 'ERROR RATE EXCEEDS 5% - ABORTING'
PERFORM 9900-ABNORMAL-END
END-IF
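The same count-and-continue logic, including the 5% abort threshold from the COBOL fragment above, can be sketched in Python (the record source and error predicate are placeholders):

```python
def process_with_threshold(records, is_bad, max_error_rate=0.05):
    """Count non-fatal errors, continue past them, and abort only when the
    error rate exceeds the threshold.

    records        -- iterable of input records
    is_bad         -- predicate marking a record as a non-fatal error
    max_error_rate -- abort once errors exceed this fraction of records read
    Returns (records_read, errors) on normal completion.
    """
    records_read = 0
    errors = 0
    for rec in records:
        records_read += 1
        if is_bad(rec):
            errors += 1          # count the error and keep going
        if errors > records_read * max_error_rate:
            raise RuntimeError(f"error rate exceeds {max_error_rate:.0%} - aborting")
    return records_read, errors
```

As in the COBOL version, an occasional bad record is tolerated, but systematically bad input stops the run early instead of producing a worthless output file.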
Recovery Pattern: Graceful Shutdown
When a fatal error occurs, close all files before terminating. Leaving VSAM files unclosed can cause data corruption:
9900-ABNORMAL-END.
DISPLAY 'ABNORMAL END - CLOSING ALL FILES'
CLOSE INPUT-FILE
OUTPUT-FILE
ERROR-FILE
RELATIVE-FILE
MOVE 16 TO RETURN-CODE
STOP RUN.
13.14 Performance Comparison: Sequential vs. Indexed vs. Relative
Understanding performance characteristics helps you design efficient batch systems.
Sequential File Performance
- Full-file scan: Fastest. Records are stored contiguously on disk. The operating system reads ahead (prefetch), so the next record is usually already in the buffer.
- Direct access: Not possible without reading from the beginning.
- Insert: Fast (append at end), but inserting in the middle requires rewriting the entire file.
Indexed File (KSDS) Performance
- Full-file scan: Moderate. Records are in key order within control intervals, but CIs may not be physically contiguous (CI/CA splits).
- Direct access: Typically 2-4 I/O operations (one read per index level, plus one for the data component). With buffering, frequently accessed index levels may be cached.
- Insert: Moderate. May trigger CI splits (move half the records to a new CI) or CA splits (allocate a new control area).
Relative File (RRDS) Performance
- Full-file scan: Moderate to slow. The file may contain many empty slots that must be skipped. If only 10% of slots are occupied, 90% of the I/O is wasted reading empty slots.
- Direct access: Fastest -- exactly 1 I/O operation. No index traversal.
- Insert: Fast if the slot is known. No index maintenance.
Practical Guidelines
- For batch reporting (read every record): Use sequential files. Sort first if needed.
- For online inquiry (random lookups): Use indexed files (most flexible) or relative files (fastest, if key is numeric).
- For master file updates (read master + transactions): Use indexed files in I-O mode, or sequential old-master/new-master pattern.
- For hash tables (O(1) lookup): Use relative files with a hash function.
- For audit trails (append-only): Use sequential files or ESDS.
- For hybrid access (some random, some sequential): Use dynamic access mode on indexed or relative files.
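The hash-table guideline above depends on mapping a numeric key to a slot number. A minimal sketch in Python: the division-remainder hash shown here is a common choice, and the slot count is an assumed value, not anything prescribed by COBOL.

```python
SLOT_COUNT = 997  # assumed file size; a prime slot count spreads keys more evenly

def key_to_slot(account_number):
    """Division-remainder hash: map a numeric key to a relative record number."""
    return (account_number % SLOT_COUNT) + 1  # relative keys start at 1, not 0

print(key_to_slot(123456))  # 826 -- one READ/WRITE at this slot, no index traversal
```

A real implementation must also handle collisions (two keys hashing to the same slot), typically by probing forward to the next free slot.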
13.15 Complete Examples Reference
This chapter includes six complete, working code examples with companion JCL:
| Example | Program | Description |
|---|---|---|
| 01 | example-01-create-relative.cob | Create a VSAM RRDS by loading employee records from a sequential file |
| 02 | example-02-random-access.cob | Random READ, WRITE, REWRITE, and DELETE on a relative file |
| 03 | example-03-dynamic-access.cob | Dynamic access: single lookups, range scans, and forward scans |
| 04 | example-04-multi-file.cob | Five files open simultaneously for payroll processing |
| 05 | example-05-extend-mode.cob | OPEN EXTEND for appending to sequential and relative files |
| 06 | example-06-file-comparison.cob | Balanced line algorithm for comparing two sorted files |
Each .cob file has a companion .jcl file that includes VSAM definition (where applicable), compile, link-edit, and execution steps.
Summary
Relative file processing gives COBOL programmers a powerful tool for situations where direct, constant-time access by numeric key is required. The combination of ORGANIZATION IS RELATIVE, the RELATIVE KEY clause, and the three access modes (SEQUENTIAL, RANDOM, DYNAMIC) provides all the operations needed for creating, reading, updating, and deleting records.
The advanced file techniques covered in this chapter -- multi-file processing, the balanced line algorithm, master file updates, header/trailer validation, OPEN EXTEND, checkpoint/restart, and multiple record types -- represent the core competencies expected of production COBOL programmers in enterprise environments.
When choosing among file organizations, consider the access patterns your program requires, the nature of your keys, and the trade-offs between speed, space, and flexibility. Sequential files are best for full-file batch processing. Indexed files (VSAM KSDS) are the general-purpose choice for keyed access. Relative files (VSAM RRDS) deliver the fastest direct access but require numeric keys and may waste space with sparse key distributions.
In the next chapter, we will explore the SORT and MERGE statements, which are closely related to the file matching and comparison techniques introduced here.