Chapter 13: Relative File Processing and Advanced File Techniques

Introduction

In the preceding chapters, you learned to process sequential files (reading records one after another) and indexed files (accessing records by alphanumeric keys through an index structure). This chapter introduces the third file organization supported by COBOL: relative file organization, where each record occupies a numbered slot and can be accessed directly by its slot number.

Relative files offer a unique capability: when you know the record's position number, you can read or write it with a single I/O operation, with no index traversal required. This makes relative files the fastest possible access method when the record's position can be calculated from its key value.

Beyond relative files, this chapter covers a set of advanced file processing techniques that every production COBOL programmer must master. These include coordinating reads across multiple files simultaneously, the balanced line algorithm for file comparison and merging, master file update patterns, OPEN EXTEND for appending, header and trailer records for data integrity, and checkpoint/restart mechanisms for long-running batch jobs.

We also examine the z/OS VSAM landscape more broadly, comparing RRDS (Relative Record Data Set), KSDS (Key-Sequenced Data Set), and ESDS (Entry-Sequenced Data Set) to help you choose the right file organization for each situation.


13.1 Relative File Organization

What Is a Relative File?

A relative file is a collection of fixed-length records where each record is identified by its relative record number -- its ordinal position within the file. The first record is at position 1, the second at position 2, and so on. Unlike indexed files, there is no embedded key field in the record itself; the record's identity is entirely determined by its position.

Think of a relative file as an array on disk. Just as you access array element EMPLOYEE-TABLE(47) in working storage, you access relative record 47 in a relative file. The system calculates the physical disk location using simple arithmetic:

byte-offset = (record-number - 1) * record-length

This calculation takes constant time regardless of file size, giving relative files O(1) access characteristics.
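To make the arithmetic concrete, here is a small Python sketch (illustrative only; the 80-byte slot size matches the record size used in the examples later in this chapter):

```python
import io

RECORD_LENGTH = 80   # assumed fixed slot size

def read_relative(f, record_number, record_length=RECORD_LENGTH):
    """Fetch the slot for a 1-based record number: one seek, one read."""
    if record_number < 1:
        raise ValueError("the minimum relative record number is 1")
    f.seek((record_number - 1) * record_length)   # the byte-offset formula
    data = f.read(record_length)
    if len(data) < record_length:
        raise KeyError(f"record {record_number} is beyond end of file")
    return data

# A tiny three-slot "file" in memory; fetch slot 2 directly.
slots = b"".join(n.ljust(RECORD_LENGTH) for n in [b"ALICE", b"BOB", b"CAROL"])
relfile = io.BytesIO(slots)
print(read_relative(relfile, 2).rstrip())   # b'BOB'
```

The cost of `read_relative` does not depend on how many slots the file holds, which is the O(1) property described above.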

COBOL Syntax for Relative Files

To declare a relative file in COBOL, you specify three elements in the FILE-CONTROL paragraph:

       SELECT EMPLOYEE-FILE
           ASSIGN TO RELFILE
           ORGANIZATION IS RELATIVE
           ACCESS MODE IS RANDOM
           RELATIVE KEY IS WS-RELATIVE-KEY
           FILE STATUS IS WS-FILE-STATUS.

The key clauses are:

  • ORGANIZATION IS RELATIVE -- Declares this as a relative file (as opposed to SEQUENTIAL or INDEXED).
  • RELATIVE KEY IS data-name -- Identifies the working-storage field that holds the relative record number. This field must be an unsigned integer defined in WORKING-STORAGE (not in the file's record description). The system uses this field to determine which slot to read from or write to.
  • ACCESS MODE -- Can be SEQUENTIAL, RANDOM, or DYNAMIC, just as with indexed files.

The RELATIVE KEY field is the bridge between your program logic and the physical file. Before each READ or WRITE, you set this field to the desired record number. After a sequential READ, the system updates this field with the number of the record just read.

RELATIVE KEY: The Record Number as Key

The RELATIVE KEY must be defined in WORKING-STORAGE SECTION as an unsigned integer:

       WORKING-STORAGE SECTION.
       01  WS-RELATIVE-KEY    PIC 9(4).

The size should be large enough to hold the maximum record number you expect. For a file with up to 9,999 records, PIC 9(4) suffices. For larger files, use PIC 9(8) or larger.

Important rules about the RELATIVE KEY:

  1. It must NOT be defined within the file's FD record description
  2. It must be an unsigned integer (no sign, no decimal places)
  3. The minimum valid value is 1 (there is no record 0)
  4. You must set it before every random READ, WRITE, or DELETE
  5. After a sequential READ, it contains the number of the record just read

Access Modes for Relative Files

COBOL supports three access modes for relative files, each suited to different processing patterns.

Sequential Access (ACCESS MODE IS SEQUENTIAL)

Records are read or written in order of their relative record number, starting from record 1 (or from a position established by START). When reading sequentially, the system automatically advances to the next record, skipping empty slots. This is the only access mode allowed when opening a relative file with OPEN EXTEND.

       SELECT REL-FILE
           ASSIGN TO RELFILE
           ORGANIZATION IS RELATIVE
           ACCESS MODE IS SEQUENTIAL
           RELATIVE KEY IS WS-REL-KEY
           FILE STATUS IS WS-STATUS.

Random Access (ACCESS MODE IS RANDOM)

Each operation targets the specific record identified by the current value of the RELATIVE KEY. There is no concept of "current position" -- every operation is independent. This is the mode for direct lookups when you know the record number.

       SELECT REL-FILE
           ASSIGN TO RELFILE
           ORGANIZATION IS RELATIVE
           ACCESS MODE IS RANDOM
           RELATIVE KEY IS WS-REL-KEY
           FILE STATUS IS WS-STATUS.

Dynamic Access (ACCESS MODE IS DYNAMIC)

Combines random and sequential access within a single file opening. You can perform random reads using READ file-name and sequential reads using READ file-name NEXT RECORD. The START statement positions the file for subsequent sequential reads. This is ideal for programs that need both direct lookup and range scanning.

       SELECT REL-FILE
           ASSIGN TO RELFILE
           ORGANIZATION IS RELATIVE
           ACCESS MODE IS DYNAMIC
           RELATIVE KEY IS WS-REL-KEY
           FILE STATUS IS WS-STATUS.

13.2 When to Use Relative Files vs. Indexed Files vs. Sequential Files

Choosing the right file organization is a critical design decision. Here is a framework for making that choice:

Use Relative Files When:

  • The key is numeric and falls within a predictable, bounded range
  • You need the fastest possible direct access (O(1) per lookup)
  • The key can serve directly as the record number (e.g., employee numbers 1001-9999)
  • You can tolerate wasted space for gaps in the key range
  • You are implementing a hash table for constant-time lookups
  • Record numbers are assigned sequentially (no gaps expected)

Use Indexed Files (VSAM KSDS) When:

  • The key is alphanumeric (names, mixed codes)
  • The key space is sparse or unpredictable (e.g., Social Security numbers where only a tiny fraction of possible values exist)
  • You need alternate keys for multiple access paths
  • You require sequential processing in key order with no empty slots

Use Sequential Files When:

  • Records are always processed in order (beginning to end)
  • Batch processing reads every record (reports, end-of-day processing)
  • The file is used as input to the SORT utility
  • Simplicity and portability are paramount
  • No direct access by key is required

Performance Comparison:

Operation                      Sequential           Indexed (KSDS)                Relative (RRDS)
-----------------------------  -------------------  ----------------------------  -----------------------------
Sequential read (all records)  Fastest              Moderate                      Moderate
Direct read by key             Not possible         3-4 I/Os (index + data)       1 I/O
Insert (append)                Fast                 Moderate (index maintenance)  Fast (if slot known)
Update in place                Rewrite entire file  1-2 I/Os                      1 I/O
Delete                         Not supported        Mark + reclaim                Mark slot empty
Disk space efficiency          Best                 Good                          May waste space (empty slots)

The key trade-off with relative files is space versus speed. If your key values range from 1 to 1,000,000 but you only have 10,000 active records, the file wastes space for 990,000 empty slots. Indexed files handle sparse key spaces far more efficiently.
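One way to get direct access without allocating a slot for every possible key is the hash-table approach mentioned above: hash the key into the slot range and probe forward on collisions. A minimal Python sketch of that idea, assuming a made-up capacity of 1,009 slots:

```python
SLOT_COUNT = 1009   # assumed capacity; a prime number spreads keys evenly

def key_to_slot(key, occupied, slot_count=SLOT_COUNT):
    """Map a numeric key to a slot number in 1..slot_count.

    `occupied` maps slot number -> key for slots already in use; on a
    collision we probe linearly to the next slot, wrapping past the end.
    """
    slot = key % slot_count + 1                 # home slot, 1-based
    for _ in range(slot_count):
        if slot not in occupied or occupied[slot] == key:
            return slot
        slot = slot % slot_count + 1            # linear probe with wraparound
    raise RuntimeError("file is full")

occupied = {}
for key in (4040, 5049, 123456):                # 4040 and 5049 collide
    occupied[key_to_slot(key, occupied)] = key
print(occupied)   # {5: 4040, 6: 5049, 359: 123456}
```

Lookups must probe the same way as writes; deleting records under linear probing also requires tombstones, which this sketch omits.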


13.3 VSAM RRDS on z/OS

VSAM Relative Record Data Set

On IBM z/OS, relative files are implemented as VSAM RRDS (Relative Record Data Set). Each RRDS consists of fixed-length slots, each identified by a relative record number starting at 1. The VSAM access method manages the physical layout, including control intervals and control areas, just as it does for KSDS and ESDS.

Defining a VSAM RRDS requires IDCAMS (Access Method Services):

  DEFINE CLUSTER (
           NAME(USERID.EMPLOYEE.RRDS)
           NUMBERED
           RECORDS(9999)
           RECORDSIZE(80 80)
           SHAREOPTIONS(2 3)
           SPEED )
         DATA (
           NAME(USERID.EMPLOYEE.RRDS.DATA)
           CONTROLINTERVALSIZE(4096) )

Key parameters:

  • NUMBERED -- This keyword makes it an RRDS (as opposed to INDEXED for KSDS or NONINDEXED for ESDS)
  • RECORDS(9999) -- Allocates space for 9,999 record slots
  • RECORDSIZE(80 80) -- Fixed-length records of 80 bytes (average and maximum must be equal for RRDS)
  • CONTROLINTERVALSIZE -- The CI size affects how many records fit per physical I/O; 4096 is a common choice

Note that RRDS has no INDEX component -- only a DATA component. This is because there is no index; records are located by direct calculation.

Empty Slots in RRDS

When you define an RRDS with RECORDS(9999), all 9,999 slots initially exist but are marked as empty. VSAM tracks which slots are occupied using control information within each control interval.

When you DELETE a record from an RRDS, the slot is marked as empty but its space is not physically reclaimed. The slot can be reused by a subsequent WRITE to that same relative record number.

When you READ sequentially through an RRDS, the system automatically skips empty slots -- your program never sees them. However, if you attempt a random READ of an empty slot, you receive file status 23 (record not found).


13.4 Creating, Reading, Updating, and Deleting Relative Records

Creating a Relative File (WRITE)

To populate a relative file, open it for OUTPUT and write records. In random access mode, set the RELATIVE KEY before each WRITE:

       MOVE 1042 TO WS-RELATIVE-KEY
       MOVE employee-data TO RELATIVE-RECORD
       WRITE RELATIVE-RECORD
           INVALID KEY
               DISPLAY 'SLOT ALREADY OCCUPIED: ' WS-RELATIVE-KEY
           NOT INVALID KEY
               ADD 1 TO WS-RECORDS-WRITTEN
       END-WRITE

The INVALID KEY condition triggers if:

  • The slot is already occupied (file status 22)
  • The record number is outside the file's boundaries (file status 24)

See Example 01 (example-01-create-relative.cob) for a complete program that reads a sequential employee file and loads it into an RRDS using employee numbers as relative keys.

Reading a Relative Record (READ)

Random read -- set the key and read:

       MOVE 1042 TO WS-RELATIVE-KEY
       READ RELATIVE-FILE
           INVALID KEY
               DISPLAY 'NOT FOUND: ' WS-RELATIVE-KEY
           NOT INVALID KEY
               DISPLAY 'FOUND: ' EMP-NAME
       END-READ

Sequential read -- read the next occupied record:

       READ RELATIVE-FILE NEXT RECORD
           AT END
               SET END-OF-FILE TO TRUE
           NOT AT END
               DISPLAY 'RECORD ' WS-RELATIVE-KEY ': ' EMP-NAME
       END-READ

After a sequential read, WS-RELATIVE-KEY contains the record number of the record just read.

Updating a Record (REWRITE)

To update a relative record, you must first READ it, then REWRITE:

       MOVE 1042 TO WS-RELATIVE-KEY
       READ RELATIVE-FILE
           INVALID KEY
               DISPLAY 'NOT FOUND'
           NOT INVALID KEY
               MOVE 75000.00 TO EMP-SALARY
               REWRITE EMP-RECORD
                   INVALID KEY
                       DISPLAY 'REWRITE ERROR'
               END-REWRITE
       END-READ

The READ-before-REWRITE requirement exists because REWRITE replaces the record most recently read. Without a prior READ, the system does not know which record to replace.

Deleting a Record (DELETE)

With random access, set the key and delete:

       MOVE 1042 TO WS-RELATIVE-KEY
       DELETE RELATIVE-FILE
           INVALID KEY
               DISPLAY 'NOT FOUND: ' WS-RELATIVE-KEY
           NOT INVALID KEY
               DISPLAY 'DELETED RECORD ' WS-RELATIVE-KEY
       END-DELETE

DELETE marks the slot as empty. A subsequent random READ of that slot will return status 23. However, a new record can be written to that slot with WRITE.

See Example 02 (example-02-random-access.cob) for a complete program demonstrating all four operations driven by a transaction file.


13.5 Dynamic Access Mode with Relative Files

Dynamic access mode is the most powerful access mode because it lets you combine random lookups with sequential scanning in a single file opening. This is demonstrated in Example 03 (example-03-dynamic-access.cob).

The START Statement

The START statement positions the file pointer for subsequent sequential reads without actually reading a record:

       MOVE 2000 TO WS-RELATIVE-KEY
       START RELATIVE-FILE
           KEY IS NOT LESS THAN WS-RELATIVE-KEY
           INVALID KEY
               DISPLAY 'NO RECORDS AT OR ABOVE: ' WS-RELATIVE-KEY
           NOT INVALID KEY
               DISPLAY 'POSITIONED FOR SEQUENTIAL READ'
       END-START

Valid KEY conditions for START:

  • KEY IS EQUAL TO -- Position at the exact record
  • KEY IS GREATER THAN -- Position at the first record after the specified key
  • KEY IS NOT LESS THAN (or >=) -- Position at or after the specified key

After a successful START, use READ file-name NEXT RECORD to retrieve records sequentially from that position.

Practical Example: Range Query

A common pattern is to look up a range of records:

      * Position at the start of the range
       MOVE 2000 TO WS-RELATIVE-KEY
       START EMPLOYEE-FILE
           KEY IS NOT LESS THAN WS-RELATIVE-KEY
       END-START

      * Read sequentially until we pass the end of range
       PERFORM UNTIL WS-RELATIVE-KEY > 2999
           OR WS-FILE-STATUS = '10'
           READ EMPLOYEE-FILE NEXT RECORD
               AT END
                   CONTINUE
               NOT AT END
                   IF WS-RELATIVE-KEY NOT > 2999
                       PERFORM PROCESS-EMPLOYEE
                   END-IF
           END-READ
       END-PERFORM

Note the IF inside the READ: without it, the first record beyond 2999 would be processed before the loop condition is re-tested.

13.6 File Status Codes for Relative Files

File status codes are essential for robust error handling. Here are the status codes most relevant to relative file processing:

Status  Meaning                             When It Occurs
------  ----------------------------------  ----------------------------------------------------
00      Successful operation                Any operation completed normally
02      Duplicate key (non-fatal)           Not applicable to relative files
10      End of file                         Sequential READ reaches end
22      Duplicate key                       WRITE to an occupied slot
23      Record not found                    READ or DELETE of an empty/nonexistent slot
24      Boundary violation                  Key exceeds file size; disk full
30      Permanent I/O error                 Hardware or VSAM error
35      File not found                      OPEN fails because the file does not exist
37      File type conflict                  File opened with the wrong mode for its organization
39      File attribute conflict             Record length or organization mismatch
41      File already open                   OPEN on an already-open file
42      File not open                       Operation on a file that is not open
43      READ required before REWRITE/DELETE No prior READ (sequential access)
44      Record length error                 Record too large or too small
46      Read past end of file               Sequential READ attempted after AT END
47      Wrong open mode for READ            File not opened INPUT or I-O
48      Wrong open mode for WRITE           File not opened OUTPUT, I-O, or EXTEND
49      Wrong open mode for DELETE/REWRITE  File not opened I-O

Always check file status after every file operation. A robust pattern:

       READ RELATIVE-FILE
       EVALUATE WS-FILE-STATUS
           WHEN '00'
               PERFORM PROCESS-RECORD
           WHEN '10'
               SET END-OF-FILE TO TRUE
           WHEN '23'
               PERFORM RECORD-NOT-FOUND
           WHEN OTHER
               PERFORM UNEXPECTED-FILE-ERROR
       END-EVALUATE

13.7 Handling Empty Slots (Deleted Records)

Relative files inherently contain gaps. A file with capacity for 10,000 records might have only 6,000 occupied slots. This happens because:

  1. Sparse key values -- Employee numbers may not be contiguous (1001, 1003, 1007...)
  2. Deleted records -- Slots that once held data are now marked empty
  3. Intentional gaps -- Space reserved for future records

When reading sequentially, the system automatically skips empty slots. Your program receives only occupied records, and WS-RELATIVE-KEY tells you which slot each came from.

When reading randomly, an empty slot returns status 23. Your program must handle this gracefully:

       READ RELATIVE-FILE
           INVALID KEY
               IF WS-FILE-STATUS = '23'
                   DISPLAY 'SLOT ' WS-RELATIVE-KEY ' IS EMPTY'
               ELSE
                   DISPLAY 'ERROR: STATUS ' WS-FILE-STATUS
               END-IF
       END-READ

To scan for empty slots (useful for space utilization reports), attempt a random read of every slot number and check the file status after each attempt:

      * With dynamic access mode, read every slot including empty ones
      * by checking status after each attempted random read
       PERFORM VARYING WS-RELATIVE-KEY FROM 1 BY 1
           UNTIL WS-RELATIVE-KEY > 10000
           READ RELATIVE-FILE
               INVALID KEY
                   IF WS-FILE-STATUS = '23'
                       ADD 1 TO WS-EMPTY-COUNT
                   END-IF
               NOT INVALID KEY
                   ADD 1 TO WS-OCCUPIED-COUNT
           END-READ
       END-PERFORM

13.8 Advanced File Techniques

The remainder of this chapter covers file processing techniques that apply across all file organizations. These are the patterns that distinguish production-quality COBOL from student exercises.

13.8.1 Multiple File Processing

Real-world batch programs routinely have five, ten, or even twenty files open simultaneously. Example 04 (example-04-multi-file.cob) demonstrates processing with five files: three input (a relative employee master, a sequential department reference, and a sequential transaction file) and two output (a payroll report and an error file).

Key principles for multi-file programs:

Open all files in a controlled sequence. Open each file and check its status before proceeding. If any file fails to open, display a diagnostic message and terminate cleanly:

       OPEN INPUT EMPLOYEE-FILE
       IF WS-EMP-STATUS NOT = '00'
           DISPLAY 'CANNOT OPEN EMPLOYEE FILE: ' WS-EMP-STATUS
           STOP RUN
       END-IF

Close all files in the finalization paragraph. Even if processing ends early due to an error, close every file that was successfully opened. Unclosed VSAM files can be left in an inconsistent state.

Use separate file status fields. Each file must have its own status field. Using a shared status variable leads to bugs that are extremely hard to diagnose.

Load reference data into tables first. Small reference files (departments, codes, states) should be read into WORKING-STORAGE tables during initialization. This avoids repeated file I/O during the main processing loop.

13.8.2 File Matching and Merging Algorithms

File matching is the process of comparing records from two or more sorted files based on a common key. It is foundational to batch processing and forms the basis of master file update logic.

The balanced line algorithm (also called the "match-merge" algorithm) is the classic technique. It works on two sorted files simultaneously:

WHILE both files have records remaining:
    IF key-A < key-B:
        Process record from file A only (no match)
        Read next from file A
    ELSE IF key-A > key-B:
        Process record from file B only (no match)
        Read next from file B
    ELSE (keys equal):
        Process the matched pair
        Read next from both files

The critical implementation detail is end-of-file handling. When one file is exhausted, you must continue processing the remaining records from the other file. The standard technique is to use HIGH-VALUES as a sentinel:

       READ FILE-A
           AT END MOVE HIGH-VALUES TO KEY-A
       END-READ

When a file reaches EOF, its key becomes HIGH-VALUES (the highest possible value). Since no real key can exceed HIGH-VALUES, the algorithm naturally processes all remaining records from the other file.
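The same algorithm can be expressed as a short Python sketch, with an explicit sentinel playing the role of HIGH-VALUES (the file contents here are invented):

```python
SENTINEL = float("inf")   # plays the role of HIGH-VALUES

def next_key(it):
    """Read the next (key, record) pair, or the sentinel at end of file."""
    return next(it, (SENTINEL, None))

def balanced_line(file_a, file_b):
    """Single-pass match of two key-sorted streams of (key, record) pairs."""
    a, b = iter(file_a), iter(file_b)
    key_a, rec_a = next_key(a)
    key_b, rec_b = next_key(b)
    while key_a != SENTINEL or key_b != SENTINEL:
        if key_a < key_b:
            yield ("A-only", key_a)          # unmatched record from A
            key_a, rec_a = next_key(a)
        elif key_a > key_b:
            yield ("B-only", key_b)          # unmatched record from B
            key_b, rec_b = next_key(b)
        else:
            yield ("match", key_a)           # keys equal: matched pair
            key_a, rec_a = next_key(a)
            key_b, rec_b = next_key(b)

pairs = list(balanced_line([(1, "x"), (3, "y"), (5, "z")],
                           [(3, "p"), (4, "q")]))
# [('A-only', 1), ('match', 3), ('B-only', 4), ('A-only', 5)]
```

Because no real key compares higher than the sentinel, the loop drains whichever file still has records with no end-of-file special cases.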

Example 06 (example-06-file-comparison.cob) implements a complete file comparison using this algorithm, producing a difference report that identifies additions, deletions, and changes.

13.8.3 Master File Update Patterns

The master file update is the quintessential COBOL batch program. A master file (containing current state) is updated with a transaction file (containing changes). Transactions typically include:

  • Add -- Insert a new record
  • Change -- Modify fields of an existing record
  • Delete -- Remove a record (logically or physically)

The classic sequential master file update creates a new master by merging the old master with sorted transactions:

READ old-master
READ transaction
PERFORM UNTIL both files exhausted:
    IF old-key < tran-key:
        WRITE old-master-record TO new-master (unchanged)
        READ old-master
    ELSE IF old-key > tran-key:
        IF transaction is ADD
            WRITE transaction-record TO new-master
        ELSE
            ERROR: transaction for nonexistent record
        READ transaction
    ELSE (keys match):
        IF transaction is CHANGE
            Apply changes, WRITE updated record TO new-master
        ELSE IF transaction is DELETE
            Skip (do not write to new-master)
        ELSE IF transaction is ADD
            ERROR: duplicate record
        READ old-master
        READ transaction

For relative and indexed files, updates can be done in place using I-O mode (READ + REWRITE) rather than creating a new file. This is demonstrated in Example 02.

13.8.4 The Balanced Line Algorithm for File Comparison

The balanced line algorithm deserves special attention because it appears in so many production programs. The name comes from the idea of keeping the two files "in balance" -- always comparing records at the same logical position.

The algorithm's elegance is its simplicity: you only ever compare the current record from each file, and you advance whichever file had the smaller key (or both, if equal). This guarantees:

  • Every record from both files is examined exactly once
  • Processing requires only a single pass through each file
  • Memory usage is constant (only two records in memory at a time)

The HIGH-VALUES sentinel technique eliminates special-case code for end-of-file:

       01  WS-HIGH-KEY    PIC 9(10) VALUE 9999999999.

       2100-READ-OLD-MASTER.
           READ OLD-MASTER-FILE
               AT END
                   MOVE WS-HIGH-KEY TO WS-OLD-KEY
               NOT AT END
                   MOVE OLD-ACCOUNT-NUM TO WS-OLD-KEY
           END-READ.

When a file hits EOF, its key becomes a value guaranteed to be higher than any real key. The main comparison logic then naturally drains the remaining records from the other file without any special-case code.

13.8.5 OPEN EXTEND for Appending

OPEN EXTEND positions the file pointer after the last existing record, allowing new records to be appended without overwriting existing data. This is critical for audit trails, logs, and any file that accumulates data over time.

       OPEN EXTEND AUDIT-FILE
       WRITE AUDIT-RECORD
       CLOSE AUDIT-FILE

Compare with OPEN OUTPUT, which erases all existing data:

Operation    Existing Records  Write Position
-----------  ----------------  --------------------------
OPEN OUTPUT  Destroyed         Beginning of file
OPEN EXTEND  Preserved         After last record
OPEN I-O     Preserved         As determined by READ/WRITE

For sequential files, OPEN EXTEND always writes after the last record. For relative files with sequential access, new records are written after the highest existing relative record number.

OPEN EXTEND is not valid for files opened with RANDOM access mode. To append to a relative file using random access, simply set the RELATIVE KEY to the next available slot number and WRITE.

Example 05 (example-05-extend-mode.cob) demonstrates OPEN EXTEND for both a sequential audit trail and a relative file.

In JCL, use DISP=MOD for non-VSAM sequential files to match the OPEN EXTEND behavior:

//AUDFILE  DD DSN=USERID.AUDIT.TRAIL,DISP=MOD

13.8.6 I-O Mode for In-Place Updates

Opening a file for I-O (OPEN I-O) allows reading, rewriting, and deleting records within the same open:

       OPEN I-O RELATIVE-FILE

       MOVE 1042 TO WS-RELATIVE-KEY
       READ RELATIVE-FILE
       MOVE 82000.00 TO EMP-SALARY
       REWRITE EMP-RECORD

       CLOSE RELATIVE-FILE

I-O mode is available for relative files and indexed files. It is the standard mode for transaction processing programs that need to update records in place rather than creating new output files.

Rules for I-O mode:

  • REWRITE must be preceded by a successful READ of the same record
  • The RELATIVE KEY must not change between READ and REWRITE
  • DELETE can be used with or without a prior READ (random access)
  • WRITE can add new records to empty slots (random access)

13.8.7 Multiple Record Types in One File (Using REDEFINES)

Some files contain different types of records identified by a record-type code. For example, a transaction file might contain header, detail, and trailer records. COBOL handles this with REDEFINES in the FD record description:

       FD  MULTI-TYPE-FILE ...
       01  GENERIC-RECORD.
           05  REC-TYPE            PIC X(1).
           05  REC-DATA            PIC X(79).

       01  HEADER-RECORD REDEFINES GENERIC-RECORD.
           05  HR-TYPE             PIC X(1).
               88 IS-HEADER       VALUE 'H'.
           05  HR-FILE-DATE        PIC 9(8).
           05  HR-RECORD-COUNT     PIC 9(7).
           05  HR-DESCRIPTION      PIC X(64).

       01  DETAIL-RECORD REDEFINES GENERIC-RECORD.
           05  DR-TYPE             PIC X(1).
               88 IS-DETAIL       VALUE 'D'.
           05  DR-ACCOUNT-NUM      PIC 9(10).
           05  DR-AMOUNT           PIC S9(9)V99.
           05  DR-DESCRIPTION      PIC X(57).

       01  TRAILER-RECORD REDEFINES GENERIC-RECORD.
           05  TR-TYPE             PIC X(1).
               88 IS-TRAILER      VALUE 'T'.
           05  TR-RECORD-COUNT     PIC 9(7).
           05  TR-TOTAL-AMOUNT     PIC S9(13)V99.
           05  TR-HASH-TOTAL       PIC 9(15).
           05  FILLER              PIC X(41).

After reading a record, examine the type code and use the appropriate REDEFINES view:

       READ MULTI-TYPE-FILE
       EVALUATE TRUE
           WHEN IS-HEADER
               PERFORM PROCESS-HEADER
           WHEN IS-DETAIL
               PERFORM PROCESS-DETAIL
           WHEN IS-TRAILER
               PERFORM VALIDATE-TRAILER
       END-EVALUATE
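The REDEFINES technique amounts to re-slicing the same 80 bytes under different layouts. A Python sketch of the same dispatch (field widths follow the layouts above; the sample records are invented):

```python
def parse_record(rec):
    """Dispatch an 80-byte record on its leading type code, slicing the
    same characters under a per-type layout, as REDEFINES does."""
    rtype = rec[0]
    if rtype == "H":                       # header: date(8), count(7)
        return {"type": "H", "date": rec[1:9], "count": int(rec[9:16])}
    if rtype == "D":                       # detail: account(10), amount 9(9)V99
        return {"type": "D", "account": rec[1:11],
                "amount": int(rec[11:22]) / 100}
    if rtype == "T":                       # trailer: count(7), total 9(13)V99
        return {"type": "T", "count": int(rec[1:8]),
                "total": int(rec[8:23]) / 100}
    raise ValueError(f"unknown record type: {rtype!r}")

hdr = parse_record("H20240115" + "0000002".ljust(71))
det = parse_record("D" + "0000004242" + "00000012550".ljust(69))
# hdr["count"] == 2; det["amount"] == 125.50
```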

13.8.8 Header and Trailer Records

Production files typically include header and trailer records for data integrity:

Header records contain:

  • File creation date and time
  • Expected record count
  • File description or version identifier
  • Processing period (e.g., "PAYROLL 2024-01-15")

Trailer records contain:

  • Actual record count (for cross-checking)
  • Control totals (sum of key financial fields)
  • Hash totals (sum of key fields, used solely for verification)

The validation pattern:

      * Read and validate header
       READ MASTER-FILE
       IF NOT IS-HEADER
           DISPLAY 'ERROR: FIRST RECORD IS NOT A HEADER'
           STOP RUN
       END-IF
       MOVE HR-RECORD-COUNT TO WS-EXPECTED-COUNT

      * Process details, accumulating counts and totals
       PERFORM UNTIL END-OF-FILE
           READ MASTER-FILE
           EVALUATE TRUE
               WHEN IS-DETAIL
                   ADD 1 TO WS-ACTUAL-COUNT
                   ADD DR-AMOUNT TO WS-ACTUAL-TOTAL
               WHEN IS-TRAILER
                   PERFORM VALIDATE-TRAILER
                   SET END-OF-FILE TO TRUE
           END-EVALUATE
       END-PERFORM

      * Validate trailer
       IF WS-ACTUAL-COUNT NOT = TR-RECORD-COUNT
           DISPLAY 'COUNT MISMATCH: EXPECTED=' TR-RECORD-COUNT
               ' ACTUAL=' WS-ACTUAL-COUNT
       END-IF

13.8.9 Checkpoint/Restart Patterns for Long-Running Batch Jobs

Batch jobs that process millions of records can run for hours. If a job fails at record 5,000,000 of 10,000,000, you do not want to reprocess the first 5,000,000 records. Checkpoint/restart solves this.

The pattern:

  1. Every N records (e.g., 10,000), write a checkpoint record containing the current position, running totals, and other state
  2. On restart, read the checkpoint to determine where to resume
  3. Position the input file to the checkpoint position
  4. Continue processing from that point, initializing counters from the checkpoint values

       01  WS-CHECKPOINT-INTERVAL  PIC 9(7) VALUE 10000.
       01  WS-SINCE-CHECKPOINT     PIC 9(7) VALUE ZERO.

       2000-PROCESS-RECORD.
      *    (normal processing)
           ADD 1 TO WS-SINCE-CHECKPOINT
           IF WS-SINCE-CHECKPOINT >= WS-CHECKPOINT-INTERVAL
               PERFORM 2500-WRITE-CHECKPOINT
               MOVE ZERO TO WS-SINCE-CHECKPOINT
           END-IF.

       2500-WRITE-CHECKPOINT.
           OPEN OUTPUT CHECKPOINT-FILE
           MOVE current-key     TO CK-LAST-KEY
           MOVE record-count    TO CK-RECORDS-PROCESSED
           MOVE running-total   TO CK-RUNNING-TOTAL
           MOVE 'P'             TO CK-STATUS
           WRITE CHECKPOINT-RECORD
           CLOSE CHECKPOINT-FILE
           DISPLAY 'CHECKPOINT AT KEY: ' CK-LAST-KEY.

On z/OS, you can also use the system CHKPT macro for automatic checkpointing, but the application-level approach gives you more control and is portable across platforms.
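The restart half of the cycle (not shown in the COBOL above) reads the checkpoint back and skips the input already processed. A Python sketch of the full pattern, with invented file and field names and a deliberately tiny interval:

```python
import json
import os

CHECKPOINT = "job.ckpt"   # assumed checkpoint file name
INTERVAL = 3              # tiny interval so the sketch is easy to trace

def run(records):
    """Process (key, amount) pairs, checkpointing every INTERVAL records.

    `records` must be sliceable (e.g., a list) so a restart can jump
    straight to the first unprocessed record.
    """
    done, total = 0, 0.0
    if os.path.exists(CHECKPOINT):                  # restart: restore state
        with open(CHECKPOINT) as f:
            state = json.load(f)
        done, total = state["done"], state["total"]
    for key, amount in records[done:]:              # resume past processed input
        total += amount                             # (normal processing)
        done += 1
        if done % INTERVAL == 0:                    # time to checkpoint
            with open(CHECKPOINT, "w") as f:
                json.dump({"done": done, "total": total}, f)
    if os.path.exists(CHECKPOINT):                  # clean finish: no restart
        os.remove(CHECKPOINT)
    return done, total
```

If the job dies after the checkpoint at record 6 of 7, a rerun restores done=6 and processes only record 7 instead of starting over.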


13.9 VSAM Dataset Types: RRDS vs. KSDS vs. ESDS

VSAM ESDS (Entry-Sequenced Data Set)

An ESDS stores records in the order they are written, with no key-based access. Records are identified by their RBA (Relative Byte Address) -- the byte offset from the beginning of the file. ESDS is the VSAM equivalent of a sequential file, but with the added capability of direct access by RBA.

ESDS characteristics:

  • Records are added only at the end (no insertion in the middle)
  • Records cannot be deleted (only logically marked as inactive)
  • Records can be read sequentially or by RBA
  • Variable-length records are supported
  • Commonly used for log files and audit trails

IDCAMS definition:

  DEFINE CLUSTER (
           NAME(USERID.AUDIT.ESDS)
           NONINDEXED
           RECORDSIZE(100 200)
           SHAREOPTIONS(2 3) )

Choosing the Right VSAM Organization

Feature                  ESDS                  KSDS                           RRDS
-----------------------  --------------------  -----------------------------  ----------------------------
Record access            Sequential or by RBA  By key, sequentially, or both  By relative record number
Key type                 None (RBA only)       Alphanumeric embedded key      Numeric slot number
Duplicate keys           N/A                   Optional (with AIX)            N/A
Record deletion          Logical only          Physical (space reclaimed)     Physical (slot marked empty)
Alternate keys           No                    Yes (via AIX)                  No
Variable-length records  Yes                   Yes                            No (fixed only)
Insertion order          Append only           Key order                      Any slot
Best for                 Logs, audit trails    General-purpose keyed access   Direct numeric lookups

13.10 Line Sequential Files (GnuCOBOL)

GnuCOBOL (and some other COBOL implementations outside the mainframe) supports an additional file organization: LINE SEQUENTIAL. This organization uses operating system text file conventions, with records delimited by newline characters rather than fixed-length slots.

       SELECT TEXT-FILE
           ASSIGN TO 'output.txt'
           ORGANIZATION IS LINE SEQUENTIAL
           FILE STATUS IS WS-STATUS.

Key differences from standard sequential files:

  • Records are terminated by newline characters (LF on Unix, CR+LF on Windows)
  • Trailing spaces are stripped on write and not restored on read
  • No BLOCK CONTAINS clause (blocking is handled by the OS)
  • Not available on z/OS (mainframe COBOL does not support this organization)

LINE SEQUENTIAL is useful for:

  • Reading and writing CSV files and text reports
  • Interfacing with other languages and tools that expect text files
  • Development and testing on desktop systems


13.11 EBCDIC vs. ASCII Sort Order

When files are processed across platforms, the difference in character encoding can cause subtle bugs in sort order and file comparison.

EBCDIC (z/OS mainframe):

  • Lowercase letters: a-z (hex 81-A9)
  • Uppercase letters: A-Z (hex C1-E9)
  • Digits: 0-9 (hex F0-F9)
  • Sort order: spaces < lowercase < uppercase < digits

ASCII (Unix, Windows, GnuCOBOL):

  • Digits: 0-9 (hex 30-39)
  • Uppercase letters: A-Z (hex 41-5A)
  • Lowercase letters: a-z (hex 61-7A)
  • Sort order: spaces < digits < uppercase < lowercase

This means that the same data sorted on a mainframe and on a PC will be in different order. A file containing "SMITH", "smith", and "123" would sort as:

  EBCDIC order   ASCII order
  smith          123
  SMITH          SMITH
  123            smith
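The two orderings can be reproduced on any ASCII platform by sorting on the EBCDIC byte values of each string. Python's standard library includes a codec for IBM EBCDIC code page 037 (`cp037`), which makes this a one-line experiment:

```python
data = ["SMITH", "smith", "123"]

# Native sort on an ASCII platform: digits < uppercase < lowercase.
ascii_order = sorted(data)

# Sorting on the EBCDIC-encoded bytes reproduces mainframe order:
# lowercase < uppercase < digits. cp037 is IBM EBCDIC code page 037.
ebcdic_order = sorted(data, key=lambda s: s.encode("cp037"))

print(ascii_order)   # ['123', 'SMITH', 'smith']
print(ebcdic_order)  # ['smith', 'SMITH', '123']
```

This encode-and-sort trick is also a practical way to verify, off-mainframe, whether a transferred file is still in EBCDIC collating order.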

Impact on file processing:

  • Files sorted on one platform may not be correctly sorted on another
  • The balanced line algorithm assumes both files are sorted in the same order
  • MERGE operations require consistent collating sequences
  • When migrating files between platforms, re-sort after transfer

COBOL provides the PROGRAM COLLATING SEQUENCE IS clause and the ALPHABET clause in SPECIAL-NAMES to override the default collating sequence, but the safest approach is to sort files on the same platform where they will be processed.


13.12 JCL Techniques for File Processing

File Concatenation

JCL allows multiple datasets to be concatenated under a single DD name. The COBOL program sees them as one continuous file:

//INPUT    DD DSN=USERID.DAILY.TRANS.MON,DISP=SHR
//         DD DSN=USERID.DAILY.TRANS.TUE,DISP=SHR
//         DD DSN=USERID.DAILY.TRANS.WED,DISP=SHR
//         DD DSN=USERID.DAILY.TRANS.THU,DISP=SHR
//         DD DSN=USERID.DAILY.TRANS.FRI,DISP=SHR

The program reads through all five files as if they were one. At end-of-file on each dataset, the access method (QSAM -- these are ordinary sequential datasets, not VSAM clusters) automatically continues with the next one. Important rules:

  • All datasets must have compatible DCB attributes (RECFM, LRECL)
  • The first DD sets the block size; subsequent datasets should have equal or smaller block sizes
  • Concatenation works only for sequential input files
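The effect of concatenation -- one continuous record stream that switches files transparently at each end-of-file -- is the same behavior Python's standard `fileinput` module provides. A small sketch (file names and record contents are invented for illustration):

```python
import fileinput
import os
import tempfile

# Create three small "daily transaction" files (names are assumptions).
tmp = tempfile.mkdtemp()
paths = []
for day in ("MON", "TUE", "WED"):
    p = os.path.join(tmp, f"trans.{day.lower()}")
    with open(p, "w") as f:
        f.write(f"{day}-TXN-1\n{day}-TXN-2\n")
    paths.append(p)

# fileinput presents the concatenation as one continuous stream of records,
# moving to the next file automatically at each end-of-file.
with fileinput.input(files=paths) as stream:
    records = [line.rstrip("\n") for line in stream]

print(len(records))              # 6
print(records[0], records[-1])   # MON-TXN-1 WED-TXN-2
```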

Generation Data Groups (GDG)

GDGs provide automatic file versioning. Each time you create a new "generation," the previous generation is preserved with its version number:

//* Define the GDG base
//DEFGDG   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE GDG (NAME(USERID.MASTER.FILE) LIMIT(7) SCRATCH)
/*

//* Create a new generation
//NEWGEN   DD DSN=USERID.MASTER.FILE(+1),
//         DISP=(NEW,CATLG,DELETE),
//         UNIT=SYSDA,SPACE=(CYL,(10,5)),
//         DCB=(RECFM=FB,LRECL=100,BLKSIZE=0)

//* Read the current (most recent) generation
//CURGEN   DD DSN=USERID.MASTER.FILE(0),DISP=SHR

//* Read the previous generation
//PREVGEN  DD DSN=USERID.MASTER.FILE(-1),DISP=SHR

GDGs are heavily used for master file updates:

  • Read current master: (0)
  • Write new master: (+1)
  • If the job fails, the current master (0) is still intact
  • Previous versions can be used for recovery

Temporary Datasets

Temporary datasets exist only for the duration of the job. They are identified by a double ampersand (&&) prefix:

//SORTOUT  DD DSN=&&SORTED,DISP=(NEW,PASS),
//         UNIT=SYSDA,SPACE=(CYL,(5,2)),
//         DCB=(RECFM=FB,LRECL=100,BLKSIZE=0)

DISP=(NEW,PASS) creates the dataset and passes it to subsequent steps. DISP=(OLD,DELETE) in a later step uses and then deletes it.

Temporary datasets are ideal for intermediate results in multi-step jobs (e.g., sort output that feeds into a comparison program).


13.13 Error Handling and Recovery Patterns

Defensive File Processing

Production programs must handle every possible file error gracefully:

       01  WS-FILE-STATUS     PIC XX.
           88  FS-OK          VALUE '00'.
           88  FS-EOF         VALUE '10'.
           88  FS-DUP-KEY     VALUE '22'.
           88  FS-NOT-FOUND   VALUE '23'.
           88  FS-BOUNDARY    VALUE '24'.
           88  FS-PERM-ERROR  VALUE '30'.

       9100-CHECK-FILE-STATUS.
           EVALUATE TRUE
               WHEN FS-OK
                   CONTINUE
               WHEN FS-EOF
                   SET END-OF-FILE TO TRUE
               WHEN FS-DUP-KEY
                   ADD 1 TO WS-DUP-KEY-CT
                   PERFORM 9200-LOG-ERROR
               WHEN FS-NOT-FOUND
                   ADD 1 TO WS-NOT-FOUND-CT
                   PERFORM 9200-LOG-ERROR
               WHEN FS-PERM-ERROR
                   DISPLAY 'PERMANENT I/O ERROR - STATUS: '
                       WS-FILE-STATUS
                   PERFORM 9900-ABNORMAL-END
               WHEN OTHER
                   DISPLAY 'UNEXPECTED FILE STATUS: '
                       WS-FILE-STATUS
                   PERFORM 9900-ABNORMAL-END
           END-EVALUATE.

Recovery Pattern: Count and Continue

For non-fatal errors (duplicate keys, records not found), the standard pattern is to:

  1. Count the error
  2. Log it to an error file
  3. Continue processing

Set a threshold: if errors exceed N% of records processed, terminate the job. This prevents runaway processing with bad data:

       IF WS-ERROR-CT > (WS-RECORDS-READ * 0.05)
           DISPLAY 'ERROR RATE EXCEEDS 5% - ABORTING'
           PERFORM 9900-ABNORMAL-END
       END-IF

Recovery Pattern: Graceful Shutdown

When a fatal error occurs, close all files before terminating. A VSAM file left unclosed is flagged as improperly closed and may require an IDCAMS VERIFY before it can be opened again:

       9900-ABNORMAL-END.
           DISPLAY 'ABNORMAL END - CLOSING ALL FILES'
           CLOSE INPUT-FILE
                 OUTPUT-FILE
                 ERROR-FILE
                 RELATIVE-FILE
           MOVE 16 TO RETURN-CODE
           STOP RUN.

13.14 Performance Comparison: Sequential vs. Indexed vs. Relative

Understanding performance characteristics helps you design efficient batch systems.

Sequential File Performance

  • Full-file scan: Fastest. Records are stored contiguously on disk. The operating system reads ahead (prefetch), so the next record is usually already in the buffer.
  • Direct access: Not possible without reading from the beginning.
  • Insert: Fast (append at end), but inserting in the middle requires rewriting the entire file.

Indexed File (KSDS) Performance

  • Full-file scan: Moderate. Records are in key order within control intervals, but CIs may not be physically contiguous (CI/CA splits).
  • Direct access: 3-4 I/O operations (one read per index level -- index set and sequence set -- plus one for the data component). With buffering, frequently accessed index levels may be cached.
  • Insert: Moderate. May trigger CI splits (move half the records to a new CI) or CA splits (allocate a new control area).

Relative File (RRDS) Performance

  • Full-file scan: Moderate to slow. The file may contain many empty slots that must be skipped. If only 10% of slots are occupied, 90% of the I/O is wasted reading empty slots.
  • Direct access: Fastest -- exactly 1 I/O operation. No index traversal.
  • Insert: Fast if the slot is known. No index maintenance.

Practical Guidelines

  1. For batch reporting (read every record): Use sequential files. Sort first if needed.
  2. For online inquiry (random lookups): Use indexed files (most flexible) or relative files (fastest, if key is numeric).
  3. For master file updates (read master + transactions): Use indexed files in I-O mode, or sequential old-master/new-master pattern.
  4. For hash tables (O(1) lookup): Use relative files with a hash function.
  5. For audit trails (append-only): Use sequential files or ESDS.
  6. For hybrid access (some random, some sequential): Use dynamic access mode on indexed or relative files.
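Guideline 4 above (hash tables over relative files) amounts to mapping an arbitrary numeric key into a slot number in 1..N, then probing forward when two keys land on the same slot. A Python sketch of the scheme, with a dict standing in for the relative file (the slot count and key values are assumptions; a production hash function would be chosen to match the actual key distribution):

```python
SLOTS = 997  # prime slot count helps spread keys evenly (assumption)

def home_slot(key: int) -> int:
    """Map a numeric key to a relative record number in 1..SLOTS."""
    return key % SLOTS + 1

def store(table: dict, key: int) -> int:
    """Linear probing: if the home slot is occupied, try the next slot."""
    slot = home_slot(key)
    while slot in table:          # collision: slot already holds a record
        slot = slot % SLOTS + 1   # advance, wrapping after the last slot
    table[slot] = key
    return slot

table = {}
a = store(table, 123456)
b = store(table, 123456 + SLOTS)  # same home slot -> probes to the next one

print(a, b)
assert b == a + 1
```

In the COBOL version, `table` would be the RRDS itself: the program computes the slot, issues a random READ to test occupancy, and WRITEs (or probes on) accordingly -- each step a single I/O.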

13.15 Complete Examples Reference

This chapter includes six complete, working code examples with companion JCL:

  Example  Program                           Description
  01       example-01-create-relative.cob    Create a VSAM RRDS by loading employee records from a sequential file
  02       example-02-random-access.cob      Random READ, WRITE, REWRITE, and DELETE on a relative file
  03       example-03-dynamic-access.cob     Dynamic access: single lookups, range scans, and forward scans
  04       example-04-multi-file.cob         Five files open simultaneously for payroll processing
  05       example-05-extend-mode.cob        OPEN EXTEND for appending to sequential and relative files
  06       example-06-file-comparison.cob    Balanced line algorithm for comparing two sorted files

Each .cob file has a companion .jcl file that includes VSAM definition (where applicable), compile, link-edit, and execution steps.


Summary

Relative file processing gives COBOL programmers a powerful tool for situations where direct, constant-time access by numeric key is required. The combination of ORGANIZATION IS RELATIVE, the RELATIVE KEY clause, and the three access modes (SEQUENTIAL, RANDOM, DYNAMIC) provides all the operations needed for creating, reading, updating, and deleting records.

The advanced file techniques covered in this chapter -- multi-file processing, the balanced line algorithm, master file updates, header/trailer validation, OPEN EXTEND, checkpoint/restart, and multiple record types -- represent the core competencies expected of production COBOL programmers in enterprise environments.

When choosing among file organizations, consider the access patterns your program requires, the nature of your keys, and the trade-offs between speed, space, and flexibility. Sequential files are best for full-file batch processing. Indexed files (VSAM KSDS) are the general-purpose choice for keyed access. Relative files (VSAM RRDS) deliver the fastest direct access but require numeric keys and may waste space with sparse key distributions.

In the next chapter, we will explore the SORT and MERGE statements, which are closely related to the file matching and comparison techniques introduced here.