Chapter 14: Sort and Merge Operations -- Key Takeaways

Chapter Summary

Sorting and merging are among the most common operations in mainframe batch processing, and COBOL provides built-in language support for both through the SORT and MERGE statements. This chapter explored how the SORT statement reorders records from one or more input files according to specified key fields, producing a sorted output file. The simplest form uses the USING and GIVING phrases to name the input and output files directly, while the more powerful INPUT PROCEDURE and OUTPUT PROCEDURE give the programmer full control over which records enter the sort, how they are transformed, and what happens to them after sorting.
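Here is a minimal sketch of a complete program that uses both procedures. It assumes a compiler such as GnuCOBOL that can run SORT without JCL. The in-memory table, the salary cutoff of 200, and names such as 1000-LOAD and 2000-SHOW are hypothetical and do not come from the chapter's examples. The INPUT PROCEDURE filters records as it RELEASEs them. The OUTPUT PROCEDURE RETURNs them in department order, highest salary first.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTDEMO.
      * Sketch: SORT with INPUT and OUTPUT PROCEDUREs (hypothetical
      * data held in WORKING-STORAGE instead of an input file).
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The sort work file still needs a SELECT entry.
           SELECT SORT-FILE ASSIGN TO "SORTWORK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  SORT-DEPT         PIC X(03).
           05  SORT-SALARY       PIC 9(05).
       WORKING-STORAGE SECTION.
      * Each 8-byte entry is dept (3 bytes) followed by salary (5).
       01  WS-TABLE-DATA.
           05  FILLER            PIC X(08) VALUE "B0100300".
           05  FILLER            PIC X(08) VALUE "A0100500".
           05  FILLER            PIC X(08) VALUE "B0100700".
           05  FILLER            PIC X(08) VALUE "A0100100".
           05  FILLER            PIC X(08) VALUE "A0100900".
       01  WS-TABLE REDEFINES WS-TABLE-DATA.
           05  WS-ENTRY          PIC X(08) OCCURS 5 TIMES.
       01  WS-IDX                PIC 9(02).
       01  WS-SORT-EOF           PIC X VALUE "N".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
           SORT SORT-FILE
               ON ASCENDING KEY SORT-DEPT
               ON DESCENDING KEY SORT-SALARY
               INPUT PROCEDURE IS 1000-LOAD
               OUTPUT PROCEDURE IS 2000-SHOW
           STOP RUN.
      * INPUT PROCEDURE: filter, then RELEASE the survivors.
       1000-LOAD SECTION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 5
               MOVE WS-ENTRY (WS-IDX) TO SORT-RECORD
               IF SORT-SALARY >= 200
                   RELEASE SORT-RECORD
               END-IF
           END-PERFORM.
      * OUTPUT PROCEDURE: RETURN records in sorted order.
       2000-SHOW SECTION.
           PERFORM UNTIL WS-SORT-EOF = "Y"
               RETURN SORT-FILE
                   AT END
                       MOVE "Y" TO WS-SORT-EOF
                   NOT AT END
                       DISPLAY SORT-DEPT " " SORT-SALARY
               END-RETURN
           END-PERFORM.
```

Run as written, the program displays A01 00900, A01 00500, B01 00700, and B01 00300. The A01 record with salary 00100 never reaches the sort, because the INPUT PROCEDURE does not release it.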

The MERGE statement was introduced as a specialized operation for combining two or more files that are already sorted on the same key into a single sorted output. Unlike SORT, MERGE does not support an INPUT PROCEDURE because the input files must already be in order; however, it does support an OUTPUT PROCEDURE for post-merge processing. We examined how MERGE simplifies workflows that would otherwise require concatenating files and then sorting the combined result, saving both processing time and sort work space.
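A minimal sketch of MERGE with an OUTPUT PROCEDURE follows. It assumes a compiler such as GnuCOBOL that accepts literal file names in ASSIGN. On z/OS these would instead be DD names supplied by JCL. The file names and records are hypothetical. The program first writes two small files, each already in department and employee-ID order, and closes them. MERGE then opens the files and interleaves their records.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MERGEDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * Hypothetical file names; each file is already in key order.
           SELECT FILE-A ASSIGN TO "MERGEA.DAT".
           SELECT FILE-B ASSIGN TO "MERGEB.DAT".
           SELECT MERGE-FILE ASSIGN TO "MERGEWORK".
       DATA DIVISION.
       FILE SECTION.
       FD  FILE-A.
       01  REC-A                 PIC X(09).
       FD  FILE-B.
       01  REC-B                 PIC X(09).
       SD  MERGE-FILE.
       01  MERGE-RECORD.
           05  MERGE-DEPT        PIC X(03).
           05  MERGE-EMPID       PIC 9(06).
       WORKING-STORAGE SECTION.
       01  WS-MERGE-EOF          PIC X VALUE "N".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
      * Build two inputs, each sorted by dept, then employee id.
           OPEN OUTPUT FILE-A
           MOVE "A01000100" TO REC-A
           WRITE REC-A
           MOVE "B01000300" TO REC-A
           WRITE REC-A
           CLOSE FILE-A
           OPEN OUTPUT FILE-B
           MOVE "A01000200" TO REC-B
           WRITE REC-B
           MOVE "C01000400" TO REC-B
           WRITE REC-B
           CLOSE FILE-B
      * MERGE opens and closes FILE-A and FILE-B itself.
           MERGE MERGE-FILE
               ON ASCENDING KEY MERGE-DEPT
               ON ASCENDING KEY MERGE-EMPID
               USING FILE-A FILE-B
               OUTPUT PROCEDURE IS 2000-SHOW
           STOP RUN.
      * OUTPUT PROCEDURE: RETURN the interleaved records.
       2000-SHOW SECTION.
           PERFORM UNTIL WS-MERGE-EOF = "Y"
               RETURN MERGE-FILE
                   AT END
                       MOVE "Y" TO WS-MERGE-EOF
                   NOT AT END
                       DISPLAY MERGE-DEPT " " MERGE-EMPID
               END-RETURN
           END-PERFORM.
```

Neither input is re-sorted, so the output (A01 000100, A01 000200, B01 000300, C01 000400) is only correct because both files were written in key order.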

The chapter also covered the practical mainframe aspects of sort operations: the JCL SORTWK DD statements that allocate temporary work space for the sort utility, the integration between COBOL's SORT verb and external sort products such as IBM DFSORT and SyncSort (now sold by Precisely), and performance considerations including sort key placement, record length, and the number of sort work data sets. Understanding these operational details is essential for writing sort programs that perform well in production batch windows.

Key Concepts

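The SORT statement reorders records, taken from one or more input files (USING) or supplied by an INPUT PROCEDURE, on ascending or descending key fields. It then either writes the result to an output file (GIVING) or passes the records, one at a time, to an OUTPUT PROCEDURE.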
Sort keys are declared using the ASCENDING KEY and DESCENDING KEY phrases on the SORT statement, referencing fields defined in the SD (Sort Description) record.
Multiple sort keys establish a hierarchy: the first key is the major sort field, the second is the minor sort field within equal values of the first, and so on.
The SD entry in the FILE SECTION defines the sort work file and its record layout, much as an FD entry does for a regular file. Like any other file, the sort file also needs a SELECT entry in FILE-CONTROL.
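USING names one or more input files whose records are fed into the sort automatically; GIVING names the output file that receives the sorted records. The SORT statement opens and closes these files itself, so they must not already be open when it executes.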
INPUT PROCEDURE names a section or paragraph range that executes before sorting begins, allowing the program to filter, transform, or generate records using the RELEASE statement.
The RELEASE statement writes a record to the sort work file from within an INPUT PROCEDURE, functioning as the sort-equivalent of WRITE.
OUTPUT PROCEDURE names a section or paragraph range that executes after sorting completes, allowing the program to process sorted records one at a time using the RETURN statement.
The RETURN statement reads the next sorted record from the sort work file within an OUTPUT PROCEDURE, acting as the sort equivalent of READ. Its AT END phrase is triggered once no sorted records remain.
The MERGE statement combines two or more pre-sorted input files into a single sorted output, preserving the existing order and interleaving records by key value.
MERGE supports USING (required, with at least two files) and GIVING or OUTPUT PROCEDURE, but does not support INPUT PROCEDURE because inputs must already be sorted.
DFSORT and SyncSort are the two dominant external sort products on z/OS. COBOL does not sort the records itself. At run time, the SORT and MERGE statements hand the work to whichever of these products is installed.
JCL SORTWK01 through SORTWKnn DD statements allocate temporary disk space for the sort utility to use during the sorting process; insufficient SORTWK space causes sort failures.
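The WITH DUPLICATES IN ORDER phrase on the SORT statement preserves the original input order of records with equal sort keys. A sort-product setting such as DFSORT's EQUALS option can have the same effect.
This stability matters whenever the relative position of equal-key records must be predictable and repeatable. For example, records that arrived in time order and are then re-sorted by department should stay in time order within each department.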
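The effect of WITH DUPLICATES IN ORDER can be seen in a small sketch. As before, it assumes a GnuCOBOL-style compiler, and its names and data are hypothetical. Four records share two department codes. A sequence number, which is not a sort key, records the order in which each record was released.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STABLEDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO "SORTWORK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  SORT-DEPT         PIC X(03).
      * Arrival sequence: NOT a sort key, it only shows input order.
           05  SORT-SEQ          PIC 9(02).
       WORKING-STORAGE SECTION.
       01  WS-DEPTS              PIC X(12) VALUE "B01A01B01A01".
       01  WS-DEPT-TABLE REDEFINES WS-DEPTS.
           05  WS-DEPT           PIC X(03) OCCURS 4 TIMES.
       01  WS-IDX                PIC 9(02).
       01  WS-SORT-EOF           PIC X VALUE "N".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
      * Equal SORT-DEPT values keep the order they were released in.
           SORT SORT-FILE
               ON ASCENDING KEY SORT-DEPT
               WITH DUPLICATES IN ORDER
               INPUT PROCEDURE IS 1000-LOAD
               OUTPUT PROCEDURE IS 2000-SHOW
           STOP RUN.
       1000-LOAD SECTION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 4
               MOVE WS-DEPT (WS-IDX) TO SORT-DEPT
               MOVE WS-IDX TO SORT-SEQ
               RELEASE SORT-RECORD
           END-PERFORM.
       2000-SHOW SECTION.
           PERFORM UNTIL WS-SORT-EOF = "Y"
               RETURN SORT-FILE
                   AT END
                       MOVE "Y" TO WS-SORT-EOF
                   NOT AT END
                       DISPLAY SORT-DEPT " " SORT-SEQ
               END-RETURN
           END-PERFORM.
```

With the phrase in place, each department's records come back in release order: A01 02, A01 04, B01 01, B01 03.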

Common Pitfalls

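Forgetting to define SORTWK DD statements in JCL: Unless the sort product is set up to allocate work data sets dynamically (DFSORT's DYNALLOC option, for example), a SORT needs temporary work space allocated through SORTWK DD statements. Missing SORTWK DDs cause the sort to fail at run time. The failure appears as sort-product messages and a nonzero SORT-RETURN value, not as a COBOL file status error.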
Using INPUT PROCEDURE with MERGE: MERGE does not support INPUT PROCEDURE, and coding one causes a compilation error. If you need to filter or transform records before merging, do it first (for example, with a separate SORT and its own INPUT PROCEDURE for each input, or in a prior job step), then merge the results.
Releasing records outside of INPUT PROCEDURE: RELEASE may be executed only within the range of an INPUT PROCEDURE, which includes any paragraphs that procedure PERFORMs. A RELEASE executed anywhere else is an error. It is diagnosed at compile time where the compiler can detect it, and otherwise at run time.
Returning records outside of OUTPUT PROCEDURE: Similarly, RETURN may be executed only within the range of an OUTPUT PROCEDURE. Using it in mainline code outside that range is an error.
Mismatching sort key definitions with actual data: The key fields referenced in the SORT statement must be defined within the SD record description. If the key field positions do not align with the actual data layout, records will be sorted on garbage values, producing silently incorrect output.
Not handling the RETURN AT END condition: If the AT END phrase does not end the processing loop (typically by setting an end-of-data flag that the loop tests), the program keeps issuing RETURN after the sorted data is exhausted. The result is an abend or unpredictable behavior.
Sorting large files with insufficient SORTWK space: Production sort jobs that process millions of records require adequate temporary disk space. Under-allocating SORTWK DDs causes the sort to fail partway through processing, wasting all of the batch-window time already spent on the job.
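Assuming MERGE input files are sorted: MERGE trusts that every input file is already in the specified key order and does not re-sort anything. The COBOL standard leaves the result undefined when an input is out of order. Depending on the sort product, you may get silently misordered output or a termination with a sequence error, so verify the inputs' order when their source does not guarantee it.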
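Sort failures surface outside the normal file-status mechanism, so production programs commonly test the SORT-RETURN special register after SORT or MERGE. IBM documents that if a program never references SORT-RETURN, the run time checks the sort completion code itself and terminates the run unit on failure. Once the program references SORT-RETURN, checking it becomes the program's job. The sketch below assumes a GnuCOBOL-style compiler. The program name and data are hypothetical, and returning 16 to the caller is an illustrative convention.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTRCDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO "SORTWORK".
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  SORT-KEY          PIC 9(04).
       WORKING-STORAGE SECTION.
       01  WS-IDX                PIC 9(04).
       01  WS-COUNT              PIC 9(04) VALUE 0.
       01  WS-SORT-EOF           PIC X VALUE "N".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
           SORT SORT-FILE
               ON DESCENDING KEY SORT-KEY
               INPUT PROCEDURE IS 1000-LOAD
               OUTPUT PROCEDURE IS 2000-COUNT
      * A sort failure is reported here, not through a FILE STATUS.
           IF SORT-RETURN NOT = 0
               DISPLAY "SORT FAILED, SORT-RETURN = " SORT-RETURN
               MOVE 16 TO RETURN-CODE
           ELSE
               DISPLAY "SORTED RECORDS: " WS-COUNT
           END-IF
           STOP RUN.
       1000-LOAD SECTION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 10
               MOVE WS-IDX TO SORT-KEY
               RELEASE SORT-RECORD
           END-PERFORM.
      * OUTPUT PROCEDURE: count records, stopping cleanly at AT END.
       2000-COUNT SECTION.
           PERFORM UNTIL WS-SORT-EOF = "Y"
               RETURN SORT-FILE
                   AT END
                       MOVE "Y" TO WS-SORT-EOF
                   NOT AT END
                       ADD 1 TO WS-COUNT
               END-RETURN
           END-PERFORM.
```

On a successful run the program reports the ten records it sorted. On a failure it sets a nonzero return code, so a later JCL step can test the condition code instead of consuming incomplete output.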

Quick Reference

      * SD entry for sort work file
       SD  SORT-FILE.
       01  SORT-RECORD.
           05  SORT-DEPT         PIC X(03).
           05  SORT-EMPID        PIC 9(06).
           05  SORT-NAME         PIC X(30).
           05  SORT-SALARY       PIC 9(07)V99.

      * Simple SORT with USING and GIVING
           SORT SORT-FILE
               ON ASCENDING KEY SORT-DEPT
               ON DESCENDING KEY SORT-SALARY
               USING INPUT-FILE
               GIVING OUTPUT-FILE.

      * SORT with INPUT PROCEDURE and OUTPUT PROCEDURE
           SORT SORT-FILE
               ON ASCENDING KEY SORT-DEPT
               ON ASCENDING KEY SORT-EMPID
               INPUT PROCEDURE IS 1000-SELECT-RECORDS
               OUTPUT PROCEDURE IS 2000-WRITE-REPORT.

      * INPUT PROCEDURE -- filter and RELEASE
       1000-SELECT-RECORDS SECTION.
      * Reset the end-of-file flag before the read loop
           MOVE "N" TO WS-EOF
           OPEN INPUT RAW-FILE
           PERFORM UNTIL WS-EOF = "Y"
               READ RAW-FILE INTO WS-RAW-RECORD
                   AT END
                       MOVE "Y" TO WS-EOF
                   NOT AT END
                       IF WS-RAW-STATUS = "A"
                           MOVE WS-RAW-RECORD TO SORT-RECORD
                           RELEASE SORT-RECORD
                       END-IF
               END-READ
           END-PERFORM
           CLOSE RAW-FILE.

      * OUTPUT PROCEDURE -- RETURN and process
       2000-WRITE-REPORT SECTION.
      * Reset the end-of-sort flag before the RETURN loop
           MOVE "N" TO WS-SORT-EOF
           OPEN OUTPUT REPORT-FILE
           PERFORM UNTIL WS-SORT-EOF = "Y"
               RETURN SORT-FILE INTO WS-SORTED-RECORD
                   AT END
                       MOVE "Y" TO WS-SORT-EOF
                   NOT AT END
                       PERFORM 2100-FORMAT-LINE
                       WRITE REPORT-RECORD
               END-RETURN
           END-PERFORM
           CLOSE REPORT-FILE.

      * MERGE two pre-sorted files
           MERGE SORT-FILE
               ON ASCENDING KEY SORT-DEPT
               ON ASCENDING KEY SORT-EMPID
               USING SORTED-FILE-A
                     SORTED-FILE-B
               GIVING MERGED-FILE.

What's Next

Chapter 15 introduces the Report Writer module, a specialized COBOL feature designed to automate the production of formatted reports. You will learn how to define report layouts using the REPORT SECTION, RD entries, and report group definitions (report headings, page headings, detail lines, control headings, control footings, page footings, and report footings). The GENERATE, INITIATE, and TERMINATE statements replace the manual line-counting, page-break logic, and accumulator management that you would otherwise have to code by hand. Report Writer builds naturally on the sorted output you learned to produce in this chapter, since most reports require their input data to be sorted by control break fields.