In This Chapter

Introduction: Why Sorting Matters on the Mainframe
14.1 The Sort Description (SD) Entry
14.2 The SORT Statement: Basic Syntax
14.3 Simple Sort: USING and GIVING
14.4 INPUT PROCEDURE: Controlling What Gets Sorted
14.5 OUTPUT PROCEDURE: Processing Sorted Records
14.6 Using Both INPUT and OUTPUT PROCEDURE
14.7 Multiple Sort Keys: Mixed Ascending and Descending
14.8 WITH DUPLICATES IN ORDER (Stable Sort)
14.9 SORT Special Registers
14.10 The MERGE Statement
14.11 Collating Sequence and Sort Order
14.12 JCL Considerations for Sort Programs
14.13 DFSORT Overview: IBM's Sort Utility
14.14 GnuCOBOL Sort Implementation
14.15 Performance Optimization
14.16 External vs. Internal Sort
14.17 Common Sort Patterns
14.18 Error Handling for Sort Operations
14.19 Complete Multi-Key Sort Example (Example 06)
14.20 Summary

Chapter 14: Sort and Merge Operations

Introduction: Why Sorting Matters on the Mainframe

Sorting is one of the most fundamental operations in batch data processing. Industry estimates suggest that approximately 25 percent of all mainframe CPU cycles are consumed by sort operations. Every bank that posts daily transactions, every insurance company that processes claims, every retailer that reconciles inventory -- they all depend on sorting as the essential precursor to sequential batch processing.

The reason is architectural. Mainframe batch processing follows a pattern established decades ago: data arrives in an unpredictable order from many sources (ATMs, teller terminals, web transactions, file transfers), and it must be reorganized into a meaningful sequence before it can be processed efficiently. Posting transactions to accounts requires them sorted by account number. Printing reports requires data sorted by the report's grouping hierarchy. Matching two files requires both to be in the same key order. Deduplication requires adjacent duplicates, which means sorted data.

COBOL provides two built-in statements for ordering data: SORT and MERGE. The SORT statement rearranges records from one or more input sources into a specified sequence. The MERGE statement combines two or more pre-sorted files into a single ordered file. Both statements integrate tightly with the operating system's sort utility (DFSORT on IBM z/OS, or SYNCSORT) to deliver high-performance sorting that can handle files containing billions of records.

This chapter covers the complete sort and merge facility in COBOL, from the simplest SORT...USING...GIVING form to advanced techniques using INPUT PROCEDURE and OUTPUT PROCEDURE sections that give you complete programmatic control over what enters and exits the sort process.

14.1 The Sort Description (SD) Entry

Before you can use the SORT statement, you must define a sort work file in the DATA DIVISION. This file serves as the intermediate workspace where the sort utility stores records during the sorting process. The sort work file is defined using the SD (Sort Description) level indicator, which parallels the FD (File Description) used for regular files.

Syntax

       SD  sort-file-name
           RECORD CONTAINS integer CHARACTERS.

       01  sort-record-name.
           05  sort-key-1        PIC X(10).
           05  sort-key-2        PIC 9(05).
           05  other-fields      PIC X(65).

Key Differences Between SD and FD

Feature	FD (File Description)	SD (Sort Description)
Level indicator	FD	SD
BLOCK CONTAINS	Yes	No (not applicable)
LABEL RECORDS	Yes	No (not applicable)
RECORDING MODE	Yes	No (not applicable)
RECORD CONTAINS	Yes	Yes
SELECT clause assigns to	DD name / file name	Sort work DD (SORTWK)
Opened/Closed by	Your program (OPEN/CLOSE)	The SORT/MERGE statement

The SD entry does not include BLOCK CONTAINS, LABEL RECORDS, or RECORDING MODE clauses because the sort work file is managed entirely by the sort utility. You never explicitly OPEN, READ, WRITE, or CLOSE a sort work file -- the SORT and MERGE statements handle all of that internally.

The SELECT Clause for Sort Files

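In the FILE-CONTROL paragraph, the sort work file needs its own SELECT entry. By convention the ASSIGN name echoes the SORTWK DD statements in the JCL, although IBM Enterprise COBOL treats the assignment name of a sort file as documentation only: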

       FILE-CONTROL.
           SELECT SORT-WORK-FILE
               ASSIGN TO SORTWK01.

The sort utility may use multiple work files (SORTWK01, SORTWK02, and so on; the maximum number depends on the sort product and release) depending on the volume of data. The COBOL program only references one logical sort file; the JCL provides the physical work datasets.

Record Layout Requirements

The record layout under the SD entry must include all fields that will be used as sort keys. The sort keys must be defined at the positions and with the data types that match the actual data. The sort utility uses these field definitions to determine how to compare records.

       SD  SORT-WORK-FILE
           RECORD CONTAINS 80 CHARACTERS.

       01  SORT-RECORD.
           05  SR-DEPARTMENT      PIC X(04).    *> Sort key 1
           05  SR-EMPLOYEE-ID     PIC X(06).    *> Sort key 2
           05  SR-EMPLOYEE-NAME   PIC X(30).
           05  SR-SALARY          PIC 9(07)V99.
           05  FILLER             PIC X(31).

14.2 The SORT Statement: Basic Syntax

The SORT statement is the workhorse of COBOL data ordering. Its general syntax is:

SORT sort-file-name
    ON {ASCENDING|DESCENDING} KEY data-name-1 [data-name-2 ...]
    [ON {ASCENDING|DESCENDING} KEY data-name-3 ...]
    [WITH DUPLICATES IN ORDER]
    [COLLATING SEQUENCE IS alphabet-name]
    {USING file-name-1 [file-name-2 ...] | INPUT PROCEDURE IS section-name}
    {GIVING file-name-3 [file-name-4 ...] | OUTPUT PROCEDURE IS section-name}

Let us break down each component.

ON ASCENDING/DESCENDING KEY

The KEY clause specifies which fields to sort on and in what direction:

ASCENDING KEY: Sorts from lowest to highest value (A before Z, 0 before 9)
DESCENDING KEY: Sorts from highest to lowest value (Z before A, 9 before 0)

Multiple keys create a hierarchy. The first key specified is the major (primary) key. Subsequent keys are minor (secondary, tertiary, etc.) keys. Records are first ordered by the major key; records with the same major key value are then ordered by the first minor key, and so on.

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-DEPARTMENT
               ON ASCENDING KEY SR-EMPLOYEE-ID

In this example, records are first sorted by department. Within each department, records are sorted by employee ID. Both are in ascending order.

You can mix ascending and descending directions:

           SORT SORT-WORK-FILE
               ON ASCENDING  KEY SR-DEPARTMENT
               ON DESCENDING KEY SR-SALARY
               ON ASCENDING  KEY SR-EMPLOYEE-NAME

This sorts by department (A to Z), then within each department by salary (highest first), then by name (A to Z) for employees with the same salary.

Data Name Requirements

Each key data-name must be defined within the SD record description. The sort utility uses the byte position, length, and data type of these fields to perform comparisons. Valid data types for sort keys include:

Alphanumeric (PIC X): Compared character by character using the collating sequence
Numeric display (PIC 9): Compared as numeric values
Packed decimal (PIC 9 COMP-3): Compared as numeric values
Binary (PIC 9 COMP or COMP-4): Compared as numeric values

14.3 Simple Sort: USING and GIVING

The simplest form of the SORT statement uses USING to specify the input file and GIVING to specify the output file. The sort utility handles all file I/O automatically.

USING Clause

The USING clause names one or more input files whose records will be sorted. When the SORT statement executes, it automatically opens each USING file, reads all records, and closes the file. You must not have the USING file open when the SORT executes.

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-DEPARTMENT
               ON ASCENDING KEY SR-EMPLOYEE-ID
               USING UNSORTED-FILE
               GIVING SORTED-FILE

You can specify multiple USING files. All records from all files are combined and sorted together:

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-ACCOUNT-NUMBER
               USING TRANS-FILE-1
                     TRANS-FILE-2
                     TRANS-FILE-3
               GIVING COMBINED-SORTED-FILE

GIVING Clause

The GIVING clause names the output file where sorted records are written. The sort utility automatically opens the GIVING file, writes all sorted records, and closes the file.

Complete Example (Example 01)

The following program sorts an employee file by department and employee ID. This is the simplest possible sort program. See code/example-01-basic-sort.cob for the full listing.

       PROCEDURE DIVISION.
       0000-MAIN-PROGRAM.
           SORT SORT-WORK-FILE
               ON ASCENDING KEY WK-DEPARTMENT
               ON ASCENDING KEY WK-EMPLOYEE-ID
               USING  UNSORTED-FILE
               GIVING SORTED-FILE

           IF SORT-RETURN = ZERO
               DISPLAY 'SORT SUCCESSFUL'
           ELSE
               DISPLAY 'SORT FAILED'
           END-IF
           STOP RUN.

The entire sort operation -- opening the input file, reading all records, sorting them, writing to the output file, and closing both files -- is performed by that single SORT statement. The SORT-RETURN special register contains the return code: zero for success, non-zero for failure.
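
The PROCEDURE DIVISION above relies on file definitions that the excerpt does not show. The following is a minimal sketch of what they might look like, not the contents of the actual listing. The DD names EMPUNSRT and EMPSORTD are borrowed from the GnuCOBOL commands in Section 14.14; the FD record names, the work record name, and the 80-byte layout are assumptions.

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT UNSORTED-FILE  ASSIGN TO EMPUNSRT.
           SELECT SORTED-FILE    ASSIGN TO EMPSORTD.
           SELECT SORT-WORK-FILE ASSIGN TO SORTWK01.

       DATA DIVISION.
       FILE SECTION.
      *    Input and output files use ordinary FD entries
       FD  UNSORTED-FILE
           RECORD CONTAINS 80 CHARACTERS.
       01  UNSORTED-RECORD        PIC X(80).

       FD  SORTED-FILE
           RECORD CONTAINS 80 CHARACTERS.
       01  SORTED-RECORD          PIC X(80).

      *    The sort work file uses SD, and its record names the keys
       SD  SORT-WORK-FILE
           RECORD CONTAINS 80 CHARACTERS.
       01  WORK-RECORD.
           05  WK-DEPARTMENT      PIC X(04).
           05  WK-EMPLOYEE-ID     PIC X(06).
           05  FILLER             PIC X(70).

Only the SD record needs the key fields broken out. With USING and GIVING, the input and output records are moved as whole records, so a single PIC X(80) item is enough for them in this sketch.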

JCL Requirements

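The JCL for a COBOL sort program typically includes:

DD statements for the input and output files (matching the ASSIGN TO names)
SORTWK DD statements for sort work space (temporary datasets), unless the installation lets the sort utility allocate work space dynamically
SORTLIB DD pointing to the sort utility library, where the installation requires it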

//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(10,5))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(10,5))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(10,5))

The general rule for SORTWK sizing is to allocate two to three times the input file size, distributed across the SORTWK DD statements. Three SORTWK DDs is a common starting point; very large sorts may benefit from six or more.

14.4 INPUT PROCEDURE: Controlling What Gets Sorted

The USING clause feeds every record from the input file into the sort. But what if you need to filter records, transform data, or combine records from multiple sources with different layouts before sorting? That is where the INPUT PROCEDURE comes in.

An INPUT PROCEDURE is a SECTION in the PROCEDURE DIVISION that executes before the sort begins. Within this section, you read input files yourself and use the RELEASE statement to feed selected records to the sort.

The RELEASE Statement

The RELEASE statement sends a single record to the sort work file. It is the INPUT PROCEDURE equivalent of WRITE for a regular file.

           RELEASE sort-record-name
           RELEASE sort-record-name FROM identifier

The first form releases the current contents of the sort record (defined under the SD entry). The second form copies data from identifier into the sort record and then releases it -- similar to WRITE record FROM identifier.

Structure of an INPUT PROCEDURE

An INPUT PROCEDURE is conventionally coded as a complete SECTION (COBOL 85 and later compilers also accept a paragraph name). It must explicitly OPEN, READ, and CLOSE all input files. Only records that are RELEASEd enter the sort.

       SORT SORT-WORK-FILE
           ON ASCENDING KEY SR-DEPARTMENT
           ON ASCENDING KEY SR-SALARY
           INPUT PROCEDURE IS 1000-FILTER-AND-RELEASE
           GIVING SORTED-OUTPUT-FILE

The section referenced by INPUT PROCEDURE:

       1000-FILTER-AND-RELEASE SECTION.
           OPEN INPUT INPUT-FILE
           PERFORM UNTIL END-OF-FILE
               READ INPUT-FILE
                   AT END SET END-OF-FILE TO TRUE
                   NOT AT END
                       IF RECORD-IS-VALID
                           RELEASE SORT-RECORD FROM INPUT-RECORD
                       END-IF
               END-READ
           END-PERFORM
           CLOSE INPUT-FILE.
       1000-EXIT.
           EXIT.

Filtering Records

The most common use of INPUT PROCEDURE is filtering. By choosing which records to RELEASE, you control what enters the sort:

      *    Only RELEASE active employees with salary >= threshold
           IF IR-STATUS = 'A'
           AND IR-SALARY >= WS-SALARY-THRESHOLD
               RELEASE SORT-RECORD FROM INPUT-RECORD
               ADD 1 TO WS-RECORDS-RELEASED
           ELSE
               ADD 1 TO WS-RECORDS-FILTERED
           END-IF

Transforming Records

You can also modify records before RELEASEing them:

      *    Convert name to uppercase before sorting
           INSPECT IR-EMPLOYEE-NAME CONVERTING
               'abcdefghijklmnopqrstuvwxyz'
               TO 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

      *    Calculate a derived field
           COMPUTE IR-ANNUAL-BONUS = IR-SALARY * 0.10

           RELEASE SORT-RECORD FROM INPUT-RECORD

Complete Example (Example 02)

See code/example-02-sort-input-proc.cob for a complete program that filters employees by status and salary threshold, converts names to uppercase, and sorts the qualifying records by department and salary. The companion JCL is in code/example-02-sort-input-proc.jcl.

Rules for INPUT PROCEDURE

The INPUT PROCEDURE is conventionally a SECTION (older standards required one; COBOL 85 and later also accept a paragraph)
The input files must be explicitly OPENed and CLOSEd within the section
You must not execute a SORT or MERGE statement within an INPUT PROCEDURE
The RELEASE statement can only be used within an INPUT PROCEDURE
Control must not pass to any code outside the INPUT PROCEDURE section (except via PERFORM)
When the INPUT PROCEDURE section ends, the sort begins processing the released records

14.5 OUTPUT PROCEDURE: Processing Sorted Records

While GIVING writes every sorted record directly to an output file, an OUTPUT PROCEDURE gives you control over how sorted records are processed after the sort completes. The OUTPUT PROCEDURE retrieves sorted records one at a time using the RETURN statement.

The RETURN Statement

The RETURN statement retrieves the next sorted record from the sort work file. It is the OUTPUT PROCEDURE equivalent of READ for a regular file.

           RETURN sort-file-name
               AT END imperative-statement
               NOT AT END imperative-statement
           END-RETURN

           RETURN sort-file-name INTO identifier
               AT END imperative-statement
               NOT AT END imperative-statement
           END-RETURN

The AT END condition is triggered when all sorted records have been returned. The INTO clause copies the sort record into a working-storage area, analogous to READ file INTO identifier.

Structure of an OUTPUT PROCEDURE

An OUTPUT PROCEDURE is conventionally coded as a complete SECTION (COBOL 85 and later compilers also accept a paragraph name). It should RETURN all sorted records (or at least continue RETURNing until AT END is triggered). Output files must be explicitly OPENed, written, and CLOSEd.

       SORT SORT-WORK-FILE
           ON ASCENDING KEY SR-DEPARTMENT
           ON ASCENDING KEY SR-EMPLOYEE-NAME
           USING INPUT-FILE
           OUTPUT PROCEDURE IS 2000-PROCESS-SORTED-RECORDS

Common OUTPUT PROCEDURE Patterns

Control Break Processing

The most powerful use of OUTPUT PROCEDURE is control break reporting on sorted data:

       2000-PROCESS-SORTED-RECORDS SECTION.
           OPEN OUTPUT REPORT-FILE
           PERFORM 2100-RETURN-AND-PROCESS
               UNTIL END-OF-SORT
           IF RECORDS-PROCESSED > ZERO
               PERFORM PRINT-DEPT-TOTAL
               PERFORM PRINT-GRAND-TOTAL
           END-IF
           CLOSE REPORT-FILE.
       2000-EXIT.
           EXIT.

       2100-RETURN-AND-PROCESS SECTION.
           RETURN SORT-WORK-FILE INTO WS-RECORD
               AT END
                   SET END-OF-SORT TO TRUE
               NOT AT END
                   IF DEPT-CHANGED
                       PERFORM PRINT-DEPT-TOTAL
                       INITIALIZE DEPT-ACCUMULATORS
                   END-IF
                   ADD SALARY TO DEPT-TOTAL
                   PERFORM PRINT-DETAIL
           END-RETURN.

Deduplication

After sorting, duplicate records are adjacent, making deduplication straightforward:

      *    If this record matches the previous on key fields,
      *    it is a duplicate -- skip it
           IF WR-ACCOUNT = WS-PREV-ACCOUNT
           AND WR-DATE   = WS-PREV-DATE
           AND WR-AMOUNT = WS-PREV-AMOUNT
               ADD 1 TO WS-DUPLICATE-COUNT
           ELSE
               WRITE OUTPUT-RECORD FROM WS-RECORD
               ADD 1 TO WS-WRITTEN-COUNT
               MOVE WR-ACCOUNT TO WS-PREV-ACCOUNT
               MOVE WR-DATE    TO WS-PREV-DATE
               MOVE WR-AMOUNT  TO WS-PREV-AMOUNT
           END-IF

Complete Example (Example 03)

See code/example-03-sort-output-proc.cob for a complete program that sorts employees by department and name, then uses an OUTPUT PROCEDURE to generate a department salary summary report with control break totals. The companion JCL is in code/example-03-sort-output-proc.jcl.

Rules for OUTPUT PROCEDURE

The OUTPUT PROCEDURE is conventionally a SECTION (older standards required one; COBOL 85 and later also accept a paragraph)
Output files must be explicitly OPENed and CLOSEd within the section
You must not execute a SORT or MERGE statement within an OUTPUT PROCEDURE
The RETURN statement can only be used within an OUTPUT PROCEDURE
Control must not pass outside the OUTPUT PROCEDURE section (except via PERFORM)
You should RETURN records until AT END is reached, even if you stop writing output early

14.6 Using Both INPUT and OUTPUT PROCEDURE

For maximum control, you can use both an INPUT PROCEDURE and an OUTPUT PROCEDURE in the same SORT statement. This gives you a complete pipeline:

INPUT PROCEDURE: Read, validate, filter, transform, and RELEASE records
SORT: The sort utility orders the released records
OUTPUT PROCEDURE: RETURN sorted records, deduplicate, generate reports

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-ACCOUNT-NUMBER
               ON ASCENDING KEY SR-TRANS-DATE
               INPUT PROCEDURE IS 1000-VALIDATE-INPUT
               OUTPUT PROCEDURE IS 2000-DEDUP-OUTPUT

Complete Example (Example 04)

See code/example-04-sort-both-proc.cob for a program that demonstrates this pattern in a banking scenario. The INPUT PROCEDURE validates transactions (checking account numbers, transaction types, amounts, and dates), rejecting invalid records to an error report. The OUTPUT PROCEDURE detects and removes duplicate transactions and writes the clean, sorted output. The companion JCL is in code/example-04-sort-both-proc.jcl.

Execution Flow

The execution order when both procedures are specified:

The SORT statement begins
The INPUT PROCEDURE section executes completely (all RELEASEs happen here)
The sort utility sorts all released records
The OUTPUT PROCEDURE section executes completely (all RETURNs happen here)
Control returns to the statement following the SORT

This means the INPUT PROCEDURE runs to completion before any sorting occurs, and the sort completes before the OUTPUT PROCEDURE begins. The two procedures never execute concurrently.
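
The numbered flow above can be traced with a small self-contained program. This is a sketch, not one of the chapter's numbered examples: the program name SORTFLOW, the three-byte key, and the hard-coded test values are all assumptions made for illustration. The DISPLAY messages are numbered to match the steps; step 3 (the sort itself) happens inside the runtime, so it has no message of its own.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTFLOW.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-WORK-FILE ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-WORK-FILE.
       01  SORT-RECORD.
           05  SR-KEY             PIC X(03).
       WORKING-STORAGE SECTION.
       01  WS-SORT-FLAG           PIC X VALUE 'N'.
           88  END-OF-SORT        VALUE 'Y'.
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
       0000-START.
           DISPLAY '1 SORT STATEMENT BEGINS'
           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-KEY
               INPUT PROCEDURE IS 1000-RELEASE-RECORDS
               OUTPUT PROCEDURE IS 2000-RETURN-RECORDS
           DISPLAY '5 CONTROL IS BACK AFTER THE SORT'
           STOP RUN.

       1000-RELEASE-RECORDS SECTION.
       1000-START.
      *    Records are released deliberately out of order
           DISPLAY '2 INPUT PROCEDURE RUNNING'
           MOVE 'CCC' TO SR-KEY
           RELEASE SORT-RECORD
           MOVE 'AAA' TO SR-KEY
           RELEASE SORT-RECORD
           MOVE 'BBB' TO SR-KEY
           RELEASE SORT-RECORD.

       2000-RETURN-RECORDS SECTION.
       2000-START.
      *    RETURN hands the records back in key order until AT END
           DISPLAY '4 OUTPUT PROCEDURE RUNNING'
           PERFORM UNTIL END-OF-SORT
               RETURN SORT-WORK-FILE
                   AT END
                       SET END-OF-SORT TO TRUE
                   NOT AT END
                       DISPLAY '  RETURNED ' SR-KEY
               END-RETURN
           END-PERFORM.

Because no input or output files are involved, the program needs no FD entries; everything that enters the sort comes from RELEASE and everything that leaves it comes from RETURN.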

14.7 Multiple Sort Keys: Mixed Ascending and Descending

Real-world sorting often requires multiple keys with different sort directions. Consider a shipping manifest that needs orders prioritized by urgency (highest priority first), then by ship date (earliest first), then by customer name (alphabetical).

           SORT SORT-WORK-FILE
               ON DESCENDING KEY SR-PRIORITY-CODE
               ON ASCENDING  KEY SR-SHIP-DATE
               ON ASCENDING  KEY SR-CUSTOMER-NAME

How Multiple Keys Work

The sort evaluates keys from left to right (major to minor):

All records are first ordered by PRIORITY-CODE in descending order (9 before 1)
Records with the same PRIORITY-CODE are then ordered by SHIP-DATE in ascending order (20240101 before 20240115)
Records with the same PRIORITY-CODE and SHIP-DATE are then ordered by CUSTOMER-NAME in ascending order (ADAMS before BAKER)

You can have as many key levels as needed, though in practice more than four or five keys is unusual.
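
The keys named in the SORT statement have to exist in the SD record. A hypothetical layout for the shipping manifest is sketched below; the field sizes and the 80-byte record length are assumptions. Storing the ship date as YYYYMMDD is what lets an ascending sort on that field produce chronological order.

       SD  SORT-WORK-FILE
           RECORD CONTAINS 80 CHARACTERS.

       01  SORT-RECORD.
      *    9 = most urgent, so DESCENDING puts urgent orders first
           05  SR-PRIORITY-CODE   PIC 9(01).
      *    YYYYMMDD, so numeric order equals date order
           05  SR-SHIP-DATE       PIC 9(08).
           05  SR-CUSTOMER-NAME   PIC X(30).
           05  FILLER             PIC X(41).

The physical order of the fields in the record does not decide which key is major; only the order of the ON ... KEY phrases in the SORT statement does.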

Multiple Keys in One ON Clause

You can list multiple fields in a single ON clause when they share the same direction:

      *    These two forms are equivalent:
           SORT SORT-FILE
               ON ASCENDING KEY SR-DEPT SR-EMP-ID
               ...

           SORT SORT-FILE
               ON ASCENDING KEY SR-DEPT
               ON ASCENDING KEY SR-EMP-ID
               ...

The first key listed is always the major key regardless of which form you use.

14.8 WITH DUPLICATES IN ORDER (Stable Sort)

By default, when two records have identical values for all sort keys, their relative order in the output is implementation-dependent -- the sort utility may place them in any order. The WITH DUPLICATES IN ORDER clause guarantees that records with identical keys appear in the output in the same order they appeared in the input. This is known as a stable sort.

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-ACCOUNT-NUMBER
               WITH DUPLICATES IN ORDER
               USING INPUT-FILE
               GIVING OUTPUT-FILE

When Stable Sort Matters

Stable sorting is critical in several scenarios:

Audit compliance: Regulatory requirements may demand that the original transaction sequence is preserved within each account
Chronological processing: When a secondary timestamp field is not part of the sort key but the original sequence reflects time order
Reproducibility: When the same input must always produce the exact same output, run after run
Multi-step processing: When a file has already been partially sorted by a previous step

Performance Consideration

WITH DUPLICATES IN ORDER may have a slight performance cost because the sort utility must track the original position of each record. On modern DFSORT implementations, this overhead is typically negligible, but on extremely large sorts (billions of records), it is worth benchmarking.

DFSORT EQUALS Option

On IBM z/OS, the DFSORT utility supports an EQUALS option in its control statements that provides the same guarantee. When you code WITH DUPLICATES IN ORDER in COBOL, the compiler generates instructions that tell DFSORT to use EQUALS behavior.

14.9 SORT Special Registers

COBOL provides several special registers related to sort operations. These registers allow you to check the status of a sort, provide hints to the sort utility, and capture diagnostic information.

SORT-RETURN

The most important special register. It contains the return code from the most recent SORT or MERGE operation:

Value	Meaning
0	Sort/merge completed successfully
16	Sort/merge terminated prematurely
Other	Warning or informational (implementation-specific)

You can also set SORT-RETURN to 16 within an INPUT or OUTPUT PROCEDURE to force the sort to terminate:

       1000-INPUT-PROC SECTION.
           OPEN INPUT INPUT-FILE
           IF WS-FILE-STATUS NOT = '00'
               DISPLAY 'CANNOT OPEN INPUT FILE'
               MOVE 16 TO SORT-RETURN
               GO TO 1000-EXIT
           END-IF
           ...

SORT-FILE-SIZE

A register you can set before the SORT executes to tell the sort utility approximately how many records to expect. This helps the sort optimize its memory allocation and work file usage:

           MOVE 5000000 TO SORT-FILE-SIZE
           SORT SORT-WORK-FILE ...

A value of zero (the default) means "let the sort determine the size." Setting this register is optional but can improve performance for very large sorts.

SORT-CORE-SIZE

Specifies the maximum amount of main storage (memory) the sort should use, in bytes:

           MOVE 16777216 TO SORT-CORE-SIZE   *> 16 MB

A value of zero means "use the default allocation." Increasing this value can improve sort performance by reducing the number of intermediate merge passes, but it reduces memory available to other processes.

SORT-MESSAGE

A register that controls where the sort utility writes its diagnostic messages. On z/OS, this typically maps to a DD name. Its behavior is implementation-specific.
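
Taken together, the registers are typically set just before the SORT and checked just after it. The fragment below is a sketch under assumptions: the file and key names are illustrative, the hint values repeat the ones used earlier in this section, and SORT-FILE-SIZE and SORT-CORE-SIZE are IBM registers that other compilers may not accept.

      *    Hints must be in place before the SORT executes
           MOVE 5000000  TO SORT-FILE-SIZE
           MOVE 16777216 TO SORT-CORE-SIZE

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-ACCOUNT-NUMBER
               USING INPUT-FILE
               GIVING OUTPUT-FILE

      *    SORT-RETURN reflects the SORT that just finished
           IF SORT-RETURN NOT = ZERO
               DISPLAY 'SORT FAILED, SORT-RETURN = ' SORT-RETURN
      *        Pass the failure to the job step condition code
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF

Moving a value to RETURN-CODE before STOP RUN lets later job steps test the outcome, which is usually more useful in batch than a DISPLAY message alone.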

14.10 The MERGE Statement

The MERGE statement combines two or more pre-sorted files into a single ordered output file. Unlike SORT, MERGE does not rearrange records -- it assumes each input file is already in the correct order and simply interleaves them.

MERGE Syntax

MERGE merge-file-name
    ON {ASCENDING|DESCENDING} KEY data-name-1 [data-name-2 ...]
    [COLLATING SEQUENCE IS alphabet-name]
    USING file-name-1 file-name-2 [file-name-3 ...]
    {GIVING file-name-4 | OUTPUT PROCEDURE IS section-name}

Key Differences from SORT

Feature	SORT	MERGE
INPUT PROCEDURE	Yes	No -- MERGE always uses USING
OUTPUT PROCEDURE	Yes	Yes
USING clause	One or more files	Two or more files (minimum 2)
Input files	Can be unsorted	Must be pre-sorted on merge keys
Re-sorting	Yes	No -- only interleaves

Pre-Sorted Requirement

This is the critical constraint: every input file to a MERGE must already be sorted on the same keys, in the same order, as specified in the MERGE statement. If an input file is not properly sorted, the results are unpredictable. The merge utility does not verify sort order (though some implementations may issue a warning).

Simple MERGE Example

           MERGE MERGE-WORK-FILE
               ON ASCENDING KEY MR-PRODUCT-CODE
               ON ASCENDING KEY MR-SALE-DATE
               USING EAST-REGION-FILE
                     CENTRAL-REGION-FILE
                     WEST-REGION-FILE
               GIVING NATIONAL-FILE

This merges three regional sales files into one national file. Each regional file must already be sorted by product code and sale date in ascending order.

MERGE with OUTPUT PROCEDURE

While MERGE cannot use INPUT PROCEDURE, it can use OUTPUT PROCEDURE to process the merged records before writing them:

           MERGE MERGE-WORK-FILE
               ON ASCENDING KEY MR-CUSTOMER-ID
               USING CUST-FILE-A
                     CUST-FILE-B
               OUTPUT PROCEDURE IS 2000-DEDUP-AND-WRITE

This is useful for deduplicating merged records or generating reports from merged data.
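
One possible shape for the 2000-DEDUP-AND-WRITE section is sketched below. It removes records whose customer ID repeats and also counts records that arrive out of sequence, which would indicate an input file that was not pre-sorted. The names MERGED-FILE, MERGED-RECORD, MERGE-RECORD, END-OF-MERGE, WS-PREV-CUSTOMER-ID, and the two counters are hypothetical, and MR-CUSTOMER-ID is assumed to be alphanumeric.

       2000-DEDUP-AND-WRITE SECTION.
       2000-START.
           OPEN OUTPUT MERGED-FILE
      *    LOW-VALUES sorts below any real key, so the first
      *    record returned is always written
           MOVE LOW-VALUES TO WS-PREV-CUSTOMER-ID
           PERFORM UNTIL END-OF-MERGE
               RETURN MERGE-WORK-FILE
                   AT END
                       SET END-OF-MERGE TO TRUE
                   NOT AT END
                       EVALUATE TRUE
      *                    Same key as before: duplicate, skip it
                           WHEN MR-CUSTOMER-ID = WS-PREV-CUSTOMER-ID
                               ADD 1 TO WS-DUPLICATE-COUNT
      *                    Lower key than before: input not sorted
                           WHEN MR-CUSTOMER-ID < WS-PREV-CUSTOMER-ID
                               ADD 1 TO WS-SEQUENCE-ERRORS
                           WHEN OTHER
                               WRITE MERGED-RECORD FROM MERGE-RECORD
                       END-EVALUATE
                       MOVE MR-CUSTOMER-ID TO WS-PREV-CUSTOMER-ID
               END-RETURN
           END-PERFORM
           CLOSE MERGED-FILE.

In this sketch an out-of-sequence record is counted and not written; a production program would more likely write it to an error file or stop the run.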

Complete Example (Example 05)

See code/example-05-merge.cob for a program that merges three regional sales files into a national file, and code/example-05-merge.jcl for the companion JCL.

14.11 Collating Sequence and Sort Order

The order in which characters compare during a sort depends on the collating sequence in effect. This matters because different character encodings produce different sort orders.

EBCDIC vs. ASCII Collating Sequence

On IBM mainframes, the native character encoding is EBCDIC (Extended Binary Coded Decimal Interchange Code). On most other platforms (and in GnuCOBOL), the native encoding is ASCII. The two encodings place characters in different orders:

EBCDIC Order (ascending)	ASCII Order (ascending)
Space	Space
Special characters (most)	Punctuation such as ! # $ % ( ) * + , - . / (other special characters fall between and after the groups below)
Lowercase letters (a-z)	Digits (0-9)
Uppercase letters (A-Z)	Uppercase letters (A-Z)
Digits (0-9)	Lowercase letters (a-z)

The practical impact:

In EBCDIC, lowercase letters sort before uppercase: 'apple' < 'APPLE'
In ASCII, uppercase letters sort before lowercase: 'APPLE' < 'apple'
In EBCDIC, digits sort after letters: 'Z' < '0'
In ASCII, digits sort before letters: '0' < 'A'

This means a program that produces correct sort order on a mainframe (EBCDIC) may produce different results on a PC (ASCII) for the same data, if the data contains mixed-case text or special characters.
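
A quick way to see which ordering a given platform applies is to compare a few values directly. The program below is a minimal sketch (the program name COLLDEMO and the data names are assumptions); it uses data items rather than literals on both sides because some compilers reject a comparison between two literals.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. COLLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOWER               PIC X(05) VALUE 'apple'.
       01  WS-UPPER               PIC X(05) VALUE 'APPLE'.
       01  WS-LETTER              PIC X(01) VALUE 'Z'.
       01  WS-DIGIT               PIC X(01) VALUE '0'.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    Comparisons use the program's native collating sequence
           IF WS-LOWER < WS-UPPER
               DISPLAY 'LOWERCASE SORTS FIRST (EBCDIC-STYLE)'
           ELSE
               DISPLAY 'UPPERCASE SORTS FIRST (ASCII-STYLE)'
           END-IF
           IF WS-LETTER < WS-DIGIT
               DISPLAY 'DIGITS SORT AFTER LETTERS (EBCDIC-STYLE)'
           ELSE
               DISPLAY 'DIGITS SORT BEFORE LETTERS (ASCII-STYLE)'
           END-IF
           STOP RUN.

Running the same source on z/OS and on an ASCII platform should take opposite branches, which makes it a convenient smoke test before comparing sorted output across platforms.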

COLLATING SEQUENCE IS Clause

You can override the default collating sequence with the COLLATING SEQUENCE IS clause on the SORT or MERGE statement. The clause refers to an alphabet-name, which you first define in the SPECIAL-NAMES paragraph:

       SPECIAL-NAMES.
           ALPHABET MY-ORDER     IS NATIVE
           ALPHABET ASCII-ORDER  IS STANDARD-1
           ALPHABET EBCDIC-ORDER IS EBCDIC.

Then in the SORT statement:

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-NAME
               COLLATING SEQUENCE IS ASCII-ORDER
               USING INPUT-FILE
               GIVING OUTPUT-FILE

NATIVE: The default collating sequence for the platform
STANDARD-1: ASCII collating sequence
STANDARD-2: ISO 7-bit (ISO 646) collating sequence, which orders letters and digits the same way as ASCII
EBCDIC: EBCDIC collating sequence (may not be supported on all implementations)

SORT-CONTROL and External Sort Parameters

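Additional control parameters can reach the sort utility through a control data set. In IBM Enterprise COBOL, the SORT-CONTROL special register holds the DD name of that data set (IGZSRTCD by default), and DFSORT itself also reads a SORTCNTL DD statement when one is present. Either route lets you supply DFSORT control statements without modifying the COBOL program: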

//SORTCNTL DD *
  OPTION EQUALS
  SORT FIELDS=(1,10,CH,A,11,8,CH,A)
/*

14.12 JCL Considerations for Sort Programs

Running a COBOL sort program on z/OS requires careful attention to the JCL. The sort utility (DFSORT or SYNCSORT) has specific DD statement requirements.

DD Statements Used by Sort Programs

DD Name	Purpose
SORTWK01-SORTWKnn	Sort work files (temporary scratch space)
SORTLIB	Sort utility load library (where the installation requires it)
SORTIN / SORTINnn	Input files (when using DFSORT directly; SORTINnn names the inputs to a merge)
SORTOUT	Output file (when using DFSORT directly)
SORTCNTL	Optional sort control statements
SYSOUT	Sort diagnostic messages

SORTWK Sizing Guidelines

The sort work files are the temporary datasets used during the sort process. Proper sizing is critical for performance:

Minimum: Allocate at least 2x the input file size across all SORTWK files
Recommended: 3x input file size for optimal performance
Distribution: Spread SORTWK files across different volumes/devices for I/O parallelism
Number of files: Start with 3; use 6 or more for very large sorts

//* For a 100-cylinder input file, allocate ~300 cylinders total
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(50,25)),VOL=SER=WRK001
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(50,25)),VOL=SER=WRK002
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(50,25)),VOL=SER=WRK003
//SORTWK04 DD UNIT=SYSDA,SPACE=(CYL,(50,25)),VOL=SER=WRK004
//SORTWK05 DD UNIT=SYSDA,SPACE=(CYL,(50,25)),VOL=SER=WRK005
//SORTWK06 DD UNIT=SYSDA,SPACE=(CYL,(50,25)),VOL=SER=WRK006

REGION Parameter

Sort operations can be memory-intensive. The REGION parameter on the JOB or EXEC statement controls how much memory is available:

//RUN     EXEC PGM=SORTPROG,REGION=0M

REGION=0M requests the maximum available memory, which is common for sort jobs. Alternatively, you can request a fixed amount such as REGION=256M.

SORTLIB DD

The SORTLIB DD points to the load library containing the sort utility modules:

//SORTLIB  DD DSN=SYS1.SORTLIB,DISP=SHR

The exact dataset name depends on your installation's sort product (DFSORT, SYNCSORT, or another vendor).

14.13 DFSORT Overview: IBM's Sort Utility

DFSORT (Data Facility Sort) is IBM's high-performance sort/merge utility for z/OS. When a COBOL program executes a SORT statement, DFSORT is typically the engine that performs the actual sorting. Understanding DFSORT helps you optimize sort performance and use advanced features.

How COBOL Interfaces with DFSORT

When the COBOL runtime encounters a SORT statement, it invokes DFSORT as a subroutine. Records RELEASEd by an INPUT PROCEDURE are handed to DFSORT through its E15 user exit, and sorted records are handed back to an OUTPUT PROCEDURE through its E35 user exit. The COBOL compiler generates the necessary linkage automatically. You do not need to code any special interface -- just use the SORT statement.

DFSORT Control Statements

You can pass additional instructions to DFSORT through the SORTCNTL DD statement. Common control statements include:

SORT FIELDS: Specifies sort keys (normally generated by the COBOL compiler):

  SORT FIELDS=(1,10,CH,A,11,8,CH,A)

OPTION EQUALS: Requests stable sort (equivalent to WITH DUPLICATES IN ORDER):

  OPTION EQUALS

INCLUDE/OMIT: Filters records (alternative to INPUT PROCEDURE for simple cases):

  INCLUDE COND=(5,2,CH,EQ,C'CA')
  OMIT COND=(20,1,CH,EQ,C'D')

INREC/OUTREC: Reformats records during sort:

  INREC FIELDS=(1,10,25,30,80:X)
  OUTREC FIELDS=(1,40,C' ',41,20)

SUM FIELDS: Summarizes numeric fields for records with duplicate keys:

  SUM FIELDS=(25,8,PD)

DFSORT vs. COBOL Procedures

For simple filtering and reformatting, DFSORT control statements can be more efficient than INPUT/OUTPUT PROCEDUREs because DFSORT processes them internally without the overhead of calling back to the COBOL program for each record. However, COBOL procedures offer much greater flexibility for complex validation, multi-file processing, and business logic.

Approach	Best For
DFSORT INCLUDE/OMIT	Simple field-value filtering
DFSORT INREC/OUTREC	Simple field reformatting
COBOL INPUT PROCEDURE	Complex validation, multi-file input, transformations
COBOL OUTPUT PROCEDURE	Control breaks, deduplication with logic, report generation
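
To make the trade-off concrete, here is a minimal sketch of the INCLUDE COND=(5,2,CH,EQ,C'CA') filter shown earlier, rewritten as an INPUT PROCEDURE. Every name in it (FILTSKT, SK-STATE, the sample rows) is hypothetical. To keep the sketch free of input files, the records come from a WORKING-STORAGE table rather than a READ loop.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FILTSKT.
      * Sketch only; every name and sample row is hypothetical.
      * Mirrors INCLUDE COND=(5,2,CH,EQ,C'CA'): bytes 5-6 of each
      * record hold the state code.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-CUST-ID          PIC X(4).
           05  SK-STATE            PIC X(2).
           05  SK-NAME             PIC X(10).
       WORKING-STORAGE SECTION.
      * Sample rows stand in for records READ from an input file.
       01  WS-SAMPLE-DATA.
           05  FILLER PIC X(16) VALUE "0300CASMITH     ".
           05  FILLER PIC X(16) VALUE "0100NYJONES     ".
           05  FILLER PIC X(16) VALUE "0200CANGUYEN    ".
           05  FILLER PIC X(16) VALUE "0400TXGARCIA    ".
       01  WS-SAMPLE-TABLE REDEFINES WS-SAMPLE-DATA.
           05  WS-SAMPLE-ROW       PIC X(16) OCCURS 4 TIMES.
       01  WS-IDX                  PIC 9(2).
       01  WS-EOF-FLAG             PIC X    VALUE "N".
           88  SORT-EOF                     VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
           SORT SORT-FILE
               ON ASCENDING KEY SK-CUST-ID
               INPUT PROCEDURE IS 1000-SELECT-CA
               OUTPUT PROCEDURE IS 2000-SHOW-SORTED
           IF SORT-RETURN NOT = ZERO
               DISPLAY "SORT FAILED - RC: " SORT-RETURN
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
       1000-SELECT-CA SECTION.
      *    Release only the rows whose state code is CA.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 4
               MOVE WS-SAMPLE-ROW (WS-IDX) TO SORT-REC
               IF SK-STATE = "CA"
                   RELEASE SORT-REC
               END-IF
           END-PERFORM.
       2000-SHOW-SORTED SECTION.
           PERFORM UNTIL SORT-EOF
               RETURN SORT-FILE
                   AT END
                       SET SORT-EOF TO TRUE
                   NOT AT END
                       DISPLAY SORT-REC
               END-RETURN
           END-PERFORM.
```

With the sample rows above, only the two CA records reach the sort, and they are displayed in customer-ID order (0200, then 0300). The COBOL version costs one call per record but lets the IF grow into arbitrary validation logic, which the control statement cannot.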

14.14 GnuCOBOL Sort Implementation

GnuCOBOL implements the SORT and MERGE statements inside its own runtime library (libcob) rather than calling an external sort utility. Records are sorted in memory where possible, and larger inputs spill to temporary files that the runtime creates and removes itself.

Differences from Mainframe Sort

No SORTWK DD statements: GnuCOBOL sorts entirely in memory (or uses temporary files managed by the runtime)
No SORTLIB: No external sort library is needed
No DFSORT control statements: SORTCNTL and related features are not available
Collating sequence: Default is ASCII, not EBCDIC
Performance: Adequate for moderate file sizes; very large sorts (millions of records) may be slower than DFSORT

Running Sort Programs in GnuCOBOL

# Compile
cobc -x -o sortprog example-01-basic-sort.cob

# Set environment variables for file assignments
export EMPUNSRT=employee-unsorted.dat
export EMPSORTD=employee-sorted.dat

# Execute
./sortprog

GnuCOBOL resolves each ASSIGN TO name by looking for an environment variable of that name; if none is set, it uses the name itself as the file name. No JCL or SORTWK files are needed.

Practical Considerations

When developing sort programs that must run on both GnuCOBOL and z/OS:

Keep the COBOL source identical across platforms
Use environment variables (GnuCOBOL) or JCL (z/OS) for file assignments
Be aware of EBCDIC vs. ASCII sort order differences for alphanumeric keys
Test with representative data on both platforms
Avoid DFSORT-specific features in the COBOL code (keep them in JCL)
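
One way to check whether a key is sensitive to the EBCDIC/ASCII difference is to request the EBCDIC sequence explicitly on an ASCII platform. The sketch below assumes the compiler supports both ALPHABET ... IS EBCDIC in SPECIAL-NAMES and the COLLATING SEQUENCE phrase on SORT; support can vary by compiler and version. All names and sample keys are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COLLSKT.
      * Sketch only; every name and sample key is hypothetical.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      *    Name the EBCDIC sequence so SORT can request it.
           ALPHABET EBCDIC-ORDER IS EBCDIC.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-CODE             PIC X(2).
       WORKING-STORAGE SECTION.
      * One key starting with a digit, one upper, one lower case.
       01  WS-SAMPLE-DATA.
           05  FILLER PIC X(2) VALUE "11".
           05  FILLER PIC X(2) VALUE "A1".
           05  FILLER PIC X(2) VALUE "a1".
       01  WS-SAMPLE-TABLE REDEFINES WS-SAMPLE-DATA.
           05  WS-SAMPLE-ROW       PIC X(2) OCCURS 3 TIMES.
       01  WS-IDX                  PIC 9(2).
       01  WS-EOF-FLAG             PIC X    VALUE "N".
           88  SORT-EOF                     VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
           SORT SORT-FILE
               ON ASCENDING KEY SK-CODE
               COLLATING SEQUENCE IS EBCDIC-ORDER
               INPUT PROCEDURE IS 1000-LOAD-SAMPLES
               OUTPUT PROCEDURE IS 2000-SHOW-SORTED
           IF SORT-RETURN NOT = ZERO
               DISPLAY "SORT FAILED - RC: " SORT-RETURN
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
       1000-LOAD-SAMPLES SECTION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 3
               RELEASE SORT-REC FROM WS-SAMPLE-ROW (WS-IDX)
           END-PERFORM.
       2000-SHOW-SORTED SECTION.
           PERFORM UNTIL SORT-EOF
               RETURN SORT-FILE
                   AT END
                       SET SORT-EOF TO TRUE
                   NOT AT END
                       DISPLAY SK-CODE
               END-RETURN
           END-PERFORM.
```

Under ASCII, digits collate before uppercase letters, which collate before lowercase letters; under EBCDIC the order is lowercase, uppercase, then digits. Running the sketch once with and once without the COLLATING SEQUENCE phrase shows whether keys of this kind change position between platforms.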

14.15 Performance Optimization

Sort performance can be critical when processing millions or billions of records nightly. Here are the key tuning strategies.

Memory Allocation

The single most effective tuning step is giving the sort enough memory. More memory means fewer intermediate merge passes:

REGION=0M in JCL: Allows maximum memory
SORT-CORE-SIZE: Set in COBOL to suggest memory allocation
DFSORT MAINSIZE: A parameter of the OPTION control statement (for example, OPTION MAINSIZE=MAX) that sets the upper limit on sort storage

Sort Work File Optimization

Multiple volumes: Spread SORTWK files across different physical devices for parallel I/O
Adequate sizing: Undersized work files cause dynamic allocation overhead and potential failures
Device type: Use the fastest available DASD

Record Length Optimization

Shorter records sort faster because more records fit in memory:

If you only need a subset of fields in the output, consider using INPUT PROCEDURE to create shorter sort records, then expanding them in the OUTPUT PROCEDURE
Avoid sorting records with large FILLER areas if possible

Key Length and Position

Shorter keys sort faster
Keys at the beginning of the record may perform slightly better
Fewer keys means fewer comparisons per record pair

Parallel Sort

Modern DFSORT versions offer features such as Hipersorting, dataspace sorting, and memory object sorting, which hold intermediate data in memory instead of on work data sets and can dramatically reduce elapsed time for large sorts. These features are typically configured at the system level and activate automatically for qualifying sorts.

Avoiding Unnecessary Sorts

Sometimes the best optimization is not sorting at all:

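If the input arrives as several files that are each already in key order, use MERGE instead of SORT. MERGE requires fully sorted inputs, so data that is only approximately sorted still needs SORT
If only the top N records are needed, an OUTPUT PROCEDURE that stops issuing RETURN after N records avoids reading and processing the rest of the sorted output (the sort itself still has to order every input record)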
If an external sort utility step (DFSORT JCL) can do the work, it may be faster than a COBOL sort with procedures
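
The top-N idea can be sketched as follows. This is a minimal illustration with hypothetical names and sample values (TOPNSKT, WS-TOP-N, and so on), not one of the chapter's example programs. It assumes the compiler allows an OUTPUT PROCEDURE to finish before AT END is reached, and it releases records from a WORKING-STORAGE table so no input file is needed.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TOPNSKT.
      * Sketch only; every name and sample value is hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-AMOUNT           PIC 9(5).
           05  SK-ACCOUNT          PIC X(4).
       WORKING-STORAGE SECTION.
      * Sample rows stand in for records READ from an input file.
       01  WS-SAMPLE-DATA.
           05  FILLER PIC X(9) VALUE "00150A001".
           05  FILLER PIC X(9) VALUE "09000A002".
           05  FILLER PIC X(9) VALUE "00020A003".
           05  FILLER PIC X(9) VALUE "04200A004".
           05  FILLER PIC X(9) VALUE "00875A005".
       01  WS-SAMPLE-TABLE REDEFINES WS-SAMPLE-DATA.
           05  WS-SAMPLE-ROW       PIC X(9) OCCURS 5 TIMES.
       01  WS-IDX                  PIC 9(2).
       01  WS-TOP-N                PIC 9(2) VALUE 3.
       01  WS-RETURNED             PIC 9(2) VALUE ZERO.
       01  WS-EOF-FLAG             PIC X    VALUE "N".
           88  SORT-EOF                     VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
           SORT SORT-FILE
               ON DESCENDING KEY SK-AMOUNT
               INPUT PROCEDURE IS 1000-LOAD-SAMPLES
               OUTPUT PROCEDURE IS 2000-TAKE-TOP-N
           IF SORT-RETURN NOT = ZERO
               DISPLAY "SORT FAILED - RC: " SORT-RETURN
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
       1000-LOAD-SAMPLES SECTION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 5
               RELEASE SORT-REC FROM WS-SAMPLE-ROW (WS-IDX)
           END-PERFORM.
       2000-TAKE-TOP-N SECTION.
      *    Stop asking for records once WS-TOP-N have come back.
           PERFORM UNTIL SORT-EOF OR WS-RETURNED >= WS-TOP-N
               RETURN SORT-FILE
                   AT END
                       SET SORT-EOF TO TRUE
                   NOT AT END
                       ADD 1 TO WS-RETURNED
                       DISPLAY SK-ACCOUNT " " SK-AMOUNT
               END-RETURN
           END-PERFORM.
```

With the sample values above, the three largest amounts belong to accounts A002, A004, and A005, in that order. The remaining two records are sorted but never returned.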

14.16 External vs. Internal Sort

There are two approaches to sorting in a batch job:

External Sort (Utility Sort Step)

A standalone JCL step that invokes DFSORT or SYNCSORT directly, without any COBOL program:

//SORTONLY EXEC PGM=SORT
//SORTIN   DD DSN=INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=OUTPUT.FILE,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(100,20))
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(50,25))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(50,25))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(50,25))
//SYSIN    DD *
  SORT FIELDS=(1,10,CH,A,11,8,CH,A)
  OPTION EQUALS
/*
//SYSOUT   DD SYSOUT=*

Internal Sort (COBOL SORT Statement)

The SORT statement within a COBOL program, as covered throughout this chapter.

When to Use Each Approach

Use External Sort When	Use Internal Sort (COBOL) When
Simple key-based sort only	Complex validation or filtering needed
No record transformation	Records must be transformed before/after sort
Maximum sort performance critical	Control break or report generation needed
DFSORT features (INCLUDE, SUM, etc.) suffice	Business logic determines what gets sorted
Sorting is the only operation in the step	Sort is one part of a larger program

Hybrid Approach

A common pattern in production batch jobs is to combine both:

Step 1: External DFSORT sort/merge for initial ordering
Step 2: COBOL program reads the pre-sorted file for processing

This leverages DFSORT's raw sorting speed while using COBOL for the complex processing logic.

14.17 Common Sort Patterns

Pattern 1: Sort Before Processing

The most common pattern. An unsorted transaction file is sorted before a batch processing program reads it sequentially:

Step 1: SORT transactions by account number
Step 2: Process sorted transactions (post to accounts)

Pattern 2: Sort for Reporting

Data is sorted into the hierarchy needed for a report, then processed with control break logic:

           SORT SORT-FILE
               ON ASCENDING KEY SK-REGION
               ON ASCENDING KEY SK-BRANCH
               ON ASCENDING KEY SK-DEPARTMENT
               USING DATA-FILE
               OUTPUT PROCEDURE IS GENERATE-REPORT

Pattern 3: Deduplication via Sort

Sort the file so duplicates are adjacent, then use an OUTPUT PROCEDURE to keep only unique records:

           SORT SORT-FILE
               ON ASCENDING KEY SK-CUSTOMER-ID
               ON DESCENDING KEY SK-LAST-UPDATE
               USING CUSTOMER-FILE
               OUTPUT PROCEDURE IS KEEP-LATEST-ONLY

By sorting customer ID ascending and last-update descending, the most recent record for each customer appears first. The OUTPUT PROCEDURE keeps only the first record for each customer ID.
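
A minimal sketch of such a KEEP-LATEST-ONLY procedure follows. The record layout, section names, and sample rows are assumptions made for illustration. To keep the sketch self-contained, it replaces USING CUSTOMER-FILE with an INPUT PROCEDURE that releases rows from a WORKING-STORAGE table, and it displays the surviving records instead of writing them to an output file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DEDUPSKT.
      * Sketch only; every name and sample row is hypothetical.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT SORT-FILE ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-CUSTOMER-ID      PIC X(4).
           05  SK-LAST-UPDATE      PIC 9(8).
           05  SK-CITY             PIC X(10).
       WORKING-STORAGE SECTION.
      * Sample rows stand in for CUSTOMER-FILE.
       01  WS-SAMPLE-DATA.
           05  FILLER PIC X(22) VALUE "C00220240105BOSTON    ".
           05  FILLER PIC X(22) VALUE "C00120230310DENVER    ".
           05  FILLER PIC X(22) VALUE "C00220240620CHICAGO   ".
           05  FILLER PIC X(22) VALUE "C00120240901AUSTIN    ".
       01  WS-SAMPLE-TABLE REDEFINES WS-SAMPLE-DATA.
           05  WS-SAMPLE-ROW       PIC X(22) OCCURS 4 TIMES.
       01  WS-IDX                  PIC 9(2).
       01  WS-PREV-ID              PIC X(4) VALUE LOW-VALUES.
       01  WS-EOF-FLAG             PIC X    VALUE "N".
           88  SORT-EOF                     VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
           SORT SORT-FILE
               ON ASCENDING KEY SK-CUSTOMER-ID
               ON DESCENDING KEY SK-LAST-UPDATE
               INPUT PROCEDURE IS 1000-LOAD-SAMPLES
               OUTPUT PROCEDURE IS 2000-KEEP-LATEST-ONLY
           IF SORT-RETURN NOT = ZERO
               DISPLAY "SORT FAILED - RC: " SORT-RETURN
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.
       1000-LOAD-SAMPLES SECTION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 4
               RELEASE SORT-REC FROM WS-SAMPLE-ROW (WS-IDX)
           END-PERFORM.
       2000-KEEP-LATEST-ONLY SECTION.
      *    The first record of each customer ID is the newest one;
      *    later records with the same ID are skipped.
           PERFORM UNTIL SORT-EOF
               RETURN SORT-FILE
                   AT END
                       SET SORT-EOF TO TRUE
                   NOT AT END
                       IF SK-CUSTOMER-ID NOT = WS-PREV-ID
                           DISPLAY SORT-REC
                           MOVE SK-CUSTOMER-ID TO WS-PREV-ID
                       END-IF
               END-RETURN
           END-PERFORM.
```

With the sample rows above, the procedure keeps the 2024-09-01 record for C001 and the 2024-06-20 record for C002, and drops the two older rows. In a production program the DISPLAY would be a WRITE to the deduplicated output file.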

Pattern 4: Merge for Consolidation

Combine pre-sorted files from multiple sources:

           MERGE MERGE-FILE
               ON ASCENDING KEY MK-ACCOUNT-NUMBER
               USING BRANCH-FILE-1
                     BRANCH-FILE-2
                     BRANCH-FILE-3
               GIVING CONSOLIDATED-FILE

Pattern 5: Sort with Aggregation

Use an OUTPUT PROCEDURE to sum values for records with the same key:

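       2000-AGGREGATE SECTION.
           ...
      *    WS-PREV-KEY must start as LOW-VALUES. WS-EOF-FLAG is a
      *    hypothetical end-of-sort switch for the enclosing loop.
           RETURN SORT-FILE INTO WS-REC
               AT END
                   MOVE 'Y' TO WS-EOF-FLAG
      *            Flush the final group; no key change follows it.
                   IF WS-PREV-KEY NOT = LOW-VALUES
                       PERFORM WRITE-AGGREGATE
                   END-IF
               NOT AT END
                   IF WR-KEY = WS-PREV-KEY
                       ADD WR-AMOUNT TO WS-RUNNING-TOTAL
                   ELSE
                       IF WS-PREV-KEY NOT = LOW-VALUES
                           PERFORM WRITE-AGGREGATE
                       END-IF
                       MOVE WR-KEY TO WS-PREV-KEY
                       MOVE WR-AMOUNT TO WS-RUNNING-TOTAL
                   END-IF
           END-RETURN
           ...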

14.18 Error Handling for Sort Operations

Robust sort programs must handle errors at every stage.

Checking SORT-RETURN

Always check SORT-RETURN after every SORT or MERGE statement:

           SORT SORT-WORK-FILE
               ON ASCENDING KEY SR-ACCOUNT
               USING INPUT-FILE
               GIVING OUTPUT-FILE

           IF SORT-RETURN NOT = ZERO
               DISPLAY 'SORT FAILED - RC: ' SORT-RETURN
               MOVE 16 TO RETURN-CODE
               STOP RUN
           END-IF

File Status Checking in Procedures

Within INPUT and OUTPUT PROCEDUREs, check file status after every OPEN, READ, WRITE, and CLOSE:

           OPEN INPUT TRANS-FILE
           IF WS-FILE-STATUS NOT = '00'
               DISPLAY 'FATAL: CANNOT OPEN INPUT FILE: '
                       WS-FILE-STATUS
               MOVE 16 TO SORT-RETURN
               GO TO 1000-EXIT
           END-IF

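Setting SORT-RETURN to 16 within a procedure asks the sort to terminate early (on IBM compilers, at the next RELEASE or RETURN). Control then returns to the statement after the SORT with SORT-RETURN still set to 16, so the check shown earlier must still follow the SORT statement.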

Common Sort Failures

Symptom	Likely Cause	Resolution
RC=16, "insufficient work space"	SORTWK too small	Increase SPACE on SORTWK DDs
RC=16, "record length mismatch"	SD record size does not match input	Verify RECORD CONTAINS matches actual data
RC=16, "open error"	Input/output file cannot be opened	Check DD statements and dataset names
Incorrect sort order	Wrong key fields specified	Verify key positions and data types in SD
Missing records	INPUT PROCEDURE not RELEASEing all records	Debug RELEASE logic

14.19 Complete Multi-Key Sort Example (Example 06)

See code/example-06-multi-key-sort.cob for a comprehensive example that demonstrates:

Multiple sort keys with mixed ascending/descending directions
WITH DUPLICATES IN ORDER for stable sort
COLLATING SEQUENCE IS clause
Use of SORT special registers (SORT-FILE-SIZE, SORT-CORE-SIZE, SORT-RETURN)

The scenario is sorting an order file for a shipping manifest, with priority code descending (rush orders first), ship date ascending (earliest dates first), and customer name ascending (alphabetical). The companion JCL is in code/example-06-multi-key-sort.jcl.

14.20 Summary

The SORT and MERGE statements are essential tools for COBOL batch processing. Here is a concise summary of the key concepts:

SD Entry: Defines the sort work file in the DATA DIVISION. Uses SD instead of FD. No BLOCK CONTAINS or LABEL RECORDS.

SORT Statement: Rearranges records into a specified sequence. Uses ON ASCENDING/DESCENDING KEY to define sort order.

USING/GIVING: The simplest form. The sort utility handles all file I/O. You must not have USING/GIVING files open when SORT executes.

INPUT PROCEDURE: A SECTION that feeds records to the sort via RELEASE. Use it for filtering, validation, and transformation before sorting. You manage file OPEN/READ/CLOSE.

OUTPUT PROCEDURE: A SECTION that retrieves sorted records via RETURN. Use it for control breaks, deduplication, aggregation, and report generation. You manage file OPEN/WRITE/CLOSE.

RELEASE: Sends a record to the sort. Only valid in an INPUT PROCEDURE.

RETURN: Retrieves the next sorted record. Only valid in an OUTPUT PROCEDURE. Use AT END to detect when all records have been returned.

Multiple Keys: List from major to minor. Each key can be ASCENDING or DESCENDING independently.

WITH DUPLICATES IN ORDER: Guarantees stable sort -- records with identical keys maintain their original input order.

SORT-RETURN: Special register containing the sort return code. Check after every SORT/MERGE. Set to 16 in a procedure to force termination.

MERGE: Combines pre-sorted files. Requires USING with two or more files. Supports GIVING or OUTPUT PROCEDURE but not INPUT PROCEDURE. Input files must be pre-sorted on the merge keys.

COLLATING SEQUENCE: Controls character comparison order. EBCDIC (mainframe) and ASCII (PC) produce different orders for the same data.

JCL: SORTWK DDs for work space, SORTLIB for the sort library, SORTCNTL for optional control statements. Size SORTWK at 2-3x input file size.

Understanding these concepts gives you the ability to handle virtually any data ordering requirement in COBOL batch processing, from simple file sorts to complex multi-stage pipelines that validate, transform, sort, deduplicate, and report in a single program execution.