Chapter 14 Quiz: Sort and Merge Operations

Test your understanding of COBOL sort and merge concepts. Each question is followed by a hidden answer -- try to answer before revealing it.

Question 1

What level indicator is used in the DATA DIVISION to define a sort work file?

Show Answer

**SD** (Sort Description). The SD entry replaces FD for sort work files. Unlike FD, SD does not include BLOCK CONTAINS, LABEL RECORDS, or RECORDING MODE clauses because the sort utility manages the work file internally.

Question 2

Which of the following clauses are NOT valid in an SD entry? (Select all that apply)

- a) RECORD CONTAINS 80 CHARACTERS
- b) BLOCK CONTAINS 0 RECORDS
- c) LABEL RECORDS ARE STANDARD
- d) RECORDING MODE IS F

Show Answer

**b, c, and d** are NOT valid in an SD entry. Of the four options, only RECORD CONTAINS is valid. BLOCK CONTAINS, LABEL RECORDS, and RECORDING MODE are FD-only clauses because the sort work file's physical characteristics are managed by the sort utility, not the COBOL program.
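As a rough illustration of the FD/SD difference, here is a minimal sketch assuming IBM Enterprise COBOL-style syntax. The program name, the ddnames `INFILE`, `OUTFILE`, and `SORTWK01`, and the record layout are hypothetical, chosen only for this example.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SORTDEMO.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT INPUT-FILE  ASSIGN TO INFILE.
           SELECT OUTPUT-FILE ASSIGN TO OUTFILE.
           SELECT SORT-FILE   ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
      * Ordinary files use FD and may carry physical attributes.
       FD  INPUT-FILE
           RECORDING MODE IS F
           BLOCK CONTAINS 0 RECORDS
           RECORD CONTAINS 80 CHARACTERS.
       01  INPUT-REC               PIC X(80).
       FD  OUTPUT-FILE
           RECORDING MODE IS F
           BLOCK CONTAINS 0 RECORDS
           RECORD CONTAINS 80 CHARACTERS.
       01  OUTPUT-REC              PIC X(80).
      * The sort work file uses SD; only RECORD CONTAINS is coded.
       SD  SORT-FILE
           RECORD CONTAINS 80 CHARACTERS.
       01  SORT-REC.
           05  SK-DEPARTMENT       PIC X(10).
           05  FILLER              PIC X(70).
       PROCEDURE DIVISION.
       0000-MAIN.
      * No OPEN or CLOSE: SORT handles the USING and GIVING files.
           SORT SORT-FILE
               ON ASCENDING KEY SK-DEPARTMENT
               USING INPUT-FILE
               GIVING OUTPUT-FILE
           STOP RUN.
```

The SD record description is where the sort keys live, which is why the key field is declared under `SORT-REC` rather than under either FD.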

Question 3

What statement is used within an INPUT PROCEDURE to send a record to the sort?

Show Answer

The **RELEASE** statement. It can be used in two forms:

- `RELEASE sort-record-name` -- releases the current content of the sort record
- `RELEASE sort-record-name FROM identifier` -- copies data from the identifier into the sort record and releases it

RELEASE is analogous to WRITE for regular files.

Question 4

What statement is used within an OUTPUT PROCEDURE to retrieve a sorted record?

Show Answer

The **RETURN** statement. It retrieves the next record in sort sequence. It supports AT END and NOT AT END clauses:

RETURN sort-file INTO ws-record
    AT END SET end-of-sort TO TRUE
    NOT AT END PERFORM process-record
END-RETURN

RETURN is analogous to READ for regular files.

Question 5

True or False: When using the USING clause on a SORT statement, you must explicitly OPEN the input file before the SORT executes.

Show Answer

**False**. When using USING, the SORT statement automatically opens, reads, and closes the input file. If the file is already open when the SORT executes, the result is a runtime error. You must NOT have USING or GIVING files open when the SORT statement executes.

Question 6

What value does SORT-RETURN contain after a successful sort operation?

Show Answer

**Zero (0)**. SORT-RETURN contains 0 for success and 16 for premature termination. Other non-zero values may indicate warnings, depending on the sort utility implementation. Always check SORT-RETURN after every SORT and MERGE statement.

Question 7

Given the following SORT statement, what is the major sort key and what is the minor sort key?

SORT SORT-FILE
    ON ASCENDING KEY SK-DEPARTMENT
    ON DESCENDING KEY SK-SALARY
    ON ASCENDING KEY SK-EMPLOYEE-NAME
    USING INPUT-FILE
    GIVING OUTPUT-FILE

Show Answer

- **Major key**: SK-DEPARTMENT (ascending) -- records are first sorted by department A to Z
- **First minor key**: SK-SALARY (descending) -- within each department, records are sorted by salary highest to lowest
- **Second minor key**: SK-EMPLOYEE-NAME (ascending) -- within each department and salary level, records are sorted by name A to Z

The first key listed is always the major (primary) key, and subsequent keys are progressively more minor.

Question 8

What does the WITH DUPLICATES IN ORDER clause guarantee?

Show Answer

WITH DUPLICATES IN ORDER guarantees a **stable sort**: when two or more records have identical values for ALL sort keys, they will appear in the output in the same relative order as they appeared in the input. Without this clause, the order of records with duplicate keys is implementation-dependent (unpredictable). This is important for audit compliance, chronological preservation, and reproducibility.
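A minimal sketch of where the clause goes in a SORT statement, assuming a transaction file that arrives in chronological order. The program name, file names, ddnames, and record layout are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STABLSRT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT TRANS-IN   ASSIGN TO TRANSIN.
           SELECT TRANS-OUT  ASSIGN TO TRANSOUT.
           SELECT SORT-FILE  ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       FD  TRANS-IN.
       01  TRANS-IN-REC            PIC X(40).
       FD  TRANS-OUT.
       01  TRANS-OUT-REC           PIC X(40).
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-ACCOUNT          PIC X(10).
           05  SR-DETAIL           PIC X(30).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Records with equal SK-ACCOUNT keep their input order,
      * so each account's transactions stay chronological.
           SORT SORT-FILE
               ON ASCENDING KEY SK-ACCOUNT
               WITH DUPLICATES IN ORDER
               USING TRANS-IN
               GIVING TRANS-OUT
           STOP RUN.
```

The clause follows the KEY phrases and precedes USING or INPUT PROCEDURE.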

Question 9

Can a MERGE statement use an INPUT PROCEDURE?

Show Answer

**No**. MERGE does not support INPUT PROCEDURE. MERGE always uses the USING clause to specify two or more pre-sorted input files. However, MERGE does support OUTPUT PROCEDURE as an alternative to GIVING. This is because MERGE assumes its inputs are already sorted and does not need preprocessing -- it only interleaves records.

Question 10

What is the minimum number of files that must be specified in the USING clause of a MERGE statement?

Show Answer

**Two (2)**. A MERGE operation combines two or more pre-sorted files. If you only have one input file, there is nothing to merge -- use SORT instead (or simply copy the file).

Question 11

What is the critical prerequisite for all input files used in a MERGE statement?

Show Answer

All input files must be **pre-sorted** on the same key fields, in the same ascending/descending order, as specified in the MERGE statement's KEY clause. MERGE does not re-sort records; it only interleaves them. If an input file is not properly sorted, the merge output will be incorrect. Most implementations do not verify the sort order of input files.

Question 12

In EBCDIC collating sequence, which sorts first: lowercase 'a' or uppercase 'A'?

Show Answer

In EBCDIC, **lowercase 'a' sorts before uppercase 'A'**. The EBCDIC collating sequence places characters in this general order: spaces, special characters, lowercase letters (a-z), uppercase letters (A-Z), digits (0-9). This is the opposite of ASCII, where uppercase letters sort before lowercase. This difference is important when migrating sort programs between mainframe (EBCDIC) and distributed (ASCII) platforms.
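One way to see which collating sequence a platform uses is to compare single characters directly. The sketch below is illustrative only; the program and data names are hypothetical, and the messages displayed depend on the platform's native character set.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COLLSEQ.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOWER-A              PIC X VALUE "a".
       01  WS-UPPER-A              PIC X VALUE "A".
       01  WS-DIGIT-9              PIC X VALUE "9".
       PROCEDURE DIVISION.
       0000-MAIN.
      * Alphanumeric comparisons use the native collating
      * sequence unless the program specifies another one.
           IF WS-LOWER-A < WS-UPPER-A
               DISPLAY "LOWERCASE SORTS BEFORE UPPERCASE"
           ELSE
               DISPLAY "UPPERCASE SORTS BEFORE LOWERCASE"
           END-IF
           IF WS-DIGIT-9 < WS-UPPER-A
               DISPLAY "DIGITS SORT BEFORE LETTERS"
           ELSE
               DISPLAY "DIGITS SORT AFTER LETTERS"
           END-IF
           STOP RUN.
```

On an EBCDIC system the first branch of the first test and the second branch of the second test should be taken. On an ASCII system the reverse holds.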

Question 13

What are SORTWK DD statements used for in JCL, and how many should you typically allocate?

Show Answer

**SORTWK DD statements** provide temporary work space (scratch datasets) for the sort utility to store intermediate results during the sort process. They are temporary datasets that exist only for the duration of the sort step.

**Typical allocation**: Start with 3 SORTWK files. For very large sorts, use 6 or more. The total space across all SORTWK files should be 2 to 3 times the input file size. For best performance, place SORTWK files on different physical volumes to enable parallel I/O. DFSORT can use up to 33 SORTWK files (SORTWK01 through SORTWK33).

Question 14

True or False: An INPUT PROCEDURE can contain a SORT statement.

Show Answer

**False**. You must not execute a SORT or MERGE statement within an INPUT PROCEDURE or OUTPUT PROCEDURE. This restriction prevents nested sort operations, which would conflict with the sort utility's resource management. If you need to sort data that has already been sorted, use a separate SORT statement after the first SORT completes.

Question 15

What happens when you set SORT-RETURN to 16 within an INPUT PROCEDURE?

Show Answer

Setting SORT-RETURN to 16 within an INPUT or OUTPUT PROCEDURE **forces the sort to terminate immediately**. The SORT statement returns control to the statement following it, with SORT-RETURN containing 16. This is the proper way to abort a sort when a fatal error is detected within a procedure, such as an input file failing to open or a critical validation error.

IF WS-FILE-STATUS NOT = '00'
    MOVE 16 TO SORT-RETURN
    GO TO section-exit
END-IF
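To put the snippet above in context, here is a sketch of a complete program that aborts the sort from its INPUT PROCEDURE and then checks SORT-RETURN in the main logic. It assumes a compiler that provides the SORT-RETURN special register (IBM Enterprise COBOL does). The program name, ddnames, and record layout are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SRTABORT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT DATA-FILE   ASSIGN TO INFILE
               FILE STATUS IS WS-FILE-STATUS.
           SELECT OUTPUT-FILE ASSIGN TO OUTFILE.
           SELECT SORT-FILE   ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       FD  DATA-FILE.
       01  INPUT-REC               PIC X(80).
       FD  OUTPUT-FILE.
       01  OUTPUT-REC              PIC X(80).
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-FIELD            PIC X(10).
           05  FILLER              PIC X(70).
       WORKING-STORAGE SECTION.
       01  WS-FILE-STATUS          PIC XX.
       01  WS-EOF-FLAG             PIC X VALUE "N".
           88  END-OF-FILE               VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
       0000-START.
           SORT SORT-FILE
               ON ASCENDING KEY SK-FIELD
               INPUT PROCEDURE IS 1000-INPUT-PROC
               GIVING OUTPUT-FILE
      * Check the outcome after every SORT.
           IF SORT-RETURN NOT = 0
               DISPLAY "SORT FAILED, SORT-RETURN = " SORT-RETURN
               MOVE 16 TO RETURN-CODE
           END-IF
           STOP RUN.

       1000-INPUT-PROC SECTION.
       1000-START.
           OPEN INPUT DATA-FILE
           IF WS-FILE-STATUS NOT = "00"
      * Fatal error: ask the sort to terminate.
               MOVE 16 TO SORT-RETURN
               GO TO 1000-EXIT
           END-IF
           PERFORM UNTIL END-OF-FILE
               READ DATA-FILE
                   AT END
                       SET END-OF-FILE TO TRUE
                   NOT AT END
                       RELEASE SORT-REC FROM INPUT-REC
               END-READ
           END-PERFORM
           CLOSE DATA-FILE.
       1000-EXIT.
           EXIT.
```

Coding FILE STATUS on the SELECT is what lets the OPEN failure be tested in the program instead of ending it abnormally.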

Question 16

What is the purpose of the COLLATING SEQUENCE IS clause on a SORT statement?

Show Answer

The COLLATING SEQUENCE IS clause overrides the default character comparison order for alphanumeric sort keys. It names an alphabet (defined with the ALPHABET clause in SPECIAL-NAMES) that determines how characters are ranked. The alphabet can be defined as:

- `NATIVE`: The platform's default collating sequence (EBCDIC on mainframes, ASCII on PCs)
- `STANDARD-1`: ASCII collating sequence
- `STANDARD-2`: The ISO 7-bit code (International Reference Version of ISO 646), not EBCDIC
- `EBCDIC`: EBCDIC collating sequence

This is useful for ensuring consistent sort behavior across platforms or for implementing custom sort orders where certain characters should compare differently than their default encoding positions.
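A minimal sketch of how the two pieces fit together: the alphabet is declared in SPECIAL-NAMES and then referenced by name on the SORT. The alphabet-name `ASCII-ORDER`, the program name, the ddnames, and the record layout are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COLLSORT.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      * ASCII-ORDER is a hypothetical alphabet-name.
           ALPHABET ASCII-ORDER IS STANDARD-1.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT INPUT-FILE  ASSIGN TO INFILE.
           SELECT OUTPUT-FILE ASSIGN TO OUTFILE.
           SELECT SORT-FILE   ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE.
       01  INPUT-REC               PIC X(80).
       FD  OUTPUT-FILE.
       01  OUTPUT-REC              PIC X(80).
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-NAME             PIC X(30).
           05  FILLER              PIC X(50).
       PROCEDURE DIVISION.
       0000-MAIN.
      * Keys are compared in ASCII order even on an EBCDIC system.
           SORT SORT-FILE
               ON ASCENDING KEY SK-NAME
               COLLATING SEQUENCE IS ASCII-ORDER
               USING INPUT-FILE
               GIVING OUTPUT-FILE
           STOP RUN.
```

The clause affects only how alphanumeric keys are compared. The data in the records is not converted.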

Question 17

A programmer writes this code and wonders why the sort output file is empty:

       1000-INPUT-PROC SECTION.
           OPEN INPUT DATA-FILE
           PERFORM UNTIL END-OF-FILE
               READ DATA-FILE
                   AT END SET END-OF-FILE TO TRUE
                   NOT AT END
                       MOVE INPUT-REC TO SORT-REC
               END-READ
           END-PERFORM
           CLOSE DATA-FILE.
       1000-EXIT.
           EXIT.

What is wrong?

Show Answer

The programmer **never executes a RELEASE statement**. Moving data to the sort record (MOVE INPUT-REC TO SORT-REC) does not send the record to the sort. The programmer must use RELEASE to actually submit the record:

RELEASE SORT-REC FROM INPUT-REC

or equivalently:

MOVE INPUT-REC TO SORT-REC
RELEASE SORT-REC

Without RELEASE, no records enter the sort, so the output is empty.

Question 18

What is the difference between SORT-FILE-SIZE and SORT-CORE-SIZE special registers?

Show Answer

- **SORT-FILE-SIZE**: A hint to the sort utility about the expected number of records. Setting this helps the sort optimize its work file allocation and merge strategy. A value of zero means "let the sort determine it."
- **SORT-CORE-SIZE**: Specifies the maximum amount of main storage (memory), in bytes, that the sort should use. More memory generally means better performance (fewer merge passes). A value of zero means "use the default."

Both registers should be set BEFORE the SORT statement executes. They are inputs to the sort utility, not outputs.

Question 19

What DFSORT control statement is equivalent to coding WITH DUPLICATES IN ORDER in a COBOL SORT statement?

Show Answer

The DFSORT **OPTION EQUALS** control statement. When coded in the SORTCNTL DD:

  OPTION EQUALS

This tells DFSORT to preserve the original input sequence for records with identical key values, which is the same guarantee provided by WITH DUPLICATES IN ORDER in the COBOL source. When the COBOL compiler generates sort instructions, coding WITH DUPLICATES IN ORDER causes it to request EQUALS behavior from DFSORT.

Question 20

In a COBOL program that uses SORT with both INPUT PROCEDURE and OUTPUT PROCEDURE, what is the exact execution order?

Show Answer

The execution order is strictly sequential:

1. The SORT statement begins execution
2. The **INPUT PROCEDURE section executes completely** -- all OPEN, READ, RELEASE, and CLOSE operations happen here
3. After the INPUT PROCEDURE finishes, the **sort utility sorts all released records** in memory/work files
4. After sorting completes, the **OUTPUT PROCEDURE section executes completely** -- all OPEN, RETURN, WRITE, and CLOSE operations happen here
5. After the OUTPUT PROCEDURE finishes, control returns to the statement following the SORT

The two procedures never execute concurrently. The INPUT PROCEDURE must finish before sorting begins, and sorting must finish before the OUTPUT PROCEDURE begins.
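The sequence can be made visible with DISPLAY statements. The sketch below is one way to trace it; the program name, section names, ddnames, and record layout are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SRTORDER.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT INPUT-FILE  ASSIGN TO INFILE.
           SELECT OUTPUT-FILE ASSIGN TO OUTFILE.
           SELECT SORT-FILE   ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE.
       01  INPUT-REC               PIC X(80).
       FD  OUTPUT-FILE.
       01  OUTPUT-REC              PIC X(80).
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-FIELD            PIC X(10).
           05  FILLER              PIC X(70).
       WORKING-STORAGE SECTION.
       01  WS-IN-FLAG              PIC X VALUE "N".
           88  END-OF-INPUT              VALUE "Y".
       01  WS-SORT-FLAG            PIC X VALUE "N".
           88  END-OF-SORT               VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
       0000-START.
           DISPLAY "1. SORT STATEMENT BEGINS"
           SORT SORT-FILE
               ON ASCENDING KEY SK-FIELD
               INPUT PROCEDURE IS 1000-INPUT-PROC
               OUTPUT PROCEDURE IS 2000-OUTPUT-PROC
           DISPLAY "5. CONTROL RETURNS AFTER THE SORT"
           STOP RUN.

       1000-INPUT-PROC SECTION.
       1000-START.
           DISPLAY "2. INPUT PROCEDURE RUNS"
           OPEN INPUT INPUT-FILE
           PERFORM UNTIL END-OF-INPUT
               READ INPUT-FILE
                   AT END
                       SET END-OF-INPUT TO TRUE
                   NOT AT END
                       RELEASE SORT-REC FROM INPUT-REC
               END-READ
           END-PERFORM
           CLOSE INPUT-FILE.

       2000-OUTPUT-PROC SECTION.
       2000-START.
      * Step 3, the sort itself, happens between the procedures.
           DISPLAY "4. OUTPUT PROCEDURE RUNS"
           OPEN OUTPUT OUTPUT-FILE
           PERFORM UNTIL END-OF-SORT
               RETURN SORT-FILE INTO OUTPUT-REC
                   AT END
                       SET END-OF-SORT TO TRUE
                   NOT AT END
                       WRITE OUTPUT-REC
               END-RETURN
           END-PERFORM
           CLOSE OUTPUT-FILE.
```

Each procedure opens and closes its own ordinary file. The sort file itself is never opened or closed by the program.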

Question 21

A COBOL program must sort a file and also count the total number of records. Which approach is simpler: INPUT PROCEDURE or OUTPUT PROCEDURE?

Show Answer

For simply counting records, an **OUTPUT PROCEDURE** is simpler because:

1. With an OUTPUT PROCEDURE, you only need to RETURN and count records -- no filtering or transformation logic
2. With USING for input, the sort handles all input file I/O automatically
3. The count at the end of the OUTPUT PROCEDURE gives you the exact number of sorted records

SORT SORT-FILE
    ON ASCENDING KEY SK-FIELD
    USING INPUT-FILE
    OUTPUT PROCEDURE IS COUNT-RECORDS

Using an INPUT PROCEDURE just for counting would also work but adds unnecessary complexity since you would still need to RELEASE every record. However, if you need the count of input records (before any filtering), you would need the INPUT PROCEDURE approach.
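A sketch of what the `COUNT-RECORDS` section might look like in a complete program. Apart from the names used in the SORT statement above, everything here (program name, ddnames, record layout, counter) is hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SRTCOUNT.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT INPUT-FILE  ASSIGN TO INFILE.
           SELECT OUTPUT-FILE ASSIGN TO OUTFILE.
           SELECT SORT-FILE   ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       FD  INPUT-FILE.
       01  INPUT-REC               PIC X(80).
       FD  OUTPUT-FILE.
       01  OUTPUT-REC              PIC X(80).
       SD  SORT-FILE.
       01  SORT-REC.
           05  SK-FIELD            PIC X(10).
           05  FILLER              PIC X(70).
       WORKING-STORAGE SECTION.
       01  WS-RECORD-COUNT         PIC 9(9) VALUE 0.
       01  WS-SORT-FLAG            PIC X VALUE "N".
           88  END-OF-SORT               VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
       0000-START.
           SORT SORT-FILE
               ON ASCENDING KEY SK-FIELD
               USING INPUT-FILE
               OUTPUT PROCEDURE IS COUNT-RECORDS
           DISPLAY "RECORDS SORTED: " WS-RECORD-COUNT
           STOP RUN.

       COUNT-RECORDS SECTION.
       COUNT-START.
           OPEN OUTPUT OUTPUT-FILE
           PERFORM UNTIL END-OF-SORT
               RETURN SORT-FILE
                   AT END
                       SET END-OF-SORT TO TRUE
                   NOT AT END
      * Count each sorted record, then write it out.
                       ADD 1 TO WS-RECORD-COUNT
                       WRITE OUTPUT-REC FROM SORT-REC
               END-RETURN
           END-PERFORM
           CLOSE OUTPUT-FILE.
```

Because the procedure replaces GIVING, it has to write the sorted records itself as well as count them.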

Question 22

What is the practical difference between an external sort (DFSORT utility step) and an internal sort (COBOL SORT statement)?

Show Answer

**External sort** (utility step):

- Invoked directly via JCL: `EXEC PGM=SORT`
- No COBOL program involved
- Uses DFSORT control statements (SORT FIELDS, INCLUDE, OUTREC, SUM, etc.)
- Maximum performance for pure sorting
- Limited to operations DFSORT supports natively
- Separate JCL step

**Internal sort** (COBOL SORT):

- SORT statement within a COBOL program
- Invokes DFSORT from the program, with INPUT and OUTPUT PROCEDUREs driven through DFSORT's E15/E35 exits
- Can use INPUT/OUTPUT PROCEDUREs for complex logic
- Supports any COBOL business logic (validation, transformation, reporting)
- Slightly more overhead than external sort for simple operations
- Part of a larger program (sorting is one of many operations)

**Use external** when the sort is the only operation needed. **Use internal** when sorting is combined with validation, transformation, or reporting.

Question 23

What is wrong with the following MERGE statement?

MERGE MERGE-FILE
    ON ASCENDING KEY MK-ACCOUNT
    USING CUST-FILE-A
    GIVING OUTPUT-FILE

Show Answer

The MERGE statement specifies only **one** file in the USING clause (CUST-FILE-A). MERGE requires a minimum of **two** input files in the USING clause. A merge with only one file is pointless -- it would just copy the file. The corrected statement needs at least two files:

MERGE MERGE-FILE
    ON ASCENDING KEY MK-ACCOUNT
    USING CUST-FILE-A
          CUST-FILE-B
    GIVING OUTPUT-FILE

Question 24

How does REGION=0M in JCL affect sort performance?

Show Answer

**REGION=0M** requests the maximum available memory for the job step. This directly benefits sort performance because:

1. More memory means more records can be held in memory during the initial sort phase
2. Fewer initial sorted runs are created
3. Fewer merge passes are needed to combine the runs into the final sorted output
4. This reduces I/O to SORTWK files, which is the primary performance bottleneck

For example, if a sort can hold all records in memory, it completes in a single pass with no SORTWK I/O at all (an "in-core sort"). With limited memory, the sort must write and re-read intermediate results multiple times.

The trade-off is that REGION=0M may consume memory that other concurrent jobs need. In production, a specific REGION value (like 512M) may be more appropriate to balance sort performance with overall system throughput.

Question 25

An INPUT PROCEDURE reads 1,000,000 records and RELEASEs 750,000. The OUTPUT PROCEDURE RETURNs all sorted records and writes them to the output file. An OUTPUT PROCEDURE counter shows 750,000 records written. Is this correct?

Show Answer

**Yes, this is correct and expected.** The flow is:

1. INPUT PROCEDURE reads 1,000,000 records from the input file
2. INPUT PROCEDURE RELEASEs only 750,000 records (filtering out 250,000)
3. The sort utility sorts the 750,000 released records
4. OUTPUT PROCEDURE RETURNs all 750,000 sorted records
5. OUTPUT PROCEDURE writes all 750,000 records to the output file

The 250,000 records that were not RELEASEd never enter the sort and are effectively discarded (or could have been written to an error/reject file by the INPUT PROCEDURE). The number of records coming out of the sort exactly equals the number RELEASEd into it.

Question 26

What happens if an input file to a MERGE statement is not actually sorted on the merge keys?

Show Answer

The results are **unpredictable and incorrect**. MERGE assumes all input files are pre-sorted and simply interleaves records by comparing the current record from each file. It does not verify or enforce sort order. If an input file is out of order, the merged output will also be out of order in ways that depend on the specific data and how the merge algorithm processes it.

Some sort utility implementations may issue a message (like DFSORT's ICE068A out-of-sequence message) when they detect out-of-sequence records, but this is not guaranteed and the merge may still complete with incorrect results.

**Prevention**: Always ensure input files are properly sorted before merging. Consider adding a sort verification step or using DFSORT's VERIFY option.
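One possible form of a verification step is an OUTPUT PROCEDURE on the MERGE that compares each returned key with the previous one. This is a sketch, not a standard facility. It is only useful where the merge completes despite unsorted input. The program name, ddnames, record layout, and counter are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MRGCHECK.
       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
      * The ddnames below are hypothetical.
           SELECT CUST-FILE-A ASSIGN TO CUSTA.
           SELECT CUST-FILE-B ASSIGN TO CUSTB.
           SELECT OUTPUT-FILE ASSIGN TO MERGED.
           SELECT MERGE-FILE  ASSIGN TO SORTWK01.
       DATA DIVISION.
       FILE SECTION.
       FD  CUST-FILE-A.
       01  CUST-REC-A              PIC X(80).
       FD  CUST-FILE-B.
       01  CUST-REC-B              PIC X(80).
       FD  OUTPUT-FILE.
       01  OUTPUT-REC              PIC X(80).
       SD  MERGE-FILE.
       01  MERGE-REC.
           05  MK-ACCOUNT          PIC X(10).
           05  FILLER              PIC X(70).
       WORKING-STORAGE SECTION.
       01  WS-PREV-ACCOUNT         PIC X(10) VALUE LOW-VALUES.
       01  WS-SEQ-ERRORS           PIC 9(9)  VALUE 0.
       01  WS-EOF-FLAG             PIC X     VALUE "N".
           88  END-OF-MERGE                  VALUE "Y".
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
       0000-START.
           MERGE MERGE-FILE
               ON ASCENDING KEY MK-ACCOUNT
               USING CUST-FILE-A
                     CUST-FILE-B
               OUTPUT PROCEDURE IS 2000-VERIFY-OUTPUT
           DISPLAY "OUT-OF-SEQUENCE RECORDS: " WS-SEQ-ERRORS
           STOP RUN.

       2000-VERIFY-OUTPUT SECTION.
       2000-START.
           OPEN OUTPUT OUTPUT-FILE
           PERFORM UNTIL END-OF-MERGE
               RETURN MERGE-FILE
                   AT END
                       SET END-OF-MERGE TO TRUE
                   NOT AT END
      * A key lower than its predecessor means an input file
      * was not sorted as MERGE assumed.
                       IF MK-ACCOUNT < WS-PREV-ACCOUNT
                           ADD 1 TO WS-SEQ-ERRORS
                       END-IF
                       MOVE MK-ACCOUNT TO WS-PREV-ACCOUNT
                       WRITE OUTPUT-REC FROM MERGE-REC
               END-RETURN
           END-PERFORM
           CLOSE OUTPUT-FILE.
       2000-EXIT.
           EXIT.
```

A nonzero count signals that at least one input was not in key order, so the merged file should not be trusted.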

Question 27

True or False: A SORT statement can specify both USING and INPUT PROCEDURE at the same time.

Show Answer

**False**. A SORT statement uses either USING or INPUT PROCEDURE for input, but not both. Similarly, it uses either GIVING or OUTPUT PROCEDURE for output, but not both. The valid combinations are:

1. USING with GIVING (simplest)
2. USING with OUTPUT PROCEDURE
3. INPUT PROCEDURE with GIVING
4. INPUT PROCEDURE with OUTPUT PROCEDURE (most flexible)

Question 28

A sort program must process 50 million records. The input file is 10 GB. The available memory is 256 MB. Estimate the number of initial runs and merge passes needed (assume each run fits the available memory, and the merge order equals the number of SORTWK files = 6).

Show Answer

**Initial runs calculation:**

- Input file: 10 GB = 10,240 MB
- Memory available: 256 MB
- Initial runs = 10,240 / 256 = 40 runs

**Merge passes calculation:**

- Merge order = 6 (number of SORTWK files that can be merged simultaneously)
- After pass 1: 40 runs merged 6 at a time = ceil(40/6) = 7 runs
- After pass 2: 7 runs merged 6 at a time = ceil(7/6) = 2 runs
- After pass 3: 2 runs merged = 1 final sorted output
- Total merge passes = **3**

**Total I/O estimate:**

- Initial sort: Read 10 GB + Write 10 GB = 20 GB
- Each merge pass: Read 10 GB + Write 10 GB = 20 GB
- 3 merge passes: 60 GB
- Total I/O: ~80 GB

This illustrates why more memory is so valuable for sort performance. Doubling memory to 512 MB would reduce initial runs from 40 to 20, and 20 runs need only two merge passes (ceil(20/6) = 4 runs, then 1), eliminating one pass.
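The same estimate can be reproduced with a few lines of arithmetic. The sketch below uses the simplified model from this question (run size equals available memory, fixed merge order) and is not how any particular sort utility plans its work. All data names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MRGPASS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT-MB             PIC 9(9) VALUE 10240.
       01  WS-MEMORY-MB            PIC 9(9) VALUE 256.
       01  WS-MERGE-ORDER          PIC 9(4) VALUE 6.
       01  WS-RUNS                 PIC 9(9).
       01  WS-PASSES               PIC 9(4) VALUE 0.
       PROCEDURE DIVISION.
       0000-MAIN.
      * Ceiling division: initial runs = ceil(input / memory).
      * Storing into an integer field truncates the quotient.
           COMPUTE WS-RUNS =
               (WS-INPUT-MB + WS-MEMORY-MB - 1) / WS-MEMORY-MB
           DISPLAY "INITIAL RUNS: " WS-RUNS
      * Each pass merges WS-MERGE-ORDER runs into one.
           PERFORM UNTIL WS-RUNS <= 1
               COMPUTE WS-RUNS =
                   (WS-RUNS + WS-MERGE-ORDER - 1) / WS-MERGE-ORDER
               ADD 1 TO WS-PASSES
               DISPLAY "AFTER PASS " WS-PASSES ": " WS-RUNS " RUNS"
           END-PERFORM
           DISPLAY "MERGE PASSES: " WS-PASSES
           STOP RUN.
```

With the values shown, the program should report 40 initial runs and 3 merge passes (the numbers display with leading zeros). Changing `WS-MEMORY-MB` to 512 gives 20 runs and 2 passes.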

Question 29

Why must INPUT PROCEDURE and OUTPUT PROCEDURE be coded as SECTIONs rather than paragraphs?

Show Answer

INPUT PROCEDURE and OUTPUT PROCEDURE are conventionally coded as **SECTIONs** because of how the SORT statement manages control flow:

1. **Scope isolation**: A SECTION provides a clear boundary for the procedure's code. The SORT statement needs to know exactly where the procedure begins and ends so it can transfer control to the procedure and detect when it has completed.
2. **Historical convention**: Older COBOL standards (through COBOL-74) required SECTION names for sort procedures. This dates back to the original COBOL design where SECTIONs provided the necessary scoping for the sort interface. COBOL-85 relaxed the rule, and compilers such as IBM Enterprise COBOL accept a paragraph name as well, but coding SECTIONs remains the common practice.
3. **PERFORM behavior**: Within a sort procedure, you can PERFORM paragraphs and other sections, but control must not "fall through" to code outside the procedure. The SECTION boundary enforces this.
4. **Compiler enforcement**: A compiler that applies the older rule validates that the name referenced by INPUT PROCEDURE IS or OUTPUT PROCEDURE IS is a SECTION name and rejects a paragraph name with a compilation error.

Question 30

Describe a scenario where using MERGE is significantly more efficient than SORT.

Show Answer

**Scenario**: A bank has 10 regional offices, each producing a daily transaction file of 1 million records, already sorted by account number. The bank needs a single consolidated file sorted by account number.

**Using SORT**: Concatenate all 10 files (10 million records) and sort from scratch. This requires reading 10 million records, creating initial runs in memory, writing to SORTWK files, and performing multiple merge passes. Estimated I/O: 40+ GB for a 2 GB combined input.

**Using MERGE**: Since the files are already sorted, MERGE simply interleaves them. It reads each input file once and writes the output once. No SORTWK I/O is needed (or minimal). Estimated I/O: ~4 GB (2 GB read + 2 GB write).

**MERGE is ~10x more efficient** in this case because it avoids the expensive sort phase entirely. The prerequisite is that all input files must already be sorted on the merge keys. This is a common real-world pattern where regional or branch offices maintain their data in sorted order and a consolidation process combines them.