Chapter 14 Exercises: Sort and Merge Operations

Tier 1: Recall and Recognition (Exercises 1-8)

Exercise 1: SD vs FD

Identify which of the following clauses are valid for an SD entry and which are valid only for an FD entry:

  a) RECORD CONTAINS 80 CHARACTERS
  b) BLOCK CONTAINS 0 RECORDS
  c) LABEL RECORDS ARE STANDARD
  d) RECORDING MODE IS F

Write a correct SD entry for a sort file with 120-character records containing an employee ID (6 bytes), department code (4 bytes), salary (PIC 9(7)V99), and a name field (30 bytes), with filler for the rest.
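As a reminder of the general shape, the sketch below shows an SD entry for a hypothetical 80-byte order record. The file and field names are illustrative and not taken from the exercise. Only the DATA DIVISION part is shown; a matching SELECT (for example, a hypothetical SELECT ORDER-SORT-FILE ASSIGN TO SORTWK1) would go in FILE-CONTROL.

```cobol
       FILE SECTION.
      * Hypothetical sort work file: 8 + 2 + 9 + 30 + 31 = 80 bytes
       SD  ORDER-SORT-FILE
           RECORD CONTAINS 80 CHARACTERS.
       01  ORDER-SORT-REC.
           05  OS-ORDER-ID             PIC X(08).
           05  OS-REGION               PIC X(02).
      *    9(07)V99 occupies 9 bytes; the decimal point is implied
           05  OS-ORDER-AMOUNT         PIC 9(07)V99.
           05  OS-CUSTOMER-NAME        PIC X(30).
           05  FILLER                  PIC X(31).
```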

Exercise 2: SORT Statement Syntax

Fill in the blanks to create a valid SORT statement that sorts a file by customer name in alphabetical order:

           ________ SORT-WORK-FILE
               ON __________ KEY SR-CUSTOMER-NAME
               ________ CUSTOMER-INPUT
               ________ CUSTOMER-OUTPUT

Exercise 3: RELEASE and RETURN

Answer the following questions:

  a) In which type of procedure is the RELEASE statement valid?
  b) In which type of procedure is the RETURN statement valid?
  c) What is the RELEASE statement analogous to in standard file processing?
  d) What is the RETURN statement analogous to in standard file processing?
  e) What clause on the RETURN statement detects when all sorted records have been retrieved?
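For orientation, here is a minimal skeleton showing where RELEASE and RETURN sit in a sort program. It is a sketch that assumes hypothetical names:

- ordinary files IN-FILE (record IN-REC) and OUT-FILE (record OUT-REC);
- an SD named SRT-FILE with record SRT-REC and key SRT-KEY;
- one-byte end-of-file flags WS-IN-EOF and WS-SRT-EOF initialized to 'N'.

The DATA DIVISION is omitted.

```cobol
       PROCEDURE DIVISION.
       0000-MAIN SECTION.
       0000-MAIN-PARA.
           SORT SRT-FILE
               ON ASCENDING KEY SRT-KEY
               INPUT PROCEDURE IS 1000-LOAD
               OUTPUT PROCEDURE IS 2000-UNLOAD
           STOP RUN.

      * INPUT PROCEDURE: read each record, then RELEASE it to the sort
       1000-LOAD SECTION.
       1000-LOAD-PARA.
           OPEN INPUT IN-FILE
           PERFORM UNTIL WS-IN-EOF = 'Y'
               READ IN-FILE
                   AT END
                       MOVE 'Y' TO WS-IN-EOF
                   NOT AT END
                       RELEASE SRT-REC FROM IN-REC
               END-READ
           END-PERFORM
           CLOSE IN-FILE.

      * OUTPUT PROCEDURE: RETURN records in sorted order until AT END
       2000-UNLOAD SECTION.
       2000-UNLOAD-PARA.
           OPEN OUTPUT OUT-FILE
           PERFORM UNTIL WS-SRT-EOF = 'Y'
               RETURN SRT-FILE INTO OUT-REC
                   AT END
                       MOVE 'Y' TO WS-SRT-EOF
                   NOT AT END
                       WRITE OUT-REC
               END-RETURN
           END-PERFORM
           CLOSE OUT-FILE.
```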

Exercise 4: SORT-RETURN Register

A programmer writes the following code. What are the possible values of WS-SORT-RC after the SORT executes, and what does each value mean?

           SORT SORT-FILE
               ON ASCENDING KEY SK-ACCOUNT
               USING INPUT-FILE
               GIVING OUTPUT-FILE
           MOVE SORT-RETURN TO WS-SORT-RC

Exercise 5: Basic INPUT PROCEDURE Sort

Write a complete COBOL program that reads a student file (80-byte records), filters out inactive students (status field = 'I'), converts last names to uppercase, and sorts the remaining records by GPA descending and last name ascending. Use an INPUT PROCEDURE for the filtering and GIVING for the output.

Record layout:

  - Student ID: PIC X(08) at position 1
  - Last Name: PIC X(20) at position 9
  - First Name: PIC X(15) at position 29
  - Major Code: PIC X(04) at position 44
  - GPA: PIC 9V99 at position 48
  - Credits: PIC 9(03) at position 51
  - Status: PIC X(01) at position 54 ('A' = active, 'I' = inactive)
  - Enrollment Date: PIC X(08) at position 55
  - Filler: PIC X(18) at position 63

Solution: See code/exercise-solutions.cob (EX05SOL)

Exercise 6: Key Order Identification

Given the following SORT statement, describe the exact order of the output records. Provide an example showing how five sample records would be ordered.

           SORT SORT-FILE
               ON DESCENDING KEY SK-PRIORITY
               ON ASCENDING  KEY SK-DUE-DATE
               ON ASCENDING  KEY SK-TASK-NAME

Sample records:

| Priority | Due Date | Task Name |
|---|---|---|
| 3 | 20240315 | BACKUP |
| 5 | 20240310 | AUDIT |
| 5 | 20240310 | DEPLOY |
| 3 | 20240301 | REVIEW |
| 5 | 20240315 | CHECK |

Exercise 7: MERGE Requirements

A programmer writes the following MERGE statement but it fails. Identify at least three potential issues:

           MERGE MERGE-FILE
               ON ASCENDING KEY MK-ACCOUNT
               INPUT PROCEDURE IS 1000-VALIDATE
               USING FILE-B
               GIVING OUTPUT-FILE

Exercise 8: JCL Identification

Match each DD name with its purpose in a sort job:

| DD Name | Purpose |
|---|---|
| a) SORTWK01 | 1) Sort diagnostic messages |
| b) SORTLIB | 2) Input file for the program |
| c) EMPINPUT | 3) Sort utility load library |
| d) SYSOUT | 4) Temporary sort work space |
| e) SORTCNTL | 5) Optional sort control statements |

Tier 2: Comprehension and Application (Exercises 9-16)

Exercise 9: USING with Multiple Files

Write a SORT statement that combines three quarterly transaction files (Q1-TRANS, Q2-TRANS, Q3-TRANS) into a single annual file sorted by account number and transaction date, both ascending. Include the SD entry and FILE-CONTROL entries.

Exercise 10: Deduplication with OUTPUT PROCEDURE

Write a complete COBOL program that sorts an inventory file by item code (ascending) and last-update date (descending), then uses an OUTPUT PROCEDURE to remove duplicates, keeping only the most recently updated record for each item code.

Record layout (60 bytes):

  - Item Code: PIC X(10) at position 1
  - Item Description: PIC X(25) at position 11
  - Quantity: PIC 9(05) at position 36
  - Unit Price: PIC 9(05)V99 at position 41
  - Last Update: PIC X(08) at position 48 (YYYYMMDD)
  - Filler: PIC X(05) at position 56

Solution: See code/exercise-solutions.cob (EX10SOL)
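The core of the keep-most-recent rule is a previous-key comparison in the OUTPUT PROCEDURE. The sketch below relies on the sort having placed the newest update first within each item code, because the exercise sorts last-update date descending. It assumes these hypothetical names:

- an SD file SRT-FILE with record SRT-REC and key field SRT-ITEM-CODE;
- an output file OUT-FILE with record OUT-REC;
- WS-PREV-ITEM PIC X(10), initialized to LOW-VALUES;
- WS-SRT-EOF, initialized to 'N'.

```cobol
       2000-DEDUP SECTION.
       2000-DEDUP-PARA.
           OPEN OUTPUT OUT-FILE
           PERFORM UNTIL WS-SRT-EOF = 'Y'
               RETURN SRT-FILE
                   AT END
                       MOVE 'Y' TO WS-SRT-EOF
                   NOT AT END
      *                The newest update for each item comes first,
      *                so only the first record per item code is
      *                written; later ones for that item are skipped.
                       IF SRT-ITEM-CODE NOT = WS-PREV-ITEM
                           WRITE OUT-REC FROM SRT-REC
                           MOVE SRT-ITEM-CODE TO WS-PREV-ITEM
                       END-IF
               END-RETURN
           END-PERFORM
           CLOSE OUT-FILE.
```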

Exercise 11: INPUT PROCEDURE with Multiple Validations

Write an INPUT PROCEDURE section that reads an insurance claims file and performs the following validations before RELEASEing to the sort:

  1. Claim number must be numeric (10 digits)
  2. Policy number must start with 'POL'
  3. Claim amount must be greater than zero and less than 1,000,000
  4. Claim date must be numeric (YYYYMMDD format)
  5. Status must be 'O' (Open), 'P' (Processing), or 'A' (Approved)

Records failing any validation should be written to an error file with a reason code. Count records read, released, and rejected.
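One common way to structure these checks is an EVALUATE TRUE that stops at the first failure and records a reason code. This is a fragment of the INPUT PROCEDURE read loop. All names are hypothetical:

- a claim record CLM-REC with these fields:
  - CLM-NUMBER PIC X(10)
  - CLM-POLICY-NO
  - CLM-AMOUNT PIC 9(07)V99
  - CLM-DATE PIC X(08)
  - CLM-STATUS, with an 88-level CLM-STATUS-VALID VALUE 'O' 'P' 'A'
- an SD record SRT-REC;
- an error record ERR-REC with fields ERR-REASON and ERR-DATA;
- counters WS-RELEASED and WS-REJECTED.

```cobol
      * Checks run in order; the first failing WHEN sets the reason
           MOVE SPACES TO WS-REASON-CODE
           EVALUATE TRUE
               WHEN CLM-NUMBER NOT NUMERIC
                   MOVE '01' TO WS-REASON-CODE
               WHEN CLM-POLICY-NO(1:3) NOT = 'POL'
                   MOVE '02' TO WS-REASON-CODE
      *        Test NUMERIC first so the range check below never
      *        operates on invalid data
               WHEN CLM-AMOUNT NOT NUMERIC
                   MOVE '03' TO WS-REASON-CODE
               WHEN CLM-AMOUNT NOT > ZERO
                 OR CLM-AMOUNT NOT < 1000000
                   MOVE '03' TO WS-REASON-CODE
               WHEN CLM-DATE NOT NUMERIC
                   MOVE '04' TO WS-REASON-CODE
               WHEN NOT CLM-STATUS-VALID
                   MOVE '05' TO WS-REASON-CODE
           END-EVALUATE

           IF WS-REASON-CODE = SPACES
               RELEASE SRT-REC FROM CLM-REC
               ADD 1 TO WS-RELEASED
           ELSE
               MOVE WS-REASON-CODE TO ERR-REASON
               MOVE CLM-REC        TO ERR-DATA
               WRITE ERR-REC
               ADD 1 TO WS-REJECTED
           END-IF
```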

Exercise 12: OUTPUT PROCEDURE with Totals

Write an OUTPUT PROCEDURE section that processes sorted sales records (sorted by region and salesperson) and produces the following output:

  - A detail line for each record
  - A subtotal line for each salesperson (total sales amount)
  - A subtotal line for each region (total sales amount, number of salespeople)
  - A grand total line (total sales amount, number of regions, number of salespeople)
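A two-level control break can be driven by comparing each returned record with the previously saved keys. The fragment below is a sketch that uses hypothetical names:

- SD fields SRT-REGION and SRT-SALESPERSON;
- saved keys WS-PREV-REGION and WS-PREV-SALESPERSON;
- a detail counter WS-DETAIL-COUNT;
- paragraphs assumed to exist elsewhere:
  - 3000-DETAIL-LINE prints the line and adds to the accumulators.
  - 4000-SALESPERSON-TOTAL and 5000-REGION-TOTAL each print a subtotal, roll it up, reset it, and save the new key.
  - 6000-GRAND-TOTAL prints the grand total.

```cobol
      * Prime the loop with the first sorted record
           RETURN SRT-FILE
               AT END MOVE 'Y' TO WS-SRT-EOF
           END-RETURN
           IF WS-SRT-EOF NOT = 'Y'
               MOVE SRT-REGION      TO WS-PREV-REGION
               MOVE SRT-SALESPERSON TO WS-PREV-SALESPERSON
           END-IF
           PERFORM UNTIL WS-SRT-EOF = 'Y'
      *        A region change also closes the salesperson group
               IF SRT-REGION NOT = WS-PREV-REGION
                   PERFORM 4000-SALESPERSON-TOTAL
                   PERFORM 5000-REGION-TOTAL
               ELSE
                   IF SRT-SALESPERSON NOT = WS-PREV-SALESPERSON
                       PERFORM 4000-SALESPERSON-TOTAL
                   END-IF
               END-IF
               PERFORM 3000-DETAIL-LINE
               ADD 1 TO WS-DETAIL-COUNT
               RETURN SRT-FILE
                   AT END MOVE 'Y' TO WS-SRT-EOF
               END-RETURN
           END-PERFORM
      * Close the last open groups, then print the grand total
           IF WS-DETAIL-COUNT > 0
               PERFORM 4000-SALESPERSON-TOTAL
               PERFORM 5000-REGION-TOTAL
           END-IF
           PERFORM 6000-GRAND-TOTAL
```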

Exercise 13: MERGE Two Files

Write a complete program that merges two pre-sorted customer files (CUST-EAST and CUST-WEST, both sorted by customer ID ascending). The merged output should go to CUST-NATIONAL. Include the complete ENVIRONMENT DIVISION, DATA DIVISION, and PROCEDURE DIVISION, plus companion JCL.

Exercise 14: WITH DUPLICATES IN ORDER

Explain why WITH DUPLICATES IN ORDER is important in each of the following scenarios:

a) An ATM transaction log is sorted by account number. Multiple transactions for the same account must remain in chronological order.

b) A payroll file is sorted by department. Within each department, employees were originally listed in seniority order, and that order must be preserved.

c) A regulatory audit requires that the output of a sort be identical every time it runs on the same input data.
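For reference, the phrase comes after the KEY phrases and before USING or INPUT PROCEDURE. Here is a sketch for scenario (a), with hypothetical file and key names. It assumes the ATM log is already in chronological order on input.

```cobol
      * Records with equal AS-ACCOUNT values keep their input order
           SORT ATM-SORT-FILE
               ON ASCENDING KEY AS-ACCOUNT
               WITH DUPLICATES IN ORDER
               USING ATM-LOG-FILE
               GIVING ATM-SORTED-FILE
```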

Exercise 15: Multi-Level Control Break Report

Write a complete COBOL program that sorts a payroll file by region, branch, and department (all ascending), then uses an OUTPUT PROCEDURE to produce a report with:

  - Detail lines for each employee (name and net pay)
  - Department subtotals
  - Branch subtotals
  - Region subtotals
  - Grand total

Solution: See code/exercise-solutions.cob (EX15SOL)

Exercise 16: Sort Performance Estimation

A file contains 10 million records, each 200 bytes long. Calculate:

  a) The total file size in bytes, megabytes, and cylinders (assume 849,960 bytes per cylinder on 3390 DASD)
  b) The recommended total SORTWK space (3x input size)
  c) How to distribute this across 6 SORTWK DD statements
  d) If the sort can hold 500,000 records in memory, how many merge passes would be needed? (Hint: number of initial runs = total records / records in memory; with a merge order of 6, merge passes = ceil(log6(initial runs)), that is, ceil(log(initial runs) / log(6)).)
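Consider a k-way merge. Each pass divides the number of runs by k, which is why the pass count is a logarithm in base k rather than base 2. The usual simplified model is written below. It uses N input records, M records held in memory, R initial runs, and P merge passes. The numbers in the second line are hypothetical, chosen only to illustrate the arithmetic, and are not the exercise's values.

$$
R = \left\lceil \frac{N}{M} \right\rceil, \qquad
P = \left\lceil \log_k R \right\rceil = \left\lceil \frac{\ln R}{\ln k} \right\rceil
$$

$$
N = 3{,}000{,}000,\; M = 250{,}000,\; k = 6:\quad
R = 12,\quad P = \lceil \log_6 12 \rceil = \lceil 1.387 \rceil = 2
$$

As a check, a 6-way merge reduces 12 runs to 2 in the first pass and to 1 in the second. Real sort products often build longer initial runs. For example, replacement selection averages about twice the memory size on random input. This model therefore tends to overestimate R.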


Tier 3: Analysis and Problem Solving (Exercises 17-24)

Exercise 17: Debugging a Sort Program

The following program compiles but produces no output records. Identify all bugs and explain the fix for each.

       PROCEDURE DIVISION.
       0000-MAIN.
           OPEN INPUT INPUT-FILE
           SORT SORT-FILE
               ON ASCENDING KEY SK-ACCOUNT
               USING INPUT-FILE
               GIVING OUTPUT-FILE
           CLOSE INPUT-FILE
           STOP RUN.

Exercise 18: INPUT PROCEDURE vs DFSORT INCLUDE

A team must sort a 50-million-record file, keeping only records where REGION-CODE = 'NE' or 'NW'. Compare and contrast the two approaches:

Approach A: COBOL INPUT PROCEDURE that reads each record and only RELEASEs records where REGION-CODE is 'NE' or 'NW'.

Approach B: Using DFSORT INCLUDE control statement via SORTCNTL DD:

  INCLUDE COND=(15,2,CH,EQ,C'NE',OR,15,2,CH,EQ,C'NW')

Discuss: performance, flexibility, maintainability, testability, and portability.

Exercise 19: Sort Order Debugging

A program running on z/OS (EBCDIC) sorts customer records by name, using the name field as an ordinary ASCENDING KEY, and produces this output:

smith, bob
Baker, Mary
BAKER, JOHN
SMITH, ALICE
123CORP
99SALES

The programmer expected uppercase and lowercase names to sort together. Explain what is happening and provide two different solutions.
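One common technique is to sort on a case-folded copy of the name while carrying the original name in the record. The fragment below is a sketch with hypothetical names: the SD NAME-SORT-FILE, an input field CUST-NAME PIC X(30), the procedure 1000-FOLD-NAMES, and the output file SORTED-NAME-FILE. On EBCDIC systems, digits still collate after letters with this approach, so names such as 123CORP stay at the end.

```cobol
      * SD record: the sort key is an uppercase copy of the name;
      * the original mixed-case name is carried along unchanged.
       SD  NAME-SORT-FILE.
       01  NS-REC.
           05  NS-NAME-KEY             PIC X(30).
           05  NS-CUSTOMER-NAME        PIC X(30).

      * In the INPUT PROCEDURE, for each customer record read:
           MOVE FUNCTION UPPER-CASE(CUST-NAME) TO NS-NAME-KEY
           MOVE CUST-NAME                      TO NS-CUSTOMER-NAME
           RELEASE NS-REC

      * The SORT then keys on the folded copy:
           SORT NAME-SORT-FILE
               ON ASCENDING KEY NS-NAME-KEY
               INPUT PROCEDURE IS 1000-FOLD-NAMES
               GIVING SORTED-NAME-FILE
```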

Exercise 20: MERGE with OUTPUT PROCEDURE and Validation

Write a complete COBOL program that merges two pre-sorted customer files and uses an OUTPUT PROCEDURE to:

  1. Remove duplicate customer IDs (keep the first occurrence)
  2. Validate that customer IDs are numeric
  3. Count merged records, written records, duplicates, and invalid records
  4. Display statistics at the end

Solution: See code/exercise-solutions.cob (EX20SOL)

Exercise 21: Analyzing a Production Sort Failure

A production sort job ABENDs because it runs out of sort work space (under DFSORT this is reported as message ICE046A, SORT CAPACITY EXCEEDED). The input file contains 8 million records at 250 bytes each. The JCL has:

//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(50,10))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(50,10))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(50,10))

  a) Calculate the input file size in cylinders.
  b) Calculate the total SORTWK space currently allocated (primary + one secondary extension each).
  c) Determine why the sort failed (compare input size to SORTWK space).
  d) Propose JCL changes to fix the problem with a 50% safety margin.
  e) What REGION parameter would you recommend?

Exercise 22: Designing a Sort Strategy

A banking application receives transactions from five channels: ATM, Teller, Online, Mobile, and ACH. Each channel produces a separate daily file, already sorted by timestamp. The batch posting system requires a single file sorted by account number, then transaction type, then timestamp. Design the most efficient sorting strategy. Should you:

  a) MERGE the five files first, then SORT by account?
  b) Concatenate all files and SORT once?
  c) SORT each file individually by account, then MERGE?

Justify your choice with performance analysis.

Exercise 23: EBCDIC vs ASCII Sort Order Analysis

A COBOL program is being migrated from z/OS (EBCDIC) to a Linux server (ASCII/GnuCOBOL). The program sorts customer names. Given these sample names, show the different sort orders on each platform and explain why:

JOHNSON
johnson
O'Brien
123Corp
SMITH-JONES
De La Cruz

What code changes would ensure consistent sort order across platforms?
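One option in standard COBOL is to name the collating sequence explicitly rather than relying on the platform's native character set. The sketch below uses hypothetical names and requests STANDARD-1, the ASCII-based alphabet, so the SORT compares keys in ASCII order on both platforms. It does not fold case, so in practice it would usually be combined with an uppercase copy of the key.

```cobol
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
       SPECIAL-NAMES.
      *    STANDARD-1 is the ASCII-based alphabet in standard COBOL
           ALPHABET ASCII-ORDER IS STANDARD-1.

      * ... later, in the PROCEDURE DIVISION:
           SORT NAME-SORT-FILE
               ON ASCENDING KEY NS-NAME-KEY
               COLLATING SEQUENCE IS ASCII-ORDER
               USING CUSTOMER-FILE
               GIVING SORTED-NAME-FILE
```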

Exercise 24: Complex Sort Pipeline Design

Design a complete sort program for the following requirements:

A hospital billing system receives daily charge records from multiple departments. The program must:

  1. Read and validate charge records (valid patient ID, valid department, valid charge code, amount > 0)
  2. Apply a 15% markup to laboratory charges (department code 'LAB')
  3. Sort by patient ID, then department, then charge date
  4. In the output, detect patients with total charges exceeding $50,000 and flag them for review
  5. Produce three outputs: a sorted charge file, a high-balance exception report, and a department summary report

Write the complete program design including: FILE-CONTROL, SD entry, record layouts, INPUT PROCEDURE logic, OUTPUT PROCEDURE logic, and error handling. You may write pseudocode for complex sections.


Tier 4: Synthesis and Design (Exercises 25-32)

Exercise 25: Complete Sort Pipeline

Write a complete COBOL program with both INPUT and OUTPUT PROCEDUREs that processes an invoice file:

INPUT PROCEDURE:

  - Validate invoice number is not blank
  - Validate customer ID is not blank
  - Validate amount is not zero
  - Normalize region code to uppercase
  - RELEASE valid records; count rejections

OUTPUT PROCEDURE:

  - Remove duplicate invoice numbers
  - Write sorted output file
  - Produce region subtotals on the summary report
  - Produce grand total on the summary report

Sort by region (ascending), customer ID (ascending), invoice number (ascending), WITH DUPLICATES IN ORDER.

Solution: See code/exercise-solutions.cob (EX25SOL)

Exercise 26: Multi-File MERGE with Report

Write a program that merges four quarterly sales files (Q1 through Q4, each pre-sorted by product code) into an annual file. Use an OUTPUT PROCEDURE that:

  - Produces a product summary showing total quantity and revenue per product across all quarters
  - Identifies the top-selling products (those with revenue > $100,000)
  - Counts records from each quarter (hint: each record has a quarter indicator field)

Exercise 27: JCL Design for Large Sort

Design complete JCL for a sort job with the following requirements:

  - Input file: 25 million records, 300 bytes each, FB, on volume PROD01
  - Output file: same format, new dataset on volume PROD02
  - The sort must complete within 30 minutes
  - The sort must use no more than 512MB of memory
  - Sort work files must be on volumes WRK001 through WRK006
  - Include DFSORT control statements for the EQUALS option and an estimated file size
  - Include a cleanup step that deletes the output if it already exists
  - Include a verification step that checks the output record count

Exercise 28: Sort-Based Matching

Two files need to be matched: a master customer file and a daily transaction file. Both must be in customer-ID order. Design a two-step process:

Step 1: Sort both files by customer ID if they are not already sorted. (If they are already sorted, use a validation step instead.)

Step 2: Write a COBOL program that reads both sorted files in parallel (using sequential reads, not SORT or MERGE) to:

  - Match transactions to their master customer records
  - Report unmatched transactions (orphan transactions)
  - Report customers with no transactions (inactive customers)

Write the SORT program for Step 1 and the matching logic for Step 2.
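Step 2 is the classic two-file match, sometimes called a balanced-line or co-sequential match. The sketch below makes these assumptions, and all names are hypothetical:

- paragraphs 1100-READ-MASTER and 1200-READ-TRANS move HIGH-VALUES into WS-MST-KEY or WS-TRN-KEY at end of file;
- each master read resets a flag WS-MATCHED to 'N';
- no real customer ID equals HIGH-VALUES.

```cobol
           PERFORM 1100-READ-MASTER
           PERFORM 1200-READ-TRANS
           PERFORM UNTIL WS-MST-KEY = HIGH-VALUES
                     AND WS-TRN-KEY = HIGH-VALUES
               EVALUATE TRUE
                   WHEN WS-MST-KEY = WS-TRN-KEY
      *                A customer may have several transactions, so
      *                advance only the transaction file
                       MOVE 'Y' TO WS-MATCHED
                       PERFORM 2000-APPLY-TRANSACTION
                       PERFORM 1200-READ-TRANS
                   WHEN WS-MST-KEY < WS-TRN-KEY
      *                No further transactions for this customer
                       IF WS-MATCHED = 'N'
                           PERFORM 2100-REPORT-INACTIVE
                       END-IF
                       PERFORM 1100-READ-MASTER
                   WHEN OTHER
      *                Transaction key is below the master key, so
      *                no master record exists for it
                       PERFORM 2200-REPORT-ORPHAN
                       PERFORM 1200-READ-TRANS
               END-EVALUATE
           END-PERFORM
```

Setting an exhausted file's key to HIGH-VALUES lets the same three comparisons handle end of file on either side without extra flags.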

Exercise 29: Incremental Sort Design

A large master file (100 million records) needs to incorporate 500,000 daily update records. The master file is already sorted by account number. Design the most efficient approach:

  a) Re-sort the entire master + updates together
  b) Sort only the updates, then MERGE with the existing master
  c) A hybrid approach

Analyze the performance trade-offs of each approach. Calculate estimated I/O operations for each option (assume 15 tracks per cylinder and 50 records per track on 3390).

Exercise 30: Error Recovery Design

Design a sort program with comprehensive error recovery:

  1. If the input file is empty, produce a warning message and set RC=4 (not a failure)
  2. If more than 5% of input records are rejected, terminate the sort and set RC=12
  3. If the sort work file runs out of space, capture the error and write a meaningful message
  4. If the output file cannot be opened, terminate gracefully
  5. If the audit file cannot be opened, continue processing but write a warning
  6. After successful completion, write a checkpoint record to a control file for restart purposes

Write the COBOL code for the error handling logic.
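A minimal sketch of the return-code mapping after the SORT completes covers items 1 through 3. The counters WS-READ-COUNT and WS-REJECT-COUNT are hypothetical and assumed to be maintained by the INPUT PROCEDURE. SORT-RETURN is the special register examined in Exercise 4. Using RC 16 for a sort failure is an assumption, because the exercise does not specify that value. The reject test multiplies instead of dividing, so the 5% threshold is compared exactly in integer arithmetic.

```cobol
      * Tests are ordered so the most specific condition wins
           EVALUATE TRUE
               WHEN WS-READ-COUNT = 0
                   DISPLAY 'WARNING: INPUT FILE IS EMPTY'
                   MOVE 4 TO RETURN-CODE
      *        More than 5 percent rejected: rejects/read > 5/100
               WHEN WS-REJECT-COUNT * 100 > WS-READ-COUNT * 5
                   DISPLAY 'ERROR: REJECT RATE EXCEEDS 5 PERCENT'
                   MOVE 12 TO RETURN-CODE
               WHEN SORT-RETURN NOT = 0
                   DISPLAY 'ERROR: SORT FAILED, SORT-RETURN = '
                           SORT-RETURN
                   MOVE 16 TO RETURN-CODE
               WHEN OTHER
                   MOVE 0 TO RETURN-CODE
           END-EVALUATE
```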

Exercise 31: Sort for Regulatory Compliance

A financial institution must produce a quarterly regulatory report. The requirements are:

  1. All customer accounts must be reported in account-number order
  2. Within each account, transactions must be in chronological order
  3. The sort must be reproducible (identical output for identical input on every run)
  4. An audit trail must document: input record count, output record count, total debits, total credits, net balance, and a hash total of all account numbers
  5. The program must detect and report any out-of-sequence records in the input (records that are not in the expected order from the source system)

Design the complete sort program addressing all five requirements. Include the audit trail format and the out-of-sequence detection logic.
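The out-of-sequence check and the hash total can both be done in the INPUT PROCEDURE as each record is read. This sketch makes these assumptions:

- hypothetical input fields IN-ACCOUNT PIC 9(12) and IN-TXN-TIMESTAMP PIC X(14);
- a hypothetical reporting paragraph 8000-REPORT-OUT-OF-SEQ;
- the source system is expected to deliver records in account/timestamp order.

Adjust the key to whatever order the source actually promises.

```cobol
      * WORKING-STORAGE (hypothetical sizes)
       01  WS-CURR-KEY.
           05  WS-CURR-ACCOUNT         PIC 9(12).
           05  WS-CURR-TIMESTAMP       PIC X(14).
       01  WS-PREV-KEY                 PIC X(26) VALUE LOW-VALUES.
       01  WS-OUT-OF-SEQ-COUNT         PIC 9(09) VALUE ZERO.
       01  WS-HASH-TOTAL               PIC 9(18) VALUE ZERO.

      * In the read loop, after each successful READ:
           MOVE IN-ACCOUNT       TO WS-CURR-ACCOUNT
           MOVE IN-TXN-TIMESTAMP TO WS-CURR-TIMESTAMP
      *    Group items compare as alphanumeric; this works because
      *    both parts are fixed-length unsigned display fields
           IF WS-CURR-KEY < WS-PREV-KEY
               ADD 1 TO WS-OUT-OF-SEQ-COUNT
               PERFORM 8000-REPORT-OUT-OF-SEQ
           END-IF
           MOVE WS-CURR-KEY TO WS-PREV-KEY
      *    Hash total of account numbers: meaningful only as a
      *    control value to compare between runs
           ADD IN-ACCOUNT TO WS-HASH-TOTAL
```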

Exercise 32: Performance Benchmarking Framework

Design a test framework for benchmarking sort performance. The framework should:

  1. Generate test data files of varying sizes (1,000; 10,000; 100,000; 1,000,000 records)
  2. Generate data in different initial orders (random, nearly sorted, reverse sorted, many duplicates)
  3. Sort each file using three approaches:
     - SORT with USING/GIVING (simple)
     - SORT with INPUT PROCEDURE filtering 50% of records
     - External DFSORT utility step
  4. Capture elapsed time, CPU time, and I/O counts for each run
  5. Produce a comparison report

Write the data generation program and the sort programs. Describe how to capture and compare the performance metrics.


Tier 5: Real-World Scenarios (Exercises 33-40)

Exercise 33: Daily Bank Statement Generation

Design a sort-based system to generate daily bank statements. Input: transaction file with millions of records from multiple channels. Output: transactions sorted by customer, then by date, with running balance calculations. Handle the case where a customer's transactions span midnight (transactions from both yesterday and today).

Exercise 34: Insurance Claims Processing

An insurance company receives claims from 50 regional offices. Each office submits a daily file pre-sorted by policy number. Design a MERGE-based system that:

  - Merges all 50 files (note: COBOL MERGE has a practical limit on USING files)
  - Validates claim amounts against policy limits
  - Detects duplicate claims (same policy, same date, same provider, same amount)
  - Routes claims above $25,000 to a manual review queue

Address the challenge of merging more files than a single MERGE statement accepts (for this exercise, assume your implementation limits USING to 16 files). Propose a multi-pass merge strategy.

Exercise 35: Inventory Reconciliation

A retail chain with 200 stores performs nightly inventory reconciliation. Each store submits a file of item movements (receipts, sales, adjustments, transfers). Design a sort/merge system that:

  1. Merges store files into a regional file (25 regions, 8 stores each)
  2. Sorts regional files by item code
  3. Produces an item movement summary showing: opening balance, total receipts, total sales, total adjustments, transfers in, transfers out, and closing balance
  4. Flags items where the calculated closing balance differs from the reported closing balance by more than 2%

Exercise 36: Election Night Results Processing

Design a sort program for processing election results. Precincts report results in arbitrary order as they finish counting. The program must:

  1. Accept results as they arrive (simulated by a file with timestamp-ordered precinct results)
  2. Sort by race, then candidate, then precinct
  3. Produce running totals by candidate within each race
  4. Flag precincts reporting more votes than registered voters
  5. Handle late-arriving corrections (a precinct may submit revised numbers)

The key challenge is handling corrections: if precinct 42 submits results twice, the second submission should replace the first. Design the INPUT PROCEDURE logic.
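Here is one possible design, sketched under stated assumptions. The INPUT PROCEDURE stamps every record with an arrival sequence number. The SORT then places the latest submission first within each race, candidate, and precinct, so the OUTPUT PROCEDURE can keep only the first record of each group. The following names are hypothetical:

- the SD RESULT-SORT-FILE, with record SR-REC made of the input layout plus SR-ARRIVAL-SEQ;
- the fields SR-RACE, SR-CANDIDATE, and SR-PRECINCT;
- the input record IN-RESULT-REC;
- the procedure names 1000-STAMP-ARRIVALS and 2000-KEEP-LATEST.

The design also assumes that a correction resubmits every candidate's count for that precinct.

```cobol
      * Input arrives in timestamp order, so a running counter
      * identifies which submission for a precinct came last
           SORT RESULT-SORT-FILE
               ON ASCENDING  KEY SR-RACE SR-CANDIDATE SR-PRECINCT
               ON DESCENDING KEY SR-ARRIVAL-SEQ
               INPUT PROCEDURE IS 1000-STAMP-ARRIVALS
               OUTPUT PROCEDURE IS 2000-KEEP-LATEST

      * Inside 1000-STAMP-ARRIVALS, for each record read:
           ADD 1 TO WS-ARRIVAL-SEQ
           MOVE IN-RESULT-REC  TO SR-REC
           MOVE WS-ARRIVAL-SEQ TO SR-ARRIVAL-SEQ
           RELEASE SR-REC
```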

Exercise 37: Supply Chain Order Prioritization

A manufacturer receives 50,000 orders daily. Design a sort program that prioritizes orders for fulfillment:

  - Priority 1: Emergency medical supply orders (ship same day)
  - Priority 2: Government contract orders (ship within 2 days)
  - Priority 3: Regular orders by customer tier (Platinum > Gold > Silver > Standard)
  - Within each priority level, sort by order date (oldest first)
  - Within the same date, sort by order value (highest first)

The program must split the output into three files (one per priority level) while maintaining the sort order within each file. Design the sort keys and OUTPUT PROCEDURE.
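The required order (medical, government, then customer tier) is not alphabetical. One approach is therefore to derive numeric rank fields in the INPUT PROCEDURE and sort on those. This is a sketch with hypothetical fields:

- input fields:
  - IN-ORDER-TYPE: 'M' for medical, 'G' for government, anything else regular
  - IN-CUST-TIER: 'P', 'G', 'S', anything else Standard
- SD fields SR-PRIORITY, SR-TIER-RANK, SR-ORDER-DATE (YYYYMMDD), and SR-ORDER-VALUE.

The procedure names are also hypothetical.

```cobol
           SORT ORDER-SORT-FILE
               ON ASCENDING  KEY SR-PRIORITY SR-TIER-RANK SR-ORDER-DATE
               ON DESCENDING KEY SR-ORDER-VALUE
               INPUT PROCEDURE IS 1000-RANK-ORDERS
               OUTPUT PROCEDURE IS 2000-SPLIT-BY-PRIORITY

      * Inside 1000-RANK-ORDERS, before each RELEASE:
           EVALUATE TRUE
               WHEN IN-ORDER-TYPE = 'M'
                   MOVE 1 TO SR-PRIORITY
                   MOVE 0 TO SR-TIER-RANK
               WHEN IN-ORDER-TYPE = 'G'
                   MOVE 2 TO SR-PRIORITY
                   MOVE 0 TO SR-TIER-RANK
               WHEN OTHER
      *            Only regular orders are ranked by customer tier
                   MOVE 3 TO SR-PRIORITY
                   EVALUATE IN-CUST-TIER
                       WHEN 'P'   MOVE 1 TO SR-TIER-RANK
                       WHEN 'G'   MOVE 2 TO SR-TIER-RANK
                       WHEN 'S'   MOVE 3 TO SR-TIER-RANK
                       WHEN OTHER MOVE 4 TO SR-TIER-RANK
                   END-EVALUATE
           END-EVALUATE
```

The OUTPUT PROCEDURE can then WRITE each returned record to the file selected by SR-PRIORITY. Records come back from RETURN in sorted order, so each of the three files keeps the sort order within its priority level.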

Exercise 38: Payroll Multi-Sort Pipeline

A payroll system requires four different sort orders of the same payroll file for different downstream consumers:

  1. By department/employee for the department managers
  2. By pay grade/seniority for HR analysis
  3. By tax jurisdiction for tax reporting
  4. By check number for the check printing system

Design an efficient approach. Should you sort the file four times? Can you sort once and reorder using MERGE? Is there a way to produce all four outputs in fewer sort operations?

Exercise 39: Log File Analysis Sort

An application generates 500 million log records daily. A nightly analysis job must:

  1. Sort by severity level (FATAL, ERROR, WARN, INFO, DEBUG) with FATAL first
  2. Within each severity level, sort by timestamp ascending
  3. In the OUTPUT PROCEDURE, count occurrences of each severity level and each unique error code
  4. Produce a summary report showing the top 50 most frequent error codes
  5. Write only ERROR and FATAL records to a separate exception file

Address the performance challenge of sorting 500 million records. Discuss memory requirements, SORTWK sizing, and elapsed time estimates.

Exercise 40: Year-End Financial Consolidation

Design a complete year-end financial consolidation system for a corporation with 15 subsidiaries. Each subsidiary produces a monthly general ledger file (12 months x 15 subsidiaries = 180 files). The system must:

  1. MERGE all monthly files for each subsidiary into annual files (15 merge operations)
  2. SORT each annual file by account code
  3. MERGE all 15 subsidiary annual files into a consolidated corporate file
  4. Produce a trial balance showing total debits and credits for each account code
  5. Validate that debits equal credits for each subsidiary and for the consolidated total
  6. Produce an audit report documenting every step

Design the multi-step JCL procedure, the COBOL programs for merge and sort steps, and the validation and reporting logic. Pay special attention to the restart/recovery strategy if any step fails.