Case Study 2: DFSORT for Transaction File Processing
Background
Meridian Trust Bank processes approximately 5.8 million financial transactions per day across its retail banking, commercial lending, and wealth management divisions. At the end of each business day, a single consolidated transaction file is produced by the online transaction processing system. This file contains every transaction in the order it was committed -- deposits, withdrawals, transfers, loan payments, fee assessments, interest postings, and wire transfers -- with no particular sorting or grouping.
The nightly batch cycle must transform this raw transaction file into multiple sorted, filtered, and reformatted outputs for downstream processing:
- Sorted transaction file for the general ledger posting system, ordered by posting date, account number, and transaction type
- Filtered deposit-only file for the Federal Reserve reporting system, containing only deposits above $10,000 (Currency Transaction Report threshold)
- Account summary file aggregating daily totals by account number for the account reconciliation system
- Split output files routing transactions to regional processing centers based on branch region codes
The bank's operations team has been performing these transformations with four separate COBOL programs, each reading the full 5.8-million-record input file. The combined elapsed time is 2 hours and 47 minutes. The team has been asked to reduce this to under 45 minutes.
This case study demonstrates how DFSORT (Data Facility Sort) can replace all four COBOL programs with a single, highly optimized sort job that completes in approximately 22 minutes.
The Transaction File Layout
The daily transaction file is a fixed-length sequential file with 250-byte records:
01  TRANSACTION-RECORD.
    05  TR-ACCOUNT-NUMBER        PIC X(10).
    05  TR-BRANCH-CODE           PIC X(4).
    05  TR-REGION-CODE           PIC X(2).
    05  TR-TRANS-DATE            PIC 9(8).
    05  TR-TRANS-TIME            PIC 9(6).
    05  TR-TRANS-TYPE            PIC X(2).
        88  TR-IS-DEPOSIT        VALUE 'DP'.
        88  TR-IS-WITHDRAWAL     VALUE 'WD'.
        88  TR-IS-TRANSFER-IN    VALUE 'TI'.
        88  TR-IS-TRANSFER-OUT   VALUE 'TO'.
        88  TR-IS-LOAN-PAYMENT   VALUE 'LP'.
        88  TR-IS-FEE            VALUE 'FE'.
        88  TR-IS-INTEREST       VALUE 'IN'.
        88  TR-IS-WIRE-IN        VALUE 'WI'.
        88  TR-IS-WIRE-OUT       VALUE 'WO'.
    05  TR-AMOUNT                PIC S9(11)V99.
    05  TR-RUNNING-BALANCE       PIC S9(13)V99.
    05  TR-DESCRIPTION           PIC X(40).
    05  TR-TELLER-ID             PIC X(8).
    05  TR-TERMINAL-ID           PIC X(8).
    05  TR-AUTHORIZATION-CODE    PIC X(12).
    05  TR-CUSTOMER-NAME         PIC X(30).
    05  TR-REFERENCE-NUMBER      PIC X(15).
    05  TR-CURRENCY-CODE         PIC X(3).
    05  TR-CHANNEL-CODE          PIC X(2).
    05  FILLER                   PIC X(72).
Positions referenced in SORT and ICETOOL control statements (1-based byte offsets):
- Account Number: positions 1-10
- Branch Code: positions 11-14
- Region Code: positions 15-16
- Transaction Date: positions 17-24
- Transaction Time: positions 25-30
- Transaction Type: positions 31-32
- Amount: positions 33-45 (signed, V99 implied)
- Description: positions 61-100
- Reference Number: positions 159-173
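Because these hard-coded offsets are error-prone (see Lessons Learned below), many teams regenerate them mechanically from the copybook. A minimal sketch of such a cross-check, with the field lengths transcribed from TRANSACTION-RECORD above (the script is an illustrative aid, not part of the bank's toolchain):

```python
# Derive 1-based SORT field positions from the copybook field lengths.
# PIC S9(11)V99 occupies 13 display bytes; PIC S9(13)V99 occupies 15.
fields = [
    ("TR-ACCOUNT-NUMBER", 10), ("TR-BRANCH-CODE", 4), ("TR-REGION-CODE", 2),
    ("TR-TRANS-DATE", 8), ("TR-TRANS-TIME", 6), ("TR-TRANS-TYPE", 2),
    ("TR-AMOUNT", 13), ("TR-RUNNING-BALANCE", 15), ("TR-DESCRIPTION", 40),
    ("TR-TELLER-ID", 8), ("TR-TERMINAL-ID", 8), ("TR-AUTHORIZATION-CODE", 12),
    ("TR-CUSTOMER-NAME", 30), ("TR-REFERENCE-NUMBER", 15),
    ("TR-CURRENCY-CODE", 3), ("TR-CHANNEL-CODE", 2), ("FILLER", 72),
]

pos = 1
positions = {}
for name, length in fields:
    positions[name] = (pos, pos + length - 1)   # inclusive 1-based range
    pos += length

assert pos - 1 == 250                           # must match LRECL=250
assert positions["TR-TRANS-DATE"] == (17, 24)   # key used by SORT FIELDS
assert positions["TR-AMOUNT"] == (33, 45)       # key used by INCLUDE COND
for name, (start, end) in positions.items():
    print(f"{name:<22} {start:>3}-{end}")
```

Cross-checking SORT control statements against a generated table like this catches off-by-one errors before they reach production.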
Solution: The Consolidated DFSORT Job
The following JCL replaces all four COBOL programs with a single four-step job, one DFSORT (PGM=SORT) invocation per output. A later section shows ICETOOL, DFSORT's multi-purpose utility, which can orchestrate several operations against the same input within one job step.
//MRDNSORT JOB (ACCT),'MERIDIAN DAILY SORT',
// CLASS=A,MSGCLASS=X,MSGLEVEL=(1,1),
// NOTIFY=&SYSUID,
// REGION=0M
//*
//*================================================================*
//* JOB: MRDNSORT - DAILY TRANSACTION FILE PROCESSING
//* REPLACES: TRSORT01, TRFILT01, TRSUM01, TRSPLIT01 PROGRAMS
//* INPUT: MERIDIAN.DAILY.TRANS.FILE
//* OUTPUT: SORTED FILE, CTR DEPOSITS, ACCOUNT SUMMARY,
//* REGIONAL SPLIT FILES
//*================================================================*
//*
//*------------------------------------------------------------*
//* STEP 1: SORT TRANSACTIONS FOR GENERAL LEDGER POSTING
//* SORT BY: DATE (ASC), ACCOUNT (ASC), TYPE (ASC)
//*------------------------------------------------------------*
//SORTGL EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=MERIDIAN.DAILY.TRANS.FILE,
// DISP=SHR
//SORTOUT DD DSN=MERIDIAN.DAILY.TRANS.SORTED,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(500,100),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//SORTWK01 DD UNIT=SYSDA,SPACE=(CYL,(200,50))
//SORTWK02 DD UNIT=SYSDA,SPACE=(CYL,(200,50))
//SORTWK03 DD UNIT=SYSDA,SPACE=(CYL,(200,50))
//SYSIN DD *
SORT FIELDS=(17,8,CH,A, * TRANS DATE - ASCENDING
1,10,CH,A, * ACCOUNT NUM - ASCENDING
31,2,CH,A) * TRANS TYPE - ASCENDING
OPTION EQUALS * PRESERVE ORDER OF EQUAL KEYS
/*
//*
//*------------------------------------------------------------*
//* STEP 2: FILTER DEPOSITS OVER $10,000 FOR CTR REPORTING
//* INCLUDE ONLY DEPOSIT TYPES (DP, WI, TI) WITH AMOUNT > 1000000
//* (AMOUNT FIELD IS PIC S9(11)V99, SO $10,000.00 = 1000000)
//*------------------------------------------------------------*
//FILTCTR EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=MERIDIAN.DAILY.TRANS.FILE,
// DISP=SHR
//SORTOUT DD DSN=MERIDIAN.DAILY.CTR.DEPOSITS,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(10,5),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//SYSIN DD *
SORT FIELDS=(1,10,CH,A, * SORT CTR FILE BY ACCOUNT
17,8,CH,A) * THEN BY DATE
*
INCLUDE COND=((31,2,CH,EQ,C'DP',
OR,
31,2,CH,EQ,C'WI',
OR,
31,2,CH,EQ,C'TI'),
AND,
33,13,ZD,GT,+1000000)
/*
//*
//*------------------------------------------------------------*
//* STEP 3: CREATE ACCOUNT SUMMARY WITH DAILY NET TOTALS
//* OUTPUT: ACCOUNT NUMBER, NET AMOUNT, SUMMARY LITERAL
//*------------------------------------------------------------*
//ACCTSUM EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=MERIDIAN.DAILY.TRANS.FILE,
// DISP=SHR
//SORTOUT DD DSN=MERIDIAN.DAILY.ACCT.SUMMARY,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(50,20),RLSE),
// DCB=(RECFM=FB,LRECL=100,BLKSIZE=27700)
//SYSIN DD *
SORT FIELDS=(1,10,CH,A) * SORT BY ACCOUNT NUMBER
*
* REFORMAT RECORDS FOR SUMMARIZATION:
* KEEP ONLY ACCOUNT AND AMOUNT, PADDED TO AN 80-BYTE WORK RECORD
INREC FIELDS=(1,10, * ACCOUNT NUMBER
33,13, * TRANSACTION AMOUNT (SIGNED)
80:X) * FILLER TO PAD
*
* SUM THE AMOUNTS BY ACCOUNT NUMBER (UNIQUE KEY IN POS 1-10)
SUM FIELDS=(11,13,ZD) * SUM AMOUNTS BY ACCOUNT
*
* REFORMAT OUTPUT TO FINAL SUMMARY LAYOUT
OUTREC FIELDS=(1,10, * ACCOUNT NUMBER
11,13, * NET AMOUNT TOTAL
C' SUMMARY', * LITERAL INDICATOR
100:X) * FILLER TO 100 BYTES (MATCHES LRECL=100)
/*
//*
//*------------------------------------------------------------*
//* STEP 4: SPLIT TRANSACTIONS BY REGION CODE
//* REGION NE=NORTHEAST, SE=SOUTHEAST, MW=MIDWEST,
//* SW=SOUTHWEST, WC=WEST COAST, OT=OTHER
//*------------------------------------------------------------*
//REGSPLIT EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD DSN=MERIDIAN.DAILY.TRANS.FILE,
// DISP=SHR
//SORTOUT DD DSN=MERIDIAN.DAILY.TRANS.SORTED.ALL,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(500,100),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//REGNE DD DSN=MERIDIAN.DAILY.TRANS.REG.NE,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(100,20),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//REGSE DD DSN=MERIDIAN.DAILY.TRANS.REG.SE,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(100,20),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//REGMW DD DSN=MERIDIAN.DAILY.TRANS.REG.MW,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(100,20),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//REGSW DD DSN=MERIDIAN.DAILY.TRANS.REG.SW,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(80,20),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//REGWC DD DSN=MERIDIAN.DAILY.TRANS.REG.WC,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(100,20),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//REGOT DD DSN=MERIDIAN.DAILY.TRANS.REG.OT,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(20,10),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//SYSIN DD *
SORT FIELDS=(15,2,CH,A, * REGION CODE - ASCENDING
1,10,CH,A, * ACCOUNT NUM - ASCENDING
17,8,CH,A) * TRANS DATE - ASCENDING
*
* WRITE THE COMPLETE SORTED FILE TO SORTOUT, THEN ROUTE BY REGION
* (WITH OUTFIL PRESENT, SORTOUT IS WRITTEN ONLY IF AN OUTFIL NAMES IT)
OUTFIL FNAMES=SORTOUT
OUTFIL FNAMES=REGNE,INCLUDE=(15,2,CH,EQ,C'NE')
OUTFIL FNAMES=REGSE,INCLUDE=(15,2,CH,EQ,C'SE')
OUTFIL FNAMES=REGMW,INCLUDE=(15,2,CH,EQ,C'MW')
OUTFIL FNAMES=REGSW,INCLUDE=(15,2,CH,EQ,C'SW')
OUTFIL FNAMES=REGWC,INCLUDE=(15,2,CH,EQ,C'WC')
OUTFIL FNAMES=REGOT,
INCLUDE=(15,2,CH,NE,C'NE',
AND,15,2,CH,NE,C'SE',
AND,15,2,CH,NE,C'MW',
AND,15,2,CH,NE,C'SW',
AND,15,2,CH,NE,C'WC')
/*
Deep Dive: Understanding Each SORT Operation
Operation 1: Multi-Key Sort for General Ledger
The general ledger posting system requires transactions in a specific order: date first (so postings are applied chronologically), then account number (so all transactions for one account are grouped), then transaction type (so debits and credits are applied in a consistent sequence).
SORT FIELDS=(17,8,CH,A, 1,10,CH,A, 31,2,CH,A)
Each field specification consists of four elements: starting position, length, data format, and sort direction. The CH (character) format treats the fields as EBCDIC strings with the standard collating sequence. This works correctly for unsigned numeric strings stored as display (zoned decimal) digits because the EBCDIC encodings of the digits 0-9 (X'F0' through X'F9') ascend in order. Signed zoned-decimal fields such as TR-AMOUNT, whose last byte carries an overpunched sign, should instead be sorted with the ZD format so the sign is interpreted correctly.
The OPTION EQUALS directive guarantees that records with identical sort keys retain their original relative order. Without EQUALS, DFSORT may rearrange equal-key records for performance optimization. For financial processing, this deterministic behavior is required -- if two transactions for the same account on the same date have the same type, they must appear in the order they were originally committed.
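Both properties -- EBCDIC digit collation and stable ordering of equal keys -- can be sanity-checked off-host. This Python sketch uses the standard cp037 (EBCDIC) codec; the sample records are invented:

```python
# EBCDIC (code page 037): the digits '0'-'9' encode to X'F0'-X'F9' in
# ascending order, so a CH (byte-wise) compare on unsigned zoned-decimal
# digits of equal length yields correct numeric ordering.
digits = [str(d).encode("cp037")[0] for d in range(10)]
assert digits == list(range(0xF0, 0xFA))

# Python's sorted() is stable, like DFSORT with OPTION EQUALS: records
# with equal sort keys keep their original relative (commit) order.
records = [
    ("ACCT000001", "20240105", "DP", 1),   # committed first
    ("ACCT000001", "20240105", "DP", 2),   # committed second, same keys
]
by_key = sorted(records, key=lambda r: (r[1], r[0], r[2]))
assert [r[3] for r in by_key] == [1, 2]    # commit order preserved
```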
Operation 2: INCLUDE for CTR Filtering
The Currency Transaction Report requires only deposit-type transactions exceeding $10,000. The INCLUDE statement combines type filtering with amount filtering:
INCLUDE COND=((31,2,CH,EQ,C'DP',
OR,
31,2,CH,EQ,C'WI',
OR,
31,2,CH,EQ,C'TI'),
AND,
33,13,ZD,GT,+1000000)
The parenthesized OR group matches any of the three deposit-type transactions: deposit (DP), wire-in (WI), or transfer-in (TI). The AND clause then requires the amount field (position 33, length 13, zoned decimal format) to be greater than 1,000,000 -- which represents $10,000.00 in the PIC S9(11)V99 format, where the last two digits are implied decimal places.
The ZD (zoned decimal) format is critical here. The amount field in the source record is stored as a signed zoned decimal (PIC S9(11)V99), which means the sign is embedded in the last byte using EBCDIC zone bits. DFSORT's ZD format correctly interprets this encoding, including negative amounts that would have zone bits of X'Dx' instead of X'Fx'.
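The zone-bit handling can be illustrated with a small decoder. This is a simplified model of how a signed zoned-decimal field is interpreted (the helper name decode_zd and the sample values are illustrative, not part of the bank's code):

```python
# Decode an EBCDIC signed zoned-decimal field (e.g. PIC S9(11)V99).
# The last byte's zone nibble carries the sign: X'B' or X'D' means
# negative; other zones (X'C', X'F', ...) are treated as positive.
def decode_zd(field: bytes) -> int:
    """Return the value in cents (two implied decimal places)."""
    value = 0
    for b in field:
        value = value * 10 + (b & 0x0F)    # numeric nibble of each byte
    zone = field[-1] >> 4                  # sign nibble of the last byte
    return -value if zone in (0xB, 0xD) else value

# $10,000.00 stored as 13 display digits, all zones X'F' (positive).
pos_field = "0000001000000".encode("cp037")
assert decode_zd(pos_field) == 1000000     # $10,000.00 in cents

# Same magnitude negative: overpunch the last byte's zone with X'D'.
neg_field = pos_field[:-1] + bytes([0xD0 | (pos_field[-1] & 0x0F)])
assert decode_zd(neg_field) == -1000000
```

A CH compare on neg_field would sort it as if it were positive, which is exactly why the INCLUDE condition specifies ZD rather than CH for the amount.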
Operation 3: SUM for Account Aggregation
Creating the account summary requires three SORT operations working together: INREC to reshape the input, SUM to aggregate, and OUTREC to format the output.
INREC executes before the sort phase. It reformats each 250-byte input record into a shorter working record containing only the fields needed for summarization. This reduces the data volume that SORT must process, improving performance.
SUM FIELDS=(11,13,ZD) tells DFSORT to aggregate records that have identical sort keys. When two or more records have the same account number (positions 1-10, unchanged by INREC), DFSORT adds their amount fields together and produces a single output record. The output therefore holds one summary record per account that transacted that day -- far fewer than the 5.8 million input records, since many of the bank's roughly 4.2 million accounts post several transactions per day.
OUTREC executes after the sort and SUM phases. It reformats the summarized record into the final 100-byte layout expected by the downstream reconciliation system.
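The sort-then-SUM pipeline can be modeled in a few lines; the account numbers and amounts below are invented for illustration:

```python
# Model of SORT FIELDS=(1,10,CH,A) followed by SUM FIELDS=(11,13,ZD):
# sort on the account key, then collapse each run of equal keys into
# one record whose amount is the arithmetic sum.
from itertools import groupby

txns = [  # (account, amount_in_cents)
    ("0000000002", 5000), ("0000000001", -2500),
    ("0000000001", 10000), ("0000000002", 700),
]

txns.sort(key=lambda t: t[0])                    # the sort phase
summary = [
    (acct, sum(amount for _, amount in group))   # SUM collapses equal keys
    for acct, group in groupby(txns, key=lambda t: t[0])
]
assert summary == [("0000000001", 7500), ("0000000002", 5700)]
```

Note that signed amounts net out naturally, which is why the job produces a net total per account rather than separate debit and credit figures.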
Operation 4: OUTFIL for Regional Splitting
The OUTFIL statement is the key to efficient file splitting. Without OUTFIL, splitting by region would require either six separate SORT passes (one per region) or a COBOL program to read the sorted file and write to six outputs.
OUTFIL processes each record once during a single output pass. As each sorted record is written, DFSORT evaluates the INCLUDE conditions for each OUTFIL statement and routes the record to the appropriate output DD. A single record can match multiple OUTFIL conditions (though in this case, region codes are mutually exclusive).
The "other" region file (REGOT) uses a negated condition to catch any records that do not match the five defined regions. This defensive approach ensures that no transactions are lost due to unexpected region codes in the data -- a data quality issue that occurs periodically when new branches are onboarded before the region mapping table is updated.
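A minimal model of this routing, including the catch-all behavior for unknown region codes (the region set and byte positions mirror the job above; the helper route is hypothetical):

```python
# Route each record once: known region codes go to their own bucket,
# anything else falls through to the OT catch-all, so no record is lost.
KNOWN_REGIONS = {"NE", "SE", "MW", "SW", "WC"}

def route(records):
    """Return a dict mapping output name to the records it receives."""
    buckets = {name: [] for name in KNOWN_REGIONS | {"OT"}}
    for rec in records:
        region = rec[14:16]   # positions 15-16, as a 0-based slice
        buckets[region if region in KNOWN_REGIONS else "OT"].append(rec)
    return buckets

recs = [
    " " * 14 + "NE" + " " * 234,   # 250-byte record, northeast region
    " " * 14 + "XX" + " " * 234,   # unexpected region code
]
out = route(recs)
assert len(out["NE"]) == 1 and len(out["OT"]) == 1
```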
Performance Analysis: DFSORT vs. COBOL Programs
The following metrics compare the original COBOL implementation against the DFSORT replacement:
| Metric | COBOL (4 programs) | DFSORT (1 job) | Improvement |
|---|---|---|---|
| Elapsed time | 167 minutes | 22 minutes | 87% reduction |
| CPU time | 48 minutes | 8 minutes | 83% reduction |
| I/O operations | 23.2 million | 6.1 million | 74% reduction |
| DASD reads of input | 4 full passes | 4 passes* | -- |
| Sort work space | N/A (COBOL tables) | 600 cylinders | Traded for speed |
| Lines of code | 3,200 (COBOL) | 74 (SORT control) | 98% reduction |
| Maintenance effort | 4 programs to maintain | 1 job to maintain | 75% reduction |
*Each SORT step reads the input file once. However, DFSORT's I/O is performed using optimized block-mode channel programs that are significantly faster than COBOL's record-at-a-time QSAM access.
The dramatic performance improvement comes from three factors:
- Optimized I/O: DFSORT uses large block I/O with channel programs that bypass much of the overhead of standard access methods. Where a COBOL program issues one READ per record (5.8 million QSAM GETs), DFSORT reads entire tracks in a single channel operation.
- In-memory sorting: DFSORT uses all available memory for sort workspace, employing sophisticated algorithms (Balanced Polyphase Merge, Tournament Sort) that minimize the number of merge passes. The COBOL sort-by-table approach was limited to 64 MB of working storage.
- Single-pass output: The OUTFIL splitting produces all six regional files during the same output pass. The COBOL approach required a separate pass through the sorted file for each region.
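The blocking arithmetic behind the optimized-I/O factor is easy to verify with the job's own DCB values:

```python
# With BLKSIZE=27750 and LRECL=250, each block holds 111 records, so one
# pass over 5.8 million records needs roughly 52,000 block-level reads
# instead of 5.8 million record-level GETs.
LRECL, BLKSIZE, RECORDS = 250, 27_750, 5_800_000

records_per_block = BLKSIZE // LRECL
blocks_per_pass = -(-RECORDS // records_per_block)   # ceiling division

assert records_per_block == 111
assert blocks_per_pass == 52_253
```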
Advanced Technique: ICETOOL for Multi-Operation Processing
For environments that require even more sophisticated processing, ICETOOL can orchestrate multiple SORT operations with cross-referencing between them. Here is an enhanced version that adds record counting and validation:
//MRDNTOOL JOB (ACCT),'MERIDIAN ICETOOL',
// CLASS=A,MSGCLASS=X,MSGLEVEL=(1,1),
// NOTIFY=&SYSUID,
// REGION=0M
//*
//*================================================================*
//* ICETOOL: VALIDATE AND PROCESS DAILY TRANSACTIONS
//*================================================================*
//*
//PROCESS EXEC PGM=ICETOOL
//TOOLMSG DD SYSOUT=*
//DFSMSG DD SYSOUT=*
//TRANIN DD DSN=MERIDIAN.DAILY.TRANS.FILE,
// DISP=SHR
//COUNTOUT DD DSN=MERIDIAN.DAILY.TRANS.COUNTS,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(TRK,(1,1),RLSE),
// DCB=(RECFM=FB,LRECL=80,BLKSIZE=27920)
//HIGHVAL DD DSN=MERIDIAN.DAILY.TRANS.HIGHVAL,
// DISP=(NEW,CATLG,DELETE),
// UNIT=SYSDA,
// SPACE=(CYL,(5,2),RLSE),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//T1 DD DSN=&&TEMP1,
// UNIT=SYSDA,
// SPACE=(CYL,(100,50)),
// DCB=(RECFM=FB,LRECL=250,BLKSIZE=27750)
//TOOLIN DD *
* VERIFY MINIMUM RECORD COUNT (DETECT TRUNCATED INPUT):
* SET RC=12 IF THE INPUT HAS FEWER THAN 100,000 RECORDS
COUNT FROM(TRANIN) LOWER(100000)
*
* WRITE THE RECORD COUNT TO COUNTOUT FOR THE AUDIT TRAIL
COUNT FROM(TRANIN) WRITE(COUNTOUT) -
TEXT('DAILY TRANSACTION COUNT: ') DIGITS(9)
*
* EXTRACT HIGH-VALUE TRANSACTIONS (OVER $100,000)
COPY FROM(TRANIN) TO(HIGHVAL) USING(CTL1)
*
* DETECT DUPLICATE REFERENCE NUMBERS (DATA QUALITY CHECK)
* REFERENCE NUMBER IS AT POSITIONS 159-173
OCCUR FROM(TRANIN) LIST(COUNTOUT) -
ON(159,15,CH) -
HIGHER(1) -
TITLE('DUPLICATE REFERENCE NUMBERS DETECTED')
/*
//CTL1CNTL DD *
SORT FIELDS=(33,13,ZD,D, * AMOUNT DESCENDING
1,10,CH,A) * ACCOUNT ASCENDING
INCLUDE COND=(33,13,ZD,GT,+10000000)
OUTREC FIELDS=(1,250) * KEEP FULL RECORD
/*
The ICETOOL COUNT operation verifies that the input file contains at least 100,000 records. If the daily file is unexpectedly small (indicating a truncated extract or system failure), COUNT sets a nonzero return code so that scheduler condition-code checks stop the cycle before downstream processing begins. This prevents the bank from running an incomplete batch cycle.
The OCCUR operation identifies duplicate reference numbers -- a data quality issue that could indicate double-posted transactions. The output report lists every reference number that appears more than once, enabling the operations team to investigate before the records are posted to the general ledger.
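The duplicate check is conceptually simple: count occurrences per reference number, as OCCUR does, and report any value that appears more than once. A Python model (sample reference numbers invented):

```python
# Count each reference number and keep only those occurring more than
# once -- the same values the OCCUR report would list.
from collections import Counter

refs = ["REF001", "REF002", "REF001", "REF003", "REF001"]
dupes = {ref: n for ref, n in Counter(refs).items() if n > 1}
assert dupes == {"REF001": 3}
```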
Lessons Learned
1. Utility Programs Outperform Custom Code for Data Manipulation
DFSORT is written in highly optimized assembler and uses hardware-level I/O techniques that COBOL programs cannot replicate. For pure data sorting, filtering, reformatting, and aggregation, DFSORT will almost always outperform an equivalent COBOL program. COBOL is best reserved for business logic that requires procedural decision-making -- calculations, validations, and conditional processing that SORT control statements cannot express.
2. SORT Control Statement Position References Are Error-Prone
Every field reference in SORT control statements uses absolute byte positions (starting at 1) rather than named fields. A single incorrect position number can produce subtly wrong results rather than an obvious error. The team maintains a master field-position reference document for every file layout and cross-checks every SORT control statement against it during code review. Many shops use COBOL copybook-to-SORT-field conversion tools to reduce this risk.
3. OUTFIL Eliminates Multi-Pass Processing
Before discovering OUTFIL, the team assumed that producing multiple output files required multiple passes through the input data. OUTFIL's ability to route records to different output DD names based on conditions turns a multi-pass problem into a single-pass operation. For large files, this alone can cut elapsed time by 50% or more.
4. SUM Requires Careful Attention to Data Types
The SUM operation adds fields arithmetically, and the data type specification (ZD, PD, BI, etc.) must exactly match the actual data format. A zoned decimal field specified as packed decimal will produce garbage results without an error message. The team validates every SUM operation by comparing a sample of aggregated results against manually calculated totals.
5. OPTION EQUALS Has a Performance Cost
The EQUALS option guarantees stable sort order but adds approximately 5-10% to sort elapsed time because it constrains the sort algorithm's optimization choices. It should be specified only when the application truly requires deterministic ordering of equal-key records.
Discussion Questions
1. The INCLUDE condition for CTR filtering uses a hard-coded threshold of 1,000,000 (representing $10,000.00). If regulatory requirements changed the threshold to $5,000, what single change to the SORT control statement would implement this? What process controls should surround such changes?
2. The regional split uses OUTFIL with explicit INCLUDE conditions for each region. An alternative is OUTFIL's SPLIT operand, which distributes records among output files in rotation rather than by field value. What are the advantages and disadvantages of each approach?
3. The SUM operation aggregates all transaction amounts into a single net total per account. The downstream reconciliation system actually needs separate debit and credit totals. How would you modify the SORT control statements to produce separate summaries for debits and credits?
4. The performance comparison shows that DFSORT reduces I/O operations by 74%. Explain how DFSORT's block-mode I/O differs from COBOL's QSAM record-at-a-time I/O, and why this difference has such a large performance impact.
5. The ICETOOL OCCUR operation detects duplicate reference numbers. What other data quality checks could be implemented using ICETOOL operations, and how would you integrate these checks into the production batch schedule?
Connection to Chapter Concepts
This case study directly demonstrates several key concepts from Chapter 29:
- DFSORT SORT statement (Section: DFSORT -- Data Facility Sort): The multi-key SORT FIELDS specification, including field position, length, format, and direction parameters.
- INCLUDE and OMIT (Section: Filtering Records with INCLUDE/OMIT): The CTR filtering demonstrates compound conditions with AND/OR logic and comparison against literal values.
- OUTREC and OUTFIL (Section: Reformatting and Splitting Output): OUTREC for record reformatting and OUTFIL for routing records to multiple output files based on field values.
- SUM for aggregation (Section: Summarizing Records with SUM): The account summary demonstrates DFSORT's ability to collapse multiple records with identical keys into a single aggregated record.
- ICETOOL (Section: ICETOOL -- The Multi-Purpose Utility): The advanced example shows COUNT, COPY, and OCCUR operations for validation and multi-step processing within a single job step.