In This Chapter
- 17.1 Introduction: Strings in a Fixed-Field World
- 17.2 The STRING Statement
- 17.3 The UNSTRING Statement
- 17.4 Parsing CSV Data: A Complete Example
- 17.5 The INSPECT Statement
- 17.6 Reference Modification
- 17.7 GlobalBank Case Study: Formatting Account Statements
- 17.8 MedClaim Case Study: Parsing Diagnosis Codes
- 17.9 Common String Handling Patterns
- 17.10 String Handling for Data Interchange
- 17.11 Name Formatting and Manipulation
- 17.12 Performance Considerations
- 17.13 Defensive String Handling
- 17.14 GnuCOBOL String Handling Notes
- 17.15 Putting It All Together: Data Formatting Program
- 17.16 INSPECT CONVERTING: Deep Dive
- 17.17 Reference Modification Patterns for Fixed-Width Fields
- 17.18 Data Cleansing Patterns
- 17.19 Advanced INSPECT Patterns
- 17.20 Combining String Facilities: A Production Pattern Library
- 17.21 EBCDIC vs. ASCII: String Handling Portability Considerations
- 17.22 Chapter Summary
- Key Terms Introduced in This Chapter
Chapter 17: String Handling
"People think COBOL can't handle strings. They're wrong. COBOL handles strings the way a surgeon handles a scalpel — precisely, carefully, and with full awareness of boundaries." — Maria Chen, during a code review of a CSV parser
17.1 Introduction: Strings in a Fixed-Field World
COBOL was born in an era of fixed-length fields punched into 80-column cards. A name was always 30 characters. An address was always 40. A state code was always 2. This fixed-field heritage is both COBOL's strength (data is always where you expect it) and its perceived weakness (what about variable-length data? delimiters? parsing?).
The truth is more nuanced. COBOL has a rich set of string manipulation tools that, while different from the regex-powered libraries of Python or Java, are remarkably capable. Four facilities form the core of COBOL string handling:
- STRING — Concatenates multiple fields and literals into a single field, with delimiter control and overflow detection
- UNSTRING — Parses a delimited string into multiple fields, with delimiter tracking, counting, and tallying
- INSPECT — Examines a field character by character, counting occurrences, replacing characters, or converting between character sets
- Reference modification — Accesses substrings within a field using offset and length notation
These tools matter more than ever. Modern mainframe COBOL programs increasingly need to:
- Parse CSV files received from distributed systems
- Build XML or JSON fragments for web service responses
- Format human-readable messages from fixed-field data
- Convert between character encodings (EBCDIC ↔ ASCII)
- Extract components from composite data elements (diagnosis codes, addresses, names)
💡 The Modernization Spectrum: This chapter's theme — The Modernization Spectrum — manifests in string handling because modern data interchange formats (CSV, XML, JSON, pipe-delimited) are inherently string-oriented. COBOL programs that can parse and produce these formats become integration points between the mainframe and the modern distributed world. String handling is not just text manipulation — it is a modernization enabler.
17.2 The STRING Statement
The STRING statement concatenates (joins) data from multiple sending fields into a single receiving field.
17.2.1 Basic Syntax
STRING
{identifier-1 | literal-1}
DELIMITED BY {identifier-2 | literal-2 | SIZE}
[{identifier-3 | literal-3}
DELIMITED BY {identifier-4 | literal-4 | SIZE}]
...
INTO identifier-5
[WITH POINTER identifier-6]
[ON OVERFLOW imperative-statement-1]
[NOT ON OVERFLOW imperative-statement-2]
END-STRING
Key elements:
- Sending fields: One or more fields or literals to concatenate. Each sending field has its own DELIMITED BY clause.
- DELIMITED BY: Controls how much of the sending field is used:
  - DELIMITED BY SIZE: uses the entire field, including trailing spaces
  - DELIMITED BY SPACE: uses characters up to (but not including) the first space
  - DELIMITED BY ',': uses characters up to (but not including) the first comma
  - DELIMITED BY identifier: uses characters up to (but not including) the first occurrence of that identifier's value
- INTO: The receiving field where concatenated data is placed
- WITH POINTER: A numeric field tracking the current position in the receiving field. It must be initialized before the STRING and is incremented by 1 for each character placed.
- ON OVERFLOW: Executed if the receiving field is too small to hold all the concatenated data, or if the POINTER value is less than 1 or greater than the length of the receiving field when the statement begins. NOT ON OVERFLOW runs when the STRING completes without overflow.
17.2.2 Simple Concatenation
WORKING-STORAGE SECTION.
01 WS-FIRST-NAME PIC X(15) VALUE 'MARIA'.
01 WS-LAST-NAME PIC X(20) VALUE 'CHEN'.
01 WS-FULL-NAME PIC X(40) VALUE SPACES.
PROCEDURE DIVISION.
STRING WS-FIRST-NAME DELIMITED BY SPACE
' ' DELIMITED BY SIZE
WS-LAST-NAME DELIMITED BY SPACE
INTO WS-FULL-NAME
END-STRING.
After execution, WS-FULL-NAME contains: 'MARIA CHEN '
Note how DELIMITED BY SPACE stops at the first space in each sending field. If WS-FIRST-NAME were DELIMITED BY SIZE instead, it would contribute all 15 characters (including 10 trailing spaces), and the result would be 'MARIA' followed by 11 spaces and then 'CHEN'.
⚠️ Critical Detail: DELIMITED BY SPACE means "stop at the first space character." If the data itself contains embedded spaces (like "NEW YORK"), only "NEW" would be used. For data with embedded spaces, use DELIMITED BY SIZE or a specific delimiter.
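The standalone sketch below shows the three choices side by side for a city name with an embedded space. The program name DELIMDEMO and the sample values are hypothetical. It uses fixed-format layout and is intended to compile on its own, for example with GnuCOBOL's cobc -x. The third variant, DELIMITED BY a two-space literal, is a common idiom rather than a dedicated language feature. It stops at the first run of padding but keeps single embedded spaces, which assumes the data itself never contains two consecutive spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIMDEMO.
      * Hypothetical demo: three DELIMITED BY choices applied to a
      * value that contains an embedded space.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CITY          PIC X(15) VALUE 'NEW YORK'.
       01  WS-STATE         PIC X(02) VALUE 'NY'.
       01  WS-OUT           PIC X(40).
       01  WS-PTR           PIC 99.
       PROCEDURE DIVISION.
      * 1. DELIMITED BY SPACE stops at the embedded space
           MOVE SPACES TO WS-OUT
           MOVE 1 TO WS-PTR
           STRING WS-CITY  DELIMITED BY SPACE
                  ', '     DELIMITED BY SIZE
                  WS-STATE DELIMITED BY SIZE
                  INTO WS-OUT WITH POINTER WS-PTR
           END-STRING
           DISPLAY '[' WS-OUT(1:WS-PTR - 1) ']'
      * 2. DELIMITED BY SIZE keeps all 15 bytes, padding included
           MOVE SPACES TO WS-OUT
           MOVE 1 TO WS-PTR
           STRING WS-CITY  DELIMITED BY SIZE
                  ', '     DELIMITED BY SIZE
                  WS-STATE DELIMITED BY SIZE
                  INTO WS-OUT WITH POINTER WS-PTR
           END-STRING
           DISPLAY '[' WS-OUT(1:WS-PTR - 1) ']'
      * 3. Two spaces stop at the padding but keep 'NEW YORK' whole
           MOVE SPACES TO WS-OUT
           MOVE 1 TO WS-PTR
           STRING WS-CITY  DELIMITED BY '  '
                  ', '     DELIMITED BY SIZE
                  WS-STATE DELIMITED BY SIZE
                  INTO WS-OUT WITH POINTER WS-PTR
           END-STRING
           DISPLAY '[' WS-OUT(1:WS-PTR - 1) ']'
           STOP RUN.
```

Each STRING uses WITH POINTER so that WS-OUT(1:WS-PTR - 1) displays exactly the characters that were placed. The three lines should read [NEW, NY], then [NEW YORK       , NY], then [NEW YORK, NY].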
17.2.3 Using the POINTER Phrase
The POINTER phrase gives you control over where in the receiving field the STRING operation begins and tracks where it ends:
01 WS-OUTPUT PIC X(80) VALUE SPACES.
01 WS-PTR PIC 99 VALUE 1.
PROCEDURE DIVISION.
MOVE 1 TO WS-PTR
STRING 'ACCOUNT: ' DELIMITED BY SIZE
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING.
* WS-PTR is now 10 (next available position)
STRING WS-ACCT-NUM DELIMITED BY SPACE
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING.
* WS-PTR now points past the account number
STRING ' DATE: ' DELIMITED BY SIZE
WS-TXN-DATE DELIMITED BY SIZE
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING.
The POINTER is an index into the receiving field. It starts at the value you set (usually 1) and increments by 1 for each character placed. After the STRING completes, the POINTER value tells you the position after the last character placed. Note that STRING changes only the positions it writes. It does not space-fill the rest of the receiving field, so clear the receiving field (MOVE SPACES) before reusing it.
💡 POINTER Must Be Initialized: The STRING statement does not automatically set the POINTER to 1. If you forget to initialize it, the STRING may start placing characters at an unexpected position. It may also raise the overflow condition immediately, if the pointer value is less than 1 or greater than the length of the receiving field.
17.2.4 ON OVERFLOW Handling
The ON OVERFLOW phrase detects when the receiving field is too small:
STRING WS-LINE-1 DELIMITED BY SIZE
WS-LINE-2 DELIMITED BY SIZE
WS-LINE-3 DELIMITED BY SIZE
INTO WS-OUTPUT
ON OVERFLOW
DISPLAY 'WARNING: Output truncated'
ADD 1 TO WS-OVERFLOW-CNT
NOT ON OVERFLOW
ADD 1 TO WS-SUCCESS-CNT
END-STRING.
When overflow occurs, the STRING statement stops placing characters at the boundary of the receiving field. Characters that would have exceeded the field are lost. The pointer (if used) reflects the position after the last character that fit.
⚠️ Defensive Programming: In production code, always use ON OVERFLOW. Silent truncation is a data integrity risk. At minimum, log the overflow for debugging. At MedClaim, James Okafor requires every STRING statement to have ON OVERFLOW: "If we're truncating data, I want to know about it — not discover it six months later during an audit."
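Both behaviors can be observed directly in the sketch below. The program name OVFLDEMO and its literals are hypothetical, and the layout is fixed-format. The first STRING sends 13 characters to a 10-byte field. The second places only three characters into a field pre-filled with asterisks, which shows that STRING leaves the remaining positions untouched.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVFLDEMO.
      * Hypothetical demo: STRING into a 10-byte field that is too
      * small, showing what survives and where the pointer stops.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SHORT         PIC X(10).
       01  WS-PTR           PIC 99.
       PROCEDURE DIVISION.
           MOVE ALL '*' TO WS-SHORT
           MOVE 1 TO WS-PTR
           STRING 'ACCOUNT'  DELIMITED BY SIZE
                  '-12345'   DELIMITED BY SIZE
                  INTO WS-SHORT
                  WITH POINTER WS-PTR
               ON OVERFLOW
                   DISPLAY 'OVERFLOW: ' WS-SHORT ' PTR=' WS-PTR
               NOT ON OVERFLOW
                   DISPLAY 'OK: ' WS-SHORT
           END-STRING
      * STRING only writes the positions it fills; it never pads
      * the rest of the receiving field with spaces.
           MOVE ALL '*' TO WS-SHORT
           MOVE 1 TO WS-PTR
           STRING 'ABC' DELIMITED BY SIZE
                  INTO WS-SHORT
                  WITH POINTER WS-PTR
           END-STRING
           DISPLAY 'PARTIAL: ' WS-SHORT ' PTR=' WS-PTR
           STOP RUN.
```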
17.2.5 Building Formatted Output
STRING is ideal for building formatted messages, log entries, and output records:
01 WS-MSG PIC X(200) VALUE SPACES.
01 WS-MSG-PTR PIC 999 VALUE 1.
MOVE 1 TO WS-MSG-PTR
STRING 'TXN ' DELIMITED BY SIZE
WS-TXN-ID DELIMITED BY SPACE
' FOR ACCT ' DELIMITED BY SIZE
WS-ACCT-NUM DELIMITED BY SPACE
' AMT $' DELIMITED BY SIZE
WS-AMT-EDIT DELIMITED BY SPACE
' ON ' DELIMITED BY SIZE
WS-DATE-EDIT DELIMITED BY SIZE
INTO WS-MSG
WITH POINTER WS-MSG-PTR
ON OVERFLOW
DISPLAY 'MSG OVERFLOW'
END-STRING.
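Result: 'TXN 00012345 FOR ACCT 1234567890 AMT $1,234.56 ON 01/15/2024'. This assumes WS-AMT-EDIT holds '1,234.56' with no leading spaces. A zero-suppressed edited field such as PIC ZZZ,ZZ9.99 holds '  1,234.56' for this amount. DELIMITED BY SPACE would then stop at its first character and contribute nothing. Left-trim such a field first, or use DELIMITED BY SIZE and accept the padding.
🧪 Try It Yourself: Write a program that takes a first name, middle initial, and last name (each in separate fields) and uses STRING to produce three formats:
1. "Last, First M." (e.g., "CHEN, MARIA C.")
2. "First M. Last" (e.g., "MARIA C. CHEN")
3. "F. Last" (e.g., "M. CHEN")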
Experiment with fields that contain no middle initial (spaces) and see how DELIMITED BY SPACE behaves.
17.3 The UNSTRING Statement
If STRING joins fields together, UNSTRING splits them apart. UNSTRING parses a delimited source field into multiple receiving fields.
17.3.1 Basic Syntax
UNSTRING identifier-1
[DELIMITED BY [ALL] {identifier-2 | literal-1}
[OR [ALL] {identifier-3 | literal-2}] ...]
INTO identifier-4 [DELIMITER IN identifier-5]
[COUNT IN identifier-6]
[identifier-7 [DELIMITER IN identifier-8]
[COUNT IN identifier-9]] ...
[WITH POINTER identifier-10]
[TALLYING IN identifier-11]
[ON OVERFLOW imperative-statement-1]
[NOT ON OVERFLOW imperative-statement-2]
END-UNSTRING
This is one of COBOL's most complex statements, but its parts are logical:
- Source field (identifier-1): The string to parse
- DELIMITED BY: One or more delimiters that separate the fields. ALL means consecutive delimiters count as one (treats "a,,b" as two fields instead of three with an empty middle field).
- INTO fields: The receiving fields where parsed segments are placed
- DELIMITER IN: Captures which delimiter was actually found
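- COUNT IN: Captures how many source characters were examined for each receiving field (the segment length, excluding the delimiter), even if the receiving field was too short to hold them all
- WITH POINTER: Tracks parsing position (allows multiple UNSTRING passes)
- TALLYING IN: Counts how many receiving fields were populated
- ON OVERFLOW: Triggered when every receiving field has been filled and unexamined characters remain in the source, or when the POINTER value is less than 1 or greater than the length of the source field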
17.3.2 Simple UNSTRING — Parsing a Name
01 WS-FULL-NAME PIC X(40)
VALUE 'MARIA CHEN'.
01 WS-FIRST PIC X(15).
01 WS-LAST PIC X(20).
UNSTRING WS-FULL-NAME
DELIMITED BY SPACE
INTO WS-FIRST
WS-LAST
END-UNSTRING.
After execution: WS-FIRST = 'MARIA ', WS-LAST = 'CHEN '
17.3.3 Multiple Delimiters
UNSTRING can handle multiple delimiter types using OR:
01 WS-DATE-STRING PIC X(10) VALUE '01/15/2024'.
01 WS-MONTH PIC X(02).
01 WS-DAY PIC X(02).
01 WS-YEAR PIC X(04).
UNSTRING WS-DATE-STRING
DELIMITED BY '/' OR '-' OR '.'
INTO WS-MONTH
WS-DAY
WS-YEAR
END-UNSTRING.
This handles dates in any of these formats: "01/15/2024", "01-15-2024", "01.15.2024".
17.3.4 DELIMITER IN — Tracking Which Delimiter Was Found
01 WS-INPUT PIC X(50)
VALUE 'FIELD1,FIELD2;FIELD3'.
01 WS-FLD1 PIC X(15).
01 WS-FLD2 PIC X(15).
01 WS-FLD3 PIC X(15).
01 WS-DELIM1 PIC X(01).
01 WS-DELIM2 PIC X(01).
UNSTRING WS-INPUT
DELIMITED BY ',' OR ';'
INTO WS-FLD1 DELIMITER IN WS-DELIM1
WS-FLD2 DELIMITER IN WS-DELIM2
WS-FLD3
END-UNSTRING.
After execution:
- WS-FLD1 = 'FIELD1 '
- WS-DELIM1 = ','
- WS-FLD2 = 'FIELD2 '
- WS-DELIM2 = ';'
- WS-FLD3 = 'FIELD3 '
17.3.5 COUNT IN — How Many Characters Were Parsed
01 WS-CNT1 PIC 99.
01 WS-CNT2 PIC 99.
01 WS-CNT3 PIC 99.
UNSTRING WS-INPUT
DELIMITED BY ','
INTO WS-FLD1 COUNT IN WS-CNT1
WS-FLD2 COUNT IN WS-CNT2
WS-FLD3 COUNT IN WS-CNT3
END-UNSTRING.
COUNT IN tells you how many source characters made up each segment (excluding the delimiter). This is invaluable when you need the actual length of variable-length data. It reports the full length even when the receiving field was too short and the value was truncated.
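A detail that is easy to miss is that COUNT IN reports the length of the source segment. A value longer than its receiving field therefore shows up as a count larger than the field. The sketch below illustrates this; the program name CNTDEMO and the data are hypothetical, and the layout is fixed-format. Adding OR SPACE to the delimiters keeps the last count from including the source field's trailing padding. That choice only suits data without embedded spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CNTDEMO.
      * Hypothetical demo: COUNT IN reports the length of each source
      * segment, even when the receiving field is too short for it.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT         PIC X(20) VALUE 'AB,LONGVALUE,XYZ'.
       01  WS-FLD1          PIC X(05).
       01  WS-FLD2          PIC X(05).
       01  WS-FLD3          PIC X(05).
       01  WS-CNT1          PIC 99.
       01  WS-CNT2          PIC 99.
       01  WS-CNT3          PIC 99.
       PROCEDURE DIVISION.
      * OR SPACE stops the last field at the padding
           UNSTRING WS-INPUT DELIMITED BY ',' OR SPACE
               INTO WS-FLD1 COUNT IN WS-CNT1
                    WS-FLD2 COUNT IN WS-CNT2
                    WS-FLD3 COUNT IN WS-CNT3
           END-UNSTRING
      * LONGVALUE is truncated to LONGV, but its count is still 9
           DISPLAY WS-FLD1 ' ' WS-CNT1
           DISPLAY WS-FLD2 ' ' WS-CNT2
           DISPLAY WS-FLD3 ' ' WS-CNT3
           STOP RUN.
```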
17.3.6 TALLYING IN — Counting Parsed Fields
01 WS-FIELD-COUNT PIC 99 VALUE ZERO.
UNSTRING WS-CSV-LINE
DELIMITED BY ','
INTO WS-FLD1 WS-FLD2 WS-FLD3
WS-FLD4 WS-FLD5
TALLYING IN WS-FIELD-COUNT
END-UNSTRING.
After execution, WS-FIELD-COUNT contains the number of receiving fields that were populated. If the CSV line had only 3 fields, WS-FIELD-COUNT would be 3, and WS-FLD4/WS-FLD5 would be unchanged.
⚠️ TALLYING Must Be Initialized: TALLYING IN adds to the existing value. If you do not initialize it to zero before each UNSTRING, it will accumulate across multiple executions.
17.3.7 The ALL Phrase — Treating Consecutive Delimiters as One
Without ALL, consecutive delimiters create empty fields:
01 WS-DATA PIC X(20) VALUE 'A,,B,C'.
* DELIMITED BY ','
* Result: FLD1='A', FLD2='', FLD3='B', FLD4='C'
* DELIMITED BY ALL ','
* Result: FLD1='A', FLD2='B', FLD3='C'
ALL is essential when parsing data that may have varying amounts of whitespace:
UNSTRING WS-LINE
DELIMITED BY ALL SPACES
INTO WS-WORD1 WS-WORD2 WS-WORD3
END-UNSTRING.
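This correctly parses 'THE   QUICK    FOX' into three words, ignoring the multiple spaces between them. If the line begins with spaces, however, the first receiving field comes back empty, because the leading spaces are themselves treated as a delimiter.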
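To see the difference at run time, the sketch below parses the same data twice and displays the tally each time. The program name ALLDEMO is hypothetical. The source is exactly 6 bytes, so trailing padding plays no part, and the layout is fixed-format.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALLDEMO.
      * Hypothetical demo: the same data parsed with and without
      * ALL, using TALLYING to show how many fields were filled.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA          PIC X(06) VALUE 'A,,B,C'.
       01  WS-FLDS.
           05  WS-FLD       PIC X(03) OCCURS 4 TIMES.
       01  WS-TALLY         PIC 99.
       PROCEDURE DIVISION.
      * Without ALL: ',,' yields an empty second field
           MOVE SPACES TO WS-FLDS
           MOVE ZERO TO WS-TALLY
           UNSTRING WS-DATA DELIMITED BY ','
               INTO WS-FLD(1) WS-FLD(2) WS-FLD(3) WS-FLD(4)
               TALLYING IN WS-TALLY
           END-UNSTRING
           DISPLAY 'NO ALL: [' WS-FLDS '] TALLY=' WS-TALLY
      * With ALL: consecutive commas act as one delimiter, so the
      * fourth field is never reached
           MOVE SPACES TO WS-FLDS
           MOVE ZERO TO WS-TALLY
           UNSTRING WS-DATA DELIMITED BY ALL ','
               INTO WS-FLD(1) WS-FLD(2) WS-FLD(3) WS-FLD(4)
               TALLYING IN WS-TALLY
           END-UNSTRING
           DISPLAY 'ALL:    [' WS-FLDS '] TALLY=' WS-TALLY
           STOP RUN.
```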
17.3.8 WITH POINTER — Multi-Pass Parsing
The POINTER phrase allows you to parse a string in multiple passes, continuing from where the last UNSTRING left off:
01 WS-PARSE-PTR PIC 999 VALUE 1.
* First pass: parse the header
MOVE 1 TO WS-PARSE-PTR
UNSTRING WS-RECORD
DELIMITED BY '|'
INTO WS-REC-TYPE
WS-REC-DATE
WITH POINTER WS-PARSE-PTR
END-UNSTRING.
* Second pass: parse the detail (starting where
* first pass left off)
UNSTRING WS-RECORD
DELIMITED BY '|'
INTO WS-ACCT-NUM
WS-AMOUNT
WS-DESC
WITH POINTER WS-PARSE-PTR
END-UNSTRING.
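The self-contained version of the two-pass idea below displays the pointer between passes. This shows that the second UNSTRING starts just after the delimiter where the first one stopped. The program name PTRDEMO and the record layout are hypothetical, and the layout is fixed-format.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTRDEMO.
      * Hypothetical demo: two UNSTRING passes over one record,
      * the second resuming where the first stopped.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD        PIC X(40)
               VALUE 'H|20240115|1234567890|150.00|DEPOSIT'.
       01  WS-REC-TYPE      PIC X(01).
       01  WS-REC-DATE      PIC X(08).
       01  WS-ACCT-NUM      PIC X(10).
       01  WS-AMOUNT        PIC X(10).
       01  WS-DESC          PIC X(10).
       01  WS-PARSE-PTR     PIC 99.
       PROCEDURE DIVISION.
           MOVE 1 TO WS-PARSE-PTR
           UNSTRING WS-RECORD DELIMITED BY '|'
               INTO WS-REC-TYPE WS-REC-DATE
               WITH POINTER WS-PARSE-PTR
           END-UNSTRING
      * The pointer now sits just past the second '|'
           DISPLAY 'AFTER PASS 1 PTR=' WS-PARSE-PTR
           UNSTRING WS-RECORD DELIMITED BY '|'
               INTO WS-ACCT-NUM WS-AMOUNT WS-DESC
               WITH POINTER WS-PARSE-PTR
           END-UNSTRING
           DISPLAY 'TYPE=' WS-REC-TYPE ' DATE=' WS-REC-DATE
           DISPLAY 'ACCT=' WS-ACCT-NUM ' AMT=' WS-AMOUNT
           DISPLAY 'DESC=' WS-DESC
           STOP RUN.
```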
17.4 Parsing CSV Data: A Complete Example
One of the most common string handling tasks in modern COBOL is parsing CSV (Comma-Separated Values) files received from distributed systems. Let us build a complete CSV parser.
17.4.1 The Challenge
CSV parsing seems simple — just split on commas. But real CSV data has complications:
- Fields may be enclosed in quotes: "Smith, John",25,New York
- Embedded commas in quoted fields: "123 Main St, Apt 4B"
- Empty fields: Smith,,New York (missing middle field)
- Trailing commas: Smith,25, (empty last field)
- Varying numbers of fields per line
17.4.2 Simple CSV Parser (No Quoted Fields)
For CSV files without quoted fields, UNSTRING handles the task directly:
IDENTIFICATION DIVISION.
PROGRAM-ID. CSV-PARSE.
*================================================================
* Program: CSV-PARSE
* Purpose: Parse simple CSV file into fixed-format output
* Chapter: 17 - String Handling
* Context: GlobalBank - parse branch feed from web system
*================================================================
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT CSV-FILE ASSIGN TO CSVIN.
SELECT OUTPUT-FILE ASSIGN TO FIXOUT.
DATA DIVISION.
FILE SECTION.
FD CSV-FILE.
01 CSV-RECORD PIC X(500).
FD OUTPUT-FILE
RECORDING MODE IS F.
01 OUTPUT-RECORD.
05 OUT-ACCT-NUM PIC X(10).
05 OUT-NAME PIC X(30).
05 OUT-BALANCE PIC X(15).
05 OUT-STATUS PIC X(02).
05 OUT-BRANCH PIC X(05).
05 FILLER PIC X(18).
WORKING-STORAGE SECTION.
01 WS-FLAGS.
05 WS-EOF PIC X VALUE 'N'.
88 END-OF-FILE VALUE 'Y'.
01 WS-CSV-FIELDS.
05 WS-CSV-FLD1 PIC X(50).
05 WS-CSV-FLD2 PIC X(50).
05 WS-CSV-FLD3 PIC X(50).
05 WS-CSV-FLD4 PIC X(50).
05 WS-CSV-FLD5 PIC X(50).
01 WS-FIELD-COUNT PIC 99 VALUE ZERO.
01 WS-COUNTS.
05 WS-CNT1 PIC 999.
05 WS-CNT2 PIC 999.
05 WS-CNT3 PIC 999.
05 WS-CNT4 PIC 999.
05 WS-CNT5 PIC 999.
01 WS-RECORD-COUNT PIC 9(07) VALUE ZERO.
01 WS-ERROR-COUNT PIC 9(07) VALUE ZERO.
PROCEDURE DIVISION.
0000-MAIN.
OPEN INPUT CSV-FILE
OPEN OUTPUT OUTPUT-FILE
READ CSV-FILE
AT END SET END-OF-FILE TO TRUE
END-READ
* Skip header line
IF NOT END-OF-FILE
READ CSV-FILE
AT END SET END-OF-FILE TO TRUE
END-READ
END-IF
PERFORM 1000-PROCESS-CSV
UNTIL END-OF-FILE
DISPLAY 'CSV-PARSE COMPLETE'
DISPLAY ' RECORDS PROCESSED: ' WS-RECORD-COUNT
DISPLAY ' ERRORS: ' WS-ERROR-COUNT
CLOSE CSV-FILE OUTPUT-FILE
STOP RUN.
1000-PROCESS-CSV.
ADD 1 TO WS-RECORD-COUNT
INITIALIZE WS-CSV-FIELDS
MOVE ZERO TO WS-FIELD-COUNT
UNSTRING CSV-RECORD
DELIMITED BY ','
INTO WS-CSV-FLD1 COUNT IN WS-CNT1
WS-CSV-FLD2 COUNT IN WS-CNT2
WS-CSV-FLD3 COUNT IN WS-CNT3
WS-CSV-FLD4 COUNT IN WS-CNT4
WS-CSV-FLD5 COUNT IN WS-CNT5
TALLYING IN WS-FIELD-COUNT
ON OVERFLOW
DISPLAY 'WARNING: Too many fields in '
'record ' WS-RECORD-COUNT
END-UNSTRING
IF WS-FIELD-COUNT < 5
DISPLAY 'ERROR: Too few fields in record '
WS-RECORD-COUNT
' (found ' WS-FIELD-COUNT ')'
ADD 1 TO WS-ERROR-COUNT
ELSE
PERFORM 1100-FORMAT-OUTPUT
WRITE OUTPUT-RECORD
END-IF
READ CSV-FILE
AT END SET END-OF-FILE TO TRUE
END-READ.
1100-FORMAT-OUTPUT.
* MOVE SPACES (not INITIALIZE) so the FILLER is cleared too;
* INITIALIZE skips FILLER items
MOVE SPACES TO OUTPUT-RECORD
MOVE WS-CSV-FLD1 TO OUT-ACCT-NUM
MOVE WS-CSV-FLD2 TO OUT-NAME
MOVE WS-CSV-FLD3 TO OUT-BALANCE
MOVE WS-CSV-FLD4 TO OUT-STATUS
MOVE WS-CSV-FLD5 TO OUT-BRANCH.
🧪 Try It Yourself: Create a CSV file with the following content and run the parser:
ACCT_NUM,NAME,BALANCE,STATUS,BRANCH
1234567890,MARIA CHEN,15234.56,AC,NYC01
9876543210,DEREK WASHINGTON,8901.23,AC,CHI01
5555555555,PRIYA KAPOOR,42567.89,AC,LAX01
17.4.3 Handling Quoted CSV Fields
For CSV with quoted fields, we need a more sophisticated approach. UNSTRING cannot directly handle quoted fields because the comma inside quotes is indistinguishable from the field separator. We must pre-process the record:
1200-PARSE-QUOTED-CSV.
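* Strategy: scan character by character,
* replacing commas inside quotes with a placeholder,
* then UNSTRING on commas, then restore placeholders.
* Assumptions: X'FF' never occurs in the data, and
* doubled quotes ("") are not used. Quote marks become
* spaces, so quoted fields keep a leading space that
* needs a left-trim (see 17.9.1).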
MOVE 'N' TO WS-IN-QUOTES
PERFORM VARYING WS-POS FROM 1 BY 1
UNTIL WS-POS > FUNCTION LENGTH(CSV-RECORD)
EVALUATE TRUE
WHEN CSV-RECORD(WS-POS:1) = '"'
IF WS-IN-QUOTES = 'N'
MOVE 'Y' TO WS-IN-QUOTES
ELSE
MOVE 'N' TO WS-IN-QUOTES
END-IF
MOVE SPACE TO CSV-RECORD(WS-POS:1)
WHEN CSV-RECORD(WS-POS:1) = ','
AND WS-IN-QUOTES = 'Y'
MOVE X'FF' TO CSV-RECORD(WS-POS:1)
END-EVALUATE
END-PERFORM
* Now UNSTRING on commas (embedded commas are X'FF')
INITIALIZE WS-CSV-FIELDS
MOVE ZERO TO WS-FIELD-COUNT
UNSTRING CSV-RECORD
DELIMITED BY ','
INTO WS-CSV-FLD1 WS-CSV-FLD2 WS-CSV-FLD3
TALLYING IN WS-FIELD-COUNT
END-UNSTRING
* Restore X'FF' back to commas in each field
INSPECT WS-CSV-FLD1
REPLACING ALL X'FF' BY ','
INSPECT WS-CSV-FLD2
REPLACING ALL X'FF' BY ','
INSPECT WS-CSV-FLD3
REPLACING ALL X'FF' BY ','.
📊 Performance Note: This character-by-character scan is slower than a simple UNSTRING. For high-volume processing, consider whether quoted fields actually occur in your data. If they do not, the simple UNSTRING approach is both faster and clearer.
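If your data does use quoted fields, a different technique avoids both the placeholder byte and the leftover spaces. It copies characters one at a time into the current output field and drops the quote marks. The sketch below is hypothetical: the program name QCSVDEMO, the limit of five 20-byte fields, and the sample line are all assumptions, and the layout is fixed-format. It also treats a doubled quote inside a quoted field as one literal quote, which is the escaping convention described in RFC 4180. It is a state-machine sketch rather than a complete CSV implementation, and it does not report extra fields or overlong values.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. QCSVDEMO.
      * Hypothetical sketch: split one CSV line into at most five
      * fields, honoring quotes and "" escapes (RFC 4180 style).
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LINE          PIC X(60)
               VALUE '"Smith, John",25,"say ""hi""",,NYC'.
       01  WS-FIELDS.
           05  WS-FIELD     PIC X(20) OCCURS 5 TIMES.
       01  WS-FLD-IDX       PIC 9.
       01  WS-OUT-POS       PIC 99.
       01  WS-POS           PIC 99.
       01  WS-CHAR          PIC X.
       01  WS-NEXT          PIC X.
       01  WS-IN-QUOTES     PIC X VALUE 'N'.
           88  IN-QUOTES          VALUE 'Y'.
       PROCEDURE DIVISION.
       000-MAIN.
           MOVE SPACES TO WS-FIELDS
           MOVE 1 TO WS-FLD-IDX
           MOVE 1 TO WS-OUT-POS
           PERFORM VARYING WS-POS FROM 1 BY 1
                   UNTIL WS-POS > FUNCTION LENGTH(WS-LINE)
               MOVE WS-LINE(WS-POS:1) TO WS-CHAR
               EVALUATE TRUE
                 WHEN WS-CHAR = '"' AND IN-QUOTES
      *            Look ahead: "" inside quotes is one literal quote
                   MOVE SPACE TO WS-NEXT
                   IF WS-POS < FUNCTION LENGTH(WS-LINE)
                       MOVE WS-LINE(WS-POS + 1:1) TO WS-NEXT
                   END-IF
                   IF WS-NEXT = '"'
                       PERFORM 100-APPEND-CHAR
                       ADD 1 TO WS-POS
                   ELSE
                       MOVE 'N' TO WS-IN-QUOTES
                   END-IF
                 WHEN WS-CHAR = '"'
                   SET IN-QUOTES TO TRUE
                 WHEN WS-CHAR = ',' AND NOT IN-QUOTES
      *            Separator: move to the next field; in this sketch
      *            a sixth field would overwrite field 5
                   IF WS-FLD-IDX < 5
                       ADD 1 TO WS-FLD-IDX
                   END-IF
                   MOVE 1 TO WS-OUT-POS
                 WHEN OTHER
                   PERFORM 100-APPEND-CHAR
               END-EVALUATE
           END-PERFORM
           PERFORM VARYING WS-FLD-IDX FROM 1 BY 1
                   UNTIL WS-FLD-IDX > 5
               DISPLAY WS-FLD-IDX ': [' WS-FIELD(WS-FLD-IDX) ']'
           END-PERFORM
           STOP RUN.
       100-APPEND-CHAR.
      *    Copy WS-CHAR into the current field; characters that
      *    would not fit are dropped
           IF WS-OUT-POS <= FUNCTION LENGTH(WS-FIELD(1))
               MOVE WS-CHAR TO WS-FIELD(WS-FLD-IDX)(WS-OUT-POS:1)
               ADD 1 TO WS-OUT-POS
           END-IF.
```

For the sample line, the fields come back as Smith, John / 25 / say "hi" / (empty) / NYC, with no leading spaces to trim.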
17.5 The INSPECT Statement
INSPECT examines a field character by character, from left to right, and can count characters (TALLYING), replace them (REPLACING), or translate them through a table (CONVERTING).
17.5.1 INSPECT TALLYING — Counting Characters
INSPECT identifier-1 TALLYING
identifier-2 FOR {CHARACTERS | ALL literal-1 | LEADING literal-2}
[{BEFORE | AFTER} INITIAL {identifier-3 | literal-3}]
Count all occurrences:
01 WS-COMMA-COUNT PIC 99 VALUE ZERO.
MOVE ZERO TO WS-COMMA-COUNT
INSPECT WS-CSV-LINE
TALLYING WS-COMMA-COUNT FOR ALL ','
Count leading zeros:
01 WS-LEADING-ZEROS PIC 99 VALUE ZERO.
MOVE ZERO TO WS-LEADING-ZEROS
INSPECT WS-AMOUNT-STR
TALLYING WS-LEADING-ZEROS
FOR LEADING '0'
Count characters before a delimiter:
01 WS-CHARS-BEFORE PIC 99 VALUE ZERO.
MOVE ZERO TO WS-CHARS-BEFORE
INSPECT WS-DATA
TALLYING WS-CHARS-BEFORE
FOR CHARACTERS BEFORE INITIAL ','
Multiple counts in one INSPECT:
MOVE ZERO TO WS-DIGIT-CNT
WS-SPACE-CNT
INSPECT WS-DATA
TALLYING
WS-DIGIT-CNT FOR ALL '0' '1' '2' '3' '4'
'5' '6' '7' '8' '9'
WS-SPACE-CNT FOR ALL SPACE
💡 Counting Commas to Determine Field Count: Before parsing CSV data with UNSTRING, count the commas to determine how many fields are present. This helps you validate the record before parsing:
MOVE ZERO TO WS-COMMA-COUNT
INSPECT CSV-RECORD
TALLYING WS-COMMA-COUNT FOR ALL ','
* Expected fields = commas + 1
IF WS-COMMA-COUNT + 1 NOT = WS-EXPECTED-FIELDS
DISPLAY 'FIELD COUNT MISMATCH'
END-IF
17.5.2 INSPECT REPLACING — Character Substitution
INSPECT identifier-1 REPLACING
{CHARACTERS BY {identifier-2 | literal-1} |
ALL {identifier-3 | literal-2} BY {identifier-4 | literal-3} |
LEADING {identifier-5 | literal-4} BY {identifier-6 | literal-5} |
FIRST {identifier-7 | literal-6} BY {identifier-8 | literal-7}}
[{BEFORE | AFTER} INITIAL {identifier-9 | literal-8}]
Replace all occurrences:
* Replace all hyphens with spaces
INSPECT WS-PHONE-NUM
REPLACING ALL '-' BY ' '
* Replace all commas with pipes
INSPECT WS-DATA
REPLACING ALL ',' BY '|'
Replace leading zeros with spaces:
INSPECT WS-AMOUNT
REPLACING LEADING '0' BY SPACE
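Replace first occurrence only. With REPLACING, the replacement must be the same length as the text it replaces, which is why 'WARN ' carries a trailing space: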
INSPECT WS-TEXT
REPLACING FIRST 'ERROR' BY 'WARN '
Replace with BEFORE/AFTER boundaries:
* Replace commas with semicolons, but only
* before the first pipe character
INSPECT WS-DATA
REPLACING ALL ',' BY ';'
BEFORE INITIAL '|'
17.5.3 INSPECT CONVERTING — Character Translation
CONVERTING replaces characters according to a translation table:
INSPECT identifier-1 CONVERTING
{identifier-2 | literal-1} TO {identifier-3 | literal-2}
[{BEFORE | AFTER} INITIAL {identifier-4 | literal-3}]
Convert lowercase to uppercase:
INSPECT WS-NAME
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
Convert special characters to spaces (the TO literal is ten spaces, one for each character in the FROM literal):
INSPECT WS-DATA
CONVERTING
'!@#$%^&*()'
TO
'          '
Map letters to digits (a single-character mapping):
* This only works for single-character to
* single-character conversion
INSPECT WS-CODE
CONVERTING
'ABCDEF'
TO
'123456'
⚠️ CONVERTING Rule: The FROM and TO strings must be the same length. Each character in the FROM string is replaced by the character in the corresponding position of the TO string. This is a character-by-character mapping, not a substring replacement.
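The following sketch applies the mappings from this section to sample data. The program name CONVDEMO and the values are hypothetical, and the layout is fixed-format. Each FROM/TO pair has equal lengths, and the second INSPECT maps two different characters to spaces in a single statement.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CONVDEMO.
      * Hypothetical demo: INSPECT CONVERTING as a one-for-one
      * character translation table.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-NAME          PIC X(20) VALUE 'maria j. chen-wu'.
       01  WS-CODE          PIC X(06) VALUE 'FACADE'.
       PROCEDURE DIVISION.
      * Lowercase to uppercase: both strings are 26 characters
           INSPECT WS-NAME
               CONVERTING 'abcdefghijklmnopqrstuvwxyz'
                       TO 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
      * '-' and '.' both become spaces: 2 characters map to 2
           INSPECT WS-NAME CONVERTING '-.' TO '  '
           DISPLAY '[' WS-NAME ']'
      * Each letter maps to the digit in the same position
           INSPECT WS-CODE CONVERTING 'ABCDEF' TO '123456'
           DISPLAY '[' WS-CODE ']'
           STOP RUN.
```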
17.5.4 Combined TALLYING and REPLACING
You can combine TALLYING and REPLACING in a single INSPECT:
MOVE ZERO TO WS-SPACE-CNT
INSPECT WS-DATA
TALLYING WS-SPACE-CNT FOR ALL SPACE
REPLACING ALL SPACE BY '-'
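This counts the spaces and then replaces them. The standard defines this combined form as equivalent to an INSPECT TALLYING immediately followed by an INSPECT REPLACING on the same field. Its main benefit is therefore concision; do not count on it being faster than two separate statements.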
17.6 Reference Modification
Reference modification provides direct substring access using a parenthesized offset and length notation.
17.6.1 Syntax
identifier(offset:length)
- offset: Starting position (1-based). Can be a literal, data-name, or arithmetic expression.
- length: Number of characters to access. Can be a literal, data-name, or arithmetic expression. Optional — if omitted, everything from offset to the end of the field is referenced.
17.6.2 Basic Examples
01 WS-DATE PIC X(08) VALUE '20240115'.
01 WS-YEAR PIC X(04).
01 WS-MONTH PIC X(02).
01 WS-DAY PIC X(02).
* Extract components
MOVE WS-DATE(1:4) TO WS-YEAR *> '2024'
MOVE WS-DATE(5:2) TO WS-MONTH *> '01'
MOVE WS-DATE(7:2) TO WS-DAY *> '15'
* Format as MM/DD/YYYY
01 WS-FORMATTED-DATE PIC X(10).
STRING WS-DATE(5:2) DELIMITED BY SIZE
'/' DELIMITED BY SIZE
WS-DATE(7:2) DELIMITED BY SIZE
'/' DELIMITED BY SIZE
WS-DATE(1:4) DELIMITED BY SIZE
INTO WS-FORMATTED-DATE
END-STRING.
* Result: '01/15/2024'
17.6.3 Reference Modification with Variables
The real power of reference modification comes when the offset and length are computed at runtime:
01 WS-TEXT PIC X(100).
01 WS-POS PIC 999.
01 WS-LEN PIC 999.
01 WS-SUBSTR PIC X(50).
* Extract a variable-position substring
MOVE WS-TEXT(WS-POS:WS-LEN) TO WS-SUBSTR
17.6.4 Scanning with Reference Modification
You can build a character-by-character scanner using reference modification:
* Find the position of the first comma
MOVE ZERO TO WS-COMMA-POS
PERFORM VARYING WS-POS FROM 1 BY 1
UNTIL WS-POS > FUNCTION LENGTH(WS-DATA)
OR WS-COMMA-POS > 0
IF WS-DATA(WS-POS:1) = ','
MOVE WS-POS TO WS-COMMA-POS
END-IF
END-PERFORM
17.6.5 Right-Trimming with Reference Modification
Alphanumeric fields are padded with spaces on the right whenever shorter data is moved into them. To find the actual length of the data (excluding trailing spaces):
* Find the last non-space character
MOVE FUNCTION LENGTH(WS-NAME) TO WS-ACTUAL-LEN
PERFORM UNTIL WS-ACTUAL-LEN = 0
OR WS-NAME(WS-ACTUAL-LEN:1) NOT = SPACE
SUBTRACT 1 FROM WS-ACTUAL-LEN
END-PERFORM
* WS-ACTUAL-LEN is now the position of the last
* non-space character
💡 FUNCTION LENGTH: The intrinsic function LENGTH returns the defined length of a data item. Combined with reference modification, it enables dynamic string operations. We will explore intrinsic functions fully in Chapter 20.
17.6.6 Reference Modification vs. Substring Moves
Reference modification can appear on either side of a MOVE:
* Extract (reading)
MOVE WS-RECORD(15:10) TO WS-FIELD
* Insert (writing)
MOVE WS-NEW-VALUE TO WS-RECORD(15:10)
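⚠️ Boundary Checking: The results are undefined (and may cause an S0C4 ABEND on mainframes) in any of these cases: the offset is less than 1, the length is less than 1, or offset+length-1 exceeds the field's length. IBM Enterprise COBOL's SSRANGE compiler option adds runtime checks for reference modification, at some CPU cost. In any case, validate computed offsets and lengths:
IF WS-POS > 0
AND WS-LEN > 0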
AND WS-POS + WS-LEN - 1
<= FUNCTION LENGTH(WS-DATA)
MOVE WS-DATA(WS-POS:WS-LEN) TO WS-RESULT
ELSE
DISPLAY 'SUBSTRING OUT OF BOUNDS'
END-IF
17.7 GlobalBank Case Study: Formatting Account Statements
Maria Chen needs to generate formatted statement lines from fixed-field transaction records. Each line must look like:
01/15/2024 DEPOSIT POS #1234-NYC01 $ 1,234.56 Balance: $ 45,678.90
Here is the string handling code:
IDENTIFICATION DIVISION.
PROGRAM-ID. GBFMT01.
*================================================================
* Program: GBFMT01
* Purpose: Format transaction records for account statements
* Chapter: 17 - String Handling
* Context: GlobalBank - customer statement generation
*================================================================
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT TXN-FILE ASSIGN TO TXNIN.
SELECT STMT-FILE ASSIGN TO STMTOUT.
DATA DIVISION.
FILE SECTION.
FD TXN-FILE.
01 TXN-REC.
05 TXN-ACCT PIC X(10).
05 TXN-DATE PIC 9(08).
05 TXN-TIME PIC 9(06).
05 TXN-TYPE PIC X(01).
05 TXN-AMOUNT PIC S9(09)V99 COMP-3.
05 TXN-BRANCH PIC X(05).
05 TXN-TELLER PIC X(06).
05 TXN-DESC PIC X(30).
05 TXN-REF-NUM PIC X(10).
05 FILLER PIC X(06).
FD STMT-FILE.
01 STMT-LINE PIC X(132).
WORKING-STORAGE SECTION.
01 WS-EOF PIC X VALUE 'N'.
88 END-OF-FILE VALUE 'Y'.
01 WS-FORMATTED-DATE PIC X(10).
01 WS-TYPE-DESC PIC X(12).
01 WS-AMT-EDITED PIC $ZZZ,ZZ9.99.
01 WS-BAL-EDITED PIC $ZZZ,ZZZ,ZZ9.99.
01 WS-RUNNING-BAL PIC S9(11)V99 VALUE ZERO.
01 WS-STMT-PTR PIC 999.
PROCEDURE DIVISION.
0000-MAIN.
OPEN INPUT TXN-FILE
OPEN OUTPUT STMT-FILE
READ TXN-FILE
AT END SET END-OF-FILE TO TRUE
END-READ
PERFORM 1000-FORMAT-TRANSACTION
UNTIL END-OF-FILE
CLOSE TXN-FILE STMT-FILE
STOP RUN.
1000-FORMAT-TRANSACTION.
* Format the date: YYYYMMDD -> MM/DD/YYYY
STRING TXN-DATE(5:2) DELIMITED BY SIZE
'/' DELIMITED BY SIZE
TXN-DATE(7:2) DELIMITED BY SIZE
'/' DELIMITED BY SIZE
TXN-DATE(1:4) DELIMITED BY SIZE
INTO WS-FORMATTED-DATE
END-STRING
* Set type description
EVALUATE TXN-TYPE
WHEN 'D'
MOVE 'DEPOSIT ' TO WS-TYPE-DESC
ADD TXN-AMOUNT TO WS-RUNNING-BAL
WHEN 'W'
MOVE 'WITHDRAWAL ' TO WS-TYPE-DESC
SUBTRACT TXN-AMOUNT FROM WS-RUNNING-BAL
WHEN 'T'
MOVE 'TRANSFER ' TO WS-TYPE-DESC
ADD TXN-AMOUNT TO WS-RUNNING-BAL
WHEN 'F'
MOVE 'FEE ' TO WS-TYPE-DESC
SUBTRACT TXN-AMOUNT FROM WS-RUNNING-BAL
WHEN 'I'
MOVE 'INTEREST ' TO WS-TYPE-DESC
ADD TXN-AMOUNT TO WS-RUNNING-BAL
WHEN OTHER
MOVE 'UNKNOWN ' TO WS-TYPE-DESC
END-EVALUATE
* Format amounts
MOVE TXN-AMOUNT TO WS-AMT-EDITED
MOVE WS-RUNNING-BAL TO WS-BAL-EDITED
* Build the statement line using STRING
MOVE SPACES TO STMT-LINE
MOVE 1 TO WS-STMT-PTR
STRING WS-FORMATTED-DATE DELIMITED BY SIZE
' ' DELIMITED BY SIZE
WS-TYPE-DESC DELIMITED BY SPACE
' ' DELIMITED BY SIZE
* Two spaces: embedded single spaces (POS #1234-NYC01) survive
TXN-DESC DELIMITED BY '  '
' ' DELIMITED BY SIZE
WS-AMT-EDITED DELIMITED BY SIZE
' Balance: ' DELIMITED BY SIZE
WS-BAL-EDITED DELIMITED BY SIZE
INTO STMT-LINE
WITH POINTER WS-STMT-PTR
ON OVERFLOW
DISPLAY 'STMT LINE OVERFLOW'
END-STRING
WRITE STMT-LINE
READ TXN-FILE
AT END SET END-OF-FILE TO TRUE
END-READ.
17.8 MedClaim Case Study: Parsing Diagnosis Codes
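At MedClaim, diagnosis codes arrive in ICD-10 format. A code is typically a letter followed by two digits, a decimal point, and up to four additional characters (e.g., "E11.65" for Type 2 diabetes with hyperglycemia). A few categories, such as M1A, carry a letter in the third position, and the simplified validation below does not accept them. Sarah Kim needs to parse these codes into their components for reporting: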
01 WS-ICD10-CODE PIC X(08).
01 WS-ICD10-PARTS.
05 WS-ICD-CATEGORY PIC X(01).
05 WS-ICD-ETIOLOGY PIC X(02).
05 WS-ICD-DETAIL PIC X(04).
01 WS-DOT-POS PIC 99.
2000-PARSE-DIAGNOSIS.
* Validate basic format: letter + 2 digits + dot
* ALPHABETIC is also true for a space, so test for it
IF WS-ICD10-CODE(1:1) = SPACE
OR WS-ICD10-CODE(1:1) NOT ALPHABETIC
MOVE 'INVALID: NOT ALPHA START' TO WS-ERR-MSG
PERFORM 9000-LOG-ERROR
EXIT PARAGRAPH
END-IF
IF WS-ICD10-CODE(2:2) NOT NUMERIC
MOVE 'INVALID: NON-NUMERIC AFTER CATEGORY'
TO WS-ERR-MSG
PERFORM 9000-LOG-ERROR
EXIT PARAGRAPH
END-IF
* Extract category letter
MOVE WS-ICD10-CODE(1:1) TO WS-ICD-CATEGORY
* Extract etiology (2 digits after category)
MOVE WS-ICD10-CODE(2:2) TO WS-ICD-ETIOLOGY
* Find the decimal point
MOVE ZERO TO WS-DOT-POS
INSPECT WS-ICD10-CODE
TALLYING WS-DOT-POS
FOR CHARACTERS BEFORE INITIAL '.'
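* Without a '.', INSPECT counts every character, so a
* missing dot shows up as a tally equal to the field length
IF WS-DOT-POS = FUNCTION LENGTH(WS-ICD10-CODE)
* No dot: code has no detail portion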
MOVE SPACES TO WS-ICD-DETAIL
ELSE
* Extract detail after the dot
ADD 2 TO WS-DOT-POS
* WS-DOT-POS now points past the dot
IF WS-DOT-POS <=
FUNCTION LENGTH(WS-ICD10-CODE)
MOVE WS-ICD10-CODE(WS-DOT-POS:)
TO WS-ICD-DETAIL
ELSE
MOVE SPACES TO WS-ICD-DETAIL
END-IF
END-IF.
🔵 Sarah Kim's Note: "The ICD-10 parsing looks simple, but the devil is in edge cases. Some legacy claims use ICD-9 format (3-5 digits, no leading letter). Our parser must handle both formats, plus malformed codes from manual data entry. Defensive string handling — checking every assumption about the data — is non-negotiable in claims processing."
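One of those assumptions is easy to get wrong. When the BEFORE INITIAL character is absent, INSPECT ... FOR CHARACTERS BEFORE INITIAL '.' counts every character in the field. A missing dot therefore produces a tally equal to the field length, not zero. The sketch below shows both cases. The program name DOTDEMO is hypothetical, I10 serves only as a sample code without a dot, and the layout is fixed-format.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DOTDEMO.
      * Hypothetical demo: FOR CHARACTERS BEFORE INITIAL '.' counts
      * the whole field when the field contains no '.'.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CODE-VALUES.
           05  FILLER       PIC X(08) VALUE 'E11.65'.
           05  FILLER       PIC X(08) VALUE 'I10'.
       01  WS-CODE-TABLE REDEFINES WS-CODE-VALUES.
           05  WS-CODE      PIC X(08) OCCURS 2 TIMES.
       01  WS-IDX           PIC 9.
       01  WS-DOT-POS       PIC 99.
       PROCEDURE DIVISION.
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 2
               MOVE ZERO TO WS-DOT-POS
               INSPECT WS-CODE(WS-IDX)
                   TALLYING WS-DOT-POS
                   FOR CHARACTERS BEFORE INITIAL '.'
      *        A tally equal to the field length means "no dot"
               IF WS-DOT-POS = FUNCTION LENGTH(WS-CODE(WS-IDX))
                   DISPLAY WS-CODE(WS-IDX) ' NO DOT,  TALLY='
                       WS-DOT-POS
               ELSE
                   DISPLAY WS-CODE(WS-IDX) ' HAS DOT, TALLY='
                       WS-DOT-POS
               END-IF
           END-PERFORM
           STOP RUN.
```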
17.9 Common String Handling Patterns
17.9.1 Pattern: Left-Trim
Remove leading spaces from a field:
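01 WS-LEADING-SPACES PIC 999 VALUE ZERO.
* Work field: declare it at least as long as WS-DATA
01 WS-TRIM-WORK PIC X(200).
MOVE ZERO TO WS-LEADING-SPACES
INSPECT WS-DATA
TALLYING WS-LEADING-SPACES
FOR LEADING SPACE
* Skip all-space fields (offset LENGTH + 1 is out of range)
* and copy through a work field: a MOVE whose source and
* target overlap has undefined results
IF WS-LEADING-SPACES > 0
AND WS-LEADING-SPACES < FUNCTION LENGTH(WS-DATA)
MOVE WS-DATA(WS-LEADING-SPACES + 1:)
TO WS-TRIM-WORK
MOVE WS-TRIM-WORK TO WS-DATA
END-IF.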
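The runnable sketch below combines left-trim and right-trim. The program name TRIMDEMO and the value are hypothetical, and the layout is fixed-format. It copies through a work field because a MOVE whose sending and receiving fields overlap has undefined results. It also skips all-space input, since the offset LENGTH + 1 would be out of range.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRIMDEMO.
      * Hypothetical demo: left-trim through a work field (no
      * overlapping MOVE), then find the right-trimmed length.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA          PIC X(20) VALUE '   MARIA CHEN'.
       01  WS-WORK          PIC X(20).
       01  WS-LEADING       PIC 99.
       01  WS-ACTUAL-LEN    PIC 99.
       PROCEDURE DIVISION.
           MOVE ZERO TO WS-LEADING
           INSPECT WS-DATA TALLYING WS-LEADING FOR LEADING SPACE
      * Shift left only when there is something to keep
           IF WS-LEADING > 0
              AND WS-LEADING < FUNCTION LENGTH(WS-DATA)
               MOVE WS-DATA(WS-LEADING + 1:) TO WS-WORK
               MOVE WS-WORK TO WS-DATA
           END-IF
      * Right-trim: back up from the end past trailing spaces
           MOVE FUNCTION LENGTH(WS-DATA) TO WS-ACTUAL-LEN
           PERFORM UNTIL WS-ACTUAL-LEN = 0
                      OR WS-DATA(WS-ACTUAL-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-ACTUAL-LEN
           END-PERFORM
           IF WS-ACTUAL-LEN > 0
               DISPLAY '[' WS-DATA(1:WS-ACTUAL-LEN) '] LEN='
                   WS-ACTUAL-LEN
           END-IF
           STOP RUN.
```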
17.9.2 Pattern: Right-Trim (Find Actual Length)
01 WS-ACTUAL-LEN PIC 999.
MOVE FUNCTION LENGTH(WS-DATA) TO WS-ACTUAL-LEN
PERFORM UNTIL WS-ACTUAL-LEN = 0
OR WS-DATA(WS-ACTUAL-LEN:1) NOT = SPACE
SUBTRACT 1 FROM WS-ACTUAL-LEN
END-PERFORM.
* WS-ACTUAL-LEN is the length of data without
* trailing spaces. Zero means the field is all spaces.
17.9.3 Pattern: Center a String
01 WS-CENTERED PIC X(80).
01 WS-PAD-LEN PIC 99.
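* Assume WS-ACTUAL-LEN has been computed; skip empty or
* oversized data, which would give an invalid length/offset
IF WS-ACTUAL-LEN > 0 AND WS-ACTUAL-LEN <= 80
COMPUTE WS-PAD-LEN =
(80 - WS-ACTUAL-LEN) / 2
MOVE SPACES TO WS-CENTERED
MOVE WS-DATA(1:WS-ACTUAL-LEN)
TO WS-CENTERED(WS-PAD-LEN + 1:WS-ACTUAL-LEN)
END-IF.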
17.9.4 Pattern: Replace Substring
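INSPECT REPLACING works only when the replacement is the same length as the text it replaces. When the lengths differ, rebuild the field with STRING and reference modification. This assumes WS-FOUND-POS already holds the position of the three-character "OLD" and that WS-RESULT is long enough for the longer result:
* Replace the 3-character "OLD" at WS-FOUND-POS with "NEWER"
MOVE SPACES TO WS-RESULT
MOVE 1 TO WS-PTR
* Text before the match (none when the match starts at 1)
IF WS-FOUND-POS > 1
STRING WS-DATA(1:WS-FOUND-POS - 1)
DELIMITED BY SIZE
INTO WS-RESULT
WITH POINTER WS-PTR
END-STRING
END-IF
STRING 'NEWER' DELIMITED BY SIZE
INTO WS-RESULT
WITH POINTER WS-PTR
END-STRING
* Text after the match, if any remains
IF WS-FOUND-POS + 3 <= FUNCTION LENGTH(WS-DATA)
STRING WS-DATA(WS-FOUND-POS + 3:)
DELIMITED BY SIZE
INTO WS-RESULT
WITH POINTER WS-PTR
END-STRING
END-IF.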
17.9.5 Pattern: Token-by-Token Parsing
For parsing a string one token at a time (useful for command strings or free-form input):
01 WS-INPUT PIC X(200).
01 WS-TOKEN PIC X(50).
01 WS-TOKEN-PTR PIC 999 VALUE 1.
01 WS-TOKEN-CNT PIC 99 VALUE ZERO.
MOVE 1 TO WS-TOKEN-PTR
PERFORM UNTIL WS-TOKEN-PTR >
FUNCTION LENGTH(WS-INPUT)
INITIALIZE WS-TOKEN
MOVE ZERO TO WS-TOKEN-CNT
UNSTRING WS-INPUT
DELIMITED BY ALL SPACES
INTO WS-TOKEN
WITH POINTER WS-TOKEN-PTR
TALLYING IN WS-TOKEN-CNT
END-UNSTRING
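* TALLYING counts an empty field too, so test the token
* itself (leading spaces yield one empty token)
IF WS-TOKEN NOT = SPACES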
PERFORM 2000-PROCESS-TOKEN
END-IF
END-PERFORM.
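A standalone version of the loop follows. The program name TOKDEMO and the input are hypothetical, and the layout is fixed-format. It tests the token itself rather than the tally, because UNSTRING counts a receiving field as acted upon even when leading spaces leave it empty.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TOKDEMO.
      * Hypothetical demo: split free-form input into words with a
      * pointer-driven UNSTRING loop, skipping empty tokens.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT         PIC X(40)
               VALUE '  LIST   ACCT  1234567890'.
       01  WS-TOKEN         PIC X(20).
       01  WS-TOKEN-PTR     PIC 99.
       01  WS-TOKEN-NUM     PIC 99 VALUE ZERO.
       PROCEDURE DIVISION.
           MOVE 1 TO WS-TOKEN-PTR
      * Each pass consumes one token plus the run of spaces after
      * it, so the pointer always advances and the loop ends
           PERFORM UNTIL WS-TOKEN-PTR > FUNCTION LENGTH(WS-INPUT)
               MOVE SPACES TO WS-TOKEN
               UNSTRING WS-INPUT
                   DELIMITED BY ALL SPACE
                   INTO WS-TOKEN
                   WITH POINTER WS-TOKEN-PTR
               END-UNSTRING
               IF WS-TOKEN NOT = SPACES
                   ADD 1 TO WS-TOKEN-NUM
                   DISPLAY 'TOKEN ' WS-TOKEN-NUM ': ' WS-TOKEN
               END-IF
           END-PERFORM
           STOP RUN.
```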
17.9.6 Pattern: Building Pipe-Delimited Output
For producing output that distributed systems can consume:
01 WS-OUTPUT PIC X(500) VALUE SPACES.
01 WS-OUT-PTR PIC 999 VALUE 1.
MOVE 1 TO WS-OUT-PTR
STRING WS-ACCT-NUM DELIMITED BY SPACE
'|' DELIMITED BY SIZE
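* Two spaces keep a name such as 'MARIA CHEN' whole
WS-NAME DELIMITED BY '  '
'|' DELIMITED BY SIZE
WS-DATE DELIMITED BY SIZE
'|' DELIMITED BY SIZE
* Assumes WS-AMOUNT-EDIT is left-justified (no leading spaces)
WS-AMOUNT-EDIT DELIMITED BY SPACE
'|' DELIMITED BY SIZE
WS-STATUS DELIMITED BY SPACE
INTO WS-OUTPUT
WITH POINTER WS-OUT-PTR
ON OVERFLOW
DISPLAY 'PIPE RECORD OVERFLOW'
END-STRING.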
17.10 String Handling for Data Interchange
17.10.1 Building XML Fragments
COBOL programs increasingly need to produce XML for web services:
01 WS-XML-BUFFER PIC X(2000) VALUE SPACES.
01 WS-XML-PTR PIC 9(04) VALUE 1.
MOVE 1 TO WS-XML-PTR
STRING '<transaction>' DELIMITED BY SIZE
'<account>' DELIMITED BY SIZE
WS-ACCT-NUM DELIMITED BY SPACE
'</account>' DELIMITED BY SIZE
'<date>' DELIMITED BY SIZE
WS-TXN-DATE DELIMITED BY SIZE
'</date>' DELIMITED BY SIZE
'<amount>' DELIMITED BY SIZE
WS-AMOUNT-STR DELIMITED BY SPACE
'</amount>' DELIMITED BY SIZE
'<type>' DELIMITED BY SIZE
WS-TXN-TYPE DELIMITED BY SPACE
'</type>' DELIMITED BY SIZE
'</transaction>' DELIMITED BY SIZE
INTO WS-XML-BUFFER
WITH POINTER WS-XML-PTR
ON OVERFLOW
DISPLAY 'XML BUFFER OVERFLOW'
END-STRING.
🔗 Cross-Reference: Chapter 39 (Real-Time Integration) and Chapter 40 (COBOL and the Modern Stack) explore XML and JSON processing in depth, including the XML GENERATE and JSON GENERATE statements available in modern Enterprise COBOL.
17.10.2 Parsing Key-Value Pairs
Configuration files or API responses sometimes contain key=value pairs:
* Input: 'TIMEOUT=30|RETRY=3|HOST=MAINFRAME01'
01 WS-CONFIG PIC X(200).
01 WS-PAIR PIC X(50).
01 WS-KEY PIC X(20).
01 WS-VALUE PIC X(30).
01 WS-CFG-PTR PIC 999 VALUE 1.
01 WS-PAIR-CNT PIC 99 VALUE ZERO.
MOVE 1 TO WS-CFG-PTR
PERFORM UNTIL WS-CFG-PTR >
FUNCTION LENGTH(WS-CONFIG)
INITIALIZE WS-PAIR WS-KEY WS-VALUE
MOVE ZERO TO WS-PAIR-CNT
UNSTRING WS-CONFIG
DELIMITED BY '|'
INTO WS-PAIR
WITH POINTER WS-CFG-PTR
TALLYING IN WS-PAIR-CNT
END-UNSTRING
IF WS-PAIR-CNT > 0 AND WS-PAIR NOT = SPACES
UNSTRING WS-PAIR
DELIMITED BY '='
INTO WS-KEY WS-VALUE
END-UNSTRING
PERFORM 3000-APPLY-CONFIG
END-IF
END-PERFORM.
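The following self-contained version of the two-pass parse uses the sample configuration string from the comment above. The program name and field sizes are hypothetical. The outer UNSTRING walks the string pair by pair with a pointer, and the inner UNSTRING splits each pair at '='.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KVDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CONFIG        PIC X(60)
               VALUE 'TIMEOUT=30|RETRY=3|HOST=MAINFRAME01'.
       01  WS-PAIR          PIC X(50).
       01  WS-KEY           PIC X(20).
       01  WS-VALUE         PIC X(30).
       01  WS-CFG-PTR       PIC 99.
       01  WS-PAIR-CNT      PIC 99.
       PROCEDURE DIVISION.
           MOVE 1 TO WS-CFG-PTR
           PERFORM UNTIL WS-CFG-PTR > FUNCTION LENGTH(WS-CONFIG)
               MOVE SPACES TO WS-PAIR WS-KEY WS-VALUE
               MOVE ZERO TO WS-PAIR-CNT
      *> Pass 1: take the next pair, up to '|' or end of field.
               UNSTRING WS-CONFIG
                   DELIMITED BY '|'
                   INTO WS-PAIR
                   WITH POINTER WS-CFG-PTR
                   TALLYING IN WS-PAIR-CNT
               END-UNSTRING
      *> Pass 2: split the pair at '='; skip empty pairs.
               IF WS-PAIR-CNT > 0 AND WS-PAIR NOT = SPACES
                   UNSTRING WS-PAIR
                       DELIMITED BY '='
                       INTO WS-KEY WS-VALUE
                   END-UNSTRING
                   DISPLAY 'KEY=' WS-KEY 'VALUE=' WS-VALUE
               END-IF
           END-PERFORM
           STOP RUN.
```

The program prints one line per pair (TIMEOUT/30, RETRY/3, HOST/MAINFRAME01). The last pair has no closing '|', so it runs to the end of WS-CONFIG, and its trailing spaces are simply truncated by the receiving fields.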
17.11 Name Formatting and Manipulation
Name formatting is a surprisingly common and tricky string handling task. Names have varied structures and cultural conventions.
17.11.1 Name Inversion: "First Last" to "Last, First"
01 WS-DISPLAY-NAME PIC X(40) VALUE 'MARIA CHEN'.
01 WS-FORMAL-NAME PIC X(40).
01 WS-FIRST PIC X(20).
01 WS-LAST PIC X(20).
INITIALIZE WS-FIRST WS-LAST
UNSTRING WS-DISPLAY-NAME
DELIMITED BY SPACE
INTO WS-FIRST WS-LAST
END-UNSTRING
MOVE SPACES TO WS-FORMAL-NAME
STRING WS-LAST DELIMITED BY SPACE
', ' DELIMITED BY SIZE
WS-FIRST DELIMITED BY SPACE
INTO WS-FORMAL-NAME
END-STRING.
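* Result: 'CHEN, MARIA'
* Limitation: for 'DEREK JAMES WASHINGTON', WS-LAST receives
* 'JAMES' and the surname is never examined.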
17.11.2 Extracting Initials
01 WS-FULL-NAME PIC X(60)
VALUE 'DEREK JAMES WASHINGTON'.
01 WS-INITIALS PIC X(10) VALUE SPACES.
01 WS-WORD PIC X(20).
01 WS-NAME-PTR PIC 999 VALUE 1.
01 WS-INIT-PTR PIC 99 VALUE 1.
01 WS-TALLY-CNT PIC 99 VALUE ZERO.
MOVE 1 TO WS-NAME-PTR
MOVE 1 TO WS-INIT-PTR
PERFORM UNTIL WS-NAME-PTR >
FUNCTION LENGTH(WS-FULL-NAME)
INITIALIZE WS-WORD
MOVE ZERO TO WS-TALLY-CNT
UNSTRING WS-FULL-NAME
DELIMITED BY ALL SPACES
INTO WS-WORD
WITH POINTER WS-NAME-PTR
TALLYING IN WS-TALLY-CNT
END-UNSTRING
IF WS-TALLY-CNT > 0
MOVE WS-WORD(1:1)
TO WS-INITIALS(WS-INIT-PTR:1)
ADD 1 TO WS-INIT-PTR
END-IF
END-PERFORM.
* Result: WS-INITIALS = 'DJW'
17.12 Performance Considerations
17.12.1 STRING and UNSTRING vs. MOVE
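For a simple field-to-field copy, MOVE is generally faster than STRING. The two are also not equivalent: MOVE pads the rest of the receiving field with spaces, while STRING leaves any positions it does not fill unchanged: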
* Slower:
STRING WS-SOURCE DELIMITED BY SIZE
INTO WS-TARGET
END-STRING
* Faster:
MOVE WS-SOURCE TO WS-TARGET
Use STRING only when you need concatenation, delimiter handling, or pointer tracking.
17.12.2 INSPECT vs. Reference Modification Loop
For counting a single character, INSPECT is typically faster than a manual loop because it may be implemented with hardware string instructions on mainframes:
* Faster (may use hardware instructions):
* INSPECT TALLYING adds to the counter, so zero it first
MOVE ZERO TO WS-COUNT
INSPECT WS-DATA
TALLYING WS-COUNT FOR ALL ','
* Slower (always character-by-character):
MOVE ZERO TO WS-COUNT
PERFORM VARYING WS-POS FROM 1 BY 1
UNTIL WS-POS > FUNCTION LENGTH(WS-DATA)
IF WS-DATA(WS-POS:1) = ','
ADD 1 TO WS-COUNT
END-IF
END-PERFORM
17.12.3 Minimizing STRING Operations in Loops
If you are building the same type of output line millions of times, consider pre-building a template and using MOVE with reference modification instead of STRING:
* Template approach (faster in tight loops):
MOVE WS-TEMPLATE TO WS-OUTPUT
MOVE WS-ACCT-NUM TO WS-OUTPUT(1:10)
MOVE WS-DATE-EDIT TO WS-OUTPUT(15:10)
MOVE WS-AMT-EDIT TO WS-OUTPUT(30:14)
* STRING approach (slower but more readable):
STRING WS-ACCT-NUM DELIMITED BY SPACE
...
INTO WS-OUTPUT
📊 When Performance Matters: For batch programs processing millions of records, the performance difference between STRING and direct MOVE can be significant. Maria Chen's rule: "Use STRING for readability in low-volume paths. Use MOVE with reference modification for performance in high-volume inner loops."
17.13 Defensive String Handling
String operations are a common source of production ABENDs. Defensive programming is essential.
17.13.1 Always Initialize Receiving Fields
* Before STRING:
MOVE SPACES TO WS-OUTPUT
MOVE 1 TO WS-PTR
* Before UNSTRING:
INITIALIZE WS-FLD1 WS-FLD2 WS-FLD3
MOVE ZERO TO WS-TALLY-CNT
17.13.2 Always Use ON OVERFLOW
STRING ... INTO WS-OUTPUT
ON OVERFLOW
PERFORM 9100-LOG-OVERFLOW
END-STRING
UNSTRING ... INTO WS-FLD1 WS-FLD2
ON OVERFLOW
PERFORM 9200-LOG-TOO-MANY-FIELDS
END-UNSTRING
17.13.3 Validate Reference Modification Bounds
IF WS-POS > 0
AND WS-LEN > 0
AND WS-POS <= FUNCTION LENGTH(WS-DATA)
AND WS-POS + WS-LEN - 1
<= FUNCTION LENGTH(WS-DATA)
MOVE WS-DATA(WS-POS:WS-LEN) TO WS-RESULT
ELSE
MOVE SPACES TO WS-RESULT
PERFORM 9300-LOG-BOUNDS-ERROR
END-IF
17.13.4 Handle Empty Input
* Check for empty/blank input before parsing
IF WS-INPUT = SPACES
DISPLAY 'WARNING: Empty input record'
ADD 1 TO WS-EMPTY-CNT
ELSE
PERFORM 2000-PARSE-RECORD
END-IF
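⚠️ The Blank Record Trap: UNSTRING does not reject an entirely blank record. With DELIMITED BY SPACE (without ALL), every receiving field hits a delimiter immediately and is set to spaces, so TALLYING can report a full set of fields. With DELIMITED BY ',', the whole blank record moves into the first receiving field. Either way, the blank record can pass for real data with empty values. Always check for blank records before parsing.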
17.14 GnuCOBOL String Handling Notes
For students using GnuCOBOL:
- All four facilities are supported: STRING, UNSTRING, INSPECT, and reference modification all work in GnuCOBOL.
- EBCDIC vs. ASCII: On ASCII systems, character ordering differs from EBCDIC. This affects comparisons, range tests (such as an 88-level VALUE 'A' THRU 'Z'), and sort order. INSPECT CONVERTING with explicit character lists (as shown in this chapter) is portable.
- FUNCTION LENGTH: Fully supported and recommended for portable code.
- Performance: GnuCOBOL's string operations are adequate for development and testing. z/Architecture mainframes provide hardware string instructions (such as CLCL, MVCL, and TRT) that the Enterprise COBOL compiler can use to speed up INSPECT and reference modification at production scale.
🧪 Try It Yourself: Write a GnuCOBOL program that:
1. Reads a pipe-delimited file (NAME|CITY|STATE|ZIP)
2. Uses UNSTRING to parse each record
3. Uses STRING to produce formatted output ("NAME, CITY, STATE ZIP")
4. Uses INSPECT CONVERTING to convert names to uppercase
5. Uses reference modification to extract the first 3 digits of the ZIP code
17.15 Putting It All Together: Data Formatting Program
Here is a complete program that demonstrates all four string handling facilities working together — parsing a delimited input file, transforming the data, and producing formatted output:
IDENTIFICATION DIVISION.
PROGRAM-ID. STRFMT01.
*================================================================
* Program: STRFMT01
* Purpose: Demonstrate STRING, UNSTRING, INSPECT, and
* reference modification working together
* Input: Pipe-delimited customer records
* Output: Formatted customer report lines
*================================================================
ENVIRONMENT DIVISION.
INPUT-OUTPUT SECTION.
FILE-CONTROL.
SELECT INPUT-FILE ASSIGN TO INFILE.
SELECT OUTPUT-FILE ASSIGN TO OUTFILE.
DATA DIVISION.
FILE SECTION.
FD INPUT-FILE.
01 INPUT-REC PIC X(300).
FD OUTPUT-FILE.
01 OUTPUT-REC PIC X(132).
WORKING-STORAGE SECTION.
01 WS-EOF PIC X VALUE 'N'.
88 END-OF-FILE VALUE 'Y'.
01 WS-PARSED.
05 WS-CUST-ID PIC X(10).
05 WS-FIRST-NAME PIC X(20).
05 WS-LAST-NAME PIC X(25).
05 WS-STREET PIC X(40).
05 WS-CITY PIC X(25).
05 WS-STATE PIC X(02).
05 WS-ZIP PIC X(10).
05 WS-PHONE PIC X(15).
01 WS-FIELD-COUNT PIC 99 VALUE ZERO.
01 WS-PIPE-COUNT PIC 99 VALUE ZERO.
01 WS-FORMATTED-NAME PIC X(50).
01 WS-FORMATTED-ADDR PIC X(80).
01 WS-FORMATTED-PHONE PIC X(20).
01 WS-PTR PIC 999.
01 WS-RECORD-CNT PIC 9(07) VALUE ZERO.
01 WS-ERROR-CNT PIC 9(07) VALUE ZERO.
PROCEDURE DIVISION.
0000-MAIN.
OPEN INPUT INPUT-FILE
OPEN OUTPUT OUTPUT-FILE
READ INPUT-FILE
AT END SET END-OF-FILE TO TRUE
END-READ
PERFORM 1000-PROCESS-RECORDS
UNTIL END-OF-FILE
DISPLAY 'STRFMT01 COMPLETE'
DISPLAY ' PROCESSED: ' WS-RECORD-CNT
DISPLAY ' ERRORS: ' WS-ERROR-CNT
CLOSE INPUT-FILE OUTPUT-FILE
STOP RUN.
1000-PROCESS-RECORDS.
ADD 1 TO WS-RECORD-CNT
PERFORM 1100-PARSE-INPUT
PERFORM 1200-VALIDATE-FIELDS
PERFORM 1300-FORMAT-OUTPUT
WRITE OUTPUT-REC
READ INPUT-FILE
AT END SET END-OF-FILE TO TRUE
END-READ.
1100-PARSE-INPUT.
* First, verify field count by counting pipes
MOVE ZERO TO WS-PIPE-COUNT
INSPECT INPUT-REC
TALLYING WS-PIPE-COUNT FOR ALL '|'
INITIALIZE WS-PARSED
MOVE ZERO TO WS-FIELD-COUNT
UNSTRING INPUT-REC
DELIMITED BY '|'
INTO WS-CUST-ID
WS-FIRST-NAME
WS-LAST-NAME
WS-STREET
WS-CITY
WS-STATE
WS-ZIP
WS-PHONE
TALLYING IN WS-FIELD-COUNT
ON OVERFLOW
DISPLAY 'TOO MANY FIELDS REC '
WS-RECORD-CNT
END-UNSTRING.
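1200-VALIDATE-FIELDS.
* Expect exactly 7 pipes (8 fields); flag any other shape
IF WS-PIPE-COUNT NOT = 7
DISPLAY 'BAD FIELD COUNT REC ' WS-RECORD-CNT
ADD 1 TO WS-ERROR-CNT
END-IF
* Convert names to uppercase
INSPECT WS-FIRST-NAME
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
INSPECT WS-LAST-NAME
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
* Replace hyphens in the phone number with spaces
* (INSPECT substitutes characters; it cannot remove them)
INSPECT WS-PHONE
REPLACING ALL '-' BY ' '.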
1300-FORMAT-OUTPUT.
MOVE SPACES TO OUTPUT-REC
* Format name: "LAST, FIRST"
* Two-space delimiters keep multi-word values such as
* "VAN DYKE" or "NEW YORK" intact
MOVE SPACES TO WS-FORMATTED-NAME
STRING WS-LAST-NAME DELIMITED BY '  '
', ' DELIMITED BY SIZE
WS-FIRST-NAME DELIMITED BY '  '
INTO WS-FORMATTED-NAME
END-STRING
* Format address: "CITY, STATE ZIP"
MOVE SPACES TO WS-FORMATTED-ADDR
STRING WS-CITY DELIMITED BY '  '
', ' DELIMITED BY SIZE
WS-STATE DELIMITED BY SIZE
' ' DELIMITED BY SIZE
WS-ZIP DELIMITED BY SPACE
INTO WS-FORMATTED-ADDR
END-STRING
* Build output line. The formatted fields contain ", " and
* embedded spaces, so DELIMITED BY SPACE would cut them at
* the first space; delimit them by two spaces instead
MOVE 1 TO WS-PTR
STRING WS-CUST-ID DELIMITED BY SPACE
' ' DELIMITED BY SIZE
WS-FORMATTED-NAME DELIMITED BY '  '
' ' DELIMITED BY SIZE
WS-FORMATTED-ADDR DELIMITED BY '  '
INTO OUTPUT-REC
WITH POINTER WS-PTR
ON OVERFLOW
DISPLAY 'OUTPUT OVERFLOW REC '
WS-RECORD-CNT
ADD 1 TO WS-ERROR-CNT
END-STRING.
17.16 INSPECT CONVERTING: Deep Dive
INSPECT CONVERTING deserves a thorough treatment because it is one of COBOL's most versatile character-transformation tools, and its behavior in edge cases is not always intuitive.
17.16.1 EBCDIC-to-ASCII Translation
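On mainframes that must exchange data with distributed systems, character set conversion is a frequent requirement. INSPECT CONVERTING can express the mapping, but only as part of a complete translation. Converting a handful of bytes while leaving the rest in EBCDIC produces a field that is valid in neither code set. The fragment below shows six entries of such a mapping, assuming EBCDIC code page 037. The ¢ sign has no ASCII equivalent, so it maps to a space:
*--- Six entries from an EBCDIC (CP037) to ASCII mapping
*--- In practice these pairs belong in a full 256-byte table
INSPECT WS-DATA
CONVERTING
X'4A4B4C4D4E4F' *> EBCDIC: ¢.<(+|
TO
X'202E3C282B7C' *> ASCII equivalents
For full character set translation, most shops use a 256-byte translation table: a FROM string containing every EBCDIC byte value and a TO string containing the corresponding ASCII byte. Short explicit lists remain the clearer choice for targeted substitutions within a single code set, such as replacing known problem characters with spaces before the data is translated.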
17.16.2 Masking Sensitive Data
MedClaim must mask Social Security numbers in audit reports. INSPECT CONVERTING provides a one-line solution:
01 WS-SSN PIC X(09) VALUE '123456789'.
01 WS-MASKED-SSN PIC X(09).
MOVE WS-SSN TO WS-MASKED-SSN
INSPECT WS-MASKED-SSN
CONVERTING
'0123456789'
TO
'**********'
*> Result: WS-MASKED-SSN = '*********'
A tempting first attempt at partial masking (showing only the last four digits) applies CONVERTING to a reference-modified part of the field:
MOVE WS-SSN TO WS-MASKED-SSN
INSPECT WS-MASKED-SSN(1:5)
CONVERTING
'0123456789'
TO
'*****XXXXX'
*> Wait — this does not work as intended.
*> CONVERTING maps character-to-character:
*> '0'->'*', '1'->'*', '2'->'*', '3'->'*', '4'->'*'
*> '5'->'X', '6'->'X', '7'->'X', '8'->'X', '9'->'X'
*--- Correct approach: mask first 5, leave last 4 ---
MOVE WS-SSN TO WS-MASKED-SSN
MOVE '*****' TO WS-MASKED-SSN(1:5)
*> Result: '*****6789'
The lesson: INSPECT CONVERTING maps each character in the FROM string to the corresponding character in the TO string. It is a transliteration, not a substring replacement. For partial masking, direct reference modification is simpler and more correct.
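The three behaviors described above can be checked side by side with this small GnuCOBOL program. It is a sketch, and the program name is hypothetical. It should display `*********`, then the mistaken result `****X6789`, then the correct `*****6789`.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MASKDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SSN           PIC X(09) VALUE '123456789'.
       01  WS-MASKED-SSN    PIC X(09).
       PROCEDURE DIVISION.
      *> Full mask: every digit maps to '*'.
           MOVE WS-SSN TO WS-MASKED-SSN
           INSPECT WS-MASKED-SSN
               CONVERTING '0123456789' TO '**********'
           DISPLAY 'FULL:    ' WS-MASKED-SSN
      *> Mistaken partial mask: maps by digit VALUE, not position.
           MOVE WS-SSN TO WS-MASKED-SSN
           INSPECT WS-MASKED-SSN(1:5)
               CONVERTING '0123456789' TO '*****XXXXX'
           DISPLAY 'WRONG:   ' WS-MASKED-SSN
      *> Correct partial mask: overwrite positions 1-5 directly.
           MOVE WS-SSN TO WS-MASKED-SSN
           MOVE '*****' TO WS-MASKED-SSN(1:5)
           DISPLAY 'PARTIAL: ' WS-MASKED-SSN
           STOP RUN.
```

In the mistaken version, the digit 5 in position 5 becomes 'X' because '5' is the sixth character of the FROM string, not because of where it sits in the SSN.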
17.16.3 Normalizing Input Data
When GlobalBank receives customer data from external partners, names often contain mixed case, accidental digits, and stray punctuation. A normalization pipeline uses multiple INSPECT statements:
*--- Step 1: Convert to uppercase ---
INSPECT WS-CUSTOMER-NAME
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
*--- Step 2: Remove digits from name fields ---
INSPECT WS-CUSTOMER-NAME
REPLACING ALL '0' BY SPACE
ALL '1' BY SPACE
ALL '2' BY SPACE
ALL '3' BY SPACE
ALL '4' BY SPACE
ALL '5' BY SPACE
ALL '6' BY SPACE
ALL '7' BY SPACE
ALL '8' BY SPACE
ALL '9' BY SPACE
*--- Step 3: Remove punctuation except hyphen and apostrophe
*--- (for names like O'BRIEN and SMITH-JONES) ---
INSPECT WS-CUSTOMER-NAME
CONVERTING
'!@#$%^&*()+=[]{}|;:",.<>?/~`'
TO
'                            ' *> 28 spaces: same length as FROM
Each INSPECT pass handles one aspect of normalization. While this could be combined into fewer statements, the three-pass approach is more readable and maintainable — each step's purpose is clear from its comment.
17.17 Reference Modification Patterns for Fixed-Width Fields
Mainframe data is overwhelmingly fixed-width. Reference modification is the primary tool for working with positional data.
17.17.1 Parsing Fixed-Width Records Without UNSTRING
When incoming records have fixed field positions (no delimiters), reference modification is more efficient and more appropriate than UNSTRING:
*--- Input record layout (documented but not declared) ---
*> Positions 1-10: Account number
*> Positions 11-18: Transaction date (YYYYMMDD)
*> Positions 19-19: Transaction type
*> Positions 20-30: Amount (signed, implied V99)
*> Positions 31-60: Description
*> Positions 61-65: Branch code
01 WS-RAW-RECORD PIC X(100).
01 WS-PARSED-FIELDS.
05 WS-ACCT PIC X(10).
05 WS-DATE PIC X(08).
05 WS-TYPE PIC X(01).
05 WS-AMT-RAW PIC X(11).
05 WS-DESC PIC X(30).
05 WS-BRANCH PIC X(05).
*--- Parse by position ---
MOVE WS-RAW-RECORD(1:10) TO WS-ACCT
MOVE WS-RAW-RECORD(11:8) TO WS-DATE
MOVE WS-RAW-RECORD(19:1) TO WS-TYPE
MOVE WS-RAW-RECORD(20:11) TO WS-AMT-RAW
MOVE WS-RAW-RECORD(31:30) TO WS-DESC
MOVE WS-RAW-RECORD(61:5) TO WS-BRANCH
This is faster than UNSTRING because there is no delimiter scanning. Each field is extracted directly by its known position. This is the idiomatic COBOL approach for fixed-width data.
17.17.2 Sliding Window Pattern
When searching for a pattern within a field, a sliding window using reference modification is effective:
*--- Find the first occurrence of 'ERROR' in a 500-byte field ---
01 WS-LOG-LINE PIC X(500).
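01 WS-SEARCH-POS PIC 999 VALUE ZERO.
01 WS-FOUND-AT PIC 999 VALUE ZERO.
01 WS-FOUND PIC X VALUE 'N'.
88 PATTERN-FOUND VALUE 'Y'.
PERFORM VARYING WS-SEARCH-POS FROM 1 BY 1
UNTIL WS-SEARCH-POS > 496
OR PATTERN-FOUND
IF WS-LOG-LINE(WS-SEARCH-POS:5) = 'ERROR'
SET PATTERN-FOUND TO TRUE
MOVE WS-SEARCH-POS TO WS-FOUND-AT
END-IF
END-PERFORM
IF PATTERN-FOUND
DISPLAY 'ERROR found at position '
WS-FOUND-AT
END-IF
The loop boundary is 496 (not 500) because we are comparing a 5-character window, and the last valid starting position is 500 - 5 + 1 = 496. The match position is saved in WS-FOUND-AT because PERFORM VARYING increments WS-SEARCH-POS once more before testing the UNTIL condition. After the loop, the counter therefore points one position past the match.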
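The following runnable GnuCOBOL sketch uses a hypothetical 60-byte log line. It compares the sliding window with a one-statement alternative: INSPECT TALLYING FOR CHARACTERS BEFORE INITIAL counts the bytes ahead of the first match, so adding 1 gives the match position. The program also displays the loop counter after the loop, to show that PERFORM VARYING leaves it one past the match.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FINDDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOG-LINE      PIC X(60)
               VALUE 'JOB STEP2 ENDED: ERROR 0C7 IN PARA 3000'.
       01  WS-SEARCH-POS    PIC 99.
       01  WS-FOUND-AT      PIC 99 VALUE ZERO.
       01  WS-FOUND         PIC X VALUE 'N'.
           88  PATTERN-FOUND         VALUE 'Y'.
       01  WS-BEFORE-CNT    PIC 99.
       PROCEDURE DIVISION.
      *> Sliding window: last start position is 60 - 5 + 1 = 56.
           PERFORM VARYING WS-SEARCH-POS FROM 1 BY 1
                   UNTIL WS-SEARCH-POS > 56
                      OR PATTERN-FOUND
               IF WS-LOG-LINE(WS-SEARCH-POS:5) = 'ERROR'
                   SET PATTERN-FOUND TO TRUE
                   MOVE WS-SEARCH-POS TO WS-FOUND-AT
               END-IF
           END-PERFORM
           DISPLAY 'WINDOW MATCH AT:   ' WS-FOUND-AT
           DISPLAY 'COUNTER AFTER LOOP: ' WS-SEARCH-POS
      *> INSPECT alternative: bytes before the match, plus 1.
      *> (If 'ERROR' were absent, this would count all 60 bytes.)
           MOVE ZERO TO WS-BEFORE-CNT
           INSPECT WS-LOG-LINE TALLYING WS-BEFORE-CNT
               FOR CHARACTERS BEFORE INITIAL 'ERROR'
           ADD 1 TO WS-BEFORE-CNT
           DISPLAY 'INSPECT MATCH AT:  ' WS-BEFORE-CNT
           STOP RUN.
```

Both methods should report position 18, while the loop counter ends at 19. The INSPECT form is shorter, but it needs a separate check for "not found" (a result of 61 here).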
17.17.3 Field-by-Field Record Construction
When building output records for external systems, reference modification gives you precise control over byte placement:
01 WS-OUTPUT-REC PIC X(200) VALUE SPACES.
*--- Build a fixed-format output record ---
MOVE WS-ACCT-NUM TO WS-OUTPUT-REC(1:10)
MOVE WS-CUST-NAME TO WS-OUTPUT-REC(11:40)
MOVE WS-FORMATTED-DATE TO WS-OUTPUT-REC(51:10)
MOVE WS-AMT-EDITED TO WS-OUTPUT-REC(61:15)
MOVE WS-STATUS-CODE TO WS-OUTPUT-REC(76:2)
MOVE WS-REGION TO WS-OUTPUT-REC(78:4)
This is the "template" approach mentioned in Section 17.12.3. It is significantly faster than STRING for high-volume batch output because it avoids delimiter scanning and pointer management.
17.18 Data Cleansing Patterns
Data cleansing — removing or correcting invalid characters, normalizing formats, and ensuring consistency — is one of the most common uses of string handling in production COBOL.
17.18.1 Phone Number Normalization
Phone numbers arrive in many formats: (555) 123-4567, 555-123-4567, 5551234567, 555.123.4567. Normalizing them to a standard 10-digit format:
01 WS-PHONE-RAW PIC X(20).
01 WS-PHONE-CLEAN PIC X(20).
01 WS-PHONE-DIGITS PIC X(10).
01 WS-DIGIT-POS PIC 99 VALUE ZERO.
01 WS-SCAN-POS PIC 99.
*--- Step 1: Replace common punctuation with spaces ---
MOVE SPACES TO WS-PHONE-CLEAN
MOVE WS-PHONE-RAW TO WS-PHONE-CLEAN
INSPECT WS-PHONE-CLEAN
CONVERTING
'()- .+/'
TO
'       ' *> 7 spaces: same length as FROM
*--- Step 2: Collect digits; keep counting past 10 so that ---
*--- an 11-digit number is detected rather than truncated ---
MOVE SPACES TO WS-PHONE-DIGITS
MOVE ZERO TO WS-DIGIT-POS
PERFORM VARYING WS-SCAN-POS FROM 1 BY 1
UNTIL WS-SCAN-POS > 20
IF WS-PHONE-CLEAN(WS-SCAN-POS:1) NUMERIC
ADD 1 TO WS-DIGIT-POS
IF WS-DIGIT-POS <= 10
MOVE WS-PHONE-CLEAN(WS-SCAN-POS:1)
TO WS-PHONE-DIGITS(WS-DIGIT-POS:1)
END-IF
END-IF
END-PERFORM
*--- Step 3: Validate we got exactly 10 digits ---
IF WS-DIGIT-POS NOT = 10
DISPLAY 'INVALID PHONE: ' WS-PHONE-RAW
MOVE SPACES TO WS-PHONE-DIGITS
END-IF
This pattern handles any format by stripping non-digits first, then collecting digits into a clean 10-position field. The final validation ensures we have exactly 10 digits — no more (country codes), no less (incomplete numbers).
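Here is a runnable sketch of the normalization logic as a reusable paragraph. The program and test values are hypothetical. It omits the INSPECT step because the digit scan already ignores punctuation. It keeps counting digits after the tenth, so an input with a leading country code such as 1-555-123-4567 is reported as invalid instead of being silently truncated to its first ten digits.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PHONEDEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PHONE-RAW     PIC X(20).
       01  WS-PHONE-DIGITS  PIC X(10).
       01  WS-DIGIT-CNT     PIC 99.
       01  WS-SCAN-POS      PIC 99.
       PROCEDURE DIVISION.
           MOVE '(555) 123-4567' TO WS-PHONE-RAW
           PERFORM 1000-NORMALIZE
           MOVE '555.123.4567' TO WS-PHONE-RAW
           PERFORM 1000-NORMALIZE
           MOVE '1-555-123-4567' TO WS-PHONE-RAW
           PERFORM 1000-NORMALIZE
           STOP RUN.
       1000-NORMALIZE.
           MOVE SPACES TO WS-PHONE-DIGITS
           MOVE ZERO TO WS-DIGIT-CNT
      *> Count every digit, but store only the first 10.
           PERFORM VARYING WS-SCAN-POS FROM 1 BY 1
                   UNTIL WS-SCAN-POS > FUNCTION LENGTH(WS-PHONE-RAW)
               IF WS-PHONE-RAW(WS-SCAN-POS:1) IS NUMERIC
                   ADD 1 TO WS-DIGIT-CNT
                   IF WS-DIGIT-CNT <= 10
                       MOVE WS-PHONE-RAW(WS-SCAN-POS:1)
                         TO WS-PHONE-DIGITS(WS-DIGIT-CNT:1)
                   END-IF
               END-IF
           END-PERFORM
           IF WS-DIGIT-CNT = 10
               DISPLAY 'OK:      ' WS-PHONE-DIGITS
           ELSE
               DISPLAY 'INVALID: ' WS-PHONE-RAW
           END-IF.
```

The first two inputs both normalize to `5551234567`. The third is flagged as invalid because it contains 11 digits.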
17.18.2 Address Line Cleanup
Address data from external sources often contains double spaces, leading/trailing spaces, and inconsistent abbreviations:
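*--- Collapse runs of spaces to a single space ---
*--- (also drops leading spaces; WS-ADDR-OUT must be at ---
*--- least as long as WS-ADDRESS) ---
01 WS-ADDR-OUT PIC X(60).
01 WS-IN-POS PIC 999.
01 WS-OUT-POS PIC 999.
01 WS-PREV-CHAR PIC X.
MOVE SPACES TO WS-ADDR-OUT
MOVE ZERO TO WS-OUT-POS
MOVE SPACE TO WS-PREV-CHAR
PERFORM VARYING WS-IN-POS FROM 1 BY 1
UNTIL WS-IN-POS > FUNCTION LENGTH(WS-ADDRESS)
*> Copy a character unless it is a space following a space
IF WS-ADDRESS(WS-IN-POS:1) NOT = SPACE
OR WS-PREV-CHAR NOT = SPACE
ADD 1 TO WS-OUT-POS
MOVE WS-ADDRESS(WS-IN-POS:1)
TO WS-ADDR-OUT(WS-OUT-POS:1)
END-IF
MOVE WS-ADDRESS(WS-IN-POS:1) TO WS-PREV-CHAR
END-PERFORM
MOVE WS-ADDR-OUT TO WS-ADDRESS
⚠️ Important: INSPECT cannot do this job. INSPECT REPLACING requires the replacement string to be the same length as the string it replaces, so INSPECT REPLACING ALL '  ' BY ' ' (two spaces by one) is invalid, and no INSPECT statement can shorten data. Collapsing spaces means copying characters into a second field. A space is copied only when the previous character was not a space, so a run of any length becomes one space in a single pass.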
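Because INSPECT cannot shorten data, another way to collapse runs of spaces is to split the field into words with UNSTRING DELIMITED BY ALL SPACES. The words are then rejoined with STRING, with one space inserted between them. The sketch below uses a hypothetical program and data. It uses COUNT IN to get each word's exact length, which also lets it skip the empty "word" produced by leading spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SQZDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ADDRESS       PIC X(40)
               VALUE '  123   MAIN  ST    APT 4'.
       01  WS-CLEAN         PIC X(40).
       01  WS-WORD          PIC X(40).
       01  WS-WORD-LEN      PIC 99.
       01  WS-IN-PTR        PIC 99.
       01  WS-OUT-PTR       PIC 99.
       PROCEDURE DIVISION.
           MOVE SPACES TO WS-CLEAN
           MOVE 1 TO WS-IN-PTR WS-OUT-PTR
           PERFORM UNTIL WS-IN-PTR > FUNCTION LENGTH(WS-ADDRESS)
               MOVE SPACES TO WS-WORD
               MOVE ZERO TO WS-WORD-LEN
      *> COUNT IN returns the word's length (0 for the empty
      *> "word" produced by leading spaces).
               UNSTRING WS-ADDRESS
                   DELIMITED BY ALL SPACES
                   INTO WS-WORD COUNT IN WS-WORD-LEN
                   WITH POINTER WS-IN-PTR
               END-UNSTRING
               IF WS-WORD-LEN > 0
      *> One separating space before every word except the first.
                   IF WS-OUT-PTR > 1
                       STRING ' ' DELIMITED BY SIZE
                           INTO WS-CLEAN WITH POINTER WS-OUT-PTR
                       END-STRING
                   END-IF
                   STRING WS-WORD(1:WS-WORD-LEN) DELIMITED BY SIZE
                       INTO WS-CLEAN WITH POINTER WS-OUT-PTR
                   END-STRING
               END-IF
           END-PERFORM
           MOVE WS-CLEAN TO WS-ADDRESS
           DISPLAY '[' WS-ADDRESS(1:20) ']'
           STOP RUN.
```

The result should be `123 MAIN ST APT 4`, shown here padded to 20 characters between brackets. Like the copy loop, this approach also removes leading spaces.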
17.18.3 Date Format Conversion
Converting between date formats is a constant need when interfacing with external systems:
*--- Convert MM/DD/YYYY to YYYYMMDD ---
01 WS-DATE-MDY PIC X(10). *> '03/15/2026'
01 WS-DATE-YMD PIC X(08).
STRING WS-DATE-MDY(7:4) DELIMITED BY SIZE
WS-DATE-MDY(1:2) DELIMITED BY SIZE
WS-DATE-MDY(4:2) DELIMITED BY SIZE
INTO WS-DATE-YMD
END-STRING
*> Result: '20260315'
*--- Convert YYYYMMDD to YYYY-MM-DD (ISO 8601) ---
01 WS-DATE-ISO PIC X(10).
STRING WS-DATE-YMD(1:4) DELIMITED BY SIZE
'-' DELIMITED BY SIZE
WS-DATE-YMD(5:2) DELIMITED BY SIZE
'-' DELIMITED BY SIZE
WS-DATE-YMD(7:2) DELIMITED BY SIZE
INTO WS-DATE-ISO
END-STRING
*> Result: '2026-03-15'
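The two conversions above assume the input really is in MM/DD/YYYY form. This sketch checks the slashes and digit positions with reference modification before converting, in the spirit of Section 17.13; the program name is hypothetical. It does not check that the month and day are in range.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATE-MDY      PIC X(10) VALUE '03/15/2026'.
       01  WS-DATE-YMD      PIC X(08).
       01  WS-DATE-ISO      PIC X(10).
       PROCEDURE DIVISION.
      *> Check the layout before trusting the fixed positions.
           IF WS-DATE-MDY(1:2) IS NUMERIC
              AND WS-DATE-MDY(4:2) IS NUMERIC
              AND WS-DATE-MDY(7:4) IS NUMERIC
              AND WS-DATE-MDY(3:1) = '/'
              AND WS-DATE-MDY(6:1) = '/'
               MOVE SPACES TO WS-DATE-YMD WS-DATE-ISO
               STRING WS-DATE-MDY(7:4) DELIMITED BY SIZE
                      WS-DATE-MDY(1:2) DELIMITED BY SIZE
                      WS-DATE-MDY(4:2) DELIMITED BY SIZE
                   INTO WS-DATE-YMD
               END-STRING
               STRING WS-DATE-YMD(1:4) DELIMITED BY SIZE
                      '-'              DELIMITED BY SIZE
                      WS-DATE-YMD(5:2) DELIMITED BY SIZE
                      '-'              DELIMITED BY SIZE
                      WS-DATE-YMD(7:2) DELIMITED BY SIZE
                   INTO WS-DATE-ISO
               END-STRING
               DISPLAY 'YYYYMMDD:   ' WS-DATE-YMD
               DISPLAY 'YYYY-MM-DD: ' WS-DATE-ISO
           ELSE
               DISPLAY 'BAD DATE LAYOUT: ' WS-DATE-MDY
           END-IF
           STOP RUN.
```

With the sample value, the program should print `20260315` and `2026-03-15`. An input such as `2026-03-15` fails the slash check and is reported instead of being scrambled.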
17.18.4 Complete Data Cleansing Pipeline
Tomás Rivera at MedClaim built a reusable data cleansing paragraph that standardizes incoming provider records:
5000-CLEANSE-PROVIDER-DATA.
*--- Uppercase the provider name ---
INSPECT WS-PROV-NAME
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
*--- Replace LOW-VALUES (X'00') with spaces ---
INSPECT WS-PROV-NAME
REPLACING ALL LOW-VALUE BY SPACE
*--- Normalize phone ---
PERFORM 5100-NORMALIZE-PHONE
*--- Normalize state code to uppercase ---
INSPECT WS-PROV-STATE
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
*--- Validate ZIP is numeric ---
IF WS-PROV-ZIP(1:5) NOT NUMERIC
MOVE '00000' TO WS-PROV-ZIP(1:5)
ADD 1 TO WS-CLEANSE-ERRORS
END-IF
*--- Find city length without trailing spaces (0 if blank) ---
MOVE FUNCTION LENGTH(WS-PROV-CITY) TO WS-CITY-LEN
PERFORM UNTIL WS-CITY-LEN = 0
OR WS-PROV-CITY(WS-CITY-LEN:1) NOT = SPACE
SUBTRACT 1 FROM WS-CITY-LEN
END-PERFORM.
This pipeline demonstrates the production pattern: uppercase, remove garbage characters, normalize formats, validate content, and track cleansing statistics. Each step is simple and testable in isolation.
🧪 Try It Yourself: Data Cleansing Challenge
Write a COBOL program that reads a file of customer records with the following problems and produces a clean output file:
- Names in mixed case (normalize to uppercase)
- Phone numbers in various formats (normalize to 10 digits)
- ZIP codes with trailing spaces or embedded hyphens (normalize to 5-digit or 9-digit)
- Embedded low-value characters (replace with spaces)
- Double spaces in address fields (reduce to single spaces)
Track statistics: records read, records with at least one cleansing correction, total corrections applied.
17.19 Advanced INSPECT Patterns
INSPECT is often underutilized. Beyond simple counting and replacing, it supports several advanced patterns that are valuable in production programs.
17.19.1 INSPECT with BEFORE and AFTER Boundaries
The BEFORE INITIAL and AFTER INITIAL clauses restrict INSPECT's scope to a portion of the field:
*--- Count digits before the decimal point ---
01 WS-AMOUNT-STR PIC X(15) VALUE ' 12345.67 '.
01 WS-INT-DIGITS PIC 99 VALUE ZERO.
MOVE ZERO TO WS-INT-DIGITS
INSPECT WS-AMOUNT-STR
TALLYING WS-INT-DIGITS
FOR ALL '0' BEFORE INITIAL '.'
FOR ALL '1' BEFORE INITIAL '.'
FOR ALL '2' BEFORE INITIAL '.'
FOR ALL '3' BEFORE INITIAL '.'
FOR ALL '4' BEFORE INITIAL '.'
FOR ALL '5' BEFORE INITIAL '.'
FOR ALL '6' BEFORE INITIAL '.'
FOR ALL '7' BEFORE INITIAL '.'
FOR ALL '8' BEFORE INITIAL '.'
FOR ALL '9' BEFORE INITIAL '.'
*> Result: WS-INT-DIGITS = 5 (digits 1,2,3,4,5)
Count characters after a delimiter:
*--- Count characters after the '@' in an email ---
01 WS-EMAIL PIC X(50) VALUE 'user@domain.com'.
01 WS-DOMAIN-LEN PIC 99 VALUE ZERO.
MOVE ZERO TO WS-DOMAIN-LEN
INSPECT WS-EMAIL
TALLYING WS-DOMAIN-LEN
FOR CHARACTERS AFTER INITIAL '@'
BEFORE INITIAL SPACE
*> Result: WS-DOMAIN-LEN = 10 (domain.com)
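The two tallies can be combined with reference modification to extract the domain itself. In this sketch (hypothetical program), the first INSPECT finds the length of the part before '@', which places the '@' at that length plus 1. The domain therefore starts two positions after the end of the local part. The IF guard avoids an out-of-range reference modification when there is no '@' or nothing follows it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. EMAILDEM.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-EMAIL         PIC X(50) VALUE 'user@domain.com'.
       01  WS-LOCAL-LEN     PIC 99.
       01  WS-DOMAIN-LEN    PIC 99.
       01  WS-DOMAIN        PIC X(50).
       PROCEDURE DIVISION.
           MOVE ZERO TO WS-LOCAL-LEN WS-DOMAIN-LEN
      *> Length of the part before '@' (50 if there is no '@').
           INSPECT WS-EMAIL TALLYING WS-LOCAL-LEN
               FOR CHARACTERS BEFORE INITIAL '@'
      *> Length of the part between '@' and the first space.
           INSPECT WS-EMAIL TALLYING WS-DOMAIN-LEN
               FOR CHARACTERS AFTER INITIAL '@'
                              BEFORE INITIAL SPACE
           IF WS-LOCAL-LEN < FUNCTION LENGTH(WS-EMAIL)
              AND WS-DOMAIN-LEN > 0
      *> '@' sits at WS-LOCAL-LEN + 1; the domain starts after it.
               MOVE WS-EMAIL(WS-LOCAL-LEN + 2:WS-DOMAIN-LEN)
                 TO WS-DOMAIN
               DISPLAY 'DOMAIN: ' WS-DOMAIN
           ELSE
               DISPLAY 'NO DOMAIN FOUND'
           END-IF
           STOP RUN.
```

For the sample address, the local-part length is 4, so the domain is taken from WS-EMAIL(6:10) and the program should print `DOMAIN: domain.com`.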
17.19.2 INSPECT for Data Validation
INSPECT TALLYING is a powerful validation tool. You can verify that a field contains only expected characters:
*--- Validate that account number contains only digits ---
01 WS-ACCT-NUM PIC X(10).
01 WS-DIGIT-CNT PIC 99 VALUE ZERO.
01 WS-TOTAL-CHARS PIC 99.
MOVE ZERO TO WS-DIGIT-CNT
MOVE FUNCTION LENGTH(WS-ACCT-NUM) TO WS-TOTAL-CHARS
INSPECT WS-ACCT-NUM
TALLYING WS-DIGIT-CNT
FOR ALL '0' '1' '2' '3' '4'
'5' '6' '7' '8' '9'
IF WS-DIGIT-CNT NOT = WS-TOTAL-CHARS
DISPLAY 'INVALID: Non-digit characters in account'
END-IF
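The following runnable sketch wraps the digit check in a paragraph and applies it to two hypothetical account numbers. For this particular rule (every byte must be a digit), the class test IS NUMERIC on an alphanumeric field gives the same answer. The INSPECT form is still worth knowing because its character list can be extended, for example to allow hyphens.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ACCTVAL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ACCT-NUM      PIC X(10).
       01  WS-DIGIT-CNT     PIC 99.
       PROCEDURE DIVISION.
           MOVE '0012345678' TO WS-ACCT-NUM
           PERFORM 1000-CHECK-ACCT
           MOVE '00123A5678' TO WS-ACCT-NUM
           PERFORM 1000-CHECK-ACCT
           STOP RUN.
       1000-CHECK-ACCT.
      *> INSPECT form: count the digits, compare with the length.
           MOVE ZERO TO WS-DIGIT-CNT
           INSPECT WS-ACCT-NUM
               TALLYING WS-DIGIT-CNT
               FOR ALL '0' '1' '2' '3' '4'
                       '5' '6' '7' '8' '9'
           IF WS-DIGIT-CNT = FUNCTION LENGTH(WS-ACCT-NUM)
               DISPLAY WS-ACCT-NUM ' VALID (INSPECT)'
           ELSE
               DISPLAY WS-ACCT-NUM ' INVALID (INSPECT)'
           END-IF
      *> Equivalent class test for the all-digits rule.
           IF WS-ACCT-NUM IS NUMERIC
               DISPLAY WS-ACCT-NUM ' VALID (NUMERIC)'
           ELSE
               DISPLAY WS-ACCT-NUM ' INVALID (NUMERIC)'
           END-IF.
```

Both tests should accept `0012345678` and reject `00123A5678`, whose digit count is 9 rather than 10.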
17.19.3 INSPECT REPLACING for Record Sanitization
When receiving data from external systems, records may contain control characters or other non-printable bytes that could disrupt downstream processing:
*--- Replace all control characters with spaces ---
*--- (Assumes EBCDIC; characters below X'40' are control) ---
INSPECT WS-INCOMING-DATA
REPLACING ALL X'00' BY SPACE
ALL X'01' BY SPACE
ALL X'02' BY SPACE
ALL X'03' BY SPACE
ALL X'0D' BY SPACE
ALL X'0A' BY SPACE
ALL X'15' BY SPACE
ALL X'25' BY SPACE
In practice, Tomás Rivera at MedClaim maintains a "sanitizer copybook" — a COPY member containing INSPECT REPLACING statements for every known problem character. Any program receiving external data includes this copybook:
COPY SANITIZE REPLACING ==:DATA:== BY ==WS-INCOMING-REC==.
17.19.4 Multi-Stage INSPECT Pipeline
Complex transformations can be built by chaining INSPECT statements. Each stage handles one aspect of the transformation:
*--- Stage 1: Normalize whitespace variations ---
INSPECT WS-DATA
REPLACING ALL X'09' BY SPACE *> Tab to space
ALL X'0D' BY SPACE *> CR to space
ALL X'0A' BY SPACE *> LF to space
*--- Stage 2: Remove leading zeros from numeric portion ---
INSPECT WS-DATA
REPLACING LEADING '0' BY SPACE
AFTER INITIAL '='
BEFORE INITIAL '|'
*--- Stage 3: Convert delimiters ---
INSPECT WS-DATA
REPLACING ALL '|' BY ','
Each stage is independently testable and has a clear purpose. This pipeline approach is far more maintainable than trying to accomplish everything in a single complex transformation.
17.20 Combining String Facilities: A Production Pattern Library
The most effective string handling in COBOL comes from combining the four facilities — STRING, UNSTRING, INSPECT, and reference modification — in well-understood patterns.
17.20.1 Pattern: Parse, Validate, Transform, Assemble
This four-step pattern is the standard approach for data transformation programs:
2000-TRANSFORM-RECORD.
*--- PARSE: Break input into fields ---
INITIALIZE WS-PARSED-FIELDS
MOVE ZERO TO WS-FIELD-COUNT
UNSTRING WS-INPUT-LINE
DELIMITED BY '|'
INTO WS-FLD-1 WS-FLD-2 WS-FLD-3 WS-FLD-4
TALLYING IN WS-FIELD-COUNT
END-UNSTRING
*--- VALIDATE: Check each field ---
IF WS-FIELD-COUNT < 4
ADD 1 TO WS-ERROR-CNT
PERFORM 9000-LOG-ERROR
EXIT PARAGRAPH
END-IF
IF WS-FLD-1 = SPACES
ADD 1 TO WS-ERROR-CNT
PERFORM 9000-LOG-ERROR
EXIT PARAGRAPH
END-IF
*--- TRANSFORM: Normalize and clean ---
INSPECT WS-FLD-2
CONVERTING
'abcdefghijklmnopqrstuvwxyz'
TO
'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
INSPECT WS-FLD-3
REPLACING ALL '-' BY SPACE
*--- ASSEMBLE: Build output ---
MOVE SPACES TO WS-OUTPUT-LINE
MOVE 1 TO WS-OUT-PTR
STRING WS-FLD-1 DELIMITED BY SPACE
',' DELIMITED BY SIZE
WS-FLD-2 DELIMITED BY SPACE
',' DELIMITED BY SIZE
WS-FLD-3 DELIMITED BY SPACE
',' DELIMITED BY SIZE
WS-FLD-4 DELIMITED BY SPACE
INTO WS-OUTPUT-LINE
WITH POINTER WS-OUT-PTR
ON OVERFLOW
PERFORM 9100-LOG-OVERFLOW
END-STRING.
This is the canonical form of a data transformation paragraph. Every field passes through validation before it reaches the output, the field count guards against short records, and the output STRING has overflow handling. The flow is linear and easy to follow.
17.20.2 Pattern: Multi-Line Record Assembly
Some output formats require assembling one long record from many separate pieces. For example, a 400-byte fixed-width output record may be built from data distributed across several working-storage groups:
01 WS-OUTPUT-RECORD PIC X(400) VALUE SPACES.
3000-ASSEMBLE-OUTPUT.
*--- Header portion: bytes 1-50 ---
MOVE WS-RECORD-TYPE TO WS-OUTPUT-RECORD(1:2)
MOVE WS-SEQUENCE-NUM TO WS-OUTPUT-RECORD(3:8)
MOVE WS-PROCESS-DATE TO WS-OUTPUT-RECORD(11:8)
MOVE WS-BATCH-ID TO WS-OUTPUT-RECORD(19:12)
MOVE SPACES TO WS-OUTPUT-RECORD(31:20)
*--- Customer portion: bytes 51-200 ---
MOVE WS-CUST-ID TO WS-OUTPUT-RECORD(51:10)
MOVE WS-CUST-NAME TO WS-OUTPUT-RECORD(61:50)
MOVE WS-CUST-ADDR TO WS-OUTPUT-RECORD(111:80)
MOVE WS-CUST-PHONE TO WS-OUTPUT-RECORD(191:10)
*--- Financial portion: bytes 201-300 ---
MOVE WS-AMOUNT-EDITED TO WS-OUTPUT-RECORD(201:15)
MOVE WS-TAX-EDITED TO WS-OUTPUT-RECORD(216:12)
MOVE WS-TOTAL-EDITED TO WS-OUTPUT-RECORD(228:15)
MOVE WS-CURRENCY TO WS-OUTPUT-RECORD(243:3)
MOVE SPACES TO WS-OUTPUT-RECORD(246:55)
*--- Trailer portion: bytes 301-400 ---
MOVE WS-CONTROL-HASH TO WS-OUTPUT-RECORD(301:20)
MOVE WS-RECORD-COUNT TO WS-OUTPUT-RECORD(321:8)
MOVE SPACES TO WS-OUTPUT-RECORD(329:72)
This template-based assembly using reference modification is the fastest approach for high-volume batch output. Each field is placed at its exact byte position with no parsing overhead.
💡 Documentation Practice: When using reference modification for record assembly, always include a record layout document (or copybook) that maps field names to byte positions. Without this documentation, maintenance becomes extremely difficult. Derek Washington learned this the hard way: "I once inherited a program with 50 reference-modification MOVEs and no layout documentation. It took me two days to reverse-engineer the record format."
17.21 EBCDIC vs. ASCII: String Handling Portability Considerations
When writing COBOL string handling code that must work on both mainframe (EBCDIC) and distributed (ASCII) platforms, several character set differences affect program behavior.
17.21.1 Collating Sequence Differences
The most significant difference is the collating (sort) order of characters:
EBCDIC order: space < lowercase < uppercase < digits
ASCII order: space < digits < uppercase < lowercase
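This means that comparisons behave differently on the two platforms. For example, IF WS-CHAR < 'A' is true for digits in ASCII but false in EBCDIC. Range tests such as 'A' THRU 'Z' also differ: in EBCDIC the range spans some non-letter code points between I and J and between R and S. The output order of SORT and MERGE differs too. INSPECT itself is not affected by the collating sequence, because BEFORE INITIAL and AFTER INITIAL refer to positions within the field, not to character order. When portability matters, use explicit character lists rather than comparisons that depend on collating sequence:
*--- Portable: explicit character list ---
INSPECT WS-DATA
TALLYING WS-DIGIT-CNT
FOR ALL '0' '1' '2' '3' '4'
'5' '6' '7' '8' '9'
*--- NOT portable: relies on collating sequence ---
*--- (digits sort below 'A' in ASCII, above 'Z' in EBCDIC) ---
IF WS-CHAR < 'A'
ADD 1 TO WS-COUNT
END-IF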
17.21.2 Hex Literal Portability
Hex literals (X'nn') represent different characters in EBCDIC and ASCII:
X'C1' = 'A' in EBCDIC, but 'Á' (A-acute) in Latin-1 (ISO 8859-1)
X'41' = non-breaking space in EBCDIC code page 037, but 'A' in ASCII
X'40' = space in EBCDIC, but '@' in ASCII
X'20' = a control character in EBCDIC, but space in ASCII
If your INSPECT CONVERTING or REPLACING uses hex literals, it is inherently non-portable. Use character literals instead when possible:
*--- Portable ---
INSPECT WS-DATA REPLACING ALL SPACE BY '-'
*--- NOT portable ---
INSPECT WS-DATA REPLACING ALL X'40' BY X'60'
17.21.3 The FUNCTION ORD and FUNCTION CHAR Alternative
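The intrinsic functions ORD (character to ordinal position) and CHAR (ordinal position to character) work in terms of the program's collating sequence, which by default is the platform's native character set. Ordinal positions start at 1, so a character's ordinal is its code value plus one. These functions let you work with characters without hard-coding hex literals, but the numbers they produce still differ by platform:
*--- Get the ordinal position of 'A' ---
COMPUTE WS-ORD-A = FUNCTION ORD('A')
*> On ASCII: 66 (code X'41' + 1)
*> On EBCDIC: 194 (code X'C1' + 1)
*--- Get the character at ordinal position 66 ---
MOVE FUNCTION CHAR(66) TO WS-CHAR
*> On ASCII: WS-CHAR = 'A'
*> On EBCDIC: WS-CHAR = X'41' (non-breaking space in CP037)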
For string handling code that must be truly portable, avoid character arithmetic and hex literals entirely. Stick to named characters ('A', 'Z', '0', '9') and the INSPECT CONVERTING technique with explicit character strings.
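Here is a minimal GnuCOBOL sketch of ORD and CHAR, assuming an ASCII platform; on an EBCDIC system, the first line would show 194. The round trip CHAR(ORD(x)) returns x on every platform. Adding 1 to an ordinal is the kind of character arithmetic the paragraph above warns about. It happens to work from 'A' to 'B' in both code sets, but not, for example, from 'I' to 'J' in EBCDIC.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ORDDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ORD-A         PIC 999.
       01  WS-CHAR          PIC X.
       PROCEDURE DIVISION.
      *> Ordinal positions start at 1: ordinal = code value + 1.
           COMPUTE WS-ORD-A = FUNCTION ORD('A')
           DISPLAY 'ORD(A) = ' WS-ORD-A
      *> Round trip: CHAR(ORD(x)) returns x on any platform.
           MOVE FUNCTION CHAR(WS-ORD-A) TO WS-CHAR
           DISPLAY 'CHAR(ORD(A)) = ' WS-CHAR
      *> Character arithmetic: 'B' here, but CHAR(ORD('I') + 1)
      *> is not 'J' on EBCDIC, so avoid this in portable code.
           MOVE FUNCTION CHAR(WS-ORD-A + 1) TO WS-CHAR
           DISPLAY 'NEXT AFTER A = ' WS-CHAR
           STOP RUN.
```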
17.22 Chapter Summary
In this chapter, we explored COBOL's four string handling facilities — each serving a distinct purpose in the manipulation of textual data:
- STRING concatenates fields and literals with delimiter control, pointer tracking, and overflow detection. It is the primary tool for building formatted output, log messages, and data interchange records.
- UNSTRING parses delimited strings into multiple fields, with support for multiple delimiter types, delimiter tracking, field counting, and multi-pass parsing. It is essential for CSV, pipe-delimited, and other variable-format data.
- INSPECT examines fields character by character for counting (TALLYING), substituting (REPLACING), and translating (CONVERTING). It handles tasks from counting commas in a CSV line to converting lowercase to uppercase.
- Reference modification provides direct substring access through offset:length notation. It enables character-by-character scanning, substring extraction and insertion, and high-performance string operations in inner loops.
Together, these facilities make COBOL a capable text processing language — not as concise as regex-based languages, perhaps, but precise, predictable, and well-suited to the kind of structured data transformation that enterprise systems demand.
The theme of The Modernization Spectrum runs through this chapter: string handling enables COBOL programs to participate in modern data interchange. A COBOL program that can parse CSV input from a web application and produce pipe-delimited output for an analytics system is a modernized program — even if its core business logic has not changed.
In the next chapter, we turn to table handling — COBOL's array processing facilities, including OCCURS, SEARCH, SEARCH ALL, and the powerful patterns that enable in-memory data manipulation.
Key Terms Introduced in This Chapter
| Term | Definition |
|---|---|
| STRING | COBOL verb that concatenates multiple sending fields into a receiving field |
| UNSTRING | COBOL verb that parses a delimited string into multiple receiving fields |
| INSPECT | COBOL verb that examines a field character by character for counting, replacing, or converting |
| Reference modification | Substring access using identifier(offset:length) notation |
| DELIMITED BY | Clause controlling how much of a sending field is used in STRING, or what separates fields in UNSTRING |
| POINTER | Numeric field tracking position within the receiving/source field during STRING/UNSTRING |
| TALLYING | Counting operation in INSPECT (counts characters) or UNSTRING (counts fields populated) |
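| ON OVERFLOW | Imperative phrase executed when STRING's receiving field fills before all sending data is transferred, when UNSTRING still has unexamined data after all receiving fields are filled, or when either statement's POINTER value is out of range |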
| CONVERTING | INSPECT operation that translates characters according to a one-to-one mapping |
| GROUP INDICATE | (Cross-ref from Ch 16) Report Writer clause; not a string operation but often confused with string formatting |