In This Chapter
- Introduction
- 9.1 The STRING Statement: Concatenation
- 9.2 The UNSTRING Statement: Parsing and Splitting
- 9.3 The INSPECT Statement: Character Manipulation
- 9.4 Reference Modification: Substringing by Position
- 9.5 Intrinsic Functions for Strings
- 9.6 EBCDIC vs. ASCII: Collating Sequences and String Operations
- 9.7 Common String Handling Patterns
- 9.8 Parsing Real-World Data
- 9.9 Performance Considerations
- 9.10 Common Mistakes and Debugging Tips
- 9.11 Fixed-Format vs. Free-Format Examples
- 9.12 Comprehensive Example: CSV Transaction Processing
- Summary
Chapter 9: String Handling and Character Manipulation
Introduction
String handling in COBOL occupies a unique position in the programming language landscape. While modern languages like Python, JavaScript, and Java offer concise string operations through built-in methods and operators, COBOL takes a characteristically different approach: verbose, explicit, and precise. Where Python might concatenate strings with a simple + operator, COBOL requires the STRING statement with explicit delimiter specifications. Where JavaScript splits strings with a single .split() method call, COBOL uses the UNSTRING statement with detailed clauses for delimiter handling, field counting, and overflow detection.
This verbosity is not a weakness. In enterprise environments where financial transactions, government records, and healthcare data must be processed with absolute precision, COBOL's explicit string handling provides critical advantages. Every operation is self-documenting. Every boundary condition can be handled explicitly, and the ON OVERFLOW phrases of STRING and UNSTRING let the programmer decide what happens when data does not fit. There are no hidden memory allocations: every field has a fixed, declared size, so truncation can happen only at boundaries the programmer defined.
COBOL provides five primary mechanisms for string manipulation:
- STRING -- Concatenates multiple source fields into a single destination field
- UNSTRING -- Splits a single source field into multiple destination fields
- INSPECT -- Counts, replaces, or translates characters within a field
- Reference Modification -- Extracts or modifies substrings by position and length
- Intrinsic Functions -- Built-in functions for case conversion, trimming, length calculation, and more (introduced by the 1989 amendment to COBOL-85, expanded in COBOL 2002 and COBOL 2014)
This chapter covers each mechanism in depth, with practical examples drawn from real-world data processing scenarios. By the end of this chapter, you will be able to parse CSV files, format report output, validate data fields, convert between character representations, and handle the edge cases that arise in production string processing.
9.1 The STRING Statement: Concatenation
The STRING statement concatenates one or more source fields into a single receiving field. It is COBOL's primary tool for building output strings, constructing messages, and assembling formatted data.
9.1.1 Basic Syntax
The general format of the STRING statement is:
STRING source-1 DELIMITED BY {SIZE | literal | identifier}
[source-2 DELIMITED BY {SIZE | literal | identifier}]
...
INTO receiving-field
[WITH POINTER pointer-field]
[ON OVERFLOW imperative-statement-1]
[NOT ON OVERFLOW imperative-statement-2]
END-STRING
Every source field in a STRING statement must have a DELIMITED BY clause. This clause controls how much of the source field is transferred to the receiving field.
9.1.2 DELIMITED BY SIZE
DELIMITED BY SIZE transfers the entire contents of the source field, including trailing spaces. This is the simplest form:
01 WS-FIRST-NAME PIC X(15) VALUE "JOHN".
01 WS-LAST-NAME PIC X(20) VALUE "SMITH".
01 WS-FULL-NAME PIC X(50) VALUE SPACES.
STRING WS-FIRST-NAME DELIMITED BY SIZE
WS-LAST-NAME DELIMITED BY SIZE
INTO WS-FULL-NAME
END-STRING
Because WS-FIRST-NAME is defined as PIC X(15), DELIMITED BY SIZE transfers all 15 characters: "JOHN" followed by 11 spaces. WS-LAST-NAME is transferred in full as well, so "SMITH" begins at position 16 and WS-FULL-NAME holds "JOHN           SMITH" followed by trailing spaces. This is usually not what you want.
9.1.3 DELIMITED BY Literal
DELIMITED BY followed by a literal string causes the transfer to stop when the specified literal is encountered in the source field. The delimiter itself is not transferred:
STRING WS-FIRST-NAME DELIMITED BY SPACE
" " DELIMITED BY SIZE
WS-LAST-NAME DELIMITED BY SPACE
INTO WS-FULL-NAME
END-STRING
Here, DELIMITED BY SPACE stops the transfer at the first space character. The result is "JOHN SMITH". The literal " " (a single space) with DELIMITED BY SIZE inserts exactly one space between the names.
Common delimiters include:
| Delimiter | Purpose |
|---|---|
| SPACE or SPACES | Stop at first space (trim trailing blanks) |
| "," | Stop at comma (CSV field extraction) |
| "  " (two spaces) | Stop at double space (useful for padded fields whose values contain single embedded spaces) |
| LOW-VALUES | Stop at null character (X"00") |
9.1.4 DELIMITED BY Identifier
The delimiter can be stored in a variable, allowing runtime flexibility:
01 WS-DELIMITER PIC X(01) VALUE "|".
STRING WS-FIELD-1 DELIMITED BY SPACE
WS-DELIMITER DELIMITED BY SIZE
WS-FIELD-2 DELIMITED BY SPACE
INTO WS-PIPE-RECORD
END-STRING
By changing WS-DELIMITER from "|" to ",", the same code can produce either pipe-delimited or CSV output. This technique is valuable in programs that must support multiple output formats.
9.1.5 The INTO Clause
The INTO clause specifies the receiving field. Before executing a STRING statement, you should typically initialize the receiving field:
MOVE SPACES TO WS-FULL-NAME
STRING ...
INTO WS-FULL-NAME
END-STRING
If you do not initialize the receiving field, the STRING statement writes over existing content starting at the pointer position, but leaves any content beyond the last character written unchanged.
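To see the effect, here is a minimal sketch, assuming fixed-format source compiled with GnuCOBOL (`cobc -x`); the program and field names are hypothetical. The first STRING leaves the old "X" characters beyond position 2 in place; the second runs after MOVE SPACES.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRINIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-OUT            PIC X(10) VALUE ALL "X".
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    No initialization: positions 3-10 keep their old "X"s
           STRING "AB" DELIMITED BY SIZE
               INTO WS-OUT
           END-STRING
           DISPLAY "[" WS-OUT "]"
      *    Initialize first: the unused positions stay blank
           MOVE SPACES TO WS-OUT
           STRING "AB" DELIMITED BY SIZE
               INTO WS-OUT
           END-STRING
           DISPLAY "[" WS-OUT "]"
           STOP RUN.
```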
9.1.6 WITH POINTER: Tracking Position
The WITH POINTER clause specifies a numeric field that tracks the current write position in the receiving field. The pointer starts at the value you set (typically 1) and advances as each character is written:
01 WS-PTR PIC 99 VALUE 1.
01 WS-OUTPUT PIC X(80) VALUE SPACES.
MOVE 1 TO WS-PTR
STRING "CUSTOMER: " DELIMITED BY SIZE
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING
* WS-PTR now contains 11 (next available position)
STRING WS-CUST-NAME DELIMITED BY " "
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING
The POINTER clause serves three critical purposes:
- Continuation: Multiple STRING statements can append to the same field by reusing the pointer
- Length tracking: When the pointer starts at 1, subtracting 1 from its final value gives the number of characters written
- Positioning: Presetting the pointer to a value greater than 1 skips positions in the receiving field, which is useful for column-aligned output
Important: The pointer value must be initialized before use. If the pointer is less than 1 or greater than the length of the receiving field when the STRING statement begins, no data is transferred, the pointer is left unchanged, and the ON OVERFLOW condition is triggered.
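The sketch below exercises continuation, length tracking, and the out-of-range case. It assumes GnuCOBOL and fixed-format source, and its names are illustrative only.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRPTR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-OUTPUT         PIC X(30) VALUE SPACES.
       01  WS-PTR            PIC 99.
       01  WS-USED           PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 1 TO WS-PTR
           STRING "CUSTOMER: " DELIMITED BY SIZE
               INTO WS-OUTPUT
               WITH POINTER WS-PTR
           END-STRING
      *    The second STRING resumes where the first one stopped
           STRING "ACME CORP" DELIMITED BY SIZE
               INTO WS-OUTPUT
               WITH POINTER WS-PTR
           END-STRING
      *    The pointer started at 1, so PTR - 1 = characters written
           COMPUTE WS-USED = WS-PTR - 1
           DISPLAY "[" WS-OUTPUT(1:WS-USED) "] " WS-USED
      *    A pointer of 0 is out of range: nothing is transferred
           MOVE 0 TO WS-PTR
           STRING "LOST" DELIMITED BY SIZE
               INTO WS-OUTPUT
               WITH POINTER WS-PTR
               ON OVERFLOW
                   DISPLAY "OVERFLOW: POINTER OUT OF RANGE"
           END-STRING
           STOP RUN.
```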
9.1.7 ON OVERFLOW and NOT ON OVERFLOW
When the receiving field cannot hold all the data being concatenated, the ON OVERFLOW condition is triggered. The transfer stops, the receiving field contains whatever fit, and control passes to the ON OVERFLOW imperative statement:
01 WS-SHORT-FIELD PIC X(10) VALUE SPACES.
STRING "THIS IS A VERY LONG STRING"
DELIMITED BY SIZE
INTO WS-SHORT-FIELD
ON OVERFLOW
DISPLAY "WARNING: Data truncated"
NOT ON OVERFLOW
DISPLAY "All data transferred"
END-STRING
In production code, ON OVERFLOW should always be coded for defensive programming. Silently truncated data is a common source of bugs in string processing.
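A minimal sketch of the partial transfer, with hypothetical names and GnuCOBOL assumed. Adding WITH POINTER shows exactly where the transfer stopped.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STROVFL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SHORT-FIELD    PIC X(10) VALUE SPACES.
       01  WS-PTR            PIC 99 VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
           STRING "THIS IS A VERY LONG STRING" DELIMITED BY SIZE
               INTO WS-SHORT-FIELD
               WITH POINTER WS-PTR
               ON OVERFLOW
                   DISPLAY "TRUNCATED, POINTER = " WS-PTR
               NOT ON OVERFLOW
                   DISPLAY "ALL DATA TRANSFERRED"
           END-STRING
      *    The receiving field keeps whatever fit before the overflow
           DISPLAY "[" WS-SHORT-FIELD "]"
           STOP RUN.
```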
9.1.8 Multiple Source Fields
A single STRING statement can concatenate many source fields. This is more efficient and clearer than using multiple MOVE and STRING statements:
STRING WS-DATE-YEAR DELIMITED BY SIZE
"-" DELIMITED BY SIZE
WS-DATE-MONTH DELIMITED BY SIZE
"-" DELIMITED BY SIZE
WS-DATE-DAY DELIMITED BY SIZE
INTO WS-ISO-DATE
END-STRING
The sources are processed left to right. Each source's delimiter clause is evaluated independently. See code/example-01-string-statement.cob for complete working examples of all STRING statement variations.
9.2 The UNSTRING Statement: Parsing and Splitting
The UNSTRING statement is the inverse of STRING. It takes a single source field and splits it into multiple receiving fields based on delimiters. UNSTRING is COBOL's primary tool for parsing delimited data such as CSV files, pipe-delimited records, and free-format input.
9.2.1 Basic Syntax
UNSTRING source-field
DELIMITED BY [ALL] {literal | identifier}
[OR [ALL] {literal | identifier}] ...
INTO dest-1 [DELIMITER IN delim-1] [COUNT IN count-1]
[dest-2 [DELIMITER IN delim-2] [COUNT IN count-2]]
...
[WITH POINTER pointer-field]
[TALLYING IN tally-field]
[ON OVERFLOW imperative-statement-1]
[NOT ON OVERFLOW imperative-statement-2]
END-UNSTRING
9.2.2 DELIMITED BY
The DELIMITED BY clause specifies what character or string separates the fields in the source:
01 WS-CSV-RECORD PIC X(80)
VALUE "10045,WILLIAMS,ROBERT,CHECKING,15230.75".
01 WS-ACCT-NUM PIC X(10).
01 WS-LAST-NAME PIC X(20).
01 WS-FIRST-NAME PIC X(20).
01 WS-ACCT-TYPE PIC X(15).
01 WS-BALANCE PIC X(12).
UNSTRING WS-CSV-RECORD
DELIMITED BY ","
INTO WS-ACCT-NUM
WS-LAST-NAME
WS-FIRST-NAME
WS-ACCT-TYPE
WS-BALANCE
END-UNSTRING
9.2.3 DELIMITED BY ALL
DELIMITED BY ALL treats consecutive occurrences of the delimiter as a single delimiter. This is essential for parsing space-separated data where multiple spaces may appear between words:
01 WS-INPUT PIC X(50)
VALUE "WORD1   WORD2  WORD3    WORD4".
01 WS-W1 PIC X(10).
01 WS-W2 PIC X(10).
01 WS-W3 PIC X(10).
01 WS-W4 PIC X(10).
UNSTRING WS-INPUT
DELIMITED BY ALL SPACES
INTO WS-W1 WS-W2 WS-W3 WS-W4
END-UNSTRING
Without ALL, each space would be treated as a separate delimiter, producing empty fields between words.
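A small sketch (hypothetical names, GnuCOBOL assumed) makes the difference visible by bracketing each receiving field. It also clears the receivers before the second UNSTRING, because a receiving field that the statement never reaches keeps whatever it held before.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSALL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT          PIC X(20) VALUE "AA  BB".
       01  WS-W1             PIC X(5).
       01  WS-W2             PIC X(5).
       01  WS-W3             PIC X(5).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Without ALL, the second space ends an empty field
           MOVE SPACES TO WS-W1 WS-W2 WS-W3
           UNSTRING WS-INPUT DELIMITED BY SPACE
               INTO WS-W1 WS-W2 WS-W3
           END-UNSTRING
           DISPLAY "[" WS-W1 "][" WS-W2 "][" WS-W3 "]"
      *    With ALL, each run of spaces counts as one delimiter.
      *    Receivers the UNSTRING never reaches keep old contents,
      *    so they are cleared first.
           MOVE SPACES TO WS-W1 WS-W2 WS-W3
           UNSTRING WS-INPUT DELIMITED BY ALL SPACES
               INTO WS-W1 WS-W2 WS-W3
           END-UNSTRING
           DISPLAY "[" WS-W1 "][" WS-W2 "][" WS-W3 "]"
           STOP RUN.
```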
9.2.4 OR: Multiple Delimiters
The OR keyword allows specifying multiple delimiter options:
UNSTRING WS-MIXED-RECORD
DELIMITED BY "|" OR "," OR ";"
INTO WS-FIELD-1
WS-FIELD-2
WS-FIELD-3
END-UNSTRING
This parses records that may use any of the three delimiters. The OR clause can be combined with ALL:
DELIMITED BY ALL SPACES OR "," OR ";"
9.2.5 DELIMITER IN: Capturing the Delimiter
The DELIMITER IN clause stores the actual delimiter that was found for each field. This is useful when parsing records with mixed delimiters:
01 WS-DELIM-1 PIC X(01).
01 WS-DELIM-2 PIC X(01).
UNSTRING WS-RECORD
DELIMITED BY "|" OR ","
INTO WS-FIELD-1 DELIMITER IN WS-DELIM-1
WS-FIELD-2 DELIMITER IN WS-DELIM-2
WS-FIELD-3
END-UNSTRING
After execution, WS-DELIM-1 contains the delimiter found after field 1 (either "|" or ","), and WS-DELIM-2 contains the delimiter found after field 2.
9.2.6 COUNT IN: Tracking Field Lengths
The COUNT IN clause stores the number of characters placed in each receiving field:
01 WS-SAMPLE PIC X(10) VALUE "AB,CDEF,GH".
01 WS-LEN-1 PIC 99.
01 WS-LEN-2 PIC 99.
* The UNSTRING source must be a data item, not a literal
UNSTRING WS-SAMPLE
DELIMITED BY ","
INTO WS-FIELD-1 COUNT IN WS-LEN-1
WS-FIELD-2 COUNT IN WS-LEN-2
WS-FIELD-3
END-UNSTRING
* WS-LEN-1 = 2 (for "AB")
* WS-LEN-2 = 4 (for "CDEF")
This is particularly useful when receiving fields are larger than the data they receive and you need to know the actual data length.
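COUNT IN, DELIMITER IN, and TALLYING IN are often combined. In this sketch (hypothetical names, GnuCOBOL assumed), ALL SPACES is added as a third delimiter so that the last field stops at the trailing padding of the source instead of absorbing it.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSCNT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD         PIC X(20) VALUE "AB|CDEF,GH".
       01  WS-F1             PIC X(10).
       01  WS-F2             PIC X(10).
       01  WS-F3             PIC X(10).
       01  WS-D1             PIC X.
       01  WS-D2             PIC X.
       01  WS-C1             PIC 99.
       01  WS-C2             PIC 99.
       01  WS-TALLY          PIC 9 VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    ALL SPACES stops the last field at the trailing padding
           UNSTRING WS-RECORD
               DELIMITED BY "|" OR "," OR ALL SPACES
               INTO WS-F1 DELIMITER IN WS-D1 COUNT IN WS-C1
                    WS-F2 DELIMITER IN WS-D2 COUNT IN WS-C2
                    WS-F3
               TALLYING IN WS-TALLY
           END-UNSTRING
           DISPLAY WS-F1(1:WS-C1) " ENDED BY " WS-D1 " LEN " WS-C1
           DISPLAY WS-F2(1:WS-C2) " ENDED BY " WS-D2 " LEN " WS-C2
           DISPLAY "FIELDS: " WS-TALLY
           STOP RUN.
```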
9.2.7 WITH POINTER
The WITH POINTER clause tracks the current read position in the source field. Like the STRING pointer, it must be initialized to at least 1:
01 WS-PTR PIC 99 VALUE 1.
MOVE 1 TO WS-PTR
UNSTRING WS-SOURCE
DELIMITED BY ","
INTO WS-FIELD-1
WS-FIELD-2
WITH POINTER WS-PTR
END-UNSTRING
* WS-PTR now points past the last delimiter processed
The pointer can be used to perform incremental parsing -- parse some fields, examine them, then continue parsing from where you left off.
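A common form of incremental parsing is a loop that extracts one field per UNSTRING and lets the pointer carry the position forward, which handles records with any number of fields. This is a sketch under stated assumptions: hypothetical names, GnuCOBOL, and fields of at most 10 characters with no embedded spaces (ALL SPACES doubles as the end-of-data delimiter). Every pass except the last raises the overflow condition, which is deliberately not handled here.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSLOOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SOURCE         PIC X(30) VALUE "RED,GREEN,BLUE".
       01  WS-FIELD          PIC X(10).
       01  WS-LEN            PIC 99.
       01  WS-PTR            PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 1 TO WS-PTR
           PERFORM UNTIL WS-PTR > FUNCTION LENGTH(WS-SOURCE)
               MOVE SPACES TO WS-FIELD
               MOVE 0 TO WS-LEN
      *        One receiver per call; the pointer remembers the spot
               UNSTRING WS-SOURCE
                   DELIMITED BY "," OR ALL SPACES
                   INTO WS-FIELD COUNT IN WS-LEN
                   WITH POINTER WS-PTR
               END-UNSTRING
      *        Skip empty fields (consecutive commas)
               IF WS-LEN > 0
                   DISPLAY WS-FIELD(1:WS-LEN) " NEXT=" WS-PTR
               END-IF
           END-PERFORM
           STOP RUN.
```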
9.2.8 TALLYING IN: Counting Fields
The TALLYING IN clause counts the number of receiving fields that were actually populated:
01 WS-FIELD-COUNT PIC 99 VALUE 0.
MOVE 0 TO WS-FIELD-COUNT
UNSTRING WS-CSV-RECORD
DELIMITED BY ","
INTO WS-F1 WS-F2 WS-F3 WS-F4 WS-F5
TALLYING IN WS-FIELD-COUNT
END-UNSTRING
* WS-FIELD-COUNT tells you how many fields were parsed
Important: Initialize the tally field to zero before the UNSTRING. The TALLYING clause increments, not replaces, the counter.
9.2.9 ON OVERFLOW
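For UNSTRING, the ON OVERFLOW condition occurs when:
- Every receiving field has been filled but characters in the source remain unexamined (typically, the source has more delimited fields than there are receiving fields)
- The pointer value is less than 1 or exceeds the source field length when the statement begins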
UNSTRING WS-LONG-RECORD
DELIMITED BY ","
INTO WS-F1 WS-F2 WS-F3
ON OVERFLOW
DISPLAY "More fields than receivers"
NOT ON OVERFLOW
DISPLAY "All fields parsed"
END-UNSTRING
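The sketch below (hypothetical names, GnuCOBOL assumed) runs the same UNSTRING on two records. With four fields, overflow fires after the three receivers are filled; with three fields, the trailing spaces of the 20-byte source are absorbed by the last field, so no overflow occurs.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSOVFL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LONG-RECORD    PIC X(20).
       01  WS-F1             PIC X(5).
       01  WS-F2             PIC X(5).
       01  WS-F3             PIC X(5).
       01  WS-TALLY          PIC 9.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Four fields, three receivers: overflow
           MOVE "A,B,C,D" TO WS-LONG-RECORD
           PERFORM PARSE-RECORD
      *    Three fields: the trailing spaces are absorbed by WS-F3
           MOVE "A,B,C" TO WS-LONG-RECORD
           PERFORM PARSE-RECORD
           STOP RUN.
       PARSE-RECORD.
           MOVE 0 TO WS-TALLY
           UNSTRING WS-LONG-RECORD
               DELIMITED BY ","
               INTO WS-F1 WS-F2 WS-F3
               TALLYING IN WS-TALLY
               ON OVERFLOW
                   DISPLAY "OVERFLOW AFTER " WS-TALLY " FIELDS"
               NOT ON OVERFLOW
                   DISPLAY "ALL FIELDS PARSED: " WS-TALLY
           END-UNSTRING.
```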
9.2.10 Parsing Real-World Data Formats
CSV Records:
UNSTRING WS-CSV-LINE
DELIMITED BY ","
INTO WS-FIELD-1
WS-FIELD-2
WS-FIELD-3
TALLYING IN WS-FIELD-COUNT
END-UNSTRING
Pipe-Delimited Records:
UNSTRING WS-PIPE-LINE
DELIMITED BY "|"
INTO WS-EMP-ID
WS-EMP-NAME
WS-EMP-DEPT
WS-EMP-SALARY
END-UNSTRING
Date Reformatting (MM/DD/YYYY to YYYY-MM-DD):
UNSTRING WS-US-DATE
DELIMITED BY "/"
INTO WS-MONTH WS-DAY WS-YEAR
END-UNSTRING
STRING WS-YEAR DELIMITED BY SIZE
"-" DELIMITED BY SIZE
WS-MONTH DELIMITED BY SIZE
"-" DELIMITED BY SIZE
WS-DAY DELIMITED BY SIZE
INTO WS-ISO-DATE
END-STRING
See code/example-02-unstring.cob for complete working examples and code/example-06-data-parsing.cob for a full CSV processing program.
9.3 The INSPECT Statement: Character Manipulation
The INSPECT statement is COBOL's Swiss Army knife for character-level operations. It can count characters, replace characters, and translate entire character sets -- all within a single statement. INSPECT operates on a field in place, examining every character from left to right.
9.3.1 INSPECT TALLYING: Counting Characters
The TALLYING form of INSPECT counts occurrences of characters or strings within a field:
INSPECT source-field TALLYING
counter FOR {CHARACTERS | ALL literal | LEADING literal}
[BEFORE | AFTER INITIAL literal]
Counting All Occurrences:
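01 WS-WORD PIC X(11) VALUE "MISSISSIPPI".
01 WS-COUNT PIC 999 VALUE 0.
* The INSPECT subject must be a data item, not a literal
MOVE 0 TO WS-COUNT
INSPECT WS-WORD
TALLYING WS-COUNT FOR ALL "S"
* WS-COUNT = 4
Counting Leading Characters:
01 WS-DIGITS PIC X(9) VALUE "000012345".
MOVE 0 TO WS-COUNT
INSPECT WS-DIGITS
TALLYING WS-COUNT FOR LEADING "0"
* WS-COUNT = 4 (four leading zeros)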
Counting All Characters:
MOVE 0 TO WS-COUNT
INSPECT WS-SOURCE
TALLYING WS-COUNT FOR CHARACTERS
* WS-COUNT = total number of characters (field size)
Combined Tallying:
You can count multiple patterns in a single INSPECT:
MOVE 0 TO WS-VOWEL-COUNT
INSPECT WS-TEXT
TALLYING WS-VOWEL-COUNT
FOR ALL "A" ALL "E" ALL "I"
ALL "O" ALL "U"
Important: Always initialize the counter to zero before INSPECT TALLYING. The TALLYING clause adds to the counter; it does not reset it.
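A short sketch (hypothetical names, GnuCOBOL assumed) showing two counters filled by a single INSPECT, and what happens when the reset is forgotten:

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSTALLY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TEXT           PIC X(20) VALUE "MISSISSIPPI".
       01  WS-S-COUNT        PIC 99.
       01  WS-VOWELS         PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE 0 TO WS-S-COUNT WS-VOWELS
      *    Two counters filled by one left-to-right scan
           INSPECT WS-TEXT
               TALLYING WS-S-COUNT FOR ALL "S"
                        WS-VOWELS  FOR ALL "A" ALL "E" ALL "I"
                                       ALL "O" ALL "U"
           DISPLAY "S=" WS-S-COUNT " VOWELS=" WS-VOWELS
      *    Without a reset, a second INSPECT adds to the old total
           INSPECT WS-TEXT TALLYING WS-S-COUNT FOR ALL "S"
           DISPLAY "S=" WS-S-COUNT
           STOP RUN.
```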
9.3.2 INSPECT REPLACING: Substitution
The REPLACING form modifies characters in place:
INSPECT source-field REPLACING
{CHARACTERS | ALL literal-1 | LEADING literal-1 | FIRST literal-1}
BY literal-2
[BEFORE | AFTER INITIAL literal-3]
REPLACING ALL:
INSPECT WS-PHONE
REPLACING ALL "-" BY " "
* "555-123-4567" becomes "555 123 4567"
REPLACING LEADING:
INSPECT WS-AMOUNT
REPLACING LEADING "0" BY SPACE
* "000125.50" becomes "   125.50"
REPLACING FIRST:
INSPECT WS-TEXT
REPLACING FIRST "ABC" BY "XYZ"
* "ABCABC" becomes "XYZABC" (only first match)
REPLACING CHARACTERS:
INSPECT WS-SENSITIVE-DATA
REPLACING CHARACTERS BY "*"
BEFORE INITIAL "|"
* "SECRET|PUBLIC" becomes "******|PUBLIC"
Note on replacement string length: For REPLACING ALL, LEADING, and FIRST, the replacement string must be exactly the same length as the string being replaced, so REPLACING ALL "AB" BY "X" is invalid. For REPLACING CHARACTERS, the replacement must be a single character.
9.3.3 INSPECT CONVERTING: Character Translation
CONVERTING provides character-by-character translation, similar to the Unix tr command. Each character in the first string is replaced by the corresponding character in the second string:
INSPECT source-field CONVERTING
string-1 TO string-2
[BEFORE | AFTER INITIAL literal]
Uppercase to Lowercase:
INSPECT WS-TEXT
CONVERTING
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TO
"abcdefghijklmnopqrstuvwxyz"
Lowercase to Uppercase:
INSPECT WS-TEXT
CONVERTING
"abcdefghijklmnopqrstuvwxyz"
TO
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
Digit Masking:
INSPECT WS-SSN
CONVERTING "0123456789" TO "XXXXXXXXXX"
* "123-45-6789" becomes "XXX-XX-XXXX"
ROT13 Cipher (uppercase letters only):
INSPECT WS-TEXT
CONVERTING
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TO
"NOPQRSTUVWXYZABCDEFGHIJKLM"
CONVERTING translates a whole field in a single pass, which makes it an efficient choice for character set translation, including EBCDIC-to-ASCII conversion on platforms that support it. The two strings must be the same length, and each character in the first string maps positionally to the corresponding character in the second string.
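The sketch below (hypothetical names, GnuCOBOL assumed) applies CONVERTING to a reference-modified portion of a field to mask all but the last four digits, then runs the ROT13 table twice to show that this particular positional mapping is its own inverse.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSCONV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CARD           PIC X(16) VALUE "4111111111111111".
       01  WS-MSG            PIC X(11) VALUE "HELLO WORLD".
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Mask every digit except the last four
           INSPECT WS-CARD(1:12)
               CONVERTING "0123456789" TO "**********"
           DISPLAY WS-CARD
      *    ROT13 on uppercase letters; applying it twice restores
           INSPECT WS-MSG CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            TO "NOPQRSTUVWXYZABCDEFGHIJKLM"
           DISPLAY WS-MSG
           INSPECT WS-MSG CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            TO "NOPQRSTUVWXYZABCDEFGHIJKLM"
           DISPLAY WS-MSG
           STOP RUN.
```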
9.3.4 BEFORE/AFTER INITIAL: Bounded Operations
The BEFORE INITIAL and AFTER INITIAL phrases limit the scope of any INSPECT operation to a portion of the field:
* Count 'S' only before the word "RIVER"
MOVE "MISSISSIPPI RIVER BASIN" TO WS-TEXT
MOVE 0 TO WS-COUNT
INSPECT WS-TEXT
TALLYING WS-COUNT
FOR ALL "S" BEFORE INITIAL "RIVER"
* WS-COUNT = 4 (counts only in "MISSISSIPPI ")
* Convert to lowercase only after "@" in an email
INSPECT WS-EMAIL
CONVERTING
"ABCDEFGHIJKLMNOPQRSTUVWXYZ"
TO
"abcdefghijklmnopqrstuvwxyz"
AFTER INITIAL "@"
The INITIAL keyword means the operation applies before/after the first occurrence of the specified literal. If the literal is not found:
- BEFORE INITIAL -- the entire field is within scope
- AFTER INITIAL -- nothing is within scope (no operation performed)
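A sketch of both rules (hypothetical names, GnuCOBOL assumed): the same CONVERTING is applied to an address that contains "@" and to two names that do not.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSINIT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-EMAIL   PIC X(30) VALUE "John.Doe@EXAMPLE.COM".
       01  WS-NAME-1  PIC X(8) VALUE "JOHN DOE".
       01  WS-NAME-2  PIC X(8) VALUE "JOHN DOE".
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    "@" found: only the domain part is lowercased
           INSPECT WS-EMAIL CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            TO "abcdefghijklmnopqrstuvwxyz"
               AFTER INITIAL "@"
           DISPLAY FUNCTION TRIM(WS-EMAIL)
      *    "@" not found: AFTER INITIAL leaves the field unchanged
           INSPECT WS-NAME-1 CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            TO "abcdefghijklmnopqrstuvwxyz"
               AFTER INITIAL "@"
           DISPLAY WS-NAME-1
      *    "@" not found: BEFORE INITIAL converts the whole field
           INSPECT WS-NAME-2 CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
            TO "abcdefghijklmnopqrstuvwxyz"
               BEFORE INITIAL "@"
           DISPLAY WS-NAME-2
           STOP RUN.
```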
See code/example-03-inspect.cob for complete working examples of all INSPECT variations.
9.4 Reference Modification: Substringing by Position
Reference modification provides direct access to substrings within a field by specifying a starting position and optional length. It was introduced in COBOL-85 and is one of the most frequently used string operations in modern COBOL programs.
9.4.1 Basic Syntax
identifier(start-position:length)
Both start-position and length can be numeric literals, data items, or arithmetic expressions. Positions are 1-based (the first character is position 1).
01 WS-SOURCE PIC X(20) VALUE "ABCDEFGHIJ".
MOVE WS-SOURCE(1:5) TO WS-TARGET
* WS-TARGET = "ABCDE"
MOVE WS-SOURCE(4:3) TO WS-TARGET
* WS-TARGET = "DEF"
9.4.2 Omitting the Length
When the length is omitted, the reference extends from the starting position to the end of the field:
MOVE WS-SOURCE(6:) TO WS-TARGET
* WS-TARGET = "FGHIJ" plus trailing spaces to fill
* the remainder of WS-SOURCE's defined length
9.4.3 Dynamic Substringing with Variables
The power of reference modification comes from using variables:
01 WS-START PIC 99 VALUE 1.
01 WS-LEN PIC 99 VALUE 5.
MOVE WS-SOURCE(WS-START:WS-LEN) TO WS-TARGET
This enables patterns like iterating through a string character by character:
PERFORM VARYING WS-IDX FROM 1 BY 1
UNTIL WS-IDX > 20
MOVE WS-SOURCE(WS-IDX:1) TO WS-CHAR
IF WS-CHAR = "X"
DISPLAY "Found X at position " WS-IDX
END-IF
END-PERFORM
9.4.4 Using with Arithmetic Expressions
Reference modification supports arithmetic expressions in both the start position and length:
MOVE WS-SOURCE(WS-IDX + 1 : WS-LEN - 2)
TO WS-TARGET
This is particularly useful for calculating positions relative to other values:
* Extract last N characters
COMPUTE WS-START =
FUNCTION LENGTH(WS-SOURCE) - WS-N + 1
MOVE WS-SOURCE(WS-START:WS-N) TO WS-TARGET
9.4.5 Reference Modification as a Receiving Field
Reference modification works on the receiving side of a MOVE as well, allowing in-place modification:
MOVE "XXXX" TO WS-RECORD(5:4)
* Replaces positions 5-8 with "XXXX"
This is invaluable for fixed-format record construction and field-level masking.
9.4.6 Practical Applications
Formatting Phone Numbers:
01 WS-RAW-PHONE PIC X(10) VALUE "5551234567".
01 WS-FMT-PHONE PIC X(14).
STRING "(" DELIMITED BY SIZE
WS-RAW-PHONE(1:3) DELIMITED BY SIZE
") " DELIMITED BY SIZE
WS-RAW-PHONE(4:3) DELIMITED BY SIZE
"-" DELIMITED BY SIZE
WS-RAW-PHONE(7:4) DELIMITED BY SIZE
INTO WS-FMT-PHONE
END-STRING
* Result: "(555) 123-4567"
Parsing Fixed-Format Records:
01 WS-RECORD PIC X(50)
VALUE "ACCT1234JOHN SMITH      20240315".
MOVE WS-RECORD(1:4) TO WS-ACCT-CODE
MOVE WS-RECORD(5:4) TO WS-ACCT-NUMBER
MOVE WS-RECORD(9:16) TO WS-CUST-NAME
MOVE WS-RECORD(25:8) TO WS-TRANS-DATE
Table-Driven Parsing:
01 WS-FIELD-DEF.
05 WS-FLD OCCURS 4.
10 WS-FLD-START PIC 99.
10 WS-FLD-LEN PIC 99.
PERFORM VARYING WS-I FROM 1 BY 1
UNTIL WS-I > 4
MOVE WS-RECORD(WS-FLD-START(WS-I):
WS-FLD-LEN(WS-I))
TO WS-OUTPUT(WS-I)
END-PERFORM
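A self-contained version of the table-driven loop might look like the following. It is a sketch: the layout table is hard-coded through a REDEFINES of a VALUE string rather than loaded from configuration, the names are hypothetical, and the positions match the fixed-format record example above. GnuCOBOL is assumed.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFTABLE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-REC  PIC X(32) VALUE "ACCT1234JOHN SMITH      20240315".
       01  WS-FIELD-DEFS.
      *    Start/length pairs: (1,4) (5,4) (9,16) (25,8)
           05  FILLER        PIC X(16) VALUE "0104050409162508".
       01  WS-FIELD-DEF REDEFINES WS-FIELD-DEFS.
           05  WS-FLD OCCURS 4.
               10  WS-FLD-START  PIC 99.
               10  WS-FLD-LEN    PIC 99.
       01  WS-OUTPUT-TABLE.
           05  WS-OUTPUT     PIC X(16) OCCURS 4.
       01  WS-I              PIC 9.
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 4
               MOVE WS-REC(WS-FLD-START(WS-I):WS-FLD-LEN(WS-I))
                   TO WS-OUTPUT(WS-I)
               DISPLAY WS-I ": [" WS-OUTPUT(WS-I)(1:WS-FLD-LEN(WS-I))
                   "]"
           END-PERFORM
           STOP RUN.
```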
9.4.7 Boundary Checking Concerns
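Reference modification is not bounds-checked at run time by default. When the start position or length comes from a variable, accessing positions beyond the field's defined length produces undefined behavior, which can mean reading or overwriting adjacent storage. (With literal positions, many compilers reject an out-of-range reference at compile time.)
01 WS-SHORT PIC X(10).
01 WS-POS PIC 99 VALUE 15.
* DANGEROUS: accessing position 15 in a 10-byte field
MOVE WS-SHORT(WS-POS:3) TO WS-TARGET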
Always validate start position and length before use:
IF WS-START > 0
AND WS-LEN > 0
AND WS-START <= FUNCTION LENGTH(WS-FIELD)
AND WS-START + WS-LEN - 1
<= FUNCTION LENGTH(WS-FIELD)
MOVE WS-FIELD(WS-START:WS-LEN)
TO WS-TARGET
ELSE
DISPLAY "Boundary violation"
END-IF
Compilers can add these checks at run time during development: for example, GnuCOBOL's -debug option and IBM Enterprise COBOL's SSRANGE compiler option detect out-of-range reference modification.
See code/example-04-reference-mod.cob for complete working examples.
9.5 Intrinsic Functions for Strings
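The first string-related intrinsic functions (UPPER-CASE, LOWER-CASE, REVERSE, LENGTH, ORD, and CHAR) arrived with the 1989 Intrinsic Function amendment to COBOL-85; COBOL 2002 and COBOL 2014 added more. These functions are invoked with the FUNCTION keyword and can be used wherever a sending item of the appropriate type is allowed; they can never be the receiving field of a MOVE.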
9.5.1 FUNCTION UPPER-CASE and LOWER-CASE
Convert all alphabetic characters to uppercase or lowercase:
MOVE FUNCTION UPPER-CASE(WS-MIXED)
TO WS-UPPER
MOVE FUNCTION LOWER-CASE(WS-MIXED)
TO WS-LOWER
These functions are particularly useful for case-insensitive comparisons:
IF FUNCTION UPPER-CASE(WS-INPUT)
= "APPROVED"
PERFORM PROCESS-APPROVAL
END-IF
Before intrinsic functions were available, case conversion required INSPECT CONVERTING, which remains a valid and efficient alternative.
9.5.2 FUNCTION REVERSE
Returns the characters of the argument in reverse order:
MOVE FUNCTION REVERSE(WS-TEXT)
TO WS-REVERSED
Practical uses include palindrome checking and finding the logical length of a string by counting leading spaces in the reversed string:
* Find logical length (excluding trailing spaces)
MOVE 0 TO WS-TRAILING-SPACES
INSPECT FUNCTION REVERSE(WS-TEXT)
TALLYING WS-TRAILING-SPACES
FOR LEADING SPACES
COMPUTE WS-LOGICAL-LENGTH =
FUNCTION LENGTH(WS-TEXT)
- WS-TRAILING-SPACES
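A sketch of both uses (hypothetical names, GnuCOBOL assumed). It moves the reversed value into a work field first so that the INSPECT subject is an ordinary data item, and it assumes the text is not entirely blank, since a zero-length reference modification is invalid.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOGLEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-TEXT            PIC X(20) VALUE "RACECAR".
       01  WS-REVERSED        PIC X(20).
       01  WS-TRAILING-SPACES PIC 99.
       01  WS-LOGICAL-LENGTH  PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Trailing spaces of WS-TEXT are leading spaces of its reverse
           MOVE FUNCTION REVERSE(WS-TEXT) TO WS-REVERSED
           MOVE 0 TO WS-TRAILING-SPACES
           INSPECT WS-REVERSED
               TALLYING WS-TRAILING-SPACES FOR LEADING SPACES
           COMPUTE WS-LOGICAL-LENGTH =
               FUNCTION LENGTH(WS-TEXT) - WS-TRAILING-SPACES
           DISPLAY "LOGICAL LENGTH: " WS-LOGICAL-LENGTH
      *    Palindrome test on the data portion only (assumes non-blank)
           IF FUNCTION REVERSE(WS-TEXT(1:WS-LOGICAL-LENGTH))
              = WS-TEXT(1:WS-LOGICAL-LENGTH)
               DISPLAY "PALINDROME"
           ELSE
               DISPLAY "NOT A PALINDROME"
           END-IF
           STOP RUN.
```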
9.5.3 FUNCTION LENGTH vs LENGTH OF
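Both return the size of a field, with one important distinction:
- FUNCTION LENGTH(identifier) -- a standard intrinsic function that returns the size in character positions
- LENGTH OF identifier -- a special register (a vendor extension supported by IBM Enterprise COBOL and GnuCOBOL, among others) that returns the size in bytes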
MOVE FUNCTION LENGTH(WS-FIELD) TO WS-LEN
* equivalent to:
MOVE LENGTH OF WS-FIELD TO WS-LEN
For alphanumeric fields, both return the number of bytes. For national (Unicode) fields, FUNCTION LENGTH returns the number of characters, while LENGTH OF returns the number of bytes.
FUNCTION LENGTH can also be applied to literals:
IF FUNCTION LENGTH("HELLO") = 5
DISPLAY "Five characters"
END-IF
9.5.4 FUNCTION TRIM
Removes leading spaces, trailing spaces, or both:
01 WS-PADDED PIC X(30) VALUE "   HELLO".
* Trim both ends (default)
MOVE FUNCTION TRIM(WS-PADDED) TO WS-RESULT
* Result: "HELLO"
* Trim leading only
MOVE FUNCTION TRIM(WS-PADDED LEADING)
TO WS-RESULT
* Result: "HELLO" followed by the field's trailing spaces
* Trim trailing only
MOVE FUNCTION TRIM(WS-PADDED TRAILING)
TO WS-RESULT
* Result: "   HELLO"
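Because MOVE pads the receiving field with spaces, the stored results of the three TRIM forms can look alike. Measuring the length of each function result makes the difference visible; this is a sketch with hypothetical names, assuming GnuCOBOL:

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TRIMLEN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PADDED         PIC X(30) VALUE "   HELLO".
       01  WS-LEN            PIC 99.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE FUNCTION LENGTH(FUNCTION TRIM(WS-PADDED)) TO WS-LEN
           DISPLAY "BOTH:     " WS-LEN
           MOVE FUNCTION LENGTH(FUNCTION TRIM(WS-PADDED LEADING))
               TO WS-LEN
           DISPLAY "LEADING:  " WS-LEN
           MOVE FUNCTION LENGTH(FUNCTION TRIM(WS-PADDED TRAILING))
               TO WS-LEN
           DISPLAY "TRAILING: " WS-LEN
      *    Leading spaces survive a TRAILING-only trim
           DISPLAY "[" FUNCTION TRIM(WS-PADDED TRAILING) "]"
           STOP RUN.
```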
FUNCTION TRIM is especially useful for comparisons and for building output strings from padded fields:
STRING FUNCTION TRIM(WS-FIRST)
DELIMITED BY SIZE
" " DELIMITED BY SIZE
FUNCTION TRIM(WS-LAST)
DELIMITED BY SIZE
INTO WS-FULL-NAME
END-STRING
Note: FUNCTION TRIM was introduced in COBOL 2002. Some older compilers may not support it, in which case you can use DELIMITED BY SPACE in a STRING statement or the REVERSE-and-count technique shown above.
9.5.5 FUNCTION ORD and FUNCTION CHAR
ORD returns the ordinal position of a character in the program's collating sequence. CHAR returns the character at a given ordinal position:
MOVE FUNCTION ORD("A") TO WS-ORD
* ASCII: WS-ORD = 66 (65 + 1, since ORD is 1-based)
* EBCDIC: WS-ORD = 194
MOVE FUNCTION CHAR(66) TO WS-RESULT
* ASCII: WS-RESULT = "A"
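Important: FUNCTION ORD returns a 1-based ordinal position, not the raw numeric code. In a single-byte native collating sequence the ordinal is the character's code plus 1, so in ASCII ORD("A") returns 66 (the ASCII code 65, plus 1).
These functions are useful for character classification. The range test below relies on A-Z being contiguous, which holds in ASCII; in EBCDIC the same range also contains some codes that are not letters, so a class condition such as IS ALPHABETIC-UPPER (which also accepts a space) is more portable: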
* Check if character is uppercase letter
IF FUNCTION ORD(WS-CHAR) >= FUNCTION ORD("A")
AND FUNCTION ORD(WS-CHAR) <= FUNCTION ORD("Z")
DISPLAY "Uppercase letter"
END-IF
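A sketch (hypothetical names, GnuCOBOL assumed) showing ORD and CHAR together: it prints the ordinal of "A" and shifts each letter of a word one position forward in the collating sequence. The shift gives the same result in ASCII and EBCDIC only because H, A, and L are each followed by the next letter in both encodings.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ORDCHAR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-WORD           PIC X(3) VALUE "HAL".
       01  WS-IDX            PIC 9.
       01  WS-ORD            PIC 999.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    066 with ASCII, 194 with EBCDIC
           MOVE FUNCTION ORD("A") TO WS-ORD
           DISPLAY "ORD(A) = " WS-ORD
      *    Shift each letter by one position in the collating sequence
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 3
               MOVE FUNCTION CHAR(FUNCTION ORD(WS-WORD(WS-IDX:1)) + 1)
                   TO WS-WORD(WS-IDX:1)
           END-PERFORM
           DISPLAY WS-WORD
           STOP RUN.
```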
9.5.6 FUNCTION CONCATENATE (COBOL 2014)
Concatenates multiple arguments into a single result:
MOVE FUNCTION CONCATENATE(
FUNCTION TRIM(WS-FIRST)
" "
FUNCTION TRIM(WS-LAST))
TO WS-FULL-NAME
CONCATENATE can accept any number of arguments and supports nested function calls. It is more concise than STRING for simple concatenation but does not provide overflow detection or pointer tracking.
9.5.7 FUNCTION SUBSTITUTE (COBOL 2014)
Replaces occurrences of a search string with a replacement string:
MOVE FUNCTION SUBSTITUTE(
WS-TEXT
"OLD-VALUE" "NEW-VALUE"
"ANOTHER" "REPLACED")
TO WS-RESULT
The arguments after the source come in pairs: search string followed by replacement. Multiple pairs can be specified. Note that unlike INSPECT REPLACING, the replacement string does not need to be the same length as the search string.
Support for SUBSTITUTE is limited to compilers that implement the COBOL 2014 standard. Check your compiler's documentation for availability.
See code/example-05-intrinsic-string.cob for complete working examples.
9.6 EBCDIC vs. ASCII: Collating Sequences and String Operations
Understanding character encoding is fundamental to COBOL string handling, especially in environments that bridge mainframe and distributed systems.
9.6.1 Two Worlds of Character Encoding
EBCDIC (Extended Binary Coded Decimal Interchange Code) is used on IBM mainframes (z/OS) and on IBM i (formerly AS/400). In EBCDIC:
- Lowercase letters (a-z) have codes 129-169, split into three non-contiguous runs
- Uppercase letters (A-Z) have codes 193-233, also in three runs
- Digits (0-9) have codes 240-249
- Lowercase letters sort before uppercase letters, and all letters sort before digits
ASCII (American Standard Code for Information Interchange) is used on PCs, Unix/Linux, and most other modern systems. In ASCII:
- Digits (0-9) have codes 48-57
- Uppercase letters (A-Z) have codes 65-90
- Lowercase letters (a-z) have codes 97-122
- Digits sort before uppercase letters, which sort before lowercase letters
9.6.2 Impact on String Operations
The different collating sequences affect several operations:
Sorting and Comparison:
In EBCDIC, "9" > "A" is true (240 > 193). In ASCII, "9" < "A" is true (57 < 65). Programs that sort or compare mixed alphanumeric data may produce different results on different platforms.
INSPECT CONVERTING: Character translation tables must use the correct character set. An EBCDIC-to-ASCII conversion table is specific to the platform:
* On a mainframe, converting common EBCDIC to ASCII
* equivalents using INSPECT CONVERTING
INSPECT WS-EBCDIC-DATA
CONVERTING WS-EBCDIC-TABLE
TO WS-ASCII-TABLE
FUNCTION ORD:
The ordinal values returned by FUNCTION ORD depend on the platform's native character set. Code that uses ORD for character classification should use character comparisons rather than hardcoded ordinal values.
9.6.3 Portable String Code
To write string-handling code that works correctly on both EBCDIC and ASCII platforms:
- Use FUNCTION UPPER-CASE and LOWER-CASE instead of hardcoded INSPECT CONVERTING tables for case conversion
- Use alphabetic class conditions (IF WS-CHAR IS ALPHABETIC) rather than ordinal comparisons
- Use FUNCTION ORD with character references rather than numeric literals: FUNCTION ORD("A") rather than 66
- Specify PROGRAM COLLATING SEQUENCE in the OBJECT-COMPUTER paragraph when consistent ordering is required
9.7 Common String Handling Patterns
9.7.1 Left-Justifying a Field
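* Remove leading spaces: count them, then move the rest
MOVE 0 TO WS-LEADING
INSPECT WS-INPUT
TALLYING WS-LEADING FOR LEADING SPACES
IF WS-LEADING < FUNCTION LENGTH(WS-INPUT)
MOVE WS-INPUT(WS-LEADING + 1:) TO WS-OUTPUT
ELSE
MOVE SPACES TO WS-OUTPUT
END-IF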
Or using FUNCTION TRIM:
MOVE FUNCTION TRIM(WS-INPUT LEADING)
TO WS-OUTPUT
9.7.2 Right-Justifying a Field
* Find logical length
MOVE 0 TO WS-TRAILING
INSPECT FUNCTION REVERSE(WS-INPUT)
TALLYING WS-TRAILING FOR LEADING SPACES
COMPUTE WS-DATA-LEN =
FUNCTION LENGTH(WS-INPUT) - WS-TRAILING
* Calculate padding
COMPUTE WS-PAD =
FUNCTION LENGTH(WS-OUTPUT) - WS-DATA-LEN
MOVE SPACES TO WS-OUTPUT
MOVE WS-INPUT(1:WS-DATA-LEN)
TO WS-OUTPUT(WS-PAD + 1:WS-DATA-LEN)
9.7.3 Padding a Field
* Pad with zeros on the left
MOVE ALL "0" TO WS-PADDED
COMPUTE WS-START =
FUNCTION LENGTH(WS-PADDED) - WS-DATA-LEN + 1
MOVE WS-DATA(1:WS-DATA-LEN)
TO WS-PADDED(WS-START:WS-DATA-LEN)
9.7.4 Case-Insensitive Comparison
IF FUNCTION UPPER-CASE(WS-INPUT-1)
= FUNCTION UPPER-CASE(WS-INPUT-2)
DISPLAY "Match (case-insensitive)"
END-IF
9.7.5 Data Validation: Checking for Numeric Content
MOVE 0 TO WS-DIGIT-COUNT
INSPECT WS-FIELD
TALLYING WS-DIGIT-COUNT
FOR ALL "0" ALL "1" ALL "2" ALL "3"
ALL "4" ALL "5" ALL "6" ALL "7"
ALL "8" ALL "9"
BEFORE INITIAL SPACE
IF WS-DIGIT-COUNT = FUNCTION LENGTH(
FUNCTION TRIM(WS-FIELD))
DISPLAY "Field is all numeric"
END-IF
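Or, more simply, when the value fills the entire field (for an alphanumeric item, IS NUMERIC is true only if every character, including any trailing spaces, is a digit, so a space-padded value fails the test):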
IF WS-FIELD IS NUMERIC
DISPLAY "Field is numeric"
END-IF
9.7.6 Simple Search and Replace
COBOL does not have a built-in search-and-replace for arbitrary-length strings (prior to COBOL 2014's SUBSTITUTE). For same-length replacements, use INSPECT REPLACING ALL:
INSPECT WS-TEXT
REPLACING ALL "OLD" BY "NEW"
For different-length replacements, you must implement a loop with reference modification:
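* Assumes WS-DEST is large enough for the result;
* WS-MATCH-FLAG is a one-byte work field
MOVE SPACES TO WS-DEST
MOVE 1 TO WS-SRC-IDX
MOVE 1 TO WS-DST-IDX
PERFORM UNTIL WS-SRC-IDX > WS-SRC-LEN
MOVE "N" TO WS-MATCH-FLAG
* Compare only when a full match still fits in the source
IF WS-SRC-IDX + WS-SEARCH-LEN - 1 <= WS-SRC-LEN
IF WS-SOURCE(WS-SRC-IDX:WS-SEARCH-LEN)
= WS-SEARCH-STR(1:WS-SEARCH-LEN)
MOVE "Y" TO WS-MATCH-FLAG
END-IF
END-IF
IF WS-MATCH-FLAG = "Y"
MOVE WS-REPLACE-STR(1:WS-REPL-LEN)
TO WS-DEST(WS-DST-IDX:WS-REPL-LEN)
ADD WS-SEARCH-LEN TO WS-SRC-IDX
ADD WS-REPL-LEN TO WS-DST-IDX
ELSE
MOVE WS-SOURCE(WS-SRC-IDX:1)
TO WS-DEST(WS-DST-IDX:1)
ADD 1 TO WS-SRC-IDX
ADD 1 TO WS-DST-IDX
END-IF
END-PERFORM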
9.7.7 Building Report Output Strings
The WITH POINTER clause of STRING is ideal for building fixed-column report lines:
MOVE SPACES TO WS-REPORT-LINE
MOVE 1 TO WS-PTR
* Column 1: Account number
STRING WS-ACCT-NUM DELIMITED BY SIZE
INTO WS-REPORT-LINE
WITH POINTER WS-PTR
END-STRING
* Column 12: Customer name
MOVE 12 TO WS-PTR
STRING WS-CUST-NAME DELIMITED BY " "
INTO WS-REPORT-LINE
WITH POINTER WS-PTR
END-STRING
* Column 40: Balance
MOVE 40 TO WS-PTR
STRING WS-FMT-BALANCE DELIMITED BY SIZE
INTO WS-REPORT-LINE
WITH POINTER WS-PTR
END-STRING
9.8 Parsing Real-World Data
9.8.1 CSV File Processing
Processing CSV files is one of the most common string handling tasks. The general approach is:
- Read a record from the file
- Use UNSTRING with DELIMITED BY "," to split fields
- Use TALLYING IN to verify the expected number of fields
- Validate each parsed field
- Convert string representations to appropriate types
3100-PARSE-CSV-RECORD.
MOVE SPACES TO WS-PARSED-FIELDS
MOVE 0 TO WS-FIELD-TALLY
UNSTRING WS-INPUT-RECORD
DELIMITED BY ","
INTO WS-FIELD-1
WS-FIELD-2
WS-FIELD-3
WS-FIELD-4
TALLYING IN WS-FIELD-TALLY
ON OVERFLOW
SET EXTRA-FIELDS TO TRUE
END-UNSTRING
IF WS-FIELD-TALLY NOT = 4
SET RECORD-INVALID TO TRUE
END-IF
Handling quoted CSV fields: Standard COBOL UNSTRING does not handle quoted fields (where commas may appear within quotes). For true RFC 4180 CSV processing, you need a character-by-character parser using reference modification. This is demonstrated in the case study programs.
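A minimal sketch of such a parser follows (not the case-study code; names are hypothetical and GnuCOBOL is assumed). It walks the line one character at a time, toggles an in-quotes flag on each double quote, and starts a new field only at commas outside quotes. It does not handle the doubled-quote escape ("") that RFC 4180 allows inside quoted fields, and it silently truncates fields longer than 20 characters or records with more than five fields.

```cobol
      * Hypothetical example program; assumes GnuCOBOL, fixed format
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVQUOTE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LINE           PIC X(60)
           VALUE '101,"SMITH, JOHN",250.00'.
       01  WS-REVERSED       PIC X(60).
       01  WS-TRAILING       PIC 99.
       01  WS-LINE-LEN       PIC 99.
       01  WS-IDX            PIC 99.
       01  WS-CHAR           PIC X.
       01  WS-IN-QUOTES      PIC X VALUE "N".
       01  WS-FLD-NUM        PIC 9 VALUE 1.
       01  WS-FLD-POS        PIC 99 VALUE 1.
       01  WS-FIELDS.
           05  WS-FIELD      PIC X(20) OCCURS 5.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Logical line length: ignore the trailing padding
           MOVE FUNCTION REVERSE(WS-LINE) TO WS-REVERSED
           MOVE 0 TO WS-TRAILING
           INSPECT WS-REVERSED TALLYING WS-TRAILING FOR LEADING SPACES
           COMPUTE WS-LINE-LEN =
               FUNCTION LENGTH(WS-LINE) - WS-TRAILING
           MOVE SPACES TO WS-FIELDS
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-LINE-LEN
               MOVE WS-LINE(WS-IDX:1) TO WS-CHAR
               EVALUATE TRUE
                   WHEN WS-CHAR = '"'
      *                Toggle quote state; the quote is not copied
                       IF WS-IN-QUOTES = "Y"
                           MOVE "N" TO WS-IN-QUOTES
                       ELSE
                           MOVE "Y" TO WS-IN-QUOTES
                       END-IF
                   WHEN WS-CHAR = "," AND WS-IN-QUOTES = "N"
      *                An unquoted comma starts the next field
                       IF WS-FLD-NUM < 5
                           ADD 1 TO WS-FLD-NUM
                           MOVE 1 TO WS-FLD-POS
                       END-IF
                   WHEN OTHER
      *                Copy the character unless the field is full
                       IF WS-FLD-POS <= 20
                           MOVE WS-CHAR
                             TO WS-FIELD(WS-FLD-NUM)(WS-FLD-POS:1)
                           ADD 1 TO WS-FLD-POS
                       END-IF
               END-EVALUATE
           END-PERFORM
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > WS-FLD-NUM
               DISPLAY WS-IDX ": " FUNCTION TRIM(WS-FIELD(WS-IDX))
           END-PERFORM
           STOP RUN.
```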
9.8.2 Fixed-Length Record Parsing
Fixed-length records are COBOL's native format. Reference modification provides direct access:
* Legacy record: NAME(1:30) ADDR(31:30) CITY(61:20)
* ST(81:2) ZIP(83:5) ACCT(88:10)
MOVE WS-RECORD(1:30) TO WS-NAME
MOVE WS-RECORD(31:30) TO WS-ADDRESS
MOVE WS-RECORD(61:20) TO WS-CITY
MOVE WS-RECORD(81:2) TO WS-STATE
MOVE WS-RECORD(83:5) TO WS-ZIP
MOVE WS-RECORD(88:10) TO WS-ACCOUNT
For maintainability, define the record structure in the DATA DIVISION rather than using reference modification for every field:
01 WS-LEGACY-RECORD.
05 WS-LR-NAME PIC X(30).
05 WS-LR-ADDRESS PIC X(30).
05 WS-LR-CITY PIC X(20).
05 WS-LR-STATE PIC X(02).
05 WS-LR-ZIP PIC X(05).
05 WS-LR-ACCOUNT PIC X(10).
Use reference modification for dynamic or table-driven parsing where the field positions come from a configuration table.
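A minimal sketch of table-driven parsing, using the same legacy layout as the example above. Here the layout table is hard-coded (VALUE clauses redefined as start/length pairs) as a stand-in for one loaded from a configuration file; the record contents and all names are hypothetical. Each entry is validated against the record length before the reference modification is applied.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LAYOUT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD        PIC X(97).
      * Stand-in for a layout table loaded from a configuration
      * file: each entry is a 3-digit start and a 3-digit length
       01  WS-LAYOUT-VALUES.
           05  FILLER       PIC X(6) VALUE "001030".
           05  FILLER       PIC X(6) VALUE "031030".
           05  FILLER       PIC X(6) VALUE "061020".
           05  FILLER       PIC X(6) VALUE "081002".
           05  FILLER       PIC X(6) VALUE "083005".
           05  FILLER       PIC X(6) VALUE "088010".
       01  WS-LAYOUT REDEFINES WS-LAYOUT-VALUES.
           05  WS-LAY-ENTRY OCCURS 6 TIMES.
               10  WS-START PIC 9(3).
               10  WS-LEN   PIC 9(3).
       01  WS-OUT-FIELDS.
           05  WS-OUT-FIELD PIC X(30) OCCURS 6 TIMES.
       01  WS-I             PIC 9(2).
       PROCEDURE DIVISION.
       1000-MAIN.
      * Build a hypothetical sample record
           MOVE SPACES        TO WS-RECORD
           MOVE "JANE DOE"    TO WS-RECORD(1:30)
           MOVE "12 MAIN ST"  TO WS-RECORD(31:30)
           MOVE "SPRINGFIELD" TO WS-RECORD(61:20)
           MOVE "IL"          TO WS-RECORD(81:2)
           MOVE "62701"       TO WS-RECORD(83:5)
           MOVE "0000012345"  TO WS-RECORD(88:10)
      * Extract each field using the positions from the table
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 6
               IF WS-START(WS-I) < 1 OR WS-LEN(WS-I) < 1
                  OR WS-START(WS-I) + WS-LEN(WS-I) - 1
                     > FUNCTION LENGTH(WS-RECORD)
                   DISPLAY WS-I ": invalid layout entry"
               ELSE
                   MOVE WS-RECORD(WS-START(WS-I):WS-LEN(WS-I))
                       TO WS-OUT-FIELD(WS-I)
                   DISPLAY WS-I ": ["
                       FUNCTION TRIM(WS-OUT-FIELD(WS-I)) "]"
               END-IF
           END-PERFORM
           STOP RUN.
```

Changing the layout then means changing table data rather than PROCEDURE DIVISION code, which is the main reason to prefer this over hard-coded positions.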
9.8.3 Name and Address Parsing
Name parsing is notoriously complex due to the variety of formats:
- "SMITH, JOHN" (last-first)
- "JOHN SMITH" (first-last)
- "JOHN Q. SMITH" (first-middle-last)
- "DR. JOHN SMITH JR." (prefix-first-last-suffix)
- "MARY O'CONNOR" (names with apostrophes)
The general strategy is:
- Check for a comma to detect "LAST, FIRST" format
- Count words to determine the pattern
- Check for known prefixes (DR., MR., MRS., MS.)
- Check for known suffixes (JR., SR., III, IV)
- Handle special cases (multi-word last names like "VAN DER BERG")
See code/case-study-code.cob for a complete implementation.
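The first step of this strategy (comma detection and reordering) can be sketched on its own. This is a minimal illustration, not the case study implementation: it assumes that a name with exactly one comma is in "LAST, FIRST" form and leaves every other name unchanged, so prefixes, suffixes, and multi-word last names are not handled. Names and field sizes are hypothetical. Listing ", " before "," in the UNSTRING lets the space after the comma be consumed as part of the delimiter when it is present.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NAMEFMT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RAW-NAME      PIC X(40).
       01  WS-NORM-NAME     PIC X(40).
       01  WS-LAST          PIC X(20).
       01  WS-FIRST         PIC X(20).
       01  WS-COMMAS        PIC 9(2).
       PROCEDURE DIVISION.
       1000-MAIN.
           MOVE "SMITH, JOHN" TO WS-RAW-NAME
           PERFORM 2000-NORMALIZE-NAME
           MOVE "JOHN SMITH" TO WS-RAW-NAME
           PERFORM 2000-NORMALIZE-NAME
           MOVE "O'CONNOR,MARY" TO WS-RAW-NAME
           PERFORM 2000-NORMALIZE-NAME
           STOP RUN.
       2000-NORMALIZE-NAME.
           MOVE 0 TO WS-COMMAS
           INSPECT WS-RAW-NAME TALLYING WS-COMMAS FOR ALL ","
           IF WS-COMMAS = 1
      *        A single comma means "LAST, FIRST": split and reorder
               MOVE SPACES TO WS-LAST WS-FIRST WS-NORM-NAME
               UNSTRING WS-RAW-NAME DELIMITED BY ", " OR ","
                   INTO WS-LAST WS-FIRST
               END-UNSTRING
               STRING WS-FIRST DELIMITED BY "  "
                      " "      DELIMITED BY SIZE
                      WS-LAST  DELIMITED BY "  "
                      INTO WS-NORM-NAME
               END-STRING
           ELSE
      *        No comma: assume "FIRST LAST" and keep it unchanged
               MOVE WS-RAW-NAME TO WS-NORM-NAME
           END-IF
           DISPLAY "[" FUNCTION TRIM(WS-RAW-NAME) "] -> ["
                   FUNCTION TRIM(WS-NORM-NAME) "]".
```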
9.9 Performance Considerations
9.9.1 STRING vs. Reference Modification
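For simple concatenation of fields at known positions, direct MOVE with reference modification is typically faster than STRING, because each MOVE copies a fixed number of bytes to a fixed location with no delimiter checks or pointer updates: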
* Faster approach for known-position assembly
MOVE WS-YEAR TO WS-DATE(1:4)
MOVE "-" TO WS-DATE(5:1)
MOVE WS-MONTH TO WS-DATE(6:2)
MOVE "-" TO WS-DATE(8:1)
MOVE WS-DAY TO WS-DATE(9:2)
Use STRING when the positions are dynamic, when multiple delimited fields are involved, or when overflow detection is needed.
9.9.2 INSPECT Efficiency
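INSPECT examines the entire field from left to right. For large fields, BEFORE INITIAL or AFTER INITIAL restricts the portion that is examined. Because this also changes what is counted, use it only when nothing of interest lies outside that portion:
* Examines only the characters before the first space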
INSPECT WS-LARGE-FIELD
TALLYING WS-COUNT FOR ALL "X"
BEFORE INITIAL SPACES
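The following sketch makes the trade-off concrete: BEFORE INITIAL changes the result as well as the amount of data examined, so it is a safe optimization only when nothing of interest lies beyond the delimiter. The field contents and names are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSPSCAN.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical data: a leading code, then free text that
      * happens to contain more X characters
       01  WS-LARGE-FIELD   PIC X(200)
           VALUE "XXAXB REMARKS: XRAY, XENON".
       01  WS-COUNT-ALL     PIC 9(4).
       01  WS-COUNT-CODE    PIC 9(4).
       PROCEDURE DIVISION.
       1000-MAIN.
           MOVE 0 TO WS-COUNT-ALL WS-COUNT-CODE
      * Examines all 200 bytes
           INSPECT WS-LARGE-FIELD
               TALLYING WS-COUNT-ALL FOR ALL "X"
      * Examines only the bytes before the first space ("XXAXB")
           INSPECT WS-LARGE-FIELD
               TALLYING WS-COUNT-CODE FOR ALL "X"
               BEFORE INITIAL SPACES
           DISPLAY "ALL X: " WS-COUNT-ALL
           DISPLAY "X BEFORE SPACE: " WS-COUNT-CODE
           STOP RUN.
```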
9.9.3 Avoid Unnecessary Operations
When processing millions of records, small optimizations in string handling compound significantly:
- Pre-calculate lengths rather than calling FUNCTION LENGTH in a loop
- Use INSPECT CONVERTING for bulk character translation rather than character-by-character loops
- Initialize receiving fields once before a loop rather than on every iteration, where the loop logic allows it (a field that receives a fresh STRING result on each pass must still be cleared each time; see 9.10.6)
- Use MOVE for simple copies rather than STRING with DELIMITED BY SIZE
9.9.4 Intrinsic Functions and Performance
Intrinsic functions like UPPER-CASE and TRIM create temporary intermediate results. In tight loops, it may be more efficient to use INSPECT CONVERTING for case conversion and manual trimming logic rather than function calls.
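A small sketch comparing the two approaches to case conversion, with hypothetical names and data. Because both alphabets are spelled out as literals, the INSPECT conversion does not depend on the character encoding (it behaves the same in ASCII and EBCDIC), but it covers only the 26 unaccented letters. Whether INSPECT is actually faster than FUNCTION UPPER-CASE depends on the compiler and runtime, so measure before changing production code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UPCASE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOWER         PIC X(26)
           VALUE "abcdefghijklmnopqrstuvwxyz".
       01  WS-UPPER         PIC X(26)
           VALUE "ABCDEFGHIJKLMNOPQRSTUVWXYZ".
      * Hypothetical sample data
       01  WS-NAME-1        PIC X(20) VALUE "Mary O'Connor".
       01  WS-NAME-2        PIC X(20).
       PROCEDURE DIVISION.
       1000-MAIN.
      * Intrinsic function: builds a temporary result, then MOVEs it
           MOVE FUNCTION UPPER-CASE(WS-NAME-1) TO WS-NAME-2
      * INSPECT CONVERTING: translates the field in place
           INSPECT WS-NAME-1 CONVERTING WS-LOWER TO WS-UPPER
           DISPLAY "[" FUNCTION TRIM(WS-NAME-1) "]"
           IF WS-NAME-1 = WS-NAME-2
               DISPLAY "Both methods agree"
           ELSE
               DISPLAY "Results differ"
           END-IF
           STOP RUN.
```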
9.10 Common Mistakes and Debugging Tips
9.10.1 Pointer Management Errors
Forgetting to initialize the pointer:
* BUG: WS-PTR might contain anything
STRING WS-DATA DELIMITED BY SIZE
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING
* FIX: Always initialize
MOVE 1 TO WS-PTR
STRING WS-DATA DELIMITED BY SIZE
INTO WS-OUTPUT
WITH POINTER WS-PTR
END-STRING
Forgetting to initialize the TALLYING counter:
* BUG: TALLYING IN adds to WS-TALLY, so counts accumulate
UNSTRING WS-RECORD
DELIMITED BY ","
INTO WS-F1 WS-F2
TALLYING IN WS-TALLY
END-UNSTRING
* FIX: Reset before each UNSTRING
MOVE 0 TO WS-TALLY
UNSTRING WS-RECORD ...
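The accumulation is easy to demonstrate. In this minimal sketch with hypothetical data, the same two-field UNSTRING runs three times: without a reset the tally reads 2, 4, 6; with a reset it reads 2 each time.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TALLYBUG.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD        PIC X(20) VALUE "AAA,BBB".
       01  WS-F1            PIC X(10).
       01  WS-F2            PIC X(10).
       01  WS-TALLY         PIC 9(3) VALUE 0.
       01  WS-PASS          PIC 9.
       PROCEDURE DIVISION.
       1000-MAIN.
      * No reset: TALLYING IN adds to whatever WS-TALLY holds
           PERFORM VARYING WS-PASS FROM 1 BY 1 UNTIL WS-PASS > 3
               UNSTRING WS-RECORD DELIMITED BY ","
                   INTO WS-F1 WS-F2
                   TALLYING IN WS-TALLY
               END-UNSTRING
               DISPLAY "No reset, pass " WS-PASS ": " WS-TALLY
           END-PERFORM
      * Reset before each UNSTRING: the count is per execution
           PERFORM VARYING WS-PASS FROM 1 BY 1 UNTIL WS-PASS > 3
               MOVE 0 TO WS-TALLY
               UNSTRING WS-RECORD DELIMITED BY ","
                   INTO WS-F1 WS-F2
                   TALLYING IN WS-TALLY
               END-UNSTRING
               DISPLAY "With reset, pass " WS-PASS ": " WS-TALLY
           END-PERFORM
           STOP RUN.
```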
9.10.2 Overflow Handling
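Not coding ON OVERFLOW:
Every STRING and UNSTRING should include ON OVERFLOW handling in production code. Without it, a STRING that runs out of room in the receiving field (or whose pointer is less than 1 or beyond the receiving field) and an UNSTRING that leaves characters unexamined simply end, and execution continues with the next statement. Silent data truncation is a common source of production bugs.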
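A minimal sketch of STRING with both ON OVERFLOW and NOT ON OVERFLOW. The 15-byte receiving field is deliberately too small for the hypothetical name. After the overflow, WS-OUTPUT holds the characters transferred so far and WS-PTR points one position past the end of the field, which the handler can log or use to reject the record.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVFLOW.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical values; 11 + 1 + 10 bytes will not fit in 15
       01  WS-FIRST         PIC X(15) VALUE "CHRISTOPHER".
       01  WS-LAST          PIC X(15) VALUE "MONTGOMERY".
       01  WS-OUTPUT        PIC X(15).
       01  WS-PTR           PIC 9(3).
       PROCEDURE DIVISION.
       1000-MAIN.
           MOVE SPACES TO WS-OUTPUT
           MOVE 1 TO WS-PTR
           STRING WS-FIRST DELIMITED BY SPACE
                  " "      DELIMITED BY SIZE
                  WS-LAST  DELIMITED BY SPACE
                  INTO WS-OUTPUT
                  WITH POINTER WS-PTR
               ON OVERFLOW
                   DISPLAY "OVERFLOW: [" WS-OUTPUT "] PTR=" WS-PTR
               NOT ON OVERFLOW
                   DISPLAY "OK: [" WS-OUTPUT "]"
           END-STRING
           STOP RUN.
```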
9.10.3 Off-by-One Errors in Reference Modification
Common mistake: zero-based thinking
* BUG: Position 0 is invalid (COBOL is 1-based)
MOVE WS-DATA(0:5) TO WS-TARGET
* FIX: Start at position 1
MOVE WS-DATA(1:5) TO WS-TARGET
Common mistake: exceeding field boundaries
01 WS-FIELD PIC X(10).
* BUG: Positions 8-12 exceed the 10-byte field
MOVE WS-FIELD(8:5) TO WS-TARGET
* FIX: Ensure start + length - 1 <= field size
MOVE WS-FIELD(8:3) TO WS-TARGET
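When the start position or length is computed at run time, the rule above can be enforced with an explicit check before the reference modification. This sketch rejects an invalid start or length and clamps an over-long length to the end of the field; whether to clamp or reject is a design choice, and the values are hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFGUARD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-FIELD         PIC X(10) VALUE "ABCDEFGHIJ".
       01  WS-TARGET        PIC X(10).
      * Hypothetical values computed elsewhere at run time
       01  WS-START         PIC 9(3) VALUE 8.
       01  WS-LEN           PIC 9(3) VALUE 5.
       PROCEDURE DIVISION.
       1000-MAIN.
           IF WS-START < 1 OR WS-LEN < 1
              OR WS-START > FUNCTION LENGTH(WS-FIELD)
               DISPLAY "Rejected: start " WS-START " length " WS-LEN
           ELSE
      *        Clamp so that start + length - 1 <= field size
               IF WS-START + WS-LEN - 1 > FUNCTION LENGTH(WS-FIELD)
                   COMPUTE WS-LEN =
                       FUNCTION LENGTH(WS-FIELD) - WS-START + 1
               END-IF
               MOVE WS-FIELD(WS-START:WS-LEN) TO WS-TARGET
               DISPLAY "Start " WS-START " length " WS-LEN
                       ": [" FUNCTION TRIM(WS-TARGET) "]"
           END-IF
           STOP RUN.
```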
9.10.4 DELIMITED BY SIZE vs. DELIMITED BY SPACE
This is perhaps the most common STRING mistake:
01 WS-NAME PIC X(20) VALUE "JOHN".
* DELIMITED BY SIZE copies all 20 characters:
* "JOHN" followed by 16 trailing spaces
STRING WS-NAME DELIMITED BY SIZE ...
* DELIMITED BY SPACE copies until first space:
* "JOHN"
STRING WS-NAME DELIMITED BY SPACE ...
For most practical concatenation, DELIMITED BY SPACE (for single-word values) or DELIMITED BY "  " (two spaces, which keeps embedded single spaces such as "MARY ANN") is what you want.
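A minimal sketch comparing the three delimiters on a two-word value, with hypothetical field contents. A trailing "|" is appended so the number of copied bytes is visible. One caveat for the two-space form: if the value is exactly one byte shorter than the field, its single trailing space is copied too, and an embedded double space ends the copy early.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIMS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical two-word value in a 20-byte field
       01  WS-NAME          PIC X(20) VALUE "MARY ANN".
       01  WS-RESULT        PIC X(40).
       PROCEDURE DIVISION.
       1000-MAIN.
      * SIZE: all 20 bytes, including 12 trailing spaces
           MOVE SPACES TO WS-RESULT
           STRING WS-NAME DELIMITED BY SIZE
                  "|"     DELIMITED BY SIZE
                  INTO WS-RESULT
           END-STRING
           DISPLAY "SIZE   [" FUNCTION TRIM(WS-RESULT) "]"
      * SPACE: stops at the first space, losing "ANN"
           MOVE SPACES TO WS-RESULT
           STRING WS-NAME DELIMITED BY SPACE
                  "|"     DELIMITED BY SIZE
                  INTO WS-RESULT
           END-STRING
           DISPLAY "SPACE  [" FUNCTION TRIM(WS-RESULT) "]"
      * Two spaces: keeps the single embedded space
           MOVE SPACES TO WS-RESULT
           STRING WS-NAME DELIMITED BY "  "
                  "|"     DELIMITED BY SIZE
                  INTO WS-RESULT
           END-STRING
           DISPLAY "DOUBLE [" FUNCTION TRIM(WS-RESULT) "]"
           STOP RUN.
```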
9.10.5 INSPECT REPLACING Length Mismatch
* ERROR: Different lengths in REPLACING
INSPECT WS-TEXT
REPLACING ALL "AB" BY "X"
* "X" is 1 byte, "AB" is 2 bytes - INVALID
* FIX: Use same-length strings
INSPECT WS-TEXT
REPLACING ALL "AB" BY "X "
* Both are 2 bytes - VALID
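If the goal is to remove bytes (for example, to turn every "AB" into a single "X" and close the gap), INSPECT cannot do it. One approach, sketched below with hypothetical names and data, rebuilds the text in a second field by walking the input with reference modification. Because the output is never longer than the input, a receiving field of the same size is sufficient.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SHRINK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical input; every "AB" becomes a single "X"
       01  WS-TEXT          PIC X(20) VALUE "ABCABD".
       01  WS-RESULT        PIC X(20).
       01  WS-IN-POS        PIC 9(2).
       01  WS-OUT-POS       PIC 9(2).
       01  WS-LEN           PIC 9(2).
       01  WS-MATCH         PIC X.
       PROCEDURE DIVISION.
       1000-MAIN.
           MOVE SPACES TO WS-RESULT
           MOVE FUNCTION LENGTH(WS-TEXT) TO WS-LEN
           MOVE 1 TO WS-IN-POS
           MOVE 0 TO WS-OUT-POS
           PERFORM UNTIL WS-IN-POS > WS-LEN
               ADD 1 TO WS-OUT-POS
      *        Test for "AB" only when two input bytes remain
               MOVE "N" TO WS-MATCH
               IF WS-IN-POS < WS-LEN
                   IF WS-TEXT(WS-IN-POS:2) = "AB"
                       MOVE "Y" TO WS-MATCH
                   END-IF
               END-IF
               IF WS-MATCH = "Y"
      *            Two input bytes become one output byte
                   MOVE "X" TO WS-RESULT(WS-OUT-POS:1)
                   ADD 2 TO WS-IN-POS
               ELSE
                   MOVE WS-TEXT(WS-IN-POS:1)
                       TO WS-RESULT(WS-OUT-POS:1)
                   ADD 1 TO WS-IN-POS
               END-IF
           END-PERFORM
           DISPLAY "[" FUNCTION TRIM(WS-TEXT) "] -> ["
                   FUNCTION TRIM(WS-RESULT) "]"
           STOP RUN.
```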
9.10.6 Not Initializing Receiving Fields
* BUG: WS-OUTPUT may contain leftover data
STRING WS-NEW-DATA DELIMITED BY SPACE
INTO WS-OUTPUT
END-STRING
* FIX: Initialize first
MOVE SPACES TO WS-OUTPUT
STRING WS-NEW-DATA DELIMITED BY SPACE
INTO WS-OUTPUT
END-STRING
9.11 Fixed-Format vs. Free-Format Examples
All examples in this chapter follow COBOL fixed-format conventions with the traditional column layout:
- Columns 1-6: Sequence number area (left blank)
- Column 7: Indicator area (* for comments)
- Columns 8-11: Area A (division, section, paragraph headers)
- Columns 12-72: Area B (statements)
- Columns 73-80: Comment area (ignored by compiler)
In free-format COBOL (supported by GnuCOBOL with the -free flag and by many modern compilers), the same code can be written without column restrictions:
Fixed-format:
```cobol
       01  WS-FIRST   PIC X(30).
       01  WS-LAST    PIC X(30).
       01  WS-RESULT  PIC X(60).
      * Level numbers start in Area A (column 8);
      * statements start in Area B (column 12)
           STRING WS-FIRST DELIMITED BY SPACE
                  " "      DELIMITED BY SIZE
                  WS-LAST  DELIMITED BY SPACE
                  INTO WS-RESULT
           END-STRING
```
Free-format:
```cobol
01 WS-FIRST  PIC X(30).
01 WS-LAST   PIC X(30).
01 WS-RESULT PIC X(60).
STRING WS-FIRST DELIMITED BY SPACE
       " " DELIMITED BY SIZE
       WS-LAST DELIMITED BY SPACE
       INTO WS-RESULT
END-STRING
```
The functionality is identical. The code examples in the code/ directory use fixed-format for maximum compatibility, as this is the format encountered in the vast majority of existing COBOL codebases.
9.12 Comprehensive Example: CSV Transaction Processing
The program in code/example-06-data-parsing.cob demonstrates a complete data processing pipeline using the string handling techniques covered in this chapter:
- Input: CSV records containing account number, name, transaction type, amount, and date
- Parsing: UNSTRING with DELIMITED BY "," and TALLYING IN
- Validation: INSPECT TALLYING to verify numeric content, EVALUATE for valid transaction types, boundary checks on dates
- Formatting: STRING with the WITH POINTER phrase to build fixed-width report lines, reference modification for date reformatting
- Accumulation: Running totals for deposits and withdrawals
- Error handling: ON OVERFLOW, validation error reporting
This program processes eight sample records including deliberate error cases (invalid account number, negative amount, missing name) to demonstrate robust error handling.
Summary
This chapter covered the five pillars of COBOL string handling:
| Mechanism | Primary Use | Key Feature |
|---|---|---|
| STRING | Concatenation | DELIMITED BY, WITH POINTER, ON OVERFLOW |
| UNSTRING | Parsing/splitting | DELIMITER IN, COUNT IN, TALLYING IN |
| INSPECT | Counting/replacing/translating | TALLYING, REPLACING, CONVERTING |
| Reference Modification | Substringing | identifier(start:length) |
| Intrinsic Functions | Case, trim, length, etc. | UPPER-CASE, TRIM, LENGTH, ORD, CHAR |
The STRING and UNSTRING statements are complementary: STRING assembles, UNSTRING disassembles. Together with INSPECT for character-level operations and reference modification for positional access, they cover the string processing that typical business programs require; formats such as quoted CSV fields still call for character-by-character logic built on reference modification.
Key principles to remember:
- Always initialize receiving fields before STRING operations
- Always initialize counters before INSPECT TALLYING and UNSTRING TALLYING
- Always initialize pointers before using WITH POINTER
- Always code ON OVERFLOW in production programs
- Use DELIMITED BY SPACE (not SIZE) when you want to exclude trailing spaces
- Reference modification is 1-based, not 0-based
- INSPECT REPLACING requires same-length source and target strings
- INSPECT CONVERTING requires same-length source and target character sets
- Test string operations with boundary cases: empty fields, fields full of spaces, maximum-length data, and data that exceeds field capacity
The next chapter builds on these string handling skills by exploring file input/output operations, where you will read records from external files, parse them using the techniques from this chapter, and write formatted output.