In This Chapter

Introduction
9.1 The STRING Statement: Concatenation
9.2 The UNSTRING Statement: Parsing and Splitting
9.3 The INSPECT Statement: Character Manipulation
9.4 Reference Modification: Substringing by Position
9.5 Intrinsic Functions for Strings
9.6 EBCDIC vs. ASCII: Collating Sequences and String Operations
9.7 Common String Handling Patterns
9.8 Parsing Real-World Data
9.9 Performance Considerations
9.10 Common Mistakes and Debugging Tips
9.11 Fixed-Format vs. Free-Format Examples
9.12 Comprehensive Example: CSV Transaction Processing
Summary

Chapter 9: String Handling and Character Manipulation

Introduction

String handling in COBOL occupies a unique position in the programming language landscape. While modern languages like Python, JavaScript, and Java offer concise string operations through built-in methods and operators, COBOL takes a characteristically different approach: verbose, explicit, and precise. Where Python might concatenate strings with a simple + operator, COBOL requires the STRING statement with explicit delimiter specifications. Where JavaScript splits strings with a single .split() method call, COBOL uses the UNSTRING statement with detailed clauses for delimiter handling, field counting, and overflow detection.

This verbosity is not a weakness. In enterprise environments where financial transactions, government records, and healthcare data must be processed with absolute precision, COBOL's explicit string handling provides critical advantages. Every operation is self-documenting. Every boundary condition can be explicitly handled. Overflow in STRING and UNSTRING can be trapped with programmer-defined logic through the ON OVERFLOW phrase. There are no hidden memory allocations and no implicit type conversions, and truncation can always be detected when the programmer codes for it.

COBOL provides five primary mechanisms for string manipulation:

STRING -- Concatenates multiple source fields into a single destination field
UNSTRING -- Splits a single source field into multiple destination fields
INSPECT -- Counts, replaces, or translates characters within a field
Reference Modification -- Extracts or modifies substrings by position and length
Intrinsic Functions -- Built-in functions for case conversion, trimming, length calculation, and more (added to COBOL-85 by its 1989 amendment, expanded in COBOL 2002 and COBOL 2014)

This chapter covers each mechanism in depth, with practical examples drawn from real-world data processing scenarios. By the end of this chapter, you will be able to parse CSV files, format report output, validate data fields, convert between character representations, and handle the edge cases that arise in production string processing.

9.1 The STRING Statement: Concatenation

The STRING statement concatenates one or more source fields into a single receiving field. It is COBOL's primary tool for building output strings, constructing messages, and assembling formatted data.

9.1.1 Basic Syntax

The general format of the STRING statement is:

STRING  source-1 DELIMITED BY {SIZE | literal | identifier}
       [source-2 DELIMITED BY {SIZE | literal | identifier}]
       ...
  INTO receiving-field
 [WITH POINTER pointer-field]
 [ON OVERFLOW imperative-statement-1]
 [NOT ON OVERFLOW imperative-statement-2]
END-STRING

Every source field in a STRING statement must have a DELIMITED BY clause. This clause controls how much of the source field is transferred to the receiving field.

9.1.2 DELIMITED BY SIZE

DELIMITED BY SIZE transfers the entire contents of the source field, including trailing spaces. This is the simplest form:

       01  WS-FIRST-NAME    PIC X(15) VALUE "JOHN".
       01  WS-LAST-NAME     PIC X(20) VALUE "SMITH".
       01  WS-FULL-NAME     PIC X(50) VALUE SPACES.

           STRING WS-FIRST-NAME DELIMITED BY SIZE
                  WS-LAST-NAME  DELIMITED BY SIZE
             INTO WS-FULL-NAME
           END-STRING

Because WS-FIRST-NAME is defined as PIC X(15), DELIMITED BY SIZE transfers all 15 characters: "JOHN" followed by 11 spaces. WS-LAST-NAME then contributes all 20 of its characters, so WS-FULL-NAME holds "JOHN", 11 spaces, "SMITH", and trailing spaces. This is not usually what you want.

9.1.3 DELIMITED BY Literal

DELIMITED BY followed by a literal string causes the transfer to stop when the specified literal is encountered in the source field. The delimiter itself is not transferred:

           STRING WS-FIRST-NAME DELIMITED BY SPACE
                  " "            DELIMITED BY SIZE
                  WS-LAST-NAME  DELIMITED BY SPACE
             INTO WS-FULL-NAME
           END-STRING

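Here, DELIMITED BY SPACE stops the transfer at the first space character. The result is "JOHN SMITH". The literal " " (a single space) with DELIMITED BY SIZE inserts exactly one space between the names. Note that a source value with an embedded space, such as "MARY ANN", is cut at that space and transfers only "MARY".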

Common delimiters include:

Delimiter	Purpose
`SPACE` or `SPACES`	Stop at first space (trim trailing blanks)
`","`	Stop at comma (CSV field extraction)
`"  "` (two spaces)	Stop at double space (useful for padded fields)
`LOW-VALUES`	Stop at null character

9.1.4 DELIMITED BY Identifier

The delimiter can be stored in a variable, allowing runtime flexibility:

       01  WS-DELIMITER    PIC X(01) VALUE "|".

           STRING WS-FIELD-1  DELIMITED BY SPACE
                  WS-DELIMITER DELIMITED BY SIZE
                  WS-FIELD-2  DELIMITED BY SPACE
             INTO WS-PIPE-RECORD
           END-STRING

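By changing WS-DELIMITER from "|" to ",", the same code can produce either pipe-delimited or CSV output. This technique is valuable in programs that must support multiple output formats. Note that WS-DELIMITER here is a source item copied by DELIMITED BY SIZE; when a data item is named after DELIMITED BY instead, its contents act as the stopping delimiter for that source.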
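As a minimal sketch of a data item acting as the stopping delimiter, the program below copies the part of an address that precedes "@". The names WS-EMAIL, WS-STOP-CHAR, and WS-USER are hypothetical and chosen only for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DELIMID.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    All names here are illustrative, not from the chapter.
       01  WS-EMAIL        PIC X(30) VALUE "JSMITH@EXAMPLE.COM".
       01  WS-STOP-CHAR    PIC X(01) VALUE "@".
       01  WS-USER         PIC X(30) VALUE SPACES.
       PROCEDURE DIVISION.
      *    Copy WS-EMAIL up to, but not including, the first
      *    occurrence of whatever character WS-STOP-CHAR holds.
           STRING WS-EMAIL DELIMITED BY WS-STOP-CHAR
             INTO WS-USER
           END-STRING
           DISPLAY "USER: " WS-USER
           STOP RUN.
```

Because WS-USER starts as spaces, it ends up holding "JSMITH" followed by padding. Moving a different character into WS-STOP-CHAR changes where the copy stops without touching the STRING statement.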

9.1.5 The INTO Clause

The INTO clause specifies the receiving field. Before executing a STRING statement, you should typically initialize the receiving field:

           MOVE SPACES TO WS-FULL-NAME
           STRING ...
             INTO WS-FULL-NAME
           END-STRING

If you do not initialize the receiving field, the STRING statement writes over existing content starting at the pointer position, but leaves any content beyond the last character written unchanged.

9.1.6 WITH POINTER: Tracking Position

The WITH POINTER clause specifies a numeric field that tracks the current write position in the receiving field. The pointer starts at the value you set (typically 1) and advances as each character is written:

       01  WS-PTR          PIC 99 VALUE 1.
       01  WS-OUTPUT        PIC X(80) VALUE SPACES.

           MOVE 1 TO WS-PTR
           STRING "CUSTOMER: " DELIMITED BY SIZE
             INTO WS-OUTPUT
             WITH POINTER WS-PTR
           END-STRING

      *    WS-PTR now contains 11 (next available position)
           STRING WS-CUST-NAME DELIMITED BY "  "
             INTO WS-OUTPUT
             WITH POINTER WS-PTR
           END-STRING

The POINTER clause serves three critical purposes:

Continuation: Multiple STRING statements can append to the same field by reusing the pointer
Length tracking: After the STRING, subtracting 1 from the pointer gives the last position written, which equals the number of characters written when the pointer started at 1
Positioning: Presetting the pointer to a value greater than 1 skips positions in the receiving field, useful for column-aligned output

Important: The pointer value must be initialized before use. If the pointer is less than 1 or greater than the length of the receiving field plus 1, the STRING statement triggers the ON OVERFLOW condition without transferring any data.
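The positioning and length-tracking uses of the pointer can be sketched as follows. This is an illustration under assumed names (WS-LINE, WS-USED) and an assumed layout with the second column at position 21.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTRCOL.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Field names and column positions are illustrative.
       01  WS-LINE         PIC X(40) VALUE SPACES.
       01  WS-PTR          PIC 99    VALUE 1.
       01  WS-USED         PIC 99    VALUE 0.
       PROCEDURE DIVISION.
      *    First column starts at position 1.
           MOVE 1 TO WS-PTR
           STRING "ACCT" DELIMITED BY SIZE
             INTO WS-LINE WITH POINTER WS-PTR
           END-STRING
      *    Skip ahead so the second column starts at position 21.
           MOVE 21 TO WS-PTR
           STRING "BALANCE" DELIMITED BY SIZE
             INTO WS-LINE WITH POINTER WS-PTR
           END-STRING
      *    WS-PTR is now 28, so position 27 was the last written.
           COMPUTE WS-USED = WS-PTR - 1
           DISPLAY "[" WS-LINE "]"
           DISPLAY "LAST POSITION WRITTEN: " WS-USED
           STOP RUN.
```

Positions 5 through 20 keep the spaces that WS-LINE was initialized with, which is why the receiving field should be cleared before the first STRING.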

9.1.7 ON OVERFLOW and NOT ON OVERFLOW

When the receiving field cannot hold all the data being concatenated, the ON OVERFLOW condition is triggered. The transfer stops, the receiving field contains whatever fit, and control passes to the ON OVERFLOW imperative statement:

       01  WS-SHORT-FIELD   PIC X(10) VALUE SPACES.

           STRING "THIS IS A VERY LONG STRING"
                  DELIMITED BY SIZE
             INTO WS-SHORT-FIELD
             ON OVERFLOW
                 DISPLAY "WARNING: Data truncated"
             NOT ON OVERFLOW
                 DISPLAY "All data transferred"
           END-STRING

In production code, ON OVERFLOW should always be coded for defensive programming. Silently truncated data is a common source of bugs in string processing.

9.1.8 Multiple Source Fields

A single STRING statement can concatenate many source fields. This is more efficient and clearer than using multiple MOVE and STRING statements:

           STRING WS-DATE-YEAR  DELIMITED BY SIZE
                  "-"            DELIMITED BY SIZE
                  WS-DATE-MONTH DELIMITED BY SIZE
                  "-"            DELIMITED BY SIZE
                  WS-DATE-DAY   DELIMITED BY SIZE
             INTO WS-ISO-DATE
           END-STRING

The sources are processed left to right. Each source's delimiter clause is evaluated independently. See code/example-01-string-statement.cob for complete working examples of all STRING statement variations.

9.2 The UNSTRING Statement: Parsing and Splitting

The UNSTRING statement is the inverse of STRING. It takes a single source field and splits it into multiple receiving fields based on delimiters. UNSTRING is COBOL's primary tool for parsing delimited data such as CSV files, pipe-delimited records, and free-format input.

9.2.1 Basic Syntax

UNSTRING source-field
    DELIMITED BY [ALL] {literal | identifier}
    [OR [ALL] {literal | identifier}] ...
    INTO dest-1 [DELIMITER IN delim-1] [COUNT IN count-1]
        [dest-2 [DELIMITER IN delim-2] [COUNT IN count-2]]
        ...
   [WITH POINTER pointer-field]
   [TALLYING IN tally-field]
   [ON OVERFLOW imperative-statement-1]
   [NOT ON OVERFLOW imperative-statement-2]
END-UNSTRING

9.2.2 DELIMITED BY

The DELIMITED BY clause specifies what character or string separates the fields in the source:

       01  WS-CSV-RECORD   PIC X(80)
           VALUE "10045,WILLIAMS,ROBERT,CHECKING,15230.75".
       01  WS-ACCT-NUM     PIC X(10).
       01  WS-LAST-NAME    PIC X(20).
       01  WS-FIRST-NAME   PIC X(20).
       01  WS-ACCT-TYPE    PIC X(15).
       01  WS-BALANCE      PIC X(12).

           UNSTRING WS-CSV-RECORD
               DELIMITED BY ","
               INTO WS-ACCT-NUM
                    WS-LAST-NAME
                    WS-FIRST-NAME
                    WS-ACCT-TYPE
                    WS-BALANCE
           END-UNSTRING

9.2.3 DELIMITED BY ALL

DELIMITED BY ALL treats consecutive occurrences of the delimiter as a single delimiter. This is essential for parsing space-separated data where multiple spaces may appear between words:

       01  WS-INPUT  PIC X(50)
           VALUE "WORD1   WORD2     WORD3  WORD4".
       01  WS-W1     PIC X(10).
       01  WS-W2     PIC X(10).
       01  WS-W3     PIC X(10).
       01  WS-W4     PIC X(10).

           UNSTRING WS-INPUT
               DELIMITED BY ALL SPACES
               INTO WS-W1 WS-W2 WS-W3 WS-W4
           END-UNSTRING

Without ALL, each space would be treated as a separate delimiter, producing empty fields between words.
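The difference can be seen in a small sketch that runs the same input through both forms. The field names and the "?" fill are assumptions made for illustration; the fill shows which receiving fields UNSTRING actually touched.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ALLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Two spaces separate A and B. Names are illustrative.
       01  WS-INPUT        PIC X(20) VALUE "A  B".
       01  WS-W1           PIC X(05).
       01  WS-W2           PIC X(05).
       01  WS-W3           PIC X(05).
       PROCEDURE DIVISION.
      *    Without ALL: the second space is its own delimiter,
      *    so WS-W2 receives an empty (space-filled) field.
           MOVE ALL "?" TO WS-W1 WS-W2 WS-W3
           UNSTRING WS-INPUT DELIMITED BY SPACE
               INTO WS-W1 WS-W2 WS-W3
           END-UNSTRING
           DISPLAY "NO ALL: [" WS-W1 "][" WS-W2 "][" WS-W3 "]"
      *    With ALL: the run of spaces counts once, B lands in
      *    WS-W2, and WS-W3 is never touched (still "?????").
           MOVE ALL "?" TO WS-W1 WS-W2 WS-W3
           UNSTRING WS-INPUT DELIMITED BY ALL SPACES
               INTO WS-W1 WS-W2 WS-W3
           END-UNSTRING
           DISPLAY "ALL:    [" WS-W1 "][" WS-W2 "][" WS-W3 "]"
           STOP RUN.
```

The untouched WS-W3 in the second run is also a reminder that UNSTRING does not clear receiving fields it never reaches, so they should be initialized beforehand.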

9.2.4 OR: Multiple Delimiters

The OR keyword allows specifying multiple delimiter options:

           UNSTRING WS-MIXED-RECORD
               DELIMITED BY "|" OR "," OR ";"
               INTO WS-FIELD-1
                    WS-FIELD-2
                    WS-FIELD-3
           END-UNSTRING

This parses records that may use any of the three delimiters. The OR clause can be combined with ALL:

           DELIMITED BY ALL SPACES OR "," OR ";"

9.2.5 DELIMITER IN: Capturing the Delimiter

The DELIMITER IN clause stores the actual delimiter that was found for each field. This is useful when parsing records with mixed delimiters:

       01  WS-DELIM-1  PIC X(01).
       01  WS-DELIM-2  PIC X(01).

           UNSTRING WS-RECORD
               DELIMITED BY "|" OR ","
               INTO WS-FIELD-1 DELIMITER IN WS-DELIM-1
                    WS-FIELD-2 DELIMITER IN WS-DELIM-2
                    WS-FIELD-3
           END-UNSTRING

After execution, WS-DELIM-1 contains the delimiter found after field 1 (either "|" or ","), and WS-DELIM-2 contains the delimiter found after field 2.

9.2.6 COUNT IN: Tracking Field Lengths

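The COUNT IN clause stores the number of characters examined in the source for each field, not counting the delimiter:

       01  WS-SOURCE PIC X(10) VALUE "AB,CDEF,GH".
       01  WS-LEN-1  PIC 99.
       01  WS-LEN-2  PIC 99.

      *    Standard COBOL requires the UNSTRING source to be a
      *    data item, not a literal
           UNSTRING WS-SOURCE
               DELIMITED BY ","
               INTO WS-FIELD-1 COUNT IN WS-LEN-1
                    WS-FIELD-2 COUNT IN WS-LEN-2
                    WS-FIELD-3
           END-UNSTRING

      *    WS-LEN-1 = 2 (for "AB")
      *    WS-LEN-2 = 4 (for "CDEF")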

This is particularly useful when receiving fields are larger than the data they receive and you need to know the actual data length.

9.2.7 WITH POINTER

The WITH POINTER clause tracks the current read position in the source field. Like the STRING pointer, it must be initialized to at least 1:

       01  WS-PTR  PIC 99 VALUE 1.

           MOVE 1 TO WS-PTR
           UNSTRING WS-SOURCE
               DELIMITED BY ","
               INTO WS-FIELD-1
                    WS-FIELD-2
               WITH POINTER WS-PTR
           END-UNSTRING

      *    WS-PTR now points past the last delimiter processed

The pointer can be used to perform incremental parsing -- parse some fields, examine them, then continue parsing from where you left off.
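A minimal sketch of incremental parsing follows: one token is pulled per pass, and the pointer carries the read position from one UNSTRING to the next. The names and the hard-coded data length (WS-SRC-LEN, set to the 14 characters of the sample value) are assumptions for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTRLOOP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Names and the fixed data length are illustrative.
       01  WS-SOURCE       PIC X(20) VALUE "RED,GREEN,BLUE".
       01  WS-SRC-LEN      PIC 99    VALUE 14.
       01  WS-PTR          PIC 99    VALUE 1.
       01  WS-TOKEN        PIC X(10).
       01  WS-TOKEN-LEN    PIC 99.
       PROCEDURE DIVISION.
           MOVE 1 TO WS-PTR
      *    Each pass reads one token and leaves WS-PTR just past
      *    the delimiter, ready for the next pass.
           PERFORM UNTIL WS-PTR > WS-SRC-LEN
               MOVE SPACES TO WS-TOKEN
               UNSTRING WS-SOURCE(1:WS-SRC-LEN)
                   DELIMITED BY ","
                   INTO WS-TOKEN COUNT IN WS-TOKEN-LEN
                   WITH POINTER WS-PTR
               END-UNSTRING
               DISPLAY WS-TOKEN " LEN=" WS-TOKEN-LEN
           END-PERFORM
           STOP RUN.
```

With the sample value, the loop displays RED, GREEN, and BLUE with counts 03, 05, and 04. Restricting the source to its data length with reference modification keeps the trailing spaces of WS-SOURCE out of the last token.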

9.2.8 TALLYING IN: Counting Fields

The TALLYING IN clause counts the number of receiving fields that were actually populated:

       01  WS-FIELD-COUNT  PIC 99 VALUE 0.

           MOVE 0 TO WS-FIELD-COUNT
           UNSTRING WS-CSV-RECORD
               DELIMITED BY ","
               INTO WS-F1 WS-F2 WS-F3 WS-F4 WS-F5
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING

      *    WS-FIELD-COUNT tells you how many fields were parsed

Important: Initialize the tally field to zero before the UNSTRING. The TALLYING clause increments, not replaces, the counter.

9.2.9 ON OVERFLOW

For UNSTRING, the ON OVERFLOW condition occurs when:

The source field contains more delimited fields than there are receiving fields
The pointer value is less than 1 or exceeds the source field length

           UNSTRING WS-LONG-RECORD
               DELIMITED BY ","
               INTO WS-F1 WS-F2 WS-F3
               ON OVERFLOW
                   DISPLAY "More fields than receivers"
               NOT ON OVERFLOW
                   DISPLAY "All fields parsed"
           END-UNSTRING

9.2.10 Parsing Real-World Data Formats

CSV Records:

           UNSTRING WS-CSV-LINE
               DELIMITED BY ","
               INTO WS-FIELD-1
                    WS-FIELD-2
                    WS-FIELD-3
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING

Pipe-Delimited Records:

           UNSTRING WS-PIPE-LINE
               DELIMITED BY "|"
               INTO WS-EMP-ID
                    WS-EMP-NAME
                    WS-EMP-DEPT
                    WS-EMP-SALARY
           END-UNSTRING

Date Reformatting (MM/DD/YYYY to YYYY-MM-DD):

           UNSTRING WS-US-DATE
               DELIMITED BY "/"
               INTO WS-MONTH WS-DAY WS-YEAR
           END-UNSTRING

           STRING WS-YEAR  DELIMITED BY SIZE
                  "-"       DELIMITED BY SIZE
                  WS-MONTH DELIMITED BY SIZE
                  "-"       DELIMITED BY SIZE
                  WS-DAY   DELIMITED BY SIZE
             INTO WS-ISO-DATE
           END-STRING

See code/example-02-unstring.cob for complete working examples and code/example-06-data-parsing.cob for a full CSV processing program.

9.3 The INSPECT Statement: Character Manipulation

The INSPECT statement is COBOL's Swiss Army knife for character-level operations. It can count characters, replace characters, and translate entire character sets -- all within a single statement. INSPECT operates on a field in place, examining every character from left to right.

9.3.1 INSPECT TALLYING: Counting Characters

The TALLYING form of INSPECT counts occurrences of characters or strings within a field:

INSPECT source-field TALLYING
    counter FOR {CHARACTERS | ALL literal | LEADING literal}
    [BEFORE | AFTER INITIAL literal]

Counting All Occurrences:

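       01  WS-COUNT  PIC 999 VALUE 0.
       01  WS-WORD   PIC X(11) VALUE "MISSISSIPPI".

      *    INSPECT examines a data item, so the text is held
      *    in WS-WORD rather than coded as a literal
           MOVE 0 TO WS-COUNT
           INSPECT WS-WORD
               TALLYING WS-COUNT FOR ALL "S"
      *    WS-COUNT = 4

Counting Leading Characters:

      *    WS-DIGITS is PIC X(09) and contains "000012345"
           MOVE 0 TO WS-COUNT
           INSPECT WS-DIGITS
               TALLYING WS-COUNT FOR LEADING "0"
      *    WS-COUNT = 4 (four leading zeros)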

Counting All Characters:

           MOVE 0 TO WS-COUNT
           INSPECT WS-SOURCE
               TALLYING WS-COUNT FOR CHARACTERS
      *    WS-COUNT = total number of characters (field size)

Combined Tallying:

You can count multiple patterns in a single INSPECT:

           MOVE 0 TO WS-VOWEL-COUNT
           INSPECT WS-TEXT
               TALLYING WS-VOWEL-COUNT
               FOR ALL "A" ALL "E" ALL "I"
                   ALL "O" ALL "U"

Important: Always initialize the counter to zero before INSPECT TALLYING. The TALLYING clause adds to the counter; it does not reset it.
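The accumulation behavior is easy to demonstrate. The sketch below counts commas to estimate the number of fields in a CSV line, then repeats the INSPECT without resetting the counter. The names WS-LINE, WS-COMMAS, and WS-FIELDS are illustrative.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CNTFLDS.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Names and sample data are illustrative.
       01  WS-LINE         PIC X(40)
           VALUE "10045,WILLIAMS,ROBERT".
       01  WS-COMMAS       PIC 99 VALUE 0.
       01  WS-FIELDS       PIC 99 VALUE 0.
       PROCEDURE DIVISION.
      *    Reset first: TALLYING adds to whatever is there.
           MOVE 0 TO WS-COMMAS
           INSPECT WS-LINE TALLYING WS-COMMAS FOR ALL ","
           COMPUTE WS-FIELDS = WS-COMMAS + 1
           DISPLAY "FIELDS: " WS-FIELDS
      *    Running INSPECT again without the reset doubles
      *    the tally from 2 to 4.
           INSPECT WS-LINE TALLYING WS-COMMAS FOR ALL ","
           DISPLAY "COMMAS WITHOUT RESET: " WS-COMMAS
           STOP RUN.
```

Counting delimiters this way before an UNSTRING is one way to check that a record has the expected number of fields.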

9.3.2 INSPECT REPLACING: Substitution

The REPLACING form modifies characters in place:

INSPECT source-field REPLACING
    {CHARACTERS | ALL literal-1 | LEADING literal-1 | FIRST literal-1}
    BY literal-2
    [BEFORE | AFTER INITIAL literal-3]

REPLACING ALL:

           INSPECT WS-PHONE
               REPLACING ALL "-" BY " "
      *    "555-123-4567" becomes "555 123 4567"

REPLACING LEADING:

           INSPECT WS-AMOUNT
               REPLACING LEADING "0" BY SPACE
      *    "000125.50" becomes "   125.50"

REPLACING FIRST:

           INSPECT WS-TEXT
               REPLACING FIRST "ABC" BY "XYZ"
      *    "ABCABC" becomes "XYZABC" (only first match)

REPLACING CHARACTERS:

           INSPECT WS-SENSITIVE-DATA
               REPLACING CHARACTERS BY "*"
               BEFORE INITIAL "|"
      *    "SECRET|PUBLIC" becomes "******|PUBLIC"

Note on replacement string length: With REPLACING ALL, LEADING, or FIRST, the replacement string must be exactly the same length as the string being replaced, because INSPECT never changes the size of the field. REPLACING ALL "AB" BY "X" is invalid. With REPLACING CHARACTERS, the replacement must be a single character.

9.3.3 INSPECT CONVERTING: Character Translation

CONVERTING provides character-by-character translation, similar to the Unix tr command. Each character in the first string is replaced by the corresponding character in the second string:

INSPECT source-field CONVERTING
    string-1 TO string-2
    [BEFORE | AFTER INITIAL literal]

Uppercase to Lowercase:

           INSPECT WS-TEXT
               CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               TO
               "abcdefghijklmnopqrstuvwxyz"

Lowercase to Uppercase:

           INSPECT WS-TEXT
               CONVERTING
               "abcdefghijklmnopqrstuvwxyz"
               TO
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

Digit Masking:

           INSPECT WS-SSN
               CONVERTING "0123456789" TO "XXXXXXXXXX"
      *    "123-45-6789" becomes "XXX-XX-XXXX"

ROT13 Cipher:

           INSPECT WS-TEXT
               CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               TO
               "NOPQRSTUVWXYZABCDEFGHIJKLM"

The CONVERTING clause is well suited to character set translation, including table-driven EBCDIC-to-ASCII conversion on platforms that support it. The two strings must be the same length, and each character in the first string maps positionally to the corresponding character in the second string.

9.3.4 BEFORE/AFTER INITIAL: Bounded Operations

The BEFORE INITIAL and AFTER INITIAL phrases limit the scope of any INSPECT operation to a portion of the field:

      *    Count 'S' only before the word "RIVER"
      *    WS-PLACE contains "MISSISSIPPI RIVER BASIN"
           MOVE 0 TO WS-COUNT
           INSPECT WS-PLACE
               TALLYING WS-COUNT
               FOR ALL "S" BEFORE INITIAL "RIVER"
      *    WS-COUNT = 4 (counts only in "MISSISSIPPI ")

      *    Convert to lowercase only after "@" in an email
           INSPECT WS-EMAIL
               CONVERTING
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               TO
               "abcdefghijklmnopqrstuvwxyz"
               AFTER INITIAL "@"

The INITIAL keyword means the operation applies before/after the first occurrence of the specified literal. If the literal is not found:

BEFORE INITIAL -- the entire field is within scope
AFTER INITIAL -- nothing is within scope (no operation performed)

See code/example-03-inspect.cob for complete working examples of all INSPECT variations.

9.4 Reference Modification: Substringing by Position

Reference modification provides direct access to substrings within a field by specifying a starting position and optional length. It was introduced in COBOL-85 and is one of the most frequently used string operations in modern COBOL programs.

9.4.1 Basic Syntax

identifier(start-position:length)

Both start-position and length can be numeric literals, data items, or arithmetic expressions. Positions are 1-based (the first character is position 1).

       01  WS-SOURCE  PIC X(20) VALUE "ABCDEFGHIJ".

           MOVE WS-SOURCE(1:5)  TO WS-TARGET
      *    WS-TARGET = "ABCDE"

           MOVE WS-SOURCE(4:3)  TO WS-TARGET
      *    WS-TARGET = "DEF"

9.4.2 Omitting the Length

When the length is omitted, the reference extends from the starting position to the end of the field:

           MOVE WS-SOURCE(6:)  TO WS-TARGET
      *    WS-TARGET = "FGHIJ" plus trailing spaces to fill
      *    the remainder of WS-SOURCE's defined length

9.4.3 Dynamic Substringing with Variables

The power of reference modification comes from using variables:

       01  WS-START   PIC 99 VALUE 1.
       01  WS-LEN     PIC 99 VALUE 5.

           MOVE WS-SOURCE(WS-START:WS-LEN) TO WS-TARGET

This enables patterns like iterating through a string character by character:

           PERFORM VARYING WS-IDX FROM 1 BY 1
               UNTIL WS-IDX > 20
               MOVE WS-SOURCE(WS-IDX:1) TO WS-CHAR
               IF WS-CHAR = "X"
                   DISPLAY "Found X at position " WS-IDX
               END-IF
           END-PERFORM

9.4.4 Using with Arithmetic Expressions

Reference modification supports arithmetic expressions in both the start position and length:

           MOVE WS-SOURCE(WS-IDX + 1 : WS-LEN - 2)
               TO WS-TARGET

This is particularly useful for calculating positions relative to other values:

      *    Extract last N characters
           COMPUTE WS-START =
               FUNCTION LENGTH(WS-SOURCE) - WS-N + 1
           MOVE WS-SOURCE(WS-START:WS-N) TO WS-TARGET

9.4.5 Reference Modification as a Receiving Field

Reference modification works on the receiving side of a MOVE as well, allowing in-place modification:

           MOVE "XXXX" TO WS-RECORD(5:4)
      *    Replaces positions 5-8 with "XXXX"

This is invaluable for fixed-format record construction and field-level masking.
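Field-level masking with a reference-modified receiving field can be sketched in a few lines. The card number is a made-up test value and WS-CARD is an illustrative name.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. MASKCARD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Illustrative field holding a made-up test number.
       01  WS-CARD         PIC X(16) VALUE "4111111111111111".
       PROCEDURE DIVISION.
      *    Overwrite positions 1-12 in place; 13-16 stay as is.
           MOVE ALL "*" TO WS-CARD(1:12)
           DISPLAY WS-CARD
           STOP RUN.
```

Only the twelve referenced positions are changed, so the last four digits remain available for display.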

9.4.6 Practical Applications

Formatting Phone Numbers:

       01  WS-RAW-PHONE  PIC X(10) VALUE "5551234567".
       01  WS-FMT-PHONE  PIC X(14).

           STRING "(" DELIMITED BY SIZE
                  WS-RAW-PHONE(1:3) DELIMITED BY SIZE
                  ") " DELIMITED BY SIZE
                  WS-RAW-PHONE(4:3) DELIMITED BY SIZE
                  "-" DELIMITED BY SIZE
                  WS-RAW-PHONE(7:4) DELIMITED BY SIZE
             INTO WS-FMT-PHONE
           END-STRING
      *    Result: "(555) 123-4567"

Parsing Fixed-Format Records:

       01  WS-RECORD  PIC X(50)
           VALUE "ACCT1234JOHN SMITH      20240315".

           MOVE WS-RECORD(1:4)   TO WS-ACCT-CODE
           MOVE WS-RECORD(5:4)   TO WS-ACCT-NUMBER
           MOVE WS-RECORD(9:16)  TO WS-CUST-NAME
           MOVE WS-RECORD(25:8)  TO WS-TRANS-DATE

Table-Driven Parsing:

       01  WS-FIELD-DEF.
           05  WS-FLD OCCURS 4.
               10  WS-FLD-START  PIC 99.
               10  WS-FLD-LEN   PIC 99.

           PERFORM VARYING WS-I FROM 1 BY 1
               UNTIL WS-I > 4
               MOVE WS-RECORD(WS-FLD-START(WS-I):
                   WS-FLD-LEN(WS-I))
                   TO WS-OUTPUT(WS-I)
           END-PERFORM

9.4.7 Boundary Checking Concerns

Reference modification does not automatically check boundaries. Accessing positions beyond the field's defined length produces undefined behavior:

       01  WS-SHORT  PIC X(10).

      *    DANGEROUS: accessing position 15 in a 10-byte field
           MOVE WS-SHORT(15:3) TO WS-TARGET

Always validate start position and length before use:

           IF WS-START > 0
               AND WS-LEN > 0
               AND WS-START <= FUNCTION LENGTH(WS-FIELD)
               AND WS-START + WS-LEN - 1
                   <= FUNCTION LENGTH(WS-FIELD)
               MOVE WS-FIELD(WS-START:WS-LEN)
                   TO WS-TARGET
           ELSE
               DISPLAY "Boundary violation"
           END-IF

GnuCOBOL and some mainframe compilers offer runtime boundary checking options (for example, the SSRANGE option in IBM Enterprise COBOL) that can detect these errors during development.

See code/example-04-reference-mod.cob for complete working examples.

9.5 Intrinsic Functions for Strings

Intrinsic functions entered the language with the 1989 amendment to COBOL-85, and COBOL 2002 and COBOL 2014 extended the set. These functions are invoked with the FUNCTION keyword and can be used in most places where a sending data item of the appropriate type is allowed.

9.5.1 FUNCTION UPPER-CASE and LOWER-CASE

Convert all alphabetic characters to uppercase or lowercase:

           MOVE FUNCTION UPPER-CASE(WS-MIXED)
               TO WS-UPPER
           MOVE FUNCTION LOWER-CASE(WS-MIXED)
               TO WS-LOWER

These functions are particularly useful for case-insensitive comparisons:

           IF FUNCTION UPPER-CASE(WS-INPUT)
               = "APPROVED"
               PERFORM PROCESS-APPROVAL
           END-IF

Before intrinsic functions were available, case conversion required INSPECT CONVERTING, which remains a valid and efficient alternative.

9.5.2 FUNCTION REVERSE

Returns the characters of the argument in reverse order:

           MOVE FUNCTION REVERSE(WS-TEXT)
               TO WS-REVERSED

Practical uses include palindrome checking and finding the logical length of a string by counting leading spaces in the reversed string:

      *    Find logical length (excluding trailing spaces)
           MOVE 0 TO WS-TRAILING-SPACES
           INSPECT FUNCTION REVERSE(WS-TEXT)
               TALLYING WS-TRAILING-SPACES
               FOR LEADING SPACES
           COMPUTE WS-LOGICAL-LENGTH =
               FUNCTION LENGTH(WS-TEXT)
               - WS-TRAILING-SPACES

9.5.3 FUNCTION LENGTH vs LENGTH OF

Both return the defined size of a field, but with an important distinction:

FUNCTION LENGTH(identifier) -- a runtime function that returns the size in characters
LENGTH OF identifier -- a compile-time special register

           MOVE FUNCTION LENGTH(WS-FIELD) TO WS-LEN
      *    equivalent to:
           MOVE LENGTH OF WS-FIELD TO WS-LEN

For alphanumeric fields, both return the number of bytes. For national (Unicode) fields, FUNCTION LENGTH returns the number of characters, while LENGTH OF returns the number of bytes.

FUNCTION LENGTH can also be applied to literals:

           IF FUNCTION LENGTH("HELLO") = 5
               DISPLAY "Five characters"
           END-IF

9.5.4 FUNCTION TRIM

Removes leading spaces, trailing spaces, or both:

       01  WS-PADDED  PIC X(30) VALUE "   HELLO   ".

      *    Trim both ends (default)
           MOVE FUNCTION TRIM(WS-PADDED) TO WS-RESULT
      *    Result: "HELLO"

      *    Trim leading only
           MOVE FUNCTION TRIM(WS-PADDED LEADING)
               TO WS-RESULT
      *    Result: "HELLO   "

      *    Trim trailing only
           MOVE FUNCTION TRIM(WS-PADDED TRAILING)
               TO WS-RESULT
      *    Result: "   HELLO"

FUNCTION TRIM is especially useful for comparisons and for building output strings from padded fields:

           STRING FUNCTION TRIM(WS-FIRST)
                      DELIMITED BY SIZE
                  " " DELIMITED BY SIZE
                  FUNCTION TRIM(WS-LAST)
                      DELIMITED BY SIZE
             INTO WS-FULL-NAME
           END-STRING

Note: FUNCTION TRIM arrived later than the original COBOL-85 intrinsic functions such as UPPER-CASE and REVERSE. Some older compilers may not support it, in which case you can use DELIMITED BY SPACE in a STRING statement or the REVERSE-and-count technique shown above.

9.5.5 FUNCTION ORD and FUNCTION CHAR

ORD returns the ordinal position of a character in the program's collating sequence. CHAR returns the character at a given ordinal position:

           MOVE FUNCTION ORD("A") TO WS-ORD
      *    ASCII: WS-ORD = 66 (65 + 1, since ORD is 1-based)
      *    EBCDIC: WS-ORD = 194

           MOVE FUNCTION CHAR(66) TO WS-RESULT
      *    ASCII: WS-RESULT = "A"

Important: FUNCTION ORD returns a 1-based ordinal, not the raw numeric code point. The ordinal is the character's zero-based code point plus 1, so in ASCII, ORD("A") returns 66 (the ASCII code 65, plus 1). FUNCTION CHAR takes the same 1-based ordinal, which is why CHAR(66) returns "A".

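These functions are useful for character classification. The range test below is reliable in ASCII, where A through Z are contiguous; in EBCDIC the uppercase letters occupy three separate ranges, so the same test also accepts a few non-letter code points: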

      *    Check if character is uppercase letter
           IF FUNCTION ORD(WS-CHAR) >= FUNCTION ORD("A")
               AND FUNCTION ORD(WS-CHAR) <= FUNCTION ORD("Z")
               DISPLAY "Uppercase letter"
           END-IF
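Because ORD and CHAR share the same 1-based numbering, they can be combined without adjusting for the offset. The sketch below finds the character that follows "A" in the native collating sequence; the names WS-CHAR, WS-ORD, and WS-NEXT are illustrative.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ORDCHAR.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Names are illustrative.
       01  WS-CHAR         PIC X(01) VALUE "A".
       01  WS-ORD          PIC 999   VALUE 0.
       01  WS-NEXT         PIC X(01) VALUE SPACE.
       PROCEDURE DIVISION.
      *    ORD gives the 1-based ordinal; CHAR accepts the same
      *    numbering, so adding 1 selects the next character.
           COMPUTE WS-ORD = FUNCTION ORD(WS-CHAR)
           MOVE FUNCTION CHAR(WS-ORD + 1) TO WS-NEXT
           DISPLAY "ORD: " WS-ORD " NEXT: " WS-NEXT
           STOP RUN.
```

On an ASCII platform WS-ORD is 66 and on an EBCDIC platform it is 194; in both cases WS-NEXT is "B", because the code never hardcodes an ordinal.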

9.5.6 FUNCTION CONCATENATE (Compiler Extension)

Concatenates multiple arguments into a single result:

           MOVE FUNCTION CONCATENATE(
               FUNCTION TRIM(WS-FIRST)
               " "
               FUNCTION TRIM(WS-LAST))
               TO WS-FULL-NAME

CONCATENATE can accept any number of arguments and supports nested function calls. It is more concise than STRING for simple concatenation but does not provide overflow detection or pointer tracking. CONCATENATE is not defined by the COBOL 2014 standard; GnuCOBOL provides it as an extension, so check your compiler's documentation before relying on it.

9.5.7 FUNCTION SUBSTITUTE (Compiler Extension)

Replaces occurrences of a search string with a replacement string:

           MOVE FUNCTION SUBSTITUTE(
               WS-TEXT
               "OLD-VALUE" "NEW-VALUE"
               "ANOTHER"   "REPLACED")
               TO WS-RESULT

The arguments after the source come in pairs: search string followed by replacement. Multiple pairs can be specified. Note that unlike INSPECT REPLACING, the replacement string does not need to be the same length as the search string.

SUBSTITUTE is not defined by the COBOL 2014 standard; GnuCOBOL provides it as an extension, and support in other compilers varies. Check your compiler's documentation for availability.

See code/example-05-intrinsic-string.cob for complete working examples.

9.6 EBCDIC vs. ASCII: Collating Sequences and String Operations

Understanding character encoding is fundamental to COBOL string handling, especially in environments that bridge mainframe and distributed systems.

9.6.1 Two Worlds of Character Encoding

EBCDIC (Extended Binary Coded Decimal Interchange Code) is used on IBM mainframe and midrange systems (z/OS, AS/400). In EBCDIC:

Lowercase letters (a-z) fall within codes 129-169 (not contiguous)
Uppercase letters (A-Z) fall within codes 193-233 (not contiguous)
Digits (0-9) have codes 240-249
Digits come after letters in the collating sequence

ASCII (American Standard Code for Information Interchange) is used on PCs, Unix/Linux, and modern systems. In ASCII:

Digits (0-9) have codes 48-57
Uppercase letters (A-Z) have codes 65-90
Lowercase letters (a-z) have codes 97-122
Digits come before letters in the collating sequence

9.6.2 Impact on String Operations

The different collating sequences affect several operations:

Sorting and Comparison: In EBCDIC, "9" > "A" is true (249 > 193). In ASCII, "9" < "A" is true (57 < 65). Programs that sort or compare mixed alphanumeric data may produce different results on different platforms.
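A short probe can show which ordering a given platform applies. This is a sketch with illustrative names, and it assumes the program uses the native collating sequence (no PROGRAM COLLATING SEQUENCE override).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. COLLSEQ.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Names are illustrative.
       01  WS-DIGIT        PIC X(01) VALUE "9".
       01  WS-LETTER       PIC X(01) VALUE "A".
       PROCEDURE DIVISION.
      *    Alphanumeric comparison follows the collating
      *    sequence in effect for the program.
           IF WS-DIGIT > WS-LETTER
               DISPLAY "DIGITS SORT AFTER LETTERS (EBCDIC-LIKE)"
           ELSE
               DISPLAY "DIGITS SORT BEFORE LETTERS (ASCII-LIKE)"
           END-IF
           STOP RUN.
```

The same source takes the first branch on an EBCDIC system and the second on an ASCII system, which is the kind of difference that surfaces when sort keys mix letters and digits.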

INSPECT CONVERTING: Character translation tables must use the correct character set. An EBCDIC-to-ASCII conversion table is specific to the platform:

      *    On a mainframe, converting common EBCDIC to ASCII
      *    equivalents using INSPECT CONVERTING
           INSPECT WS-EBCDIC-DATA
               CONVERTING WS-EBCDIC-TABLE
               TO WS-ASCII-TABLE

FUNCTION ORD: The ordinal values returned by FUNCTION ORD depend on the platform's native character set. Code that uses ORD for character classification should use character comparisons rather than hardcoded ordinal values.

9.6.3 Portable String Code

To write string-handling code that works correctly on both EBCDIC and ASCII platforms:

Use FUNCTION UPPER-CASE and LOWER-CASE instead of hardcoded INSPECT CONVERTING tables for case conversion
Use alphabetic class conditions (IF WS-CHAR IS ALPHABETIC) rather than ordinal comparisons
Use FUNCTION ORD with character references rather than numeric literals: FUNCTION ORD("A") rather than 66, which is correct only on ASCII platforms
Specify PROGRAM COLLATING SEQUENCE in the OBJECT-COMPUTER paragraph when consistent ordering is required
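A minimal sketch of several of these guidelines in one program follows. The program name, the computer-name MY-HOST, the alphabet-name ASCII-ORDER, and the data name are all hypothetical. How completely a compiler honors PROGRAM COLLATING SEQUENCE varies, so treat this as a starting point and check your compiler's documentation.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PORTSEQ.
       ENVIRONMENT DIVISION.
       CONFIGURATION SECTION.
      *    MY-HOST is a placeholder computer-name
       OBJECT-COMPUTER. MY-HOST
           PROGRAM COLLATING SEQUENCE IS ASCII-ORDER.
       SPECIAL-NAMES.
      *    STANDARD-1 names the ASCII collating sequence
           ALPHABET ASCII-ORDER IS STANDARD-1.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CHAR          PIC X VALUE "a".
       PROCEDURE DIVISION.
      *    Class condition: no dependence on code values
           IF WS-CHAR IS ALPHABETIC
               DISPLAY "Alphabetic"
           END-IF
      *    Intrinsic function instead of a hardcoded table
           DISPLAY FUNCTION UPPER-CASE(WS-CHAR)
           STOP RUN.
```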

9.7 Common String Handling Patterns

9.7.1 Left-Justifying a Field

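      *    Remove leading spaces: count them, then copy the rest.
      *    (UNSTRING ... DELIMITED BY ALL SPACES is not suitable:
      *    a leading delimiter yields an empty first field.)
      *    WS-LEAD is a numeric work field.
           MOVE 0 TO WS-LEAD
           INSPECT WS-INPUT
               TALLYING WS-LEAD FOR LEADING SPACES
           MOVE SPACES TO WS-OUTPUT
           IF WS-LEAD < FUNCTION LENGTH(WS-INPUT)
               MOVE WS-INPUT(WS-LEAD + 1:) TO WS-OUTPUT
           END-IF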

Or using FUNCTION TRIM:

           MOVE FUNCTION TRIM(WS-INPUT LEADING)
               TO WS-OUTPUT

9.7.2 Right-Justifying a Field

      *    Find logical length
           MOVE 0 TO WS-TRAILING
           INSPECT FUNCTION REVERSE(WS-INPUT)
               TALLYING WS-TRAILING FOR LEADING SPACES
           COMPUTE WS-DATA-LEN =
               FUNCTION LENGTH(WS-INPUT) - WS-TRAILING

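      *    Calculate padding and copy, but only when there is
      *    data and it fits in WS-OUTPUT
           MOVE SPACES TO WS-OUTPUT
           IF WS-DATA-LEN > 0
              AND WS-DATA-LEN <= FUNCTION LENGTH(WS-OUTPUT)
               COMPUTE WS-PAD =
                   FUNCTION LENGTH(WS-OUTPUT) - WS-DATA-LEN
               MOVE WS-INPUT(1:WS-DATA-LEN)
                   TO WS-OUTPUT(WS-PAD + 1:WS-DATA-LEN)
           END-IF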

9.7.3 Padding a Field

      *    Pad with zeros on the left: fill with zeros, then
      *    overlay the data at the right-hand end
           MOVE ALL "0" TO WS-PADDED
           COMPUTE WS-START =
               FUNCTION LENGTH(WS-PADDED) - WS-DATA-LEN + 1
           MOVE WS-DATA(1:WS-DATA-LEN)
               TO WS-PADDED(WS-START:WS-DATA-LEN)

9.7.4 Case-Insensitive Comparison

           IF FUNCTION UPPER-CASE(WS-INPUT-1)
               = FUNCTION UPPER-CASE(WS-INPUT-2)
               DISPLAY "Match (case-insensitive)"
           END-IF

9.7.5 Data Validation: Checking for Numeric Content

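           MOVE 0 TO WS-DIGIT-COUNT
      *    A BEFORE/AFTER phrase qualifies only the comparand it
      *    follows, so none is used here; the length test below
      *    rejects embedded spaces and other characters
           INSPECT WS-FIELD
               TALLYING WS-DIGIT-COUNT
               FOR ALL "0" ALL "1" ALL "2" ALL "3"
                   ALL "4" ALL "5" ALL "6" ALL "7"
                   ALL "8" ALL "9"

      *    Require at least one digit so a blank field fails
           IF WS-DIGIT-COUNT > 0
              AND WS-DIGIT-COUNT = FUNCTION LENGTH(
                  FUNCTION TRIM(WS-FIELD))
               DISPLAY "Field is all numeric"
           END-IF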

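Or more simply, when every position of the field must be a digit (leading or trailing spaces in a PIC X field make this test fail):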

           IF WS-FIELD IS NUMERIC
               DISPLAY "Field is numeric"
           END-IF

9.7.6 Simple Search and Replace

COBOL does not have a built-in search-and-replace for arbitrary-length strings (prior to COBOL 2014's SUBSTITUTE). For same-length replacements, use INSPECT REPLACING ALL:

           INSPECT WS-TEXT
               REPLACING ALL "OLD" BY "NEW"

For different-length replacements, you must implement a loop with reference modification:

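           MOVE SPACES TO WS-DEST
           MOVE 1 TO WS-SRC-IDX
           MOVE 1 TO WS-DST-IDX
      *    WS-DEST must be large enough for the expanded text;
      *    this loop does not check for destination overflow
           PERFORM UNTIL WS-SRC-IDX > WS-SRC-LEN
      *        Compare only while a full search string still fits
               IF WS-SRC-IDX + WS-SEARCH-LEN - 1 <= WS-SRC-LEN
                  AND WS-SOURCE(WS-SRC-IDX:WS-SEARCH-LEN)
                      = WS-SEARCH-STR
                   MOVE WS-REPLACE-STR(1:WS-REPL-LEN)
                       TO WS-DEST(WS-DST-IDX:WS-REPL-LEN)
                   ADD WS-SEARCH-LEN TO WS-SRC-IDX
                   ADD WS-REPL-LEN   TO WS-DST-IDX
               ELSE
                   MOVE WS-SOURCE(WS-SRC-IDX:1)
                       TO WS-DEST(WS-DST-IDX:1)
                   ADD 1 TO WS-SRC-IDX
                   ADD 1 TO WS-DST-IDX
               END-IF
           END-PERFORM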

9.7.7 Building Report Output Strings

The WITH POINTER clause of STRING is ideal for building fixed-column report lines:

           MOVE SPACES TO WS-REPORT-LINE
           MOVE 1 TO WS-PTR

      *    Column 1: Account number
           STRING WS-ACCT-NUM DELIMITED BY SIZE
             INTO WS-REPORT-LINE
             WITH POINTER WS-PTR
           END-STRING

      *    Column 12: Customer name
           MOVE 12 TO WS-PTR
           STRING WS-CUST-NAME DELIMITED BY "  "
             INTO WS-REPORT-LINE
             WITH POINTER WS-PTR
           END-STRING

      *    Column 40: Balance
           MOVE 40 TO WS-PTR
           STRING WS-FMT-BALANCE DELIMITED BY SIZE
             INTO WS-REPORT-LINE
             WITH POINTER WS-PTR
           END-STRING

9.8 Parsing Real-World Data

9.8.1 CSV File Processing

Processing CSV files is one of the most common string handling tasks. The general approach is:

Read a record from the file
Use UNSTRING with DELIMITED BY "," to split fields
Use TALLYING IN to verify the expected number of fields
Validate each parsed field
Convert string representations to appropriate types

       3100-PARSE-CSV-RECORD.
           MOVE SPACES TO WS-PARSED-FIELDS
           MOVE 0 TO WS-FIELD-TALLY

           UNSTRING WS-INPUT-RECORD
               DELIMITED BY ","
               INTO WS-FIELD-1
                    WS-FIELD-2
                    WS-FIELD-3
                    WS-FIELD-4
               TALLYING IN WS-FIELD-TALLY
               ON OVERFLOW
                   SET EXTRA-FIELDS TO TRUE
           END-UNSTRING

           IF WS-FIELD-TALLY NOT = 4
               SET RECORD-INVALID TO TRUE
           END-IF

Handling quoted CSV fields: Standard COBOL UNSTRING does not handle quoted fields (where commas may appear within quotes). For true RFC 4180 CSV processing, you need a character-by-character parser using reference modification. This is demonstrated in the case study programs.
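To illustrate the idea, here is a minimal sketch of a character-by-character splitter that treats commas inside double quotes as data. It is not the case study implementation: all names and the sample record are hypothetical, and it does not handle RFC 4180 escaped quotes (a doubled "" inside a quoted field).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSVQUOTE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    All names and the sample record are hypothetical
       01  WS-LINE          PIC X(40)
               VALUE 'A100,"SMITH, JOHN",DEP,250.00'.
       01  WS-IDX           PIC 9(3).
       01  WS-FLD-NUM       PIC 9(2) VALUE 1.
       01  WS-FLD-POS       PIC 9(3) VALUE 1.
       01  WS-CHAR          PIC X.
       01  WS-QUOTE-FLAG    PIC X VALUE "N".
           88  IN-QUOTES        VALUE "Y".
           88  NOT-IN-QUOTES    VALUE "N".
       01  WS-FIELDS.
           05  WS-FLD       PIC X(20) OCCURS 4 TIMES.
       PROCEDURE DIVISION.
           MOVE SPACES TO WS-FIELDS
           PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > FUNCTION LENGTH(WS-LINE)
               MOVE WS-LINE(WS-IDX:1) TO WS-CHAR
               EVALUATE TRUE
      *            A quote toggles the state and is not copied
                   WHEN WS-CHAR = '"'
                       IF IN-QUOTES
                           SET NOT-IN-QUOTES TO TRUE
                       ELSE
                           SET IN-QUOTES TO TRUE
                       END-IF
      *            Only an unquoted comma ends the current field
                   WHEN WS-CHAR = "," AND NOT-IN-QUOTES
                       ADD 1 TO WS-FLD-NUM
                       MOVE 1 TO WS-FLD-POS
      *            Anything else is data; extra fields and
      *            over-long values are silently dropped here
                   WHEN OTHER
                       IF WS-FLD-NUM <= 4 AND WS-FLD-POS <= 20
                           MOVE WS-CHAR
                               TO WS-FLD(WS-FLD-NUM)(WS-FLD-POS:1)
                           ADD 1 TO WS-FLD-POS
                       END-IF
               END-EVALUATE
           END-PERFORM
           PERFORM VARYING WS-IDX FROM 1 BY 1 UNTIL WS-IDX > 4
               DISPLAY "[" WS-FLD(WS-IDX) "]"
           END-PERFORM
           STOP RUN.
```

With the sample record, the second field should come out as SMITH, JOHN with the embedded comma preserved. A production version would also report dropped data instead of discarding it.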

9.8.2 Fixed-Length Record Parsing

Fixed-length records are COBOL's native format. Reference modification provides direct access:

      *    Legacy record: NAME(1:30) ADDR(31:30) CITY(61:20)
      *                   ST(81:2) ZIP(83:5) ACCT(88:10)
           MOVE WS-RECORD(1:30)  TO WS-NAME
           MOVE WS-RECORD(31:30) TO WS-ADDRESS
           MOVE WS-RECORD(61:20) TO WS-CITY
           MOVE WS-RECORD(81:2)  TO WS-STATE
           MOVE WS-RECORD(83:5)  TO WS-ZIP
           MOVE WS-RECORD(88:10) TO WS-ACCOUNT

For maintainability, define the record structure in the DATA DIVISION rather than using reference modification for every field:

       01  WS-LEGACY-RECORD.
           05  WS-LR-NAME     PIC X(30).
           05  WS-LR-ADDRESS  PIC X(30).
           05  WS-LR-CITY     PIC X(20).
           05  WS-LR-STATE    PIC X(02).
           05  WS-LR-ZIP      PIC X(05).
           05  WS-LR-ACCOUNT  PIC X(10).

Use reference modification for dynamic or table-driven parsing where the field positions come from a configuration table.
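A table-driven version might look like the sketch below. The layout table, its encoding (a three-digit start followed by a three-digit length), the sample data, and all names are assumptions made for illustration; in practice the table would typically be loaded from a configuration file.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TBLPARSE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Hypothetical layout table: start (3 digits) and length
      *    (3 digits) per field, matching the legacy record layout
       01  WS-LAYOUT-VALUES.
           05  FILLER       PIC X(6) VALUE "001030".
           05  FILLER       PIC X(6) VALUE "031030".
           05  FILLER       PIC X(6) VALUE "061020".
           05  FILLER       PIC X(6) VALUE "081002".
           05  FILLER       PIC X(6) VALUE "083005".
           05  FILLER       PIC X(6) VALUE "088010".
       01  WS-LAYOUT REDEFINES WS-LAYOUT-VALUES.
           05  WS-LAYOUT-ENTRY OCCURS 6 TIMES.
               10  WS-FLD-START PIC 9(3).
               10  WS-FLD-LEN   PIC 9(3).
       01  WS-RECORD        PIC X(97) VALUE SPACES.
       01  WS-VALUES.
           05  WS-FLD-VALUE PIC X(30) OCCURS 6 TIMES.
       01  WS-I             PIC 9(2).
       PROCEDURE DIVISION.
      *    Sample data for two of the six fields
           MOVE "JANE DOE" TO WS-RECORD(1:30)
           MOVE "TX"       TO WS-RECORD(81:2)
      *    One generic loop extracts every field in the table
           PERFORM VARYING WS-I FROM 1 BY 1 UNTIL WS-I > 6
               MOVE WS-RECORD(WS-FLD-START(WS-I):WS-FLD-LEN(WS-I))
                   TO WS-FLD-VALUE(WS-I)
               DISPLAY WS-I ": [" WS-FLD-VALUE(WS-I) "]"
           END-PERFORM
           STOP RUN.
```

The trade-off is that the compiler can no longer check field boundaries, so a loader for a real table should validate that each start plus length stays within the record.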

9.8.3 Name and Address Parsing

Name parsing is notoriously complex due to the variety of formats:

"SMITH, JOHN" (last-first)
"JOHN SMITH" (first-last)
"JOHN Q. SMITH" (first-middle-last)
"DR. JOHN SMITH JR." (prefix-first-last-suffix)
"MARY O'CONNOR" (names with apostrophes)

The general strategy is:

Check for a comma to detect "LAST, FIRST" format
Count words to determine the pattern
Check for known prefixes (DR., MR., MRS., MS.)
Check for known suffixes (JR., SR., III, IV)
Handle special cases (multi-word last names like "VAN DER BERG")

See code/case-study-code.cob for a complete implementation.
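As a rough illustration of the first step of that strategy only, the sketch below detects the "LAST, FIRST" format by counting commas and falls back to a plain two-word split otherwise. It is not the case study code; the field names and sample value are hypothetical, and prefixes, suffixes, middle names, and multi-word last names are deliberately left unhandled.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NAMEFMT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Hypothetical field names and sample value
       01  WS-FULL-NAME     PIC X(40) VALUE "SMITH, JOHN".
       01  WS-FIRST         PIC X(20).
       01  WS-LAST          PIC X(20).
       01  WS-COMMA-COUNT   PIC 9(2).
       PROCEDURE DIVISION.
           MOVE SPACES TO WS-FIRST WS-LAST
           MOVE 0 TO WS-COMMA-COUNT
           INSPECT WS-FULL-NAME
               TALLYING WS-COMMA-COUNT FOR ALL ","
           IF WS-COMMA-COUNT > 0
      *        "LAST, FIRST": split at the comma, absorbing one
      *        following space when it is present
               UNSTRING WS-FULL-NAME
                   DELIMITED BY ", " OR ","
                   INTO WS-LAST WS-FIRST
               END-UNSTRING
           ELSE
      *        Two-word "FIRST LAST" only; other patterns need
      *        the word-count, prefix, and suffix checks
               UNSTRING WS-FULL-NAME
                   DELIMITED BY ALL SPACES
                   INTO WS-FIRST WS-LAST
               END-UNSTRING
           END-IF
           DISPLAY "FIRST=[" WS-FIRST "] LAST=[" WS-LAST "]"
           STOP RUN.
```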

9.9 Performance Considerations

9.9.1 STRING vs. Reference Modification

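For simple concatenation of fields at known positions, direct MOVE with reference modification is typically faster than STRING, because each MOVE targets a fixed offset and length while STRING must scan for delimiters and maintain a pointer at run time: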

      *    Faster approach for known-position assembly
           MOVE WS-YEAR  TO WS-DATE(1:4)
           MOVE "-"       TO WS-DATE(5:1)
           MOVE WS-MONTH TO WS-DATE(6:2)
           MOVE "-"       TO WS-DATE(8:1)
           MOVE WS-DAY   TO WS-DATE(9:2)

Use STRING when the positions are dynamic, when multiple delimited fields are involved, or when overflow detection is needed.

9.9.2 INSPECT Efficiency

INSPECT scans the entire field from left to right. For large fields, use BEFORE INITIAL or AFTER INITIAL to limit the scan range:

      *    More efficient: only scan up to the delimiter
           INSPECT WS-LARGE-FIELD
               TALLYING WS-COUNT FOR ALL "X"
               BEFORE INITIAL SPACES

9.9.3 Avoid Unnecessary Operations

When processing millions of records, small optimizations in string handling compound significantly:

Pre-calculate lengths rather than calling FUNCTION LENGTH in a loop
Use INSPECT CONVERTING for bulk character translation rather than character-by-character loops
Initialize receiving fields once before a loop, not inside the loop, if possible
Use MOVE for simple copies rather than STRING with DELIMITED BY SIZE

9.9.4 Intrinsic Functions and Performance

Intrinsic functions like UPPER-CASE and TRIM create temporary intermediate results. In tight loops, it may be more efficient to use INSPECT CONVERTING for case conversion and manual trimming logic rather than function calls.
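For example, an in-place uppercase conversion can be written with INSPECT CONVERTING as in the sketch below. The program name, field name, and sample value are hypothetical, and whether this actually outperforms FUNCTION UPPER-CASE depends on the compiler, so measure before committing to it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UPCASE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Hypothetical field name and sample value
       01  WS-TEXT          PIC X(20) VALUE "Hello, World".
       PROCEDURE DIVISION.
      *    Converts the field in place; characters that are not
      *    in the first literal are left unchanged
           INSPECT WS-TEXT
               CONVERTING "abcdefghijklmnopqrstuvwxyz"
               TO         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           DISPLAY WS-TEXT
           STOP RUN.
```

Because both alphabets are written as literals, the compiler encodes them in the native character set, so this form stays portable between EBCDIC and ASCII platforms for the unaccented Latin letters.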

9.10 Common Mistakes and Debugging Tips

9.10.1 Pointer Management Errors

Forgetting to initialize the pointer:

      *    BUG: WS-PTR might contain anything
           STRING WS-DATA DELIMITED BY SIZE
             INTO WS-OUTPUT
             WITH POINTER WS-PTR
           END-STRING

      *    FIX: Always initialize
           MOVE 1 TO WS-PTR
           STRING WS-DATA DELIMITED BY SIZE
             INTO WS-OUTPUT
             WITH POINTER WS-PTR
           END-STRING

Forgetting to initialize the TALLYING counter:

      *    BUG: WS-TALLY accumulates across calls
           UNSTRING WS-RECORD
               DELIMITED BY ","
               INTO WS-F1 WS-F2
               TALLYING IN WS-TALLY
           END-UNSTRING

      *    FIX: Reset before each UNSTRING
           MOVE 0 TO WS-TALLY
           UNSTRING WS-RECORD ...

9.10.2 Overflow Handling

Not coding ON OVERFLOW:

Every STRING and UNSTRING should include ON OVERFLOW handling in production code. Silent data truncation is a common source of production bugs.
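The sketch below shows both forms of overflow handling. The names and values are hypothetical, and the receiving fields are deliberately too small or too few so that each ON OVERFLOW branch is taken.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVFLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      *    Hypothetical names; WS-OUTPUT is deliberately too small
       01  WS-FIRST         PIC X(10) VALUE "ALEXANDER".
       01  WS-LAST          PIC X(10) VALUE "HAMILTON".
       01  WS-OUTPUT        PIC X(12).
       01  WS-CSV           PIC X(5)  VALUE "A,B,C".
       01  WS-F1            PIC X(5).
       01  WS-F2            PIC X(5).
       PROCEDURE DIVISION.
      *    18 characters are needed but only 12 are available
           MOVE SPACES TO WS-OUTPUT
           STRING WS-FIRST DELIMITED BY SPACE
                  " "      DELIMITED BY SIZE
                  WS-LAST  DELIMITED BY SPACE
             INTO WS-OUTPUT
             ON OVERFLOW
                 DISPLAY "STRING overflow, got: " WS-OUTPUT
             NOT ON OVERFLOW
                 DISPLAY "STRING ok: " WS-OUTPUT
           END-STRING

      *    Three values but only two receiving fields
           MOVE SPACES TO WS-F1 WS-F2
           UNSTRING WS-CSV DELIMITED BY ","
               INTO WS-F1 WS-F2
               ON OVERFLOW
                   DISPLAY "UNSTRING overflow: extra data ignored"
           END-UNSTRING
           STOP RUN.
```

In production code the overflow branches would normally set an error flag or write to an error log rather than DISPLAY a message.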

9.10.3 Off-by-One Errors in Reference Modification

Common mistake: zero-based thinking

      *    BUG: Position 0 is invalid (COBOL is 1-based)
           MOVE WS-DATA(0:5) TO WS-TARGET

      *    FIX: Start at position 1
           MOVE WS-DATA(1:5) TO WS-TARGET

Common mistake: exceeding field boundaries

       01  WS-FIELD  PIC X(10).

      *    BUG: Positions 8-12 exceed the 10-byte field
           MOVE WS-FIELD(8:5) TO WS-TARGET

      *    FIX: Ensure start + length - 1 <= field size
           MOVE WS-FIELD(8:3) TO WS-TARGET

9.10.4 DELIMITED BY SIZE vs. DELIMITED BY SPACE

This is perhaps the most common STRING mistake:

       01  WS-NAME  PIC X(20) VALUE "JOHN".

      *    DELIMITED BY SIZE copies all 20 characters:
      *    "JOHN                " (with 16 trailing spaces)
           STRING WS-NAME DELIMITED BY SIZE ...

      *    DELIMITED BY SPACE copies until first space:
      *    "JOHN"
           STRING WS-NAME DELIMITED BY SPACE ...

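For most practical concatenation, DELIMITED BY SPACE or DELIMITED BY "  " (two spaces) is what you want. DELIMITED BY SPACE stops at the first space, so it truncates multi-word values such as "MARY ANN"; the two-space delimiter keeps single embedded spaces and stops at the trailing padding.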

9.10.5 INSPECT REPLACING Length Mismatch

      *    ERROR: Different lengths in REPLACING
           INSPECT WS-TEXT
               REPLACING ALL "AB" BY "X"
      *    "X" is 1 byte, "AB" is 2 bytes - INVALID

      *    FIX: Use same-length strings
           INSPECT WS-TEXT
               REPLACING ALL "AB" BY "X "
      *    Both are 2 bytes - VALID

9.10.6 Not Initializing Receiving Fields

      *    BUG: WS-OUTPUT may contain leftover data
           STRING WS-NEW-DATA DELIMITED BY SPACE
             INTO WS-OUTPUT
           END-STRING

      *    FIX: Initialize first
           MOVE SPACES TO WS-OUTPUT
           STRING WS-NEW-DATA DELIMITED BY SPACE
             INTO WS-OUTPUT
           END-STRING

9.11 Fixed-Format vs. Free-Format Examples

All examples in this chapter follow COBOL fixed-format conventions with the traditional column layout:

Columns 1-6: Sequence number area (left blank)
Column 7: Indicator area (* for comments)
Columns 8-11: Area A (division, section, paragraph headers)
Columns 12-72: Area B (statements)
Columns 73-80: Identification area (ignored by the compiler)

In free-format COBOL (supported by GnuCOBOL with the -free flag and by many modern compilers), the same code can be written without column restrictions:

Fixed-format:

       01  WS-FIRST    PIC X(30).
       01  WS-LAST     PIC X(30).
       01  WS-RESULT   PIC X(61).

           STRING WS-FIRST  DELIMITED BY SPACE
                  " "        DELIMITED BY SIZE
                  WS-LAST   DELIMITED BY SPACE
             INTO WS-RESULT
           END-STRING

Free-format:

01 WS-FIRST  PIC X(30).
01 WS-LAST   PIC X(30).
01 WS-RESULT PIC X(61).

STRING WS-FIRST DELIMITED BY SPACE
       " "      DELIMITED BY SIZE
       WS-LAST  DELIMITED BY SPACE
  INTO WS-RESULT
END-STRING

The functionality is identical. The code examples in the code/ directory use fixed-format for maximum compatibility, as this is the format encountered in the vast majority of existing COBOL codebases.

9.12 Comprehensive Example: CSV Transaction Processing

The program in code/example-06-data-parsing.cob demonstrates a complete data processing pipeline using the string handling techniques covered in this chapter:

Input: CSV records containing account number, name, transaction type, amount, and date
Parsing: UNSTRING with DELIMITED BY "," and TALLYING IN
Validation: INSPECT TALLYING to verify numeric content, EVALUATE for valid transaction types, boundary checks on dates
Formatting: STRING with WITH POINTER to build fixed-width report lines, reference modification for date reformatting
Accumulation: Running totals for deposits and withdrawals
Error handling: ON OVERFLOW, validation error reporting

This program processes eight sample records including deliberate error cases (invalid account number, negative amount, missing name) to demonstrate robust error handling.

Summary

This chapter covered the five pillars of COBOL string handling:

Mechanism	Primary Use	Key Feature
STRING	Concatenation	DELIMITED BY, WITH POINTER, ON OVERFLOW
UNSTRING	Parsing/splitting	DELIMITER IN, COUNT IN, TALLYING IN
INSPECT	Counting/replacing/translating	TALLYING, REPLACING, CONVERTING
Reference Modification	Substringing	identifier(start:length)
Intrinsic Functions	Case, trim, length, etc.	UPPER-CASE, TRIM, LENGTH, ORD, CHAR

The STRING and UNSTRING statements are complementary: STRING assembles, UNSTRING disassembles. Together with INSPECT for character-level operations and reference modification for positional access, they provide a complete toolkit for any string processing task.

Key principles to remember:

Always initialize receiving fields before STRING operations
Always initialize counters before INSPECT TALLYING and UNSTRING TALLYING
Always initialize pointers before using WITH POINTER
Always code ON OVERFLOW in production programs
Use DELIMITED BY SPACE (not SIZE) when you want to exclude trailing spaces
Reference modification is 1-based, not 0-based
INSPECT REPLACING requires same-length source and target strings
INSPECT CONVERTING requires same-length source and target character sets
Test string operations with boundary cases: empty fields, fields full of spaces, maximum-length data, and data that exceeds field capacity

The next chapter builds on these string handling skills by exploring file input/output operations, where you will read records from external files, parse them using the techniques from this chapter, and write formatted output.