Chapter 17: String Handling

> "People think COBOL can't handle strings. They're wrong. COBOL handles strings the way a surgeon handles a scalpel — precisely, carefully, and with full awareness of boundaries." — Maria Chen, during a code review of a CSV parser

In This Chapter

17.1 Introduction: Strings in a Fixed-Field World
17.2 The STRING Statement
17.3 The UNSTRING Statement
17.4 Parsing CSV Data: A Complete Example
17.5 The INSPECT Statement
17.6 Reference Modification
17.7 GlobalBank Case Study: Formatting Account Statements
17.8 MedClaim Case Study: Parsing Diagnosis Codes
17.9 Common String Handling Patterns
17.10 String Handling for Data Interchange
17.11 Name Formatting and Manipulation
17.12 Performance Considerations
17.13 Defensive String Handling
17.14 GnuCOBOL String Handling Notes
17.15 Putting It All Together: Data Formatting Program
17.16 INSPECT CONVERTING: Deep Dive
17.17 Reference Modification Patterns for Fixed-Width Fields
17.18 Data Cleansing Patterns
17.19 Advanced INSPECT Patterns
17.20 Combining String Facilities: A Production Pattern Library
17.21 EBCDIC vs. ASCII: String Handling Portability Considerations
17.22 Chapter Summary
Key Terms Introduced in This Chapter

17.1 Introduction: Strings in a Fixed-Field World

COBOL was born in an era of fixed-length fields punched into 80-column cards. A name was always 30 characters. An address was always 40. A state code was always 2. This fixed-field heritage is both COBOL's strength (data is always where you expect it) and its perceived weakness (what about variable-length data? delimiters? parsing?).

The truth is more nuanced. COBOL has a rich set of string manipulation tools that, while different from the regex-powered libraries of Python or Java, are remarkably capable. Four facilities form the core of COBOL string handling:

STRING — Concatenates multiple fields and literals into a single field, with delimiter control and overflow detection
UNSTRING — Parses a delimited string into multiple fields, with delimiter tracking, counting, and tallying
INSPECT — Examines a field character by character, counting occurrences, replacing characters, or converting between character sets
Reference modification — Accesses substrings within a field using offset and length notation

These tools matter more than ever. Modern mainframe COBOL programs increasingly need to:

Parse CSV files received from distributed systems
Build XML or JSON fragments for web service responses
Format human-readable messages from fixed-field data
Convert between character encodings (EBCDIC ↔ ASCII)
Extract components from composite data elements (diagnosis codes, addresses, names)

💡 The Modernization Spectrum: This chapter's theme — The Modernization Spectrum — manifests in string handling because modern data interchange formats (CSV, XML, JSON, pipe-delimited) are inherently string-oriented. COBOL programs that can parse and produce these formats become integration points between the mainframe and the modern distributed world. String handling is not just text manipulation — it is a modernization enabler.

17.2 The STRING Statement

The STRING statement concatenates (joins) data from multiple sending fields into a single receiving field.

17.2.1 Basic Syntax

STRING
    {identifier-1 | literal-1}
        DELIMITED BY {identifier-2 | literal-2 | SIZE}
    [{identifier-3 | literal-3}
        DELIMITED BY {identifier-4 | literal-4 | SIZE}]
    ...
    INTO identifier-5
    [WITH POINTER identifier-6]
    [ON OVERFLOW imperative-statement-1]
    [NOT ON OVERFLOW imperative-statement-2]
END-STRING

Key elements:

Sending fields: One or more fields or literals to concatenate. Each sending field has its own DELIMITED BY clause.
DELIMITED BY: Controls how much of the sending field is used:
DELIMITED BY SIZE — uses the entire field, including trailing spaces
DELIMITED BY SPACE — uses characters up to (but not including) the first space
DELIMITED BY ',' — uses characters up to (but not including) the first comma
DELIMITED BY identifier — uses characters up to the first occurrence of that identifier's value
INTO: The receiving field where concatenated data is placed
WITH POINTER: A numeric integer field tracking the current position in the receiving field. It must be initialized before the STRING, must be large enough to hold the receiving field's length plus 1, and is incremented by 1 for each character placed.
ON OVERFLOW: Executed if the receiving field is too small to hold all the concatenated data

17.2.2 Simple Concatenation

       WORKING-STORAGE SECTION.
       01  WS-FIRST-NAME      PIC X(15) VALUE 'MARIA'.
       01  WS-LAST-NAME       PIC X(20) VALUE 'CHEN'.
       01  WS-FULL-NAME       PIC X(40) VALUE SPACES.

       PROCEDURE DIVISION.
           STRING WS-FIRST-NAME DELIMITED BY SPACE
                  ' '           DELIMITED BY SIZE
                  WS-LAST-NAME  DELIMITED BY SPACE
                  INTO WS-FULL-NAME
           END-STRING.

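After execution, WS-FULL-NAME contains 'MARIA CHEN' followed by 30 trailing spaces. The trailing spaces come from the VALUE SPACES initialization: STRING does not clear or pad the positions it does not fill.

Note how DELIMITED BY SPACE stops at the first space in each sending field. With DELIMITED BY SIZE instead, WS-FIRST-NAME would contribute all 15 characters (including 10 trailing spaces), and the result would read 'MARIA', then 11 spaces, then 'CHEN'.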

⚠️ Critical Detail: DELIMITED BY SPACE means "stop at the first space character." If the data itself contains embedded spaces (like "NEW YORK"), only "NEW" would be used. For data with embedded spaces, use DELIMITED BY SIZE or a specific delimiter.
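One way to keep embedded spaces while still dropping trailing padding is to compute the trimmed length first and send a reference-modified slice DELIMITED BY SIZE. The following is a minimal sketch rather than the chapter's own program: the name STRDEMO1 and every WS- field are hypothetical, and it assumes a compiler that supports FUNCTION LENGTH and reference modification (both covered in Section 17.6).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO1.
      * Hypothetical demo: joining a city that contains a space.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CITY            PIC X(15) VALUE 'NEW YORK'.
       01  WS-STATE           PIC X(02) VALUE 'NY'.
       01  WS-CITY-LEN        PIC 99    VALUE ZERO.
       01  WS-BAD             PIC X(30) VALUE SPACES.
       01  WS-GOOD            PIC X(30) VALUE SPACES.

       PROCEDURE DIVISION.
      *    DELIMITED BY SPACE stops at the space after 'NEW'
           STRING WS-CITY     DELIMITED BY SPACE
                  ', '        DELIMITED BY SIZE
                  WS-STATE    DELIMITED BY SIZE
                  INTO WS-BAD
           END-STRING

      *    Find the last non-space character of WS-CITY
           MOVE FUNCTION LENGTH(WS-CITY) TO WS-CITY-LEN
           PERFORM UNTIL WS-CITY-LEN = 0
               OR WS-CITY(WS-CITY-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-CITY-LEN
           END-PERFORM

      *    Send only the trimmed portion, embedded space intact
           IF WS-CITY-LEN > 0
               STRING WS-CITY(1:WS-CITY-LEN) DELIMITED BY SIZE
                      ', '                   DELIMITED BY SIZE
                      WS-STATE               DELIMITED BY SIZE
                      INTO WS-GOOD
               END-STRING
           END-IF

           DISPLAY '[' WS-BAD ']'
           DISPLAY '[' WS-GOOD ']'
           STOP RUN.
```

WS-BAD ends up holding 'NEW, NY' and WS-GOOD holds 'NEW YORK, NY', each followed by the spaces left over from initialization.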

17.2.3 Using the POINTER Phrase

The POINTER phrase gives you control over where in the receiving field the STRING operation begins and tracks where it ends:

       01  WS-OUTPUT          PIC X(80) VALUE SPACES.
       01  WS-PTR             PIC 99 VALUE 1.

       PROCEDURE DIVISION.
           MOVE 1 TO WS-PTR
           STRING 'ACCOUNT: '  DELIMITED BY SIZE
                  INTO WS-OUTPUT
                  WITH POINTER WS-PTR
           END-STRING.
      *    WS-PTR is now 10 (next available position)

           STRING WS-ACCT-NUM  DELIMITED BY SPACE
                  INTO WS-OUTPUT
                  WITH POINTER WS-PTR
           END-STRING.
      *    WS-PTR now points past the account number

           STRING '  DATE: '   DELIMITED BY SIZE
                  WS-TXN-DATE  DELIMITED BY SIZE
                  INTO WS-OUTPUT
                  WITH POINTER WS-PTR
           END-STRING.

The POINTER is an index into the receiving field. It starts at the value you set (usually 1) and increments by 1 for each character placed. After the STRING completes, the POINTER value tells you the position after the last character placed.

💡 POINTER Must Be Initialized: The STRING statement does not automatically set the POINTER to 1. If you forget to initialize it, the STRING may start placing characters at an unexpected position — or overflow immediately if the pointer value exceeds the receiving field length.
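Because the pointer ends one position past the last character placed, subtracting 1 gives the length of the text that was built. The sketch below illustrates this under stated assumptions: STRDEMO2 and WS-USED-LEN are hypothetical names, and the account number is a literal chosen for the demonstration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO2.
      * Hypothetical demo: deriving the built length from POINTER.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-OUTPUT          PIC X(80) VALUE SPACES.
       01  WS-PTR             PIC 99    VALUE 1.
       01  WS-USED-LEN        PIC 99    VALUE ZERO.

       PROCEDURE DIVISION.
           MOVE SPACES TO WS-OUTPUT
           MOVE 1 TO WS-PTR
           STRING 'ACCOUNT: '  DELIMITED BY SIZE
                  '1234567890' DELIMITED BY SIZE
                  INTO WS-OUTPUT
                  WITH POINTER WS-PTR
           END-STRING
      *    9 + 10 characters were placed, so WS-PTR is 20
           COMPUTE WS-USED-LEN = WS-PTR - 1
           DISPLAY 'POINTER: ' WS-PTR
           DISPLAY 'LENGTH:  ' WS-USED-LEN
           DISPLAY '[' WS-OUTPUT(1:WS-USED-LEN) ']'
           STOP RUN.
```

The final DISPLAY shows only the 19 characters that were actually written, without the trailing padding of the 80-byte field.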

17.2.4 ON OVERFLOW Handling

The ON OVERFLOW phrase detects when the receiving field is too small:

           STRING WS-LINE-1    DELIMITED BY SIZE
                  WS-LINE-2    DELIMITED BY SIZE
                  WS-LINE-3    DELIMITED BY SIZE
                  INTO WS-OUTPUT
                  ON OVERFLOW
                      DISPLAY 'WARNING: Output truncated'
                      ADD 1 TO WS-OVERFLOW-CNT
                  NOT ON OVERFLOW
                      ADD 1 TO WS-SUCCESS-CNT
           END-STRING.

When overflow occurs, the STRING statement stops placing characters at the boundary of the receiving field. Characters that would have exceeded the field are lost. The pointer (if used) reflects the position after the last character that fit.

⚠️ Defensive Programming: In production code, always use ON OVERFLOW. Silent truncation is a data integrity risk. At minimum, log the overflow for debugging. At MedClaim, James Okafor requires every STRING statement to have ON OVERFLOW: "If we're truncating data, I want to know about it — not discover it six months later during an audit."
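To see the overflow behavior in isolation, the following sketch sends 16 characters to a 10-character field. The program name STRDEMO3 and the field WS-SMALL are hypothetical and exist only for this illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO3.
      * Hypothetical demo of STRING overflow.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SMALL           PIC X(10) VALUE SPACES.
       01  WS-PTR             PIC 99    VALUE 1.

       PROCEDURE DIVISION.
           MOVE SPACES TO WS-SMALL
           MOVE 1 TO WS-PTR
      *    16 characters are sent to a 10-character field
           STRING 'ABCDEFGH'   DELIMITED BY SIZE
                  'IJKLMNOP'   DELIMITED BY SIZE
                  INTO WS-SMALL
                  WITH POINTER WS-PTR
                  ON OVERFLOW
                      DISPLAY 'OVERFLOW, POINTER = ' WS-PTR
                  NOT ON OVERFLOW
                      DISPLAY 'NO OVERFLOW'
           END-STRING
           DISPLAY '[' WS-SMALL ']'
           STOP RUN.
```

The field receives 'ABCDEFGHIJ', the remaining six characters are dropped, and the pointer stops at 11, one past the end of the receiving field.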

17.2.5 Building Formatted Output

STRING is ideal for building formatted messages, log entries, and output records:

       01  WS-MSG             PIC X(200) VALUE SPACES.
       01  WS-MSG-PTR         PIC 999 VALUE 1.

           MOVE 1 TO WS-MSG-PTR
           STRING 'TXN '       DELIMITED BY SIZE
                  WS-TXN-ID    DELIMITED BY SPACE
                  ' FOR ACCT ' DELIMITED BY SIZE
                  WS-ACCT-NUM  DELIMITED BY SPACE
                  ' AMT $'     DELIMITED BY SIZE
                  WS-AMT-EDIT  DELIMITED BY SPACE
                  ' ON '       DELIMITED BY SIZE
                  WS-DATE-EDIT DELIMITED BY SIZE
                  INTO WS-MSG
                  WITH POINTER WS-MSG-PTR
                  ON OVERFLOW
                      DISPLAY 'MSG OVERFLOW'
           END-STRING.

Result: 'TXN 00012345 FOR ACCT 1234567890 AMT $1,234.56 ON 01/15/2024'

🧪 Try It Yourself: Write a program that takes a first name, middle initial, and last name (each in separate fields) and uses STRING to produce three formats:

1. "Last, First M." (e.g., "CHEN, MARIA C.")
2. "First M. Last" (e.g., "MARIA C. CHEN")
3. "F. Last" (e.g., "M. CHEN")

Experiment with fields that contain no middle initial (spaces) and see how DELIMITED BY SPACE behaves.

17.3 The UNSTRING Statement

If STRING joins fields together, UNSTRING splits them apart. UNSTRING parses a delimited source field into multiple receiving fields.

17.3.1 Basic Syntax

UNSTRING identifier-1
    [DELIMITED BY [ALL] {identifier-2 | literal-1}
     [OR [ALL] {identifier-3 | literal-2}] ...]
    INTO identifier-4 [DELIMITER IN identifier-5]
                      [COUNT IN identifier-6]
         [identifier-7 [DELIMITER IN identifier-8]
                       [COUNT IN identifier-9]] ...
    [WITH POINTER identifier-10]
    [TALLYING IN identifier-11]
    [ON OVERFLOW imperative-statement-1]
    [NOT ON OVERFLOW imperative-statement-2]
END-UNSTRING

This is one of COBOL's most complex statements, but its parts are logical:

Source field (identifier-1): The string to parse
DELIMITED BY: One or more delimiters that separate the fields. ALL means consecutive delimiters count as one (treats "a,,b" as two fields instead of three with an empty middle field).
INTO fields: The receiving fields where parsed segments are placed
DELIMITER IN: Captures which delimiter was actually found
COUNT IN: Captures how many source characters were examined for each segment (delimiters excluded), even when the receiving field is too short to hold them all
WITH POINTER: Tracks parsing position (allows multiple UNSTRING passes)
TALLYING IN: Counts how many receiving fields were populated
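ON OVERFLOW: Triggered when every receiving field has been used and unexamined characters remain in the source, or when the POINTER value is out of range at the start of the statement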

17.3.2 Simple UNSTRING — Parsing a Name

       01  WS-FULL-NAME       PIC X(40)
                               VALUE 'MARIA CHEN'.
       01  WS-FIRST           PIC X(15).
       01  WS-LAST            PIC X(20).

           UNSTRING WS-FULL-NAME
               DELIMITED BY SPACE
               INTO WS-FIRST
                    WS-LAST
           END-UNSTRING.

After execution: WS-FIRST = 'MARIA' and WS-LAST = 'CHEN', each padded with trailing spaces to its defined length.

17.3.3 Multiple Delimiters

UNSTRING can handle multiple delimiter types using OR:

       01  WS-DATE-STRING     PIC X(10) VALUE '01/15/2024'.
       01  WS-MONTH           PIC X(02).
       01  WS-DAY             PIC X(02).
       01  WS-YEAR            PIC X(04).

           UNSTRING WS-DATE-STRING
               DELIMITED BY '/' OR '-' OR '.'
               INTO WS-MONTH
                    WS-DAY
                    WS-YEAR
           END-UNSTRING.

This handles dates in any of these formats: "01/15/2024", "01-15-2024", "01.15.2024".

17.3.4 DELIMITER IN — Tracking Which Delimiter Was Found

       01  WS-INPUT           PIC X(50)
                               VALUE 'FIELD1,FIELD2;FIELD3'.
       01  WS-FLD1            PIC X(15).
       01  WS-FLD2            PIC X(15).
       01  WS-FLD3            PIC X(15).
       01  WS-DELIM1          PIC X(01).
       01  WS-DELIM2          PIC X(01).

           UNSTRING WS-INPUT
               DELIMITED BY ',' OR ';'
               INTO WS-FLD1  DELIMITER IN WS-DELIM1
                    WS-FLD2  DELIMITER IN WS-DELIM2
                    WS-FLD3
           END-UNSTRING.

After execution (alphanumeric fields are space-padded on the right):

WS-FLD1 = 'FIELD1'
WS-DELIM1 = ','
WS-FLD2 = 'FIELD2'
WS-DELIM2 = ';'
WS-FLD3 = 'FIELD3'

17.3.5 COUNT IN — How Many Characters Were Parsed

       01  WS-CNT1            PIC 99.
       01  WS-CNT2            PIC 99.
       01  WS-CNT3            PIC 99.

           UNSTRING WS-INPUT
               DELIMITED BY ','
               INTO WS-FLD1 COUNT IN WS-CNT1
                    WS-FLD2 COUNT IN WS-CNT2
                    WS-FLD3 COUNT IN WS-CNT3
           END-UNSTRING.

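COUNT IN tells you how many source characters made up each segment, not counting the delimiter. The count reflects the source, so it can exceed the size of the receiving field when the segment was truncated. This is invaluable when you need to know the actual length of variable-length data.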
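Comparing the count with the receiving field's defined length is a simple way to detect truncation. The sketch below is illustrative only: UNSDEMO1, WS-CITY, and WS-STATE are hypothetical names, and the deliberately short 5-byte city field exists to force truncation.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSDEMO1.
      * Hypothetical demo: COUNT IN as a truncation check.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-INPUT           PIC X(30) VALUE 'ALEXANDRIA,VA'.
       01  WS-CITY            PIC X(05) VALUE SPACES.
       01  WS-STATE           PIC X(02) VALUE SPACES.
       01  WS-CNT1            PIC 99    VALUE ZERO.
       01  WS-CNT2            PIC 99    VALUE ZERO.

       PROCEDURE DIVISION.
           UNSTRING WS-INPUT
               DELIMITED BY ',' OR SPACE
               INTO WS-CITY  COUNT IN WS-CNT1
                    WS-STATE COUNT IN WS-CNT2
           END-UNSTRING
      *    WS-CITY holds only 'ALEXA', but WS-CNT1 is 10
           IF WS-CNT1 > FUNCTION LENGTH(WS-CITY)
               DISPLAY 'CITY TRUNCATED: ' WS-CNT1 ' CHARS IN SOURCE'
           END-IF
           DISPLAY '[' WS-CITY '] [' WS-STATE ']'
           STOP RUN.
```

Adding SPACE as a second delimiter stops the last segment at the end of the data, so WS-CNT2 is 2 rather than the length of the padded remainder.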

17.3.6 TALLYING IN — Counting Parsed Fields

       01  WS-FIELD-COUNT     PIC 99 VALUE ZERO.

           UNSTRING WS-CSV-LINE
               DELIMITED BY ','
               INTO WS-FLD1 WS-FLD2 WS-FLD3
                    WS-FLD4 WS-FLD5
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING.

After execution, WS-FIELD-COUNT contains the number of receiving fields that were populated. If the CSV line had only 3 fields, WS-FIELD-COUNT would be 3, and WS-FLD4/WS-FLD5 would be unchanged.

⚠️ TALLYING Must Be Initialized: TALLYING IN adds to the existing value. If you do not initialize it to zero before each UNSTRING, it will accumulate across multiple executions.

17.3.7 The ALL Phrase — Treating Consecutive Delimiters as One

Without ALL, consecutive delimiters create empty fields:

       01  WS-DATA  PIC X(20) VALUE 'A,,B,C'.
      * DELIMITED BY ','
      * Result: FLD1='A', FLD2='', FLD3='B', FLD4='C'

      * DELIMITED BY ALL ','
      * Result: FLD1='A', FLD2='B', FLD3='C'

ALL is essential when parsing data that may have varying amounts of whitespace:

           UNSTRING WS-LINE
               DELIMITED BY ALL SPACES
               INTO WS-WORD1 WS-WORD2 WS-WORD3
           END-UNSTRING.

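This correctly parses a line in which THE, QUICK, and FOX are separated by runs of several spaces into three words, treating each run as a single delimiter. Leading spaces are still a delimiter, so a line that begins with spaces yields an empty first field.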
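The difference between the two forms is easiest to see by running both against the same data. In this sketch the program name UNSDEMO3 and the 9000-SHOW paragraph are hypothetical; the fields are pre-filled with '?' so that a receiving field UNSTRING never touches stays visible.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSDEMO3.
      * Hypothetical demo: DELIMITED BY ',' versus ALL ','.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA            PIC X(20) VALUE 'A,,B,C'.
       01  WS-FIELDS.
           05  WS-FLD1        PIC X(05).
           05  WS-FLD2        PIC X(05).
           05  WS-FLD3        PIC X(05).
           05  WS-FLD4        PIC X(05).
       01  WS-FIELD-COUNT     PIC 99 VALUE ZERO.

       PROCEDURE DIVISION.
      *    Without ALL: the empty middle field is kept
           MOVE ALL '?' TO WS-FIELDS
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING WS-DATA DELIMITED BY ','
               INTO WS-FLD1 WS-FLD2 WS-FLD3 WS-FLD4
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING
           PERFORM 9000-SHOW

      *    With ALL: the two commas act as one delimiter
           MOVE ALL '?' TO WS-FIELDS
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING WS-DATA DELIMITED BY ALL ','
               INTO WS-FLD1 WS-FLD2 WS-FLD3 WS-FLD4
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING
           PERFORM 9000-SHOW
           STOP RUN.

       9000-SHOW.
           DISPLAY WS-FIELD-COUNT ' [' WS-FLD1 '] [' WS-FLD2
                   '] [' WS-FLD3 '] [' WS-FLD4 ']'.
```

The first pass reports 4 fields with WS-FLD2 blank; the second reports 3 fields and leaves WS-FLD4 still filled with question marks, which shows that unused receiving fields are not cleared.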

17.3.8 WITH POINTER — Multi-Pass Parsing

The POINTER phrase allows you to parse a string in multiple passes, continuing from where the last UNSTRING left off:

       01  WS-PARSE-PTR       PIC 999 VALUE 1.

      *    First pass: parse the header
           MOVE 1 TO WS-PARSE-PTR
           UNSTRING WS-RECORD
               DELIMITED BY '|'
               INTO WS-REC-TYPE
                    WS-REC-DATE
               WITH POINTER WS-PARSE-PTR
           END-UNSTRING.

      *    Second pass: parse the detail (starting where
      *    first pass left off)
           UNSTRING WS-RECORD
               DELIMITED BY '|'
               INTO WS-ACCT-NUM
                    WS-AMOUNT
                    WS-DESC
               WITH POINTER WS-PARSE-PTR
           END-UNSTRING.
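The same pointer technique extends to records with an unknown number of fields: parse one token per UNSTRING and loop until the pointer passes the end of the data. This is a sketch under assumptions, not a production parser. UNSDEMO2, WS-TOKEN, and the sample record are hypothetical, and a trailing delimiter at the very end of the data would not be reported as an extra empty token.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSDEMO2.
      * Hypothetical demo: one token per UNSTRING, driven by POINTER.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RECORD          PIC X(40)
               VALUE 'H|20240115|1234567890|DEPOSIT'.
       01  WS-REC-LEN         PIC 999 VALUE ZERO.
       01  WS-PARSE-PTR       PIC 999 VALUE 1.
       01  WS-TOKEN           PIC X(20).
       01  WS-TOKEN-NUM       PIC 99  VALUE ZERO.

       PROCEDURE DIVISION.
      *    Trim trailing spaces so the loop stops after real data
           MOVE FUNCTION LENGTH(WS-RECORD) TO WS-REC-LEN
           PERFORM UNTIL WS-REC-LEN = 0
               OR WS-RECORD(WS-REC-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-REC-LEN
           END-PERFORM

           MOVE 1 TO WS-PARSE-PTR
           PERFORM UNTIL WS-PARSE-PTR > WS-REC-LEN
               MOVE SPACES TO WS-TOKEN
      *        Each call resumes where the previous one stopped
               UNSTRING WS-RECORD(1:WS-REC-LEN)
                   DELIMITED BY '|'
                   INTO WS-TOKEN
                   WITH POINTER WS-PARSE-PTR
               END-UNSTRING
               ADD 1 TO WS-TOKEN-NUM
               DISPLAY WS-TOKEN-NUM ': ' WS-TOKEN
           END-PERFORM
           STOP RUN.
```

For the sample record the loop runs four times, yielding H, 20240115, 1234567890, and DEPOSIT. After the last token the pointer is one past the trimmed length, which ends the loop.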

17.4 Parsing CSV Data: A Complete Example

One of the most common string handling tasks in modern COBOL is parsing CSV (Comma-Separated Values) files received from distributed systems. Let us build a complete CSV parser.

17.4.1 The Challenge

CSV parsing seems simple — just split on commas. But real CSV data has complications:

Fields may be enclosed in quotes: "Smith, John",25,New York
Embedded commas in quoted fields: "123 Main St, Apt 4B"
Empty fields: Smith,,New York (missing middle field)
Trailing commas: Smith,25, (empty last field)
Varying numbers of fields per line

17.4.2 Simple CSV Parser (No Quoted Fields)

For CSV files without quoted fields, UNSTRING handles the task directly:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CSV-PARSE.
      *================================================================
      * Program:  CSV-PARSE
      * Purpose:  Parse simple CSV file into fixed-format output
      * Chapter:  17 - String Handling
      * Context:  GlobalBank - parse branch feed from web system
      *================================================================

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT CSV-FILE    ASSIGN TO CSVIN.
           SELECT OUTPUT-FILE ASSIGN TO FIXOUT.

       DATA DIVISION.
       FILE SECTION.

       FD  CSV-FILE.
       01  CSV-RECORD         PIC X(500).

       FD  OUTPUT-FILE
           RECORDING MODE IS F.
       01  OUTPUT-RECORD.
           05  OUT-ACCT-NUM   PIC X(10).
           05  OUT-NAME       PIC X(30).
           05  OUT-BALANCE    PIC X(15).
           05  OUT-STATUS     PIC X(02).
           05  OUT-BRANCH     PIC X(05).
           05  FILLER         PIC X(18).

       WORKING-STORAGE SECTION.
       01  WS-FLAGS.
           05  WS-EOF         PIC X VALUE 'N'.
               88  END-OF-FILE VALUE 'Y'.

       01  WS-CSV-FIELDS.
           05  WS-CSV-FLD1    PIC X(50).
           05  WS-CSV-FLD2    PIC X(50).
           05  WS-CSV-FLD3    PIC X(50).
           05  WS-CSV-FLD4    PIC X(50).
           05  WS-CSV-FLD5    PIC X(50).

       01  WS-FIELD-COUNT     PIC 99 VALUE ZERO.
       01  WS-COUNTS.
           05  WS-CNT1        PIC 999.
           05  WS-CNT2        PIC 999.
           05  WS-CNT3        PIC 999.
           05  WS-CNT4        PIC 999.
           05  WS-CNT5        PIC 999.

       01  WS-RECORD-COUNT    PIC 9(07) VALUE ZERO.
       01  WS-ERROR-COUNT     PIC 9(07) VALUE ZERO.

       PROCEDURE DIVISION.
       0000-MAIN.
           OPEN INPUT  CSV-FILE
           OPEN OUTPUT OUTPUT-FILE

           READ CSV-FILE
               AT END SET END-OF-FILE TO TRUE
           END-READ

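      *    The first READ fetched the header line. Read again so
      *    the header is discarded and CSV-RECORD holds the first
      *    data record.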
           IF NOT END-OF-FILE
               READ CSV-FILE
                   AT END SET END-OF-FILE TO TRUE
               END-READ
           END-IF

           PERFORM 1000-PROCESS-CSV
               UNTIL END-OF-FILE

           DISPLAY 'CSV-PARSE COMPLETE'
           DISPLAY '  RECORDS PROCESSED: ' WS-RECORD-COUNT
           DISPLAY '  ERRORS:            ' WS-ERROR-COUNT

           CLOSE CSV-FILE OUTPUT-FILE
           STOP RUN.

       1000-PROCESS-CSV.
           ADD 1 TO WS-RECORD-COUNT
           INITIALIZE WS-CSV-FIELDS
           MOVE ZERO TO WS-FIELD-COUNT

           UNSTRING CSV-RECORD
               DELIMITED BY ','
               INTO WS-CSV-FLD1 COUNT IN WS-CNT1
                    WS-CSV-FLD2 COUNT IN WS-CNT2
                    WS-CSV-FLD3 COUNT IN WS-CNT3
                    WS-CSV-FLD4 COUNT IN WS-CNT4
                    WS-CSV-FLD5 COUNT IN WS-CNT5
               TALLYING IN WS-FIELD-COUNT
               ON OVERFLOW
                   DISPLAY 'WARNING: Too many fields in '
                           'record ' WS-RECORD-COUNT
           END-UNSTRING

           IF WS-FIELD-COUNT < 5
               DISPLAY 'ERROR: Too few fields in record '
                       WS-RECORD-COUNT
                       ' (found ' WS-FIELD-COUNT ')'
               ADD 1 TO WS-ERROR-COUNT
           ELSE
               PERFORM 1100-FORMAT-OUTPUT
               WRITE OUTPUT-RECORD
           END-IF

           READ CSV-FILE
               AT END SET END-OF-FILE TO TRUE
           END-READ.

       1100-FORMAT-OUTPUT.
           INITIALIZE OUTPUT-RECORD
           MOVE WS-CSV-FLD1 TO OUT-ACCT-NUM
           MOVE WS-CSV-FLD2 TO OUT-NAME
           MOVE WS-CSV-FLD3 TO OUT-BALANCE
           MOVE WS-CSV-FLD4 TO OUT-STATUS
           MOVE WS-CSV-FLD5 TO OUT-BRANCH.

🧪 Try It Yourself: Create a CSV file with the following content and run the parser:

ACCT_NUM,NAME,BALANCE,STATUS,BRANCH
1234567890,MARIA CHEN,15234.56,AC,NYC01
9876543210,DEREK WASHINGTON,8901.23,AC,CHI01
5555555555,PRIYA KAPOOR,42567.89,AC,LAX01

17.4.3 Handling Quoted CSV Fields

For CSV with quoted fields, we need a more sophisticated approach. UNSTRING cannot directly handle quoted fields because the comma inside quotes is indistinguishable from the field separator. We must pre-process the record:

       1200-PARSE-QUOTED-CSV.
      *    Strategy: scan character by character,
      *    replacing commas inside quotes with a placeholder,
      *    then UNSTRING on commas, then restore placeholders.

           MOVE 'N' TO WS-IN-QUOTES
           PERFORM VARYING WS-POS FROM 1 BY 1
               UNTIL WS-POS > FUNCTION LENGTH(CSV-RECORD)
               EVALUATE TRUE
                   WHEN CSV-RECORD(WS-POS:1) = '"'
                       IF WS-IN-QUOTES = 'N'
                           MOVE 'Y' TO WS-IN-QUOTES
                       ELSE
                           MOVE 'N' TO WS-IN-QUOTES
                       END-IF
                       MOVE SPACE TO CSV-RECORD(WS-POS:1)
                   WHEN CSV-RECORD(WS-POS:1) = ','
                       AND WS-IN-QUOTES = 'Y'
                       MOVE X'FF' TO CSV-RECORD(WS-POS:1)
               END-EVALUATE
           END-PERFORM

      *    Now UNSTRING on commas (embedded commas are X'FF').
      *    TALLYING IN adds to the counter, so reset it first.
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING CSV-RECORD
               DELIMITED BY ','
               INTO WS-CSV-FLD1 WS-CSV-FLD2 WS-CSV-FLD3
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING

      *    Restore X'FF' back to commas in each field
           INSPECT WS-CSV-FLD1
               REPLACING ALL X'FF' BY ','
           INSPECT WS-CSV-FLD2
               REPLACING ALL X'FF' BY ','
           INSPECT WS-CSV-FLD3
               REPLACING ALL X'FF' BY ','.

📊 Performance Note: This character-by-character scan is slower than a simple UNSTRING. For high-volume processing, consider whether quoted fields actually occur in your data. If they do not, the simple UNSTRING approach is both faster and clearer.

17.5 The INSPECT Statement

INSPECT examines a field character by character and can perform three kinds of operation: counting characters, replacing them, or converting them.

17.5.1 INSPECT TALLYING — Counting Characters

INSPECT identifier-1 TALLYING
    identifier-2 FOR {CHARACTERS | ALL literal-1 | LEADING literal-2}
    [{BEFORE | AFTER} INITIAL {identifier-3 | literal-3}]

Count all occurrences:

       01  WS-COMMA-COUNT     PIC 99 VALUE ZERO.

           MOVE ZERO TO WS-COMMA-COUNT
           INSPECT WS-CSV-LINE
               TALLYING WS-COMMA-COUNT FOR ALL ','

Count leading zeros:

       01  WS-LEADING-ZEROS   PIC 99 VALUE ZERO.

           MOVE ZERO TO WS-LEADING-ZEROS
           INSPECT WS-AMOUNT-STR
               TALLYING WS-LEADING-ZEROS
                   FOR LEADING '0'

Count characters before a delimiter:

       01  WS-CHARS-BEFORE    PIC 99 VALUE ZERO.

           MOVE ZERO TO WS-CHARS-BEFORE
           INSPECT WS-DATA
               TALLYING WS-CHARS-BEFORE
                   FOR CHARACTERS BEFORE INITIAL ','

Multiple counts in one INSPECT:

           MOVE ZERO TO WS-DIGIT-CNT
                        WS-SPACE-CNT
           INSPECT WS-DATA
               TALLYING
                   WS-DIGIT-CNT FOR ALL '0' '1' '2' '3' '4'
                                        '5' '6' '7' '8' '9'
                   WS-SPACE-CNT FOR ALL SPACE

💡 Counting Commas to Determine Field Count: Before parsing CSV data with UNSTRING, count the commas to determine how many fields are present. This helps you validate the record before parsing:

           MOVE ZERO TO WS-COMMA-COUNT
           INSPECT CSV-RECORD
               TALLYING WS-COMMA-COUNT FOR ALL ','
      *    Expected fields = commas + 1
           IF WS-COMMA-COUNT + 1 NOT = WS-EXPECTED-FIELDS
               DISPLAY 'FIELD COUNT MISMATCH'
           END-IF

17.5.2 INSPECT REPLACING — Character Substitution

INSPECT identifier-1 REPLACING
    {CHARACTERS BY {identifier-2 | literal-1} |
     ALL {identifier-3 | literal-2} BY {identifier-4 | literal-3} |
     LEADING {identifier-5 | literal-4} BY {identifier-6 | literal-5} |
     FIRST {identifier-7 | literal-6} BY {identifier-8 | literal-7}}
    [{BEFORE | AFTER} INITIAL {identifier-9 | literal-8}]

Replace all occurrences:

      *    Replace all hyphens with spaces
           INSPECT WS-PHONE-NUM
               REPLACING ALL '-' BY ' '

      *    Replace all commas with pipes
           INSPECT WS-DATA
               REPLACING ALL ',' BY '|'

Replace leading zeros with spaces:

           INSPECT WS-AMOUNT
               REPLACING LEADING '0' BY SPACE

Replace first occurrence only:

           INSPECT WS-TEXT
               REPLACING FIRST 'ERROR' BY 'WARN '

Replace with BEFORE/AFTER boundaries:

      *    Replace commas with semicolons, but only
      *    before the first pipe character
           INSPECT WS-DATA
               REPLACING ALL ',' BY ';'
                   BEFORE INITIAL '|'

17.5.3 INSPECT CONVERTING — Character Translation

CONVERTING replaces characters according to a translation table:

INSPECT identifier-1 CONVERTING
    {identifier-2 | literal-1} TO {identifier-3 | literal-2}
    [{BEFORE | AFTER} INITIAL {identifier-4 | literal-3}]

Convert lowercase to uppercase:

           INSPECT WS-NAME
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

Convert special characters to spaces:

           INSPECT WS-DATA
               CONVERTING
                   '!@#$%^&*()'
                   TO
                   '          '

Map letter codes to digits (single-character mapping):

      *    This only works for single-character to
      *    single-character conversion
           INSPECT WS-CODE
               CONVERTING
                   'ABCDEF'
                   TO
                   '123456'

⚠️ CONVERTING Rule: The FROM and TO strings must be the same length. Each character in the FROM string is replaced by the character in the corresponding position of the TO string. This is a character-by-character mapping, not a substring replacement.

17.5.4 Combined TALLYING and REPLACING

You can combine TALLYING and REPLACING in a single INSPECT:

           MOVE ZERO TO WS-SPACE-CNT
           INSPECT WS-DATA
               TALLYING WS-SPACE-CNT FOR ALL SPACE
               REPLACING ALL SPACE BY '-'

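This counts the spaces and replaces them in one statement. The result is defined as if the TALLYING phrase ran first and the REPLACING phrase second, so the count reflects the field before any replacement. Whether this is faster than two separate INSPECT statements depends on the compiler.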
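One thing to watch for is that INSPECT works on the whole field, including trailing padding. The sketch below shows the effect and one way to limit it with reference modification. INSDEMO1 is a hypothetical program name, and the hard-coded length of 6 is an assumption standing in for a computed data length.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSDEMO1.
      * Hypothetical demo: INSPECT and trailing padding.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA            PIC X(12) VALUE 'A B  C'.
       01  WS-SPACE-CNT       PIC 99    VALUE ZERO.

       PROCEDURE DIVISION.
      *    Whole field: the 6 trailing pad spaces are included
           MOVE ZERO TO WS-SPACE-CNT
           INSPECT WS-DATA
               TALLYING WS-SPACE-CNT FOR ALL SPACE
               REPLACING ALL SPACE BY '-'
           DISPLAY WS-SPACE-CNT ' [' WS-DATA ']'

      *    First 6 characters only, via reference modification
           MOVE 'A B  C' TO WS-DATA
           MOVE ZERO TO WS-SPACE-CNT
           INSPECT WS-DATA(1:6)
               TALLYING WS-SPACE-CNT FOR ALL SPACE
               REPLACING ALL SPACE BY '-'
           DISPLAY WS-SPACE-CNT ' [' WS-DATA ']'
           STOP RUN.
```

The first INSPECT counts 9 spaces and turns the field into 'A-B--C------'; the second counts 3 and leaves the padding untouched.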

17.6 Reference Modification

Reference modification provides direct substring access using a parenthesized offset and length notation.

17.6.1 Syntax

identifier(offset:length)

offset: Starting position (1-based). Can be a literal, data-name, or arithmetic expression.
length: Number of characters to access. Can be a literal, data-name, or arithmetic expression. Optional — if omitted, everything from offset to the end of the field is referenced.

17.6.2 Basic Examples

       01  WS-DATE            PIC X(08) VALUE '20240115'.
       01  WS-YEAR            PIC X(04).
       01  WS-MONTH           PIC X(02).
       01  WS-DAY             PIC X(02).

      *    Extract components
           MOVE WS-DATE(1:4) TO WS-YEAR       *> '2024'
           MOVE WS-DATE(5:2) TO WS-MONTH      *> '01'
           MOVE WS-DATE(7:2) TO WS-DAY        *> '15'

      *    Format as MM/DD/YYYY
       01  WS-FORMATTED-DATE  PIC X(10).

           STRING WS-DATE(5:2)  DELIMITED BY SIZE
                  '/'           DELIMITED BY SIZE
                  WS-DATE(7:2)  DELIMITED BY SIZE
                  '/'           DELIMITED BY SIZE
                  WS-DATE(1:4)  DELIMITED BY SIZE
                  INTO WS-FORMATTED-DATE
           END-STRING.
      *    Result: '01/15/2024'

17.6.3 Reference Modification with Variables

The real power of reference modification comes when the offset and length are computed at runtime:

       01  WS-TEXT            PIC X(100).
       01  WS-POS             PIC 999.
       01  WS-LEN             PIC 999.
       01  WS-SUBSTR          PIC X(50).

      *    Extract a variable-position substring
           MOVE WS-TEXT(WS-POS:WS-LEN) TO WS-SUBSTR

17.6.4 Scanning with Reference Modification

You can build a character-by-character scanner using reference modification:

      *    Find the position of the first comma
           MOVE ZERO TO WS-COMMA-POS
           PERFORM VARYING WS-POS FROM 1 BY 1
               UNTIL WS-POS > FUNCTION LENGTH(WS-DATA)
               OR WS-COMMA-POS > 0
               IF WS-DATA(WS-POS:1) = ','
                   MOVE WS-POS TO WS-COMMA-POS
               END-IF
           END-PERFORM
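For a single-character search, INSPECT TALLYING with CHARACTERS BEFORE INITIAL can replace the explicit loop, and its result feeds directly into reference modification. This is a sketch with hypothetical names (REFDEMO1, WS-LAST); note that when the delimiter is absent, the tally equals the full field length, which is why the code tests for that case.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFDEMO1.
      * Hypothetical demo: locate a comma, then slice before it.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA            PIC X(30) VALUE 'CHEN,MARIA'.
       01  WS-CHARS-BEFORE    PIC 99    VALUE ZERO.
       01  WS-LAST            PIC X(15) VALUE SPACES.

       PROCEDURE DIVISION.
           MOVE ZERO TO WS-CHARS-BEFORE
           INSPECT WS-DATA
               TALLYING WS-CHARS-BEFORE
                   FOR CHARACTERS BEFORE INITIAL ','
      *    With no comma present the tally equals the field length
           IF WS-CHARS-BEFORE > 0
           AND WS-CHARS-BEFORE < FUNCTION LENGTH(WS-DATA)
               MOVE WS-DATA(1:WS-CHARS-BEFORE) TO WS-LAST
               DISPLAY WS-CHARS-BEFORE ' [' WS-LAST ']'
           ELSE
               DISPLAY 'NO USABLE COMMA FOUND'
           END-IF
           STOP RUN.
```

Here the tally is 4, so the comma sits at position 5 and WS-LAST receives 'CHEN'.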

17.6.5 Right-Trimming with Reference Modification

Alphanumeric fields are fixed-length, so data shorter than the field is padded with spaces on the right. To find the actual length of the data (excluding trailing spaces):

      *    Find the last non-space character
           MOVE FUNCTION LENGTH(WS-NAME) TO WS-ACTUAL-LEN
           PERFORM UNTIL WS-ACTUAL-LEN = 0
               OR WS-NAME(WS-ACTUAL-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-ACTUAL-LEN
           END-PERFORM
      *    WS-ACTUAL-LEN is now the position of the last
      *    non-space character

💡 FUNCTION LENGTH: The intrinsic function LENGTH returns the defined length of a data item. Combined with reference modification, it enables dynamic string operations. We will explore intrinsic functions fully in Chapter 20.

17.6.6 Reference Modification vs. Substring Moves

Reference modification can appear on either side of a MOVE:

      *    Extract (reading)
           MOVE WS-RECORD(15:10) TO WS-FIELD

      *    Insert (writing)
           MOVE WS-NEW-VALUE TO WS-RECORD(15:10)

⚠️ Boundary Checking: If the offset or offset+length-1 exceeds the field's length, the results are undefined (and may cause an S0C4 ABEND on mainframes). Always validate your computed offsets and lengths:

           IF WS-POS > 0
           AND WS-LEN > 0
           AND WS-POS + WS-LEN - 1
               <= FUNCTION LENGTH(WS-DATA)
               MOVE WS-DATA(WS-POS:WS-LEN) TO WS-RESULT
           ELSE
               DISPLAY 'SUBSTRING OUT OF BOUNDS'
           END-IF

17.7 GlobalBank Case Study: Formatting Account Statements

Maria Chen needs to generate formatted statement lines from fixed-field transaction records. Each line must look like:

01/15/2024  DEPOSIT    POS #1234-NYC01    $  1,234.56  Balance: $ 45,678.90

Here is the string handling code:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. GBFMT01.
      *================================================================
      * Program:  GBFMT01
      * Purpose:  Format transaction records for account statements
      * Chapter:  17 - String Handling
      * Context:  GlobalBank - customer statement generation
      *================================================================

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT TXN-FILE    ASSIGN TO TXNIN.
           SELECT STMT-FILE   ASSIGN TO STMTOUT.

       DATA DIVISION.
       FILE SECTION.

       FD  TXN-FILE.
       01  TXN-REC.
           05  TXN-ACCT       PIC X(10).
           05  TXN-DATE        PIC 9(08).
           05  TXN-TIME        PIC 9(06).
           05  TXN-TYPE        PIC X(01).
           05  TXN-AMOUNT      PIC S9(09)V99 COMP-3.
           05  TXN-BRANCH      PIC X(05).
           05  TXN-TELLER      PIC X(06).
           05  TXN-DESC        PIC X(30).
           05  TXN-REF-NUM     PIC X(10).
           05  FILLER          PIC X(06).

       FD  STMT-FILE.
       01  STMT-LINE           PIC X(132).

       WORKING-STORAGE SECTION.
       01  WS-EOF              PIC X VALUE 'N'.
           88  END-OF-FILE     VALUE 'Y'.

       01  WS-FORMATTED-DATE   PIC X(10).
       01  WS-TYPE-DESC        PIC X(12).
       01  WS-AMT-EDITED       PIC $ZZZ,ZZ9.99.
       01  WS-BAL-EDITED       PIC $ZZZ,ZZZ,ZZ9.99.
       01  WS-RUNNING-BAL      PIC S9(11)V99 VALUE ZERO.

       01  WS-STMT-PTR         PIC 999.

       PROCEDURE DIVISION.
       0000-MAIN.
           OPEN INPUT  TXN-FILE
           OPEN OUTPUT STMT-FILE

           READ TXN-FILE
               AT END SET END-OF-FILE TO TRUE
           END-READ

           PERFORM 1000-FORMAT-TRANSACTION
               UNTIL END-OF-FILE

           CLOSE TXN-FILE STMT-FILE
           STOP RUN.

       1000-FORMAT-TRANSACTION.
      *    Format the date: YYYYMMDD -> MM/DD/YYYY
           STRING TXN-DATE(5:2)  DELIMITED BY SIZE
                  '/'            DELIMITED BY SIZE
                  TXN-DATE(7:2)  DELIMITED BY SIZE
                  '/'            DELIMITED BY SIZE
                  TXN-DATE(1:4)  DELIMITED BY SIZE
                  INTO WS-FORMATTED-DATE
           END-STRING

      *    Set type description
           EVALUATE TXN-TYPE
               WHEN 'D'
                   MOVE 'DEPOSIT    ' TO WS-TYPE-DESC
                   ADD TXN-AMOUNT TO WS-RUNNING-BAL
               WHEN 'W'
                   MOVE 'WITHDRAWAL ' TO WS-TYPE-DESC
                   SUBTRACT TXN-AMOUNT FROM WS-RUNNING-BAL
               WHEN 'T'
                   MOVE 'TRANSFER   ' TO WS-TYPE-DESC
                   ADD TXN-AMOUNT TO WS-RUNNING-BAL
               WHEN 'F'
                   MOVE 'FEE        ' TO WS-TYPE-DESC
                   SUBTRACT TXN-AMOUNT FROM WS-RUNNING-BAL
               WHEN 'I'
                   MOVE 'INTEREST   ' TO WS-TYPE-DESC
                   ADD TXN-AMOUNT TO WS-RUNNING-BAL
               WHEN OTHER
                   MOVE 'UNKNOWN    ' TO WS-TYPE-DESC
           END-EVALUATE

      *    Format amounts
           MOVE TXN-AMOUNT TO WS-AMT-EDITED
           MOVE WS-RUNNING-BAL TO WS-BAL-EDITED

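      *    Build the statement line using STRING.
      *    TXN-DESC contains embedded spaces ('POS #1234-NYC01'),
      *    so it must be DELIMITED BY SIZE; DELIMITED BY SPACE
      *    would stop at the first space and keep only 'POS'.
      *    WS-TYPE-DESC also uses SIZE so the columns line up.
           MOVE SPACES TO STMT-LINE
           MOVE 1 TO WS-STMT-PTR
           STRING WS-FORMATTED-DATE DELIMITED BY SIZE
                  '  '              DELIMITED BY SIZE
                  WS-TYPE-DESC      DELIMITED BY SIZE
                  '  '              DELIMITED BY SIZE
                  TXN-DESC          DELIMITED BY SIZE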
                  '  '              DELIMITED BY SIZE
                  WS-AMT-EDITED     DELIMITED BY SIZE
                  '  Balance: '     DELIMITED BY SIZE
                  WS-BAL-EDITED     DELIMITED BY SIZE
                  INTO STMT-LINE
                  WITH POINTER WS-STMT-PTR
                  ON OVERFLOW
                      DISPLAY 'STMT LINE OVERFLOW'
           END-STRING

           WRITE STMT-LINE

           READ TXN-FILE
               AT END SET END-OF-FILE TO TRUE
           END-READ.

17.8 MedClaim Case Study: Parsing Diagnosis Codes

At MedClaim, diagnosis codes arrive in ICD-10 format: a letter followed by two digits, a decimal point, and up to four additional characters (e.g., "E11.65" for Type 2 diabetes with hyperglycemia). Sarah Kim needs to parse these codes into their components for reporting:

       01  WS-ICD10-CODE       PIC X(08).
       01  WS-ICD10-PARTS.
           05  WS-ICD-CATEGORY  PIC X(01).
           05  WS-ICD-ETIOLOGY  PIC X(02).
           05  WS-ICD-DETAIL    PIC X(04).
       01  WS-DOT-POS          PIC 99.

       2000-PARSE-DIAGNOSIS.
      *    Validate basic format: letter + 2 digits + dot
           IF WS-ICD10-CODE(1:1) NOT ALPHABETIC
               MOVE 'INVALID: NOT ALPHA START' TO WS-ERR-MSG
               PERFORM 9000-LOG-ERROR
               EXIT PARAGRAPH
           END-IF

           IF WS-ICD10-CODE(2:2) NOT NUMERIC
               MOVE 'INVALID: NON-NUMERIC AFTER CATEGORY'
                   TO WS-ERR-MSG
               PERFORM 9000-LOG-ERROR
               EXIT PARAGRAPH
           END-IF

      *    Extract category letter
           MOVE WS-ICD10-CODE(1:1) TO WS-ICD-CATEGORY

      *    Extract etiology (2 digits after category)
           MOVE WS-ICD10-CODE(2:2) TO WS-ICD-ETIOLOGY

      *    Find the decimal point by counting the characters
      *    before it. If there is no dot, INSPECT counts every
      *    character, so the tally equals the field length.
           MOVE ZERO TO WS-DOT-POS
           INSPECT WS-ICD10-CODE
               TALLYING WS-DOT-POS
                   FOR CHARACTERS BEFORE INITIAL '.'

           IF WS-DOT-POS >= FUNCTION LENGTH(WS-ICD10-CODE)
      *        No dot: code has no detail portion
               MOVE SPACES TO WS-ICD-DETAIL
           ELSE
      *        Extract detail after the dot
               ADD 2 TO WS-DOT-POS
      *        WS-DOT-POS now points past the dot
               IF WS-DOT-POS <=
                   FUNCTION LENGTH(WS-ICD10-CODE)
                   MOVE WS-ICD10-CODE(WS-DOT-POS:)
                       TO WS-ICD-DETAIL
               ELSE
                   MOVE SPACES TO WS-ICD-DETAIL
               END-IF
           END-IF.

🔵 Sarah Kim's Note: "The ICD-10 parsing looks simple, but the devil is in edge cases. Some legacy claims use ICD-9 format (3-5 digits, no leading letter). Our parser must handle both formats, plus malformed codes from manual data entry. Defensive string handling — checking every assumption about the data — is non-negotiable in claims processing."
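
One way to make the first of those assumption checks concrete is to classify each code before parsing it. The program below is a minimal sketch, not MedClaim's actual routine: the program name, field names, and the three-way classification are assumptions for illustration. It follows the note's simplified description of ICD-9 (leading digits, no letter) and the same letter-plus-two-digits test used in 2000-PARSE-DIAGNOSIS, not the full rules of either code set.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. ICDTRIAG.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CODE-IN         PIC X(08) VALUE 'E11.65'.
      *    Hypothetical format flag: T = ICD-10, N = ICD-9
       01  WS-CODE-FORMAT     PIC X(01).
           88  FMT-ICD10      VALUE 'T'.
           88  FMT-ICD9       VALUE 'N'.
           88  FMT-INVALID    VALUE 'X'.
       PROCEDURE DIVISION.
       0000-MAIN.
           PERFORM 1000-CLASSIFY
           DISPLAY WS-CODE-IN ' -> ' WS-CODE-FORMAT
           MOVE '25000' TO WS-CODE-IN
           PERFORM 1000-CLASSIFY
           DISPLAY WS-CODE-IN ' -> ' WS-CODE-FORMAT
           STOP RUN.

       1000-CLASSIFY.
      *    Blank input is rejected first, because a space also
      *    satisfies the ALPHABETIC-UPPER class test.
           EVALUATE TRUE
               WHEN WS-CODE-IN = SPACES
                   SET FMT-INVALID TO TRUE
               WHEN WS-CODE-IN(1:1) IS ALPHABETIC-UPPER
                AND WS-CODE-IN(1:1) NOT = SPACE
                AND WS-CODE-IN(2:2) IS NUMERIC
                   SET FMT-ICD10 TO TRUE
               WHEN WS-CODE-IN(1:3) IS NUMERIC
                   SET FMT-ICD9 TO TRUE
               WHEN OTHER
                   SET FMT-INVALID TO TRUE
           END-EVALUATE.

With a flag like this, the caller can route each claim to the matching parser and send everything marked invalid to an error queue instead of guessing.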

17.9 Common String Handling Patterns

17.9.1 Pattern: Left-Trim

Remove leading spaces from a field:

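       01  WS-LEADING-SPACES  PIC 999 VALUE ZERO.
      *    Work field; assumed to be at least as long as WS-DATA
       01  WS-TRIM-WORK       PIC X(100).

           MOVE ZERO TO WS-LEADING-SPACES
           INSPECT WS-DATA
               TALLYING WS-LEADING-SPACES
                   FOR LEADING SPACE

      *    Skip all-blank fields (the start position would be out
      *    of bounds) and copy through a work field, because a
      *    MOVE between overlapping areas has undefined results.
           IF WS-LEADING-SPACES > 0
           AND WS-LEADING-SPACES < FUNCTION LENGTH(WS-DATA)
               MOVE WS-DATA(WS-LEADING-SPACES + 1:)
                   TO WS-TRIM-WORK
               MOVE WS-TRIM-WORK TO WS-DATA
           END-IF.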

17.9.2 Pattern: Right-Trim (Find Actual Length)

       01  WS-ACTUAL-LEN      PIC 999.

           MOVE FUNCTION LENGTH(WS-DATA) TO WS-ACTUAL-LEN
           PERFORM UNTIL WS-ACTUAL-LEN = 0
               OR WS-DATA(WS-ACTUAL-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-ACTUAL-LEN
           END-PERFORM.
      *    WS-ACTUAL-LEN is the length of data without
      *    trailing spaces. Zero means the field is all spaces.
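
A common alternative to the loop counts the trailing spaces in a single INSPECT by reversing the field first. The sketch below assumes your compiler supports the REVERSE intrinsic function and allows a function as the subject of INSPECT TALLYING; the program name, WS-TRAILING, and the sample value are made up for illustration.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. RTRIMREV.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA            PIC X(30) VALUE 'GLOBALBANK'.
       01  WS-TRAILING        PIC 999 VALUE ZERO.
       01  WS-ACTUAL-LEN      PIC 999 VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    Trailing spaces of WS-DATA are the leading spaces of
      *    its reversal, which INSPECT can count in one pass.
           MOVE ZERO TO WS-TRAILING
           INSPECT FUNCTION REVERSE(WS-DATA)
               TALLYING WS-TRAILING FOR LEADING SPACE
           COMPUTE WS-ACTUAL-LEN =
               FUNCTION LENGTH(WS-DATA) - WS-TRAILING
           DISPLAY 'ACTUAL LENGTH: ' WS-ACTUAL-LEN
      *    Displays ACTUAL LENGTH: 010
           STOP RUN.

An all-blank field gives a length of zero, the same result as the loop version.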

17.9.3 Pattern: Center a String

       01  WS-CENTERED        PIC X(80).
       01  WS-PAD-LEN         PIC 99.

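      *    Assume WS-ACTUAL-LEN has been computed. The guard skips
      *    empty input (a zero-length reference is invalid) and
      *    text that is too long to fit.
           MOVE SPACES TO WS-CENTERED
           IF WS-ACTUAL-LEN > 0 AND WS-ACTUAL-LEN <= 80
               COMPUTE WS-PAD-LEN =
                   (80 - WS-ACTUAL-LEN) / 2
               MOVE WS-DATA(1:WS-ACTUAL-LEN)
                   TO WS-CENTERED(WS-PAD-LEN + 1:WS-ACTUAL-LEN)
           END-IF.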

17.9.4 Pattern: Replace Substring

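INSPECT REPLACING ALL handles single characters and multi-character strings, but only when the replacement is exactly as long as the text it replaces. When the lengths differ, rebuild the field with reference modification and STRING:

      *    Replace "OLD" with "NEW" at position WS-FOUND-POS.
      *    Assumes WS-FOUND-POS > 1: a zero-length reference
      *    such as WS-DATA(1:0) is invalid.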
           MOVE SPACES TO WS-RESULT
           MOVE 1 TO WS-PTR
           STRING WS-DATA(1:WS-FOUND-POS - 1)
                      DELIMITED BY SIZE
                  'NEW'
                      DELIMITED BY SIZE
                  WS-DATA(WS-FOUND-POS + 3:)
                      DELIMITED BY SIZE
                  INTO WS-RESULT
                  WITH POINTER WS-PTR
           END-STRING.
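
The pattern above takes WS-FOUND-POS as given. One way to compute it, sketched here with a made-up program name and sample value, is to count the characters that precede the first occurrence of the search text. WS-BEFORE-CNT is a hypothetical work field.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. FINDSUB.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA            PIC X(40)
                               VALUE 'REPLACE THE OLD VALUE'.
       01  WS-BEFORE-CNT      PIC 999 VALUE ZERO.
       01  WS-FOUND-POS       PIC 999 VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE ZERO TO WS-BEFORE-CNT
           INSPECT WS-DATA
               TALLYING WS-BEFORE-CNT
                   FOR CHARACTERS BEFORE INITIAL 'OLD'
      *    If 'OLD' is absent, every character is counted and
      *    the tally equals the field length.
           IF WS-BEFORE-CNT < FUNCTION LENGTH(WS-DATA)
               COMPUTE WS-FOUND-POS = WS-BEFORE-CNT + 1
               DISPLAY 'FOUND AT POSITION ' WS-FOUND-POS
           ELSE
               MOVE ZERO TO WS-FOUND-POS
               DISPLAY 'NOT FOUND'
           END-IF
           STOP RUN.

For the sample value, twelve characters precede 'OLD', so the program displays FOUND AT POSITION 013.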

17.9.5 Pattern: Token-by-Token Parsing

For parsing a string one token at a time (useful for command strings or free-form input):

       01  WS-INPUT           PIC X(200).
       01  WS-TOKEN           PIC X(50).
       01  WS-TOKEN-PTR       PIC 999 VALUE 1.
       01  WS-TOKEN-CNT       PIC 99 VALUE ZERO.

           MOVE 1 TO WS-TOKEN-PTR
           PERFORM UNTIL WS-TOKEN-PTR >
               FUNCTION LENGTH(WS-INPUT)
               INITIALIZE WS-TOKEN
               MOVE ZERO TO WS-TOKEN-CNT
               UNSTRING WS-INPUT
                   DELIMITED BY ALL SPACES
                   INTO WS-TOKEN
                   WITH POINTER WS-TOKEN-PTR
                   TALLYING IN WS-TOKEN-CNT
               END-UNSTRING
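      *        TALLYING also counts empty fields (for example when
      *        the input starts with spaces), so test the token.
               IF WS-TOKEN NOT = SPACES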
                   PERFORM 2000-PROCESS-TOKEN
               END-IF
           END-PERFORM.

17.9.6 Pattern: Building Pipe-Delimited Output

For producing output that distributed systems can consume:

       01  WS-OUTPUT          PIC X(500) VALUE SPACES.
       01  WS-OUT-PTR         PIC 999 VALUE 1.

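           MOVE SPACES TO WS-OUTPUT
           MOVE 1 TO WS-OUT-PTR
      *    Names contain single embedded spaces, so WS-NAME is
      *    delimited by two spaces. DELIMITED BY SPACE would keep
      *    only the first word.
           STRING WS-ACCT-NUM     DELIMITED BY SPACE
                  '|'             DELIMITED BY SIZE
                  WS-NAME         DELIMITED BY '  '
                  '|'             DELIMITED BY SIZE
                  WS-DATE         DELIMITED BY SIZE
                  '|'             DELIMITED BY SIZE
                  WS-AMOUNT-EDIT  DELIMITED BY SPACE
                  '|'             DELIMITED BY SIZE
                  WS-STATUS       DELIMITED BY SPACE
                  INTO WS-OUTPUT
                  WITH POINTER WS-OUT-PTR
                  ON OVERFLOW
                      DISPLAY 'PIPE OUTPUT OVERFLOW'
           END-STRING.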

17.10 String Handling for Data Interchange

17.10.1 Building XML Fragments

COBOL programs increasingly need to produce XML for web services:

       01  WS-XML-BUFFER      PIC X(2000) VALUE SPACES.
       01  WS-XML-PTR         PIC 9(04) VALUE 1.

           MOVE 1 TO WS-XML-PTR
           STRING '<transaction>'      DELIMITED BY SIZE
                  '<account>'          DELIMITED BY SIZE
                  WS-ACCT-NUM          DELIMITED BY SPACE
                  '</account>'         DELIMITED BY SIZE
                  '<date>'             DELIMITED BY SIZE
                  WS-TXN-DATE          DELIMITED BY SIZE
                  '</date>'            DELIMITED BY SIZE
                  '<amount>'           DELIMITED BY SIZE
                  WS-AMOUNT-STR        DELIMITED BY SPACE
                  '</amount>'          DELIMITED BY SIZE
                  '<type>'             DELIMITED BY SIZE
                  WS-TXN-TYPE          DELIMITED BY SPACE
                  '</type>'            DELIMITED BY SIZE
                  '</transaction>'     DELIMITED BY SIZE
                  INTO WS-XML-BUFFER
                  WITH POINTER WS-XML-PTR
                  ON OVERFLOW
                      DISPLAY 'XML BUFFER OVERFLOW'
           END-STRING.

🔗 Cross-Reference: Chapter 39 (Real-Time Integration) and Chapter 40 (COBOL and the Modern Stack) explore XML and JSON processing in depth, including the XML GENERATE and JSON GENERATE statements available in modern Enterprise COBOL.

17.10.2 Parsing Key-Value Pairs

Configuration files or API responses sometimes contain key=value pairs:

      *    Input: 'TIMEOUT=30|RETRY=3|HOST=MAINFRAME01'
       01  WS-CONFIG          PIC X(200).
       01  WS-PAIR            PIC X(50).
       01  WS-KEY             PIC X(20).
       01  WS-VALUE           PIC X(30).
       01  WS-CFG-PTR         PIC 999 VALUE 1.
       01  WS-PAIR-CNT        PIC 99 VALUE ZERO.

           MOVE 1 TO WS-CFG-PTR
           PERFORM UNTIL WS-CFG-PTR >
               FUNCTION LENGTH(WS-CONFIG)
               INITIALIZE WS-PAIR WS-KEY WS-VALUE
               MOVE ZERO TO WS-PAIR-CNT
               UNSTRING WS-CONFIG
                   DELIMITED BY '|'
                   INTO WS-PAIR
                   WITH POINTER WS-CFG-PTR
                   TALLYING IN WS-PAIR-CNT
               END-UNSTRING
      *        TALLYING also counts empty fields, so test the pair
               IF WS-PAIR NOT = SPACES
                   UNSTRING WS-PAIR
                       DELIMITED BY '='
                       INTO WS-KEY WS-VALUE
                   END-UNSTRING
                   PERFORM 3000-APPLY-CONFIG
               END-IF
           END-PERFORM.
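
The listing leaves 3000-APPLY-CONFIG undefined. A minimal sketch of what it might look like follows. The target fields WS-TIMEOUT-SECS, WS-RETRY-MAX, and WS-HOST-NAME are hypothetical, and the numeric values are assumed to be valid unsigned integers; production code would validate them before calling NUMVAL.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CFGAPPLY.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-KEY             PIC X(20).
       01  WS-VALUE           PIC X(30).
      *    Hypothetical settings; names and sizes are assumptions
       01  WS-TIMEOUT-SECS    PIC 9(04) VALUE ZERO.
       01  WS-RETRY-MAX       PIC 9(02) VALUE ZERO.
       01  WS-HOST-NAME       PIC X(30) VALUE SPACES.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 'TIMEOUT'     TO WS-KEY
           MOVE '30'          TO WS-VALUE
           PERFORM 3000-APPLY-CONFIG
           MOVE 'HOST'        TO WS-KEY
           MOVE 'MAINFRAME01' TO WS-VALUE
           PERFORM 3000-APPLY-CONFIG
           DISPLAY 'TIMEOUT=' WS-TIMEOUT-SECS
           DISPLAY 'HOST='    WS-HOST-NAME
           STOP RUN.

       3000-APPLY-CONFIG.
      *    The literal is padded with spaces for the comparison,
      *    so 'TIMEOUT' matches the 20-byte WS-KEY.
           EVALUATE WS-KEY
               WHEN 'TIMEOUT'
                   COMPUTE WS-TIMEOUT-SECS =
                       FUNCTION NUMVAL(WS-VALUE)
               WHEN 'RETRY'
                   COMPUTE WS-RETRY-MAX =
                       FUNCTION NUMVAL(WS-VALUE)
               WHEN 'HOST'
                   MOVE WS-VALUE TO WS-HOST-NAME
               WHEN OTHER
                   DISPLAY 'UNKNOWN CONFIG KEY: ' WS-KEY
           END-EVALUATE.

The WHEN OTHER branch matters: an unrecognized key is reported rather than silently ignored.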

17.11 Name Formatting and Manipulation

Name formatting is a surprisingly common and tricky string handling task. Names have varied structures and cultural conventions.

17.11.1 Name Inversion: "First Last" to "Last, First"

       01  WS-DISPLAY-NAME    PIC X(40) VALUE 'MARIA CHEN'.
       01  WS-FORMAL-NAME     PIC X(40).
       01  WS-FIRST           PIC X(20).
       01  WS-LAST            PIC X(20).

           INITIALIZE WS-FIRST WS-LAST
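      *    ALL SPACE treats a run of spaces as one delimiter.
      *    Assumes exactly two name parts: with a middle name,
      *    WS-LAST would receive it and the surname would be lost.
           UNSTRING WS-DISPLAY-NAME
               DELIMITED BY ALL SPACE
               INTO WS-FIRST WS-LAST
           END-UNSTRING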

           MOVE SPACES TO WS-FORMAL-NAME
           STRING WS-LAST   DELIMITED BY SPACE
                  ', '      DELIMITED BY SIZE
                  WS-FIRST  DELIMITED BY SPACE
                  INTO WS-FORMAL-NAME
           END-STRING.
      *    Result: 'CHEN, MARIA'

17.11.2 Extracting Initials

       01  WS-FULL-NAME       PIC X(60)
                               VALUE 'DEREK JAMES WASHINGTON'.
       01  WS-INITIALS        PIC X(10) VALUE SPACES.
       01  WS-WORD            PIC X(20).
       01  WS-NAME-PTR        PIC 999 VALUE 1.
       01  WS-INIT-PTR        PIC 99 VALUE 1.
       01  WS-TALLY-CNT       PIC 99 VALUE ZERO.

           MOVE 1 TO WS-NAME-PTR
           MOVE 1 TO WS-INIT-PTR
           PERFORM UNTIL WS-NAME-PTR >
               FUNCTION LENGTH(WS-FULL-NAME)
               INITIALIZE WS-WORD
               MOVE ZERO TO WS-TALLY-CNT
               UNSTRING WS-FULL-NAME
                   DELIMITED BY ALL SPACES
                   INTO WS-WORD
                   WITH POINTER WS-NAME-PTR
                   TALLYING IN WS-TALLY-CNT
               END-UNSTRING
      *        Skip empty words and stay inside WS-INITIALS
               IF WS-WORD NOT = SPACES
               AND WS-INIT-PTR <= 10
                   MOVE WS-WORD(1:1)
                       TO WS-INITIALS(WS-INIT-PTR:1)
                   ADD 1 TO WS-INIT-PTR
               END-IF
           END-PERFORM.
      *    Result: WS-INITIALS = 'DJW'

17.12 Performance Considerations

17.12.1 STRING and UNSTRING vs. MOVE

For simple field-to-field copies, prefer MOVE. STRING has to evaluate delimiters and maintain a pointer, so it does more work to produce the same result:

      *    Slower:
           STRING WS-SOURCE DELIMITED BY SIZE
                  INTO WS-TARGET
           END-STRING

      *    Faster:
           MOVE WS-SOURCE TO WS-TARGET

Use STRING only when you need concatenation, delimiter handling, or pointer tracking.

17.12.2 INSPECT vs. Reference Modification Loop

For counting a single character, INSPECT is typically faster than a manual loop because it may be implemented with hardware string instructions on mainframes:

      *    Faster (may use hardware instructions).
      *    INSPECT adds to the counter, so zero it first:
           MOVE ZERO TO WS-COUNT
           INSPECT WS-DATA
               TALLYING WS-COUNT FOR ALL ','

      *    Slower (always character-by-character):
           MOVE ZERO TO WS-COUNT
           PERFORM VARYING WS-POS FROM 1 BY 1
               UNTIL WS-POS > FUNCTION LENGTH(WS-DATA)
               IF WS-DATA(WS-POS:1) = ','
                   ADD 1 TO WS-COUNT
               END-IF
           END-PERFORM

17.12.3 Minimizing STRING Operations in Loops

If you are building the same type of output line millions of times, consider pre-building a template and using MOVE with reference modification instead of STRING:

      *    Template approach (faster in tight loops):
           MOVE WS-TEMPLATE TO WS-OUTPUT
           MOVE WS-ACCT-NUM  TO WS-OUTPUT(1:10)
           MOVE WS-DATE-EDIT TO WS-OUTPUT(15:10)
           MOVE WS-AMT-EDIT  TO WS-OUTPUT(30:14)

      *    STRING approach (slower but more readable):
           STRING WS-ACCT-NUM DELIMITED BY SPACE
                  ...
                  INTO WS-OUTPUT

📊 When Performance Matters: For batch programs processing millions of records, the performance difference between STRING and direct MOVE can be significant. Maria Chen's rule: "Use STRING for readability in low-volume paths. Use MOVE with reference modification for performance in high-volume inner loops."

17.13 Defensive String Handling

String operations are a common source of production ABENDs. Defensive programming is essential.

17.13.1 Always Initialize Receiving Fields

      *    Before STRING:
           MOVE SPACES TO WS-OUTPUT
           MOVE 1 TO WS-PTR

      *    Before UNSTRING:
           INITIALIZE WS-FLD1 WS-FLD2 WS-FLD3
           MOVE ZERO TO WS-TALLY-CNT

17.13.2 Always Use ON OVERFLOW

           STRING ... INTO WS-OUTPUT
               ON OVERFLOW
                   PERFORM 9100-LOG-OVERFLOW
           END-STRING

           UNSTRING ... INTO WS-FLD1 WS-FLD2
               ON OVERFLOW
                   PERFORM 9200-LOG-TOO-MANY-FIELDS
           END-UNSTRING

17.13.3 Validate Reference Modification Bounds

           IF WS-POS > 0
           AND WS-LEN > 0
           AND WS-POS <= FUNCTION LENGTH(WS-DATA)
           AND WS-POS + WS-LEN - 1
               <= FUNCTION LENGTH(WS-DATA)
               MOVE WS-DATA(WS-POS:WS-LEN) TO WS-RESULT
           ELSE
               MOVE SPACES TO WS-RESULT
               PERFORM 9300-LOG-BOUNDS-ERROR
           END-IF

17.13.4 Handle Empty Input

      *    Check for empty/blank input before parsing
           IF WS-INPUT = SPACES
               DISPLAY 'WARNING: Empty input record'
               ADD 1 TO WS-EMPTY-CNT
           ELSE
               PERFORM 2000-PARSE-RECORD
           END-IF

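⚠️ The Blank Record Trap: UNSTRING does not reject an entirely blank record. With DELIMITED BY ',' it finds no delimiter, moves the blank record into the first receiving field, and leaves the remaining fields untouched, so they can still hold the previous record's values unless you initialized them. With DELIMITED BY SPACE every position is a delimiter, so each receiving field is set to spaces and is still counted by TALLYING. Either way the record looks parsed. Always check for blank records before parsing.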

17.14 GnuCOBOL String Handling Notes

For students using GnuCOBOL:

All four facilities are supported: STRING, UNSTRING, INSPECT, and reference modification all work in GnuCOBOL.
EBCDIC vs. ASCII: On ASCII systems, character ordering differs from EBCDIC. This affects INSPECT CONVERTING if you rely on character-range assumptions. Explicit character lists (as shown in this chapter) are portable.
FUNCTION LENGTH: Fully supported and recommended for portable code.
Performance: GnuCOBOL's string operations are adequate for development and testing. Mainframe z/Architecture includes hardware string instructions (CLCL, MVCL, TRT) that make INSPECT and reference modification faster at production scale.

🧪 Try It Yourself: Write a GnuCOBOL program that:
1. Reads a pipe-delimited file (NAME|CITY|STATE|ZIP)
2. Uses UNSTRING to parse each record
3. Uses STRING to produce formatted output ("NAME, CITY, STATE ZIP")
4. Uses INSPECT CONVERTING to convert names to uppercase
5. Uses reference modification to extract the first 3 digits of the ZIP code

17.15 Putting It All Together: Data Formatting Program

Here is a complete program that demonstrates all four string handling facilities working together — parsing a delimited input file, transforming the data, and producing formatted output:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRFMT01.
      *================================================================
      * Program:  STRFMT01
      * Purpose:  Demonstrate STRING, UNSTRING, INSPECT, and
      *           reference modification working together
      * Input:    Pipe-delimited customer records
      * Output:   Formatted customer report lines
      *================================================================

       ENVIRONMENT DIVISION.
       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT INPUT-FILE  ASSIGN TO INFILE.
           SELECT OUTPUT-FILE ASSIGN TO OUTFILE.

       DATA DIVISION.
       FILE SECTION.

       FD  INPUT-FILE.
       01  INPUT-REC           PIC X(300).

       FD  OUTPUT-FILE.
       01  OUTPUT-REC          PIC X(132).

       WORKING-STORAGE SECTION.
       01  WS-EOF              PIC X VALUE 'N'.
           88  END-OF-FILE     VALUE 'Y'.

       01  WS-PARSED.
           05  WS-CUST-ID     PIC X(10).
           05  WS-FIRST-NAME  PIC X(20).
           05  WS-LAST-NAME   PIC X(25).
           05  WS-STREET      PIC X(40).
           05  WS-CITY        PIC X(25).
           05  WS-STATE       PIC X(02).
           05  WS-ZIP         PIC X(10).
           05  WS-PHONE       PIC X(15).

       01  WS-FIELD-COUNT     PIC 99 VALUE ZERO.
       01  WS-PIPE-COUNT      PIC 99 VALUE ZERO.

       01  WS-FORMATTED-NAME  PIC X(50).
       01  WS-FORMATTED-ADDR  PIC X(80).
       01  WS-FORMATTED-PHONE PIC X(20).
       01  WS-PTR             PIC 999.

       01  WS-RECORD-CNT      PIC 9(07) VALUE ZERO.
       01  WS-ERROR-CNT       PIC 9(07) VALUE ZERO.

       PROCEDURE DIVISION.
       0000-MAIN.
           OPEN INPUT  INPUT-FILE
           OPEN OUTPUT OUTPUT-FILE

           READ INPUT-FILE
               AT END SET END-OF-FILE TO TRUE
           END-READ

           PERFORM 1000-PROCESS-RECORDS
               UNTIL END-OF-FILE

           DISPLAY 'STRFMT01 COMPLETE'
           DISPLAY '  PROCESSED: ' WS-RECORD-CNT
           DISPLAY '  ERRORS:    ' WS-ERROR-CNT
           CLOSE INPUT-FILE OUTPUT-FILE
           STOP RUN.

       1000-PROCESS-RECORDS.
           ADD 1 TO WS-RECORD-CNT
           PERFORM 1100-PARSE-INPUT
           PERFORM 1200-VALIDATE-FIELDS
           PERFORM 1300-FORMAT-OUTPUT
           WRITE OUTPUT-REC

           READ INPUT-FILE
               AT END SET END-OF-FILE TO TRUE
           END-READ.

       1100-PARSE-INPUT.
      *    Verify field count: 8 fields means 7 pipes
           MOVE ZERO TO WS-PIPE-COUNT
           INSPECT INPUT-REC
               TALLYING WS-PIPE-COUNT FOR ALL '|'
           IF WS-PIPE-COUNT NOT = 7
               DISPLAY 'BAD FIELD COUNT REC ' WS-RECORD-CNT
               ADD 1 TO WS-ERROR-CNT
           END-IF

           INITIALIZE WS-PARSED
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING INPUT-REC
               DELIMITED BY '|'
               INTO WS-CUST-ID
                    WS-FIRST-NAME
                    WS-LAST-NAME
                    WS-STREET
                    WS-CITY
                    WS-STATE
                    WS-ZIP
                    WS-PHONE
               TALLYING IN WS-FIELD-COUNT
               ON OVERFLOW
                   DISPLAY 'TOO MANY FIELDS REC '
                           WS-RECORD-CNT
           END-UNSTRING.

       1200-VALIDATE-FIELDS.
      *    Convert names to uppercase
           INSPECT WS-FIRST-NAME
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
           INSPECT WS-LAST-NAME
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

      *    Replace hyphens in the phone number with spaces.
      *    INSPECT cannot shorten the field, so the digits are
      *    not closed up.
           INSPECT WS-PHONE
               REPLACING ALL '-' BY ' '.

       1300-FORMAT-OUTPUT.
           MOVE SPACES TO OUTPUT-REC

      *    Format name: "LAST, FIRST". Fields with embedded single
      *    spaces ('VAN DYKE', 'NEW YORK') are delimited by two
      *    spaces; DELIMITED BY SPACE would keep only one word.
           MOVE SPACES TO WS-FORMATTED-NAME
           STRING WS-LAST-NAME  DELIMITED BY '  '
                  ', '          DELIMITED BY SIZE
                  WS-FIRST-NAME DELIMITED BY '  '
                  INTO WS-FORMATTED-NAME
           END-STRING

      *    Format address: "CITY, STATE ZIP"
           MOVE SPACES TO WS-FORMATTED-ADDR
           STRING WS-CITY   DELIMITED BY '  '
                  ', '      DELIMITED BY SIZE
                  WS-STATE  DELIMITED BY SIZE
                  ' '       DELIMITED BY SIZE
                  WS-ZIP    DELIMITED BY SPACE
                  INTO WS-FORMATTED-ADDR
           END-STRING

      *    Build output line. The formatted name and address each
      *    contain ', ', so they also need the two-space delimiter.
           MOVE 1 TO WS-PTR
           STRING WS-CUST-ID        DELIMITED BY SPACE
                  '  '              DELIMITED BY SIZE
                  WS-FORMATTED-NAME DELIMITED BY '  '
                  '  '              DELIMITED BY SIZE
                  WS-FORMATTED-ADDR DELIMITED BY '  '
                  INTO OUTPUT-REC
                  WITH POINTER WS-PTR
                  ON OVERFLOW
                      DISPLAY 'OUTPUT OVERFLOW REC '
                              WS-RECORD-CNT
                      ADD 1 TO WS-ERROR-CNT
           END-STRING.

17.16 INSPECT CONVERTING: Deep Dive

INSPECT CONVERTING deserves a thorough treatment because it is one of COBOL's most versatile character-transformation tools, and its behavior in edge cases is not always intuitive.

17.16.1 EBCDIC-to-ASCII Translation

On mainframes that must exchange data with distributed systems, character set conversion is a frequent requirement. INSPECT CONVERTING handles this elegantly when the mapping is a known subset:

      *--- Convert EBCDIC special characters to ASCII-safe equivalents
      *--- This handles the most common problem characters:
           INSPECT WS-DATA
               CONVERTING
                   X'4A4B4C4D4E4F'      *> EBCDIC: ¢.<(+|
                   TO
                   X'202E3C282B7C'       *> ASCII: space .<(+|
      *>   ASCII has no cent sign, so X'4A' becomes a space.

For full character set translation, most shops use a 256-byte translation table and INSPECT CONVERTING with the full table. However, for targeted conversions of known problem characters, explicit short lists are clearer and more maintainable.

17.16.2 Masking Sensitive Data

MedClaim must mask Social Security numbers in audit reports. INSPECT CONVERTING provides a one-line solution:

       01  WS-SSN                PIC X(09) VALUE '123456789'.
       01  WS-MASKED-SSN         PIC X(09).

           MOVE WS-SSN TO WS-MASKED-SSN
           INSPECT WS-MASKED-SSN
               CONVERTING
                   '0123456789'
                   TO
                   '**********'
      *>   Result: WS-MASKED-SSN = '*********'

For partial masking (show last four digits), combine with reference modification:

           MOVE WS-SSN TO WS-MASKED-SSN
           INSPECT WS-MASKED-SSN(1:5)
               CONVERTING
                   '0123456789'
                   TO
                   '*****XXXXX'
      *>   This does not work as intended.
      *>   CONVERTING maps character-to-character:
      *>   '0'->'*', '1'->'*', '2'->'*', '3'->'*', '4'->'*'
      *>   '5'->'X', '6'->'X', '7'->'X', '8'->'X', '9'->'X'
      *>   Result: '****X6789' (the '5' became 'X', not '*')

      *--- Correct approach: mask first 5, leave last 4 ---
           MOVE WS-SSN TO WS-MASKED-SSN
           MOVE '*****' TO WS-MASKED-SSN(1:5)
      *>   Result: '*****6789'

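The lesson: INSPECT CONVERTING maps each character in the FROM string to the character at the same position in the TO string. It is a transliteration, not a substring replacement: what a character becomes depends on its value, not on where it sits in the field. For partial masking, a direct MOVE into a reference-modified range is simpler, and it masks every position regardless of content.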
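
INSPECT CONVERTING can still do partial masking if every digit maps to the same mask character and reference modification restricts which positions are inspected. The sketch below reuses the field names from the listing above; the program name is made up. Unlike the MOVE approach, it masks only digits, so a non-digit in the first five positions would pass through unchanged.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. SSNMASK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SSN             PIC X(09) VALUE '123456789'.
       01  WS-MASKED-SSN      PIC X(09).
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE WS-SSN TO WS-MASKED-SSN
      *    Every digit maps to '*'; reference modification limits
      *    the conversion to the first five positions.
           INSPECT WS-MASKED-SSN(1:5)
               CONVERTING '0123456789' TO '**********'
           DISPLAY WS-MASKED-SSN
      *    Displays *****6789
           STOP RUN.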

17.16.3 Normalizing Input Data

When GlobalBank receives customer data from external partners, names often contain mixed case, accidental digits, and stray punctuation. A normalization pipeline uses multiple INSPECT statements:

      *--- Step 1: Convert to uppercase ---
           INSPECT WS-CUSTOMER-NAME
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

      *--- Step 2: Remove digits from name fields ---
           INSPECT WS-CUSTOMER-NAME
               REPLACING ALL '0' BY SPACE
                         ALL '1' BY SPACE
                         ALL '2' BY SPACE
                         ALL '3' BY SPACE
                         ALL '4' BY SPACE
                         ALL '5' BY SPACE
                         ALL '6' BY SPACE
                         ALL '7' BY SPACE
                         ALL '8' BY SPACE
                         ALL '9' BY SPACE

      *--- Step 3: Remove punctuation except hyphen and apostrophe
      *---         (for names like O'BRIEN and SMITH-JONES).
      *---         Both literals must be exactly 28 characters.
           INSPECT WS-CUSTOMER-NAME
               CONVERTING
                   '!@#$%^&*()+=[]{}|;:",.<>?/~`'
                   TO
                   '                            '

Each INSPECT pass handles one aspect of normalization. While this could be combined into fewer statements, the three-pass approach is more readable and maintainable — each step's purpose is clear from its comment.

17.17 Reference Modification Patterns for Fixed-Width Fields

Mainframe data is overwhelmingly fixed-width. Reference modification is the primary tool for working with positional data.

17.17.1 Parsing Fixed-Width Records Without UNSTRING

When incoming records have fixed field positions (no delimiters), reference modification is more efficient and more appropriate than UNSTRING:

      *--- Input record layout (documented but not declared) ---
      *>   Positions 1-10:   Account number
      *>   Positions 11-18:  Transaction date (YYYYMMDD)
      *>   Positions 19-19:  Transaction type
      *>   Positions 20-30:  Amount (signed, implied V99)
      *>   Positions 31-60:  Description
      *>   Positions 61-65:  Branch code

       01  WS-RAW-RECORD          PIC X(100).
       01  WS-PARSED-FIELDS.
           05  WS-ACCT             PIC X(10).
           05  WS-DATE             PIC X(08).
           05  WS-TYPE             PIC X(01).
           05  WS-AMT-RAW          PIC X(11).
           05  WS-DESC             PIC X(30).
           05  WS-BRANCH           PIC X(05).

      *--- Parse by position ---
           MOVE WS-RAW-RECORD(1:10)  TO WS-ACCT
           MOVE WS-RAW-RECORD(11:8)  TO WS-DATE
           MOVE WS-RAW-RECORD(19:1)  TO WS-TYPE
           MOVE WS-RAW-RECORD(20:11) TO WS-AMT-RAW
           MOVE WS-RAW-RECORD(31:30) TO WS-DESC
           MOVE WS-RAW-RECORD(61:5)  TO WS-BRANCH

This is faster than UNSTRING because there is no delimiter scanning. Each field is extracted directly by its known position. This is the idiomatic COBOL approach for fixed-width data.
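
When the receiving group declares its fields contiguously, in record order, and all as alphanumeric, a single group MOVE does the same job as the six reference-modified moves. The sketch below uses the layout from the listing above; the program name and sample values are made up.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. GRPMOVE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RAW-RECORD          PIC X(100).
       01  WS-PARSED-FIELDS.
           05  WS-ACCT             PIC X(10).
           05  WS-DATE             PIC X(08).
           05  WS-TYPE             PIC X(01).
           05  WS-AMT-RAW          PIC X(11).
           05  WS-DESC             PIC X(30).
           05  WS-BRANCH           PIC X(05).
       PROCEDURE DIVISION.
       0000-MAIN.
      *    Build a sample record (values are made up).
           MOVE SPACES            TO WS-RAW-RECORD
           MOVE 'ACCT000001'      TO WS-RAW-RECORD(1:10)
           MOVE '20240115'        TO WS-RAW-RECORD(11:8)
           MOVE 'D'               TO WS-RAW-RECORD(19:1)
           MOVE '00000123456'     TO WS-RAW-RECORD(20:11)
           MOVE 'PAYROLL DEPOSIT' TO WS-RAW-RECORD(31:30)
           MOVE 'NYC01'           TO WS-RAW-RECORD(61:5)
      *    The group is treated as one 65-byte alphanumeric item,
      *    so the first 65 bytes of the record are copied as is.
           MOVE WS-RAW-RECORD TO WS-PARSED-FIELDS
           DISPLAY 'ACCT:   ' WS-ACCT
           DISPLAY 'DATE:   ' WS-DATE
           DISPLAY 'BRANCH: ' WS-BRANCH
           STOP RUN.

Reference modification remains the better choice when you need only a few fields, or when the target fields are not laid out in record order.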

17.17.2 Sliding Window Pattern

When searching for a pattern within a field, a sliding window using reference modification is effective:

      *--- Find the first occurrence of 'ERROR' in a 500-byte field ---
       01  WS-LOG-LINE            PIC X(500).
       01  WS-SEARCH-POS          PIC 999 VALUE ZERO.
       01  WS-MATCH-POS           PIC 999 VALUE ZERO.
       01  WS-FOUND               PIC X   VALUE 'N'.
           88  PATTERN-FOUND      VALUE 'Y'.

           MOVE 'N' TO WS-FOUND
           PERFORM VARYING WS-SEARCH-POS FROM 1 BY 1
               UNTIL WS-SEARCH-POS > 496
               OR PATTERN-FOUND
               IF WS-LOG-LINE(WS-SEARCH-POS:5) = 'ERROR'
                   SET PATTERN-FOUND TO TRUE
      *>           Save the position now: VARYING increments
      *>           WS-SEARCH-POS once more before the loop exits
                   MOVE WS-SEARCH-POS TO WS-MATCH-POS
               END-IF
           END-PERFORM

           IF PATTERN-FOUND
               DISPLAY 'ERROR found at position '
                       WS-MATCH-POS
           END-IF

The loop boundary is 496 (not 500) because we are comparing a 5-character window, and the last valid starting position is 500 - 5 + 1 = 496.
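
The same arithmetic can be computed instead of hard-coded, so the loop stays correct if the field or the pattern changes size. This is a sketch with hypothetical names (WS-PATTERN, WS-PAT-LEN, WS-LAST-START, WS-MATCH-POS) and a made-up log line.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. FINDPAT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LOG-LINE        PIC X(500).
       01  WS-PATTERN         PIC X(05) VALUE 'ERROR'.
       01  WS-PAT-LEN         PIC 999.
       01  WS-LAST-START      PIC 999.
       01  WS-POS             PIC 999.
       01  WS-MATCH-POS       PIC 999 VALUE ZERO.
       PROCEDURE DIVISION.
       0000-MAIN.
           MOVE 'JOB STEP 3 ERROR RC=12' TO WS-LOG-LINE
           COMPUTE WS-PAT-LEN = FUNCTION LENGTH(WS-PATTERN)
      *    Last valid start = field length - pattern length + 1
           COMPUTE WS-LAST-START =
               FUNCTION LENGTH(WS-LOG-LINE) - WS-PAT-LEN + 1
           PERFORM VARYING WS-POS FROM 1 BY 1
               UNTIL WS-POS > WS-LAST-START
                  OR WS-MATCH-POS > 0
               IF WS-LOG-LINE(WS-POS:WS-PAT-LEN) = WS-PATTERN
                   MOVE WS-POS TO WS-MATCH-POS
               END-IF
           END-PERFORM
           DISPLAY 'MATCH POSITION: ' WS-MATCH-POS
      *    Displays MATCH POSITION: 012 (000 means not found)
           STOP RUN.

Storing the match in its own field also avoids reading the loop counter after the loop ends, when PERFORM VARYING has already advanced it.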

17.17.3 Field-by-Field Record Construction

When building output records for external systems, reference modification gives you precise control over byte placement:

       01  WS-OUTPUT-REC          PIC X(200) VALUE SPACES.

      *--- Build a fixed-format output record ---
           MOVE WS-ACCT-NUM       TO WS-OUTPUT-REC(1:10)
           MOVE WS-CUST-NAME      TO WS-OUTPUT-REC(11:40)
           MOVE WS-FORMATTED-DATE TO WS-OUTPUT-REC(51:10)
           MOVE WS-AMT-EDITED     TO WS-OUTPUT-REC(61:15)
           MOVE WS-STATUS-CODE    TO WS-OUTPUT-REC(76:2)
           MOVE WS-REGION         TO WS-OUTPUT-REC(78:4)

This is the "template" approach mentioned in Section 17.12.3. It is significantly faster than STRING for high-volume batch output because it avoids delimiter scanning and pointer management.

17.18 Data Cleansing Patterns

Data cleansing — removing or correcting invalid characters, normalizing formats, and ensuring consistency — is one of the most common uses of string handling in production COBOL.

17.18.1 Phone Number Normalization

Phone numbers arrive in many formats: (555) 123-4567, 555-123-4567, 5551234567, 555.123.4567. Normalizing them to a standard 10-digit format:

       01  WS-PHONE-RAW          PIC X(20).
       01  WS-PHONE-CLEAN        PIC X(20).
       01  WS-PHONE-DIGITS       PIC X(10).
       01  WS-DIGIT-POS          PIC 99 VALUE ZERO.
       01  WS-SCAN-POS           PIC 99.

      *--- Step 1: Blank out common punctuation ---
           MOVE SPACES TO WS-PHONE-CLEAN
           MOVE WS-PHONE-RAW TO WS-PHONE-CLEAN

           INSPECT WS-PHONE-CLEAN
               CONVERTING
                   '()- .+/'
                   TO
                   '       '

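      *--- Step 2: Extract only digits ---
      *>   Scan the whole field and count every digit, but store
      *>   only the first 10. Stopping at the tenth digit would
      *>   let an 11-digit number pass the check in Step 3.
           MOVE SPACES TO WS-PHONE-DIGITS
           MOVE ZERO TO WS-DIGIT-POS
           PERFORM VARYING WS-SCAN-POS FROM 1 BY 1
               UNTIL WS-SCAN-POS > 20
               IF WS-PHONE-CLEAN(WS-SCAN-POS:1) NUMERIC
                   ADD 1 TO WS-DIGIT-POS
                   IF WS-DIGIT-POS <= 10
                       MOVE WS-PHONE-CLEAN(WS-SCAN-POS:1)
                           TO WS-PHONE-DIGITS(WS-DIGIT-POS:1)
                   END-IF
               END-IF
           END-PERFORM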

      *--- Step 3: Validate we got exactly 10 digits ---
           IF WS-DIGIT-POS NOT = 10
               DISPLAY 'INVALID PHONE: ' WS-PHONE-RAW
               MOVE SPACES TO WS-PHONE-DIGITS
           END-IF

This pattern handles any format by stripping non-digits first, then collecting digits into a clean 10-position field. The final validation ensures we have exactly 10 digits — no more (country codes), no less (incomplete numbers).
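
Once the digits are normalized, producing a display format is a matter of STRING with reference modification. The sketch below assumes a 14-byte output field named WS-PHONE-DISPLAY, a hypothetical name, and formats only when all ten positions are digits.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. PHONEFMT.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PHONE-DIGITS    PIC X(10) VALUE '5551234567'.
       01  WS-PHONE-DISPLAY   PIC X(14) VALUE SPACES.
       PROCEDURE DIVISION.
       0000-MAIN.
      *    A field blanked by the validation step is not NUMERIC,
      *    so invalid numbers fall through to the ELSE branch.
           IF WS-PHONE-DIGITS IS NUMERIC
               STRING '('                  DELIMITED BY SIZE
                      WS-PHONE-DIGITS(1:3) DELIMITED BY SIZE
                      ') '                 DELIMITED BY SIZE
                      WS-PHONE-DIGITS(4:3) DELIMITED BY SIZE
                      '-'                  DELIMITED BY SIZE
                      WS-PHONE-DIGITS(7:4) DELIMITED BY SIZE
                      INTO WS-PHONE-DISPLAY
               END-STRING
           ELSE
               MOVE SPACES TO WS-PHONE-DISPLAY
           END-IF
           DISPLAY WS-PHONE-DISPLAY
      *    Displays (555) 123-4567
           STOP RUN.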

17.18.2 Address Line Cleanup

Address data from external sources often contains double spaces, leading/trailing spaces, and inconsistent abbreviations:

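      *--- Collapse runs of spaces to a single space ---
      *>   WS-ADDR-WORK must be at least as long as WS-ADDRESS
       01  WS-ADDR-WORK          PIC X(80).
       01  WS-IN-POS             PIC 999.
       01  WS-OUT-POS            PIC 999.
       01  WS-PREV-CHAR          PIC X.

           MOVE SPACES TO WS-ADDR-WORK
           MOVE ZERO   TO WS-OUT-POS
      *>   Starting with SPACE also drops leading spaces
           MOVE SPACE  TO WS-PREV-CHAR
           PERFORM VARYING WS-IN-POS FROM 1 BY 1
               UNTIL WS-IN-POS > FUNCTION LENGTH(WS-ADDRESS)
               IF WS-ADDRESS(WS-IN-POS:1) NOT = SPACE
                  OR WS-PREV-CHAR NOT = SPACE
                   ADD 1 TO WS-OUT-POS
                   MOVE WS-ADDRESS(WS-IN-POS:1)
                       TO WS-ADDR-WORK(WS-OUT-POS:1)
               END-IF
               MOVE WS-ADDRESS(WS-IN-POS:1) TO WS-PREV-CHAR
           END-PERFORM
           MOVE WS-ADDR-WORK TO WS-ADDRESS

⚠️ Important: INSPECT REPLACING cannot do this job. The replacement operand must be the same length as the operand it replaces, so replacing two spaces with one is rejected by the compiler; INSPECT never shifts data or shortens a field. FUNCTION LENGTH does not help either, because for a fixed-length field it returns the declared size, not the length of the content, so it cannot signal when the cleanup is finished. Copying character by character into a work field, and skipping any space that follows another space, collapses runs of any length in a single pass.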

17.18.3 Date Format Conversion

Converting between date formats is a constant need when interfacing with external systems:

      *--- Convert MM/DD/YYYY to YYYYMMDD ---
       01  WS-DATE-MDY            PIC X(10).  *> '03/15/2026'
       01  WS-DATE-YMD            PIC X(08).

           STRING WS-DATE-MDY(7:4)   DELIMITED BY SIZE
                  WS-DATE-MDY(1:2)   DELIMITED BY SIZE
                  WS-DATE-MDY(4:2)   DELIMITED BY SIZE
                  INTO WS-DATE-YMD
           END-STRING
      *>   Result: '20260315'

      *--- Convert YYYYMMDD to YYYY-MM-DD (ISO 8601) ---
       01  WS-DATE-ISO            PIC X(10).

           STRING WS-DATE-YMD(1:4) DELIMITED BY SIZE
                  '-'              DELIMITED BY SIZE
                  WS-DATE-YMD(5:2) DELIMITED BY SIZE
                  '-'              DELIMITED BY SIZE
                  WS-DATE-YMD(7:2) DELIMITED BY SIZE
                  INTO WS-DATE-ISO
           END-STRING
      *>   Result: '2026-03-15'
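
The reference-modification version above depends on zero-padded input, with the month in positions 1-2 and the day in positions 4-5. As a hedged sketch, a feed that sends dates such as 3/5/2026 could be split on the slash with UNSTRING instead. The program name and the WS- fields below are illustrative, and the sketch does not check that the month and day are in range:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATE-IN             PIC X(10) VALUE '3/5/2026'.
       01  WS-MM                  PIC 99.
       01  WS-DD                  PIC 99.
       01  WS-YYYY                PIC 9(4).
       01  WS-DATE-OUT            PIC X(08).
       PROCEDURE DIVISION.
      *--- Split on '/'; the trailing pad space ends the year ---
           UNSTRING WS-DATE-IN DELIMITED BY '/' OR SPACE
               INTO WS-MM WS-DD WS-YYYY
           END-UNSTRING
      *--- Numeric receivers are right-justified, zero-filled ---
           STRING WS-YYYY DELIMITED BY SIZE
                  WS-MM   DELIMITED BY SIZE
                  WS-DD   DELIMITED BY SIZE
                  INTO WS-DATE-OUT
           END-STRING
           DISPLAY WS-DATE-OUT
      *>   Expected: 20260305
           STOP RUN.

Receiving into numeric fields lets the move rules supply the leading zeros, so '3' becomes 03 without extra code. Input that is not numeric between the slashes would need a validity check before the result is trusted.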

17.18.4 Complete Data Cleansing Pipeline

Tomás Rivera at MedClaim built a reusable data cleansing paragraph that standardizes incoming provider records:

       5000-CLEANSE-PROVIDER-DATA.
      *--- Uppercase the provider name ---
           INSPECT WS-PROV-NAME
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

      *--- Replace embedded LOW-VALUES with spaces ---
           INSPECT WS-PROV-NAME
               REPLACING ALL LOW-VALUE BY SPACE

      *--- Normalize phone ---
           PERFORM 5100-NORMALIZE-PHONE

      *--- Uppercase the state code ---
           INSPECT WS-PROV-STATE
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'

      *--- Validate ZIP is numeric ---
           IF WS-PROV-ZIP(1:5) NOT NUMERIC
               MOVE '00000' TO WS-PROV-ZIP(1:5)
               ADD 1 TO WS-CLEANSE-ERRORS
           END-IF

      *--- Find length of city without trailing spaces ---
           MOVE FUNCTION LENGTH(WS-PROV-CITY) TO WS-CITY-LEN
           PERFORM UNTIL WS-CITY-LEN = 0
               OR WS-PROV-CITY(WS-CITY-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-CITY-LEN
           END-PERFORM.

This pipeline demonstrates the production pattern: uppercase, remove garbage characters, normalize formats, validate content, and track cleansing statistics. Each step is simple and testable in isolation.
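
The last step of the paragraph leaves the trimmed length in WS-CITY-LEN but does not use it. A minimal sketch of one way it might be consumed is shown here, building a "CITY, ST" string by reference modification. The program name, sample values, and WS-CITY-STATE are assumptions for illustration:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. CITYDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PROV-CITY          PIC X(20) VALUE 'SAN DIEGO'.
       01  WS-PROV-STATE         PIC X(02) VALUE 'CA'.
       01  WS-CITY-LEN           PIC 99.
       01  WS-CITY-STATE         PIC X(30) VALUE SPACES.
       PROCEDURE DIVISION.
           MOVE FUNCTION LENGTH(WS-PROV-CITY) TO WS-CITY-LEN
           PERFORM UNTIL WS-CITY-LEN = 0
               OR WS-PROV-CITY(WS-CITY-LEN:1) NOT = SPACE
               SUBTRACT 1 FROM WS-CITY-LEN
           END-PERFORM
      *--- A zero length is not valid in reference modification ---
           IF WS-CITY-LEN > 0
               STRING WS-PROV-CITY(1:WS-CITY-LEN) DELIMITED BY SIZE
                      ', '                        DELIMITED BY SIZE
                      WS-PROV-STATE               DELIMITED BY SIZE
                      INTO WS-CITY-STATE
               END-STRING
           END-IF
           DISPLAY WS-CITY-STATE
      *>   Expected: SAN DIEGO, CA
           STOP RUN.

Using the computed length rather than DELIMITED BY SPACE keeps the embedded space in "SAN DIEGO" intact.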

🧪 Try It Yourself: Data Cleansing Challenge

Write a COBOL program that reads a file of customer records with the following problems and produces a clean output file:

Names in mixed case (normalize to uppercase)
Phone numbers in various formats (normalize to 10 digits)
ZIP codes with trailing spaces or embedded hyphens (normalize to 5-digit or 9-digit)
Embedded low-value characters (replace with spaces)
Double spaces in address fields (reduce to single spaces)

Track statistics: records read, records with at least one cleansing correction, total corrections applied.

17.19 Advanced INSPECT Patterns

INSPECT is often underutilized. Beyond simple counting and replacing, it supports several advanced patterns that are valuable in production programs.

17.19.1 INSPECT with BEFORE and AFTER Boundaries

The BEFORE INITIAL and AFTER INITIAL clauses restrict INSPECT's scope to a portion of the field:

      *--- Count digits before the decimal point ---
       01  WS-AMOUNT-STR          PIC X(15) VALUE '   12345.67   '.
       01  WS-INT-DIGITS          PIC 99 VALUE ZERO.

           MOVE ZERO TO WS-INT-DIGITS
           INSPECT WS-AMOUNT-STR
               TALLYING WS-INT-DIGITS
                   FOR ALL '0' BEFORE INITIAL '.'
                   FOR ALL '1' BEFORE INITIAL '.'
                   FOR ALL '2' BEFORE INITIAL '.'
                   FOR ALL '3' BEFORE INITIAL '.'
                   FOR ALL '4' BEFORE INITIAL '.'
                   FOR ALL '5' BEFORE INITIAL '.'
                   FOR ALL '6' BEFORE INITIAL '.'
                   FOR ALL '7' BEFORE INITIAL '.'
                   FOR ALL '8' BEFORE INITIAL '.'
                   FOR ALL '9' BEFORE INITIAL '.'
      *>   Result: WS-INT-DIGITS = 5 (digits 1,2,3,4,5)

Count characters after a delimiter:

      *--- Count characters after the '@' in an email ---
       01  WS-EMAIL              PIC X(50) VALUE 'user@domain.com'.
       01  WS-DOMAIN-LEN         PIC 99 VALUE ZERO.

           MOVE ZERO TO WS-DOMAIN-LEN
           INSPECT WS-EMAIL
               TALLYING WS-DOMAIN-LEN
                   FOR CHARACTERS AFTER INITIAL '@'
                        BEFORE INITIAL SPACE
      *>   Result: WS-DOMAIN-LEN = 10 (domain.com)

17.19.2 INSPECT for Data Validation

INSPECT TALLYING is a powerful validation tool. You can verify that a field contains only expected characters:

      *--- Validate that account number contains only digits ---
       01  WS-ACCT-NUM           PIC X(10).
       01  WS-TOTAL-CHARS        PIC 99 VALUE ZERO.
       01  WS-DIGIT-CNT          PIC 99 VALUE ZERO.

           MOVE FUNCTION LENGTH(WS-ACCT-NUM) TO WS-TOTAL-CHARS
      *>   TALLYING adds to the counter, so reset it first
           MOVE ZERO TO WS-DIGIT-CNT
           INSPECT WS-ACCT-NUM
               TALLYING WS-DIGIT-CNT
                   FOR ALL '0' '1' '2' '3' '4'
                           '5' '6' '7' '8' '9'

           IF WS-DIGIT-CNT NOT = WS-TOTAL-CHARS
               DISPLAY 'INVALID: Non-digit characters in account'
           END-IF
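
When the only rule is "digits only", the NUMERIC class condition is a shorter alternative to counting. The sketch below is a minimal illustration; the program name and the sample value are made up for the example:

       IDENTIFICATION DIVISION.
       PROGRAM-ID. ACCTCHK.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-ACCT-NUM           PIC X(10) VALUE '12345A7890'.
       PROCEDURE DIVISION.
      *--- Class test: true only if every byte is 0-9 ---
           IF WS-ACCT-NUM IS NOT NUMERIC
               DISPLAY 'INVALID: Non-digit characters in account'
           ELSE
               DISPLAY 'ACCOUNT NUMBER OK'
           END-IF
           STOP RUN.

The INSPECT TALLYING form remains the better fit when the allowed set is something other than the ten digits, because the character list can be changed freely.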

17.19.3 INSPECT REPLACING for Record Sanitization

When receiving data from external systems, records may contain control characters or other non-printable bytes that could disrupt downstream processing:

      *--- Replace selected control characters with spaces ---
      *--- (Assumes EBCDIC; characters below X'40' are control) ---
           INSPECT WS-INCOMING-DATA
               REPLACING ALL X'00' BY SPACE
                         ALL X'01' BY SPACE
                         ALL X'02' BY SPACE
                         ALL X'03' BY SPACE
                         ALL X'0D' BY SPACE
                         ALL X'0A' BY SPACE
                         ALL X'15' BY SPACE
                         ALL X'25' BY SPACE

In practice, Tomás Rivera at MedClaim maintains a "sanitizer copybook" — a COPY member containing INSPECT REPLACING statements for every known problem character. Any program receiving external data includes this copybook:

           COPY SANITIZE REPLACING ==:DATA:== BY ==WS-INCOMING-REC==.
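
The chapter does not show the copybook itself. As a purely hypothetical sketch, a SANITIZE member compatible with the COPY statement above might hold a single INSPECT written against the :DATA: placeholder, which COPY ... REPLACING swaps for the caller's field name. The byte list is illustrative (EBCDIC values) and is not MedClaim's actual list:

      *    SANITIZE copybook (hypothetical contents)
           INSPECT :DATA:
               REPLACING ALL X'00' BY SPACE
                         ALL X'0D' BY SPACE
                         ALL X'15' BY SPACE
                         ALL X'25' BY SPACE

Leaving the terminating period out of the member means the expanded statement can sit inside an IF or PERFORM block in the including program.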

17.19.4 Multi-Stage INSPECT Pipeline

Complex transformations can be built by chaining INSPECT statements. Each stage handles one aspect of the transformation:

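      *--- Stage 1: Normalize whitespace variations ---
      *>   Hex values below are ASCII; on EBCDIC the equivalents
      *>   are X'05' (tab), X'0D' (CR), and X'25' (LF)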
           INSPECT WS-DATA
               REPLACING ALL X'09' BY SPACE    *> Tab to space
                         ALL X'0D' BY SPACE    *> CR to space
                         ALL X'0A' BY SPACE    *> LF to space

      *--- Stage 2: Remove leading zeros from numeric portion ---
           INSPECT WS-DATA
               REPLACING LEADING '0' BY SPACE
                   AFTER INITIAL '='
                   BEFORE INITIAL '|'

      *--- Stage 3: Convert delimiters ---
           INSPECT WS-DATA
               REPLACING ALL '|' BY ','

Each stage is independently testable and has a clear purpose. This pipeline approach is far more maintainable than trying to accomplish everything in a single complex transformation.
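
To make the stages concrete, here is a small trace of Stages 2 and 3 on one sample value. It is a sketch under assumptions: the program name and the sample data are invented, and Stage 1 is left out because its hex literals depend on the platform's encoding.

       IDENTIFICATION DIVISION.
       PROGRAM-ID. STAGEDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DATA               PIC X(20) VALUE 'AMT=000123|USD'.
       PROCEDURE DIVISION.
      *--- Stage 2: blank leading zeros between '=' and '|' ---
           INSPECT WS-DATA
               REPLACING LEADING '0' BY SPACE
                   AFTER INITIAL '='
                   BEFORE INITIAL '|'
           DISPLAY WS-DATA
      *>   Expected: AMT=   123|USD
      *--- Stage 3: convert the delimiter ---
           INSPECT WS-DATA
               REPLACING ALL '|' BY ','
           DISPLAY WS-DATA
      *>   Expected: AMT=   123,USD
           STOP RUN.

Note that the zeros become spaces rather than disappearing: INSPECT changes characters in place and never shifts the rest of the field.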

17.20 Combining String Facilities: A Production Pattern Library

The most effective string handling in COBOL comes from combining the four facilities — STRING, UNSTRING, INSPECT, and reference modification — in well-understood patterns.

17.20.1 Pattern: Parse, Validate, Transform, Assemble

This four-step pattern is the standard approach for data transformation programs:

       2000-TRANSFORM-RECORD.
      *--- PARSE: Break input into fields ---
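           INITIALIZE WS-PARSED-FIELDS
      *>   TALLYING adds to the counter, so reset it first
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING WS-INPUT-LINE
               DELIMITED BY '|'
               INTO WS-FLD-1 WS-FLD-2 WS-FLD-3 WS-FLD-4
               TALLYING IN WS-FIELD-COUNT
      *>       Overflow here means more than four fields arrived
               ON OVERFLOW
                   ADD 1 TO WS-ERROR-CNT
                   PERFORM 9000-LOG-ERROR
                   EXIT PARAGRAPH
           END-UNSTRING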

      *--- VALIDATE: Check each field ---
           IF WS-FIELD-COUNT < 4
               ADD 1 TO WS-ERROR-CNT
               PERFORM 9000-LOG-ERROR
               EXIT PARAGRAPH
           END-IF
           IF WS-FLD-1 = SPACES
               ADD 1 TO WS-ERROR-CNT
               PERFORM 9000-LOG-ERROR
               EXIT PARAGRAPH
           END-IF

      *--- TRANSFORM: Normalize and clean ---
           INSPECT WS-FLD-2
               CONVERTING
                   'abcdefghijklmnopqrstuvwxyz'
                   TO
                   'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
           INSPECT WS-FLD-3
               REPLACING ALL '-' BY SPACE

      *--- ASSEMBLE: Build output ---
           MOVE SPACES TO WS-OUTPUT-LINE
           MOVE 1 TO WS-OUT-PTR
           STRING WS-FLD-1  DELIMITED BY SPACE
                  ','       DELIMITED BY SIZE
                  WS-FLD-2  DELIMITED BY SPACE
                  ','       DELIMITED BY SIZE
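      *>   WS-FLD-3 may now hold single spaces (former hyphens);
      *>   '  ' is two spaces, so copying stops at the padding
                  WS-FLD-3  DELIMITED BY '  '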
                  ','       DELIMITED BY SIZE
                  WS-FLD-4  DELIMITED BY SPACE
                  INTO WS-OUTPUT-LINE
                  WITH POINTER WS-OUT-PTR
                  ON OVERFLOW
                      PERFORM 9100-LOG-OVERFLOW
           END-STRING.

This is the canonical form of a data transformation paragraph. Every field passes through validation before it reaches the output. Every string operation has error handling. The flow is linear and easy to follow.

17.20.2 Pattern: Multi-Line Record Assembly

Some output formats require assembling a single logical record from multiple physical lines. For example, building a fixed-width output record that spans 400 bytes from data distributed across several working-storage groups:

       01  WS-OUTPUT-RECORD      PIC X(400) VALUE SPACES.

       3000-ASSEMBLE-OUTPUT.
      *--- Header portion: bytes 1-50 ---
           MOVE WS-RECORD-TYPE    TO WS-OUTPUT-RECORD(1:2)
           MOVE WS-SEQUENCE-NUM   TO WS-OUTPUT-RECORD(3:8)
           MOVE WS-PROCESS-DATE   TO WS-OUTPUT-RECORD(11:8)
           MOVE WS-BATCH-ID       TO WS-OUTPUT-RECORD(19:12)
           MOVE SPACES             TO WS-OUTPUT-RECORD(31:20)

      *--- Customer portion: bytes 51-200 ---
           MOVE WS-CUST-ID        TO WS-OUTPUT-RECORD(51:10)
           MOVE WS-CUST-NAME      TO WS-OUTPUT-RECORD(61:50)
           MOVE WS-CUST-ADDR      TO WS-OUTPUT-RECORD(111:80)
           MOVE WS-CUST-PHONE     TO WS-OUTPUT-RECORD(191:10)

      *--- Financial portion: bytes 201-300 ---
           MOVE WS-AMOUNT-EDITED  TO WS-OUTPUT-RECORD(201:15)
           MOVE WS-TAX-EDITED     TO WS-OUTPUT-RECORD(216:12)
           MOVE WS-TOTAL-EDITED   TO WS-OUTPUT-RECORD(228:15)
           MOVE WS-CURRENCY       TO WS-OUTPUT-RECORD(243:3)
           MOVE SPACES             TO WS-OUTPUT-RECORD(246:55)

      *--- Trailer portion: bytes 301-400 ---
           MOVE WS-CONTROL-HASH   TO WS-OUTPUT-RECORD(301:20)
           MOVE WS-RECORD-COUNT   TO WS-OUTPUT-RECORD(321:8)
           MOVE SPACES             TO WS-OUTPUT-RECORD(329:72)

This template-based assembly using reference modification is the fastest approach for high-volume batch output. Each field is placed at its exact byte position with no parsing overhead.

💡 Documentation Practice: When using reference modification for record assembly, always include a record layout document (or copybook) that maps field names to byte positions. Without this documentation, maintenance becomes extremely difficult. Derek Washington learned this the hard way: "I once inherited a program with 50 reference-modification MOVEs and no layout documentation. It took me two days to reverse-engineer the record format."
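
One way to provide that documentation is a layout copybook that mirrors the byte positions used in 3000-ASSEMBLE-OUTPUT. The sketch below is an assumption, not part of the original program: the OUT- names are hypothetical, and the sizes are taken from the offsets and lengths in the MOVE statements above. A redefining entry must directly follow the item it redefines.

       01  WS-OUTPUT-LAYOUT REDEFINES WS-OUTPUT-RECORD.
      *--- Header portion: bytes 1-50 ---
           05  OUT-RECORD-TYPE   PIC X(02).
           05  OUT-SEQUENCE-NUM  PIC X(08).
           05  OUT-PROCESS-DATE  PIC X(08).
           05  OUT-BATCH-ID      PIC X(12).
           05  FILLER            PIC X(20).
      *--- Customer portion: bytes 51-200 ---
           05  OUT-CUST-ID       PIC X(10).
           05  OUT-CUST-NAME     PIC X(50).
           05  OUT-CUST-ADDR     PIC X(80).
           05  OUT-CUST-PHONE    PIC X(10).
      *--- Financial portion: bytes 201-300 ---
           05  OUT-AMOUNT        PIC X(15).
           05  OUT-TAX           PIC X(12).
           05  OUT-TOTAL         PIC X(15).
           05  OUT-CURRENCY      PIC X(03).
           05  FILLER            PIC X(55).
      *--- Trailer portion: bytes 301-400 ---
           05  OUT-CONTROL-HASH  PIC X(20).
           05  OUT-RECORD-COUNT  PIC X(08).
           05  FILLER            PIC X(72).

The four portions add up to 50 + 150 + 100 + 100 = 400 bytes, which matches WS-OUTPUT-RECORD. With a layout like this in place, a maintainer can check any offset in the MOVE statements against a named field instead of reverse-engineering it.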

17.21 EBCDIC vs. ASCII: String Handling Portability Considerations

When writing COBOL string handling code that must work on both mainframe (EBCDIC) and distributed (ASCII) platforms, several character set differences affect program behavior.

17.21.1 Collating Sequence Differences

The most significant difference is the collating (sort) order of characters:

EBCDIC order: space < lowercase < uppercase < digits
ASCII order:  space < digits < uppercase < lowercase

This affects anything that compares characters by rank: relational conditions such as IF WS-CHAR < 'A', range tests such as 'A' THRU 'Z', SORT keys, and SEARCH ALL. INSPECT itself is not affected, because it matches characters by equality, and its BEFORE INITIAL and AFTER INITIAL phrases delimit by position in the field rather than by collating rank. When portability matters, test against explicit character lists rather than relying on collating sequence:

      *--- Portable: explicit character list ---
           INSPECT WS-DATA
               TALLYING WS-DIGIT-CNT
                   FOR ALL '0' '1' '2' '3' '4'
                           '5' '6' '7' '8' '9'

      *--- NOT portable: relies on collating sequence ---
           IF WS-CHAR < 'A'
               ADD 1 TO WS-COUNT
           END-IF
      *>   Digits satisfy this test on ASCII but not on EBCDIC

17.21.2 Hex Literal Portability

Hex literals (X'nn') represent different characters in EBCDIC and ASCII:

X'C1' = 'A' in EBCDIC, but outside 7-bit ASCII ('Á' in ISO 8859-1)
X'41' = no-break space in EBCDIC code page 037, but 'A' in ASCII
X'40' = space in EBCDIC, but '@' in ASCII
X'20' = a control code in EBCDIC, but space in ASCII

If your INSPECT CONVERTING or REPLACING uses hex literals, it is inherently non-portable. Use character literals instead when possible:

      *--- Portable ---
           INSPECT WS-DATA REPLACING ALL SPACE BY '-'

      *--- NOT portable ---
           INSPECT WS-DATA REPLACING ALL X'40' BY X'60'

17.21.3 The FUNCTION ORD and FUNCTION CHAR Alternative

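For portable character manipulation, use the intrinsic functions ORD (character to ordinal position) and CHAR (ordinal position to character) rather than hex literals. Both work in ordinal positions within the platform's native collating sequence, and ordinals are 1-based, so an ordinal is always one greater than the character's code point. Because the results differ by platform, a hard-coded ordinal is no more portable than a hex literal; the portable form is to derive ordinals from named characters with ORD:

      *--- Get the ordinal position of 'A' on any platform ---
           COMPUTE WS-ORD-A = FUNCTION ORD('A')
      *>   On ASCII: 66 (code point 65 + 1); on EBCDIC: 194

      *--- Get the character at ordinal position 66 ---
           MOVE FUNCTION CHAR(66) TO WS-CHAR
      *>   On ASCII: WS-CHAR = 'A'
      *>   On EBCDIC: WS-CHAR = X'41', which is not a letter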

For string handling code that must be truly portable, avoid character arithmetic and hex literals entirely. Stick to named characters ('A', 'Z', '0', '9') and the INSPECT CONVERTING technique with explicit character strings.

17.22 Chapter Summary

In this chapter, we explored COBOL's four string handling facilities — each serving a distinct purpose in the manipulation of textual data:

STRING concatenates fields and literals with delimiter control, pointer tracking, and overflow detection. It is the primary tool for building formatted output, log messages, and data interchange records.
UNSTRING parses delimited strings into multiple fields, with support for multiple delimiter types, delimiter tracking, field counting, and multi-pass parsing. It is essential for CSV, pipe-delimited, and other variable-format data.
INSPECT examines fields character by character for counting (TALLYING), substituting (REPLACING), and translating (CONVERTING). It handles tasks from counting commas in a CSV line to converting lowercase to uppercase.
Reference modification provides direct substring access through offset:length notation. It enables character-by-character scanning, substring extraction and insertion, and high-performance string operations in inner loops.

Together, these facilities make COBOL a capable text processing language — not as concise as regex-based languages, perhaps, but precise, predictable, and well-suited to the kind of structured data transformation that enterprise systems demand.

The theme of The Modernization Spectrum runs through this chapter: string handling enables COBOL programs to participate in modern data interchange. A COBOL program that can parse CSV input from a web application and produce pipe-delimited output for an analytics system is a modernized program — even if its core business logic has not changed.

In the next chapter, we turn to table handling — COBOL's array processing facilities, including OCCURS, SEARCH, SEARCH ALL, and the powerful patterns that enable in-memory data manipulation.

Key Terms Introduced in This Chapter

Term	Definition
STRING	COBOL verb that concatenates multiple sending fields into a receiving field
UNSTRING	COBOL verb that parses a delimited string into multiple receiving fields
INSPECT	COBOL verb that examines a field character by character for counting, replacing, or converting
Reference modification	Substring access using identifier(offset:length) notation
DELIMITED BY	Clause controlling how much of a sending field is used in STRING, or what separates fields in UNSTRING
POINTER	Numeric field tracking position within the receiving/source field during STRING/UNSTRING
TALLYING	Counting operation in INSPECT (counts characters) or UNSTRING (counts fields populated)
ON OVERFLOW	Imperative phrase executed when STRING's receiving field is too small or UNSTRING has more fields than receivers
CONVERTING	INSPECT operation that translates characters according to a one-to-one mapping
GROUP INDICATE	(Cross-ref from Ch 16) Report Writer clause; not a string operation but often confused with string formatting