Chapter 9: String Handling and Character Manipulation -- Key Takeaways

Chapter Summary

COBOL was originally designed for processing fixed-length records and numeric data, not for the kind of flexible string manipulation common in modern programming languages. Nevertheless, over successive standard revisions, COBOL has acquired a robust set of string-handling capabilities. This chapter covered the core statements and intrinsic functions that allow COBOL programs to concatenate, parse, inspect, transform, and extract portions of alphanumeric data. Mastering these features is essential for tasks such as formatting report lines, parsing comma-delimited input, cleaning data fields, and converting between upper and lower case.

The STRING statement concatenates two or more source fields into a single destination field, with precise control over delimiters and positioning through the POINTER phrase. The UNSTRING statement performs the inverse operation, splitting a single source field into multiple destination fields based on specified delimiters. These two statements together provide the foundation for assembling and parsing structured text data. We examined practical scenarios including building full names from first and last name fields, parsing CSV records, and constructing formatted output lines.
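The program below is a minimal sketch of both statements working together. It builds a full name with STRING, then splits a CSV record with UNSTRING using DELIMITER IN, COUNT IN, and TALLYING IN. It assumes a free compiler such as GnuCOBOL reading fixed-format source, and every WS- name and value in it is hypothetical.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO.
      * Sketch only: every WS- name and value is hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-FIRST-NAME   PIC X(15) VALUE "Grace".
       01  WS-LAST-NAME    PIC X(15) VALUE "Hopper".
       01  WS-FULL-NAME    PIC X(31).
       01  WS-CSV-RECORD   PIC X(40) VALUE "1001,Widget,19.95".
       01  WS-ID           PIC X(10).
       01  WS-DESC         PIC X(20).
       01  WS-PRICE        PIC X(10).
       01  WS-DELIM        PIC X.
       01  WS-DESC-LEN     PIC 99.
       01  WS-FIELD-COUNT  PIC 99.
       PROCEDURE DIVISION.
      * STRING: take each name up to its first double space.
           MOVE SPACES TO WS-FULL-NAME
           STRING WS-FIRST-NAME DELIMITED BY "  "
                  " "           DELIMITED BY SIZE
                  WS-LAST-NAME  DELIMITED BY "  "
               INTO WS-FULL-NAME
               ON OVERFLOW
                  DISPLAY "Name truncated"
           END-STRING
           DISPLAY "[" FUNCTION TRIM(WS-FULL-NAME) "]"
      * UNSTRING: split on commas. TALLYING IN adds to its
      * counter, so the counter is zeroed first.
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING WS-CSV-RECORD DELIMITED BY ","
               INTO WS-ID
                    WS-DESC  DELIMITER IN WS-DELIM
                             COUNT IN WS-DESC-LEN
                    WS-PRICE
               TALLYING IN WS-FIELD-COUNT
               ON OVERFLOW
                  DISPLAY "Too many fields"
           END-UNSTRING
           DISPLAY "ID=" FUNCTION TRIM(WS-ID)
                   " DESC=" FUNCTION TRIM(WS-DESC)
                   " PRICE=" FUNCTION TRIM(WS-PRICE)
           DISPLAY "DESC length=" WS-DESC-LEN
                   " delimiter=" WS-DELIM
                   " fields=" WS-FIELD-COUNT
           STOP RUN.
```

The last receiving field gets everything after the final comma, including the record's trailing spaces. Because that exhausts the source, the ON OVERFLOW branch does not run.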

The INSPECT statement provides powerful character-level analysis and transformation. Its TALLYING option counts occurrences of characters or strings, REPLACING substitutes one character or string for another, and CONVERTING performs character-by-character translation similar to the Unix tr command. Reference modification, COBOL's substring notation, allows direct access to any contiguous portion of a data item using a starting position and optional length. Finally, intrinsic functions such as FUNCTION TRIM, FUNCTION UPPER-CASE, FUNCTION LOWER-CASE, and FUNCTION LENGTH round out the string-handling toolkit, providing capabilities that were missing from earlier COBOL standards.
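Here is a similar sketch for the character-level tools, under the same assumptions (GnuCOBOL, fixed-format source, hypothetical WS- fields and values). It does the following:

- counts and replaces dashes with INSPECT
- uppercases with CONVERTING
- shows that LOWER-CASE returns a new value rather than changing its argument
- compares a field's declared length with its trimmed length
- slices a YYYYMMDD date with reference modification

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSPDEMO.
      * Sketch only: every WS- name and value is hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-PHONE        PIC X(12) VALUE "555-867-5309".
       01  WS-DASH-COUNT   PIC 99.
       01  WS-TEXT         PIC X(20) VALUE "  hello, world".
       01  WS-LEN          PIC 99.
       01  WS-TRIM-LEN     PIC 99.
       01  WS-FULL-DATE    PIC X(8)  VALUE "20240315".
       01  WS-YEAR         PIC X(4).
       01  WS-MONTH        PIC X(2).
       01  WS-DAY          PIC X(2).
       PROCEDURE DIVISION.
      * TALLYING adds to its counter, so clear the counter first.
           MOVE ZERO TO WS-DASH-COUNT
           INSPECT WS-PHONE TALLYING WS-DASH-COUNT FOR ALL "-"
      * REPLACING: the replacement is the same length (one char).
           INSPECT WS-PHONE REPLACING ALL "-" BY SPACE
           DISPLAY "dashes=" WS-DASH-COUNT " phone=" WS-PHONE
      * CONVERTING: each character maps to the one at the same
      * position in the TO string.
           INSPECT WS-TEXT
               CONVERTING "abcdefghijklmnopqrstuvwxyz"
                       TO "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           DISPLAY "[" FUNCTION TRIM(WS-TEXT) "]"
      * LOWER-CASE returns a new value; WS-TEXT is not modified.
           DISPLAY "["
               FUNCTION TRIM(FUNCTION LOWER-CASE(WS-TEXT)) "]"
      * LENGTH reports the declared size, not the content size.
           MOVE FUNCTION LENGTH(WS-TEXT) TO WS-LEN
           MOVE FUNCTION LENGTH(FUNCTION TRIM(WS-TEXT))
               TO WS-TRIM-LEN
           DISPLAY "length=" WS-LEN " trimmed=" WS-TRIM-LEN
      * Reference modification is (start:length), 1-based; an
      * omitted length runs to the end of the item.
           MOVE WS-FULL-DATE(1:4) TO WS-YEAR
           MOVE WS-FULL-DATE(5:2) TO WS-MONTH
           MOVE WS-FULL-DATE(7:)  TO WS-DAY
           DISPLAY WS-YEAR "-" WS-MONTH "-" WS-DAY
           STOP RUN.
```

The declared length stays 20 even though only 12 characters carry content. That difference is why TRIM and LENGTH are often combined.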

Key Concepts

STRING concatenates multiple sending fields into a single receiving field, stopping when the receiving field is full or all sending fields are exhausted.
The DELIMITED BY clause in STRING specifies what portion of each sending field to include. DELIMITED BY SIZE uses the entire field. DELIMITED BY a literal or identifier transfers characters up to, but not including, the first occurrence of that delimiter, or the whole field if the delimiter never occurs.
The POINTER phrase in STRING tracks the current position in the receiving field, enabling multiple STRING statements to build output incrementally.
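The OVERFLOW condition in STRING is raised when the receiving field is too small to hold all the concatenated data. It is also raised when the POINTER value is less than 1 or greater than the length of the receiving field.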
UNSTRING splits a single source field into multiple destination fields based on one or more delimiters specified with DELIMITED BY.
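For each receiving field, UNSTRING supports two optional phrases. DELIMITER IN captures the actual delimiter found. COUNT IN captures the number of source characters examined for that field, which is the token length excluding the delimiter, even if the receiving field truncated the token.
The TALLYING IN phrase of UNSTRING adds the number of receiving fields acted upon to its counter, which is useful for determining how many tokens were in the source. Because it adds rather than stores, set the counter to zero before the UNSTRING.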
INSPECT TALLYING counts occurrences of specified characters or strings within a data item and adds the count to a numeric identifier. The identifier is not reset, so set it to zero before the INSPECT.
INSPECT REPLACING substitutes characters or strings within a data item, with options for CHARACTERS, ALL, LEADING, and FIRST to control which occurrences are replaced. The replacement must be the same length as the text it replaces.
INSPECT CONVERTING performs character-by-character translation. Each character found in the first string is replaced by the character at the same position in the second string, so both strings must be the same length.
BEFORE and AFTER phrases in INSPECT limit the scanning to the portion of the field before or after the first occurrence of a specified character or string.
Reference modification accesses a substring using the syntax identifier(start:length), where start is a 1-based position and length is optional (defaulting to the remainder of the field).
FUNCTION UPPER-CASE and FUNCTION LOWER-CASE return the uppercase or lowercase equivalent of an alphanumeric argument.
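FUNCTION TRIM removes both leading and trailing spaces from an alphanumeric argument. Adding the LEADING or TRAILING keyword limits it to one end.
FUNCTION LENGTH returns the length of a data item in character positions. The length is determined at compile time for fixed-length items and at runtime for variable-length items. For a fixed-length field this is the declared size, including trailing spaces; FUNCTION LENGTH(FUNCTION TRIM(x)) gives the length of the content without surrounding spaces.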
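A small run makes several of these concepts concrete:

- the POINTER phrase
- the OVERFLOW condition
- INSPECT's BEFORE and AFTER phrases
- the fact that INSPECT TALLYING accumulates

The sketch below makes the same assumptions as before (GnuCOBOL, fixed-format source, hypothetical WS- fields and values).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PTRDEMO.
      * Sketch only: every WS- name and value is hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LINE         PIC X(12).
       01  WS-PTR          PIC 99.
       01  WS-EMAIL        PIC X(20) VALUE "j.doe@mail.example".
       01  WS-DOTS         PIC 99.
       PROCEDURE DIVISION.
      * Build WS-LINE in two STRING statements; WS-PTR carries the
      * next free position from the first to the second.
           MOVE SPACES TO WS-LINE
           MOVE 1 TO WS-PTR
           STRING "ID:" DELIMITED BY SIZE
               INTO WS-LINE WITH POINTER WS-PTR
           END-STRING
           DISPLAY "pointer after step 1: " WS-PTR
      * Only 9 of these 11 characters fit, so OVERFLOW fires and
      * the last two are dropped.
           STRING "12345678901" DELIMITED BY SIZE
               INTO WS-LINE WITH POINTER WS-PTR
               ON OVERFLOW
                  DISPLAY "overflow: line full"
           END-STRING
           DISPLAY "[" WS-LINE "]"
      * BEFORE INITIAL limits the scan to the text ahead of "@".
           MOVE ZERO TO WS-DOTS
           INSPECT WS-EMAIL TALLYING WS-DOTS
               FOR ALL "." BEFORE INITIAL "@"
           DISPLAY "dots before @: " WS-DOTS
      * A second INSPECT adds to the same counter.
           INSPECT WS-EMAIL TALLYING WS-DOTS
               FOR ALL "." AFTER INITIAL "@"
           DISPLAY "total dots: " WS-DOTS
           STOP RUN.
```

The second STRING starts at position 4 of a 12-character field, so only nine characters fit. The overflow handler runs, and the line ends as ID:123456789.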

Common Pitfalls

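Not initializing the POINTER variable: The POINTER identifier in STRING must be set to the starting position, normally 1, before the first STRING operation. If it holds a value less than 1 or greater than the length of the receiving field, the overflow condition is raised and no data is transferred. Any other stale value silently places data at the wrong position.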
Not initializing the receiving field in STRING: STRING does not clear the receiving field before writing. If the concatenated result is shorter than the receiving field, leftover characters from a previous value remain. Always MOVE SPACES to the receiving field before STRING.
Forgetting the ON OVERFLOW clause: Omitting ON OVERFLOW means the program silently truncates data when the receiving field is too small, potentially losing important information without warning.
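Delimiter confusion in UNSTRING: When a source field contains consecutive delimiters, such as ",," in a CSV record, UNSTRING treats the zero-length gap between them as an empty field. That receiving field is filled with spaces (or zeros if numeric) and still counts toward TALLYING IN. Use DELIMITED BY ALL "," when consecutive delimiters should be treated as a single delimiter.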
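Off-by-one errors in reference modification: Reference modification uses 1-based positions, not 0-based. The compiler rejects a literal start position of 0. A computed start of 0, however, is an out-of-range reference whose outcome depends on the compiler and its options. With range checking enabled (for example, IBM's SSRANGE option) it is reported as a runtime error. Without range checking, the program may touch storage outside the item or abend, for example with an S0C4.
Exceeding field boundaries in reference modification: If start plus length minus one exceeds the length of the data item, the reference is likewise out of range and the behavior is undefined. Without range checking it may silently read or overwrite adjacent storage rather than abend. Always validate computed positions before using reference modification.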
Assuming INSPECT CONVERTING is case-insensitive: INSPECT CONVERTING matches characters exactly, so CONVERTING "abc" TO "xyz" changes a, b, and c but leaves A, B, and C untouched. It also translates individual characters, not strings. Each character in the converting-from string maps to the character at the same position in the converting-to string, so the two strings must be the same length.
Using FUNCTION TRIM on numeric fields: Intrinsic string functions expect alphanumeric arguments. Passing a numeric field without first converting it to an alphanumeric representation produces unexpected results or compilation errors.
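Two of these pitfalls can be reproduced directly: consecutive delimiters in UNSTRING and an uncleared STRING target. The sketch below parses "A,,B" with and without ALL, then shows STRING leaving old characters in a target that was not cleared first. It makes the same assumptions as before (GnuCOBOL, fixed-format source, hypothetical WS- fields and values).

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PITDEMO.
      * Sketch only: every WS- name and value is hypothetical.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SOURCE       PIC X(12) VALUE "A,,B".
       01  WS-F1           PIC X(3).
       01  WS-F2           PIC X(3).
       01  WS-F3           PIC X(3).
       01  WS-COUNT        PIC 9.
       01  WS-OUT          PIC X(10) VALUE "XXXXXXXXXX".
       PROCEDURE DIVISION.
      * Plain DELIMITED BY: ",," produces an empty middle field.
           INITIALIZE WS-F1 WS-F2 WS-F3
           MOVE ZERO TO WS-COUNT
           UNSTRING WS-SOURCE DELIMITED BY ","
               INTO WS-F1 WS-F2 WS-F3
               TALLYING IN WS-COUNT
           END-UNSTRING
           DISPLAY "[" WS-F1 "][" WS-F2 "][" WS-F3 "] " WS-COUNT
      * DELIMITED BY ALL: ",," counts as one delimiter. WS-F3 is
      * not acted upon and keeps the spaces from INITIALIZE.
           INITIALIZE WS-F1 WS-F2 WS-F3
           MOVE ZERO TO WS-COUNT
           UNSTRING WS-SOURCE DELIMITED BY ALL ","
               INTO WS-F1 WS-F2 WS-F3
               TALLYING IN WS-COUNT
           END-UNSTRING
           DISPLAY "[" WS-F1 "][" WS-F2 "][" WS-F3 "] " WS-COUNT
      * STRING never clears its target, so old characters remain.
           STRING "abc" DELIMITED BY SIZE INTO WS-OUT
           END-STRING
           DISPLAY "[" WS-OUT "]"
      * Clearing the target first gives the intended result.
           MOVE SPACES TO WS-OUT
           STRING "abc" DELIMITED BY SIZE INTO WS-OUT
           END-STRING
           DISPLAY "[" WS-OUT "]"
           STOP RUN.
```

The first UNSTRING reports three fields, with an empty middle one. The ALL version reports two, and the third receiving field is left untouched.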

Quick Reference

      * STRING -- concatenate fields
           MOVE SPACES TO WS-FULL-NAME
           STRING WS-FIRST-NAME DELIMITED BY "  "
                  " "            DELIMITED BY SIZE
                  WS-LAST-NAME   DELIMITED BY "  "
               INTO WS-FULL-NAME
               ON OVERFLOW
                  DISPLAY "Name truncated"
           END-STRING

      * STRING with POINTER
           MOVE SPACES TO WS-OUTPUT-LINE
           MOVE 1 TO WS-PTR
           STRING "Date: " DELIMITED BY SIZE
               INTO WS-OUTPUT-LINE
               WITH POINTER WS-PTR
           END-STRING
           STRING WS-DATE-FORMATTED DELIMITED BY SIZE
               INTO WS-OUTPUT-LINE
               WITH POINTER WS-PTR
           END-STRING

      * UNSTRING -- parse CSV record
           MOVE ZERO TO WS-FIELD-COUNT
           UNSTRING WS-CSV-RECORD
               DELIMITED BY ","
               INTO WS-FIELD-1
                    WS-FIELD-2
                    WS-FIELD-3
               TALLYING IN WS-FIELD-COUNT
               ON OVERFLOW
                  DISPLAY "Too many fields"
           END-UNSTRING

      * INSPECT TALLYING
           MOVE ZERO TO WS-COMMA-COUNT
           INSPECT WS-INPUT-STRING
               TALLYING WS-COMMA-COUNT
               FOR ALL ","

      * INSPECT REPLACING
           INSPECT WS-PHONE-NUMBER
               REPLACING ALL "-" BY " "

      * INSPECT CONVERTING (uppercase conversion)
           INSPECT WS-DATA-FIELD
               CONVERTING
               "abcdefghijklmnopqrstuvwxyz"
               TO
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"

      * Reference modification (WS-FULL-DATE holds YYYYMMDD)
           MOVE WS-FULL-DATE(1:4)  TO WS-YEAR
           MOVE WS-FULL-DATE(5:2)  TO WS-MONTH
           MOVE WS-FULL-DATE(7:2)  TO WS-DAY

      * Intrinsic functions
           MOVE FUNCTION UPPER-CASE(WS-NAME)
               TO WS-NAME-UPPER
           MOVE FUNCTION TRIM(WS-INPUT TRAILING)
               TO WS-TRIMMED
           MOVE FUNCTION LENGTH(WS-RECORD)
               TO WS-REC-LEN

What's Next

Chapter 10 introduces tables and arrays in COBOL, using the OCCURS clause to define repeating data structures. You will learn to use subscripts and indexes to access table elements, search tables with SEARCH and SEARCH ALL, and define variable-length tables with OCCURS DEPENDING ON. The string-handling skills from this chapter combine naturally with table processing, as many real-world programs parse input strings into table entries or format table data into output strings.