Part IV: Data Manipulation and Strings

DataField.Dev

In This Chapter

The Data Manipulation Toolkit
String Handling: Harder Than You Think
Tables: In-Memory Data Structures
Reference Modification: Precision Substring Operations
Intrinsic Functions: COBOL's Built-In Library
Date and Time Processing: Lessons from Y2K
The Integration of Skills
What Part IV Covers
From Syntax to Fluency

The Data Manipulation Toolkit

COBOL is, at its core, a data processing language. The name says it — Common Business-Oriented Language. It was designed to read data, transform data, and write data. And while the file processing skills you developed in Part III handle the reading and writing, the skills in Part IV handle the transformation.

Data manipulation in COBOL is a broader topic than it first appears. It encompasses string handling — parsing, concatenating, inspecting, and transforming character data. It encompasses table processing — defining, loading, searching, and manipulating in-memory arrays of data. It encompasses reference modification — the ability to extract and manipulate substrings and subfields by position and length. It encompasses intrinsic functions — COBOL's built-in library of mathematical, string, and date operations. And it encompasses date and time processing — a topic that might seem mundane until you realize that dates are embedded in virtually every business transaction and that getting them wrong has, historically, caused some of the most expensive bugs in computing history.

These are not isolated skills. They work together. A typical enterprise COBOL program might read a record from a file (Part III skills), parse out a date field using reference modification, validate it using intrinsic functions, look up a rate in a loaded table, perform a calculation, format the result using string handling, and write it to an output file. Each skill in Part IV is a tool in your toolkit, and like any set of tools, their power comes from knowing when and how to use each one.

String Handling: Harder Than You Think

If you have programmed in Python, Java, or JavaScript, you take string manipulation for granted. Concatenation is an operator. Substring extraction is a method call. Regular expressions handle pattern matching. The language runtime manages memory allocation as strings grow and shrink.

COBOL's approach to strings is fundamentally different, and understanding why will help you use COBOL's string features effectively rather than fighting against them.

COBOL was designed in an era when memory was measured in kilobytes, not gigabytes. Every byte mattered. Variable-length data structures — the foundation of string handling in modern languages — were expensive to manage and could fragment memory unpredictably. So COBOL adopted fixed-length fields: you declare a PIC X(50) field, and it always occupies exactly 50 bytes, padded with spaces if the actual data is shorter. This design choice — which may seem archaic — gives COBOL programs predictable memory usage, which matters enormously in batch processing environments that run for hours and process millions of records.

But fixed-length fields make string manipulation more complex. Concatenating two strings is not a simple append operation; you need to know where the meaningful data ends and the padding begins. Parsing a delimited string requires explicit counting and positioning. Building an output string from multiple pieces requires careful management of a pointer that tracks your position in the receiving field.

COBOL addresses these challenges with three specialized statements — STRING, UNSTRING, and INSPECT — plus reference modification for positional substring operations. These tools are powerful but different from what you may be used to, and mastering them requires practice.
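As a minimal sketch of the pointer management described above, the following program builds one address line with STRING. The field names and values are hypothetical, chosen only for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. STRDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical fields for illustration only.
       01  WS-CITY          PIC X(20) VALUE "SPRINGFIELD".
       01  WS-STATE         PIC X(2)  VALUE "IL".
       01  WS-ZIP           PIC X(5)  VALUE "62704".
       01  WS-LINE          PIC X(40) VALUE SPACES.
       01  WS-PTR           PIC 9(3)  VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * DELIMITED BY SPACE stops at the first space, which drops
      * the trailing padding (and would truncate "NEW YORK").
           STRING WS-CITY  DELIMITED BY SPACE
                  ", "     DELIMITED BY SIZE
                  WS-STATE DELIMITED BY SIZE
                  " "      DELIMITED BY SIZE
                  WS-ZIP   DELIMITED BY SIZE
               INTO WS-LINE
               WITH POINTER WS-PTR
               ON OVERFLOW
                   DISPLAY "LINE TOO LONG"
           END-STRING
           DISPLAY WS-LINE
           DISPLAY WS-PTR
           STOP RUN.
```

After the STRING completes, WS-LINE holds "SPRINGFIELD, IL 62704" followed by padding, and WS-PTR holds 22, the next free position in the receiving field. The pointer must be initialized before each STRING; a value below 1 or beyond the field length triggers the overflow condition.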

At MedClaim, string handling is critical for EDI processing. The HIPAA 837 claim format uses delimited segments — fields separated by asterisks within segments separated by tildes — and parsing these segments into COBOL data structures requires extensive use of UNSTRING with delimiter handling. James Okafor's team has developed a set of reusable string-parsing paragraphs that are used across dozens of programs, and the code is a masterclass in COBOL string manipulation.
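The sketch below shows the general shape of that kind of parsing. The segment content is a simplified, invented example rather than a complete 837 layout, and the field names are assumptions, not MedClaim's actual code.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNSDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Simplified, hypothetical segment; not a complete 837 layout.
       01  WS-SEGMENT       PIC X(40)
                            VALUE "NM1*IL*1*DOE*JANE~".
       01  WS-SEG-ID        PIC X(3).
       01  WS-ENTITY-CODE   PIC X(3).
       01  WS-ENTITY-TYPE   PIC X(1).
       01  WS-LAST-NAME     PIC X(20).
       01  WS-FIRST-NAME    PIC X(20).
       01  WS-FIELD-COUNT   PIC 9(2) VALUE 0.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * "*" separates elements; "~" ends the segment.
      * TALLYING adds to the counter, so it starts at zero.
           UNSTRING WS-SEGMENT DELIMITED BY "*" OR "~"
               INTO WS-SEG-ID WS-ENTITY-CODE WS-ENTITY-TYPE
                    WS-LAST-NAME WS-FIRST-NAME
               TALLYING IN WS-FIELD-COUNT
           END-UNSTRING
           DISPLAY "SEGMENT: " WS-SEG-ID
           DISPLAY "LAST:    " WS-LAST-NAME
           DISPLAY "FIRST:   " WS-FIRST-NAME
           DISPLAY "FIELDS:  " WS-FIELD-COUNT
           STOP RUN.
```

Checking the tally (5 here) against the number of elements the segment should contain is a simple way to detect malformed input. ON OVERFLOW is deliberately omitted: the trailing spaces in the 40-byte buffer count as unexamined characters and would raise the condition on every call.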

Derek Washington at GlobalBank had a revelatory moment with COBOL string handling during his first year. He was assigned to write a program that formatted customer addresses for mailing labels — combining name, street, city, state, and ZIP code into formatted lines. In Java, this would have been a trivial StringBuilder operation. In COBOL, it required STRING statements with pointer management and careful handling of variable-length name fields. "It made me think about strings differently," Derek recalls. "In Java, I never thought about where the data physically lived. In COBOL, you always know exactly where every byte is. It is lower-level, but it is also more precise."

Tables: In-Memory Data Structures

COBOL tables — arrays, in the terminology of most other languages — are essential for any program that needs to hold and manipulate collections of data in memory. Rate tables, lookup codes, accumulated totals, temporary work areas — all of these are implemented using COBOL's table facilities.

In your introductory course, you likely learned the basics: OCCURS clause, subscript access, simple table definitions. Part IV takes you much further. You will learn about multi-dimensional tables (OCCURS within OCCURS), variable-length tables (OCCURS DEPENDING ON), indexed table access (index-names declared with INDEXED BY, the SET statement, and USAGE INDEX data items; indexes are generally more efficient than subscripts), and the SEARCH and SEARCH ALL statements for linear and binary searching.
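To make those terms concrete, here is a small sketch that combines OCCURS DEPENDING ON, an index-name, SET, and a serial SEARCH. The transaction codes and descriptions are invented for illustration.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TBLDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical transaction-code table; values are invented.
       01  WS-CODE-COUNT        PIC 9(2) VALUE 3.
       01  WS-CODE-TABLE.
           05  WS-CODE-ENTRY    OCCURS 1 TO 10 TIMES
                                DEPENDING ON WS-CODE-COUNT
                                INDEXED BY CODE-IDX.
               10  WS-TXN-CODE  PIC X(3).
               10  WS-TXN-DESC  PIC X(12).
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE "100DEPOSIT"    TO WS-CODE-ENTRY(1)
           MOVE "200WITHDRAWAL" TO WS-CODE-ENTRY(2)
           MOVE "300TRANSFER"   TO WS-CODE-ENTRY(3)
      * Serial SEARCH starts at the current index value, so the
      * index must be set explicitly before each search.
           SET CODE-IDX TO 1
           SEARCH WS-CODE-ENTRY
               AT END
                   DISPLAY "CODE NOT FOUND"
               WHEN WS-TXN-CODE(CODE-IDX) = "200"
                   DISPLAY WS-TXN-DESC(CODE-IDX)
           END-SEARCH
           STOP RUN.
```

Because the table is defined with DEPENDING ON, the SEARCH examines only the first WS-CODE-COUNT entries rather than all ten possible occurrences.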

You will also learn the practical art of table design — choosing the right table structure for the problem, loading tables from files or databases, handling table overflow, and the performance implications of different search strategies on large tables.

At GlobalBank, tables are used extensively in the core banking system. The interest rate table — which holds rates by account type, tier, and effective date — is a multi-dimensional table loaded from a VSAM file at program initialization. The transaction code table maps three-digit codes to descriptions, processing rules, and GL account numbers. The branch table holds information about every branch in the bank's network. These tables are not academic exercises; they are production data structures that programs depend on.

One pattern you will encounter frequently is the "load and look up" pattern: at program initialization, read a reference file into a table; during processing, look up values in the table using SEARCH or SEARCH ALL; at program termination, the table is simply discarded along with the rest of the program's storage. This pattern is efficient because it avoids repeated file reads during processing, and it is the standard approach for handling reference data in COBOL batch programs.
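The look-up half of that pattern can be sketched with SEARCH ALL. To keep the example self-contained, the reference file is replaced by hypothetical in-line values laid out through REDEFINES; a real program would fill the table in a read loop during initialization. All names and rates are invented.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOOKUP.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical rate data standing in for a reference file.
      * SEARCH ALL requires entries in key order (ascending here).
       01  WS-RATE-DATA.
           05  FILLER           PIC X(6) VALUE "CHK010".
           05  FILLER           PIC X(6) VALUE "MMA175".
           05  FILLER           PIC X(6) VALUE "SAV125".
       01  WS-RATE-TABLE REDEFINES WS-RATE-DATA.
           05  WS-RATE-ENTRY    OCCURS 3 TIMES
                                ASCENDING KEY IS WS-ACCT-TYPE
                                INDEXED BY RATE-IDX.
               10  WS-ACCT-TYPE PIC X(3).
               10  WS-RATE      PIC 9V99.
       01  WS-LOOKUP-TYPE       PIC X(3) VALUE "SAV".
       01  WS-RATE-OUT          PIC 9.99.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Binary search; no SET is needed before SEARCH ALL.
           SEARCH ALL WS-RATE-ENTRY
               AT END
                   DISPLAY "NO RATE FOR " WS-LOOKUP-TYPE
               WHEN WS-ACCT-TYPE(RATE-IDX) = WS-LOOKUP-TYPE
                   MOVE WS-RATE(RATE-IDX) TO WS-RATE-OUT
                   DISPLAY WS-LOOKUP-TYPE " RATE " WS-RATE-OUT
           END-SEARCH
           STOP RUN.
```

SEARCH ALL trusts the ASCENDING KEY declaration and does not verify it. If the loaded data is out of order, the search can miss entries that are present, so a load routine normally checks the sequence as it fills the table.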

Reference Modification: Precision Substring Operations

Reference modification, the ability to extract or manipulate a substring of a data item by specifying a starting position and length, is one of COBOL's most useful features for data manipulation. The syntax is simple: WS-FIELD(3:5) refers to the 5 bytes starting at position 3 of WS-FIELD, with positions counted from 1. But the applications are extensive.
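A short sketch of the syntax in use, with a hypothetical 20-byte record whose layout is assumed for illustration:

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFMOD.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical 20-byte record: type(2) + date(8) + id(10).
       01  WS-RECORD        PIC X(20)
                            VALUE "CL20240229A123456789".
       01  WS-YEAR          PIC X(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Form is (start:length); positions are counted from 1.
           DISPLAY "TYPE: " WS-RECORD(1:2)
           MOVE WS-RECORD(3:4) TO WS-YEAR
           DISPLAY "YEAR: " WS-YEAR
      * Omitting the length means "through the end of the field".
           DISPLAY "ID:   " WS-RECORD(11:)
      * A reference-modified item can also be a receiving field.
           MOVE "XX" TO WS-RECORD(1:2)
           DISPLAY WS-RECORD
           STOP RUN.
```

The program displays CL, 2024, and A123456789, then shows the record with its first two bytes overwritten by XX. No subfields were declared for any of these pieces.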

Reference modification lets you parse fixed-format records without defining every subfield in WORKING-STORAGE. It lets you extract portions of fields for comparison or output. It lets you build strings byte by byte when the STRING statement is not flexible enough. And it lets you handle variable-format data — records whose structure depends on a type code or whose fields are positionally dependent — without defining every possible variation in your DATA DIVISION.

At MedClaim, reference modification is used heavily in the EDI processing modules. EDI records have a hierarchical structure where the meaning of bytes at a given position depends on segment identifiers earlier in the record. James Okafor's team uses reference modification to extract segment identifiers, determine the record type, and then parse the remaining bytes according to the appropriate layout. The alternative — defining a separate record layout for every possible segment type — would be unwieldy given the number of segment types in the HIPAA 837 standard.

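Reference modification also has pitfalls. A starting position or length that runs past the end of the field is not necessarily caught at the moment it happens. On IBM Enterprise COBOL, for example, the range is checked at run time only when the program is compiled with the SSRANGE option; without it, the statement silently reads or overwrites adjacent storage. The damage then tends to surface later, typically as an S0C4 or S0C7 abend that is frustrating to diagnose because the failing instruction is far from the statement that caused the problem. Defensive programming requires validating positions and lengths before using them in reference modification expressions.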
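One possible form of that validation is sketched below. The field and the requested start and length are hypothetical; the request is deliberately out of range so that the guard rejects it.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SAFEREF.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical field and an out-of-range request (8 for 5).
       01  WS-FIELD         PIC X(10) VALUE "ABCDEFGHIJ".
       01  WS-START         PIC 9(3)  VALUE 8.
       01  WS-LEN           PIC 9(3)  VALUE 5.
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Valid only if start >= 1, length >= 1, and the last byte
      * (start + length - 1) falls inside the field.
           IF WS-START >= 1 AND WS-LEN >= 1
              AND WS-START + WS-LEN - 1 <= FUNCTION LENGTH(WS-FIELD)
               DISPLAY WS-FIELD(WS-START:WS-LEN)
           ELSE
               DISPLAY "INVALID SUBSTRING REQUEST"
           END-IF
           STOP RUN.
```

Here the last requested byte would be position 12 of a 10-byte field, so the program takes the ELSE branch instead of touching storage beyond WS-FIELD.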

Intrinsic Functions: COBOL's Built-In Library

Intrinsic functions entered the language through the 1989 amendment to the COBOL-85 standard, and subsequent standards expanded the library significantly (FUNCTION TRIM, for example, is a later addition that older compilers do not support). These built-in functions provide mathematical operations (FUNCTION SQRT, FUNCTION LOG, FUNCTION REM), string operations (FUNCTION UPPER-CASE, FUNCTION LOWER-CASE, FUNCTION REVERSE, FUNCTION TRIM), date operations (FUNCTION CURRENT-DATE, FUNCTION INTEGER-OF-DATE, FUNCTION DATE-OF-INTEGER), and financial operations (FUNCTION ANNUITY, FUNCTION PRESENT-VALUE) without requiring the programmer to write the logic from scratch.
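A few of these functions in use, as a minimal sketch with hypothetical sample values. It sticks to functions from the original intrinsic function set so that it should compile on older compilers as well.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. FUNCDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical sample values.
       01  WS-NAME          PIC X(10) VALUE "okafor".
       01  WS-UPPER         PIC X(10).
       01  WS-ROOT          PIC ZZ9.99.
       01  WS-REMAINDER     PIC 9(3).
       01  WS-TIMESTAMP     PIC X(21).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Alphanumeric functions are used like sending fields.
           MOVE FUNCTION UPPER-CASE(WS-NAME) TO WS-UPPER
      * Numeric functions appear inside arithmetic expressions.
           COMPUTE WS-ROOT = FUNCTION SQRT(144)
           COMPUTE WS-REMAINDER = FUNCTION REM(17 5)
           MOVE FUNCTION CURRENT-DATE TO WS-TIMESTAMP
           DISPLAY WS-UPPER
           DISPLAY WS-ROOT
           DISPLAY WS-REMAINDER
      * First 8 bytes of CURRENT-DATE are the date as YYYYMMDD.
           DISPLAY WS-TIMESTAMP(1:8)
           STOP RUN.
```

The split between MOVE and COMPUTE reflects the type of each function's result: UPPER-CASE and CURRENT-DATE return alphanumeric values, while SQRT and REM return numeric values and can only appear where an arithmetic expression is allowed.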

Intrinsic functions are underused in practice. Many experienced COBOL programmers learned the language before functions were available and continue to write explicit logic for operations that a function could handle in a single expression. Part IV will make you fluent in the function library — not just knowing that the functions exist, but knowing when to use them and how they interact with COBOL's type system.

One particularly important category of intrinsic functions is the date conversion functions. FUNCTION INTEGER-OF-DATE converts a Gregorian date (YYYYMMDD format) to an integer day number, counted so that January 1, 1601 is day 1; FUNCTION DATE-OF-INTEGER converts back. These functions allow date arithmetic (adding days to a date, calculating the difference between two dates, determining the day of the week) by converting to integers, performing the arithmetic, and converting back. This approach is far more reliable than attempting to do date arithmetic directly on YYYYMMDD values, where you must handle month lengths, leap years, and century boundaries yourself.
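The convert, compute, convert-back sequence looks roughly like this. The dates and the 30-day window are hypothetical values chosen so that the calculation crosses a leap-year February.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. DATEMATH.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Hypothetical dates and a hypothetical 30-day window.
       01  WS-SERVICE-DATE  PIC 9(8) VALUE 20240215.
       01  WS-DEADLINE      PIC 9(8).
       01  WS-DAY-NUM       PIC 9(7).
       01  WS-START-DATE    PIC 9(8) VALUE 20240101.
       01  WS-END-DATE      PIC 9(8) VALUE 20240301.
       01  WS-DAYS-BETWEEN  PIC 9(5).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Add days: convert to a day number, add, convert back.
           COMPUTE WS-DAY-NUM =
               FUNCTION INTEGER-OF-DATE(WS-SERVICE-DATE) + 30
           COMPUTE WS-DEADLINE =
               FUNCTION DATE-OF-INTEGER(WS-DAY-NUM)
           DISPLAY "DEADLINE: " WS-DEADLINE
      * Difference: subtract the two day numbers.
           COMPUTE WS-DAYS-BETWEEN =
               FUNCTION INTEGER-OF-DATE(WS-END-DATE)
             - FUNCTION INTEGER-OF-DATE(WS-START-DATE)
           DISPLAY "DAYS BETWEEN: " WS-DAYS-BETWEEN
           STOP RUN.
```

The deadline comes out as 20240316 and the difference as 60 days, both of which account for February 29, 2024 without any leap-year logic in the program itself.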

Date and Time Processing: Lessons from Y2K

No discussion of COBOL data manipulation is complete without addressing dates, and no discussion of COBOL dates is complete without acknowledging Y2K — the Year 2000 problem that made COBOL front-page news for the first (and perhaps only) time in its history.

The Y2K problem was, at its root, a data manipulation problem. Programs stored years as two digits (PIC 99) to save storage space, and logic that assumed a larger two-digit year was always a later year broke when "00" followed "99". The industry spent an estimated $300 billion fixing the problem, and COBOL programmers, many of whom had retired or moved to other fields, were pulled back into service to remediate code they had written decades earlier.
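The failure is easy to reproduce. The sketch below compares two-digit years directly, then repeats the comparison after expanding them with a fixed century window. The pivot of 50 is an assumption made for illustration; windowing was one common remediation technique, and the source does not say which techniques any particular system used.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. Y2KDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Two-digit years as a pre-2000 program might store them.
       01  WS-YY-1999       PIC 99 VALUE 99.
       01  WS-YY-2000       PIC 99 VALUE 0.
       01  WS-YY-IN         PIC 99.
       01  WS-YYYY-OUT      PIC 9(4).
       01  WS-YYYY-A        PIC 9(4).
       01  WS-YYYY-B        PIC 9(4).
       PROCEDURE DIVISION.
       MAIN-PARA.
           IF WS-YY-2000 > WS-YY-1999
               DISPLAY "TWO-DIGIT COMPARE: 00 IS LATER"
           ELSE
               DISPLAY "TWO-DIGIT COMPARE: 00 LOOKS EARLIER"
           END-IF
           MOVE WS-YY-1999 TO WS-YY-IN
           PERFORM EXPAND-YEAR
           MOVE WS-YYYY-OUT TO WS-YYYY-A
           MOVE WS-YY-2000 TO WS-YY-IN
           PERFORM EXPAND-YEAR
           MOVE WS-YYYY-OUT TO WS-YYYY-B
           IF WS-YYYY-B > WS-YYYY-A
               DISPLAY "FOUR-DIGIT COMPARE: " WS-YYYY-B " IS LATER"
           END-IF
           STOP RUN.
      * Assumed pivot of 50: 00-49 become 20xx, 50-99 become 19xx.
       EXPAND-YEAR.
           IF WS-YY-IN < 50
               COMPUTE WS-YYYY-OUT = 2000 + WS-YY-IN
           ELSE
               COMPUTE WS-YYYY-OUT = 1900 + WS-YY-IN
           END-IF.
```

The first comparison reports that 00 looks earlier than 99, which is the bug. The second compares 2000 with 1999 and gets the right answer, although a fixed window only defers the problem until dates fall outside it.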

Y2K is history, but its lessons are not:

Storage decisions have long-term consequences. When you choose how to represent data — how many digits, what format, what century convention — you are making a decision that may affect programs for decades. Choose carefully.

Date arithmetic is harder than it looks. Leap years, month boundaries, time zones, daylight saving time, and calendar systems that vary by locale all make date processing a minefield of edge cases. Use the intrinsic functions. Do not roll your own.

Testing date logic requires creative test cases. End-of-month, end-of-year, leap year, century boundary, and DST transition dates should all be in your test suite. The fact that your program works on January 15 does not prove it works on February 29.

Part IV dedicates an entire chapter to date and time processing — not because dates are inherently fascinating, but because getting dates wrong in enterprise systems has consequences. At MedClaim, claims have filing deadlines that are measured in days from the date of service. A date calculation error that shortens or extends a deadline can cause claims to be wrongly denied or wrongly paid. At GlobalBank, interest calculations depend on the exact number of days between dates, and a single day's error on millions of accounts adds up to significant money.

The Integration of Skills

The five chapters in Part IV are individually important, but their real power emerges when you combine them. A realistic COBOL program does not use string handling in isolation or table processing in isolation. It uses all of these skills together, in the context of file processing (Part III) and control flow (Part II).

Consider a program that processes employee payroll records. It reads each record, uses UNSTRING to parse the employee's full name into first, middle, and last name components. It uses reference modification to extract the department code from positions 45-48 of the record. It searches a loaded rate table to find the employee's pay rate based on job classification and seniority. It uses intrinsic functions to calculate the number of workdays between the period start and end dates. It performs the arithmetic to compute gross pay, deductions, and net pay. It formats the result using STRING and writes it to a check-printing file.

Every skill in Part IV is at work in that one program. And programs like that are the bread and butter of enterprise COBOL.

What Part IV Covers

The five chapters in Part IV build your data manipulation toolkit:

Chapter 17: String Handling covers STRING, UNSTRING, and INSPECT in depth: concatenation, parsing, tallying, replacing, and the patterns for handling variable-length and delimited data in COBOL's fixed-field environment.

Chapter 18: Table Processing takes you from basic OCCURS clauses to multi-dimensional tables, OCCURS DEPENDING ON, indexed access, SEARCH and SEARCH ALL, and the design patterns for loading, searching, and managing in-memory tables.

Chapter 19: Reference Modification and Data Manipulation explores positional substring operations, their applications in parsing and formatting, their pitfalls, and how they complement STRING and UNSTRING for comprehensive data manipulation.

Chapter 20: Intrinsic Functions surveys COBOL's function library (mathematical, string, date, and financial functions), along with practical applications, type considerations, and why functions should be your first choice before writing custom logic.

Chapter 21: Date and Time Processing addresses date representation, date arithmetic using intrinsic functions, time zone handling, the lessons of Y2K, and the defensive testing strategies that prevent date-related bugs.

From Syntax to Fluency

Part IV is where many COBOL programmers make the leap from knowing the language to being fluent in it. Knowing that the STRING statement exists is syntax knowledge. Knowing how to use STRING, UNSTRING, reference modification, and intrinsic functions together to parse, transform, and format data in a production program — that is fluency.

Fluency is what the enterprise needs. The programs you will write after completing Part IV will not just process files; they will manipulate the data within those files with precision, efficiency, and the defensive rigor that production systems demand.

That is the transformation Part IV is designed to achieve. Let us get to work.