Case Study 1: Parsing Bank Transaction Descriptions at Meridian National Bank

Background

Meridian National Bank (MNB) processes 3.2 million debit card and ACH transactions daily. Each transaction arrives from the payment network with a free-form description field -- a 60-character string that contains the merchant name, location, reference number, and sometimes additional codes, all concatenated together with no consistent delimiter. These descriptions were designed for human readability on bank statements, not for machine processing.

In 2024, MNB launched an initiative to categorize transactions automatically for their mobile banking app. Customers wanted to see spending breakdowns by category (groceries, restaurants, gas stations, subscriptions), and the categorization engine needed structured fields extracted from the unstructured description text.

The challenge: the description field follows no formal standard. Different payment networks and merchants format descriptions differently:

| Raw Description                            | Merchant              | Location   | Reference          |
|--------------------------------------------|-----------------------|------------|--------------------|
| WALMART SUPERCTR 5274 DALLAS TX REF#829471 | WALMART SUPERCTR 5274 | DALLAS TX  | 829471             |
| SHELL OIL 57442 HOUSTON TX 77001           | SHELL OIL 57442       | HOUSTON TX | (embedded in name) |
| AMZN MKTP US*RT4K92HF0 AMZN.COM/BILLWA     | AMZN MKTP US          | (online)   | RT4K92HF0          |
| SQ *BELLA ROSA CAFE CHICAGO IL             | BELLA ROSA CAFE       | CHICAGO IL | (none)             |
| PAYPAL *NETFLIX.COM 402-935-7733 CA        | NETFLIX.COM           | CA         | (none)             |

James Chen, a COBOL developer on the batch processing team, was assigned to build a transaction description parser using COBOL's string handling facilities: UNSTRING, INSPECT, STRING, reference modification, and intrinsic functions.


The Problem

James needed to extract three structured fields from each 60-character description:

  1. Merchant Name (up to 30 characters) -- The primary business name, stripped of location and reference data
  2. Merchant Location (up to 20 characters) -- City and state, if present
  3. Reference Number (up to 15 characters) -- Any reference, confirmation, or trace number

Additionally, the parser needed to:

  - Handle multiple description formats without failing
  - Normalize merchant names to uppercase with no leading/trailing spaces
  - Strip common prefixes like "SQ *", "PAYPAL *", and "AMZN MKTP US*"
  - Count the number of successfully parsed and unparseable records
  - Produce an output file with both the original and parsed fields for audit

The difficulty lies in the lack of consistent delimiters. Some descriptions use double spaces to separate sections, others use single spaces throughout. Some embed reference numbers after "REF#", others embed them after asterisks, and still others have no reference at all.


The Solution

Parsing Strategy

James developed a multi-pass parsing strategy:

  1. Pass 1 (INSPECT TALLYING): Count delimiter characters to determine the format type
  2. Pass 2 (Reference Modification): Strip known prefixes by comparing against a prefix table
  3. Pass 3 (Reference Modification): Extract the reference number that follows a "REF#" or "CONF#" keyword
  4. Pass 4 (Reference Modification): Scan right to left for a state code and extract the location
  5. Pass 5 (UNSTRING and STRING): Split the remaining text into words and reassemble the merchant name

The Complete COBOL Program

       IDENTIFICATION DIVISION.
       PROGRAM-ID.  TXNPARSE.
       AUTHOR.      JAMES CHEN.
       DATE-WRITTEN. 2024-08-20.
      *================================================================
      * PROGRAM:  TXNPARSE
      * PURPOSE:  Parse free-form transaction descriptions into
      *           structured merchant name, location, and
      *           reference number fields. Demonstrates STRING,
      *           UNSTRING, INSPECT, and reference modification.
      *================================================================

       ENVIRONMENT DIVISION.

       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT TRANS-INPUT-FILE
               ASSIGN TO "TRANSIN"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-INPUT-STATUS.

           SELECT PARSED-OUTPUT-FILE
               ASSIGN TO "PARSOUT"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-OUTPUT-STATUS.

       DATA DIVISION.

       FILE SECTION.
       FD  TRANS-INPUT-FILE
           RECORDING MODE IS F
           RECORD CONTAINS 89 CHARACTERS.
       01  FS-INPUT-RECORD.
           05  FS-IN-ACCOUNT          PIC 9(10).
           05  FS-IN-TRANS-DATE       PIC 9(8).
           05  FS-IN-AMOUNT           PIC S9(9)V99.
           05  FS-IN-DESCRIPTION      PIC X(60).

       FD  PARSED-OUTPUT-FILE
           RECORDING MODE IS F
           RECORD CONTAINS 160 CHARACTERS.
       01  FS-OUTPUT-RECORD.
           05  FS-OUT-ACCOUNT         PIC 9(10).
           05  FS-OUT-TRANS-DATE      PIC 9(8).
           05  FS-OUT-AMOUNT          PIC S9(9)V99.
           05  FS-OUT-ORIG-DESC       PIC X(60).
           05  FS-OUT-MERCHANT-NAME   PIC X(30).
           05  FS-OUT-MERCHANT-LOC    PIC X(20).
           05  FS-OUT-REFERENCE       PIC X(15).
           05  FS-OUT-PARSE-STATUS    PIC X(1).
           05  FILLER                 PIC X(5).

       WORKING-STORAGE SECTION.

      *----------------------------------------------------------------
      * FILE STATUS FIELDS
      *----------------------------------------------------------------
       01  WS-INPUT-STATUS            PIC X(2).
           88  INPUT-OK                          VALUE "00".
           88  INPUT-EOF                         VALUE "10".
       01  WS-OUTPUT-STATUS           PIC X(2).
           88  OUTPUT-OK                         VALUE "00".

      *----------------------------------------------------------------
      * WORKING FIELDS FOR PARSING
      *----------------------------------------------------------------
       01  WS-WORK-DESC               PIC X(60).
       01  WS-MERCHANT-NAME           PIC X(30).
       01  WS-MERCHANT-LOC            PIC X(20).
       01  WS-REFERENCE               PIC X(15).

      *----------------------------------------------------------------
      * UNSTRING RECEIVING FIELDS
      *----------------------------------------------------------------
       01  WS-UNSTR-FIELDS.
           05  WS-PART-1              PIC X(30).
           05  WS-PART-2              PIC X(20).
           05  WS-PART-3              PIC X(20).
           05  WS-PART-4              PIC X(15).

      *----------------------------------------------------------------
      * UNSTRING CONTROL FIELDS
      *----------------------------------------------------------------
       01  WS-UNSTR-PTR               PIC 9(3).
       01  WS-UNSTR-TALLY             PIC 9(3).
       01  WS-DELIM-1                 PIC X(5).
       01  WS-DELIM-2                 PIC X(5).

      *----------------------------------------------------------------
      * INSPECT COUNTERS
      *----------------------------------------------------------------
       01  WS-ASTERISK-COUNT          PIC 9(3) VALUE ZERO.
       01  WS-HASH-COUNT              PIC 9(3) VALUE ZERO.
       01  WS-SLASH-COUNT             PIC 9(3) VALUE ZERO.
       01  WS-DOUBLE-SPACE-POS        PIC 9(3) VALUE ZERO.

      *----------------------------------------------------------------
      * STATE CODE VALIDATION TABLE
      *----------------------------------------------------------------
       01  WS-STATE-CODES.
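      *    Each FILLER holds exactly 10 entries of 3 bytes so
      *    that the 150-byte REDEFINES below lines up.
           05  FILLER PIC X(30) VALUE
               "AL AK AZ AR CA CO CT DE FL GA ".
           05  FILLER PIC X(30) VALUE
               "HI ID IL IN IA KS KY LA ME MD ".
           05  FILLER PIC X(30) VALUE
               "MA MI MN MS MO MT NE NV NH NJ ".
           05  FILLER PIC X(30) VALUE
               "NM NY NC ND OH OK OR PA RI SC ".
           05  FILLER PIC X(30) VALUE
               "SD TN TX UT VT VA WA WV WI WY ".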
       01  WS-STATE-TABLE REDEFINES WS-STATE-CODES.
           05  WS-STATE-ENTRY         PIC X(3)
                                      OCCURS 50 TIMES.
       01  WS-STATE-IDX               PIC 9(3).
       01  WS-FOUND-STATE             PIC X(1).
           88  STATE-FOUND                       VALUE 'Y'.
           88  STATE-NOT-FOUND                   VALUE 'N'.

      *----------------------------------------------------------------
      * REFERENCE MODIFICATION WORK FIELDS
      *----------------------------------------------------------------
       01  WS-SCAN-POS                PIC 9(3).
       01  WS-SCAN-LEN                PIC 9(3).
       01  WS-REF-START               PIC 9(3).
       01  WS-REF-LEN                 PIC 9(3).
       01  WS-CHAR                    PIC X(1).

      *----------------------------------------------------------------
      * PREFIX REMOVAL TABLE
      *----------------------------------------------------------------
       01  WS-PREFIX-TABLE.
           05  WS-PREFIX-COUNT        PIC 9(2) VALUE 6.
           05  WS-PREFIX-DATA.
               10  FILLER PIC X(15) VALUE "SQ *           ".
               10  FILLER PIC X(15) VALUE "PAYPAL *       ".
               10  FILLER PIC X(15) VALUE "AMZN MKTP US*  ".
               10  FILLER PIC X(15) VALUE "TST*           ".
               10  FILLER PIC X(15) VALUE "SP *           ".
               10  FILLER PIC X(15) VALUE "CKE*           ".
           05  WS-PREFIX-ENTRIES REDEFINES WS-PREFIX-DATA.
               10  WS-PREFIX-ENTRY    PIC X(15)
                                      OCCURS 6 TIMES.
       01  WS-PREFIX-IDX              PIC 9(2).
       01  WS-PREFIX-LEN              PIC 9(2).

      *----------------------------------------------------------------
      * COUNTERS AND STATISTICS
      *----------------------------------------------------------------
       01  WS-COUNTERS.
           05  WS-TOTAL-READ          PIC S9(7) COMP-3
                                      VALUE ZERO.
           05  WS-TOTAL-PARSED        PIC S9(7) COMP-3
                                      VALUE ZERO.
           05  WS-TOTAL-PARTIAL       PIC S9(7) COMP-3
                                      VALUE ZERO.
           05  WS-TOTAL-FAILED        PIC S9(7) COMP-3
                                      VALUE ZERO.
           05  WS-TOTAL-WRITTEN       PIC S9(7) COMP-3
                                      VALUE ZERO.

      *----------------------------------------------------------------
      * DISPLAY FIELDS
      *----------------------------------------------------------------
       01  WS-DISP-COUNT              PIC Z,ZZZ,ZZ9.

       PROCEDURE DIVISION.

       0000-MAIN-CONTROL.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-PROCESS-TRANSACTIONS
               UNTIL INPUT-EOF
           PERFORM 8000-DISPLAY-STATISTICS
           PERFORM 9000-FINALIZE
           STOP RUN
           .

       1000-INITIALIZE.
           DISPLAY "========================================"
           DISPLAY " TRANSACTION DESCRIPTION PARSER"
           DISPLAY " MERIDIAN NATIONAL BANK"
           DISPLAY "========================================"

           OPEN INPUT  TRANS-INPUT-FILE
                OUTPUT PARSED-OUTPUT-FILE

           IF NOT INPUT-OK
               DISPLAY "ERROR: Cannot open input file. "
                       "Status: " WS-INPUT-STATUS
               STOP RUN
           END-IF
           IF NOT OUTPUT-OK
               DISPLAY "ERROR: Cannot open output file. "
                       "Status: " WS-OUTPUT-STATUS
               STOP RUN
           END-IF

           PERFORM 2100-READ-TRANSACTION
           .

       2000-PROCESS-TRANSACTIONS.
           ADD 1 TO WS-TOTAL-READ
           INITIALIZE WS-MERCHANT-NAME
           INITIALIZE WS-MERCHANT-LOC
           INITIALIZE WS-REFERENCE
           INITIALIZE WS-UNSTR-FIELDS
           MOVE FS-IN-DESCRIPTION TO WS-WORK-DESC

      *    --- Pass 1: Analyze the description format ---
           PERFORM 3000-ANALYZE-FORMAT

      *    --- Pass 2: Remove known prefixes ---
           PERFORM 3100-REMOVE-PREFIXES

      *    --- Pass 3: Extract reference number ---
           PERFORM 3200-EXTRACT-REFERENCE

      *    --- Pass 4: Extract location ---
           PERFORM 3300-EXTRACT-LOCATION

      *    --- Pass 5: Extract merchant name ---
           PERFORM 3400-EXTRACT-MERCHANT-NAME

      *    --- Build output record ---
           PERFORM 4000-BUILD-OUTPUT
           PERFORM 4100-WRITE-OUTPUT
           PERFORM 2100-READ-TRANSACTION
           .

       2100-READ-TRANSACTION.
           READ TRANS-INPUT-FILE
               AT END
                   SET INPUT-EOF TO TRUE
           END-READ
           .

       3000-ANALYZE-FORMAT.
      *    -------------------------------------------------------
      *    Use INSPECT to count delimiter characters in the
      *    description. This determines the parsing strategy.
      *    -------------------------------------------------------
           MOVE ZERO TO WS-ASTERISK-COUNT
           MOVE ZERO TO WS-HASH-COUNT
           MOVE ZERO TO WS-SLASH-COUNT

           INSPECT WS-WORK-DESC
               TALLYING WS-ASTERISK-COUNT FOR ALL "*"
                        WS-HASH-COUNT     FOR ALL "#"
                        WS-SLASH-COUNT    FOR ALL "/"
           .

       3100-REMOVE-PREFIXES.
      *    -------------------------------------------------------
      *    Check if the description starts with a known prefix
      *    (SQ *, PAYPAL *, etc.) and remove it using reference
      *    modification.
      *    -------------------------------------------------------
           PERFORM VARYING WS-PREFIX-IDX FROM 1 BY 1
               UNTIL WS-PREFIX-IDX > WS-PREFIX-COUNT
      *        Determine the actual length of this prefix
      *        (excluding trailing spaces in the table entry)
               MOVE ZERO TO WS-PREFIX-LEN
               INSPECT WS-PREFIX-ENTRY(WS-PREFIX-IDX)
                   TALLYING WS-PREFIX-LEN
                   FOR CHARACTERS BEFORE INITIAL "  "
      *        If prefix length is valid, check for a match
               IF WS-PREFIX-LEN > 0 AND WS-PREFIX-LEN < 15
                   IF WS-WORK-DESC(1:WS-PREFIX-LEN) =
                      WS-PREFIX-ENTRY(WS-PREFIX-IDX)
                                     (1:WS-PREFIX-LEN)
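      *                Shift the description left, removing
      *                the prefix. The tail is staged in the
      *                output record's description field
      *                (rebuilt later in 4000-BUILD-OUTPUT)
      *                because a MOVE whose source and target
      *                overlap has undefined results.
                       MOVE WS-WORK-DESC(WS-PREFIX-LEN + 1:)
                           TO FS-OUT-ORIG-DESC
                       MOVE FS-OUT-ORIG-DESC TO WS-WORK-DESC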
                   END-IF
               END-IF
           END-PERFORM
           .

       3200-EXTRACT-REFERENCE.
      *    -------------------------------------------------------
      *    Look for reference number patterns:
      *    1. "REF#" followed by digits
      *    2. "CONF#" followed by digits
      *    Asterisk-delimited references (for example the code
      *    after "AMZN MKTP US*") are not extracted here.
      *    Uses reference modification to scan and extract.
      *    -------------------------------------------------------
           MOVE SPACES TO WS-REFERENCE

      *    Pattern 1: Look for "REF#"
           IF WS-HASH-COUNT > 0
               PERFORM VARYING WS-SCAN-POS FROM 1 BY 1
                   UNTIL WS-SCAN-POS > 55
                   IF WS-WORK-DESC(WS-SCAN-POS:4) = "REF#"
                       COMPUTE WS-REF-START =
                           WS-SCAN-POS + 4
                       COMPUTE WS-REF-LEN =
                           60 - WS-REF-START + 1
                       IF WS-REF-LEN > 15
                           MOVE 15 TO WS-REF-LEN
                       END-IF
                       MOVE WS-WORK-DESC
                           (WS-REF-START:WS-REF-LEN)
                           TO WS-REFERENCE
      *                Remove the REF# and number from the
      *                working description
                       MOVE SPACES TO
                           WS-WORK-DESC(WS-SCAN-POS:
                               60 - WS-SCAN-POS + 1)
                   END-IF
               END-PERFORM
           END-IF

      *    Pattern 2: Look for "CONF#"
           IF WS-REFERENCE = SPACES AND WS-HASH-COUNT > 0
               PERFORM VARYING WS-SCAN-POS FROM 1 BY 1
                   UNTIL WS-SCAN-POS > 54
                   IF WS-WORK-DESC(WS-SCAN-POS:5) = "CONF#"
                       COMPUTE WS-REF-START =
                           WS-SCAN-POS + 5
                       COMPUTE WS-REF-LEN =
                           60 - WS-REF-START + 1
                       IF WS-REF-LEN > 15
                           MOVE 15 TO WS-REF-LEN
                       END-IF
                       MOVE WS-WORK-DESC
                           (WS-REF-START:WS-REF-LEN)
                           TO WS-REFERENCE
                       MOVE SPACES TO
                           WS-WORK-DESC(WS-SCAN-POS:
                               60 - WS-SCAN-POS + 1)
                   END-IF
               END-PERFORM
           END-IF
           .

       3300-EXTRACT-LOCATION.
      *    -------------------------------------------------------
      *    Scan the description from right to left looking for
      *    a two-letter US state code preceded by a city name.
      *    Uses the state code validation table.
      *    -------------------------------------------------------
           MOVE SPACES TO WS-MERCHANT-LOC
           SET STATE-NOT-FOUND TO TRUE

      *    Scan from position 58 backward (state code is 2 chars)
           PERFORM VARYING WS-SCAN-POS FROM 58 BY -1
               UNTIL WS-SCAN-POS < 10 OR STATE-FOUND
      *        Check if this position has a potential state code
      *        (preceded by a space)
               IF WS-SCAN-POS > 1
                   IF WS-WORK-DESC(WS-SCAN-POS - 1:1) = SPACE
                       PERFORM VARYING WS-STATE-IDX
                           FROM 1 BY 1
                           UNTIL WS-STATE-IDX > 50
                               OR STATE-FOUND
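      *                    Compare three characters: each table
      *                    entry is a code plus a trailing space,
      *                    so words that merely begin with a
      *                    code (such as CAFE) are rejected
                           IF WS-WORK-DESC
                               (WS-SCAN-POS:3) =
                               WS-STATE-ENTRY(WS-STATE-IDX)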
                               SET STATE-FOUND TO TRUE
      *                        Extract city + state
      *                        Scan backward from state to
      *                        find start of city
                               PERFORM
                                   3310-EXTRACT-CITY-STATE
                           END-IF
                       END-PERFORM
                   END-IF
               END-IF
           END-PERFORM
           .

       3310-EXTRACT-CITY-STATE.
      *    WS-SCAN-POS points to the state code.
      *    Walk backward to find the start of the city name:
      *    a double space marks the boundary between the
      *    merchant name and the city.
           COMPUTE WS-REF-START = WS-SCAN-POS - 2
           PERFORM VARYING WS-REF-START
               FROM WS-REF-START BY -1
               UNTIL WS-REF-START < 2
               IF WS-WORK-DESC(WS-REF-START - 1:2) = SPACES
      *            Found double space -- city starts after it
                   ADD 1 TO WS-REF-START
                   EXIT PERFORM
               END-IF
           END-PERFORM

      *    No double space (single-spaced description): fall
      *    back to the one word just before the state code.
      *    Multi-word cities lose their leading words here.
           IF WS-REF-START < 2
               COMPUTE WS-REF-START = WS-SCAN-POS - 2
               PERFORM VARYING WS-REF-START
                   FROM WS-REF-START BY -1
                   UNTIL WS-REF-START < 2
                   IF WS-WORK-DESC(WS-REF-START - 1:1) = SPACE
                       EXIT PERFORM
                   END-IF
               END-PERFORM
           END-IF

      *    Extract the location substring
           COMPUTE WS-REF-LEN =
               WS-SCAN-POS + 2 - WS-REF-START
           IF WS-REF-LEN > 0 AND WS-REF-LEN <= 20
               MOVE WS-WORK-DESC(WS-REF-START:WS-REF-LEN)
                   TO WS-MERCHANT-LOC
      *        Blank out the location from working description
               MOVE SPACES TO
                   WS-WORK-DESC(WS-REF-START:WS-REF-LEN)
           END-IF
           .

       3400-EXTRACT-MERCHANT-NAME.
      *    -------------------------------------------------------
      *    Whatever remains in the working description after
      *    removing the reference and location is the merchant
      *    name. Uppercase it, then collapse runs of spaces.
      *    -------------------------------------------------------
      *    Convert to uppercase
           MOVE FUNCTION UPPER-CASE(WS-WORK-DESC)
               TO WS-WORK-DESC

      *    INSPECT REPLACING cannot collapse runs of spaces
      *    because the replacement must be the same length as
      *    the text it replaces. The UNSTRING and STRING
      *    statements below do the collapsing instead.

      *    Split the remaining text into at most three words.
      *    TALLYING IN adds to its counter, so zero it first.
           MOVE SPACES TO WS-MERCHANT-NAME
           MOVE 1 TO WS-UNSTR-PTR
           MOVE ZERO TO WS-UNSTR-TALLY
           UNSTRING WS-WORK-DESC
               DELIMITED BY ALL SPACES
               INTO WS-PART-1
                    WS-PART-2
                    WS-PART-3
               WITH POINTER WS-UNSTR-PTR
               TALLYING IN WS-UNSTR-TALLY
           END-UNSTRING

      *    Reassemble with single spaces between words
           MOVE SPACES TO WS-MERCHANT-NAME
           MOVE 1 TO WS-UNSTR-PTR
           STRING WS-PART-1 DELIMITED BY "  "
                  " "       DELIMITED BY SIZE
                  WS-PART-2 DELIMITED BY "  "
                  " "       DELIMITED BY SIZE
                  WS-PART-3 DELIMITED BY "  "
             INTO WS-MERCHANT-NAME
             WITH POINTER WS-UNSTR-PTR
             ON OVERFLOW
                 CONTINUE
           END-STRING
           .

       4000-BUILD-OUTPUT.
      *    -------------------------------------------------------
      *    Assemble the output record from parsed fields.
      *    Determine parse quality status.
      *    -------------------------------------------------------
           MOVE FS-IN-ACCOUNT     TO FS-OUT-ACCOUNT
           MOVE FS-IN-TRANS-DATE  TO FS-OUT-TRANS-DATE
           MOVE FS-IN-AMOUNT      TO FS-OUT-AMOUNT
           MOVE FS-IN-DESCRIPTION TO FS-OUT-ORIG-DESC
           MOVE WS-MERCHANT-NAME  TO FS-OUT-MERCHANT-NAME
           MOVE WS-MERCHANT-LOC   TO FS-OUT-MERCHANT-LOC
           MOVE WS-REFERENCE      TO FS-OUT-REFERENCE

      *    Determine parse status
           IF WS-MERCHANT-NAME NOT = SPACES
               IF WS-MERCHANT-LOC NOT = SPACES
                   MOVE "F" TO FS-OUT-PARSE-STATUS
                   ADD 1 TO WS-TOTAL-PARSED
               ELSE
                   MOVE "P" TO FS-OUT-PARSE-STATUS
                   ADD 1 TO WS-TOTAL-PARTIAL
               END-IF
           ELSE
               MOVE "X" TO FS-OUT-PARSE-STATUS
               ADD 1 TO WS-TOTAL-FAILED
           END-IF
           .

       4100-WRITE-OUTPUT.
           WRITE FS-OUTPUT-RECORD
           IF OUTPUT-OK
               ADD 1 TO WS-TOTAL-WRITTEN
           ELSE
               DISPLAY "ERROR: Write failed. Status: "
                       WS-OUTPUT-STATUS
           END-IF
           .

       8000-DISPLAY-STATISTICS.
           DISPLAY " "
           DISPLAY "========================================"
           DISPLAY " PARSING STATISTICS"
           DISPLAY "========================================"
           MOVE WS-TOTAL-READ TO WS-DISP-COUNT
           DISPLAY "  Records read:       " WS-DISP-COUNT
           MOVE WS-TOTAL-PARSED TO WS-DISP-COUNT
           DISPLAY "  Fully parsed:       " WS-DISP-COUNT
           MOVE WS-TOTAL-PARTIAL TO WS-DISP-COUNT
           DISPLAY "  Partially parsed:   " WS-DISP-COUNT
           MOVE WS-TOTAL-FAILED TO WS-DISP-COUNT
           DISPLAY "  Failed to parse:    " WS-DISP-COUNT
           MOVE WS-TOTAL-WRITTEN TO WS-DISP-COUNT
           DISPLAY "  Records written:    " WS-DISP-COUNT
           DISPLAY "========================================"
           .

       9000-FINALIZE.
           CLOSE TRANS-INPUT-FILE
                 PARSED-OUTPUT-FILE
           .

Solution Walkthrough

Pass 1: Format Analysis with INSPECT TALLYING

The first step uses INSPECT TALLYING to count delimiter characters without modifying the source string. This non-destructive analysis determines which parsing patterns to apply:

           INSPECT WS-WORK-DESC
               TALLYING WS-ASTERISK-COUNT FOR ALL "*"
                        WS-HASH-COUNT     FOR ALL "#"
                        WS-SLASH-COUNT    FOR ALL "/"

A single INSPECT statement counts three different characters simultaneously. If WS-HASH-COUNT > 0, the description likely contains a "REF#" or "CONF#" pattern. If WS-ASTERISK-COUNT > 0, it may be a Square, PayPal, or Amazon-format description with prefix delimiters.
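
As a stand-alone illustration of this analysis step, the sketch below tallies two delimiter characters in one sample description and branches on the counts. The program name TALYDEMO and its data names are hypothetical and not part of TXNPARSE; the sample text is one of the descriptions from the Background table.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. TALYDEMO.
      * Hypothetical demo program; all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DESC            PIC X(60) VALUE
           "SQ *BELLA ROSA CAFE  CHICAGO IL".
       01  WS-STAR-COUNT      PIC 9(3).
       01  WS-HASH-COUNT      PIC 9(3).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    TALLYING adds to whatever the counters already hold,
      *    so they must be zeroed before every INSPECT.
           MOVE ZERO TO WS-STAR-COUNT WS-HASH-COUNT
           INSPECT WS-DESC
               TALLYING WS-STAR-COUNT FOR ALL "*"
                        WS-HASH-COUNT FOR ALL "#"
           IF WS-STAR-COUNT > 0
               DISPLAY "PREFIX-STYLE DESCRIPTION"
           END-IF
           IF WS-HASH-COUNT = 0
               DISPLAY "NO REF# OR CONF# KEYWORD"
           END-IF
           STOP RUN.
```

For this input both messages are displayed: the description contains one asterisk and no hash sign.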

Pass 2: Prefix Removal with Reference Modification

Known prefixes like "SQ *" and "PAYPAL *" are stored in a table rather than hard-coded in IF statements. The removal logic uses reference modification to compare and shift:

           IF WS-WORK-DESC(1:WS-PREFIX-LEN) =
              WS-PREFIX-ENTRY(WS-PREFIX-IDX)(1:WS-PREFIX-LEN)
               MOVE WS-WORK-DESC(WS-PREFIX-LEN + 1:WS-SCAN-LEN)
                   TO WS-WORK-DESC
           END-IF

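The expression WS-WORK-DESC(1:WS-PREFIX-LEN) extracts exactly WS-PREFIX-LEN characters from position 1 -- a substring comparison without using UNSTRING. When a match is found, the description is shifted left by moving the portion after the prefix to the beginning of the field. Because the source and target of such a shift overlap, and the COBOL standard leaves the result of an overlapping MOVE undefined, portable code stages the substring in a separate field before copying it back.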
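
The compare-and-shift idea can be tried in isolation with a sketch like the following. It handles a single prefix rather than a table, and it copies the tail through a separate buffer so that the MOVE never reads and writes the same storage. PFXDEMO and WS-SHIFT-BUF are hypothetical names that do not appear in TXNPARSE.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. PFXDEMO.
      * Hypothetical demo; WS-SHIFT-BUF is not in TXNPARSE.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DESC            PIC X(60) VALUE
           "PAYPAL *NETFLIX.COM 402-935-7733 CA".
       01  WS-PREFIX          PIC X(15) VALUE "PAYPAL *".
       01  WS-PREFIX-LEN      PIC 9(2).
       01  WS-SHIFT-BUF       PIC X(60).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Prefix length = characters before the space padding.
           MOVE ZERO TO WS-PREFIX-LEN
           INSPECT WS-PREFIX
               TALLYING WS-PREFIX-LEN
               FOR CHARACTERS BEFORE INITIAL "  "
           IF WS-PREFIX-LEN > 0
               IF WS-DESC(1:WS-PREFIX-LEN) =
                  WS-PREFIX(1:WS-PREFIX-LEN)
      *            Copy the tail to a separate buffer first so
      *            source and target never overlap.
                   MOVE WS-DESC(WS-PREFIX-LEN + 1:)
                       TO WS-SHIFT-BUF
                   MOVE WS-SHIFT-BUF TO WS-DESC
               END-IF
           END-IF
           DISPLAY WS-DESC
           STOP RUN.
```

Omitting the length in WS-DESC(WS-PREFIX-LEN + 1:) takes everything from that position to the end of the field, and the MOVE pads the target with spaces, so the displayed description begins at NETFLIX.COM.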

Pass 3: Reference Number Extraction

The reference extraction scans the description using a sliding window implemented with reference modification:

           IF WS-WORK-DESC(WS-SCAN-POS:4) = "REF#"

This tests four characters starting at WS-SCAN-POS. When "REF#" is found, up to 15 characters after it are captured as the reference number, and the working description is blanked from the keyword through the end of the field so that none of it contaminates the merchant name extraction.
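
A minimal sketch of the sliding-window scan, reduced to the "REF#" keyword and one sample description, might look like this. REFDEMO and its data names are hypothetical; unlike the paragraph in TXNPARSE it leaves the scan with EXIT PERFORM after the first match.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. REFDEMO.
      * Hypothetical demo program; all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DESC            PIC X(60) VALUE
           "WALMART SUPERCTR 5274  DALLAS TX  REF#829471".
       01  WS-REFERENCE       PIC X(15) VALUE SPACES.
       01  WS-POS             PIC 9(3).
       01  WS-START           PIC 9(3).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Slide a 4-character window across the field. Stop at
      *    56 so at least one character can follow "REF#".
           PERFORM VARYING WS-POS FROM 1 BY 1
               UNTIL WS-POS > 56
               IF WS-DESC(WS-POS:4) = "REF#"
                   COMPUTE WS-START = WS-POS + 4
      *            MOVE truncates to the 15-byte target.
                   MOVE WS-DESC(WS-START:) TO WS-REFERENCE
      *            Blank the keyword and everything after it.
                   MOVE SPACES TO WS-DESC(WS-POS:)
                   EXIT PERFORM
               END-IF
           END-PERFORM
           DISPLAY "REFERENCE: " WS-REFERENCE
           DISPLAY "REMAINDER: " WS-DESC
           STOP RUN.
```

With this input the reference field receives 829471 and the remainder keeps only the merchant and location text.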

Pass 4: Location Extraction with State Code Validation

The location extraction demonstrates a right-to-left scan -- a technique that is harder to implement with UNSTRING (which always scans left to right) but natural with reference modification. The program scans backward through the description looking for a valid two-character US state code preceded by a space. When found, it walks further backward to find the city name.
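
The right-to-left scan can be sketched on its own as follows. This is an illustration under simplifying assumptions: LOCDEMO is a hypothetical program, its state list is a five-entry subset, and it compares three characters (the code plus the space that follows it) so that a word such as CAFE is not mistaken for California.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. LOCDEMO.
      * Hypothetical demo; the state list is a small subset.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DESC            PIC X(60) VALUE
           "BELLA ROSA CAFE  CHICAGO IL".
       01  WS-STATE-LIST      PIC X(15) VALUE "CA IL NY TX WA ".
       01  WS-STATE-TABLE REDEFINES WS-STATE-LIST.
           05  WS-STATE       PIC X(3) OCCURS 5 TIMES.
       01  WS-POS             PIC 9(3).
       01  WS-IDX             PIC 9(3).
       01  WS-STATE-POS       PIC 9(3) VALUE ZERO.
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Walk from the right edge toward the left. A candidate
      *    must follow a space and match a table entry, which
      *    is the two-letter code plus one trailing space.
           PERFORM VARYING WS-POS FROM 58 BY -1
               UNTIL WS-POS < 2 OR WS-STATE-POS > 0
               IF WS-DESC(WS-POS - 1:1) = SPACE
                   PERFORM VARYING WS-IDX FROM 1 BY 1
                       UNTIL WS-IDX > 5
                       IF WS-DESC(WS-POS:3) = WS-STATE(WS-IDX)
                           MOVE WS-POS TO WS-STATE-POS
                       END-IF
                   END-PERFORM
               END-IF
           END-PERFORM
           IF WS-STATE-POS > 0
               DISPLAY "STATE " WS-DESC(WS-STATE-POS:2)
                       " AT POSITION " WS-STATE-POS
           ELSE
               DISPLAY "NO STATE CODE FOUND"
           END-IF
           STOP RUN.
```

For the sample description this displays STATE IL AT POSITION 026. Saving the match position in a separate field matters because PERFORM VARYING decrements WS-POS once more before it tests the exit condition.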

Pass 5: Name Assembly with STRING

After removing the reference and location, the remaining text is the merchant name. The program uses UNSTRING to split it into words (delimited by spaces), then STRING to reassemble the words with exactly one space between them. This eliminates the multiple consecutive spaces that often appear in transaction descriptions.

           STRING WS-PART-1 DELIMITED BY "  "
                  " "       DELIMITED BY SIZE
                  WS-PART-2 DELIMITED BY "  "
                  " "       DELIMITED BY SIZE
                  WS-PART-3 DELIMITED BY "  "
             INTO WS-MERCHANT-NAME

The DELIMITED BY "  " (double space) stops each part at its first double space, effectively trimming trailing blanks. The literal " " with DELIMITED BY SIZE inserts exactly one space between parts.
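
The split-and-reassemble technique can be exercised with a short stand-alone sketch. NAMEDEMO and its data names are hypothetical. One deliberate variation from the excerpt above: because each receiving field holds a single word, the sketch uses DELIMITED BY SPACE, which also copes with a word that leaves fewer than two trailing blanks in its field.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NAMEDEMO.
      * Hypothetical demo program; all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RAW             PIC X(60) VALUE
           "BELLA    ROSA  CAFE".
       01  WS-WORDS.
           05  WS-WORD-1      PIC X(30).
           05  WS-WORD-2      PIC X(20).
           05  WS-WORD-3      PIC X(20).
       01  WS-NAME            PIC X(30).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Clear the receivers: UNSTRING leaves unused ones
      *    untouched.
           MOVE SPACES TO WS-WORDS WS-NAME
      *    ALL SPACES treats each run of blanks as one delimiter.
           UNSTRING WS-RAW DELIMITED BY ALL SPACES
               INTO WS-WORD-1 WS-WORD-2 WS-WORD-3
           END-UNSTRING
      *    Each receiver holds one word plus padding, so
      *    DELIMITED BY SPACE copies just the word.
           STRING WS-WORD-1 DELIMITED BY SPACE
                  " "       DELIMITED BY SIZE
                  WS-WORD-2 DELIMITED BY SPACE
                  " "       DELIMITED BY SIZE
                  WS-WORD-3 DELIMITED BY SPACE
               INTO WS-NAME
           END-STRING
           DISPLAY "[" WS-NAME "]"
           STOP RUN.
```

The brackets make the result visible: BELLA ROSA CAFE with single spaces, followed by the padding that fills the 30-character field.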


Lessons Learned

1. INSPECT Is the Swiss Army Knife of COBOL String Analysis

INSPECT TALLYING counts characters without modifying the source, making it safe for preliminary analysis. INSPECT REPLACING modifies characters in place, ideal for normalization. Using both together -- first to analyze, then to transform -- is a powerful pattern.
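
The analyze-then-transform pattern might be sketched as below. NORMDEMO, its data names, and the lowercase sample text are hypothetical. The sketch also shows a constraint that is easy to overlook: every INSPECT REPLACING or CONVERTING substitution is the same length as the text it replaces, so INSPECT can change characters but can never shorten a field or collapse a run of spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NORMDEMO.
      * Hypothetical demo program; all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-DESC            PIC X(40) VALUE
           "amzn.com/bill*wa".
       01  WS-PUNCT-COUNT     PIC 9(3).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Analyze first: TALLYING leaves WS-DESC unchanged.
           MOVE ZERO TO WS-PUNCT-COUNT
           INSPECT WS-DESC
               TALLYING WS-PUNCT-COUNT FOR ALL "*" ALL "/"
      *    Then transform. Each replacement is exactly as long
      *    as the text it replaces, so the field never shrinks.
           IF WS-PUNCT-COUNT > 0
               INSPECT WS-DESC
                   REPLACING ALL "*" BY SPACE
                             ALL "/" BY SPACE
           END-IF
           INSPECT WS-DESC
               CONVERTING "abcdefghijklmnopqrstuvwxyz"
                       TO "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
           DISPLAY WS-PUNCT-COUNT " " WS-DESC
           STOP RUN.
```

The display line starts with 002 AMZN.COM BILL WA: two punctuation characters were counted, each was replaced by one space, and the letters were converted to uppercase in place.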

2. Reference Modification Fills Gaps That UNSTRING Cannot

UNSTRING scans left to right and requires a known delimiter. When you need to scan right to left, test for a pattern at a specific position, or extract a substring of computed length, reference modification is the right tool. The state-code scan in this program could not have been written with UNSTRING alone.

3. Multi-Pass Parsing Is More Maintainable Than Single-Pass

By separating prefix removal, reference extraction, location extraction, and name assembly into distinct passes, each pass can be understood, tested, and modified independently. A single-pass approach that tried to extract all fields simultaneously would be far more complex and fragile.

4. Table-Driven Prefix Removal Scales Better Than Hard-Coded IF Chains

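Storing the known prefixes in a table means adding a new prefix requires only a new table entry plus updated WS-PREFIX-COUNT and OCCURS values; the matching logic in 3100-REMOVE-PREFIXES stays untouched. Because the table lives in WORKING-STORAGE, the program must still be recompiled, but the change carries far less risk than extending a chain of hard-coded IF statements, where every new prefix format means new procedural code to write and test.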

5. STRING's ON OVERFLOW Is Essential for Variable-Length Assembly

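When assembling a merchant name from multiple parts, the total length may exceed the 30-character target field. STRING stops transferring characters as soon as the target is full, so whatever portion of the name fits is preserved and the program does not abend, with or without the clause. What ON OVERFLOW adds is a place to react: TXNPARSE simply codes CONTINUE to accept the truncation, but the same hook could set a flag or count the truncated names.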
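
A small sketch shows ON OVERFLOW used as a detection hook rather than a bare CONTINUE. OVFLDEMO, the 10-character target, and the WS-TRUNCATED flag are hypothetical choices made to force an overflow with short literals.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. OVFLDEMO.
      * Hypothetical demo program; all names are illustrative.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-NAME            PIC X(10).
       01  WS-PTR             PIC 9(3).
       01  WS-TRUNCATED       PIC X VALUE "N".
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    STRING does not clear its target or reset its
      *    pointer, so both are initialized here.
           MOVE SPACES TO WS-NAME
           MOVE 1 TO WS-PTR
      *    16 characters are offered to a 10-character target.
           STRING "WALMART"  DELIMITED BY SIZE
                  " "        DELIMITED BY SIZE
                  "SUPERCTR" DELIMITED BY SIZE
               INTO WS-NAME
               WITH POINTER WS-PTR
               ON OVERFLOW
                   MOVE "Y" TO WS-TRUNCATED
           END-STRING
           DISPLAY "[" WS-NAME "] TRUNCATED=" WS-TRUNCATED
           STOP RUN.
```

This displays [WALMART SU] TRUNCATED=Y: the first ten characters are kept, and the flag records that the rest did not fit.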


Discussion Questions

  1. The program uses PERFORM VARYING to scan character by character through the description. COBOL does not have a built-in "indexOf" function. How does reference modification compensate for this? What would the performance impact be on 3.2 million daily transactions?

  2. The state code validation uses a table of 50 entries searched sequentially. How would you optimize this for performance? Could you use SEARCH ALL, and if so, what changes to the table definition would be required?

  3. The parser handles "REF#" and "CONF#" patterns but not all possible reference number formats. How would you extend it to detect reference numbers that follow no keyword pattern (such as alphanumeric codes embedded after asterisks)? What is the risk of false positives?

  4. The UNSTRING statement uses DELIMITED BY ALL SPACES to split the merchant name into words. What is the difference between DELIMITED BY SPACE and DELIMITED BY ALL SPACES? What would happen to the output if the wrong form were used?

  5. The program blanks out extracted sections of the working description to prevent them from appearing in the merchant name. What would happen if the extractions were done in a different order (for example, merchant name first, then location)? Why does order matter?

  6. The output record includes both the original description and the parsed fields. Why is this important for a production system? How would this support a reconciliation or audit process?

  7. The STRING statement's POINTER phrase is initialized to 1 before assembly. What would happen if the programmer forgot to initialize it? How does the WITH POINTER clause interact with multiple STRING statements in sequence?