Case Study 2: Name and Address Standardization at Heritage Life Insurance

Background

Heritage Life Insurance Company services 4.8 million policyholders across the United States. Their policyholder master file, maintained on an IBM z/OS mainframe, has accumulated data quality problems over three decades of manual entry, optical character recognition (OCR) ingestion, and batch merges from acquired companies. The same customer might appear as "JOHN Q. SMITH", "Smith, John Q", "JOHN QUINCY SMITH", or "J Q SMITH" depending on which data entry operator handled the original application.

In 2024, Heritage Life was ordered by their state insurance regulator to perform a comprehensive policyholder reconciliation. This required matching records across multiple systems -- and matching requires standardized names and addresses. A customer entered as "Robert J. McTavish Jr." in the policy system and "MCTAVISH, ROBERT J JR" in the claims system would not match unless both were normalized to a common format.

Priya Sharma, a senior COBOL batch developer, was assigned to build the Name and Address Standardization Program (NASP). The program would read the policyholder file, parse each name and address into component fields, normalize the components using consistent rules, and write a standardized output file suitable for matching.


The Problem

Priya cataloged the data quality issues in the policyholder file:

Name Field Problems

The name field is a single 50-character field (PIC X(50)) with no structure. Names appear in multiple formats:

| Input Name | Format Type |
|---|---|
| JOHN Q SMITH | First Middle-Initial Last |
| SMITH, JOHN Q | Last, First Middle-Initial |
| Smith, John Quincy | Last, First Middle (mixed case) |
| DR. MARIA L. GONZALEZ-REYES | Prefix, First, MI, Hyphenated Last |
| JAMES MCALLISTER III | First, Last with Mc prefix, Suffix |
| PATRICIA ANN O'BRIEN | First, Middle, Last with apostrophe |
| MRS ALICE B WONDERLAND-JONES JR | Prefix, First, MI, Hyphenated, Suffix |

Address Field Problems

The address occupies three lines of 30 characters each:

| Field | Example Problems |
|---|---|
| Address Line 1 | "123 N. Main St.", "123 NORTH MAIN STREET", "123 N MAIN ST" |
| Address Line 2 | "Apt. 4B", "APT 4B", "#4B", "UNIT 4-B", or blank |
| City-State-ZIP | "SPRINGFIELD, IL 62704", "Springfield IL 62704", "SPRINGFIELD,IL62704" |

Standardization Rules

The regulator specified these normalization requirements:

  1. All output must be uppercase
  2. Names must be parsed into: Prefix, First, Middle, Last, Suffix
  3. Prefixes (MR, MRS, MS, DR, REV, HON) must be removed but recorded
  4. Suffixes (JR, SR, II, III, IV, ESQ, MD, PHD) must be separated from the last name
  5. Street abbreviations must be expanded (ST -> STREET, AVE -> AVENUE, etc.)
  6. Directional abbreviations must be standardized (N -> NORTH, S -> SOUTH, etc.)
  7. Periods must be removed from abbreviations
  8. Multiple spaces must be collapsed to single spaces
  9. Leading and trailing spaces must be removed
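
Rules 1, 7, 8, and 9 can be tried in isolation before reading the full program. The following is a minimal sketch, not part of NASP: the program name NORMDEMO, the sample value, and every data name in it are hypothetical. It uppercases the value, turns periods into spaces, and then rebuilds the field one token at a time so that exactly one space separates the tokens.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. NORMDEMO.
      * Hypothetical demo, not part of NASP: applies rules 1, 7,
      * 8 and 9 to one hard-coded sample value.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-RAW            PIC X(30)
                             VALUE "  123  n.  Main   st. ".
       01  WS-WORK           PIC X(30).
       01  WS-CLEAN          PIC X(30) VALUE SPACES.
       01  WS-TOKEN          PIC X(30).
       01  WS-IN-PTR         PIC 9(3).
       01  WS-OUT-PTR        PIC 9(3).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Rule 1: uppercase.  Rule 7: periods become spaces.
           MOVE FUNCTION UPPER-CASE(WS-RAW) TO WS-WORK
           INSPECT WS-WORK REPLACING ALL "." BY " "
      *    Rules 8 and 9: pull out one token per UNSTRING and
      *    rejoin the tokens with a single space between them.
           MOVE 1 TO WS-IN-PTR
           MOVE 1 TO WS-OUT-PTR
           PERFORM UNTIL WS-IN-PTR > 30
               MOVE SPACES TO WS-TOKEN
               UNSTRING WS-WORK DELIMITED BY ALL SPACES
                   INTO WS-TOKEN
                   WITH POINTER WS-IN-PTR
               END-UNSTRING
      *        Leading spaces yield one empty token; skip it.
               IF WS-TOKEN NOT = SPACES
                   IF WS-OUT-PTR > 1
                       STRING " " DELIMITED BY SIZE
                           INTO WS-CLEAN
                           WITH POINTER WS-OUT-PTR
                       END-STRING
                   END-IF
                   STRING WS-TOKEN DELIMITED BY SPACE
                       INTO WS-CLEAN
                       WITH POINTER WS-OUT-PTR
                   END-STRING
               END-IF
           END-PERFORM
           DISPLAY "[" WS-CLEAN "]"
           STOP RUN.
```

For the sample value the tokens are 123, N, MAIN, and ST, so WS-CLEAN ends up holding them left-justified with single spaces between them. The brackets in the DISPLAY make the trailing padding visible.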

The Solution

       IDENTIFICATION DIVISION.
       PROGRAM-ID.  NASP.
       AUTHOR.      PRIYA SHARMA.
       DATE-WRITTEN. 2024-10-05.
      *================================================================
      * PROGRAM:  NASP - NAME AND ADDRESS STANDARDIZATION
      * PURPOSE:  Parse and normalize policyholder names and
      *           addresses for matching across systems.
      *           Demonstrates UNSTRING, STRING, INSPECT,
      *           FUNCTION UPPER-CASE, and reference modification.
      *================================================================

       ENVIRONMENT DIVISION.

       INPUT-OUTPUT SECTION.
       FILE-CONTROL.
           SELECT POLICY-INPUT-FILE
               ASSIGN TO "POLICYIN"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-INPUT-STATUS.

           SELECT STANDARD-OUTPUT-FILE
               ASSIGN TO "STDRDOUT"
               ORGANIZATION IS SEQUENTIAL
               FILE STATUS IS WS-OUTPUT-STATUS.

       DATA DIVISION.

       FILE SECTION.
       FD  POLICY-INPUT-FILE
           RECORDING MODE IS F
           RECORD CONTAINS 200 CHARACTERS.
       01  FS-INPUT-RECORD.
           05  FS-IN-POLICY-NO        PIC X(10).
           05  FS-IN-NAME             PIC X(50).
           05  FS-IN-ADDR-LINE1       PIC X(30).
           05  FS-IN-ADDR-LINE2       PIC X(30).
           05  FS-IN-CITY-ST-ZIP      PIC X(30).
           05  FS-IN-PHONE            PIC X(10).
           05  FS-IN-DOB              PIC 9(8).
           05  FILLER                 PIC X(32).

       FD  STANDARD-OUTPUT-FILE
           RECORDING MODE IS F
           RECORD CONTAINS 250 CHARACTERS.
       01  FS-OUTPUT-RECORD.
           05  FS-OUT-POLICY-NO       PIC X(10).
           05  FS-OUT-NAME-PREFIX     PIC X(4).
           05  FS-OUT-FIRST-NAME      PIC X(20).
           05  FS-OUT-MIDDLE-NAME     PIC X(20).
           05  FS-OUT-LAST-NAME       PIC X(25).
           05  FS-OUT-NAME-SUFFIX     PIC X(5).
           05  FS-OUT-ADDR-NUMBER     PIC X(10).
           05  FS-OUT-ADDR-STREET     PIC X(30).
           05  FS-OUT-ADDR-UNIT       PIC X(10).
           05  FS-OUT-CITY            PIC X(20).
           05  FS-OUT-STATE           PIC X(2).
           05  FS-OUT-ZIP             PIC X(10).
           05  FS-OUT-PHONE           PIC X(10).
           05  FS-OUT-DOB             PIC 9(8).
           05  FS-OUT-STD-NAME-KEY    PIC X(40).
           05  FS-OUT-STD-STATUS      PIC X(1).
           05  FILLER                 PIC X(25).

       WORKING-STORAGE SECTION.

      *----------------------------------------------------------------
      * FILE STATUS
      *----------------------------------------------------------------
       01  WS-INPUT-STATUS            PIC X(2).
           88  INPUT-OK                          VALUE "00".
           88  INPUT-EOF                         VALUE "10".
       01  WS-OUTPUT-STATUS           PIC X(2).
           88  OUTPUT-OK                         VALUE "00".

      *----------------------------------------------------------------
      * NAME PARSING WORK FIELDS
      *----------------------------------------------------------------
       01  WS-NAME-WORK               PIC X(50).
       01  WS-NAME-UPPER              PIC X(50).
       01  WS-NAME-PARTS.
           05  WS-WORD                PIC X(25)
                                      OCCURS 8 TIMES.
       01  WS-WORD-COUNT              PIC 9(2).
       01  WS-UNSTR-PTR              PIC 9(3).
       01  WS-STR-PTR                PIC 9(3).
       01  WS-COMMA-FOUND            PIC X(1).
           88  HAS-COMMA                         VALUE 'Y'.
           88  NO-COMMA                          VALUE 'N'.

      *----------------------------------------------------------------
      * PARSED NAME COMPONENTS
      *----------------------------------------------------------------
       01  WS-PARSED-NAME.
           05  WS-P-PREFIX            PIC X(4).
           05  WS-P-FIRST             PIC X(20).
           05  WS-P-MIDDLE            PIC X(20).
           05  WS-P-LAST              PIC X(25).
           05  WS-P-SUFFIX            PIC X(5).

      *----------------------------------------------------------------
      * PREFIX AND SUFFIX TABLES
      *----------------------------------------------------------------
       01  WS-PREFIX-TABLE-DATA.
           05  FILLER PIC X(4) VALUE "MR  ".
           05  FILLER PIC X(4) VALUE "MRS ".
           05  FILLER PIC X(4) VALUE "MS  ".
           05  FILLER PIC X(4) VALUE "DR  ".
           05  FILLER PIC X(4) VALUE "REV ".
           05  FILLER PIC X(4) VALUE "HON ".
       01  WS-PREFIX-TABLE REDEFINES WS-PREFIX-TABLE-DATA.
           05  WS-KNOWN-PREFIX        PIC X(4)
                                      OCCURS 6 TIMES.
       01  WS-PFX-IDX                PIC 9(2).

       01  WS-SUFFIX-TABLE-DATA.
           05  FILLER PIC X(5) VALUE "JR   ".
           05  FILLER PIC X(5) VALUE "SR   ".
           05  FILLER PIC X(5) VALUE "II   ".
           05  FILLER PIC X(5) VALUE "III  ".
           05  FILLER PIC X(5) VALUE "IV   ".
           05  FILLER PIC X(5) VALUE "ESQ  ".
           05  FILLER PIC X(5) VALUE "MD   ".
           05  FILLER PIC X(5) VALUE "PHD  ".
       01  WS-SUFFIX-TABLE REDEFINES WS-SUFFIX-TABLE-DATA.
           05  WS-KNOWN-SUFFIX       PIC X(5)
                                      OCCURS 8 TIMES.
       01  WS-SFX-IDX                PIC 9(2).

      *----------------------------------------------------------------
      * ADDRESS PARSING WORK FIELDS
      *----------------------------------------------------------------
       01  WS-ADDR-WORK               PIC X(30).
       01  WS-ADDR-PARTS.
           05  WS-ADDR-WORD           PIC X(15)
                                      OCCURS 6 TIMES.
       01  WS-ADDR-WORD-COUNT        PIC 9(2).

      *----------------------------------------------------------------
      * STREET ABBREVIATION EXPANSION TABLE
      * Format: 5-char abbreviation + 15-char expansion
      *----------------------------------------------------------------
       01  WS-STREET-ABBREV-DATA.
           05  FILLER PIC X(20) VALUE "ST   STREET         ".
           05  FILLER PIC X(20) VALUE "AVE  AVENUE         ".
           05  FILLER PIC X(20) VALUE "BLVD BOULEVARD      ".
           05  FILLER PIC X(20) VALUE "DR   DRIVE          ".
           05  FILLER PIC X(20) VALUE "LN   LANE           ".
           05  FILLER PIC X(20) VALUE "RD   ROAD           ".
           05  FILLER PIC X(20) VALUE "CT   COURT          ".
           05  FILLER PIC X(20) VALUE "PL   PLACE          ".
           05  FILLER PIC X(20) VALUE "CIR  CIRCLE         ".
           05  FILLER PIC X(20) VALUE "PKY  PARKWAY        ".
           05  FILLER PIC X(20) VALUE "PKWY PARKWAY        ".
           05  FILLER PIC X(20) VALUE "HWY  HIGHWAY        ".
       01  WS-STREET-ABBREV-TABLE
               REDEFINES WS-STREET-ABBREV-DATA.
           05  WS-STREET-ENTRY        OCCURS 12 TIMES.
               10  WS-ABBREV          PIC X(5).
               10  WS-EXPANSION       PIC X(15).

      *----------------------------------------------------------------
      * DIRECTIONAL ABBREVIATION TABLE
      *----------------------------------------------------------------
       01  WS-DIR-ABBREV-DATA.
           05  FILLER PIC X(10) VALUE "N    NORTH".
           05  FILLER PIC X(10) VALUE "S    SOUTH".
           05  FILLER PIC X(10) VALUE "E    EAST ".
           05  FILLER PIC X(10) VALUE "W    WEST ".
           05  FILLER PIC X(10) VALUE "NE   NE   ".
           05  FILLER PIC X(10) VALUE "NW   NW   ".
           05  FILLER PIC X(10) VALUE "SE   SE   ".
           05  FILLER PIC X(10) VALUE "SW   SW   ".
       01  WS-DIR-TABLE REDEFINES WS-DIR-ABBREV-DATA.
           05  WS-DIR-ENTRY           OCCURS 8 TIMES.
               10  WS-DIR-ABBREV      PIC X(5).
               10  WS-DIR-EXPAND      PIC X(5).

      *----------------------------------------------------------------
      * CITY-STATE-ZIP PARSING
      *----------------------------------------------------------------
       01  WS-CSZ-WORK                PIC X(30).
       01  WS-CSZ-CITY               PIC X(20).
       01  WS-CSZ-STATE              PIC X(2).
       01  WS-CSZ-ZIP                PIC X(10).

      *----------------------------------------------------------------
      * GENERAL WORK FIELDS
      *----------------------------------------------------------------
       01  WS-IDX                     PIC 9(3).
       01  WS-IDX-2                   PIC 9(3).
       01  WS-TEMP-WORD              PIC X(25).
       01  WS-SCAN-POS               PIC 9(3).
       01  WS-TRIM-RESULT            PIC X(50).
       01  WS-IS-PREFIX              PIC X(1).
           88  WORD-IS-PREFIX                    VALUE 'Y'.
           88  WORD-NOT-PREFIX                   VALUE 'N'.
       01  WS-IS-SUFFIX              PIC X(1).
           88  WORD-IS-SUFFIX                    VALUE 'Y'.
           88  WORD-NOT-SUFFIX                   VALUE 'N'.

      *----------------------------------------------------------------
      * COUNTERS
      *----------------------------------------------------------------
       01  WS-COUNTERS.
           05  WS-TOTAL-READ         PIC S9(7) COMP-3 VALUE 0.
           05  WS-TOTAL-WRITTEN      PIC S9(7) COMP-3 VALUE 0.
           05  WS-NAMES-STANDARDIZED PIC S9(7) COMP-3 VALUE 0.
           05  WS-ADDRS-STANDARDIZED PIC S9(7) COMP-3 VALUE 0.
           05  WS-PARSE-ERRORS       PIC S9(7) COMP-3 VALUE 0.
       01  WS-DISP-COUNT             PIC Z,ZZZ,ZZ9.

       PROCEDURE DIVISION.

       0000-MAIN-CONTROL.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-PROCESS-RECORDS
               UNTIL INPUT-EOF
           PERFORM 8000-PRINT-STATISTICS
           PERFORM 9000-FINALIZE
           STOP RUN
           .

       1000-INITIALIZE.
           DISPLAY "============================================="
           DISPLAY " HERITAGE LIFE INSURANCE COMPANY"
           DISPLAY " NAME AND ADDRESS STANDARDIZATION"
           DISPLAY "============================================="

           OPEN INPUT  POLICY-INPUT-FILE
                OUTPUT STANDARD-OUTPUT-FILE

           IF NOT INPUT-OK
               DISPLAY "ERROR: Cannot open input. Status: "
                       WS-INPUT-STATUS
               STOP RUN
           END-IF
           IF NOT OUTPUT-OK
               DISPLAY "ERROR: Cannot open output. Status: "
                       WS-OUTPUT-STATUS
               STOP RUN
           END-IF

           PERFORM 2100-READ-INPUT
           .

       2000-PROCESS-RECORDS.
           ADD 1 TO WS-TOTAL-READ
      *    INITIALIZE skips FILLER, so clear the whole record
           MOVE SPACES TO FS-OUTPUT-RECORD
           MOVE FS-IN-POLICY-NO TO FS-OUT-POLICY-NO
           MOVE FS-IN-PHONE     TO FS-OUT-PHONE
           MOVE FS-IN-DOB       TO FS-OUT-DOB

      *    Parse and standardize the name
           PERFORM 3000-STANDARDIZE-NAME

      *    Parse and standardize the address
           PERFORM 4000-STANDARDIZE-ADDRESS

      *    Build the matching key
           PERFORM 5000-BUILD-MATCH-KEY

      *    Write the output record
           PERFORM 6000-WRITE-OUTPUT

           PERFORM 2100-READ-INPUT
           .

       2100-READ-INPUT.
           READ POLICY-INPUT-FILE
               AT END SET INPUT-EOF TO TRUE
           END-READ
           .

       3000-STANDARDIZE-NAME.
      *    -------------------------------------------------------
      *    Step 1: Convert to uppercase and remove periods
      *    -------------------------------------------------------
           MOVE FUNCTION UPPER-CASE(FS-IN-NAME)
               TO WS-NAME-UPPER

      *    Remove all periods (DR. -> DR, J. -> J)
           INSPECT WS-NAME-UPPER
               REPLACING ALL "." BY " "

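      *    Close up apostrophes so O'BRIEN stays one word
      *    (O'BRIEN -> OBRIEN). INSPECT cannot shorten a field,
      *    so shift the tail left through WS-NAME-WORK instead.
           PERFORM VARYING WS-IDX FROM 1 BY 1
               UNTIL WS-IDX > 49
               IF WS-NAME-UPPER(WS-IDX:1) = "'"
                   MOVE WS-NAME-UPPER(WS-IDX + 1:)
                       TO WS-NAME-WORK
                   MOVE WS-NAME-WORK
                       TO WS-NAME-UPPER(WS-IDX:)
               END-IF
           END-PERFORM
      *    Blank any apostrophe the loop could not close up
           INSPECT WS-NAME-UPPER
               REPLACING ALL "'" BY " "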

      *    -------------------------------------------------------
      *    Step 2: Determine format (comma = Last, First)
      *    -------------------------------------------------------
           MOVE ZERO TO WS-SCAN-POS
           INSPECT WS-NAME-UPPER
               TALLYING WS-SCAN-POS FOR ALL ","
           IF WS-SCAN-POS > 0
               SET HAS-COMMA TO TRUE
           ELSE
               SET NO-COMMA TO TRUE
           END-IF

      *    Remove commas after detecting format
           INSPECT WS-NAME-UPPER
               REPLACING ALL "," BY " "

      *    -------------------------------------------------------
      *    Step 3: Split into individual words using UNSTRING
      *    -------------------------------------------------------
           INITIALIZE WS-NAME-PARTS
           MOVE ZERO TO WS-WORD-COUNT
           MOVE 1 TO WS-UNSTR-PTR

           UNSTRING WS-NAME-UPPER
               DELIMITED BY ALL SPACES
               INTO WS-WORD(1) WS-WORD(2) WS-WORD(3)
                    WS-WORD(4) WS-WORD(5) WS-WORD(6)
                    WS-WORD(7) WS-WORD(8)
               WITH POINTER WS-UNSTR-PTR
               TALLYING IN WS-WORD-COUNT
           END-UNSTRING

      *    -------------------------------------------------------
      *    Step 4: Identify and extract prefix (first word)
      *    -------------------------------------------------------
           INITIALIZE WS-PARSED-NAME
           SET WORD-NOT-PREFIX TO TRUE

           IF WS-WORD-COUNT > 0
               PERFORM VARYING WS-PFX-IDX FROM 1 BY 1
                   UNTIL WS-PFX-IDX > 6
                       OR WORD-IS-PREFIX
      *            Whole-word comparison: the shorter operand is
      *            padded with spaces, so "MRS" cannot match the
      *            "MR" entry and "DREW" cannot match "DR"
                   IF WS-WORD(1) = WS-KNOWN-PREFIX(WS-PFX-IDX)
                       SET WORD-IS-PREFIX TO TRUE
                       MOVE WS-KNOWN-PREFIX(WS-PFX-IDX)
                           TO WS-P-PREFIX
                   END-IF
               END-PERFORM
           END-IF

      *    -------------------------------------------------------
      *    Step 5: Identify and extract suffix (last word)
      *    -------------------------------------------------------
           SET WORD-NOT-SUFFIX TO TRUE

           IF WS-WORD-COUNT > 1
               PERFORM VARYING WS-SFX-IDX FROM 1 BY 1
                   UNTIL WS-SFX-IDX > 8
                       OR WORD-IS-SUFFIX
      *            Whole-word comparison: "III" cannot match the
      *            "II" entry and "IVANOV" cannot match "IV"
                   IF WS-WORD(WS-WORD-COUNT)
                       = WS-KNOWN-SUFFIX(WS-SFX-IDX)
                       SET WORD-IS-SUFFIX TO TRUE
                       MOVE WS-KNOWN-SUFFIX(WS-SFX-IDX)
                           TO WS-P-SUFFIX
                   END-IF
               END-PERFORM
           END-IF

      *    -------------------------------------------------------
      *    Step 6: Assign remaining words based on format
      *    -------------------------------------------------------
      *    Calculate first and last name word positions
           MOVE 1 TO WS-IDX
           IF WORD-IS-PREFIX
               ADD 1 TO WS-IDX
           END-IF

           MOVE WS-WORD-COUNT TO WS-IDX-2
           IF WORD-IS-SUFFIX
               SUBTRACT 1 FROM WS-IDX-2
           END-IF

           IF HAS-COMMA
      *        Comma format: first word(s) = last name
      *        remaining words = first name, middle
               MOVE WS-WORD(WS-IDX) TO WS-P-LAST
               IF WS-IDX + 1 <= WS-IDX-2
                   MOVE WS-WORD(WS-IDX + 1) TO WS-P-FIRST
               END-IF
               IF WS-IDX + 2 <= WS-IDX-2
                   MOVE WS-WORD(WS-IDX + 2) TO WS-P-MIDDLE
               END-IF
           ELSE
      *        Standard format: first middle last
               IF WS-IDX <= WS-IDX-2
                   MOVE WS-WORD(WS-IDX) TO WS-P-FIRST
               END-IF
               IF WS-IDX-2 > WS-IDX
                   MOVE WS-WORD(WS-IDX-2) TO WS-P-LAST
               END-IF
               IF WS-IDX + 1 < WS-IDX-2
                   MOVE WS-WORD(WS-IDX + 1) TO WS-P-MIDDLE
               END-IF
               IF WS-IDX = WS-IDX-2
      *            Only one name word -- treat as last name.
      *            Move from the word itself: WS-P-FIRST is
      *            shorter than WS-P-LAST and may have truncated
                   MOVE WS-WORD(WS-IDX) TO WS-P-LAST
                   MOVE SPACES TO WS-P-FIRST
               END-IF
           END-IF

      *    Move parsed fields to output
           MOVE WS-P-PREFIX TO FS-OUT-NAME-PREFIX
           MOVE WS-P-FIRST  TO FS-OUT-FIRST-NAME
           MOVE WS-P-MIDDLE TO FS-OUT-MIDDLE-NAME
           MOVE WS-P-LAST   TO FS-OUT-LAST-NAME
           MOVE WS-P-SUFFIX TO FS-OUT-NAME-SUFFIX

           ADD 1 TO WS-NAMES-STANDARDIZED
           .

       4000-STANDARDIZE-ADDRESS.
      *    -------------------------------------------------------
      *    Step 1: Standardize Address Line 1
      *    -------------------------------------------------------
           MOVE FUNCTION UPPER-CASE(FS-IN-ADDR-LINE1)
               TO WS-ADDR-WORK

      *    Remove periods from abbreviations
           INSPECT WS-ADDR-WORK
               REPLACING ALL "." BY " "

      *    Split address line into words
           INITIALIZE WS-ADDR-PARTS
           MOVE ZERO TO WS-ADDR-WORD-COUNT
           MOVE 1 TO WS-UNSTR-PTR

           UNSTRING WS-ADDR-WORK
               DELIMITED BY ALL SPACES
               INTO WS-ADDR-WORD(1) WS-ADDR-WORD(2)
                    WS-ADDR-WORD(3) WS-ADDR-WORD(4)
                    WS-ADDR-WORD(5) WS-ADDR-WORD(6)
               WITH POINTER WS-UNSTR-PTR
               TALLYING IN WS-ADDR-WORD-COUNT
           END-UNSTRING

      *    First word is usually the street number
           IF WS-ADDR-WORD-COUNT > 0
               MOVE WS-ADDR-WORD(1) TO FS-OUT-ADDR-NUMBER
           END-IF

      *    Expand directional abbreviations (2nd word)
           IF WS-ADDR-WORD-COUNT > 1
               PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 8
                   IF FUNCTION TRIM(WS-ADDR-WORD(2)) =
                      FUNCTION TRIM(WS-DIR-ABBREV(WS-IDX))
                       MOVE WS-DIR-EXPAND(WS-IDX)
                           TO WS-ADDR-WORD(2)
                   END-IF
               END-PERFORM
           END-IF

      *    Expand street type abbreviation (last word)
           IF WS-ADDR-WORD-COUNT > 2
               PERFORM VARYING WS-IDX FROM 1 BY 1
                   UNTIL WS-IDX > 12
                   IF FUNCTION TRIM(
                       WS-ADDR-WORD(WS-ADDR-WORD-COUNT)) =
                      FUNCTION TRIM(WS-ABBREV(WS-IDX))
                       MOVE WS-EXPANSION(WS-IDX)
                           TO WS-ADDR-WORD(WS-ADDR-WORD-COUNT)
                   END-IF
               END-PERFORM
           END-IF

      *    Reassemble the street address (without house number)
           MOVE SPACES TO FS-OUT-ADDR-STREET
           MOVE 1 TO WS-STR-PTR
           PERFORM VARYING WS-IDX FROM 2 BY 1
               UNTIL WS-IDX > WS-ADDR-WORD-COUNT
               IF WS-IDX > 2
                   STRING " " DELIMITED BY SIZE
                       INTO FS-OUT-ADDR-STREET
                       WITH POINTER WS-STR-PTR
                   END-STRING
               END-IF
               STRING FUNCTION TRIM(WS-ADDR-WORD(WS-IDX))
                          DELIMITED BY SIZE
                   INTO FS-OUT-ADDR-STREET
                   WITH POINTER WS-STR-PTR
               END-STRING
           END-PERFORM

      *    -------------------------------------------------------
      *    Step 2: Standardize Address Line 2 (unit/apartment)
      *    -------------------------------------------------------
           IF FS-IN-ADDR-LINE2 NOT = SPACES
               MOVE FUNCTION UPPER-CASE(FS-IN-ADDR-LINE2)
                   TO WS-ADDR-WORK
               INSPECT WS-ADDR-WORK
                   REPLACING ALL "." BY " "
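      *        Blank out the unit designator. INSPECT cannot
      *        change the field length, so each replacement is
      *        exactly as long as its search string.
               INSPECT WS-ADDR-WORK
                   REPLACING ALL "SUITE" BY "     "
                             ALL "UNIT"  BY "    "
                             ALL "APT"   BY "   "
                             ALL "#"     BY " "
      *        Rebuild as "UNIT " plus the unit identifier.
      *        Assumes line 2 holds only a unit designator.
               STRING "UNIT " DELIMITED BY SIZE
                      FUNCTION TRIM(WS-ADDR-WORK)
                          DELIMITED BY SIZE
                   INTO FS-OUT-ADDR-UNIT
               END-STRING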
           END-IF

      *    -------------------------------------------------------
      *    Step 3: Parse City, State, ZIP
      *    -------------------------------------------------------
           PERFORM 4100-PARSE-CITY-STATE-ZIP

           ADD 1 TO WS-ADDRS-STANDARDIZED
           .

       4100-PARSE-CITY-STATE-ZIP.
      *    -------------------------------------------------------
      *    Parse the city-state-zip field which may appear as:
      *    "SPRINGFIELD, IL 62704"
      *    "SPRINGFIELD IL 62704"
      *    "SPRINGFIELD,IL62704"
      *    Scans right to left with reference modification.
      *    -------------------------------------------------------
           MOVE FUNCTION UPPER-CASE(FS-IN-CITY-ST-ZIP)
               TO WS-CSZ-WORK
           INITIALIZE WS-CSZ-CITY
           INITIALIZE WS-CSZ-STATE
           INITIALIZE WS-CSZ-ZIP

      *    Replace commas with spaces for uniform parsing
           INSPECT WS-CSZ-WORK
               REPLACING ALL "," BY " "

      *    ZIP is last, state is second-to-last, and the city
      *    is whatever remains. UNSTRING only scans left to
      *    right, so scan backward with reference modification:
      *    first the ZIP code, then the 2-character state code.

      *    Extract ZIP code (5 or 9 digits at end of field).
      *    The loop must keep running after the rightmost digit
      *    is found, so it ends only through EXIT PERFORM.
           MOVE ZERO TO WS-SCAN-POS
           PERFORM VARYING WS-IDX FROM 30 BY -1
               UNTIL WS-IDX < 1
               IF WS-CSZ-WORK(WS-IDX:1) >= "0"
                   AND WS-CSZ-WORK(WS-IDX:1) <= "9"
      *            Remember the position of the rightmost digit
                   IF WS-SCAN-POS = 0
                       MOVE WS-IDX TO WS-SCAN-POS
                   END-IF
               ELSE
                   IF WS-SCAN-POS > 0
      *                Found start of ZIP; extract and blank it
                       MOVE WS-CSZ-WORK(WS-IDX + 1:
                           WS-SCAN-POS - WS-IDX)
                           TO WS-CSZ-ZIP
                       MOVE SPACES TO
                           WS-CSZ-WORK(WS-IDX + 1:
                           WS-SCAN-POS - WS-IDX)
                       EXIT PERFORM
                   END-IF
               END-IF
           END-PERFORM

      *    Extract state code (2 uppercase letters)
           MOVE ZERO TO WS-SCAN-POS
           PERFORM VARYING WS-IDX FROM 28 BY -1
               UNTIL WS-IDX < 1 OR WS-SCAN-POS > 0
               IF WS-CSZ-WORK(WS-IDX:1) >= "A"
                   AND WS-CSZ-WORK(WS-IDX:1) <= "Z"
                   AND WS-CSZ-WORK(WS-IDX + 1:1) >= "A"
                   AND WS-CSZ-WORK(WS-IDX + 1:1) <= "Z"
                   AND (WS-IDX = 1
                       OR WS-CSZ-WORK(WS-IDX - 1:1) = SPACE)
                   MOVE WS-CSZ-WORK(WS-IDX:2) TO WS-CSZ-STATE
                   MOVE SPACES TO WS-CSZ-WORK(WS-IDX:2)
                   MOVE WS-IDX TO WS-SCAN-POS
               END-IF
           END-PERFORM

      *    Everything remaining is the city name
           MOVE FUNCTION TRIM(WS-CSZ-WORK)
               TO WS-CSZ-CITY

           MOVE WS-CSZ-CITY  TO FS-OUT-CITY
           MOVE WS-CSZ-STATE TO FS-OUT-STATE
           MOVE WS-CSZ-ZIP   TO FS-OUT-ZIP
           .

       5000-BUILD-MATCH-KEY.
      *    -------------------------------------------------------
      *    Build a pipe-delimited matching key in the
      *    40-character key field:
      *    LAST|FIRST-INITIAL|MIDDLE-INITIAL|ZIP|DOB
      *    With a 5-digit ZIP the fixed parts take 19
      *    characters, so a last name longer than 21
      *    characters truncates the key.
      *    This key enables efficient matching across systems.
      *    -------------------------------------------------------
           MOVE SPACES TO FS-OUT-STD-NAME-KEY

           MOVE 1 TO WS-STR-PTR
           STRING
               FUNCTION TRIM(FS-OUT-LAST-NAME)
                   DELIMITED BY SIZE
               "|" DELIMITED BY SIZE
               FS-OUT-FIRST-NAME(1:1)
                   DELIMITED BY SIZE
               "|" DELIMITED BY SIZE
               FS-OUT-MIDDLE-NAME(1:1)
                   DELIMITED BY SIZE
               "|" DELIMITED BY SIZE
               FUNCTION TRIM(FS-OUT-ZIP)
                   DELIMITED BY SIZE
               "|" DELIMITED BY SIZE
               FS-OUT-DOB
                   DELIMITED BY SIZE
               INTO FS-OUT-STD-NAME-KEY
               WITH POINTER WS-STR-PTR
               ON OVERFLOW
                   DISPLAY "KEY TRUNCATED. Policy: "
                           FS-OUT-POLICY-NO
                   ADD 1 TO WS-PARSE-ERRORS
           END-STRING

           MOVE "S" TO FS-OUT-STD-STATUS
           .

       6000-WRITE-OUTPUT.
           WRITE FS-OUTPUT-RECORD
           IF OUTPUT-OK
               ADD 1 TO WS-TOTAL-WRITTEN
           ELSE
               DISPLAY "WRITE ERROR: " WS-OUTPUT-STATUS
                       " Policy: " FS-OUT-POLICY-NO
               ADD 1 TO WS-PARSE-ERRORS
           END-IF
           .

       8000-PRINT-STATISTICS.
           DISPLAY " "
           DISPLAY "============================================="
           DISPLAY " STANDARDIZATION STATISTICS"
           DISPLAY "============================================="
           MOVE WS-TOTAL-READ TO WS-DISP-COUNT
           DISPLAY "  Records read:            " WS-DISP-COUNT
           MOVE WS-NAMES-STANDARDIZED TO WS-DISP-COUNT
           DISPLAY "  Names standardized:      " WS-DISP-COUNT
           MOVE WS-ADDRS-STANDARDIZED TO WS-DISP-COUNT
           DISPLAY "  Addresses standardized:  " WS-DISP-COUNT
           MOVE WS-TOTAL-WRITTEN TO WS-DISP-COUNT
           DISPLAY "  Records written:         " WS-DISP-COUNT
           MOVE WS-PARSE-ERRORS TO WS-DISP-COUNT
           DISPLAY "  Parse errors:            " WS-DISP-COUNT
           DISPLAY "============================================="
           .

       9000-FINALIZE.
           CLOSE POLICY-INPUT-FILE
                 STANDARD-OUTPUT-FILE
           DISPLAY " "
           DISPLAY "NASP processing complete."
           .

Solution Walkthrough

Name Parsing Strategy: Detect Format, Then Decompose

The name parser first determines whether the name is in "Last, First" format (contains a comma) or "First Last" format (no comma). This detection uses INSPECT TALLYING and must run before the commas are replaced with spaces:

           INSPECT WS-NAME-UPPER
               TALLYING WS-SCAN-POS FOR ALL ","

Once the format is known, INSPECT REPLACING turns the commas into spaces, just as Step 1 did for periods, and UNSTRING splits the name into individual words using DELIMITED BY ALL SPACES. The ALL keyword is critical: it treats a run of consecutive spaces as a single delimiter, so the gaps left by removed punctuation do not produce empty words.
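
As an illustration of the detect-then-split sequence, here is a minimal standalone sketch. The program name SPLITDEM and its data names are hypothetical, and the sample value is the claims-system form of the name quoted in the Background section.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SPLITDEM.
      * Hypothetical demo: comma detection followed by a split.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-NAME           PIC X(50)
                             VALUE "MCTAVISH, ROBERT J JR".
       01  WS-COMMAS         PIC 9(3) VALUE ZERO.
       01  WS-PARTS.
           05  WS-PART       PIC X(25) OCCURS 4 TIMES.
       01  WS-PART-COUNT     PIC 9(2) VALUE ZERO.
       01  WS-I              PIC 9(2).
       PROCEDURE DIVISION.
       MAIN-PARA.
      *    Count commas first; the information is lost once
      *    they are overwritten with spaces.
           INSPECT WS-NAME TALLYING WS-COMMAS FOR ALL ","
           INSPECT WS-NAME REPLACING ALL "," BY " "
      *    ", " is now two spaces; ALL SPACES treats the run
      *    as a single delimiter.
           MOVE SPACES TO WS-PARTS
           UNSTRING WS-NAME DELIMITED BY ALL SPACES
               INTO WS-PART(1) WS-PART(2)
                    WS-PART(3) WS-PART(4)
               TALLYING IN WS-PART-COUNT
           END-UNSTRING
           IF WS-COMMAS > 0
               DISPLAY "FORMAT: LAST, FIRST"
           ELSE
               DISPLAY "FORMAT: FIRST LAST"
           END-IF
           DISPLAY "WORDS FOUND: " WS-PART-COUNT
           PERFORM VARYING WS-I FROM 1 BY 1
               UNTIL WS-I > WS-PART-COUNT
               DISPLAY "  " WS-PART(WS-I)
           END-PERFORM
           STOP RUN.
```

The sample splits into four words (MCTAVISH, ROBERT, J, JR), and the comma count is what tells the caller that the first word is the last name.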

Prefix and Suffix Identification: Table-Driven Matching

Rather than hard-coding prefix and suffix checks, the program uses lookup tables. The first word is compared against the prefix table and the last word against the suffix table. Because the comparison logic lives in a single loop, adding "CPT" as a new prefix requires only a new table entry and a matching increase in the OCCURS count and loop limit, not new comparison logic.
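
A table lookup of this kind can be sketched in a few lines. The program below is a hypothetical standalone demo (SFXDEMO and its data names are not part of NASP) that uses a shortened suffix table and compares whole words, relying on the COBOL rule that the shorter operand of an alphanumeric comparison is padded with spaces.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. SFXDEMO.
      * Hypothetical demo: whole-word lookup in a suffix table.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-SUFFIX-DATA.
           05  FILLER PIC X(5) VALUE "JR   ".
           05  FILLER PIC X(5) VALUE "SR   ".
           05  FILLER PIC X(5) VALUE "II   ".
           05  FILLER PIC X(5) VALUE "III  ".
           05  FILLER PIC X(5) VALUE "IV   ".
       01  WS-SUFFIX-TABLE REDEFINES WS-SUFFIX-DATA.
           05  WS-SUFFIX     PIC X(5) OCCURS 5 TIMES.
       01  WS-LAST-WORD      PIC X(25) VALUE "III".
       01  WS-IDX            PIC 9(2).
       01  WS-FOUND          PIC X(1) VALUE "N".
           88  FOUND-SUFFIX            VALUE "Y".
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING WS-IDX FROM 1 BY 1
               UNTIL WS-IDX > 5 OR FOUND-SUFFIX
      *        The 5-character entry is padded with spaces to
      *        25 characters for the comparison, so "III" does
      *        not match the "II" entry that precedes it.
               IF WS-LAST-WORD = WS-SUFFIX(WS-IDX)
                   SET FOUND-SUFFIX TO TRUE
                   DISPLAY "SUFFIX: " WS-SUFFIX(WS-IDX)
               END-IF
           END-PERFORM
           IF NOT FOUND-SUFFIX
               DISPLAY "NO SUFFIX"
           END-IF
           STOP RUN.
```

Changing the VALUE of WS-LAST-WORD to a surname such as "IVANOV" is a quick way to confirm that a word which merely begins with a suffix is not misclassified.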

Address Standardization: INSPECT REPLACING for Bulk Transformation

The unit line is standardized with INSPECT REPLACING ALL, which applies several substitutions in a single statement:

           INSPECT WS-ADDR-WORK
               REPLACING ALL "SUITE" BY "     "
                         ALL "UNIT"  BY "    "
                         ALL "APT"   BY "   "
                         ALL "#"     BY " "

INSPECT REPLACING cannot change the length of a field, so every replacement must be exactly as long as its search string. That rules out turning "#" into "UNIT " directly. Instead, each designator variant is blanked out with an equal-length run of spaces, and a STRING statement then rebuilds the field as the standard "UNIT " followed by the trimmed unit identifier.
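
To see length-preserving replacement at work on a single value, consider this minimal sketch. UNITDEMO and its data names are hypothetical, and it assumes a compiler that provides FUNCTION TRIM, as the NASP listing itself does.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. UNITDEMO.
      * Hypothetical demo: normalize a unit line to "UNIT xxx".
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LINE2          PIC X(30) VALUE "#4B".
       01  WS-UNIT-OUT       PIC X(10) VALUE SPACES.
       PROCEDURE DIVISION.
       MAIN-PARA.
           MOVE FUNCTION UPPER-CASE(WS-LINE2) TO WS-LINE2
      *    Every replacement is exactly as long as its search
      *    string, as INSPECT REPLACING requires.
           INSPECT WS-LINE2
               REPLACING ALL "."     BY " "
                         ALL "SUITE" BY "     "
                         ALL "UNIT"  BY "    "
                         ALL "APT"   BY "   "
                         ALL "#"     BY " "
      *    The field cannot grow inside INSPECT, so the standard
      *    designator is added while rebuilding the value.
           STRING "UNIT " DELIMITED BY SIZE
                  FUNCTION TRIM(WS-LINE2) DELIMITED BY SIZE
               INTO WS-UNIT-OUT
           END-STRING
           DISPLAY "[" WS-UNIT-OUT "]"
           STOP RUN.
```

With this approach "#4B", "APT 4B", and "Apt. 4B" all reduce to the same UNIT 4B value. A form such as "UNIT 4-B" keeps its hyphen and would need an extra rule.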

City-State-ZIP Parsing: Right-to-Left Scanning

The city-state-ZIP parser demonstrates a technique that compensates for UNSTRING's limitation of left-to-right scanning only. Since the ZIP code and state code are always at the end of the field, the parser scans from right to left using reference modification, extracting the ZIP first, then the state, and leaving the city as whatever remains.
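
The backward scan can be sketched on its own as follows. ZIPDEMO and its data names are hypothetical; the sample value is the comma-stripped form of "SPRINGFIELD,IL62704". The sketch does not handle a field that consists of digits from position 1 onward.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. ZIPDEMO.
      * Hypothetical demo: peel a trailing ZIP code off a field
      * by scanning from the right with reference modification.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-CSZ            PIC X(30)
                             VALUE "SPRINGFIELD IL62704".
       01  WS-ZIP            PIC X(10) VALUE SPACES.
       01  WS-POS            PIC 9(3).
       01  WS-ZIP-END        PIC 9(3) VALUE ZERO.
       01  WS-ZIP-LEN        PIC 9(3).
       PROCEDURE DIVISION.
       MAIN-PARA.
           PERFORM VARYING WS-POS FROM 30 BY -1
               UNTIL WS-POS < 1
               IF WS-CSZ(WS-POS:1) IS NUMERIC
      *            Remember where the rightmost digit sits.
                   IF WS-ZIP-END = 0
                       MOVE WS-POS TO WS-ZIP-END
                   END-IF
               ELSE
                   IF WS-ZIP-END > 0
      *                First non-digit left of the ZIP: the ZIP
      *                occupies WS-POS + 1 through WS-ZIP-END.
                       COMPUTE WS-ZIP-LEN = WS-ZIP-END - WS-POS
                       MOVE WS-CSZ(WS-POS + 1:WS-ZIP-LEN)
                           TO WS-ZIP
                       MOVE SPACES
                           TO WS-CSZ(WS-POS + 1:WS-ZIP-LEN)
                       EXIT PERFORM
                   END-IF
               END-IF
           END-PERFORM
           DISPLAY "ZIP:  [" WS-ZIP "]"
           DISPLAY "REST: [" WS-CSZ "]"
           STOP RUN.
```

The loop ignores trailing spaces until it meets the rightmost digit, keeps moving left while it sees digits, and stops at the first non-digit. Blanking the extracted span is what lets a second backward pass find the state code in the same way.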

Matching Key Construction: STRING with Delimiters

The matching key is built using STRING to concatenate standardized components with pipe delimiters. This produces a key like SMITH|J|Q|62704|19750315 that can be compared across systems regardless of how the original name was formatted.
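
The key construction can be sketched in isolation as follows. KEYDEMO and its data names are hypothetical, and the sample values are chosen to reproduce the key shown above. The ON OVERFLOW phrase is included because the pipe-delimited layout is variable in length and can exceed a fixed-size key field when the last name is long.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. KEYDEMO.
      * Hypothetical demo: pipe-delimited key with overflow check.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS-LAST           PIC X(25) VALUE "SMITH".
       01  WS-FIRST          PIC X(20) VALUE "JOHN".
       01  WS-MIDDLE         PIC X(20) VALUE "QUINCY".
       01  WS-ZIP            PIC X(10) VALUE "62704".
       01  WS-DOB            PIC 9(8)  VALUE 19750315.
       01  WS-KEY            PIC X(40) VALUE SPACES.
       01  WS-PTR            PIC 9(3)  VALUE 1.
       PROCEDURE DIVISION.
       MAIN-PARA.
           STRING FUNCTION TRIM(WS-LAST) DELIMITED BY SIZE
                  "|"                    DELIMITED BY SIZE
                  WS-FIRST(1:1)          DELIMITED BY SIZE
                  "|"                    DELIMITED BY SIZE
                  WS-MIDDLE(1:1)         DELIMITED BY SIZE
                  "|"                    DELIMITED BY SIZE
                  FUNCTION TRIM(WS-ZIP)  DELIMITED BY SIZE
                  "|"                    DELIMITED BY SIZE
                  WS-DOB                 DELIMITED BY SIZE
               INTO WS-KEY
               WITH POINTER WS-PTR
      *        Overflow means the key did not fit in WS-KEY.
               ON OVERFLOW
                   DISPLAY "KEY TRUNCATED"
               NOT ON OVERFLOW
                   DISPLAY "KEY: " WS-KEY
           END-STRING
           STOP RUN.
```

Putting the date of birth last means that truncation, when it happens, removes the component that best distinguishes two people with the same name, which is why detecting overflow matters more here than in most STRING statements.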


Lessons Learned

1. FUNCTION UPPER-CASE Simplifies Case Normalization

Converting to uppercase as the first step in every parsing operation eliminates case-sensitivity from all subsequent comparisons. Without this, every comparison would need to handle mixed case.

2. INSPECT REPLACING Requires Equal-Length Strings

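The requirement that each replacement string be the same length as its search string is a common source of bugs. "APT " (4 characters) can be replaced by "UNIT" (4 characters), but "APT" (3 characters) cannot be replaced by "UNIT" (4 characters), and "#" cannot be replaced by "UNIT " at all. Padding with spaces works only when the replacement is the shorter string; when it is longer, the field has to be rebuilt with STRING.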

3. UNSTRING TALLYING Counts Fields, Not Delimiters

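The TALLYING IN clause on UNSTRING counts the receiving fields acted upon, not the delimiters found, and it adds to whatever value the counter already holds. The counter must therefore be set to zero before each UNSTRING, as the program does with WS-WORD-COUNT. The resulting count is essential for knowing how many name words were extracted.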

4. Right-to-Left Scanning Requires Reference Modification

UNSTRING always works left to right. When the structure of a field is known from the right (such as ZIP codes and state codes at the end of a city-state-ZIP field), reference modification with a backward loop is the practical solution.

5. Matching Keys Reduce Complex Comparisons to Simple Ones

By building a normalized matching key in the output record, downstream matching programs can use simple string comparison instead of repeating the normalization logic.


Discussion Questions

  1. The name parser handles "Last, First" and "First Last" formats but not "First MI Last, Suffix" (where the comma precedes the suffix rather than separating last from first). How would you extend the parser to detect and handle this additional format?

  2. INSPECT REPLACING ALL has a subtle behavior: replacements proceed left to right, and each character position is examined only once. What would happen if you tried to replace "  " (double space) with " " (single space) in a string containing four consecutive spaces? How would you collapse all multiple spaces to single spaces?

  3. The street abbreviation table stores both the abbreviation and its expansion. An alternative design would use INSPECT CONVERTING to map abbreviations. Why is INSPECT CONVERTING not suitable for this task?

  4. The matching key uses the first character of the first name and middle name. This means "JAMES" and "JOHN" would both produce "J". What are the trade-offs of using more characters? How would you balance match precision against false-positive rates?

  5. The program strips apostrophes from names before parsing, so O'BRIEN does not keep its apostrophe in the standardized output. This aids matching but loses information. How would you preserve the original name while still enabling apostrophe-insensitive matching?

  6. The city-state-ZIP parser assumes the state code is a two-letter abbreviation. How would it behave if the input contained a fully spelled state name like "ILLINOIS"? How would you extend the parser to handle this case?

  7. The program processes one record at a time. If Heritage Life needed to detect and merge duplicate policyholders (not just standardize records), what additional data structures and logic would be needed? Could this be done within a single-pass COBOL batch program, or would it require multiple passes?