

Chapter 7: Advanced SQL for COBOL: Multi-Row Operations, Temporal Tables, Recursive CTEs, and OLAP Functions

"I spent fifteen years writing cursor loops that processed one row at a time. When I finally benchmarked multi-row FETCH against the same workload, I wanted to go back in time and slap myself." — Kwame Asante, Senior Systems Programmer, City National Bank

You know how to embed SQL in COBOL. You can declare cursors, FETCH into host variables, handle SQLCODEs, and write WHERE clauses that the optimizer doesn't hate. Chapters 1 and 6 gave you that foundation. Now we throw away the training wheels.

This chapter covers four capabilities that separate the COBOL-DB2 practitioner who writes adequate code from the one who writes code that survives peak-volume production: multi-row operations, temporal tables, recursive CTEs, and OLAP functions. Every one of these features exists because IBM looked at real workloads — batch jobs processing hundreds of millions of rows, audit requirements demanding historical state reconstruction, hierarchy traversals that generated thousands of recursive cursor calls — and built SQL-level solutions. Your job is to use them.

The HA Banking Transaction Processing System you're building will need all four. Multi-row operations for the nightly batch cycle that processes every transaction posted during the day. Temporal tables for regulatory audit trails that must answer "what did this account look like at 3:47 PM on March 3rd?" Recursive CTEs for the organizational hierarchy that determines approval chains. OLAP functions for the running balance calculations that currently consume 40% of the daily batch window.

Let's get to work.


7.1 Beyond SELECT...INTO: Why Single-Row Operations Are Killing Your Batch Jobs

Here's a number that should bother you: every time your COBOL program executes an SQL statement, DB2 performs a thread switch between the application address space and the DB2 address space. On a z16 running at full clip, that switch costs roughly 0.01 to 0.05 milliseconds. Sounds trivial. Now multiply it by 500 million — the number of transactions CNB processes daily.

At 0.03ms per switch, 500 million switches consume 4.17 hours just in thread-switching overhead. Not I/O. Not SQL execution. Just the cost of your COBOL program saying "Hey DB2, I need something" and DB2 saying "Here you go" — five hundred million times.
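That arithmetic is worth sanity-checking. A quick sketch in Python — illustrative only, nothing here is COBOL or DB2, and 0.03 ms is the chapter's estimate, not a measured constant:

```python
def switch_overhead_hours(calls: int, ms_per_switch: float = 0.03) -> float:
    """Total thread-switch overhead, in hours, for a given number of SQL calls."""
    return calls * ms_per_switch / 1000 / 3600

# 500 million single-row calls at 0.03 ms each:
daily = switch_overhead_hours(500_000_000)
# ~4.17 hours of pure overhead, before any I/O or SQL execution
```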

The Single-Row Bottleneck

The classic COBOL-DB2 pattern you learned in every training class looks like this:

EXEC SQL
    DECLARE CSR-TRANSACTIONS CURSOR FOR
    SELECT ACCT_NUM, TXN_AMT, TXN_DATE, TXN_TYPE
    FROM TRANSACTION_HISTORY
    WHERE PROCESS_FLAG = 'N'
    ORDER BY ACCT_NUM, TXN_DATE
END-EXEC.

EXEC SQL OPEN CSR-TRANSACTIONS END-EXEC.

PERFORM UNTIL SQLCODE = 100
    EXEC SQL
        FETCH CSR-TRANSACTIONS
        INTO :WS-ACCT-NUM, :WS-TXN-AMT,
             :WS-TXN-DATE, :WS-TXN-TYPE
    END-EXEC
    IF SQLCODE = 0
        PERFORM PROCESS-SINGLE-TRANSACTION
    END-IF
END-PERFORM.

One FETCH, one row, one thread switch. For every single row. If you're processing 50 million rows in a nightly batch, you're doing 50 million thread switches just for the FETCH operations — before you count the UPDATE to mark each row as processed, and the INSERT to write the audit record. That's 150 million thread switches minimum.

What Multi-Row Operations Change

DB2 introduced multi-row FETCH in V8 and expanded the capability through subsequent releases. The concept is simple: instead of fetching one row per SQL call, you fetch n rows. Instead of inserting one row per SQL call, you insert n rows. The thread switch happens once for the entire block.

The performance math changes dramatically:

| Operation Pattern | Rows | Thread Switches | Overhead at 0.03 ms |
| --- | --- | --- | --- |
| Single-row FETCH | 50,000,000 | 50,000,000 | 25 minutes |
| Multi-row FETCH (100 rows) | 50,000,000 | 500,000 | 15 seconds |
| Multi-row FETCH (1,000 rows) | 50,000,000 | 50,000 | 1.5 seconds |

That's not a typo. Going from single-row to multi-row FETCH with a rowset size of 1,000 reduces thread-switching overhead from 25 minutes to 1.5 seconds. The actual I/O and SQL processing time stays roughly the same — you're eliminating pure overhead.

💡 Production Reality: The sweet spot for rowset size is usually between 100 and 1,000 rows. Beyond 1,000, the WORKING-STORAGE requirements grow large and you start competing with other programs for virtual storage. Kwame's team at CNB settled on 500 rows after extensive benchmarking against their z15 configuration. Your optimal number will depend on row width, available storage, and buffer pool configuration.

But It's Not Just FETCH

Multi-row operations extend to INSERT and the MERGE statement (DB2's upsert pattern). The same thread-switching arithmetic applies. If your batch job produces audit records, confirmation records, or summary records, multi-row INSERT eliminates the same overhead on the output side.

And MERGE — the operation that checks "does this row exist? If yes, UPDATE it; if no, INSERT it" — eliminates the SELECT-then-decide-then-INSERT-or-UPDATE pattern that costs you two thread switches per row where one would do.

We'll cover each operation in detail. First, multi-row FETCH and INSERT.


7.2 Multi-Row FETCH and INSERT: COBOL Host Variable Arrays

The Mechanics of Multi-Row FETCH

Multi-row FETCH requires three things your single-row FETCH doesn't:

  1. A rowset-positioned cursor — declared with the WITH ROWSET POSITIONING clause
  2. Host variable arrays — COBOL tables (OCCURS) sized to match your rowset
  3. The FETCH...FOR n ROWS syntax — telling DB2 how many rows you want per call

Here's the same transaction fetch rewritten for multi-row operation:

       WORKING-STORAGE SECTION.

       01  WS-ROWSET-SIZE         PIC S9(4) COMP VALUE 500.
       01  WS-ROWS-FETCHED        PIC S9(9) COMP VALUE 0.

       01  WS-TXN-ARRAYS.
           05  WS-ACCT-NUM       PIC X(12)
                                 OCCURS 500 TIMES.
           05  WS-TXN-AMT        PIC S9(13)V99 COMP-3
                                 OCCURS 500 TIMES.
           05  WS-TXN-DATE       PIC X(10)
                                 OCCURS 500 TIMES.
           05  WS-TXN-TYPE       PIC X(3)
                                 OCCURS 500 TIMES.

       01  WS-IND-ARRAYS.
           05  WS-IND-ACCT       PIC S9(4) COMP
                                 OCCURS 500 TIMES.
           05  WS-IND-AMT        PIC S9(4) COMP
                                 OCCURS 500 TIMES.
           05  WS-IND-DATE       PIC S9(4) COMP
                                 OCCURS 500 TIMES.
           05  WS-IND-TYPE       PIC S9(4) COMP
                                 OCCURS 500 TIMES.

       01  WS-IDX                 PIC S9(9) COMP VALUE 0.

       PROCEDURE DIVISION.

           EXEC SQL
               DECLARE CSR-TXN-MULTI CURSOR
               WITH ROWSET POSITIONING FOR
               SELECT ACCT_NUM, TXN_AMT, TXN_DATE, TXN_TYPE
               FROM TRANSACTION_HISTORY
               WHERE PROCESS_FLAG = 'N'
               ORDER BY ACCT_NUM, TXN_DATE
           END-EXEC.

           EXEC SQL OPEN CSR-TXN-MULTI END-EXEC.

           PERFORM UNTIL SQLCODE = 100
               EXEC SQL
                   FETCH NEXT ROWSET FROM CSR-TXN-MULTI
                   FOR :WS-ROWSET-SIZE ROWS
                    INTO :WS-ACCT-NUM :WS-IND-ACCT,
                         :WS-TXN-AMT  :WS-IND-AMT,
                         :WS-TXN-DATE :WS-IND-DATE,
                         :WS-TXN-TYPE :WS-IND-TYPE
               END-EXEC

               IF SQLCODE = 0 OR SQLCODE = 100
                   MOVE SQLERRD(3) TO WS-ROWS-FETCHED
                   PERFORM VARYING WS-IDX FROM 1 BY 1
                       UNTIL WS-IDX > WS-ROWS-FETCHED
                       PERFORM PROCESS-SINGLE-TRANSACTION
                   END-PERFORM
               END-IF
           END-PERFORM.

           EXEC SQL CLOSE CSR-TXN-MULTI END-EXEC.

Let me walk through the critical details.

SQLERRD(3) tells you the actual row count. When you ask for 500 rows and only 317 remain, SQLCODE is 100 (end of result set) but SQLERRD(3) is 317. You must check SQLERRD(3) to know how many rows were actually placed in your arrays. Ignoring this and processing all 500 slots will give you garbage data in positions 318–500 from whatever was left in WORKING-STORAGE.

⚠️ Production Bug Alert: The most common multi-row FETCH bug in production is processing stale data from the array. When the last FETCH returns SQLCODE 100 with SQLERRD(3) = 317, your loop must stop at 317 — not 500. Kwame's team found this bug in three different programs during their multi-row migration at CNB. In one case, the program had been processing phantom transactions for six weeks before anyone noticed the account balance discrepancies.

Indicator variable arrays are mandatory in production. If any column can be NULL, you need indicator arrays. The indicator array must have the same OCCURS count as the host variable array. A value of -1 in the indicator means the corresponding column was NULL for that row. Skipping indicator arrays on nullable columns will get you SQLCODE -305 in production — and it will happen at 2 AM on the one night your on-call person is unreachable.

The OCCURS dimension must match the FOR n ROWS value. If your arrays are OCCURS 500 and you FETCH FOR 1000 ROWS, DB2 will happily write past the end of your arrays and corrupt whatever sits after them in WORKING-STORAGE. The COBOL compiler won't catch this. DB2 won't catch this. You'll find out when unrelated variables suddenly contain garbage. Use a working-storage variable for the rowset size and reference it in both the OCCURS and the FOR clause to keep them synchronized. Note also that each column gets its own OCCURS array: DB2 requires the occurrences of each host variable array to be contiguous in storage, which a row-level group with OCCURS on it does not provide.
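The whole discipline — one constant driving both the array size and the FOR n ROWS clause, and processing only the rows SQLERRD(3) reports — is easier to see stripped of COBOL syntax. A minimal Python sketch, with a list slice standing in for DB2's rowset fetch (`fetch_rowset` and its return convention are invented for illustration):

```python
ROWSET_SIZE = 500  # one constant drives both the "array size" and the fetch size

def fetch_rowset(source, start, size):
    """Stand-in for FETCH NEXT ROWSET: returns (rows, sqlcode), where
    sqlcode 100 signals end of data and len(rows) plays the role of SQLERRD(3)."""
    rows = source[start:start + size]
    sqlcode = 100 if start + size >= len(source) else 0
    return rows, sqlcode

def process_all(source):
    processed, start = [], 0
    while True:
        rows, sqlcode = fetch_rowset(source, start, ROWSET_SIZE)
        # Process only the rows actually returned -- never all ROWSET_SIZE slots,
        # or the last partial rowset hands you stale data from earlier fetches.
        processed.extend(rows)
        start += len(rows)
        if sqlcode == 100:
            return processed
```

The key line is `processed.extend(rows)`: the loop bound comes from the fetch's reported count, never from the declared rowset size.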

Multi-Row INSERT

The INSERT side follows the same pattern. Instead of inserting one audit record per transaction, you fill an array and insert the entire batch:

       01  WS-AUDIT-ARRAYS.
           05  WS-AUD-ACCT       PIC X(12)
                                 OCCURS 500 TIMES.
           05  WS-AUD-AMT        PIC S9(13)V99 COMP-3
                                 OCCURS 500 TIMES.
           05  WS-AUD-TSTAMP     PIC X(26)
                                 OCCURS 500 TIMES.
           05  WS-AUD-ACTION     PIC X(10)
                                 OCCURS 500 TIMES.
           05  WS-AUD-USER       PIC X(8)
                                 OCCURS 500 TIMES.

       01  WS-INSERT-COUNT        PIC S9(4) COMP VALUE 0.

       PROCEDURE DIVISION.

       ACCUMULATE-AUDIT-ROW.
           ADD 1 TO WS-INSERT-COUNT.
           MOVE WS-ACCT-NUM(WS-IDX) TO
               WS-AUD-ACCT(WS-INSERT-COUNT).
           MOVE WS-TXN-AMT(WS-IDX) TO
               WS-AUD-AMT(WS-INSERT-COUNT).
           MOVE WS-CURRENT-TSTAMP TO
               WS-AUD-TSTAMP(WS-INSERT-COUNT).
           MOVE 'PROCESSED' TO
               WS-AUD-ACTION(WS-INSERT-COUNT).
           MOVE WS-PROGRAM-USER TO
               WS-AUD-USER(WS-INSERT-COUNT).

           IF WS-INSERT-COUNT >= WS-ROWSET-SIZE
               PERFORM FLUSH-AUDIT-ROWS
           END-IF.

       FLUSH-AUDIT-ROWS.
           IF WS-INSERT-COUNT > 0
               EXEC SQL
                   INSERT INTO AUDIT_TRAIL
                   (ACCT_NUM, TXN_AMT, AUDIT_TSTAMP,
                    AUDIT_ACTION, AUDIT_USER)
                   VALUES (:WS-AUD-ACCT, :WS-AUD-AMT,
                           :WS-AUD-TSTAMP, :WS-AUD-ACTION,
                           :WS-AUD-USER)
                   FOR :WS-INSERT-COUNT ROWS
               END-EXEC
               EVALUATE SQLCODE
                   WHEN 0
                       ADD WS-INSERT-COUNT TO
                           WS-TOTAL-INSERTS
                   WHEN OTHER
                       PERFORM HANDLE-INSERT-ERROR
               END-EVALUATE
               MOVE 0 TO WS-INSERT-COUNT
           END-IF.

The pattern is accumulate-then-flush. You fill the array one row at a time during your processing loop, and when the array is full (or you've reached end-of-data), you flush the entire batch with one INSERT...FOR n ROWS.

Don't forget the final flush. If you process 50,000,317 rows with a rowset size of 500, the last 317 rows will sit in the array un-inserted after your main loop ends. You must call FLUSH-AUDIT-ROWS one final time after the loop completes. This is the second most common multi-row bug after the stale-data problem.
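The accumulate-then-flush shape, final flush included, reduces to a few lines in any language. A Python sketch (`AuditBatcher` and its `insert_fn` callback are hypothetical names; the callback stands in for the multi-row INSERT):

```python
class AuditBatcher:
    """Accumulate-then-flush: buffer rows and write them in blocks,
    mirroring multi-row INSERT ... FOR n ROWS."""
    def __init__(self, insert_fn, rowset_size=500):
        self.insert_fn = insert_fn      # stand-in for the multi-row INSERT
        self.rowset_size = rowset_size
        self.buffer = []

    def add(self, row):
        self.buffer.append(row)
        if len(self.buffer) >= self.rowset_size:
            self.flush()

    def flush(self):
        # The final, partial flush after the main loop is the step people forget.
        if self.buffer:
            self.insert_fn(self.buffer)
            self.buffer = []
```

Calling `add` once per processed transaction and `flush` once after the loop guarantees the trailing partial batch is written.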

Storage Considerations

Multi-row operations trade CPU time for WORKING-STORAGE space. Each rowset requires enough storage for (rowset_size * row_width) bytes for the data arrays plus (rowset_size * 2 * column_count) bytes for indicator arrays.

For a 500-row rowset with 10 columns averaging 20 bytes each:

  • Data: 500 * 200 bytes = 100 KB
  • Indicators: 500 * 2 * 10 = 10 KB
  • Total: ~110 KB

That's modest. But if your row is 4 KB wide (large VARCHAR columns, LOB locators) and your rowset is 1,000 rows:

  • Data: 1,000 * 4,096 bytes = 4 MB

That competes with your region size, especially if the program runs in a CICS transaction.
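The formula is worth wrapping in a helper when you're exploring rowset sizes. A Python sketch of the arithmetic above (the function name is ours, not DB2's):

```python
def rowset_storage_kb(rowset_size: int, row_width: int, column_count: int) -> float:
    """WORKING-STORAGE needed for one rowset: the data arrays plus a
    2-byte indicator halfword per column per row."""
    data = rowset_size * row_width
    indicators = rowset_size * 2 * column_count
    return (data + indicators) / 1024

# 500 rows, 200-byte rows, 10 columns -> about 107 KiB
# (the chapter's ~110 KB figure, counted in decimal kilobytes)
```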

📊 Benchmark Data from CNB: Kwame's team tested rowset sizes of 50, 100, 200, 500, 1000, and 2000 against their transaction processing workload (avg row width 187 bytes). Performance improved linearly from 50 to 500, showed diminishing returns from 500 to 1000, and actually degraded at 2000 due to virtual storage paging. Their production standard is now 500 for batch, 100 for online (CICS/IMS).

Atomic vs. Non-Atomic Insert

DB2 supports two modes for multi-row INSERT:

  • Non-atomic (default): If row 347 out of 500 fails (e.g., duplicate key), rows 1–346 are inserted and rows 348–500 are attempted. You get SQLCODE +252 and must check SQLERRD(3) for the count of successful inserts.
  • Atomic: Declared with ATOMIC keyword. If any row fails, all rows in the batch are rolled back. You get the SQLCODE for the failing row.

For the HA Banking system, use atomic inserts for financial transactions (you don't want partial batches posted) and non-atomic for audit/logging tables (you want as many records written as possible even if some fail).

      * Atomic - all or nothing
           EXEC SQL
               INSERT INTO POSTED_TRANSACTIONS
               (ACCT_NUM, TXN_AMT, POST_DATE)
               VALUES (:WS-POST-ACCT, :WS-POST-AMT,
                       :WS-POST-DATE)
               FOR :WS-INSERT-COUNT ROWS
               ATOMIC
           END-EXEC.

      * Non-atomic - best effort
           EXEC SQL
               INSERT INTO AUDIT_LOG
               (ACCT_NUM, LOG_MSG, LOG_TSTAMP)
               VALUES (:WS-LOG-ACCT, :WS-LOG-MSG,
                       :WS-LOG-TSTAMP)
               FOR :WS-INSERT-COUNT ROWS
               NOT ATOMIC CONTINUE ON SQLEXCEPTION
           END-EXEC.

7.3 The MERGE Statement: Eliminating the SELECT-Decide-Write Anti-Pattern

The Problem MERGE Solves

Every COBOL batch program I've maintained in 25 years has some version of this logic:

EXEC SQL
    SELECT COUNT(*) INTO :WS-COUNT
    FROM ACCOUNT_SUMMARY
    WHERE ACCT_NUM = :WS-ACCT-NUM
END-EXEC.

IF WS-COUNT > 0
    EXEC SQL
        UPDATE ACCOUNT_SUMMARY
        SET BALANCE = BALANCE + :WS-TXN-AMT,
            LAST_TXN_DATE = :WS-TXN-DATE
        WHERE ACCT_NUM = :WS-ACCT-NUM
    END-EXEC
ELSE
    EXEC SQL
        INSERT INTO ACCOUNT_SUMMARY
        (ACCT_NUM, BALANCE, LAST_TXN_DATE, OPEN_DATE)
        VALUES (:WS-ACCT-NUM, :WS-TXN-AMT,
                :WS-TXN-DATE, CURRENT DATE)
    END-EXEC
END-IF.

Three SQL statements in the source; two executed per row, so two thread switches where MERGE needs one. And a concurrency hole: between your SELECT COUNT(*) and your INSERT, another thread could insert the same ACCT_NUM, giving you a duplicate key error. You handle that with SQLCODE -803 and retry logic, adding complexity and still more thread switches.

MERGE Syntax for COBOL

MERGE does this in one statement:

           EXEC SQL
               MERGE INTO ACCOUNT_SUMMARY AS TGT
               USING (VALUES (:WS-ACCT-NUM,
                              :WS-TXN-AMT,
                              :WS-TXN-DATE))
                   AS SRC (ACCT_NUM, TXN_AMT, TXN_DATE)
               ON TGT.ACCT_NUM = SRC.ACCT_NUM
               WHEN MATCHED THEN
                   UPDATE SET
                       BALANCE = TGT.BALANCE + SRC.TXN_AMT,
                       LAST_TXN_DATE = SRC.TXN_DATE
               WHEN NOT MATCHED THEN
                   INSERT (ACCT_NUM, BALANCE,
                           LAST_TXN_DATE, OPEN_DATE)
                   VALUES (SRC.ACCT_NUM, SRC.TXN_AMT,
                           SRC.TXN_DATE, CURRENT DATE)
           END-EXEC.

One statement. One thread switch. No concurrency gap. DB2 handles the existence check, the locking, and the insert-or-update decision atomically.

MERGE with Multi-Row Source

The real power emerges when you combine MERGE with a multi-row source. Instead of merging one row at a time, you can merge an entire batch using a global temporary table or a common table expression as the source:

      * Step 1: Multi-row INSERT into a global temp table
           EXEC SQL
               INSERT INTO SESSION.TXN_BATCH
               (ACCT_NUM, TXN_AMT, TXN_DATE)
               VALUES (:WS-BATCH-ACCT, :WS-BATCH-AMT,
                       :WS-BATCH-DATE)
               FOR :WS-BATCH-COUNT ROWS
           END-EXEC.

      * Step 2: MERGE from temp table into target
           EXEC SQL
               MERGE INTO ACCOUNT_SUMMARY AS TGT
               USING SESSION.TXN_BATCH AS SRC
               ON TGT.ACCT_NUM = SRC.ACCT_NUM
               WHEN MATCHED THEN
                   UPDATE SET
                       BALANCE = TGT.BALANCE + SRC.TXN_AMT,
                       LAST_TXN_DATE = SRC.TXN_DATE
               WHEN NOT MATCHED THEN
                   INSERT (ACCT_NUM, BALANCE,
                           LAST_TXN_DATE, OPEN_DATE)
                   VALUES (SRC.ACCT_NUM, SRC.TXN_AMT,
                           SRC.TXN_DATE, CURRENT DATE)
           END-EXEC.

      * Step 3: Clean up temp table for next batch
           EXEC SQL
               DELETE FROM SESSION.TXN_BATCH
           END-EXEC.

This pattern — load a temp table with multi-row INSERT, then MERGE from the temp table — gives you both the thread-switching benefit of multi-row operations and the atomic upsert semantics of MERGE.

🔗 Connection to Chapter 6: The optimizer treats MERGE as a combination of an index lookup (the ON clause) and either an UPDATE or INSERT. The access path for the ON clause follows the same rules you learned in Chapter 6. If ACCT_NUM has a unique index, the ON clause resolves via index lookup. If it doesn't, you're doing a tablespace scan for every source row — which is a disaster at scale. Always ensure the ON clause columns are indexed.

MERGE with Conditional Logic

MERGE supports additional filtering in the WHEN clauses, letting you apply different actions based on data conditions:

           EXEC SQL
               MERGE INTO ACCOUNT_SUMMARY AS TGT
               USING (VALUES (:WS-ACCT-NUM,
                              :WS-TXN-AMT,
                              :WS-TXN-DATE,
                              :WS-TXN-TYPE))
                   AS SRC (ACCT_NUM, TXN_AMT,
                           TXN_DATE, TXN_TYPE)
               ON TGT.ACCT_NUM = SRC.ACCT_NUM
               WHEN MATCHED AND SRC.TXN_TYPE = 'CLO' THEN
                   UPDATE SET
                       BALANCE = 0,
                       ACCT_STATUS = 'C',
                       CLOSE_DATE = SRC.TXN_DATE
               WHEN MATCHED AND SRC.TXN_TYPE <> 'CLO' THEN
                   UPDATE SET
                       BALANCE = TGT.BALANCE + SRC.TXN_AMT,
                       LAST_TXN_DATE = SRC.TXN_DATE
               WHEN NOT MATCHED THEN
                   INSERT (ACCT_NUM, BALANCE,
                           LAST_TXN_DATE, OPEN_DATE,
                           ACCT_STATUS)
                   VALUES (SRC.ACCT_NUM, SRC.TXN_AMT,
                           SRC.TXN_DATE, CURRENT DATE, 'A')
           END-EXEC.

The multiple WHEN MATCHED clauses let you branch the update logic based on the source data. DB2 evaluates them in order and executes the first one that matches. This replaces what would otherwise be an IF/ELSE structure in COBOL wrapped around two separate UPDATE statements — eliminating an entire thread switch and the associated lock acquisition overhead.

MERGE Gotchas for COBOL Programmers

SQLCODE after MERGE: A successful MERGE returns SQLCODE 0. SQLERRD(3) contains the total number of rows affected (updates + inserts combined). If you need to know how many were updates vs. inserts, you must use the DATA CHANGE TABLE REFERENCE extension or count them yourself through other means.

Duplicate source rows: If your source data contains two rows with the same ACCT_NUM, MERGE will fail with SQLCODE -788. The source of a MERGE must not contain duplicate join keys. If your input might have duplicates, aggregate them first (using a CTE or pre-processing in COBOL).
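The pre-aggregation step is simple once you decide the business rule for combining duplicates. A Python sketch of one plausible rule — sum the amounts and keep the latest date per account (`collapse_duplicates` is an illustrative helper, not a DB2 feature):

```python
def collapse_duplicates(rows):
    """Pre-aggregate a MERGE source so each join key appears exactly once:
    sum the transaction amounts and keep the latest date per account."""
    by_acct = {}
    for acct, amt, date in rows:
        if acct in by_acct:
            prev_amt, prev_date = by_acct[acct]
            by_acct[acct] = (prev_amt + amt, max(prev_date, date))
        else:
            by_acct[acct] = (amt, date)
    return [(acct, amt, date) for acct, (amt, date) in by_acct.items()]
```

The same collapse can be done in SQL with a GROUP BY in a CTE over the source, before the MERGE ever sees the rows.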

MERGE and triggers: If the target table has BEFORE/AFTER triggers for both INSERT and UPDATE, both sets of triggers are active during MERGE. If your INSERT trigger does something that conflicts with your UPDATE trigger, you'll get unexpected behavior. Review triggers before implementing MERGE against existing tables.


7.4 Temporal Tables: The Database Remembers Everything

Why Temporal Tables Matter for Mainframe Systems

Sandra at Federal Benefits puts it bluntly: "We got subpoenaed. They wanted to know exactly what a beneficiary's eligibility status was on a specific date eighteen months ago. We had audit trails, but reconstructing the state from a sequence of change records took two analysts three weeks. With temporal tables, it would have been a single SQL query."

Temporal tables maintain historical versions of rows automatically. When you UPDATE a row, DB2 doesn't just change it — it preserves the old version in a history table with timestamps marking when it was valid. When you DELETE a row, the deleted version is preserved. You can then query the table "as of" any point in time.

DB2 for z/OS supports three types of temporal tables:

System-Period Temporal Tables

The database manages the timestamps. Every row has two system-generated columns: SYS_START (when this version became current) and SYS_END (when this version stopped being current). The current row has SYS_END set to the maximum timestamp value ('9999-12-30-00.00.00.000000000000' for a TIMESTAMP(12) column).

CREATE TABLE BENEFICIARY_STATUS
(
    BENEFICIARY_ID   CHAR(10)       NOT NULL,
    STATUS_CODE      CHAR(2)        NOT NULL,
    BENEFIT_LEVEL    CHAR(3)        NOT NULL,
    MONTHLY_AMOUNT   DECIMAL(9,2)   NOT NULL,
    CASE_WORKER_ID   CHAR(8)        NOT NULL,
    SYS_START        TIMESTAMP(12)  NOT NULL
                     GENERATED ALWAYS AS ROW BEGIN,
    SYS_END          TIMESTAMP(12)  NOT NULL
                     GENERATED ALWAYS AS ROW END,
    TXN_START_ID     TIMESTAMP(12)
                     GENERATED ALWAYS AS TRANSACTION
                     START ID,
    PERIOD SYSTEM_TIME (SYS_START, SYS_END)
)
IN DBFEDBEN.TSBENSTATUS;

CREATE TABLE BENEFICIARY_STATUS_HIST
(
    BENEFICIARY_ID   CHAR(10)       NOT NULL,
    STATUS_CODE      CHAR(2)        NOT NULL,
    BENEFIT_LEVEL    CHAR(3)        NOT NULL,
    MONTHLY_AMOUNT   DECIMAL(9,2)   NOT NULL,
    CASE_WORKER_ID   CHAR(8)        NOT NULL,
    SYS_START        TIMESTAMP(12)  NOT NULL,
    SYS_END          TIMESTAMP(12)  NOT NULL,
    TXN_START_ID     TIMESTAMP(12)
)
IN DBFEDBEN.TSBENHIST;

ALTER TABLE BENEFICIARY_STATUS
    ADD VERSIONING USE HISTORY TABLE
    BENEFICIARY_STATUS_HIST;

Once versioning is active, every UPDATE and DELETE automatically archives the old row to BENEFICIARY_STATUS_HIST. Your COBOL program doesn't change at all — the standard UPDATE statement works exactly as before:

           EXEC SQL
               UPDATE BENEFICIARY_STATUS
               SET STATUS_CODE = :WS-NEW-STATUS,
                   BENEFIT_LEVEL = :WS-NEW-LEVEL,
                   MONTHLY_AMOUNT = :WS-NEW-AMOUNT
               WHERE BENEFICIARY_ID = :WS-BEN-ID
           END-EXEC.

DB2 handles the archival transparently. The old row (with its previous STATUS_CODE, BENEFIT_LEVEL, and MONTHLY_AMOUNT) goes to the history table with SYS_END set to the current timestamp.
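The mechanics are easier to internalize as a toy model. A Python sketch of system-period versioning — an update closes the current version into history, and an as-of lookup scans both (`TemporalTable` is purely illustrative; in DB2 the engine does all of this transparently):

```python
class TemporalTable:
    """Toy model of a system-period temporal table.
    update() closes the current version into history with sys_end = now,
    then installs the new version with sys_start = now."""
    def __init__(self):
        self.current = {}   # key -> (value, sys_start)
        self.history = []   # (key, value, sys_start, sys_end)

    def update(self, key, value, now):
        if key in self.current:
            old_value, sys_start = self.current[key]
            self.history.append((key, old_value, sys_start, now))
        self.current[key] = (value, now)

    def as_of(self, key, ts):
        """FOR SYSTEM_TIME AS OF ts: the version whose [sys_start, sys_end) covers ts."""
        for k, value, start, end in self.history:
            if k == key and start <= ts < end:
                return value
        if key in self.current:
            value, start = self.current[key]
            if start <= ts:
                return value
        return None   # no version existed at ts
```

With ISO-format timestamp strings, plain string comparison orders correctly, which is why the sketch gets away without date parsing.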

Querying Temporal Data

This is where the power lives. Sandra's subpoena response becomes:

SELECT BENEFICIARY_ID, STATUS_CODE, BENEFIT_LEVEL,
       MONTHLY_AMOUNT, CASE_WORKER_ID,
       SYS_START, SYS_END
FROM BENEFICIARY_STATUS
FOR SYSTEM_TIME AS OF '2025-09-15-15.47.00.000000'
WHERE BENEFICIARY_ID = '0047829156';

That single query returns the row as it existed at that exact timestamp — whether the current row or a historical version. No application code changes. No cursor loops through audit trails. No reconstruction logic.

The temporal query forms are:

| Syntax | Returns |
| --- | --- |
| FOR SYSTEM_TIME AS OF timestamp | The single version valid at that moment |
| FOR SYSTEM_TIME FROM ts1 TO ts2 | All versions overlapping the range (exclusive end) |
| FOR SYSTEM_TIME BETWEEN ts1 AND ts2 | All versions overlapping the range (inclusive end) |

Embedding Temporal Queries in COBOL

From COBOL, temporal queries work like any other SQL:

       01  WS-QUERY-TSTAMP        PIC X(26).
       01  WS-BEN-ID              PIC X(10).
       01  WS-STATUS-CODE         PIC X(2).
       01  WS-BENEFIT-LVL         PIC X(3).
       01  WS-MONTHLY-AMT         PIC S9(7)V99 COMP-3.

           MOVE '2025-09-15-15.47.00.000000'
               TO WS-QUERY-TSTAMP.

           EXEC SQL
               SELECT STATUS_CODE, BENEFIT_LEVEL,
                      MONTHLY_AMOUNT
               INTO :WS-STATUS-CODE, :WS-BENEFIT-LVL,
                    :WS-MONTHLY-AMT
               FROM BENEFICIARY_STATUS
               FOR SYSTEM_TIME AS OF :WS-QUERY-TSTAMP
               WHERE BENEFICIARY_ID = :WS-BEN-ID
           END-EXEC.

⚠️ Critical Production Consideration: The history table grows without bound. DB2 doesn't automatically purge it. For a table with 10 million rows that averages two updates per row per month, the history table will accumulate 240 million rows per year. You need a retention policy and a batch job to purge history older than your regulatory requirement (typically 7 years for financial, 10 years for federal benefits). Sandra's team at Federal Benefits runs a monthly purge job with a 10-year retention window.

Application-Period Temporal Tables

Application-period temporal tables let your application define the validity period. Instead of system-managed timestamps ("when was this row current in the database"), you define business-meaningful periods ("when was this insurance policy active").

CREATE TABLE INSURANCE_POLICY
(
    POLICY_ID        CHAR(12)       NOT NULL,
    HOLDER_ID        CHAR(10)       NOT NULL,
    COVERAGE_TYPE    CHAR(3)        NOT NULL,
    PREMIUM_AMT      DECIMAL(9,2)   NOT NULL,
    POLICY_START     DATE           NOT NULL,
    POLICY_END       DATE           NOT NULL,
    PERIOD BUSINESS_TIME (POLICY_START, POLICY_END)
);

Queries use FOR BUSINESS_TIME instead of FOR SYSTEM_TIME:

-- What policies were active on July 4th, 2025?
SELECT POLICY_ID, HOLDER_ID, COVERAGE_TYPE, PREMIUM_AMT
FROM INSURANCE_POLICY
FOR BUSINESS_TIME AS OF '2025-07-04'
WHERE HOLDER_ID = 'H00482910';

Temporal Query Patterns for COBOL Batch

In practice, temporal queries in COBOL batch programs follow several recurring patterns. Here are the three most common:

Pattern 1: Point-in-time state reconstruction. You need every row from a table as it existed at a specific moment — for example, all account balances at quarter-end close:

           EXEC SQL
               DECLARE CSR-QUARTER-SNAP CURSOR FOR
               SELECT ACCT_NUM, BALANCE, STATUS_CODE
               FROM ACCOUNT_BALANCE
               FOR SYSTEM_TIME AS OF :WS-QUARTER-END-TS
               ORDER BY ACCT_NUM
           END-EXEC

Pattern 2: Change history extraction. You need all versions of a row that existed during a time range — for example, every change to a beneficiary's status during an audit period:

           EXEC SQL
               DECLARE CSR-CHANGES CURSOR FOR
               SELECT STATUS_CODE, BENEFIT_LEVEL,
                      MONTHLY_AMOUNT,
                      SYS_START, SYS_END
               FROM BENEFICIARY_STATUS
               FOR SYSTEM_TIME FROM :WS-AUDIT-START
                                 TO :WS-AUDIT-END
               WHERE BENEFICIARY_ID = :WS-BEN-ID
               ORDER BY SYS_START
           END-EXEC

This returns every version of the row that overlapped the audit window, including versions that started before the window (if they were still current during it) and versions that ended during the window. Each row in the result set includes SYS_START and SYS_END, so you can reconstruct the complete timeline.

Pattern 3: Detecting changes between two points. You need to find all rows that were modified between two timestamps — the "delta extraction" pattern for feeding downstream systems:

SELECT CURR.BENEFICIARY_ID, CURR.STATUS_CODE,
       PREV.STATUS_CODE AS PREV_STATUS
FROM BENEFICIARY_STATUS
FOR SYSTEM_TIME AS OF :WS-CURRENT-TS AS CURR
INNER JOIN BENEFICIARY_STATUS
FOR SYSTEM_TIME AS OF :WS-PREVIOUS-TS AS PREV
ON CURR.BENEFICIARY_ID = PREV.BENEFICIARY_ID
WHERE CURR.STATUS_CODE <> PREV.STATUS_CODE;

This self-join across two temporal points returns only the rows that changed between the two timestamps. Sandra at Federal Benefits uses this pattern to generate daily change feeds for three downstream regulatory systems.
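The logic of the self-join is just a keyed comparison of two snapshots. A Python sketch (`delta` is an invented helper; like the SQL inner join, it ignores rows that exist at only one of the two timestamps):

```python
def delta(prev_snapshot, curr_snapshot):
    """Rows whose status changed between two as-of points, mirroring the
    temporal self-join: yields (key, old_status, new_status) per change."""
    return [
        (key, prev_snapshot[key], curr)
        for key, curr in curr_snapshot.items()
        if key in prev_snapshot and prev_snapshot[key] != curr
    ]
```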

Bi-Temporal Tables

Ahmad at Pinnacle Health needs both: "We need to know what the patient's coverage was (business time), but we also need to know when we recorded that information (system time) for compliance audits." Bi-temporal tables combine both period types on a single table:

CREATE TABLE CLAIMS_COVERAGE
(
    CLAIM_ID         CHAR(15)       NOT NULL,
    PATIENT_ID       CHAR(10)       NOT NULL,
    COVERAGE_CODE    CHAR(5)        NOT NULL,
    APPROVED_AMT     DECIMAL(11,2)  NOT NULL,
    COV_START        DATE           NOT NULL,
    COV_END          DATE           NOT NULL,
    SYS_START        TIMESTAMP(12)  NOT NULL
                     GENERATED ALWAYS AS ROW BEGIN,
    SYS_END          TIMESTAMP(12)  NOT NULL
                     GENERATED ALWAYS AS ROW END,
    TXN_START_ID     TIMESTAMP(12)
                     GENERATED ALWAYS AS TRANSACTION
                     START ID,
    PERIOD BUSINESS_TIME (COV_START, COV_END),
    PERIOD SYSTEM_TIME (SYS_START, SYS_END)
);

Bi-temporal queries can filter on both dimensions:

-- What coverage did we THINK this patient had for July 2025,
-- as recorded in our system on August 1st, 2025?
SELECT CLAIM_ID, COVERAGE_CODE, APPROVED_AMT
FROM CLAIMS_COVERAGE
FOR SYSTEM_TIME AS OF '2025-08-01-00.00.00.000000'
FOR BUSINESS_TIME AS OF '2025-07-15'
WHERE PATIENT_ID = 'P003928471';

This answers the question: "Based on what we knew on August 1st, what was this patient's coverage for July?" If the coverage was later corrected (say, on September 3rd), the bi-temporal query still returns the August 1st view — exactly what auditors need.

Performance Impact of Temporal Tables

Temporal tables add overhead to DML operations:

  • INSERT: Minimal overhead. DB2 just populates the SYS_START column.
  • UPDATE: Moderate overhead. DB2 must INSERT the old row into the history table before applying the UPDATE to the base table. This effectively doubles the I/O for updates.
  • DELETE: Same as UPDATE. The deleted row is inserted into the history table.

For read-heavy workloads (which most mainframe OLTP systems are), the overhead is negligible. For update-heavy batch workloads, budget 40–60% additional elapsed time for the history table writes. Ahmad's team at Pinnacle Health saw a 52% increase in batch elapsed time when they first enabled temporal on the claims table, which they offset by converting the batch job to multi-row operations — netting a 30% overall improvement.

📊 Combined Impact at Pinnacle Health: Before temporal + multi-row migration, Pinnacle's claims batch ran 4.2 hours. After adding temporal tables (which increased it to 6.4 hours), then converting to multi-row FETCH/INSERT with rowset size 500, the job dropped to 2.9 hours. The temporal overhead was more than paid for by the multi-row efficiency gains.


7.5 Recursive CTEs: Hierarchies Without Recursive Program Calls

The Hierarchy Problem

Every enterprise has hierarchical data: organizational charts, account hierarchies (master accounts with sub-accounts), product category trees, regulatory reporting structures, geographic region hierarchies. The traditional COBOL approach is a cursor loop with a stack:

1. FETCH the root node.
2. Push it onto a WORKING-STORAGE stack.
3. FETCH all children of the current node.
4. For each child, push it and repeat.
5. Pop the stack when you exhaust children.

This generates one SELECT per node in the hierarchy. An org chart with 50,000 employees produces 50,000+ SQL calls. An account hierarchy 12 levels deep with 200,000 nodes generates 200,000 calls, plus stack-management logic that's a maintenance nightmare.

CTE Basics

A Common Table Expression (CTE) is a named temporary result set defined in a WITH clause:

WITH ACTIVE_ACCOUNTS AS (
    SELECT ACCT_NUM, ACCT_NAME, ACCT_TYPE, BALANCE
    FROM ACCOUNTS
    WHERE STATUS = 'A'
)
SELECT ACCT_NUM, ACCT_NAME, BALANCE
FROM ACTIVE_ACCOUNTS
WHERE ACCT_TYPE = 'CHECKING'
AND BALANCE > 10000;

CTEs improve readability and allow you to reference the same subquery multiple times without repeating it. But the killer feature is recursion.

Recursive CTE Syntax

A recursive CTE has two parts:

  1. The anchor member: a SELECT that retrieves the starting rows (the root nodes).
  2. The recursive member: a SELECT that joins back to the CTE itself, retrieving the next level.

WITH ACCT_HIERARCHY (ACCT_NUM, PARENT_ACCT, ACCT_NAME,
                     ACCT_LEVEL, HIERARCHY_PATH) AS (
    -- Anchor: top-level accounts (no parent)
    SELECT ACCT_NUM, PARENT_ACCT, ACCT_NAME,
           1 AS ACCT_LEVEL,
           CAST(ACCT_NUM AS VARCHAR(500)) AS HIERARCHY_PATH
    FROM ACCOUNT_STRUCTURE
    WHERE PARENT_ACCT IS NULL

    UNION ALL

    -- Recursive: children of current level
    SELECT CHILD.ACCT_NUM, CHILD.PARENT_ACCT,
           CHILD.ACCT_NAME,
           PARENT.ACCT_LEVEL + 1,
           PARENT.HIERARCHY_PATH CONCAT ' > '
               CONCAT CHILD.ACCT_NUM
    FROM ACCOUNT_STRUCTURE CHILD
    INNER JOIN ACCT_HIERARCHY PARENT
        ON CHILD.PARENT_ACCT = PARENT.ACCT_NUM
)
SELECT ACCT_NUM, PARENT_ACCT, ACCT_NAME,
       ACCT_LEVEL, HIERARCHY_PATH
FROM ACCT_HIERARCHY
ORDER BY HIERARCHY_PATH;

DB2 executes this iteratively:

  1. Run the anchor query — get all root accounts.
  2. Run the recursive query using the anchor results as ACCT_HIERARCHY — get all level-2 accounts.
  3. Run the recursive query using the level-2 results — get all level-3 accounts.
  4. Repeat until the recursive query returns no rows.

The entire hierarchy is resolved in a single SQL statement. One thread switch. DB2 handles the iteration internally, using work files in the TEMP database.
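The same anchor-plus-recursive-member shape runs on any engine with recursive CTEs. Here is a minimal sketch in Python with SQLite (an assumption for illustration; table and accounts are hypothetical, and SQLite requires the keyword WITH RECURSIVE where DB2 accepts plain WITH):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE account_structure (
    acct_num TEXT, parent_acct TEXT, acct_name TEXT)""")
con.executemany("INSERT INTO account_structure VALUES (?,?,?)", [
    ("A100", None,   "Master"),
    ("A110", "A100", "Operating"),
    ("A120", "A100", "Payroll"),
    ("A111", "A110", "Petty Cash"),
])
rows = con.execute("""
    WITH RECURSIVE acct_hierarchy
        (acct_num, parent_acct, acct_level, hierarchy_path) AS (
        -- Anchor: roots (no parent)
        SELECT acct_num, parent_acct, 1, acct_num
        FROM account_structure WHERE parent_acct IS NULL
        UNION ALL
        -- Recursive member: children of the previous level
        SELECT c.acct_num, c.parent_acct, p.acct_level + 1,
               p.hierarchy_path || ' > ' || c.acct_num
        FROM account_structure c
        JOIN acct_hierarchy p ON c.parent_acct = p.acct_num
    )
    SELECT acct_num, acct_level, hierarchy_path
    FROM acct_hierarchy ORDER BY hierarchy_path""").fetchall()
```

Ordering by the accumulated path prints the tree in depth-first display order, which is usually what a report wants.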

Recursive CTEs in COBOL

From COBOL, you declare a cursor over the recursive CTE and fetch the results:

           EXEC SQL
               DECLARE CSR-ACCT-HIER CURSOR FOR
               WITH ACCT_HIERARCHY
                   (ACCT_NUM, PARENT_ACCT, ACCT_NAME,
                    ACCT_LEVEL, HIERARCHY_PATH) AS (
                   SELECT ACCT_NUM, PARENT_ACCT, ACCT_NAME,
                          1, CAST(ACCT_NUM AS VARCHAR(500))
                   FROM ACCOUNT_STRUCTURE
                   WHERE PARENT_ACCT IS NULL
                   AND MASTER_ACCT = :WS-MASTER-ACCT

                   UNION ALL

                   SELECT C.ACCT_NUM, C.PARENT_ACCT,
                          C.ACCT_NAME,
                          P.ACCT_LEVEL + 1,
                          P.HIERARCHY_PATH CONCAT ' > '
                              CONCAT C.ACCT_NUM
                   FROM ACCOUNT_STRUCTURE C
                   INNER JOIN ACCT_HIERARCHY P
                       ON C.PARENT_ACCT = P.ACCT_NUM
               )
               SELECT ACCT_NUM, PARENT_ACCT, ACCT_NAME,
                      ACCT_LEVEL, HIERARCHY_PATH
               FROM ACCT_HIERARCHY
               ORDER BY HIERARCHY_PATH
           END-EXEC.

You can combine this with multi-row FETCH for maximum efficiency. The cursor declaration is more complex, but the FETCH logic is identical to what you learned in Section 7.2.

Preventing Infinite Recursion

If your data has cycles (account A's parent is B, B's parent is C, C's parent is A), the recursive CTE will loop until DB2 runs out of resources. Two safeguards:

1. Maximum recursion depth:

Carry a level counter in the CTE and filter on it inside the recursive member:

-- In the recursive member, add:
WHERE P.ACCT_LEVEL < 20  -- stop at 20 levels deep

Putting the filter only in the final SELECT hides the extra rows but doesn't stop DB2 from generating them; the predicate must live in the recursive member for the iteration itself to stop.

2. Cycle detection with HIERARCHY_PATH:

Check whether the current node already appears in the path:

-- In the recursive member, add:
AND LOCATE(C.ACCT_NUM, P.HIERARCHY_PATH) = 0
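Both safeguards can be demonstrated together. A sketch in Python with SQLite (an illustration assumption), using instr() in the role of DB2's LOCATE, against deliberately cyclic hypothetical data (A → B → C → A):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account_structure (acct_num TEXT, parent_acct TEXT)")
# A's parent is C, B's parent is A, C's parent is B: a pure cycle
con.executemany("INSERT INTO account_structure VALUES (?,?)",
                [("A", "C"), ("B", "A"), ("C", "B")])
rows = con.execute("""
    WITH RECURSIVE walk (acct_num, path, lvl) AS (
        SELECT acct_num, acct_num, 1
        FROM account_structure WHERE acct_num = 'A'
        UNION ALL
        SELECT c.acct_num, p.path || ' > ' || c.acct_num, p.lvl + 1
        FROM account_structure c
        JOIN walk p ON c.parent_acct = p.acct_num
        WHERE instr(p.path, c.acct_num) = 0   -- skip already-visited nodes
          AND p.lvl < 20                      -- hard depth stop
    )
    SELECT acct_num, path FROM walk""").fetchall()
```

Without the instr() guard this query would spin until the depth stop; without both, it would never terminate.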

⚠️ Production Discipline: Always include a depth limit in recursive CTEs. Even if your data shouldn't have cycles, a single bad row from a data migration or a bug in the account setup program can create one. Without a depth limit, that one bad row will consume your TEMP database and potentially bring down the DB2 subsystem. Rob at CNB learned this the hard way during a test run — the runaway recursive CTE filled 50 GB of TEMP space in under two minutes before they could cancel it.

Approval Chain Example for HA Banking

For the HA Banking system, recursive CTEs solve the approval chain problem. When a high-value transaction needs multi-level approval, you need to traverse the organizational hierarchy from the initiator up to someone with sufficient authority:

WITH APPROVAL_CHAIN (EMP_ID, EMP_NAME, TITLE,
                     APPROVAL_LIMIT, MGR_ID,
                     CHAIN_LEVEL) AS (
    -- Anchor: the employee who initiated the transaction
    SELECT EMP_ID, EMP_NAME, TITLE, APPROVAL_LIMIT,
           MGR_ID, 1
    FROM EMPLOYEE_DIRECTORY
    WHERE EMP_ID = :WS-INITIATOR-ID

    UNION ALL

    SELECT MGR.EMP_ID, MGR.EMP_NAME, MGR.TITLE,
           MGR.APPROVAL_LIMIT, MGR.MGR_ID,
           SUB.CHAIN_LEVEL + 1
    FROM EMPLOYEE_DIRECTORY MGR
    INNER JOIN APPROVAL_CHAIN SUB
        ON MGR.EMP_ID = SUB.MGR_ID
    WHERE SUB.APPROVAL_LIMIT < :WS-TXN-AMOUNT
      AND SUB.CHAIN_LEVEL < 10
)
SELECT EMP_ID, EMP_NAME, TITLE, APPROVAL_LIMIT,
       CHAIN_LEVEL
FROM APPROVAL_CHAIN
ORDER BY CHAIN_LEVEL;

This returns the complete approval chain, stopping when it reaches someone whose approval limit exceeds the transaction amount (or the CEO, whichever comes first).
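The stop condition is the interesting part: recursion continues only while the current link's limit is below the transaction amount. A sketch in Python with SQLite (an assumption for illustration; employees, limits, and the amount are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("""CREATE TABLE employee_directory (
    emp_id TEXT, emp_name TEXT, approval_limit REAL, mgr_id TEXT)""")
con.executemany("INSERT INTO employee_directory VALUES (?,?,?,?)", [
    ("E1", "Teller",  1_000.0,      "E2"),
    ("E2", "Manager", 50_000.0,     "E3"),
    ("E3", "VP",      500_000.0,    "E4"),
    ("E4", "CEO",     10_000_000.0, None),
])
txn_amount = 75_000.0
chain = con.execute("""
    WITH RECURSIVE approval_chain
        (emp_id, emp_name, approval_limit, mgr_id, chain_level) AS (
        -- Anchor: the initiator
        SELECT emp_id, emp_name, approval_limit, mgr_id, 1
        FROM employee_directory WHERE emp_id = 'E1'
        UNION ALL
        -- Climb only while the current link's limit is too small
        SELECT m.emp_id, m.emp_name, m.approval_limit, m.mgr_id,
               s.chain_level + 1
        FROM employee_directory m
        JOIN approval_chain s ON m.emp_id = s.mgr_id
        WHERE s.approval_limit < ?
          AND s.chain_level < 10          -- depth guard
    )
    SELECT emp_name, approval_limit, chain_level FROM approval_chain
    ORDER BY chain_level""", (txn_amount,)).fetchall()
```

The chain ends at the first manager whose limit covers the amount; the CEO is never visited because the VP's limit already suffices.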


7.6 OLAP Functions: Replacing Cursor Logic with SQL

The Running Balance Problem

Here's a batch routine that exists in some form at every bank:

Open cursor ordered by ACCT_NUM, TXN_DATE.
Set PREV-ACCT to spaces.
Set RUNNING-BAL to 0.
Loop:
    FETCH next row.
    IF ACCT_NUM not equal PREV-ACCT
        MOVE 0 to RUNNING-BAL
        MOVE ACCT_NUM to PREV-ACCT
    END-IF.
    ADD TXN-AMT TO RUNNING-BAL.
    UPDATE the row with RUNNING-BAL.

This is a running total calculation. It requires the rows to be processed in exact order, it requires maintaining state across rows, and it requires updating each row individually. The entire batch window for CNB's running balance job was 2.1 hours — processing 500 million transactions sequentially.

OLAP functions (also called window functions or analytic functions) let DB2 do this calculation within the SQL itself.

Core OLAP Functions

ROW_NUMBER() — assigns a sequential integer to each row within a partition:

SELECT ACCT_NUM, TXN_DATE, TXN_AMT,
       ROW_NUMBER() OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_DATE, TXN_SEQUENCE
       ) AS TXN_SEQ_NUM
FROM TRANSACTION_HISTORY
WHERE PROCESS_DATE = CURRENT DATE;

This replaces the COBOL pattern of maintaining a counter that resets at each account break. Every row gets a sequence number within its account, ordered by transaction date.
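The partition reset can be seen with a handful of rows. A sketch in Python with SQLite standing in for DB2 (an illustration assumption; data is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (acct TEXT, seq INTEGER)")
con.executemany("INSERT INTO txns VALUES (?,?)",
                [("A", 10), ("A", 20), ("B", 5), ("B", 15), ("B", 25)])
rows = con.execute("""
    SELECT acct,
           ROW_NUMBER() OVER (PARTITION BY acct ORDER BY seq) AS rn
    FROM txns ORDER BY acct, seq""").fetchall()
# The counter restarts at 1 whenever the partition (account) changes
```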

RANK() and DENSE_RANK() — like ROW_NUMBER but handle ties:

-- Find the top 5 transaction amounts per account
SELECT ACCT_NUM, TXN_AMT, TXN_DATE,
       RANK() OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_AMT DESC
       ) AS AMT_RANK
FROM TRANSACTION_HISTORY
WHERE PROCESS_DATE = CURRENT DATE;

RANK() leaves gaps (1, 2, 2, 4). DENSE_RANK() doesn't (1, 2, 2, 3). Choose based on whether your business logic needs gap-free ranking.
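The gap behavior is easy to verify. A sketch in Python with SQLite, whose window functions follow the same standard rules (an illustration assumption; data is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (acct TEXT, amt REAL)")
con.executemany("INSERT INTO txns VALUES (?,?)",
                [("A", 500.0), ("A", 300.0), ("A", 300.0), ("A", 100.0)])
rows = con.execute("""
    SELECT amt,
           RANK()       OVER (ORDER BY amt DESC) AS rnk,
           DENSE_RANK() OVER (ORDER BY amt DESC) AS drnk
    FROM txns ORDER BY amt DESC""").fetchall()
ranks = [r[1] for r in rows]   # gap after the tie
dense = [r[2] for r in rows]   # no gap
```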

LAG() and LEAD() — access previous and next rows without a self-join:

SELECT ACCT_NUM, TXN_DATE, TXN_AMT,
       LAG(TXN_AMT, 1) OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_DATE
       ) AS PREV_TXN_AMT,
       TXN_AMT - LAG(TXN_AMT, 1) OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_DATE
       ) AS AMT_CHANGE
FROM TRANSACTION_HISTORY;

This replaces the classic COBOL pattern of saving the previous row's values in WORKING-STORAGE and comparing them to the current row. LAG(column, n) looks back n rows; LEAD(column, n) looks forward n rows.
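The save-previous-row pattern and its LAG replacement can be compared on toy data. A sketch in Python with SQLite (an illustration assumption; rows are hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (acct TEXT, dt TEXT, amt REAL)")
con.executemany("INSERT INTO txns VALUES (?,?,?)", [
    ("A", "2025-01-01", 100.0),
    ("A", "2025-01-02", 250.0),
    ("A", "2025-01-03", 175.0),
])
rows = con.execute("""
    SELECT dt, amt,
           LAG(amt, 1) OVER (PARTITION BY acct ORDER BY dt) AS prev_amt,
           amt - LAG(amt, 1) OVER (PARTITION BY acct ORDER BY dt) AS change
    FROM txns ORDER BY dt""").fetchall()
# The first row of each partition has no predecessor, so LAG yields NULL
```

The NULL on the first row is the SQL equivalent of the "first time through the loop" special case in the COBOL version.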

Running totals with SUM() OVER:

SELECT ACCT_NUM, TXN_DATE, TXN_AMT,
       SUM(TXN_AMT) OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_DATE, TXN_SEQUENCE
           ROWS BETWEEN UNBOUNDED PRECEDING
                    AND CURRENT ROW
       ) AS RUNNING_BALANCE
FROM TRANSACTION_HISTORY
WHERE PROCESS_DATE = CURRENT DATE
ORDER BY ACCT_NUM, TXN_DATE, TXN_SEQUENCE;

That single SQL statement replaces the entire running balance batch program. DB2 partitions the data by account, orders within each partition by date and sequence, and computes the cumulative sum from the first row through the current row.
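The same frame clause runs on any engine with standard window functions. A sketch in Python with SQLite over hypothetical transactions (an illustration assumption, not the production table):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (acct TEXT, seq INTEGER, amt REAL)")
con.executemany("INSERT INTO txns VALUES (?,?,?)", [
    ("A", 1,  100.0), ("A", 2, -40.0), ("A", 3,  25.0),
    ("B", 1, 1000.0), ("B", 2, -300.0),
])
rows = con.execute("""
    SELECT acct, seq, amt,
           SUM(amt) OVER (
               PARTITION BY acct ORDER BY seq
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS running_balance
    FROM txns ORDER BY acct, seq""").fetchall()
balances = [r[3] for r in rows]
# Accumulation restarts at each account, exactly like the PREV-ACCT break
```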

Window Frame Specifications

The ROWS BETWEEN and RANGE BETWEEN clauses define the window frame — which rows contribute to the aggregate:

  • ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — all rows from the start of the partition through the current row (running total)
  • ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING — previous row, current row, next row (3-row moving average)
  • ROWS BETWEEN 29 PRECEDING AND CURRENT ROW — 30-row moving window
  • RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW — all rows whose ORDER BY value is less than or equal to the current row's value (handles ties differently)

The difference between ROWS and RANGE matters when you have duplicate ORDER BY values. ROWS counts physical rows; RANGE uses the logical value. For transaction processing where you have a unique sequence number in the ORDER BY, they're equivalent. For summary reporting where you might order by date (and multiple transactions share a date), RANGE groups all same-date transactions together.
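The distinction shows up as soon as two rows share an ORDER BY value. A sketch in Python with SQLite (an illustration assumption; data is hypothetical, with two rows on the same date):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (dt TEXT, amt REAL)")
con.executemany("INSERT INTO txns VALUES (?,?)",
                [("2025-01-01", 10.0), ("2025-01-02", 20.0),
                 ("2025-01-02", 30.0), ("2025-01-03", 40.0)])
rows = con.execute("""
    SELECT dt, amt,
           SUM(amt) OVER (ORDER BY dt
               ROWS  BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS rows_sum,
           SUM(amt) OVER (ORDER BY dt
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS range_sum
    FROM txns ORDER BY dt, amt""").fetchall()
rows_sums  = [r[2] for r in rows]   # physical rows: the tied pair differs
range_sums = [r[3] for r in rows]   # logical value: the tied pair matches
```

RANGE gives both January 2nd rows the same running sum (the whole date is one logical group); ROWS gives them different sums, and which tied row "comes first" is not deterministic, which is exactly why a unique sequence column belongs in the ORDER BY.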

OLAP Functions in COBOL

You use OLAP functions in cursor declarations and fetch the computed values alongside the base columns:

       01  WS-RESULT-ARRAY.
           05  WS-RESULT-ROW OCCURS 500 TIMES.
               10  WS-R-ACCT     PIC X(12).
               10  WS-R-TXN-DATE PIC X(10).
               10  WS-R-TXN-AMT  PIC S9(13)V99 COMP-3.
               10  WS-R-RUN-BAL  PIC S9(15)V99 COMP-3.
               10  WS-R-PREV-AMT PIC S9(13)V99 COMP-3.
               10  WS-R-TXN-RANK PIC S9(9) COMP.

           EXEC SQL
               DECLARE CSR-OLAP-TXN CURSOR
               WITH ROWSET POSITIONING FOR
               SELECT ACCT_NUM, TXN_DATE, TXN_AMT,
                   SUM(TXN_AMT) OVER (
                       PARTITION BY ACCT_NUM
                       ORDER BY TXN_DATE, TXN_SEQ
                       ROWS BETWEEN UNBOUNDED PRECEDING
                                AND CURRENT ROW
                   ) AS RUNNING_BAL,
                   LAG(TXN_AMT, 1, 0) OVER (
                       PARTITION BY ACCT_NUM
                       ORDER BY TXN_DATE, TXN_SEQ
                   ) AS PREV_TXN_AMT,
                   ROW_NUMBER() OVER (
                       PARTITION BY ACCT_NUM
                       ORDER BY TXN_AMT DESC
                   ) AS TXN_RANK
               FROM TRANSACTION_HISTORY
               WHERE PROCESS_DATE = CURRENT DATE
               ORDER BY ACCT_NUM, TXN_DATE, TXN_SEQ
           END-EXEC.

Now your COBOL program fetches pre-computed running balances, previous transaction amounts, and rankings. The COBOL logic becomes: fetch rows, apply business rules, write results. All the state management and accumulation logic lives in SQL where DB2's optimizer can parallelize it.

Performance Implications

OLAP functions require DB2 to sort the data according to the PARTITION BY and ORDER BY clauses. If the data is already in the right order (because of a clustering index or a prior ORDER BY), the sort is free. If not, DB2 uses work files in TEMP for the sort.

For CNB's running balance job:

  • Old approach (cursor loop + COBOL accumulation): 2.1 hours
  • OLAP function approach (multi-row FETCH of pre-computed results): 38 minutes

The 70% reduction came from three sources:

  1. Eliminated 500 million individual UPDATE statements for the running balance (now computed on the fly)
  2. Multi-row FETCH reduced thread switches by 99.8%
  3. DB2's internal sort was faster than the sequential cursor processing because DB2 could use parallelism

📊 Benchmark Note: OLAP functions with large partitions (millions of rows per partition) consume significant TEMP space. Kwame's team sized TEMP at 2x their largest partition. For the HA Banking system, ensure your TEMP database is sized for your largest account's transaction volume. The top 0.1% of CNB accounts have over 1 million transactions per day — those partitions drive the TEMP requirement.

Combining Multiple OLAP Functions

You can use multiple OLAP functions in the same SELECT, each with its own PARTITION BY and ORDER BY:

SELECT ACCT_NUM, BRANCH_CODE, TXN_DATE, TXN_AMT,
    -- Running balance per account
    SUM(TXN_AMT) OVER (
        PARTITION BY ACCT_NUM
        ORDER BY TXN_DATE
        ROWS UNBOUNDED PRECEDING
    ) AS ACCT_RUNNING_BAL,
    -- Running total per branch
    SUM(TXN_AMT) OVER (
        PARTITION BY BRANCH_CODE
        ORDER BY TXN_DATE
        ROWS UNBOUNDED PRECEDING
    ) AS BRANCH_RUNNING_TOTAL,
    -- Rank within branch by amount
    RANK() OVER (
        PARTITION BY BRANCH_CODE
        ORDER BY TXN_AMT DESC
    ) AS BRANCH_AMT_RANK,
    -- Percent of branch total
    TXN_AMT /
    SUM(TXN_AMT) OVER (
        PARTITION BY BRANCH_CODE
    ) * 100 AS PCT_OF_BRANCH
FROM TRANSACTION_HISTORY
WHERE PROCESS_DATE = CURRENT DATE;

Each OVER clause defines an independent window. DB2 may need multiple sort passes for different PARTITION BY clauses, but it's still vastly more efficient than implementing these calculations in COBOL with multiple cursor passes over the data.

FIRST_VALUE and LAST_VALUE — Accessing Boundary Rows

Two additional OLAP functions that replace common COBOL patterns:

SELECT ACCT_NUM, TXN_DATE, TXN_AMT,
       FIRST_VALUE(TXN_AMT) OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_DATE
           ROWS BETWEEN UNBOUNDED PRECEDING
                    AND UNBOUNDED FOLLOWING
       ) AS FIRST_TXN_AMT,
       LAST_VALUE(TXN_AMT) OVER (
           PARTITION BY ACCT_NUM
           ORDER BY TXN_DATE
           ROWS BETWEEN UNBOUNDED PRECEDING
                    AND UNBOUNDED FOLLOWING
       ) AS LAST_TXN_AMT
FROM TRANSACTION_HISTORY
WHERE PROCESS_DATE = CURRENT DATE;

FIRST_VALUE returns the value from the first row in the window. LAST_VALUE returns the value from the last row. These replace the classic COBOL pattern of saving the first and last values in WORKING-STORAGE variables during a break-level loop. Note that LAST_VALUE requires the window frame to extend to UNBOUNDED FOLLOWING — without it, the default frame (UNBOUNDED PRECEDING to CURRENT ROW) means LAST_VALUE always returns the current row's value, which is never what you want.
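The default-frame pitfall is worth seeing once. A sketch in Python with SQLite (an illustration assumption; data is hypothetical):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (dt TEXT, amt REAL)")
con.executemany("INSERT INTO txns VALUES (?,?)",
                [("2025-01-01", 10.0), ("2025-01-02", 20.0),
                 ("2025-01-03", 30.0)])
rows = con.execute("""
    SELECT amt,
           -- Default frame ends at CURRENT ROW: "last" is always this row
           LAST_VALUE(amt) OVER (ORDER BY dt) AS broken_last,
           -- Extending to UNBOUNDED FOLLOWING gives the true last value
           LAST_VALUE(amt) OVER (ORDER BY dt
               ROWS BETWEEN UNBOUNDED PRECEDING
                        AND UNBOUNDED FOLLOWING) AS true_last
    FROM txns ORDER BY dt""").fetchall()
```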

NTILE() — Distribution Buckets Without COBOL Math

NTILE divides a partition's rows into a specified number of roughly equal groups and assigns a bucket number to each row. This is the function that replaces the COBOL pattern of counting total rows, dividing by the number of buckets, and manually assigning group numbers in a second pass.

SELECT ACCT_NUM, TXN_DATE, TXN_AMT,
       NTILE(10) OVER (
           PARTITION BY BRANCH_CODE
           ORDER BY TXN_AMT DESC
       ) AS DECILE
FROM TRANSACTION_HISTORY
WHERE PROCESS_DATE = CURRENT DATE;

This assigns every transaction within each branch to a decile based on amount. Decile 1 holds the top 10% of transactions by amount; decile 10 holds the bottom 10%. CNB's fraud detection team uses exactly this pattern to flag transactions in the top decile for manual review. Before NTILE, they ran a two-pass COBOL program — first pass counted rows per branch, second pass assigned deciles using a counter and a threshold array in WORKING-STORAGE. The two-pass approach required either reading the data twice (doubling I/O) or caching row counts in a branch array (limiting scalability to however many branches fit in the array). NTILE does it in a single SQL pass.

The function handles uneven division cleanly. If a branch has 103 transactions and you request NTILE(10), the first three buckets get 11 rows each and the remaining seven get 10 rows each. DB2 distributes the remainder across the lower-numbered buckets. Your COBOL program doesn't need to handle the modulo arithmetic.
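The 103-row case can be checked directly. A sketch in Python with SQLite, whose NTILE follows the same larger-groups-first rule (an illustration assumption; rows are synthetic):

```python
import sqlite3
from collections import Counter

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE txns (amt REAL)")
con.executemany("INSERT INTO txns VALUES (?)",
                [(float(i),) for i in range(103)])
buckets = [r[0] for r in con.execute("""
    SELECT NTILE(10) OVER (ORDER BY amt DESC) FROM txns""").fetchall()]
sizes = Counter(buckets)   # bucket number -> row count
```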

A common production use is percentile-based reporting for regulatory compliance. Federal Benefits uses NTILE(100) to assign percentile ranks to benefit payment amounts across program categories, feeding directly into their quarterly variance report. The entire calculation that previously required a COBOL sort, a percentile assignment program, and a merge step now lives in a single cursor declaration.

-- Quartile analysis for monthly account activity reporting
SELECT BRANCH_CODE, ACCT_NUM, MONTHLY_VOLUME,
       NTILE(4) OVER (
           PARTITION BY BRANCH_CODE
           ORDER BY MONTHLY_VOLUME
       ) AS VOLUME_QUARTILE,
       AVG(MONTHLY_VOLUME) OVER (
           PARTITION BY BRANCH_CODE
       ) AS BRANCH_AVG
FROM ACCOUNT_ACTIVITY_SUMMARY
WHERE REPORT_MONTH = :WS-REPORT-MONTH;

Combining NTILE with other window functions in the same SELECT — as shown above with AVG() OVER — gives you multi-dimensional analysis in a single pass. Each row gets its quartile assignment and its branch average without any COBOL accumulation logic.


7.7 Choosing Set-Based vs. Cursor-Based: Decision Framework

Not every problem should be solved with set-based SQL. Here's the decision framework Lisa at CNB developed after their migration:

Use Set-Based (OLAP, MERGE, Multi-Row) When:

  1. The logic is data-dependent, not business-rule-dependent. Running totals, rankings, duplicate detection, gap analysis — these are data transformations that SQL handles natively.

  2. The entire result set fits through a single logical operation. If you can express the transformation as "for each row, compute X based on neighboring rows," that's an OLAP function.

  3. Error handling is at the batch level. If a failure means "roll back the entire batch and restart," set-based operations are natural — they're already atomic.

  4. The row count is large. The thread-switching savings from multi-row operations scale linearly with row count. For 1,000 rows, the savings are negligible. For 10 million rows, they're transformative.

Use Cursor-Based When:

  1. Each row requires complex conditional business logic. If processing a transaction requires calling an external service, checking a rules engine, or applying 47 different business rules based on transaction type, account type, and customer status — that logic lives in COBOL, not SQL.

  2. Error handling is at the row level. If a failure on row 347 means "log the error, skip the row, continue processing," you need cursor-based logic to maintain control over each row's fate.

  3. The processing has side effects. If processing a row triggers a CICS transaction, sends a message to MQ, writes to a VSAM file, or calls a web service — those operations can't be embedded in SQL.

  4. The data volume is small. For a lookup table with 500 rows, the overhead difference between single-row and multi-row is measured in milliseconds. Don't add complexity for no benefit.

The Hybrid Approach (What Production Actually Looks Like)

Real production programs use both. The HA Banking batch job will:

  1. Multi-row FETCH transactions from the queue (set-based).
  2. COBOL logic to validate each transaction, apply business rules, determine routing (cursor-based).
  3. Multi-row INSERT valid transactions to the posting table (set-based).
  4. MERGE account summary records (set-based).
  5. OLAP function query for the running balance report (set-based).
  6. Cursor loop for transactions that failed validation, each requiring individual error handling and notification (cursor-based).

The hybrid approach is not a compromise — it's the correct architecture. Use SQL's power for data operations. Use COBOL's power for business logic. The boundary between them is the boundary between "what the database does" and "what the application decides."

🔗 Connection to Chapter 1: Remember the DB2 subsystem architecture from Chapter 1 — the separation between the application address space and the DB2 address space. Every technique in this chapter reduces the number of times your program crosses that boundary. Multi-row operations cross it fewer times. OLAP functions push computation to the DB2 side so results cross once. Recursive CTEs resolve entire hierarchies without crossing at all. The architecture hasn't changed; your use of it has.

Migration Strategy

If you're converting existing cursor-loop programs to use these techniques, Lisa recommends this order:

  1. Multi-row FETCH first. The change is mechanical (add OCCURS, change FETCH syntax, add loop) and the performance gain is immediate and measurable. Low risk, high reward.

  2. Multi-row INSERT second. Same mechanical change on the output side. The accumulate-then-flush pattern is straightforward.

  3. MERGE third. Replacing SELECT-then-INSERT-or-UPDATE patterns with MERGE is moderately complex — you need to verify that the ON clause columns have proper indexes and that no trigger conflicts exist.

  4. OLAP functions fourth. Replacing COBOL accumulation logic with OLAP functions requires rethinking the program structure. The SQL is simpler, but the COBOL changes may be extensive if the program's control flow depends on the accumulation.

  5. Recursive CTEs last. These replace the most complex COBOL logic (stack-based hierarchy traversal) and require the most testing. But when they work, they eliminate entire paragraphs of code.


7.8 Spaced Review: Connecting Back

Chapter 1 Review: z/OS DB2 Subsystem

Every technique in this chapter operates within the subsystem architecture you learned in Chapter 1. Multi-row operations reduce cross-address-space calls. OLAP functions shift computation to the DB2 engine. Recursive CTEs resolve within DB2's work file processing. Understanding the cost of the application-to-DB2 boundary is what motivates all of these optimizations.

Quick check: In which address space does the OLAP function's sort operation execute — the application's or DB2's? (Answer: DB2's. The sort uses DB2's work files in the TEMP database, not your program's WORKING-STORAGE.)

Chapter 6 Review: The Optimizer

Every SQL construct in this chapter affects the access path the optimizer chooses:

  • Multi-row FETCH doesn't change the access path — it changes the delivery mechanism.
  • MERGE uses the index on the ON clause columns. No index means tablespace scan per source row.
  • Recursive CTEs generate work file access paths that appear in EXPLAIN as "work file scan" operations.
  • OLAP functions add sort operations to the access path if the data isn't already in the required order.

Quick check: You add ROW_NUMBER() OVER (PARTITION BY BRANCH_CODE ORDER BY TXN_DATE) to a query. The table has a clustering index on (ACCT_NUM, TXN_DATE). Will DB2 need a sort? (Answer: Yes. The clustering index is on ACCT_NUM, not BRANCH_CODE. DB2 must sort the data by BRANCH_CODE, TXN_DATE to evaluate the OLAP function.)


7.9 HA Banking Project Checkpoint

Objective

Implement multi-row operations and temporal tables for the HA Banking Transaction Processing System.

Requirements

  1. Multi-row transaction fetch: Modify the transaction processing program to use multi-row FETCH with a rowset size of 500. The program reads from HA_TXN_QUEUE and processes daily transactions.

  2. Multi-row audit insert: Implement accumulate-then-flush pattern for writing to HA_AUDIT_TRAIL. Use atomic insert for the HA_POSTED_TXN table and non-atomic for audit.

  3. Temporal table for account balances: Create HA_ACCOUNT_BALANCE as a system-period temporal table with history table HA_ACCOUNT_BALANCE_HIST. Every balance change must be tracked.

  4. MERGE for account summaries: Replace the existing SELECT/INSERT/UPDATE pattern for HA_ACCOUNT_SUMMARY with a MERGE statement.

  5. Running balance calculation: Implement the daily running balance report using OLAP functions (SUM() OVER with ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

  6. Approval chain query: Write a recursive CTE that traverses HA_EMPLOYEE_HIERARCHY to determine the approval chain for transactions exceeding the initiator's authority limit.

Deliverables

  • DDL for HA_ACCOUNT_BALANCE temporal table and its history table
  • Modified COBOL source with multi-row FETCH and INSERT
  • MERGE statement for HA_ACCOUNT_SUMMARY
  • Recursive CTE for approval chain
  • OLAP query for running balance report
  • Test cases demonstrating temporal query ("as of" a specific timestamp)

Acceptance Criteria

  • Multi-row FETCH uses indicator arrays for all nullable columns
  • Final flush is performed after the main processing loop
  • Temporal table DDL includes both SYS_START/SYS_END and TRANSACTION START ID
  • Recursive CTE includes depth limit of 15 levels
  • OLAP running balance query partitions by account and orders by date + sequence
  • All SQLCODEs are checked after every SQL statement (no exceptions)

Architecture Note

The temporal table for account balances is not optional. The HA Banking system operates under OCC (Office of the Comptroller of the Currency) examination requirements. Examiners will ask: "What was this account's balance at the time this suspicious transaction was processed?" Without temporal tables, answering that question requires reconstructing balance history from the audit trail — a process that takes hours per account and is error-prone. With temporal tables, it's a single SQL query.


Chapter Summary

You now have five tools that didn't exist in your COBOL-DB2 toolkit this morning:

Multi-row FETCH and INSERT eliminate thread-switching overhead by processing blocks of rows instead of individual rows. The performance improvement scales linearly with volume — exactly the characteristic you want for batch programs processing millions of rows. Remember: check SQLERRD(3) for actual row count, always include indicator arrays, and never forget the final flush.

MERGE replaces the SELECT-then-decide-then-write pattern with a single atomic operation. It eliminates concurrency gaps, reduces thread switches by 67%, and simplifies your COBOL logic. Ensure the ON clause columns are indexed and watch for duplicate source rows.

Temporal tables let DB2 maintain historical versions automatically. System-period for tracking when data was current in the database. Application-period for business-meaningful validity periods. Bi-temporal for both. The history table needs a retention policy, and DML overhead increases 40–60% for update-heavy workloads — offset by combining with multi-row operations.

Recursive CTEs resolve hierarchies in a single SQL statement, replacing stack-based COBOL logic that generates one SQL call per node. Always include a depth limit. Always check for cycles. The performance improvement grows with hierarchy size.

OLAP functions push running totals, rankings, row comparisons, and moving aggregates into SQL. They replace the most complex cursor-processing patterns with declarative SQL that DB2 can optimize and parallelize. The PARTITION BY and ORDER BY clauses are the keys — get them right and the function does the rest.

The decision isn't "which technique do I use?" It's "which combination?" Production batch programs use multi-row FETCH to get data, COBOL logic to apply business rules, multi-row INSERT to write results, MERGE to maintain summary tables, OLAP functions for analytical calculations, and recursive CTEs for hierarchy traversal. The hybrid approach is the production approach.

Next chapter, we go deeper into DB2 locking and concurrency — because all these set-based operations change the locking profile of your programs in ways that will surprise you if you're not prepared.