Chapter 32: Transaction Design Patterns

Generated by Claude

26 min read

> "A transaction is a promise. You promise that either everything happens or nothing happens. Breaking that promise breaks trust — and in banking, trust is all we have." — Maria Chen, GlobalBank senior developer

In This Chapter

32.1 The Unit of Work Concept
32.2 COMMIT and ROLLBACK in Practice
32.3 Two-Phase Commit
32.4 Deadlock Prevention
32.5 Optimistic vs. Pessimistic Locking
32.6 Long-Running Transactions — The Saga Pattern
32.7 Compensation Logic
32.8 Idempotent Transaction Design
32.9 Throughput and Response Time Optimization
32.10 GlobalBank: The Atomic Transfer
32.11 MedClaim: Claim Adjudication Transaction
32.12 Transaction Anti-Patterns
32.13 VSAM Transaction Patterns
32.14 Error Recovery Patterns
32.15 Transaction Design in Mixed Environments
32.16 Batch Checkpoint/Restart Design
32.17 Transaction Design Review Checklist
32.18 Transaction Monitoring and Diagnostics
32.19 Transaction Testing Strategies
32.20 Transaction Isolation Levels
32.21 Putting It All Together: The Transaction Design Spectrum
32.22 Chapter Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 32: Transaction Design Patterns

"A transaction is a promise. You promise that either everything happens or nothing happens. Breaking that promise breaks trust — and in banking, trust is all we have." — Maria Chen, GlobalBank senior developer

Derek Washington's first production incident at GlobalBank taught him a lesson no textbook had prepared him for. A funds transfer program crashed halfway through execution. It had debited $5,000 from the source account but never credited the destination. The money had vanished into the ether — not lost, technically, but trapped in an inconsistent state that took four hours and three senior developers to untangle. "This," Maria Chen said, reviewing the code afterward, "is why we design transactions."

This chapter teaches you to design transactions that keep data consistent even when things go wrong — and things will go wrong. Hardware fails, networks drop, programs abend, operators cancel jobs, and databases run out of space. Your transaction design must account for all of these scenarios.

32.1 The Unit of Work Concept

A unit of work (UOW) is a sequence of operations that must either all complete successfully or all be undone. In database terms, this is the "A" in ACID:

Atomicity: All or nothing — either every operation in the unit completes, or none of them take effect
Consistency: The transaction moves the database from one valid state to another
Isolation: Concurrent transactions do not interfere with each other
Durability: Once committed, changes survive system failures

In mainframe COBOL, a unit of work begins implicitly when your program starts (or after the last commit point) and ends explicitly when you issue a COMMIT (or equivalent sync point operation) or implicitly when the program terminates.

32.1.1 What Constitutes a Transaction?

Consider GlobalBank's funds transfer. The business operation is: "Move $5,000 from Account A to Account B." The database operations are:

Read Account A balance → verify sufficient funds
Subtract $5,000 from Account A → write back
Add $5,000 to Account B → write back
Write an audit record for the debit
Write an audit record for the credit
Update daily transaction totals

All six operations form a single unit of work. If any one fails, they must all be undone. You cannot have a state where Account A is debited but Account B is not credited.

📊 The Cost of Inconsistency: In financial systems, data inconsistency is not just a technical problem — it is a regulatory violation. Banking regulators require that all transactions balance to zero (debits must equal credits). A system that loses money, even temporarily, can trigger audits, fines, and loss of charter. At GlobalBank, every transaction is designed to fail safely.

32.1.2 Implicit vs. Explicit Units of Work

In a batch COBOL program: - Implicit: The program's entire execution is one unit of work. If it abends, all changes since the last sync point are rolled back. If it completes normally, all changes are committed at program termination. - Explicit: The program issues COMMIT at defined intervals, breaking the work into multiple units. Each COMMIT is a point of no return — changes before the COMMIT survive even if the program abends later.

In CICS: - Each task (transaction) is typically one unit of work - EXEC CICS SYNCPOINT commits the current unit of work - EXEC CICS SYNCPOINT ROLLBACK undoes changes since the last sync point

In IMS: - The CHKP (checkpoint) call serves as the sync point - ROLB rolls back to the last checkpoint

32.2 COMMIT and ROLLBACK in Practice

32.2.1 DB2 COMMIT and ROLLBACK

In embedded SQL programs, COMMIT and ROLLBACK are straightforward:

      *================================================================*
      * PROGRAM: XFER0100                                              *
      * PURPOSE: Transfer funds between accounts                       *
      * AUTHOR:  Maria Chen / GlobalBank Development                   *
      *================================================================*
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XFER0100.

       DATA DIVISION.
       WORKING-STORAGE SECTION.

           EXEC SQL INCLUDE SQLCA END-EXEC.

       01  WS-TRANSFER-FIELDS.
           05  WS-SOURCE-ACCT     PIC X(12).
           05  WS-DEST-ACCT       PIC X(12).
           05  WS-AMOUNT          PIC S9(11)V99 COMP-3.
           05  WS-SOURCE-BAL      PIC S9(11)V99 COMP-3.
           05  WS-DEST-BAL        PIC S9(11)V99 COMP-3.
           05  WS-TIMESTAMP       PIC X(26).
           05  WS-XFER-ID         PIC X(20).

       01  WS-FLAGS.
           05  WS-ERROR-FLAG      PIC X(1) VALUE 'N'.
               88 TRANSFER-OK     VALUE 'N'.
               88 TRANSFER-FAILED VALUE 'Y'.
           05  WS-ERROR-MSG       PIC X(80).

       PROCEDURE DIVISION.

       0000-MAIN.
           PERFORM 1000-INITIALIZE
           PERFORM 2000-EXECUTE-TRANSFER
           PERFORM 3000-FINALIZE
           STOP RUN.

       1000-INITIALIZE.
           MOVE 'ACCT00001001' TO WS-SOURCE-ACCT
           MOVE 'ACCT00001002' TO WS-DEST-ACCT
           MOVE 5000.00        TO WS-AMOUNT
           MOVE FUNCTION CURRENT-DATE TO WS-TIMESTAMP
           STRING 'XFER' WS-TIMESTAMP(1:14)
               DELIMITED BY SIZE INTO WS-XFER-ID.

       2000-EXECUTE-TRANSFER.
      *--- Step 1: Lock and read source account ---*
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-SOURCE-BAL
               FROM ACCOUNT
               WHERE ACCT_NUMBER = :WS-SOURCE-ACCT
               FOR UPDATE OF ACCT_BALANCE
               WITH RS USE AND KEEP EXCLUSIVE LOCKS
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'SOURCE ACCOUNT NOT FOUND'
                   TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF

      *--- Step 2: Verify sufficient funds ---*
           IF WS-SOURCE-BAL < WS-AMOUNT
               MOVE 'INSUFFICIENT FUNDS' TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF

      *--- Step 3: Lock and read destination account ---*
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-DEST-BAL
               FROM ACCOUNT
               WHERE ACCT_NUMBER = :WS-DEST-ACCT
               FOR UPDATE OF ACCT_BALANCE
               WITH RS USE AND KEEP EXCLUSIVE LOCKS
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'DEST ACCOUNT NOT FOUND' TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF

      *--- Step 4: Debit source ---*
           EXEC SQL
               UPDATE ACCOUNT
               SET ACCT_BALANCE = ACCT_BALANCE - :WS-AMOUNT,
                   LAST_UPDATE = CURRENT TIMESTAMP
               WHERE ACCT_NUMBER = :WS-SOURCE-ACCT
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'DEBIT FAILED' TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF

      *--- Step 5: Credit destination ---*
           EXEC SQL
               UPDATE ACCOUNT
               SET ACCT_BALANCE = ACCT_BALANCE + :WS-AMOUNT,
                   LAST_UPDATE = CURRENT TIMESTAMP
               WHERE ACCT_NUMBER = :WS-DEST-ACCT
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'CREDIT FAILED' TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF

      *--- Step 6: Write audit records ---*
           EXEC SQL
               INSERT INTO TRANSACTION_LOG
               (XFER_ID, ACCT_NUMBER, TRANS_TYPE,
                TRANS_AMOUNT, TRANS_TIMESTAMP)
               VALUES
               (:WS-XFER-ID, :WS-SOURCE-ACCT, 'DEBIT',
                :WS-AMOUNT, CURRENT TIMESTAMP)
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'DEBIT AUDIT FAILED' TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF

           EXEC SQL
               INSERT INTO TRANSACTION_LOG
               (XFER_ID, ACCT_NUMBER, TRANS_TYPE,
                TRANS_AMOUNT, TRANS_TIMESTAMP)
               VALUES
               (:WS-XFER-ID, :WS-DEST-ACCT, 'CREDIT',
                :WS-AMOUNT, CURRENT TIMESTAMP)
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'CREDIT AUDIT FAILED' TO WS-ERROR-MSG
               SET TRANSFER-FAILED TO TRUE
               GO TO 2000-EXIT
           END-IF.

       2000-EXIT.
           EXIT.

       3000-FINALIZE.
           IF TRANSFER-OK
               EXEC SQL COMMIT END-EXEC
               DISPLAY 'TRANSFER COMMITTED: ' WS-XFER-ID
           ELSE
               EXEC SQL ROLLBACK END-EXEC
               DISPLAY 'TRANSFER ROLLED BACK: ' WS-ERROR-MSG
           END-IF.

⚠️ Critical Design Point: Notice how the program accumulates an error flag throughout the transfer process and only issues COMMIT or ROLLBACK at the very end. This ensures atomicity — every step must succeed before anything is committed.

32.2.2 CICS SYNCPOINT

In a CICS program, the equivalent operations use EXEC CICS commands:

      *--- Commit all changes ---*
       EXEC CICS SYNCPOINT END-EXEC.

      *--- Rollback all changes ---*
       EXEC CICS SYNCPOINT ROLLBACK END-EXEC.

In CICS, a SYNCPOINT commits changes to all resources managed by the task — DB2 tables, VSAM files, temporary storage queues, transient data queues. This is coordinated through CICS's sync point manager.

32.2.3 IMS Checkpoint

In IMS batch DL/I or BMP programs:

      *--- Commit via checkpoint ---*
       CALL 'CBLTDLI' USING DLI-CHKP
                             IO-PCB
                             WS-CHECKPOINT-AREA.

      *--- Rollback to last checkpoint ---*
       CALL 'CBLTDLI' USING DLI-ROLB
                             IO-PCB.

32.2.4 Choosing Commit Frequency

In batch programs, you must decide how often to commit. The trade-offs:

Frequency	Advantage	Disadvantage
Every record	Minimal rollback on failure	High overhead, slow throughput
Every N records	Balanced approach	Up to N records to reprocess on restart
End of job only	Maximum throughput	Full restart on failure

The standard practice at GlobalBank: commit every 500-1000 records, with checkpoint/restart logic to handle reprocessing.

       PERFORM UNTIL END-OF-INPUT
           READ INPUT-FILE INTO WS-INPUT-REC
               AT END SET END-OF-INPUT TO TRUE
           END-READ

           IF NOT END-OF-INPUT
               PERFORM 2000-PROCESS-RECORD
               ADD 1 TO WS-RECORD-COUNT

               IF FUNCTION MOD(WS-RECORD-COUNT,
                               WS-COMMIT-INTERVAL) = 0
                   EXEC SQL COMMIT END-EXEC
                   DISPLAY 'COMMITTED AT RECORD: '
                           WS-RECORD-COUNT
               END-IF
           END-IF
       END-PERFORM.

      *--- Final commit for remaining records ---*
       EXEC SQL COMMIT END-EXEC.

💡 Defensive Programming: Always issue a final COMMIT after the processing loop. Without it, the last batch of records (fewer than the commit interval) would only be committed at program termination — and if that termination is abnormal, they would be lost.

32.3 Two-Phase Commit

When a single transaction updates multiple resource managers — for example, both a DB2 table and a VSAM file, or DB2 and IMS — a simple COMMIT is not enough. What if DB2 commits but the VSAM update fails? You need two-phase commit (2PC).

32.3.1 How Two-Phase Commit Works

The sync point manager (RRS — Resource Recovery Services on z/OS) coordinates the commit across all resource managers:

Phase 1 — Prepare: 1. The sync point manager sends a "prepare" message to each resource manager 2. Each resource manager writes its changes to a log and responds "ready" or "abort" 3. If all respond "ready," proceed to Phase 2 4. If any responds "abort," all resource managers are told to roll back

Phase 2 — Commit: 1. The sync point manager sends "commit" to all resource managers 2. Each resource manager makes its changes permanent 3. All respond "committed"

Sync Point Manager (RRS)
    │
    ├─── Prepare ──→ DB2     → "Ready"
    │
    ├─── Prepare ──→ VSAM    → "Ready"
    │
    ├─── Prepare ──→ IMS     → "Ready"
    │
    │  (All ready? Yes!)
    │
    ├─── Commit ───→ DB2     → "Committed"
    ├─── Commit ───→ VSAM    → "Committed"
    └─── Commit ───→ IMS     → "Committed"

32.3.2 Two-Phase Commit in Practice

As a COBOL programmer, you rarely code 2PC explicitly. The middleware handles it:

In CICS: EXEC CICS SYNCPOINT automatically coordinates all resource managers. If your CICS program updates a DB2 table and a VSAM file in the same task, SYNCPOINT handles the two-phase commit transparently.
In batch: RRS (Resource Recovery Services) coordinates when your batch program uses DB2 and VSAM RLS (Record Level Sharing).
In IMS: IMS's sync point processing coordinates IMS databases and external resources.

However, you must design your programs to work with 2PC:

      *--- Update DB2 table ---*
           EXEC SQL
               UPDATE CLAIM_MASTER
               SET CLAIM_STATUS = 'APPROVED',
                   APPROVED_AMT = :WS-APPROVED-AMT
               WHERE CLAIM_ID = :WS-CLAIM-ID
           END-EXEC

           IF SQLCODE NOT = 0
               SET TRANSACTION-FAILED TO TRUE
               GO TO 5000-FINALIZE
           END-IF

      *--- Update VSAM payment file ---*
           WRITE PAYMENT-RECORD FROM WS-PAYMENT-REC
           IF WS-VSAM-STATUS NOT = '00'
               SET TRANSACTION-FAILED TO TRUE
               GO TO 5000-FINALIZE
           END-IF

      *--- Both succeeded — commit both ---*
       5000-FINALIZE.
           IF TRANSACTION-OK
               EXEC CICS SYNCPOINT END-EXEC
           ELSE
               EXEC CICS SYNCPOINT ROLLBACK END-EXEC
           END-IF.

⚠️ Important: Two-phase commit adds overhead. Each prepare and commit phase requires I/O to the log. In high-throughput systems, this overhead matters. This is one reason why many mainframe systems keep related data in the same resource manager (e.g., all in DB2) rather than splitting across DB2 and VSAM.

32.4 Deadlock Prevention

A deadlock occurs when two transactions each hold a lock that the other needs, creating a circular wait. Neither can proceed.

Transaction A:                    Transaction B:
  Lock Account 001 ✓                Lock Account 002 ✓
  Lock Account 002 → WAIT           Lock Account 001 → WAIT
  (waiting for B)                    (waiting for A)

32.4.1 Resource Ordering

The most effective deadlock prevention strategy is resource ordering: always acquire locks in the same sequence. If every program that transfers funds between accounts locks the lower-numbered account first, deadlocks cannot occur.

       2000-EXECUTE-TRANSFER.
      *--- Always lock accounts in ascending order ---*
           IF WS-SOURCE-ACCT < WS-DEST-ACCT
               MOVE WS-SOURCE-ACCT TO WS-FIRST-LOCK-ACCT
               MOVE WS-DEST-ACCT   TO WS-SECOND-LOCK-ACCT
           ELSE
               MOVE WS-DEST-ACCT   TO WS-FIRST-LOCK-ACCT
               MOVE WS-SOURCE-ACCT TO WS-SECOND-LOCK-ACCT
           END-IF

      *--- Lock first account ---*
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-FIRST-BAL
               FROM ACCOUNT
               WHERE ACCT_NUMBER = :WS-FIRST-LOCK-ACCT
               FOR UPDATE OF ACCT_BALANCE
           END-EXEC

      *--- Lock second account ---*
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-SECOND-BAL
               FROM ACCOUNT
               WHERE ACCT_NUMBER = :WS-SECOND-LOCK-ACCT
               FOR UPDATE OF ACCT_BALANCE
           END-EXEC

      *--- Now perform the actual debit/credit ---*
           ...

💡 Defensive Programming: Resource ordering is a discipline, not a technical feature. It only works if every program that accesses these resources follows the same convention. At GlobalBank, the convention is documented in the programming standards manual, and code reviews check for it.

32.4.2 Timeout Strategies

Even with resource ordering, deadlocks can occur (due to bugs or unforeseen interactions). The safety net is a lock timeout:

      *--- Set a lock timeout for this transaction ---*
           EXEC SQL
               SET CURRENT LOCK TIMEOUT = 10
           END-EXEC.

If a lock is not granted within 10 seconds, DB2 returns SQLCODE -911 (deadlock/timeout). Your program must handle this:

       EVALUATE SQLCODE
           WHEN 0
               CONTINUE
           WHEN -911
      *        Deadlock or timeout — rollback and retry
               EXEC SQL ROLLBACK END-EXEC
               ADD 1 TO WS-RETRY-COUNT
               IF WS-RETRY-COUNT <= WS-MAX-RETRIES
                   PERFORM 2000-EXECUTE-TRANSFER
               ELSE
                   MOVE 'MAX RETRIES EXCEEDED'
                       TO WS-ERROR-MSG
                   PERFORM 9000-ERROR-EXIT
               END-IF
           WHEN OTHER
               PERFORM 9000-SQL-ERROR
       END-EVALUATE.

32.4.3 Deadlock Detection in CICS

CICS has its own deadlock detection for VSAM resources. When CICS detects a deadlock, it abends one of the transactions with ABEND code AFCR. Your program should include:

       EXEC CICS HANDLE ABEND
           LABEL(9000-DEADLOCK-HANDLER)
       END-EXEC.

      *--- Or, preferably, use RESP/RESP2 ---*
       EXEC CICS READ
           FILE('ACCTFILE')
           INTO(WS-ACCT-REC)
           RIDFLD(WS-ACCT-KEY)
           UPDATE
           RESP(WS-RESP)
           RESP2(WS-RESP2)
       END-EXEC

       IF WS-RESP = DFHRESP(LOCKED)
           PERFORM 3100-HANDLE-LOCK-WAIT
       END-IF.

32.4.4 Minimizing Lock Duration

The best deadlock prevention is reducing the window during which locks are held:

Acquire locks as late as possible — do validation and preparation before locking
Release locks as early as possible — commit promptly after updates
Minimize processing between lock acquisition and release — no I/O to non-locked resources, no complex calculations
Use the least restrictive lock mode — share locks for reads, exclusive locks only for updates

      *--- BAD: Hold locks during long calculation ---*
       EXEC SQL SELECT ... FOR UPDATE END-EXEC
       PERFORM COMPLEX-CALCULATION       *> 500ms of CPU
       EXEC SQL UPDATE ... END-EXEC
       EXEC SQL COMMIT END-EXEC

      *--- GOOD: Calculate first, then lock-update-commit ---*
       EXEC SQL SELECT ... END-EXEC      *> No lock yet
       PERFORM COMPLEX-CALCULATION       *> Calculate freely
       EXEC SQL SELECT ... FOR UPDATE END-EXEC  *> Lock now
       EXEC SQL UPDATE ... END-EXEC      *> Update immediately
       EXEC SQL COMMIT END-EXEC          *> Release quickly

📊 Lock Duration Rule: At GlobalBank, the standard is: no transaction should hold exclusive locks for more than 100 milliseconds. Transactions that exceed this are flagged in performance monitoring and must be redesigned.

32.5 Optimistic vs. Pessimistic Locking

32.5.1 Pessimistic Locking

Pessimistic locking assumes conflicts are common and prevents them by locking resources before accessing them. This is the default approach in DB2's SELECT ... FOR UPDATE:

      *--- Pessimistic: Lock immediately ---*
       EXEC SQL
           SELECT ACCT_BALANCE, LAST_UPDATE
           INTO :WS-BALANCE, :WS-LAST-UPDATE
           FROM ACCOUNT
           WHERE ACCT_NUMBER = :WS-ACCT-NUM
           FOR UPDATE OF ACCT_BALANCE
       END-EXEC.
      *--- Row is now locked. Other transactions wait. ---*

Pros: Simple, safe, guaranteed consistency Cons: Reduces concurrency, increases deadlock risk, locks held during user think time in online programs

32.5.2 Optimistic Locking

Optimistic locking assumes conflicts are rare. It reads without locking, performs work, then checks at update time whether anyone else changed the data:

      *--- Step 1: Read without lock (note: no FOR UPDATE) ---*
       EXEC SQL
           SELECT ACCT_BALANCE, LAST_UPDATE
           INTO :WS-BALANCE, :WS-SAVED-TIMESTAMP
           FROM ACCOUNT
           WHERE ACCT_NUMBER = :WS-ACCT-NUM
       END-EXEC.

      *--- Step 2: Process (user think time, calculations, etc.) ---*
       PERFORM CALCULATE-NEW-BALANCE.

      *--- Step 3: Update only if unchanged since we read ---*
       EXEC SQL
           UPDATE ACCOUNT
           SET ACCT_BALANCE = :WS-NEW-BALANCE,
               LAST_UPDATE = CURRENT TIMESTAMP
           WHERE ACCT_NUMBER = :WS-ACCT-NUM
             AND LAST_UPDATE = :WS-SAVED-TIMESTAMP
       END-EXEC.

      *--- Step 4: Check if update succeeded ---*
       IF SQLCODE = 0 AND SQLERRD(3) = 1
      *    Exactly one row updated — success
           EXEC SQL COMMIT END-EXEC
       ELSE IF SQLERRD(3) = 0
      *    Zero rows updated — someone else changed it
           EXEC SQL ROLLBACK END-EXEC
           MOVE 'DATA CHANGED — PLEASE RETRY'
               TO WS-USER-MESSAGE
           PERFORM REDISPLAY-SCREEN
       END-IF.

The LAST_UPDATE timestamp acts as a version number. If it changed between your read and your update, someone else modified the row, and your update affects zero rows.

Pros: Higher concurrency, no lock contention, no deadlocks Cons: Requires retry logic, wasted work on conflict, requires a version column

💡 When to Use Which: Use pessimistic locking for high-contention resources (like account balances during peak hours) and optimistic locking for low-contention resources (like customer address updates). At GlobalBank, the CICS online transfer transaction uses pessimistic locking because the same popular accounts are hit constantly. The batch address-update program uses optimistic locking because address changes are rare.

32.6 Long-Running Transactions — The Saga Pattern

Some business processes span hours or days and cannot be held in a single database transaction. Consider MedClaim's claim lifecycle:

Claim is received (Day 1)
Claim is validated (Day 1)
Claim is adjudicated (Day 2)
Payment is approved (Day 3)
Payment is issued (Day 5)
Explanation of Benefits is generated (Day 5)

You cannot hold a database lock for five days. The solution is the saga pattern: break the long-running business process into a sequence of short database transactions, each with a compensating transaction that can undo it if a later step fails.

32.6.1 Saga Design

Step 1: Receive Claim     → Compensation: Cancel Claim
Step 2: Validate Claim    → Compensation: Mark Invalid
Step 3: Adjudicate Claim  → Compensation: Reverse Adjudication
Step 4: Approve Payment   → Compensation: Reverse Approval
Step 5: Issue Payment     → Compensation: Stop/Void Check
Step 6: Generate EOB      → Compensation: Void EOB

Each step is an independent transaction that commits immediately. If Step 4 fails, you execute the compensation transactions for Steps 3, 2, and 1 in reverse order.

32.6.2 Implementing Sagas in COBOL

      *================================================================*
      * PROGRAM: CLMSAGA1                                              *
      * PURPOSE: Saga-based claim processing with compensation         *
      * AUTHOR:  James Okafor / MedClaim Development                   *
      *================================================================*

       01  WS-SAGA-STATE.
           05  WS-SAGA-STEP     PIC 9(2).
           05  WS-SAGA-STATUS   PIC X(1).
               88 SAGA-OK       VALUE 'Y'.
               88 SAGA-FAILED   VALUE 'N'.
           05  WS-SAGA-STEPS-COMPLETED PIC 9(2).

       PROCEDURE DIVISION.

       2000-EXECUTE-SAGA.
           SET SAGA-OK TO TRUE
           MOVE ZEROS TO WS-SAGA-STEPS-COMPLETED

      *--- Step 1: Receive ---*
           PERFORM 2100-RECEIVE-CLAIM
           IF SAGA-FAILED
               PERFORM 3000-COMPENSATE
               GO TO 2000-EXIT
           END-IF
           ADD 1 TO WS-SAGA-STEPS-COMPLETED

      *--- Step 2: Validate ---*
           PERFORM 2200-VALIDATE-CLAIM
           IF SAGA-FAILED
               PERFORM 3000-COMPENSATE
               GO TO 2000-EXIT
           END-IF
           ADD 1 TO WS-SAGA-STEPS-COMPLETED

      *--- Step 3: Adjudicate ---*
           PERFORM 2300-ADJUDICATE-CLAIM
           IF SAGA-FAILED
               PERFORM 3000-COMPENSATE
               GO TO 2000-EXIT
           END-IF
           ADD 1 TO WS-SAGA-STEPS-COMPLETED

      *--- Step 4: Approve Payment ---*
           PERFORM 2400-APPROVE-PAYMENT
           IF SAGA-FAILED
               PERFORM 3000-COMPENSATE
               GO TO 2000-EXIT
           END-IF
           ADD 1 TO WS-SAGA-STEPS-COMPLETED

      *--- Step 5: Issue Payment ---*
           PERFORM 2500-ISSUE-PAYMENT
           IF SAGA-FAILED
               PERFORM 3000-COMPENSATE
               GO TO 2000-EXIT
           END-IF
           ADD 1 TO WS-SAGA-STEPS-COMPLETED.

       2000-EXIT.
           EXIT.

       3000-COMPENSATE.
      *--- Execute compensations in reverse order ---*
           DISPLAY 'SAGA FAILED AT STEP: '
                   WS-SAGA-STEPS-COMPLETED
                   ' — COMPENSATING'

           EVALUATE WS-SAGA-STEPS-COMPLETED
               WHEN 4
                   PERFORM 3400-REVERSE-APPROVAL
                   PERFORM 3300-REVERSE-ADJUDICATION
                   PERFORM 3200-MARK-INVALID
                   PERFORM 3100-CANCEL-CLAIM
               WHEN 3
                   PERFORM 3300-REVERSE-ADJUDICATION
                   PERFORM 3200-MARK-INVALID
                   PERFORM 3100-CANCEL-CLAIM
               WHEN 2
                   PERFORM 3200-MARK-INVALID
                   PERFORM 3100-CANCEL-CLAIM
               WHEN 1
                   PERFORM 3100-CANCEL-CLAIM
           END-EVALUATE.

       2100-RECEIVE-CLAIM.
           EXEC SQL
               INSERT INTO CLAIM_MASTER
               (CLAIM_ID, CLAIM_STATUS, RECEIVE_DATE,
                PROVIDER_ID, MEMBER_ID, BILLED_AMT)
               VALUES
               (:WS-CLAIM-ID, 'RECEIVED',
                CURRENT DATE,
                :WS-PROVIDER-ID, :WS-MEMBER-ID,
                :WS-BILLED-AMT)
           END-EXEC

           IF SQLCODE = 0
               EXEC SQL COMMIT END-EXEC
           ELSE
               SET SAGA-FAILED TO TRUE
               EXEC SQL ROLLBACK END-EXEC
           END-IF.

       3100-CANCEL-CLAIM.
      *--- Compensation for Step 1 ---*
           EXEC SQL
               UPDATE CLAIM_MASTER
               SET CLAIM_STATUS = 'CANCELLED',
                   CANCEL_REASON = 'SAGA COMPENSATION',
                   CANCEL_DATE = CURRENT DATE
               WHERE CLAIM_ID = :WS-CLAIM-ID
           END-EXEC

           EXEC SQL COMMIT END-EXEC

           DISPLAY 'COMPENSATED: CLAIM CANCELLED'.

🔴 Critical Consideration: Compensation transactions must be idempotent — safe to execute multiple times. If the compensation itself fails and is retried, it must produce the same result. This is why we use UPDATE with specific status values rather than DELETE: if the compensation runs twice, the second UPDATE simply sets a value that is already set.

32.7 Compensation Logic

Compensation is not simply "undo." It is "apply the business-appropriate reversal." The difference matters.

32.7.1 Types of Compensation

Original Action	Simple Undo	Business Compensation
Debit $100 \| Add $100 back	Add $100 back + audit trail
Insert claim	Delete claim	Set status to CANCELLED
Ship product	Un-ship? (impossible)	Issue return authorization
Send notification	Un-send? (impossible)	Send correction notification

Notice that some actions cannot be undone — you cannot un-send an email or un-ship a package. Compensation must account for this reality.

32.7.2 Compensation Design Principles

Log everything: Every action must create an audit trail so compensation knows what to reverse
Use status flags, not deletes: Mark records as cancelled rather than deleting them
Design for partial failure: Each compensation step must handle the case where the original step partially completed
Make compensations idempotent: Safe to retry
Track compensation state: Record which compensations have been executed

       01  WS-COMPENSATION-LOG.
           05  WS-COMP-ENTRIES OCCURS 10 TIMES.
               10  WS-COMP-STEP     PIC 9(2).
               10  WS-COMP-ACTION   PIC X(30).
               10  WS-COMP-STATUS   PIC X(1).
                   88 COMP-SUCCESS  VALUE 'S'.
                   88 COMP-FAILED   VALUE 'F'.
                   88 COMP-SKIPPED  VALUE 'K'.
               10  WS-COMP-TIME     PIC X(26).

       3100-CANCEL-CLAIM.
      *--- Check if already compensated ---*
           EXEC SQL
               SELECT CLAIM_STATUS
               INTO :WS-CURRENT-STATUS
               FROM CLAIM_MASTER
               WHERE CLAIM_ID = :WS-CLAIM-ID
           END-EXEC

           IF WS-CURRENT-STATUS = 'CANCELLED'
      *        Already compensated — idempotent
               SET COMP-SKIPPED(1) TO TRUE
           ELSE
               EXEC SQL
                   UPDATE CLAIM_MASTER
                   SET CLAIM_STATUS = 'CANCELLED'
                   WHERE CLAIM_ID = :WS-CLAIM-ID
               END-EXEC
               IF SQLCODE = 0
                   EXEC SQL COMMIT END-EXEC
                   SET COMP-SUCCESS(1) TO TRUE
               ELSE
                   SET COMP-FAILED(1) TO TRUE
               END-IF
           END-IF.

32.8 Idempotent Transaction Design

An idempotent operation produces the same result whether executed once or multiple times. This is critical in mainframe environments where: - Batch jobs may be restarted from the last checkpoint - CICS transactions may be retried after timeout - Messages in MQ queues may be delivered more than once

32.8.1 The Idempotent Key Pattern

Assign a unique identifier to each transaction and check for it before processing:

       2000-PROCESS-PAYMENT.
      *--- Check if this payment was already processed ---*
           EXEC SQL
               SELECT COUNT(*)
               INTO :WS-EXISTING-COUNT
               FROM PAYMENT_HISTORY
               WHERE PAYMENT_ID = :WS-PAYMENT-ID
           END-EXEC

           IF WS-EXISTING-COUNT > 0
      *        Already processed — idempotent skip
               DISPLAY 'PAYMENT ALREADY PROCESSED: '
                       WS-PAYMENT-ID
               MOVE 'DUPLICATE' TO WS-PROCESS-RESULT
           ELSE
               PERFORM 2100-EXECUTE-PAYMENT
           END-IF.

32.8.2 The Idempotent Update Pattern

Design UPDATE statements to be naturally idempotent:

      *--- NON-IDEMPOTENT: Running this twice doubles the credit ---*
       EXEC SQL
           UPDATE ACCOUNT
           SET ACCT_BALANCE = ACCT_BALANCE + :WS-AMOUNT
           WHERE ACCT_NUMBER = :WS-ACCT-NUM
       END-EXEC.

      *--- IDEMPOTENT: Running this twice has the same effect ---*
       EXEC SQL
           UPDATE ACCOUNT
           SET ACCT_BALANCE = :WS-NEW-BALANCE
           WHERE ACCT_NUMBER = :WS-ACCT-NUM
             AND LAST_UPDATE < :WS-TRANS-TIMESTAMP
       END-EXEC.

The idempotent version sets an absolute value rather than a relative increment, and includes a timestamp guard to prevent reapplication.

32.8.3 The Request-Response Log Pattern

For complex transactions, maintain a log of requests and their results:

       01  WS-REQUEST-LOG-REC.
           05  RL-REQUEST-ID     PIC X(20).
           05  RL-REQUEST-TYPE   PIC X(10).
           05  RL-REQUEST-TIME   PIC X(26).
           05  RL-STATUS         PIC X(10).
           05  RL-RESULT-CODE    PIC X(4).
           05  RL-RESULT-DATA    PIC X(200).

       2000-PROCESS-REQUEST.
      *--- Check if we already processed this request ---*
           EXEC SQL
               SELECT STATUS, RESULT_CODE, RESULT_DATA
               INTO :RL-STATUS, :RL-RESULT-CODE,
                    :RL-RESULT-DATA
               FROM REQUEST_LOG
               WHERE REQUEST_ID = :RL-REQUEST-ID
           END-EXEC

           EVALUATE SQLCODE
               WHEN 0
      *            Already processed — return cached result
                   DISPLAY 'RETURNING CACHED RESULT FOR: '
                           RL-REQUEST-ID
               WHEN +100
      *            Not found — process normally
                   PERFORM 2100-EXECUTE-REQUEST
                   PERFORM 2200-LOG-RESULT
               WHEN OTHER
                   PERFORM 9000-SQL-ERROR
           END-EVALUATE.

📊 Idempotency in Practice at MedClaim: James Okafor implemented the request-log pattern across all of MedClaim's claim processing programs after a weekend production restart processed 12,000 claims twice. The duplicate payments cost $2.3 million to reverse manually. Now, every claim processing step checks a transaction log before executing. "Idempotency is not a nice-to-have," James says. "It is the difference between a calm Monday and a catastrophe."

32.9 Throughput and Response Time Optimization

Transaction design directly affects two key metrics: throughput (transactions per second) and response time (how long each transaction takes).

32.9.1 Batch Throughput Optimization

      *--- Pattern: Batch with periodic commits ---*
       01  WS-BATCH-CONTROL.
           05  WS-COMMIT-INTERVAL PIC 9(5) VALUE 01000.
           05  WS-RECORD-COUNT    PIC 9(9) VALUE ZEROS.
           05  WS-COMMIT-COUNT    PIC 9(5) VALUE ZEROS.
           05  WS-ERROR-COUNT     PIC 9(5) VALUE ZEROS.
           05  WS-START-TIME      PIC X(26).
           05  WS-END-TIME        PIC X(26).

       2000-PROCESS-BATCH.
           MOVE FUNCTION CURRENT-DATE TO WS-START-TIME

           PERFORM UNTIL END-OF-INPUT
               READ INPUT-FILE INTO WS-INPUT-REC
                   AT END SET END-OF-INPUT TO TRUE
               END-READ

               IF NOT END-OF-INPUT
                   PERFORM 2100-PROCESS-ONE-RECORD
                   ADD 1 TO WS-RECORD-COUNT

                   IF FUNCTION MOD(WS-RECORD-COUNT,
                                   WS-COMMIT-INTERVAL) = 0
                       EXEC SQL COMMIT END-EXEC
                       ADD 1 TO WS-COMMIT-COUNT
                       PERFORM 2200-CHECKPOINT
                   END-IF
               END-IF
           END-PERFORM

      *--- Final commit ---*
           EXEC SQL COMMIT END-EXEC
           MOVE FUNCTION CURRENT-DATE TO WS-END-TIME

           DISPLAY 'RECORDS PROCESSED: ' WS-RECORD-COUNT
           DISPLAY 'COMMITS ISSUED:    ' WS-COMMIT-COUNT
           DISPLAY 'ERRORS:            ' WS-ERROR-COUNT
           DISPLAY 'START TIME:        ' WS-START-TIME
           DISPLAY 'END TIME:          ' WS-END-TIME.

Throughput levers: 1. Commit interval: Too small (every record) wastes I/O on commits. Too large (end of job) risks long rollbacks. Profile to find the sweet spot. 2. Buffer pool sizing: Larger DB2 buffer pools reduce physical I/O. Work with your DBA to tune. 3. Sequential prefetch: DB2 automatically prefetches data for sequential scans. Design your access pattern to be sequential when possible. 4. Parallel processing: Split the input into ranges and run multiple batch jobs concurrently (but watch for lock contention).

32.9.2 Online Response Time Optimization

For CICS transactions, response time is king:

      *--- Minimize DB2 calls ---*
      *--- BAD: Multiple selects ---*
       EXEC SQL SELECT CUST_NAME INTO :H ...  END-EXEC.
       EXEC SQL SELECT ACCT_BAL INTO :H ...   END-EXEC.
       EXEC SQL SELECT LAST_TXN INTO :H ...   END-EXEC.

      *--- GOOD: Single join ---*
       EXEC SQL
           SELECT C.CUST_NAME, A.ACCT_BAL, T.LAST_TXN
           INTO :WS-CUST-NAME, :WS-ACCT-BAL,
                :WS-LAST-TXN
           FROM CUSTOMER C
           JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
           LEFT JOIN LAST_TRANSACTION T
               ON A.ACCT_NUM = T.ACCT_NUM
           WHERE C.CUST_ID = :WS-CUST-ID
             AND A.ACCT_NUM = :WS-ACCT-NUM
       END-EXEC.

Response time levers: 1. Minimize the number of I/O operations — SQL calls, VSAM reads, IMS DL/I calls 2. Avoid table scans — ensure WHERE clauses use indexed columns 3. Commit early — release locks as soon as possible 4. Avoid GETMAIN/FREEMAIN in loops — allocate working storage once 5. Keep transaction scope small — do only what is necessary for this request

✅ Try It Yourself: Write a CICS-style transfer transaction that implements pessimistic locking with resource ordering. Add a retry loop for deadlock handling (SQLCODE -911). Measure the "lock hold time" by capturing timestamps before the first SELECT FOR UPDATE and after the COMMIT.

32.10 GlobalBank: The Atomic Transfer

Let us put it all together with a complete, production-quality funds transfer program for GlobalBank. This version includes resource ordering, deadlock retry, idempotency, compensation logging, and proper error handling:

      *================================================================*
      * PROGRAM: XFER0200                                              *
      * PURPOSE: Production funds transfer with full safety             *
      * AUTHOR:  Maria Chen / GlobalBank Development                   *
      * DATE:    2025-07-01                                            *
      *================================================================*
       IDENTIFICATION DIVISION.
       PROGRAM-ID. XFER0200.

       DATA DIVISION.
       WORKING-STORAGE SECTION.

           EXEC SQL INCLUDE SQLCA END-EXEC.

       01  WS-TRANSFER-REQUEST.
           05  WS-XFER-ID        PIC X(20).
           05  WS-SOURCE-ACCT    PIC X(12).
           05  WS-DEST-ACCT      PIC X(12).
           05  WS-AMOUNT         PIC S9(11)V99 COMP-3.
           05  WS-REQUESTOR      PIC X(8).
           05  WS-REQUEST-TIME   PIC X(26).

       01  WS-ORDERED-LOCKS.
           05  WS-FIRST-ACCT     PIC X(12).
           05  WS-SECOND-ACCT    PIC X(12).
           05  WS-FIRST-BAL      PIC S9(11)V99 COMP-3.
           05  WS-SECOND-BAL     PIC S9(11)V99 COMP-3.

       01  WS-CONTROL.
           05  WS-RETRY-COUNT    PIC 9(2) VALUE ZEROS.
           05  WS-MAX-RETRIES    PIC 9(2) VALUE 03.
           05  WS-STATUS-FLAG    PIC X(1).
               88 XFER-SUCCESS   VALUE 'S'.
               88 XFER-FAILED    VALUE 'F'.
               88 XFER-DUPLICATE VALUE 'D'.
           05  WS-ERROR-MSG      PIC X(80).

       PROCEDURE DIVISION.

       0000-MAIN.
           PERFORM 1000-CHECK-IDEMPOTENCY
           IF NOT XFER-DUPLICATE
               PERFORM 2000-EXECUTE-WITH-RETRY
           END-IF
           PERFORM 9000-REPORT-RESULT
           GOBACK.

       1000-CHECK-IDEMPOTENCY.
           EXEC SQL
               SELECT XFER_STATUS
               INTO :WS-STATUS-FLAG
               FROM TRANSFER_LOG
               WHERE XFER_ID = :WS-XFER-ID
           END-EXEC

           IF SQLCODE = 0
               SET XFER-DUPLICATE TO TRUE
               DISPLAY 'DUPLICATE TRANSFER: ' WS-XFER-ID
           ELSE IF SQLCODE = +100
               CONTINUE
           ELSE
               PERFORM 9800-SQL-ERROR
           END-IF.

       2000-EXECUTE-WITH-RETRY.
           MOVE ZEROS TO WS-RETRY-COUNT
           PERFORM 2100-ATTEMPT-TRANSFER

           PERFORM UNTIL XFER-SUCCESS
                      OR XFER-FAILED
                      OR WS-RETRY-COUNT >= WS-MAX-RETRIES
               ADD 1 TO WS-RETRY-COUNT
               DISPLAY 'RETRYING TRANSFER: ATTEMPT '
                       WS-RETRY-COUNT
               PERFORM 2100-ATTEMPT-TRANSFER
           END-PERFORM

           IF NOT XFER-SUCCESS
               SET XFER-FAILED TO TRUE
               MOVE 'MAX RETRIES EXCEEDED' TO WS-ERROR-MSG
           END-IF.

       2100-ATTEMPT-TRANSFER.
      *--- Resource ordering: lock lower account first ---*
           IF WS-SOURCE-ACCT < WS-DEST-ACCT
               MOVE WS-SOURCE-ACCT TO WS-FIRST-ACCT
               MOVE WS-DEST-ACCT   TO WS-SECOND-ACCT
           ELSE
               MOVE WS-DEST-ACCT   TO WS-FIRST-ACCT
               MOVE WS-SOURCE-ACCT TO WS-SECOND-ACCT
           END-IF

      *--- Lock first account ---*
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-FIRST-BAL
               FROM ACCOUNT
               WHERE ACCT_NUMBER = :WS-FIRST-ACCT
               FOR UPDATE OF ACCT_BALANCE
           END-EXEC

           IF SQLCODE = -911
               EXEC SQL ROLLBACK END-EXEC
               GO TO 2100-EXIT
           END-IF
           IF SQLCODE NOT = 0
               MOVE 'FIRST LOCK FAILED' TO WS-ERROR-MSG
               SET XFER-FAILED TO TRUE
               EXEC SQL ROLLBACK END-EXEC
               GO TO 2100-EXIT
           END-IF

      *--- Lock second account ---*
           EXEC SQL
               SELECT ACCT_BALANCE
               INTO :WS-SECOND-BAL
               FROM ACCOUNT
               WHERE ACCT_NUMBER = :WS-SECOND-ACCT
               FOR UPDATE OF ACCT_BALANCE
           END-EXEC

           IF SQLCODE = -911
               EXEC SQL ROLLBACK END-EXEC
               GO TO 2100-EXIT
           END-IF
           IF SQLCODE NOT = 0
               MOVE 'SECOND LOCK FAILED' TO WS-ERROR-MSG
               SET XFER-FAILED TO TRUE
               EXEC SQL ROLLBACK END-EXEC
               GO TO 2100-EXIT
           END-IF

      *--- Verify sufficient funds ---*
           IF WS-SOURCE-ACCT = WS-FIRST-ACCT
               IF WS-FIRST-BAL < WS-AMOUNT
                   MOVE 'INSUFFICIENT FUNDS'
                       TO WS-ERROR-MSG
                   SET XFER-FAILED TO TRUE
                   EXEC SQL ROLLBACK END-EXEC
                   GO TO 2100-EXIT
               END-IF
           ELSE
               IF WS-SECOND-BAL < WS-AMOUNT
                   MOVE 'INSUFFICIENT FUNDS'
                       TO WS-ERROR-MSG
                   SET XFER-FAILED TO TRUE
                   EXEC SQL ROLLBACK END-EXEC
                   GO TO 2100-EXIT
               END-IF
           END-IF

      *--- Debit source ---*
           EXEC SQL
               UPDATE ACCOUNT
               SET ACCT_BALANCE = ACCT_BALANCE - :WS-AMOUNT,
                   LAST_UPDATE = CURRENT TIMESTAMP
               WHERE ACCT_NUMBER = :WS-SOURCE-ACCT
           END-EXEC

      *--- Credit destination ---*
           EXEC SQL
               UPDATE ACCOUNT
               SET ACCT_BALANCE = ACCT_BALANCE + :WS-AMOUNT,
                   LAST_UPDATE = CURRENT TIMESTAMP
               WHERE ACCT_NUMBER = :WS-DEST-ACCT
           END-EXEC

      *--- Log the transfer ---*
           EXEC SQL
               INSERT INTO TRANSFER_LOG
               (XFER_ID, SOURCE_ACCT, DEST_ACCT,
                AMOUNT, XFER_STATUS, XFER_TIMESTAMP)
               VALUES
               (:WS-XFER-ID, :WS-SOURCE-ACCT,
                :WS-DEST-ACCT, :WS-AMOUNT,
                'S', CURRENT TIMESTAMP)
           END-EXEC

      *--- Commit everything ---*
           EXEC SQL COMMIT END-EXEC

           SET XFER-SUCCESS TO TRUE.

       2100-EXIT.
           EXIT.

       9000-REPORT-RESULT.
           EVALUATE TRUE
               WHEN XFER-SUCCESS
                   DISPLAY 'TRANSFER SUCCESSFUL: '
                           WS-XFER-ID
               WHEN XFER-DUPLICATE
                   DISPLAY 'TRANSFER ALREADY PROCESSED: '
                           WS-XFER-ID
               WHEN XFER-FAILED
                   DISPLAY 'TRANSFER FAILED: ' WS-ERROR-MSG
           END-EVALUATE.

       9800-SQL-ERROR.
           DISPLAY 'SQL ERROR: ' SQLCODE.

🧪 Design Review: This program demonstrates five defensive patterns working together: 1. Idempotency check (paragraph 1000) — prevents duplicate processing 2. Resource ordering (paragraph 2100) — prevents deadlocks 3. Deadlock retry (paragraph 2000) — recovers from unavoidable deadlocks 4. All-or-nothing commit — only commits when all steps succeed 5. Transfer logging — creates an audit trail and supports the idempotency check

32.11 MedClaim: Claim Adjudication Transaction

At MedClaim, the claim adjudication process updates three tables atomically: CLAIM_MASTER (status and approved amount), PAYMENT (new payment record), and PROVIDER_BALANCE (running balance). James Okafor designed it with the saga pattern for the multi-day lifecycle but atomic transactions within each step:

      *================================================================*
      * PROGRAM: CLMADJ01                                              *
      * PURPOSE: Adjudicate a single claim — atomic step in saga       *
      *================================================================*
       3000-ADJUDICATE-CLAIM.
      *--- Read claim with lock ---*
           EXEC SQL
               SELECT CLAIM_STATUS, BILLED_AMT,
                      PROVIDER_ID, MEMBER_ID
               INTO :WS-CLAIM-STATUS, :WS-BILLED-AMT,
                    :WS-PROVIDER-ID, :WS-MEMBER-ID
               FROM CLAIM_MASTER
               WHERE CLAIM_ID = :WS-CLAIM-ID
               FOR UPDATE
           END-EXEC

           IF SQLCODE NOT = 0
               MOVE 'CLAIM NOT FOUND' TO WS-ERROR-MSG
               SET ADJ-FAILED TO TRUE
               GO TO 3000-EXIT
           END-IF

      *--- Verify claim is in correct state ---*
           IF WS-CLAIM-STATUS NOT = 'VALIDATED'
               MOVE 'CLAIM NOT IN VALIDATED STATUS'
                   TO WS-ERROR-MSG
               SET ADJ-FAILED TO TRUE
               GO TO 3000-EXIT
           END-IF

      *--- Calculate approved amount (business rules) ---*
           PERFORM 3100-APPLY-ADJUDICATION-RULES

      *--- Update claim status ---*
           EXEC SQL
               UPDATE CLAIM_MASTER
               SET CLAIM_STATUS = 'ADJUDICATED',
                   APPROVED_AMT = :WS-APPROVED-AMT,
                   ADJUDICATION_DATE = CURRENT DATE,
                   ADJUDICATOR = :WS-ADJUDICATOR-ID
               WHERE CLAIM_ID = :WS-CLAIM-ID
           END-EXEC

      *--- Create payment record ---*
           EXEC SQL
               INSERT INTO PAYMENT
               (PAYMENT_ID, CLAIM_ID, PROVIDER_ID,
                PAYMENT_AMT, PAYMENT_STATUS,
                CREATE_DATE)
               VALUES
               (:WS-PAYMENT-ID, :WS-CLAIM-ID,
                :WS-PROVIDER-ID, :WS-APPROVED-AMT,
                'PENDING', CURRENT DATE)
           END-EXEC

      *--- Update provider balance ---*
           EXEC SQL
               UPDATE PROVIDER_BALANCE
               SET PENDING_AMT = PENDING_AMT
                                + :WS-APPROVED-AMT,
                   LAST_CLAIM_DATE = CURRENT DATE
               WHERE PROVIDER_ID = :WS-PROVIDER-ID
           END-EXEC

      *--- All succeeded — commit ---*
           EXEC SQL COMMIT END-EXEC
           SET ADJ-SUCCESS TO TRUE.

       3000-EXIT.
           IF ADJ-FAILED
               EXEC SQL ROLLBACK END-EXEC
           END-IF.

⚖️ The Modernization Spectrum: Notice that MedClaim's adjudication uses DB2 for all three tables. James Okafor lobbied to move the PROVIDER_BALANCE from a VSAM file to a DB2 table specifically so that all three updates could participate in a single DB2 COMMIT, avoiding the complexity and overhead of two-phase commit. "Sometimes the best modernization is simplifying your transaction boundaries," he says.

32.12 Transaction Anti-Patterns

Learning what not to do is as important as learning what to do. Here are the most common transaction design mistakes:

32.12.1 The Chatty Transaction

      *--- ANTI-PATTERN: Too many commits ---*
       EXEC SQL UPDATE ... END-EXEC.
       EXEC SQL COMMIT END-EXEC.          *> Commit 1
       EXEC SQL UPDATE ... END-EXEC.
       EXEC SQL COMMIT END-EXEC.          *> Commit 2
       EXEC SQL INSERT ... END-EXEC.
       EXEC SQL COMMIT END-EXEC.          *> Commit 3

Problem: If the program fails between Commit 2 and Commit 3, the data is inconsistent. Each update should be part of the same unit of work unless they are truly independent.

32.12.2 The Mega-Transaction

      *--- ANTI-PATTERN: One million updates, no commit ---*
       PERFORM 1000000 TIMES
           EXEC SQL UPDATE ... END-EXEC
       END-PERFORM
       EXEC SQL COMMIT END-EXEC.

Problem: If the program abends at record 999,999, the entire million-record rollback takes hours, during which the database is locked and unavailable.

32.12.3 The Optimistic Failure Ignorer

      *--- ANTI-PATTERN: Ignoring optimistic lock failure ---*
       EXEC SQL UPDATE ... WHERE LAST_UPDATE = :SAVED END-EXEC.
       IF SQLERRD(3) = 0
           DISPLAY 'CONCURRENT MODIFICATION'
      *    But then continues processing anyway!
       END-IF.

Problem: The program detects the conflict but does not stop or retry. It continues with stale data.

32.12.4 The Lock-and-Think

      *--- ANTI-PATTERN: Holding locks during user interaction ---*
       EXEC SQL SELECT ... FOR UPDATE END-EXEC.
       EXEC CICS SEND MAP(...) END-EXEC.   *> Show screen
       EXEC CICS RECEIVE MAP(...) END-EXEC. *> Wait for user
      *    Lock held for entire user think time!
       EXEC SQL UPDATE ... END-EXEC.
       EXEC SQL COMMIT END-EXEC.

Problem: In pseudo-conversational CICS, this can lock a row for minutes while the user reads the screen. Use optimistic locking for conversational patterns.

32.13 VSAM Transaction Patterns

Many COBOL programs work with VSAM files rather than (or in addition to) DB2 tables. VSAM has its own transaction patterns and considerations.

VSAM Record-Level Sharing (RLS) enables multiple programs across multiple z/OS systems to access the same VSAM file concurrently with record-level locking. Without RLS, VSAM locking is at the control-interval (CI) level, which is much more restrictive.

With RLS, VSAM participates in z/OS's Resource Recovery Services (RRS) for two-phase commit. This means a single CICS transaction can atomically update both a DB2 table and a VSAM file.

32.13.2 VSAM Locking Patterns

      *--- Read for update (acquires exclusive lock) ---*
       READ ACCT-FILE INTO WS-ACCT-REC
           KEY IS WS-ACCT-KEY

       IF WS-FILE-STATUS = '00'
      *--- Record is now locked ---*
           ADD WS-DEPOSIT TO ACCT-BALANCE
           REWRITE ACCT-RECORD FROM WS-ACCT-REC

           IF WS-FILE-STATUS = '00'
               DISPLAY 'UPDATE SUCCESSFUL'
           ELSE
               DISPLAY 'REWRITE FAILED: ' WS-FILE-STATUS
           END-IF
       END-IF.

32.13.3 VSAM Commit in Batch

In batch COBOL programs without CICS or RRS, VSAM files do not participate in formal COMMIT/ROLLBACK. Changes are written directly to disk. This means:

There is no automatic rollback if the program abends after a REWRITE
The programmer must implement application-level recovery
For critical updates, consider writing to a temporary file first, then applying all changes in a single pass

      *--- Application-level backup before batch update ---*
       PERFORM UNTIL END-OF-UPDATE-FILE
           READ UPDATE-FILE INTO WS-UPDATE-REC
               AT END SET END-OF-UPDATE-FILE TO TRUE
           END-READ

           IF NOT END-OF-UPDATE-FILE
      *--- Read current record and save for backup ---*
               READ MASTER-FILE INTO WS-MASTER-REC
                   KEY IS WS-MASTER-KEY
               IF WS-MASTER-STATUS = '00'
                   WRITE BACKUP-RECORD FROM WS-MASTER-REC
               END-IF

      *--- Apply update ---*
               PERFORM 2100-APPLY-UPDATE
               REWRITE MASTER-RECORD FROM WS-MASTER-REC
           END-IF
       END-PERFORM.

The backup file provides a manual recovery path: if the program fails, a restore program can read the backup file and revert the master file to its pre-update state.

⚠️ Important: This application-level backup approach is much less robust than DB2's COMMIT/ROLLBACK or CICS's SYNCPOINT. It does not handle the case where the program abends during a REWRITE (leaving a partially written record). For critical data, strongly consider using VSAM RLS with CICS/RRS, or migrating the data to DB2.

32.13.4 Mixed DB2/VSAM Transactions

Many real-world COBOL programs update both DB2 tables and VSAM files in the same transaction. In CICS with VSAM RLS:

      *--- Update DB2 table ---*
       EXEC SQL
           UPDATE CLAIM_MASTER
           SET CLAIM_STATUS = 'PROCESSED'
           WHERE CLAIM_ID = :WS-CLAIM-ID
       END-EXEC

      *--- Update VSAM file ---*
       EXEC CICS READ
           FILE('CLMVSAM')
           INTO(WS-VSAM-REC)
           RIDFLD(WS-CLAIM-KEY)
           UPDATE
           RESP(WS-RESP)
       END-EXEC

       IF WS-RESP = DFHRESP(NORMAL)
           MOVE 'PROCESSED' TO VSAM-CLM-STATUS
           EXEC CICS REWRITE
               FILE('CLMVSAM')
               FROM(WS-VSAM-REC)
               RESP(WS-RESP)
           END-EXEC
       END-IF

      *--- Commit both atomically ---*
       EXEC CICS SYNCPOINT END-EXEC.

The SYNCPOINT coordinates the DB2 commit and the VSAM commit through RRS's two-phase commit protocol. Either both succeed or both roll back.

32.14 Error Recovery Patterns

Beyond rollback, production COBOL programs need structured error recovery that handles different categories of failures differently.

32.14.1 The Error Classification Pattern

Not all errors are equal. A classification system helps the program respond appropriately:

       01  WS-ERROR-CLASSIFICATION.
           05  WS-ERROR-CLASS    PIC X(1).
               88 ERR-RETRYABLE  VALUE 'R'.
               88 ERR-SKIPPABLE  VALUE 'S'.
               88 ERR-FATAL      VALUE 'F'.
           05  WS-ERROR-CODE     PIC X(10).
           05  WS-ERROR-DESC     PIC X(80).
           05  WS-ERROR-ACTION   PIC X(20).

       8000-CLASSIFY-ERROR.
           EVALUATE TRUE
      *--- Retryable errors ---*
               WHEN SQLCODE = -911
                   SET ERR-RETRYABLE TO TRUE
                   MOVE 'DEADLOCK' TO WS-ERROR-CODE
                   MOVE 'ROLLBACK AND RETRY'
                       TO WS-ERROR-ACTION
               WHEN SQLCODE = -904
                   SET ERR-RETRYABLE TO TRUE
                   MOVE 'RESOURCE-UNAVAIL'
                       TO WS-ERROR-CODE
                   MOVE 'WAIT AND RETRY'
                       TO WS-ERROR-ACTION

      *--- Skippable errors (log and continue) ---*
               WHEN SQLCODE = -803
                   SET ERR-SKIPPABLE TO TRUE
                   MOVE 'DUPLICATE-KEY' TO WS-ERROR-CODE
                   MOVE 'LOG AND SKIP RECORD'
                       TO WS-ERROR-ACTION
               WHEN SQLCODE = +100
                   SET ERR-SKIPPABLE TO TRUE
                   MOVE 'NOT-FOUND' TO WS-ERROR-CODE
                   MOVE 'LOG AND SKIP RECORD'
                       TO WS-ERROR-ACTION

      *--- Fatal errors ---*
               WHEN SQLCODE = -922
                   SET ERR-FATAL TO TRUE
                   MOVE 'AUTH-FAILURE' TO WS-ERROR-CODE
                   MOVE 'ROLLBACK AND ABEND'
                       TO WS-ERROR-ACTION
               WHEN OTHER
                   SET ERR-FATAL TO TRUE
                   MOVE 'UNKNOWN' TO WS-ERROR-CODE
                   MOVE 'ROLLBACK AND ABEND'
                       TO WS-ERROR-ACTION
           END-EVALUATE.

32.14.2 The Error Log Pattern

Every error should be logged for post-mortem analysis:

       01  WS-ERROR-LOG-REC.
           05  EL-TIMESTAMP      PIC X(26).
           05  EL-PROGRAM        PIC X(8).
           05  EL-PARAGRAPH      PIC X(30).
           05  EL-ERROR-CLASS    PIC X(1).
           05  EL-ERROR-CODE     PIC X(10).
           05  EL-ERROR-DESC     PIC X(80).
           05  EL-KEY-VALUE      PIC X(20).
           05  EL-SQLCODE        PIC S9(9) COMP.
           05  EL-SQLERRM        PIC X(70).
           05  EL-RECORD-NUM     PIC 9(9).

       8100-LOG-ERROR.
           MOVE FUNCTION CURRENT-DATE TO EL-TIMESTAMP
           MOVE 'CLMSAGA1' TO EL-PROGRAM
           MOVE WS-DEBUG-PARA TO EL-PARAGRAPH
           MOVE WS-ERROR-CLASS TO EL-ERROR-CLASS
           MOVE WS-ERROR-CODE TO EL-ERROR-CODE
           MOVE WS-ERROR-DESC TO EL-ERROR-DESC
           MOVE WS-CURRENT-KEY TO EL-KEY-VALUE
           MOVE SQLCODE TO EL-SQLCODE
           MOVE SQLERRMC TO EL-SQLERRM
           MOVE WS-RECORD-COUNT TO EL-RECORD-NUM

           WRITE ERROR-LOG-RECORD FROM WS-ERROR-LOG-REC

           DISPLAY 'ERROR LOGGED: ' EL-ERROR-CODE
                   ' AT RECORD ' EL-RECORD-NUM.

32.14.3 The Graceful Degradation Pattern

Some batch programs should continue processing even when individual records fail. The key is maintaining accurate counts and ensuring that failures are fully logged for later resolution:

       01  WS-BATCH-STATS.
           05  WS-TOTAL-READ      PIC 9(9) VALUE ZEROS.
           05  WS-TOTAL-PROCESSED PIC 9(9) VALUE ZEROS.
           05  WS-TOTAL-SKIPPED   PIC 9(9) VALUE ZEROS.
           05  WS-TOTAL-ERRORS    PIC 9(9) VALUE ZEROS.
           05  WS-MAX-ERRORS      PIC 9(5) VALUE 00100.

       2000-PROCESS-BATCH.
           PERFORM UNTIL END-OF-INPUT
                      OR WS-TOTAL-ERRORS >= WS-MAX-ERRORS
               READ INPUT-FILE INTO WS-INPUT-REC
                   AT END SET END-OF-INPUT TO TRUE
               END-READ

               IF NOT END-OF-INPUT
                   ADD 1 TO WS-TOTAL-READ
                   PERFORM 2100-PROCESS-ONE-RECORD

                   EVALUATE WS-ERROR-CLASS
                       WHEN SPACES
                           ADD 1 TO WS-TOTAL-PROCESSED
                       WHEN 'S'
                           ADD 1 TO WS-TOTAL-SKIPPED
                       WHEN 'F'
                           ADD 1 TO WS-TOTAL-ERRORS
                   END-EVALUATE
               END-IF
           END-PERFORM

           IF WS-TOTAL-ERRORS >= WS-MAX-ERRORS
               DISPLAY '*** MAX ERRORS EXCEEDED ***'
               DISPLAY '*** BATCH TERMINATED ***'
           END-IF.

The WS-MAX-ERRORS threshold prevents the program from continuing indefinitely when something is fundamentally wrong (e.g., a corrupted file or a systemic database problem). At MedClaim, James Okafor sets this to 0.1% of expected volume — if more than 0.1% of records fail, the batch stops for human investigation.

32.15 Transaction Design in Mixed Environments

Modern mainframe applications often span multiple technologies. A single business transaction might involve: - Reading from a VSAM file - Updating a DB2 table - Sending a message to MQ - Writing to an IMS database

32.15.1 The MQ Transaction Pattern

IBM MQ (formerly WebSphere MQ) provides messaging that participates in transaction coordination. When MQ is configured for transactional messaging, a message put (or get) can be committed or rolled back along with database updates.

      *--- In CICS, MQ participates in SYNCPOINT ---*
       EXEC CICS PUT
           CONTAINER('MQCONTAINER')
           QUEUE('CLAIM.OUTBOUND')
           FROM(WS-MQ-MESSAGE)
           RESP(WS-RESP)
       END-EXEC

       IF WS-RESP NOT = DFHRESP(NORMAL)
           SET TRANSACTION-FAILED TO TRUE
       END-IF.

      *--- SYNCPOINT commits both DB2 and MQ ---*
       IF TRANSACTION-OK
           EXEC CICS SYNCPOINT END-EXEC
      *    Message is now visible to consumers
       ELSE
           EXEC CICS SYNCPOINT ROLLBACK END-EXEC
      *    Message is removed from queue
       END-IF.

This ensures that the message is only visible to consumers if the database updates are also committed. Without transactional MQ, you risk sending a message about a claim that was never actually processed (because the database update was rolled back).

32.15.2 The Outbox Pattern

When true two-phase commit is not available or practical (e.g., when sending to an external system), use the outbox pattern:

Write the business data AND the outbound message to the database in the same transaction
A separate process reads the outbox table and sends the messages
After successful send, mark the outbox entry as sent

      *--- Step 1: Business update + outbox in one COMMIT ---*
       EXEC SQL
           UPDATE CLAIM_MASTER
           SET CLAIM_STATUS = 'APPROVED'
           WHERE CLAIM_ID = :WS-CLAIM-ID
       END-EXEC

       EXEC SQL
           INSERT INTO OUTBOX
           (MSG_ID, MSG_TYPE, MSG_DATA, MSG_STATUS)
           VALUES
           (:WS-MSG-ID, 'CLM-APPROVED',
            :WS-MSG-DATA, 'PENDING')
       END-EXEC

       EXEC SQL COMMIT END-EXEC.

      *--- Step 2: Separate sender process (different program) ---*
      *--- Reads PENDING outbox entries, sends via MQ, ---*
      *--- marks as SENT. If send fails, retries later. ---*

The outbox pattern guarantees that messages are sent if and only if the business transaction committed. It trades immediate delivery for guaranteed delivery.

32.16 Batch Checkpoint/Restart Design

For batch programs that process millions of records, checkpoint/restart is not optional — it is a critical design requirement. Without it, a failure near the end of a 12-hour batch run means restarting from the beginning.

32.16.1 The Checkpoint/Restart Framework

A well-designed checkpoint/restart framework saves all state needed to resume processing from the last checkpoint:

       01  WS-CHECKPOINT-DATA.
           05  CHKP-RECORD-COUNT  PIC 9(10).
           05  CHKP-TOTAL-DEBITS  PIC S9(13)V99 COMP-3.
           05  CHKP-TOTAL-CREDITS PIC S9(13)V99 COMP-3.
           05  CHKP-LAST-KEY      PIC X(20).
           05  CHKP-ERROR-COUNT   PIC 9(7).
           05  CHKP-TIMESTAMP     PIC X(26).
           05  CHKP-INPUT-POS     PIC 9(10).
           05  CHKP-HASH-TOTAL    PIC 9(18).

       5000-TAKE-CHECKPOINT.
      *--- Save all accumulators ---*
           MOVE WS-RECORD-COUNT  TO CHKP-RECORD-COUNT
           MOVE WS-TOTAL-DEBITS  TO CHKP-TOTAL-DEBITS
           MOVE WS-TOTAL-CREDITS TO CHKP-TOTAL-CREDITS
           MOVE WS-LAST-KEY      TO CHKP-LAST-KEY
           MOVE WS-ERROR-COUNT   TO CHKP-ERROR-COUNT
           MOVE FUNCTION CURRENT-DATE
               TO CHKP-TIMESTAMP
           MOVE WS-INPUT-POSITION TO CHKP-INPUT-POS
           MOVE WS-HASH-TOTAL    TO CHKP-HASH-TOTAL

      *--- Commit database changes ---*
           EXEC SQL COMMIT END-EXEC

      *--- Write checkpoint to checkpoint file ---*
           WRITE CHKP-RECORD FROM WS-CHECKPOINT-DATA

           DISPLAY 'CHECKPOINT: RECORD '
                   CHKP-RECORD-COUNT
                   ' AT ' CHKP-TIMESTAMP(1:19).

       1000-CHECK-RESTART.
      *--- Read last checkpoint if restart ---*
           READ CHKP-FILE INTO WS-CHECKPOINT-DATA
               AT END
                   DISPLAY 'NORMAL START — NO CHECKPOINT'
                   PERFORM 1100-INITIALIZE-FRESH
               NOT AT END
                   DISPLAY 'RESTARTING FROM CHECKPOINT'
                   DISPLAY '  RECORDS: '
                           CHKP-RECORD-COUNT
                   PERFORM 1200-RESTORE-FROM-CHECKPOINT
           END-READ.

       1200-RESTORE-FROM-CHECKPOINT.
           MOVE CHKP-RECORD-COUNT  TO WS-RECORD-COUNT
           MOVE CHKP-TOTAL-DEBITS  TO WS-TOTAL-DEBITS
           MOVE CHKP-TOTAL-CREDITS TO WS-TOTAL-CREDITS
           MOVE CHKP-LAST-KEY      TO WS-LAST-KEY
           MOVE CHKP-ERROR-COUNT   TO WS-ERROR-COUNT
           MOVE CHKP-INPUT-POS     TO WS-INPUT-POSITION
           MOVE CHKP-HASH-TOTAL    TO WS-HASH-TOTAL

      *--- Reposition the input file ---*
           PERFORM WS-INPUT-POSITION TIMES
               READ INPUT-FILE INTO WS-INPUT-REC
           END-PERFORM

           DISPLAY 'REPOSITIONED TO RECORD: '
                   WS-INPUT-POSITION.

32.16.2 Checkpoint Frequency Calculation

The optimal checkpoint frequency depends on three factors:

Commit overhead: Each COMMIT writes to the DB2 log. At roughly 2ms per COMMIT, 1,000 commits add 2 seconds to the batch run.
Reprocessing time: If the program abends, records between the last checkpoint and the failure must be reprocessed. At 0.5ms per record and a checkpoint interval of 10,000 records, maximum reprocessing is 5 seconds.
Rollback time: Without checkpoints, DB2 must roll back all uncommitted changes. Rolling back 1 million updates at 0.5ms each takes 500 seconds (over 8 minutes), during which the affected tables are locked.

The formula for optimal checkpoint interval is roughly:

Optimal interval = sqrt(Total records * Commit overhead / Record processing time)

For a typical batch program processing 1 million records at 0.5ms each with 2ms commit overhead, the optimal interval is approximately 2,000 records.

At GlobalBank, the standard is 1,000 records for critical financial data and 5,000 records for non-financial data.

32.16.3 Hash Totals for Integrity

Include hash totals in your checkpoints to verify that reprocessed records produce the same results:

      *--- Add each key to a hash total ---*
       COMPUTE WS-HASH-TOTAL =
           FUNCTION MOD(
               WS-HASH-TOTAL +
               FUNCTION ORD(WS-ACCT-KEY(1:1)) *
               FUNCTION ORD(WS-ACCT-KEY(2:1)),
               999999999999999999)

On restart, reprocess from the checkpoint and verify that the hash total matches. If it does not, the input data has changed since the original run, and the restart may produce inconsistent results.

32.17 Transaction Design Review Checklist

Before any transaction program goes to production, Maria Chen requires the following review:

Unit of work boundaries: Are all related operations within a single COMMIT scope?
Error handling: Is every SQL/DL/I/file status code checked?
Rollback on failure: Does every failure path issue ROLLBACK before exiting?
Lock ordering: If multiple resources are locked, is the order consistent?
Lock duration: How long are exclusive locks held? Can this be reduced?
Idempotency: Is the transaction safe to retry? What happens if it runs twice?
Commit frequency: For batch, is the checkpoint interval appropriate?
Compensation: For long-running processes, are compensating transactions defined?
Monitoring: Are transaction metrics captured (elapsed time, SQL count, retries)?
Documentation: Are the transaction boundaries and recovery behavior documented?

This checklist has prevented dozens of production incidents at GlobalBank. "Transaction design is not something you get right by luck," Maria says. "You get it right by discipline."

32.18 Transaction Monitoring and Diagnostics

Production transaction problems require diagnostic data. Design your transactions to be observable:

       01  WS-TRANSACTION-METRICS.
           05  WS-TXN-START-TIME  PIC X(26).
           05  WS-TXN-END-TIME    PIC X(26).
           05  WS-SQL-COUNT       PIC 9(5).
           05  WS-LOCK-WAIT-TIME  PIC 9(9).
           05  WS-RETRY-COUNT     PIC 9(2).
           05  WS-COMMIT-COUNT    PIC 9(3).

       1000-INITIALIZE.
           MOVE FUNCTION CURRENT-DATE TO WS-TXN-START-TIME
           MOVE ZEROS TO WS-SQL-COUNT
           MOVE ZEROS TO WS-LOCK-WAIT-TIME
           MOVE ZEROS TO WS-RETRY-COUNT
           MOVE ZEROS TO WS-COMMIT-COUNT.

       9000-FINALIZE.
           MOVE FUNCTION CURRENT-DATE TO WS-TXN-END-TIME
           DISPLAY 'TXN METRICS:'
           DISPLAY '  ELAPSED:    ' WS-TXN-START-TIME
                   ' TO ' WS-TXN-END-TIME
           DISPLAY '  SQL CALLS:  ' WS-SQL-COUNT
           DISPLAY '  RETRIES:    ' WS-RETRY-COUNT
           DISPLAY '  COMMITS:    ' WS-COMMIT-COUNT.

At GlobalBank, Priya Kapoor's monitoring system automatically flags transactions that exceed baseline metrics — a sudden increase in retries might indicate a new deadlock pattern, and a rising SQL count might indicate a missing index.

32.18.1 Logging Transaction Metrics to DB2

For production monitoring, write transaction metrics to a database table rather than SYSOUT:

       9100-LOG-METRICS.
           EXEC SQL
               INSERT INTO TRANSACTION_METRICS
               (TXN_ID, TXN_TYPE, START_TIME, END_TIME,
                SQL_COUNT, RETRY_COUNT, COMMIT_COUNT,
                RESULT_STATUS, ERROR_CODE)
               VALUES
               (:WS-TXN-ID, :WS-TXN-TYPE,
                :WS-TXN-START-TIME, :WS-TXN-END-TIME,
                :WS-SQL-COUNT, :WS-RETRY-COUNT,
                :WS-COMMIT-COUNT,
                :WS-RESULT-STATUS, :WS-ERROR-CODE)
           END-EXEC.

This metrics table becomes a goldmine for performance analysis and capacity planning. Priya Kapoor runs weekly queries to identify: - Transactions with rising average SQL counts (potential index degradation) - Transactions with increasing retry rates (deadlock patterns emerging) - Batch jobs with growing elapsed times (data volume growth) - Error rate trends by transaction type

32.18.2 Real-Time Alerting

For CICS online transactions, response time monitoring is critical. CICS provides transaction-level statistics through the CICS Monitoring Facility (CMF). Additionally, you can implement application-level thresholds:

       9200-CHECK-THRESHOLDS.
           COMPUTE WS-ELAPSED-MS =
               WS-END-STCK - WS-START-STCK

           IF WS-ELAPSED-MS > WS-RESPONSE-THRESHOLD
               EXEC CICS WRITEQ TD
                   QUEUE('CSMT')
                   FROM(WS-ALERT-MSG)
                   LENGTH(LENGTH OF WS-ALERT-MSG)
               END-EXEC
               DISPLAY 'THRESHOLD EXCEEDED: '
                       WS-ELAPSED-MS 'ms FOR '
                       WS-TXN-TYPE
           END-IF.

32.19 Transaction Testing Strategies

Testing transaction correctness requires more than verifying that the program produces the right output. You must also verify that the program behaves correctly under failure conditions.

32.19.1 The Abend-and-Verify Test

Deliberately abend the program at various points in the transaction and verify that the database is in a consistent state:

Run the transfer program
Kill it after the debit but before the credit
Check both account balances — they should be unchanged (rollback occurred)
Verify the transfer log shows no completed entry

32.19.2 The Concurrent Access Test

Run multiple instances of the same program simultaneously, accessing the same records:

Start two transfer programs that both transfer between accounts A and B
Verify that both complete without deadlock (or that deadlock retry works)
Verify that the final balances are correct (no lost updates)
Run this test 100 times to check for race conditions

32.19.3 The Duplicate Message Test

Simulate message redelivery:

Process a payment message
Process the same message again (identical ID)
Verify that the payment is applied only once (idempotency)
Verify that the second processing returns the cached result

32.19.4 The Compensation Test

Trigger each saga step failure and verify compensation:

Fail at step 1 — verify nothing is persisted
Fail at step 2 — verify step 1 is compensated
Fail at step 3 — verify steps 2 and 1 are compensated in reverse order
Run each compensation twice — verify idempotency

✅ Try It Yourself: Design a test suite for the XFER0200 transfer program that includes all four test categories above. For each test, specify: (a) the setup conditions, (b) the action to perform, (c) the expected database state after the test, (d) how to verify the expected state. This exercise teaches you that testing transactions is fundamentally different from testing sequential logic.

32.20 Transaction Isolation Levels

DB2 provides four isolation levels that control how much a transaction can see of other transactions' uncommitted changes. Choosing the right isolation level is a critical transaction design decision.

32.20.1 The Four Isolation Levels

Level	DB2 Name	Read Behavior	Lock Duration
Read Uncommitted	UR (Uncommitted Read)	Can see uncommitted changes from other transactions	No read locks
Read Committed	CS (Cursor Stability)	Sees only committed data; locks released after cursor moves	Short
Repeatable Read	RS (Read Stability)	Re-reading returns the same data; locked rows cannot change	Medium
Serializable	RR (Repeatable Read)	Full serialization; phantom reads prevented	Long

32.20.2 Choosing an Isolation Level

For most COBOL batch programs, CS (Cursor Stability) is the default and appropriate choice. It provides a good balance between data consistency and concurrency.

For reporting programs that can tolerate slightly stale data, UR (Uncommitted Read) reduces lock contention dramatically:

       EXEC SQL
           SELECT CUST_NAME, ACCT_BALANCE
           INTO :WS-CUST-NAME, :WS-ACCT-BALANCE
           FROM CUSTOMER C
           JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
           WHERE C.CUST_ID = :WS-CUST-ID
           WITH UR
       END-EXEC.

The WITH UR clause tells DB2 to read without acquiring any locks and without waiting for other transactions to commit. The trade-off: you might read a balance that is in the process of being updated. For a display-only inquiry, this is usually acceptable.

For financial transactions that must guarantee consistency (like the funds transfer), RS (Read Stability) or explicit FOR UPDATE locking is appropriate. This ensures that once you read a balance, no other transaction can change it until your transaction completes.

      *--- For the transfer: use explicit locking ---*
       EXEC SQL
           SELECT ACCT_BALANCE
           INTO :WS-BALANCE
           FROM ACCOUNT
           WHERE ACCT_NUMBER = :WS-ACCT-NUM
           FOR UPDATE OF ACCT_BALANCE
           WITH RS USE AND KEEP EXCLUSIVE LOCKS
       END-EXEC.

32.20.3 Isolation Level Impact on Performance

At GlobalBank, Priya Kapoor measured the impact of isolation levels on their daily reporting batch:

Isolation	Elapsed Time	Lock Waits	Lock Timeouts
RR	4.2 hours	12,847	23
RS	3.1 hours	4,291	5
CS	2.4 hours	987	0
UR	1.8 hours	0	0

The reporting batch switched from CS to UR, saving 30 minutes per night. Since the report reflects point-in-time balances and the data does not need to be perfectly current, UR is the appropriate choice.

⚖️ Defensive Programming: Never use UR for transactions that update data. The risk of reading uncommitted data and then making decisions based on it is too high. UR is only appropriate for read-only operations where slight inconsistency is acceptable.

32.21 Putting It All Together: The Transaction Design Spectrum

Throughout this chapter, we have covered many patterns and techniques. Here is how they fit together in a typical enterprise system:

Simple batch update (single file, single DB2 table): - Periodic COMMIT (every 1,000-5,000 records) - Checkpoint/restart for recoverability - Error logging with graceful degradation

Online inquiry (CICS, read-only): - Isolation level UR or CS for reads - No explicit COMMIT needed (no updates) - Response time monitoring

Online update (CICS, single resource): - Pessimistic or optimistic locking depending on contention - Single SYNCPOINT after all updates - RESP/RESP2 checking on all EXEC CICS commands

Online update (CICS, multiple resources — DB2 + VSAM): - Two-phase commit via SYNCPOINT - Resource ordering to prevent deadlocks - Deadlock retry with maximum attempts - Idempotency via unique transaction ID

Multi-day business process (saga): - Each step is an independent committed transaction - Compensating transactions for each step - Saga state table tracking progress - Idempotent steps and compensations - Automated compensation orchestrator

Cross-system integration (MQ, external APIs): - Outbox pattern for guaranteed delivery - Transactional MQ for internal messaging - Idempotent receivers for at-least-once delivery - Request-response logging for duplicate detection

Each pattern builds on the fundamentals of atomicity, consistency, isolation, and durability. The complexity increases as you add more resources, longer durations, and more failure modes. But the core principle never changes: either everything succeeds, or the system recovers to a consistent state.

32.22 Chapter Summary

Transaction design is the art of making concurrent, failure-prone systems behave correctly. In this chapter, you learned:

Unit of work: A sequence of operations that must all succeed or all be undone — the foundation of data integrity
COMMIT and ROLLBACK: Explicit transaction boundaries in DB2, CICS, and IMS
Two-phase commit: Coordinating commits across multiple resource managers (DB2 + VSAM, DB2 + IMS)
Deadlock prevention: Resource ordering eliminates most deadlocks; timeout and retry handle the rest
Optimistic vs. pessimistic locking: Choose based on contention level — pessimistic for hot resources, optimistic for cold
Long-running transactions: The saga pattern breaks multi-day processes into committed steps with compensating transactions
Compensation logic: Business-appropriate reversal, designed to be idempotent and auditable
Idempotent design: Unique transaction IDs and state checks prevent duplicate processing
Performance optimization: Minimize lock duration, choose appropriate commit intervals, reduce I/O calls

These patterns apply regardless of whether you are working with DB2, IMS, VSAM, or a combination. They are the engineering discipline that separates programs that "usually work" from programs that "always work — even when things go wrong."

In the next chapter, we shift from data integrity to code correctness: debugging strategies for finding and fixing the bugs that inevitably creep into even the most carefully designed programs.

"The junior developer writes code that works. The senior developer writes code that fails safely." — Maria Chen's advice to Derek Washington after his first production incident

In This Chapter

Chapter 32: Transaction Design Patterns

32.1 The Unit of Work Concept

32.1.1 What Constitutes a Transaction?

32.1.2 Implicit vs. Explicit Units of Work

32.2 COMMIT and ROLLBACK in Practice

32.2.1 DB2 COMMIT and ROLLBACK

32.2.2 CICS SYNCPOINT

32.2.3 IMS Checkpoint

32.2.4 Choosing Commit Frequency

32.3 Two-Phase Commit

32.3.1 How Two-Phase Commit Works

32.3.2 Two-Phase Commit in Practice

32.4 Deadlock Prevention

32.4.1 Resource Ordering

32.4.2 Timeout Strategies

32.4.3 Deadlock Detection in CICS

32.4.4 Minimizing Lock Duration

32.5 Optimistic vs. Pessimistic Locking

32.5.1 Pessimistic Locking

32.5.2 Optimistic Locking

32.6 Long-Running Transactions — The Saga Pattern

32.6.1 Saga Design

32.6.2 Implementing Sagas in COBOL

32.7 Compensation Logic

32.7.1 Types of Compensation

32.7.2 Compensation Design Principles

32.8 Idempotent Transaction Design

32.8.1 The Idempotent Key Pattern

32.8.2 The Idempotent Update Pattern

32.8.3 The Request-Response Log Pattern

32.9 Throughput and Response Time Optimization

32.9.1 Batch Throughput Optimization

32.9.2 Online Response Time Optimization

32.10 GlobalBank: The Atomic Transfer

32.11 MedClaim: Claim Adjudication Transaction

32.12 Transaction Anti-Patterns

32.12.1 The Chatty Transaction

32.12.2 The Mega-Transaction

32.12.3 The Optimistic Failure Ignorer

32.12.4 The Lock-and-Think

32.13 VSAM Transaction Patterns

32.13.1 VSAM and Record-Level Sharing (RLS)

32.13.2 VSAM Locking Patterns

32.13.3 VSAM Commit in Batch

32.13.4 Mixed DB2/VSAM Transactions

32.14 Error Recovery Patterns

32.14.1 The Error Classification Pattern

32.14.2 The Error Log Pattern

32.14.3 The Graceful Degradation Pattern

32.15 Transaction Design in Mixed Environments

32.15.1 The MQ Transaction Pattern

32.15.2 The Outbox Pattern

32.16 Batch Checkpoint/Restart Design

32.16.1 The Checkpoint/Restart Framework

32.16.2 Checkpoint Frequency Calculation

32.16.3 Hash Totals for Integrity

32.17 Transaction Design Review Checklist

32.18 Transaction Monitoring and Diagnostics

32.18.1 Logging Transaction Metrics to DB2

32.18.2 Real-Time Alerting

32.19 Transaction Testing Strategies

32.19.1 The Abend-and-Verify Test

32.19.2 The Concurrent Access Test

32.19.3 The Duplicate Message Test

32.19.4 The Compensation Test

32.20 Transaction Isolation Levels

32.20.1 The Four Isolation Levels

32.20.2 Choosing an Isolation Level

32.20.3 Isolation Level Impact on Performance

32.21 Putting It All Together: The Transaction Design Spectrum

32.22 Chapter Summary