> "A transaction is a promise. You promise that either everything happens or nothing happens. Breaking that promise breaks trust — and in banking, trust is all we have." — Maria Chen, GlobalBank senior developer
In This Chapter
- 32.1 The Unit of Work Concept
- 32.2 COMMIT and ROLLBACK in Practice
- 32.3 Two-Phase Commit
- 32.4 Deadlock Prevention
- 32.5 Optimistic vs. Pessimistic Locking
- 32.6 Long-Running Transactions — The Saga Pattern
- 32.7 Compensation Logic
- 32.8 Idempotent Transaction Design
- 32.9 Throughput and Response Time Optimization
- 32.10 GlobalBank: The Atomic Transfer
- 32.11 MedClaim: Claim Adjudication Transaction
- 32.12 Transaction Anti-Patterns
- 32.13 VSAM Transaction Patterns
- 32.14 Error Recovery Patterns
- 32.15 Transaction Design in Mixed Environments
- 32.16 Batch Checkpoint/Restart Design
- 32.17 Transaction Design Review Checklist
- 32.18 Transaction Monitoring and Diagnostics
- 32.19 Transaction Testing Strategies
- 32.20 Transaction Isolation Levels
- 32.21 Putting It All Together: The Transaction Design Spectrum
- 32.22 Chapter Summary
Chapter 32: Transaction Design Patterns
"A transaction is a promise. You promise that either everything happens or nothing happens. Breaking that promise breaks trust — and in banking, trust is all we have." — Maria Chen, GlobalBank senior developer
Derek Washington's first production incident at GlobalBank taught him a lesson no textbook had prepared him for. A funds transfer program crashed halfway through execution. It had debited $5,000 from the source account but never credited the destination. The money had vanished into the ether — not lost, technically, but trapped in an inconsistent state that took four hours and three senior developers to untangle. "This," Maria Chen said, reviewing the code afterward, "is why we design transactions."
This chapter teaches you to design transactions that keep data consistent even when things go wrong — and things will go wrong. Hardware fails, networks drop, programs abend, operators cancel jobs, and databases run out of space. Your transaction design must account for all of these scenarios.
32.1 The Unit of Work Concept
A unit of work (UOW) is a sequence of operations that must either all complete successfully or all be undone. In database terms, this is the "A" in ACID:
- Atomicity: All or nothing — either every operation in the unit completes, or none of them take effect
- Consistency: The transaction moves the database from one valid state to another
- Isolation: Concurrent transactions do not interfere with each other
- Durability: Once committed, changes survive system failures
In mainframe COBOL, a unit of work begins implicitly when your program starts (or after the last commit point) and ends explicitly when you issue a COMMIT (or equivalent sync point operation) or implicitly when the program terminates.
32.1.1 What Constitutes a Transaction?
Consider GlobalBank's funds transfer. The business operation is: "Move $5,000 from Account A to Account B." The database operations are:
- Read Account A balance → verify sufficient funds
- Subtract $5,000 from Account A → write back
- Add $5,000 to Account B → write back
- Write an audit record for the debit
- Write an audit record for the credit
- Update daily transaction totals
All six operations form a single unit of work. If any one fails, they must all be undone. You cannot have a state where Account A is debited but Account B is not credited.
📊 The Cost of Inconsistency: In financial systems, data inconsistency is not just a technical problem — it is a regulatory violation. Banking regulators require that all transactions balance to zero (debits must equal credits). A system that loses money, even temporarily, can trigger audits, fines, and loss of charter. At GlobalBank, every transaction is designed to fail safely.
32.1.2 Implicit vs. Explicit Units of Work
In a batch COBOL program: - Implicit: The program's entire execution is one unit of work. If it abends, all changes since the last sync point are rolled back. If it completes normally, all changes are committed at program termination. - Explicit: The program issues COMMIT at defined intervals, breaking the work into multiple units. Each COMMIT is a point of no return — changes before the COMMIT survive even if the program abends later.
In CICS: - Each task (transaction) is typically one unit of work - EXEC CICS SYNCPOINT commits the current unit of work - EXEC CICS SYNCPOINT ROLLBACK undoes changes since the last sync point
In IMS: - The CHKP (checkpoint) call serves as the sync point - ROLB rolls back to the last checkpoint
32.2 COMMIT and ROLLBACK in Practice
32.2.1 DB2 COMMIT and ROLLBACK
In embedded SQL programs, COMMIT and ROLLBACK are straightforward:
*================================================================*
* PROGRAM: XFER0100 *
* PURPOSE: Transfer funds between accounts *
* AUTHOR: Maria Chen / GlobalBank Development *
*================================================================*
IDENTIFICATION DIVISION.
PROGRAM-ID. XFER0100.
DATA DIVISION.
WORKING-STORAGE SECTION.
EXEC SQL INCLUDE SQLCA END-EXEC.
01 WS-TRANSFER-FIELDS.
05 WS-SOURCE-ACCT PIC X(12).
05 WS-DEST-ACCT PIC X(12).
05 WS-AMOUNT PIC S9(11)V99 COMP-3.
05 WS-SOURCE-BAL PIC S9(11)V99 COMP-3.
05 WS-DEST-BAL PIC S9(11)V99 COMP-3.
05 WS-TIMESTAMP PIC X(26).
05 WS-XFER-ID PIC X(20).
01 WS-FLAGS.
05 WS-ERROR-FLAG PIC X(1) VALUE 'N'.
88 TRANSFER-OK VALUE 'N'.
88 TRANSFER-FAILED VALUE 'Y'.
05 WS-ERROR-MSG PIC X(80).
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-INITIALIZE
PERFORM 2000-EXECUTE-TRANSFER
PERFORM 3000-FINALIZE
STOP RUN.
1000-INITIALIZE.
MOVE 'ACCT00001001' TO WS-SOURCE-ACCT
MOVE 'ACCT00001002' TO WS-DEST-ACCT
MOVE 5000.00 TO WS-AMOUNT
MOVE FUNCTION CURRENT-DATE TO WS-TIMESTAMP
STRING 'XFER' WS-TIMESTAMP(1:14)
DELIMITED BY SIZE INTO WS-XFER-ID.
2000-EXECUTE-TRANSFER.
*--- Step 1: Lock and read source account ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-SOURCE-BAL
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-SOURCE-ACCT
FOR UPDATE OF ACCT_BALANCE
WITH RS USE AND KEEP EXCLUSIVE LOCKS
END-EXEC
IF SQLCODE NOT = 0
MOVE 'SOURCE ACCOUNT NOT FOUND'
TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF
*--- Step 2: Verify sufficient funds ---*
IF WS-SOURCE-BAL < WS-AMOUNT
MOVE 'INSUFFICIENT FUNDS' TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF
*--- Step 3: Lock and read destination account ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-DEST-BAL
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-DEST-ACCT
FOR UPDATE OF ACCT_BALANCE
WITH RS USE AND KEEP EXCLUSIVE LOCKS
END-EXEC
IF SQLCODE NOT = 0
MOVE 'DEST ACCOUNT NOT FOUND' TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF
*--- Step 4: Debit source ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = ACCT_BALANCE - :WS-AMOUNT,
LAST_UPDATE = CURRENT TIMESTAMP
WHERE ACCT_NUMBER = :WS-SOURCE-ACCT
END-EXEC
IF SQLCODE NOT = 0
MOVE 'DEBIT FAILED' TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF
*--- Step 5: Credit destination ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = ACCT_BALANCE + :WS-AMOUNT,
LAST_UPDATE = CURRENT TIMESTAMP
WHERE ACCT_NUMBER = :WS-DEST-ACCT
END-EXEC
IF SQLCODE NOT = 0
MOVE 'CREDIT FAILED' TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF
*--- Step 6: Write audit records ---*
EXEC SQL
INSERT INTO TRANSACTION_LOG
(XFER_ID, ACCT_NUMBER, TRANS_TYPE,
TRANS_AMOUNT, TRANS_TIMESTAMP)
VALUES
(:WS-XFER-ID, :WS-SOURCE-ACCT, 'DEBIT',
:WS-AMOUNT, CURRENT TIMESTAMP)
END-EXEC
IF SQLCODE NOT = 0
MOVE 'DEBIT AUDIT FAILED' TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF
EXEC SQL
INSERT INTO TRANSACTION_LOG
(XFER_ID, ACCT_NUMBER, TRANS_TYPE,
TRANS_AMOUNT, TRANS_TIMESTAMP)
VALUES
(:WS-XFER-ID, :WS-DEST-ACCT, 'CREDIT',
:WS-AMOUNT, CURRENT TIMESTAMP)
END-EXEC
IF SQLCODE NOT = 0
MOVE 'CREDIT AUDIT FAILED' TO WS-ERROR-MSG
SET TRANSFER-FAILED TO TRUE
GO TO 2000-EXIT
END-IF.
2000-EXIT.
EXIT.
3000-FINALIZE.
IF TRANSFER-OK
EXEC SQL COMMIT END-EXEC
DISPLAY 'TRANSFER COMMITTED: ' WS-XFER-ID
ELSE
EXEC SQL ROLLBACK END-EXEC
DISPLAY 'TRANSFER ROLLED BACK: ' WS-ERROR-MSG
END-IF.
⚠️ Critical Design Point: Notice how the program accumulates an error flag throughout the transfer process and only issues COMMIT or ROLLBACK at the very end. This ensures atomicity — every step must succeed before anything is committed.
32.2.2 CICS SYNCPOINT
In a CICS program, the equivalent operations use EXEC CICS commands:
*--- Commit all changes ---*
EXEC CICS SYNCPOINT END-EXEC.
*--- Rollback all changes ---*
EXEC CICS SYNCPOINT ROLLBACK END-EXEC.
In CICS, a SYNCPOINT commits changes to all resources managed by the task — DB2 tables, VSAM files, temporary storage queues, transient data queues. This is coordinated through CICS's sync point manager.
32.2.3 IMS Checkpoint
In IMS batch DL/I or BMP programs:
*--- Commit via checkpoint ---*
CALL 'CBLTDLI' USING DLI-CHKP
IO-PCB
WS-CHECKPOINT-AREA.
*--- Rollback to last checkpoint ---*
CALL 'CBLTDLI' USING DLI-ROLB
IO-PCB.
32.2.4 Choosing Commit Frequency
In batch programs, you must decide how often to commit. The trade-offs:
| Frequency | Advantage | Disadvantage |
|---|---|---|
| Every record | Minimal rollback on failure | High overhead, slow throughput |
| Every N records | Balanced approach | Up to N records to reprocess on restart |
| End of job only | Maximum throughput | Full restart on failure |
The standard practice at GlobalBank: commit every 500-1000 records, with checkpoint/restart logic to handle reprocessing.
PERFORM UNTIL END-OF-INPUT
READ INPUT-FILE INTO WS-INPUT-REC
AT END SET END-OF-INPUT TO TRUE
END-READ
IF NOT END-OF-INPUT
PERFORM 2000-PROCESS-RECORD
ADD 1 TO WS-RECORD-COUNT
IF FUNCTION MOD(WS-RECORD-COUNT,
WS-COMMIT-INTERVAL) = 0
EXEC SQL COMMIT END-EXEC
DISPLAY 'COMMITTED AT RECORD: '
WS-RECORD-COUNT
END-IF
END-IF
END-PERFORM.
*--- Final commit for remaining records ---*
EXEC SQL COMMIT END-EXEC.
💡 Defensive Programming: Always issue a final COMMIT after the processing loop. Without it, the last batch of records (fewer than the commit interval) would only be committed at program termination — and if that termination is abnormal, they would be lost.
32.3 Two-Phase Commit
When a single transaction updates multiple resource managers — for example, both a DB2 table and a VSAM file, or DB2 and IMS — a simple COMMIT is not enough. What if DB2 commits but the VSAM update fails? You need two-phase commit (2PC).
32.3.1 How Two-Phase Commit Works
The sync point manager (RRS — Resource Recovery Services on z/OS) coordinates the commit across all resource managers:
Phase 1 — Prepare: 1. The sync point manager sends a "prepare" message to each resource manager 2. Each resource manager writes its changes to a log and responds "ready" or "abort" 3. If all respond "ready," proceed to Phase 2 4. If any responds "abort," all resource managers are told to roll back
Phase 2 — Commit: 1. The sync point manager sends "commit" to all resource managers 2. Each resource manager makes its changes permanent 3. All respond "committed"
Sync Point Manager (RRS)
│
├─── Prepare ──→ DB2 → "Ready"
│
├─── Prepare ──→ VSAM → "Ready"
│
├─── Prepare ──→ IMS → "Ready"
│
│ (All ready? Yes!)
│
├─── Commit ───→ DB2 → "Committed"
├─── Commit ───→ VSAM → "Committed"
└─── Commit ───→ IMS → "Committed"
32.3.2 Two-Phase Commit in Practice
As a COBOL programmer, you rarely code 2PC explicitly. The middleware handles it:
- In CICS: EXEC CICS SYNCPOINT automatically coordinates all resource managers. If your CICS program updates a DB2 table and a VSAM file in the same task, SYNCPOINT handles the two-phase commit transparently.
- In batch: RRS (Resource Recovery Services) coordinates when your batch program uses DB2 and VSAM RLS (Record Level Sharing).
- In IMS: IMS's sync point processing coordinates IMS databases and external resources.
However, you must design your programs to work with 2PC:
*--- Update DB2 table ---*
EXEC SQL
UPDATE CLAIM_MASTER
SET CLAIM_STATUS = 'APPROVED',
APPROVED_AMT = :WS-APPROVED-AMT
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
IF SQLCODE NOT = 0
SET TRANSACTION-FAILED TO TRUE
GO TO 5000-FINALIZE
END-IF
*--- Update VSAM payment file ---*
WRITE PAYMENT-RECORD FROM WS-PAYMENT-REC
IF WS-VSAM-STATUS NOT = '00'
SET TRANSACTION-FAILED TO TRUE
GO TO 5000-FINALIZE
END-IF
*--- Both succeeded — commit both ---*
5000-FINALIZE.
IF TRANSACTION-OK
EXEC CICS SYNCPOINT END-EXEC
ELSE
EXEC CICS SYNCPOINT ROLLBACK END-EXEC
END-IF.
⚠️ Important: Two-phase commit adds overhead. Each prepare and commit phase requires I/O to the log. In high-throughput systems, this overhead matters. This is one reason why many mainframe systems keep related data in the same resource manager (e.g., all in DB2) rather than splitting across DB2 and VSAM.
32.4 Deadlock Prevention
A deadlock occurs when two transactions each hold a lock that the other needs, creating a circular wait. Neither can proceed.
Transaction A: Transaction B:
Lock Account 001 ✓ Lock Account 002 ✓
Lock Account 002 → WAIT Lock Account 001 → WAIT
(waiting for B) (waiting for A)
32.4.1 Resource Ordering
The most effective deadlock prevention strategy is resource ordering: always acquire locks in the same sequence. If every program that transfers funds between accounts locks the lower-numbered account first, deadlocks cannot occur.
2000-EXECUTE-TRANSFER.
*--- Always lock accounts in ascending order ---*
IF WS-SOURCE-ACCT < WS-DEST-ACCT
MOVE WS-SOURCE-ACCT TO WS-FIRST-LOCK-ACCT
MOVE WS-DEST-ACCT TO WS-SECOND-LOCK-ACCT
ELSE
MOVE WS-DEST-ACCT TO WS-FIRST-LOCK-ACCT
MOVE WS-SOURCE-ACCT TO WS-SECOND-LOCK-ACCT
END-IF
*--- Lock first account ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-FIRST-BAL
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-FIRST-LOCK-ACCT
FOR UPDATE OF ACCT_BALANCE
END-EXEC
*--- Lock second account ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-SECOND-BAL
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-SECOND-LOCK-ACCT
FOR UPDATE OF ACCT_BALANCE
END-EXEC
*--- Now perform the actual debit/credit ---*
...
💡 Defensive Programming: Resource ordering is a discipline, not a technical feature. It only works if every program that accesses these resources follows the same convention. At GlobalBank, the convention is documented in the programming standards manual, and code reviews check for it.
32.4.2 Timeout Strategies
Even with resource ordering, deadlocks can occur (due to bugs or unforeseen interactions). The safety net is a lock timeout:
*--- Set a lock timeout for this transaction ---*
EXEC SQL
SET CURRENT LOCK TIMEOUT = 10
END-EXEC.
If a lock is not granted within 10 seconds, DB2 returns SQLCODE -911 (deadlock/timeout). Your program must handle this:
EVALUATE SQLCODE
WHEN 0
CONTINUE
WHEN -911
* Deadlock or timeout — rollback and retry
EXEC SQL ROLLBACK END-EXEC
ADD 1 TO WS-RETRY-COUNT
IF WS-RETRY-COUNT <= WS-MAX-RETRIES
PERFORM 2000-EXECUTE-TRANSFER
ELSE
MOVE 'MAX RETRIES EXCEEDED'
TO WS-ERROR-MSG
PERFORM 9000-ERROR-EXIT
END-IF
WHEN OTHER
PERFORM 9000-SQL-ERROR
END-EVALUATE.
32.4.3 Deadlock Detection in CICS
CICS has its own deadlock detection for VSAM resources. When CICS detects a deadlock, it abends one of the transactions with ABEND code AFCR. Your program should include:
EXEC CICS HANDLE ABEND
LABEL(9000-DEADLOCK-HANDLER)
END-EXEC.
*--- Or, preferably, use RESP/RESP2 ---*
EXEC CICS READ
FILE('ACCTFILE')
INTO(WS-ACCT-REC)
RIDFLD(WS-ACCT-KEY)
UPDATE
RESP(WS-RESP)
RESP2(WS-RESP2)
END-EXEC
IF WS-RESP = DFHRESP(LOCKED)
PERFORM 3100-HANDLE-LOCK-WAIT
END-IF.
32.4.4 Minimizing Lock Duration
The best deadlock prevention is reducing the window during which locks are held:
- Acquire locks as late as possible — do validation and preparation before locking
- Release locks as early as possible — commit promptly after updates
- Minimize processing between lock acquisition and release — no I/O to non-locked resources, no complex calculations
- Use the least restrictive lock mode — share locks for reads, exclusive locks only for updates
*--- BAD: Hold locks during long calculation ---*
EXEC SQL SELECT ... FOR UPDATE END-EXEC
PERFORM COMPLEX-CALCULATION *> 500ms of CPU
EXEC SQL UPDATE ... END-EXEC
EXEC SQL COMMIT END-EXEC
*--- GOOD: Calculate first, then lock-update-commit ---*
EXEC SQL SELECT ... END-EXEC *> No lock yet
PERFORM COMPLEX-CALCULATION *> Calculate freely
EXEC SQL SELECT ... FOR UPDATE END-EXEC *> Lock now
EXEC SQL UPDATE ... END-EXEC *> Update immediately
EXEC SQL COMMIT END-EXEC *> Release quickly
📊 Lock Duration Rule: At GlobalBank, the standard is: no transaction should hold exclusive locks for more than 100 milliseconds. Transactions that exceed this are flagged in performance monitoring and must be redesigned.
32.5 Optimistic vs. Pessimistic Locking
32.5.1 Pessimistic Locking
Pessimistic locking assumes conflicts are common and prevents them by locking resources before accessing them. This is the default approach in DB2's SELECT ... FOR UPDATE:
*--- Pessimistic: Lock immediately ---*
EXEC SQL
SELECT ACCT_BALANCE, LAST_UPDATE
INTO :WS-BALANCE, :WS-LAST-UPDATE
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-ACCT-NUM
FOR UPDATE OF ACCT_BALANCE
END-EXEC.
*--- Row is now locked. Other transactions wait. ---*
Pros: Simple, safe, guaranteed consistency Cons: Reduces concurrency, increases deadlock risk, locks held during user think time in online programs
32.5.2 Optimistic Locking
Optimistic locking assumes conflicts are rare. It reads without locking, performs work, then checks at update time whether anyone else changed the data:
*--- Step 1: Read without lock (note: no FOR UPDATE) ---*
EXEC SQL
SELECT ACCT_BALANCE, LAST_UPDATE
INTO :WS-BALANCE, :WS-SAVED-TIMESTAMP
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-ACCT-NUM
END-EXEC.
*--- Step 2: Process (user think time, calculations, etc.) ---*
PERFORM CALCULATE-NEW-BALANCE.
*--- Step 3: Update only if unchanged since we read ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = :WS-NEW-BALANCE,
LAST_UPDATE = CURRENT TIMESTAMP
WHERE ACCT_NUMBER = :WS-ACCT-NUM
AND LAST_UPDATE = :WS-SAVED-TIMESTAMP
END-EXEC.
*--- Step 4: Check if update succeeded ---*
IF SQLCODE = 0 AND SQLERRD(3) = 1
* Exactly one row updated — success
EXEC SQL COMMIT END-EXEC
ELSE IF SQLERRD(3) = 0
* Zero rows updated — someone else changed it
EXEC SQL ROLLBACK END-EXEC
MOVE 'DATA CHANGED — PLEASE RETRY'
TO WS-USER-MESSAGE
PERFORM REDISPLAY-SCREEN
END-IF.
The LAST_UPDATE timestamp acts as a version number. If it changed between your read and your update, someone else modified the row, and your update affects zero rows.
Pros: Higher concurrency, no lock contention, no deadlocks Cons: Requires retry logic, wasted work on conflict, requires a version column
💡 When to Use Which: Use pessimistic locking for high-contention resources (like account balances during peak hours) and optimistic locking for low-contention resources (like customer address updates). At GlobalBank, the CICS online transfer transaction uses pessimistic locking because the same popular accounts are hit constantly. The batch address-update program uses optimistic locking because address changes are rare.
32.6 Long-Running Transactions — The Saga Pattern
Some business processes span hours or days and cannot be held in a single database transaction. Consider MedClaim's claim lifecycle:
- Claim is received (Day 1)
- Claim is validated (Day 1)
- Claim is adjudicated (Day 2)
- Payment is approved (Day 3)
- Payment is issued (Day 5)
- Explanation of Benefits is generated (Day 5)
You cannot hold a database lock for five days. The solution is the saga pattern: break the long-running business process into a sequence of short database transactions, each with a compensating transaction that can undo it if a later step fails.
32.6.1 Saga Design
Step 1: Receive Claim → Compensation: Cancel Claim
Step 2: Validate Claim → Compensation: Mark Invalid
Step 3: Adjudicate Claim → Compensation: Reverse Adjudication
Step 4: Approve Payment → Compensation: Reverse Approval
Step 5: Issue Payment → Compensation: Stop/Void Check
Step 6: Generate EOB → Compensation: Void EOB
Each step is an independent transaction that commits immediately. If Step 4 fails, you execute the compensation transactions for Steps 3, 2, and 1 in reverse order.
32.6.2 Implementing Sagas in COBOL
*================================================================*
* PROGRAM: CLMSAGA1 *
* PURPOSE: Saga-based claim processing with compensation *
* AUTHOR: James Okafor / MedClaim Development *
*================================================================*
01 WS-SAGA-STATE.
05 WS-SAGA-STEP PIC 9(2).
05 WS-SAGA-STATUS PIC X(1).
88 SAGA-OK VALUE 'Y'.
88 SAGA-FAILED VALUE 'N'.
05 WS-SAGA-STEPS-COMPLETED PIC 9(2).
PROCEDURE DIVISION.
2000-EXECUTE-SAGA.
SET SAGA-OK TO TRUE
MOVE ZEROS TO WS-SAGA-STEPS-COMPLETED
*--- Step 1: Receive ---*
PERFORM 2100-RECEIVE-CLAIM
IF SAGA-FAILED
PERFORM 3000-COMPENSATE
GO TO 2000-EXIT
END-IF
ADD 1 TO WS-SAGA-STEPS-COMPLETED
*--- Step 2: Validate ---*
PERFORM 2200-VALIDATE-CLAIM
IF SAGA-FAILED
PERFORM 3000-COMPENSATE
GO TO 2000-EXIT
END-IF
ADD 1 TO WS-SAGA-STEPS-COMPLETED
*--- Step 3: Adjudicate ---*
PERFORM 2300-ADJUDICATE-CLAIM
IF SAGA-FAILED
PERFORM 3000-COMPENSATE
GO TO 2000-EXIT
END-IF
ADD 1 TO WS-SAGA-STEPS-COMPLETED
*--- Step 4: Approve Payment ---*
PERFORM 2400-APPROVE-PAYMENT
IF SAGA-FAILED
PERFORM 3000-COMPENSATE
GO TO 2000-EXIT
END-IF
ADD 1 TO WS-SAGA-STEPS-COMPLETED
*--- Step 5: Issue Payment ---*
PERFORM 2500-ISSUE-PAYMENT
IF SAGA-FAILED
PERFORM 3000-COMPENSATE
GO TO 2000-EXIT
END-IF
ADD 1 TO WS-SAGA-STEPS-COMPLETED.
2000-EXIT.
EXIT.
3000-COMPENSATE.
*--- Execute compensations in reverse order ---*
DISPLAY 'SAGA FAILED AT STEP: '
WS-SAGA-STEPS-COMPLETED
' — COMPENSATING'
EVALUATE WS-SAGA-STEPS-COMPLETED
WHEN 4
PERFORM 3400-REVERSE-APPROVAL
PERFORM 3300-REVERSE-ADJUDICATION
PERFORM 3200-MARK-INVALID
PERFORM 3100-CANCEL-CLAIM
WHEN 3
PERFORM 3300-REVERSE-ADJUDICATION
PERFORM 3200-MARK-INVALID
PERFORM 3100-CANCEL-CLAIM
WHEN 2
PERFORM 3200-MARK-INVALID
PERFORM 3100-CANCEL-CLAIM
WHEN 1
PERFORM 3100-CANCEL-CLAIM
END-EVALUATE.
2100-RECEIVE-CLAIM.
EXEC SQL
INSERT INTO CLAIM_MASTER
(CLAIM_ID, CLAIM_STATUS, RECEIVE_DATE,
PROVIDER_ID, MEMBER_ID, BILLED_AMT)
VALUES
(:WS-CLAIM-ID, 'RECEIVED',
CURRENT DATE,
:WS-PROVIDER-ID, :WS-MEMBER-ID,
:WS-BILLED-AMT)
END-EXEC
IF SQLCODE = 0
EXEC SQL COMMIT END-EXEC
ELSE
SET SAGA-FAILED TO TRUE
EXEC SQL ROLLBACK END-EXEC
END-IF.
3100-CANCEL-CLAIM.
*--- Compensation for Step 1 ---*
EXEC SQL
UPDATE CLAIM_MASTER
SET CLAIM_STATUS = 'CANCELLED',
CANCEL_REASON = 'SAGA COMPENSATION',
CANCEL_DATE = CURRENT DATE
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
EXEC SQL COMMIT END-EXEC
DISPLAY 'COMPENSATED: CLAIM CANCELLED'.
🔴 Critical Consideration: Compensation transactions must be idempotent — safe to execute multiple times. If the compensation itself fails and is retried, it must produce the same result. This is why we use UPDATE with specific status values rather than DELETE: if the compensation runs twice, the second UPDATE simply sets a value that is already set.
32.7 Compensation Logic
Compensation is not simply "undo." It is "apply the business-appropriate reversal." The difference matters.
32.7.1 Types of Compensation
| Original Action | Simple Undo | Business Compensation |
|---|---|---|
| Debit $100 | Add $100 back | Add $100 back + audit trail | |
| Insert claim | Delete claim | Set status to CANCELLED |
| Ship product | Un-ship? (impossible) | Issue return authorization |
| Send notification | Un-send? (impossible) | Send correction notification |
Notice that some actions cannot be undone — you cannot un-send an email or un-ship a package. Compensation must account for this reality.
32.7.2 Compensation Design Principles
- Log everything: Every action must create an audit trail so compensation knows what to reverse
- Use status flags, not deletes: Mark records as cancelled rather than deleting them
- Design for partial failure: Each compensation step must handle the case where the original step partially completed
- Make compensations idempotent: Safe to retry
- Track compensation state: Record which compensations have been executed
01 WS-COMPENSATION-LOG.
05 WS-COMP-ENTRIES OCCURS 10 TIMES.
10 WS-COMP-STEP PIC 9(2).
10 WS-COMP-ACTION PIC X(30).
10 WS-COMP-STATUS PIC X(1).
88 COMP-SUCCESS VALUE 'S'.
88 COMP-FAILED VALUE 'F'.
88 COMP-SKIPPED VALUE 'K'.
10 WS-COMP-TIME PIC X(26).
3100-CANCEL-CLAIM.
*--- Check if already compensated ---*
EXEC SQL
SELECT CLAIM_STATUS
INTO :WS-CURRENT-STATUS
FROM CLAIM_MASTER
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
IF WS-CURRENT-STATUS = 'CANCELLED'
* Already compensated — idempotent
SET COMP-SKIPPED(1) TO TRUE
ELSE
EXEC SQL
UPDATE CLAIM_MASTER
SET CLAIM_STATUS = 'CANCELLED'
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
IF SQLCODE = 0
EXEC SQL COMMIT END-EXEC
SET COMP-SUCCESS(1) TO TRUE
ELSE
SET COMP-FAILED(1) TO TRUE
END-IF
END-IF.
32.8 Idempotent Transaction Design
An idempotent operation produces the same result whether executed once or multiple times. This is critical in mainframe environments where: - Batch jobs may be restarted from the last checkpoint - CICS transactions may be retried after timeout - Messages in MQ queues may be delivered more than once
32.8.1 The Idempotent Key Pattern
Assign a unique identifier to each transaction and check for it before processing:
2000-PROCESS-PAYMENT.
*--- Check if this payment was already processed ---*
EXEC SQL
SELECT COUNT(*)
INTO :WS-EXISTING-COUNT
FROM PAYMENT_HISTORY
WHERE PAYMENT_ID = :WS-PAYMENT-ID
END-EXEC
IF WS-EXISTING-COUNT > 0
* Already processed — idempotent skip
DISPLAY 'PAYMENT ALREADY PROCESSED: '
WS-PAYMENT-ID
MOVE 'DUPLICATE' TO WS-PROCESS-RESULT
ELSE
PERFORM 2100-EXECUTE-PAYMENT
END-IF.
32.8.2 The Idempotent Update Pattern
Design UPDATE statements to be naturally idempotent:
*--- NON-IDEMPOTENT: Running this twice doubles the credit ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = ACCT_BALANCE + :WS-AMOUNT
WHERE ACCT_NUMBER = :WS-ACCT-NUM
END-EXEC.
*--- IDEMPOTENT: Running this twice has the same effect ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = :WS-NEW-BALANCE
WHERE ACCT_NUMBER = :WS-ACCT-NUM
AND LAST_UPDATE < :WS-TRANS-TIMESTAMP
END-EXEC.
The idempotent version sets an absolute value rather than a relative increment, and includes a timestamp guard to prevent reapplication.
32.8.3 The Request-Response Log Pattern
For complex transactions, maintain a log of requests and their results:
01 WS-REQUEST-LOG-REC.
05 RL-REQUEST-ID PIC X(20).
05 RL-REQUEST-TYPE PIC X(10).
05 RL-REQUEST-TIME PIC X(26).
05 RL-STATUS PIC X(10).
05 RL-RESULT-CODE PIC X(4).
05 RL-RESULT-DATA PIC X(200).
2000-PROCESS-REQUEST.
*--- Check if we already processed this request ---*
EXEC SQL
SELECT STATUS, RESULT_CODE, RESULT_DATA
INTO :RL-STATUS, :RL-RESULT-CODE,
:RL-RESULT-DATA
FROM REQUEST_LOG
WHERE REQUEST_ID = :RL-REQUEST-ID
END-EXEC
EVALUATE SQLCODE
WHEN 0
* Already processed — return cached result
DISPLAY 'RETURNING CACHED RESULT FOR: '
RL-REQUEST-ID
WHEN +100
* Not found — process normally
PERFORM 2100-EXECUTE-REQUEST
PERFORM 2200-LOG-RESULT
WHEN OTHER
PERFORM 9000-SQL-ERROR
END-EVALUATE.
📊 Idempotency in Practice at MedClaim: James Okafor implemented the request-log pattern across all of MedClaim's claim processing programs after a weekend production restart processed 12,000 claims twice. The duplicate payments cost $2.3 million to reverse manually. Now, every claim processing step checks a transaction log before executing. "Idempotency is not a nice-to-have," James says. "It is the difference between a calm Monday and a catastrophe."
32.9 Throughput and Response Time Optimization
Transaction design directly affects two key metrics: throughput (transactions per second) and response time (how long each transaction takes).
32.9.1 Batch Throughput Optimization
*--- Pattern: Batch with periodic commits ---*
01 WS-BATCH-CONTROL.
05 WS-COMMIT-INTERVAL PIC 9(5) VALUE 01000.
05 WS-RECORD-COUNT PIC 9(9) VALUE ZEROS.
05 WS-COMMIT-COUNT PIC 9(5) VALUE ZEROS.
05 WS-ERROR-COUNT PIC 9(5) VALUE ZEROS.
05 WS-START-TIME PIC X(26).
05 WS-END-TIME PIC X(26).
2000-PROCESS-BATCH.
MOVE FUNCTION CURRENT-DATE TO WS-START-TIME
PERFORM UNTIL END-OF-INPUT
READ INPUT-FILE INTO WS-INPUT-REC
AT END SET END-OF-INPUT TO TRUE
END-READ
IF NOT END-OF-INPUT
PERFORM 2100-PROCESS-ONE-RECORD
ADD 1 TO WS-RECORD-COUNT
IF FUNCTION MOD(WS-RECORD-COUNT,
WS-COMMIT-INTERVAL) = 0
EXEC SQL COMMIT END-EXEC
ADD 1 TO WS-COMMIT-COUNT
PERFORM 2200-CHECKPOINT
END-IF
END-IF
END-PERFORM
*--- Final commit ---*
EXEC SQL COMMIT END-EXEC
MOVE FUNCTION CURRENT-DATE TO WS-END-TIME
DISPLAY 'RECORDS PROCESSED: ' WS-RECORD-COUNT
DISPLAY 'COMMITS ISSUED: ' WS-COMMIT-COUNT
DISPLAY 'ERRORS: ' WS-ERROR-COUNT
DISPLAY 'START TIME: ' WS-START-TIME
DISPLAY 'END TIME: ' WS-END-TIME.
Throughput levers: 1. Commit interval: Too small (every record) wastes I/O on commits. Too large (end of job) risks long rollbacks. Profile to find the sweet spot. 2. Buffer pool sizing: Larger DB2 buffer pools reduce physical I/O. Work with your DBA to tune. 3. Sequential prefetch: DB2 automatically prefetches data for sequential scans. Design your access pattern to be sequential when possible. 4. Parallel processing: Split the input into ranges and run multiple batch jobs concurrently (but watch for lock contention).
32.9.2 Online Response Time Optimization
For CICS transactions, response time is king:
*--- Minimize DB2 calls ---*
*--- BAD: Multiple selects ---*
EXEC SQL SELECT CUST_NAME INTO :H ... END-EXEC.
EXEC SQL SELECT ACCT_BAL INTO :H ... END-EXEC.
EXEC SQL SELECT LAST_TXN INTO :H ... END-EXEC.
*--- GOOD: Single join ---*
EXEC SQL
SELECT C.CUST_NAME, A.ACCT_BAL, T.LAST_TXN
INTO :WS-CUST-NAME, :WS-ACCT-BAL,
:WS-LAST-TXN
FROM CUSTOMER C
JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
LEFT JOIN LAST_TRANSACTION T
ON A.ACCT_NUM = T.ACCT_NUM
WHERE C.CUST_ID = :WS-CUST-ID
AND A.ACCT_NUM = :WS-ACCT-NUM
END-EXEC.
Response time levers: 1. Minimize the number of I/O operations — SQL calls, VSAM reads, IMS DL/I calls 2. Avoid table scans — ensure WHERE clauses use indexed columns 3. Commit early — release locks as soon as possible 4. Avoid GETMAIN/FREEMAIN in loops — allocate working storage once 5. Keep transaction scope small — do only what is necessary for this request
✅ Try It Yourself: Write a CICS-style transfer transaction that implements pessimistic locking with resource ordering. Add a retry loop for deadlock handling (SQLCODE -911). Measure the "lock hold time" by capturing timestamps before the first SELECT FOR UPDATE and after the COMMIT.
32.10 GlobalBank: The Atomic Transfer
Let us put it all together with a complete, production-quality funds transfer program for GlobalBank. This version includes resource ordering, deadlock retry, idempotency, compensation logging, and proper error handling:
*================================================================*
* PROGRAM: XFER0200 *
* PURPOSE: Production funds transfer with full safety *
* AUTHOR: Maria Chen / GlobalBank Development *
* DATE: 2025-07-01 *
*================================================================*
IDENTIFICATION DIVISION.
PROGRAM-ID. XFER0200.
DATA DIVISION.
WORKING-STORAGE SECTION.
EXEC SQL INCLUDE SQLCA END-EXEC.
01 WS-TRANSFER-REQUEST.
05 WS-XFER-ID PIC X(20).
05 WS-SOURCE-ACCT PIC X(12).
05 WS-DEST-ACCT PIC X(12).
05 WS-AMOUNT PIC S9(11)V99 COMP-3.
05 WS-REQUESTOR PIC X(8).
05 WS-REQUEST-TIME PIC X(26).
01 WS-ORDERED-LOCKS.
05 WS-FIRST-ACCT PIC X(12).
05 WS-SECOND-ACCT PIC X(12).
05 WS-FIRST-BAL PIC S9(11)V99 COMP-3.
05 WS-SECOND-BAL PIC S9(11)V99 COMP-3.
01 WS-CONTROL.
05 WS-RETRY-COUNT PIC 9(2) VALUE ZEROS.
05 WS-MAX-RETRIES PIC 9(2) VALUE 03.
05 WS-STATUS-FLAG PIC X(1).
88 XFER-SUCCESS VALUE 'S'.
88 XFER-FAILED VALUE 'F'.
88 XFER-DUPLICATE VALUE 'D'.
05 WS-ERROR-MSG PIC X(80).
PROCEDURE DIVISION.
0000-MAIN.
PERFORM 1000-CHECK-IDEMPOTENCY
IF NOT XFER-DUPLICATE
PERFORM 2000-EXECUTE-WITH-RETRY
END-IF
PERFORM 9000-REPORT-RESULT
GOBACK.
1000-CHECK-IDEMPOTENCY.
EXEC SQL
SELECT XFER_STATUS
INTO :WS-STATUS-FLAG
FROM TRANSFER_LOG
WHERE XFER_ID = :WS-XFER-ID
END-EXEC
IF SQLCODE = 0
SET XFER-DUPLICATE TO TRUE
DISPLAY 'DUPLICATE TRANSFER: ' WS-XFER-ID
ELSE IF SQLCODE = +100
CONTINUE
ELSE
PERFORM 9800-SQL-ERROR
END-IF.
2000-EXECUTE-WITH-RETRY.
MOVE ZEROS TO WS-RETRY-COUNT
PERFORM 2100-ATTEMPT-TRANSFER
PERFORM UNTIL XFER-SUCCESS
OR XFER-FAILED
OR WS-RETRY-COUNT >= WS-MAX-RETRIES
ADD 1 TO WS-RETRY-COUNT
DISPLAY 'RETRYING TRANSFER: ATTEMPT '
WS-RETRY-COUNT
PERFORM 2100-ATTEMPT-TRANSFER
END-PERFORM
IF NOT XFER-SUCCESS
SET XFER-FAILED TO TRUE
MOVE 'MAX RETRIES EXCEEDED' TO WS-ERROR-MSG
END-IF.
2100-ATTEMPT-TRANSFER.
*--- Resource ordering: lock lower account first ---*
IF WS-SOURCE-ACCT < WS-DEST-ACCT
MOVE WS-SOURCE-ACCT TO WS-FIRST-ACCT
MOVE WS-DEST-ACCT TO WS-SECOND-ACCT
ELSE
MOVE WS-DEST-ACCT TO WS-FIRST-ACCT
MOVE WS-SOURCE-ACCT TO WS-SECOND-ACCT
END-IF
*--- Lock first account ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-FIRST-BAL
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-FIRST-ACCT
FOR UPDATE OF ACCT_BALANCE
END-EXEC
IF SQLCODE = -911
EXEC SQL ROLLBACK END-EXEC
GO TO 2100-EXIT
END-IF
IF SQLCODE NOT = 0
MOVE 'FIRST LOCK FAILED' TO WS-ERROR-MSG
SET XFER-FAILED TO TRUE
EXEC SQL ROLLBACK END-EXEC
GO TO 2100-EXIT
END-IF
*--- Lock second account ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-SECOND-BAL
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-SECOND-ACCT
FOR UPDATE OF ACCT_BALANCE
END-EXEC
IF SQLCODE = -911
EXEC SQL ROLLBACK END-EXEC
GO TO 2100-EXIT
END-IF
IF SQLCODE NOT = 0
MOVE 'SECOND LOCK FAILED' TO WS-ERROR-MSG
SET XFER-FAILED TO TRUE
EXEC SQL ROLLBACK END-EXEC
GO TO 2100-EXIT
END-IF
*--- Verify sufficient funds ---*
IF WS-SOURCE-ACCT = WS-FIRST-ACCT
IF WS-FIRST-BAL < WS-AMOUNT
MOVE 'INSUFFICIENT FUNDS'
TO WS-ERROR-MSG
SET XFER-FAILED TO TRUE
EXEC SQL ROLLBACK END-EXEC
GO TO 2100-EXIT
END-IF
ELSE
IF WS-SECOND-BAL < WS-AMOUNT
MOVE 'INSUFFICIENT FUNDS'
TO WS-ERROR-MSG
SET XFER-FAILED TO TRUE
EXEC SQL ROLLBACK END-EXEC
GO TO 2100-EXIT
END-IF
END-IF
*--- Debit source ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = ACCT_BALANCE - :WS-AMOUNT,
LAST_UPDATE = CURRENT TIMESTAMP
WHERE ACCT_NUMBER = :WS-SOURCE-ACCT
END-EXEC
*--- Credit destination ---*
EXEC SQL
UPDATE ACCOUNT
SET ACCT_BALANCE = ACCT_BALANCE + :WS-AMOUNT,
LAST_UPDATE = CURRENT TIMESTAMP
WHERE ACCT_NUMBER = :WS-DEST-ACCT
END-EXEC
*--- Log the transfer ---*
EXEC SQL
INSERT INTO TRANSFER_LOG
(XFER_ID, SOURCE_ACCT, DEST_ACCT,
AMOUNT, XFER_STATUS, XFER_TIMESTAMP)
VALUES
(:WS-XFER-ID, :WS-SOURCE-ACCT,
:WS-DEST-ACCT, :WS-AMOUNT,
'S', CURRENT TIMESTAMP)
END-EXEC
*--- Commit everything ---*
EXEC SQL COMMIT END-EXEC
SET XFER-SUCCESS TO TRUE.
2100-EXIT.
EXIT.
9000-REPORT-RESULT.
EVALUATE TRUE
WHEN XFER-SUCCESS
DISPLAY 'TRANSFER SUCCESSFUL: '
WS-XFER-ID
WHEN XFER-DUPLICATE
DISPLAY 'TRANSFER ALREADY PROCESSED: '
WS-XFER-ID
WHEN XFER-FAILED
DISPLAY 'TRANSFER FAILED: ' WS-ERROR-MSG
END-EVALUATE.
9800-SQL-ERROR.
DISPLAY 'SQL ERROR: ' SQLCODE.
🧪 Design Review: This program demonstrates five defensive patterns working together: 1. Idempotency check (paragraph 1000) — prevents duplicate processing 2. Resource ordering (paragraph 2100) — prevents deadlocks 3. Deadlock retry (paragraph 2000) — recovers from unavoidable deadlocks 4. All-or-nothing commit — only commits when all steps succeed 5. Transfer logging — creates an audit trail and supports the idempotency check
32.11 MedClaim: Claim Adjudication Transaction
At MedClaim, the claim adjudication process updates three tables atomically: CLAIM_MASTER (status and approved amount), PAYMENT (new payment record), and PROVIDER_BALANCE (running balance). James Okafor designed it with the saga pattern for the multi-day lifecycle but atomic transactions within each step:
*================================================================*
* PROGRAM: CLMADJ01 *
* PURPOSE: Adjudicate a single claim — atomic step in saga *
*================================================================*
3000-ADJUDICATE-CLAIM.
*--- Read claim with lock ---*
EXEC SQL
SELECT CLAIM_STATUS, BILLED_AMT,
PROVIDER_ID, MEMBER_ID
INTO :WS-CLAIM-STATUS, :WS-BILLED-AMT,
:WS-PROVIDER-ID, :WS-MEMBER-ID
FROM CLAIM_MASTER
WHERE CLAIM_ID = :WS-CLAIM-ID
FOR UPDATE
END-EXEC
IF SQLCODE NOT = 0
MOVE 'CLAIM NOT FOUND' TO WS-ERROR-MSG
SET ADJ-FAILED TO TRUE
GO TO 3000-EXIT
END-IF
*--- Verify claim is in correct state ---*
IF WS-CLAIM-STATUS NOT = 'VALIDATED'
MOVE 'CLAIM NOT IN VALIDATED STATUS'
TO WS-ERROR-MSG
SET ADJ-FAILED TO TRUE
GO TO 3000-EXIT
END-IF
*--- Calculate approved amount (business rules) ---*
PERFORM 3100-APPLY-ADJUDICATION-RULES
*--- Update claim status ---*
EXEC SQL
UPDATE CLAIM_MASTER
SET CLAIM_STATUS = 'ADJUDICATED',
APPROVED_AMT = :WS-APPROVED-AMT,
ADJUDICATION_DATE = CURRENT DATE,
ADJUDICATOR = :WS-ADJUDICATOR-ID
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
*--- Create payment record ---*
EXEC SQL
INSERT INTO PAYMENT
(PAYMENT_ID, CLAIM_ID, PROVIDER_ID,
PAYMENT_AMT, PAYMENT_STATUS,
CREATE_DATE)
VALUES
(:WS-PAYMENT-ID, :WS-CLAIM-ID,
:WS-PROVIDER-ID, :WS-APPROVED-AMT,
'PENDING', CURRENT DATE)
END-EXEC
*--- Update provider balance ---*
EXEC SQL
UPDATE PROVIDER_BALANCE
SET PENDING_AMT = PENDING_AMT
+ :WS-APPROVED-AMT,
LAST_CLAIM_DATE = CURRENT DATE
WHERE PROVIDER_ID = :WS-PROVIDER-ID
END-EXEC
*--- All succeeded — commit ---*
EXEC SQL COMMIT END-EXEC
SET ADJ-SUCCESS TO TRUE.
3000-EXIT.
IF ADJ-FAILED
EXEC SQL ROLLBACK END-EXEC
END-IF.
⚖️ The Modernization Spectrum: Notice that MedClaim's adjudication uses DB2 for all three tables. James Okafor lobbied to move the PROVIDER_BALANCE from a VSAM file to a DB2 table specifically so that all three updates could participate in a single DB2 COMMIT, avoiding the complexity and overhead of two-phase commit. "Sometimes the best modernization is simplifying your transaction boundaries," he says.
32.12 Transaction Anti-Patterns
Learning what not to do is as important as learning what to do. Here are the most common transaction design mistakes:
32.12.1 The Chatty Transaction
*--- ANTI-PATTERN: Too many commits ---*
EXEC SQL UPDATE ... END-EXEC.
EXEC SQL COMMIT END-EXEC. *> Commit 1
EXEC SQL UPDATE ... END-EXEC.
EXEC SQL COMMIT END-EXEC. *> Commit 2
EXEC SQL INSERT ... END-EXEC.
EXEC SQL COMMIT END-EXEC. *> Commit 3
Problem: If the program fails between Commit 2 and Commit 3, the data is inconsistent. Each update should be part of the same unit of work unless they are truly independent.
32.12.2 The Mega-Transaction
*--- ANTI-PATTERN: One million updates, no commit ---*
PERFORM 1000000 TIMES
EXEC SQL UPDATE ... END-EXEC
END-PERFORM
EXEC SQL COMMIT END-EXEC.
Problem: If the program abends at record 999,999, the entire million-record rollback takes hours, during which the database is locked and unavailable.
32.12.3 The Optimistic Failure Ignorer
*--- ANTI-PATTERN: Ignoring optimistic lock failure ---*
EXEC SQL UPDATE ... WHERE LAST_UPDATE = :SAVED END-EXEC.
IF SQLERRD(3) = 0
DISPLAY 'CONCURRENT MODIFICATION'
* But then continues processing anyway!
END-IF.
Problem: The program detects the conflict but does not stop or retry. It continues with stale data.
32.12.4 The Lock-and-Think
*--- ANTI-PATTERN: Holding locks during user interaction ---*
EXEC SQL SELECT ... FOR UPDATE END-EXEC.
EXEC CICS SEND MAP(...) END-EXEC. *> Show screen
EXEC CICS RECEIVE MAP(...) END-EXEC. *> Wait for user
* Lock held for entire user think time!
EXEC SQL UPDATE ... END-EXEC.
EXEC SQL COMMIT END-EXEC.
Problem: In pseudo-conversational CICS, this can lock a row for minutes while the user reads the screen. Use optimistic locking for conversational patterns.
32.13 VSAM Transaction Patterns
Many COBOL programs work with VSAM files rather than (or in addition to) DB2 tables. VSAM has its own transaction patterns and considerations.
32.13.1 VSAM and Record-Level Sharing (RLS)
VSAM Record-Level Sharing (RLS) enables multiple programs across multiple z/OS systems to access the same VSAM file concurrently with record-level locking. Without RLS, VSAM locking is at the control-interval (CI) level, which is much more restrictive.
With RLS, VSAM participates in z/OS's Resource Recovery Services (RRS) for two-phase commit. This means a single CICS transaction can atomically update both a DB2 table and a VSAM file.
32.13.2 VSAM Locking Patterns
*--- Read for update (acquires exclusive lock) ---*
READ ACCT-FILE INTO WS-ACCT-REC
KEY IS WS-ACCT-KEY
IF WS-FILE-STATUS = '00'
*--- Record is now locked ---*
ADD WS-DEPOSIT TO ACCT-BALANCE
REWRITE ACCT-RECORD FROM WS-ACCT-REC
IF WS-FILE-STATUS = '00'
DISPLAY 'UPDATE SUCCESSFUL'
ELSE
DISPLAY 'REWRITE FAILED: ' WS-FILE-STATUS
END-IF
END-IF.
32.13.3 VSAM Commit in Batch
In batch COBOL programs without CICS or RRS, VSAM files do not participate in formal COMMIT/ROLLBACK. Changes are written directly to disk. This means:
- There is no automatic rollback if the program abends after a REWRITE
- The programmer must implement application-level recovery
- For critical updates, consider writing to a temporary file first, then applying all changes in a single pass
*--- Application-level backup before batch update ---*
PERFORM UNTIL END-OF-UPDATE-FILE
READ UPDATE-FILE INTO WS-UPDATE-REC
AT END SET END-OF-UPDATE-FILE TO TRUE
END-READ
IF NOT END-OF-UPDATE-FILE
*--- Read current record and save for backup ---*
READ MASTER-FILE INTO WS-MASTER-REC
KEY IS WS-MASTER-KEY
IF WS-MASTER-STATUS = '00'
WRITE BACKUP-RECORD FROM WS-MASTER-REC
END-IF
*--- Apply update ---*
PERFORM 2100-APPLY-UPDATE
REWRITE MASTER-RECORD FROM WS-MASTER-REC
END-IF
END-PERFORM.
The backup file provides a manual recovery path: if the program fails, a restore program can read the backup file and revert the master file to its pre-update state.
⚠️ Important: This application-level backup approach is much less robust than DB2's COMMIT/ROLLBACK or CICS's SYNCPOINT. It does not handle the case where the program abends during a REWRITE (leaving a partially written record). For critical data, strongly consider using VSAM RLS with CICS/RRS, or migrating the data to DB2.
32.13.4 Mixed DB2/VSAM Transactions
Many real-world COBOL programs update both DB2 tables and VSAM files in the same transaction. In CICS with VSAM RLS:
*--- Update DB2 table ---*
EXEC SQL
UPDATE CLAIM_MASTER
SET CLAIM_STATUS = 'PROCESSED'
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
*--- Update VSAM file ---*
EXEC CICS READ
FILE('CLMVSAM')
INTO(WS-VSAM-REC)
RIDFLD(WS-CLAIM-KEY)
UPDATE
RESP(WS-RESP)
END-EXEC
IF WS-RESP = DFHRESP(NORMAL)
MOVE 'PROCESSED' TO VSAM-CLM-STATUS
EXEC CICS REWRITE
FILE('CLMVSAM')
FROM(WS-VSAM-REC)
RESP(WS-RESP)
END-EXEC
END-IF
*--- Commit both atomically ---*
EXEC CICS SYNCPOINT END-EXEC.
The SYNCPOINT coordinates the DB2 commit and the VSAM commit through RRS's two-phase commit protocol. Either both succeed or both roll back.
32.14 Error Recovery Patterns
Beyond rollback, production COBOL programs need structured error recovery that handles different categories of failures differently.
32.14.1 The Error Classification Pattern
Not all errors are equal. A classification system helps the program respond appropriately:
01 WS-ERROR-CLASSIFICATION.
05 WS-ERROR-CLASS PIC X(1).
88 ERR-RETRYABLE VALUE 'R'.
88 ERR-SKIPPABLE VALUE 'S'.
88 ERR-FATAL VALUE 'F'.
05 WS-ERROR-CODE PIC X(10).
05 WS-ERROR-DESC PIC X(80).
05 WS-ERROR-ACTION PIC X(20).
8000-CLASSIFY-ERROR.
EVALUATE TRUE
*--- Retryable errors ---*
WHEN SQLCODE = -911
SET ERR-RETRYABLE TO TRUE
MOVE 'DEADLOCK' TO WS-ERROR-CODE
MOVE 'ROLLBACK AND RETRY'
TO WS-ERROR-ACTION
WHEN SQLCODE = -904
SET ERR-RETRYABLE TO TRUE
MOVE 'RESOURCE-UNAVAIL'
TO WS-ERROR-CODE
MOVE 'WAIT AND RETRY'
TO WS-ERROR-ACTION
*--- Skippable errors (log and continue) ---*
WHEN SQLCODE = -803
SET ERR-SKIPPABLE TO TRUE
MOVE 'DUPLICATE-KEY' TO WS-ERROR-CODE
MOVE 'LOG AND SKIP RECORD'
TO WS-ERROR-ACTION
WHEN SQLCODE = +100
SET ERR-SKIPPABLE TO TRUE
MOVE 'NOT-FOUND' TO WS-ERROR-CODE
MOVE 'LOG AND SKIP RECORD'
TO WS-ERROR-ACTION
*--- Fatal errors ---*
WHEN SQLCODE = -922
SET ERR-FATAL TO TRUE
MOVE 'AUTH-FAILURE' TO WS-ERROR-CODE
MOVE 'ROLLBACK AND ABEND'
TO WS-ERROR-ACTION
WHEN OTHER
SET ERR-FATAL TO TRUE
MOVE 'UNKNOWN' TO WS-ERROR-CODE
MOVE 'ROLLBACK AND ABEND'
TO WS-ERROR-ACTION
END-EVALUATE.
32.14.2 The Error Log Pattern
Every error should be logged for post-mortem analysis:
01 WS-ERROR-LOG-REC.
05 EL-TIMESTAMP PIC X(26).
05 EL-PROGRAM PIC X(8).
05 EL-PARAGRAPH PIC X(30).
05 EL-ERROR-CLASS PIC X(1).
05 EL-ERROR-CODE PIC X(10).
05 EL-ERROR-DESC PIC X(80).
05 EL-KEY-VALUE PIC X(20).
05 EL-SQLCODE PIC S9(9) COMP.
05 EL-SQLERRM PIC X(70).
05 EL-RECORD-NUM PIC 9(9).
8100-LOG-ERROR.
MOVE FUNCTION CURRENT-DATE TO EL-TIMESTAMP
MOVE 'CLMSAGA1' TO EL-PROGRAM
MOVE WS-DEBUG-PARA TO EL-PARAGRAPH
MOVE WS-ERROR-CLASS TO EL-ERROR-CLASS
MOVE WS-ERROR-CODE TO EL-ERROR-CODE
MOVE WS-ERROR-DESC TO EL-ERROR-DESC
MOVE WS-CURRENT-KEY TO EL-KEY-VALUE
MOVE SQLCODE TO EL-SQLCODE
MOVE SQLERRMC TO EL-SQLERRM
MOVE WS-RECORD-COUNT TO EL-RECORD-NUM
WRITE ERROR-LOG-RECORD FROM WS-ERROR-LOG-REC
DISPLAY 'ERROR LOGGED: ' EL-ERROR-CODE
' AT RECORD ' EL-RECORD-NUM.
32.14.3 The Graceful Degradation Pattern
Some batch programs should continue processing even when individual records fail. The key is maintaining accurate counts and ensuring that failures are fully logged for later resolution:
01 WS-BATCH-STATS.
05 WS-TOTAL-READ PIC 9(9) VALUE ZEROS.
05 WS-TOTAL-PROCESSED PIC 9(9) VALUE ZEROS.
05 WS-TOTAL-SKIPPED PIC 9(9) VALUE ZEROS.
05 WS-TOTAL-ERRORS PIC 9(9) VALUE ZEROS.
05 WS-MAX-ERRORS PIC 9(5) VALUE 00100.
2000-PROCESS-BATCH.
PERFORM UNTIL END-OF-INPUT
OR WS-TOTAL-ERRORS >= WS-MAX-ERRORS
READ INPUT-FILE INTO WS-INPUT-REC
AT END SET END-OF-INPUT TO TRUE
END-READ
IF NOT END-OF-INPUT
ADD 1 TO WS-TOTAL-READ
PERFORM 2100-PROCESS-ONE-RECORD
EVALUATE WS-ERROR-CLASS
WHEN SPACES
ADD 1 TO WS-TOTAL-PROCESSED
WHEN 'S'
ADD 1 TO WS-TOTAL-SKIPPED
WHEN 'F'
ADD 1 TO WS-TOTAL-ERRORS
END-EVALUATE
END-IF
END-PERFORM
IF WS-TOTAL-ERRORS >= WS-MAX-ERRORS
DISPLAY '*** MAX ERRORS EXCEEDED ***'
DISPLAY '*** BATCH TERMINATED ***'
END-IF.
The WS-MAX-ERRORS threshold prevents the program from continuing indefinitely when something is fundamentally wrong (e.g., a corrupted file or a systemic database problem). At MedClaim, James Okafor sets this to 0.1% of expected volume — if more than 0.1% of records fail, the batch stops for human investigation.
32.15 Transaction Design in Mixed Environments
Modern mainframe applications often span multiple technologies. A single business transaction might involve: - Reading from a VSAM file - Updating a DB2 table - Sending a message to MQ - Writing to an IMS database
32.15.1 The MQ Transaction Pattern
IBM MQ (formerly WebSphere MQ) provides messaging that participates in transaction coordination. When MQ is configured for transactional messaging, a message put (or get) can be committed or rolled back along with database updates.
*--- In CICS, MQ participates in SYNCPOINT ---*
EXEC CICS PUT
CONTAINER('MQCONTAINER')
QUEUE('CLAIM.OUTBOUND')
FROM(WS-MQ-MESSAGE)
RESP(WS-RESP)
END-EXEC
IF WS-RESP NOT = DFHRESP(NORMAL)
SET TRANSACTION-FAILED TO TRUE
END-IF.
*--- SYNCPOINT commits both DB2 and MQ ---*
IF TRANSACTION-OK
EXEC CICS SYNCPOINT END-EXEC
* Message is now visible to consumers
ELSE
EXEC CICS SYNCPOINT ROLLBACK END-EXEC
* Message is removed from queue
END-IF.
This ensures that the message is only visible to consumers if the database updates are also committed. Without transactional MQ, you risk sending a message about a claim that was never actually processed (because the database update was rolled back).
32.15.2 The Outbox Pattern
When true two-phase commit is not available or practical (e.g., when sending to an external system), use the outbox pattern:
- Write the business data AND the outbound message to the database in the same transaction
- A separate process reads the outbox table and sends the messages
- After successful send, mark the outbox entry as sent
*--- Step 1: Business update + outbox in one COMMIT ---*
EXEC SQL
UPDATE CLAIM_MASTER
SET CLAIM_STATUS = 'APPROVED'
WHERE CLAIM_ID = :WS-CLAIM-ID
END-EXEC
EXEC SQL
INSERT INTO OUTBOX
(MSG_ID, MSG_TYPE, MSG_DATA, MSG_STATUS)
VALUES
(:WS-MSG-ID, 'CLM-APPROVED',
:WS-MSG-DATA, 'PENDING')
END-EXEC
EXEC SQL COMMIT END-EXEC.
*--- Step 2: Separate sender process (different program) ---*
*--- Reads PENDING outbox entries, sends via MQ, ---*
*--- marks as SENT. If send fails, retries later. ---*
The outbox pattern guarantees that messages are sent if and only if the business transaction committed. It trades immediate delivery for guaranteed delivery.
32.16 Batch Checkpoint/Restart Design
For batch programs that process millions of records, checkpoint/restart is not optional — it is a critical design requirement. Without it, a failure near the end of a 12-hour batch run means restarting from the beginning.
32.16.1 The Checkpoint/Restart Framework
A well-designed checkpoint/restart framework saves all state needed to resume processing from the last checkpoint:
01 WS-CHECKPOINT-DATA.
05 CHKP-RECORD-COUNT PIC 9(10).
05 CHKP-TOTAL-DEBITS PIC S9(13)V99 COMP-3.
05 CHKP-TOTAL-CREDITS PIC S9(13)V99 COMP-3.
05 CHKP-LAST-KEY PIC X(20).
05 CHKP-ERROR-COUNT PIC 9(7).
05 CHKP-TIMESTAMP PIC X(26).
05 CHKP-INPUT-POS PIC 9(10).
05 CHKP-HASH-TOTAL PIC 9(18).
5000-TAKE-CHECKPOINT.
*--- Save all accumulators ---*
MOVE WS-RECORD-COUNT TO CHKP-RECORD-COUNT
MOVE WS-TOTAL-DEBITS TO CHKP-TOTAL-DEBITS
MOVE WS-TOTAL-CREDITS TO CHKP-TOTAL-CREDITS
MOVE WS-LAST-KEY TO CHKP-LAST-KEY
MOVE WS-ERROR-COUNT TO CHKP-ERROR-COUNT
MOVE FUNCTION CURRENT-DATE
TO CHKP-TIMESTAMP
MOVE WS-INPUT-POSITION TO CHKP-INPUT-POS
MOVE WS-HASH-TOTAL TO CHKP-HASH-TOTAL
*--- Commit database changes ---*
EXEC SQL COMMIT END-EXEC
*--- Write checkpoint to checkpoint file ---*
WRITE CHKP-RECORD FROM WS-CHECKPOINT-DATA
DISPLAY 'CHECKPOINT: RECORD '
CHKP-RECORD-COUNT
' AT ' CHKP-TIMESTAMP(1:19).
1000-CHECK-RESTART.
*--- Read last checkpoint if restart ---*
READ CHKP-FILE INTO WS-CHECKPOINT-DATA
AT END
DISPLAY 'NORMAL START — NO CHECKPOINT'
PERFORM 1100-INITIALIZE-FRESH
NOT AT END
DISPLAY 'RESTARTING FROM CHECKPOINT'
DISPLAY ' RECORDS: '
CHKP-RECORD-COUNT
PERFORM 1200-RESTORE-FROM-CHECKPOINT
END-READ.
1200-RESTORE-FROM-CHECKPOINT.
MOVE CHKP-RECORD-COUNT TO WS-RECORD-COUNT
MOVE CHKP-TOTAL-DEBITS TO WS-TOTAL-DEBITS
MOVE CHKP-TOTAL-CREDITS TO WS-TOTAL-CREDITS
MOVE CHKP-LAST-KEY TO WS-LAST-KEY
MOVE CHKP-ERROR-COUNT TO WS-ERROR-COUNT
MOVE CHKP-INPUT-POS TO WS-INPUT-POSITION
MOVE CHKP-HASH-TOTAL TO WS-HASH-TOTAL
*--- Reposition the input file ---*
PERFORM WS-INPUT-POSITION TIMES
READ INPUT-FILE INTO WS-INPUT-REC
END-PERFORM
DISPLAY 'REPOSITIONED TO RECORD: '
WS-INPUT-POSITION.
32.16.2 Checkpoint Frequency Calculation
The optimal checkpoint frequency depends on three factors:
- Commit overhead: Each COMMIT writes to the DB2 log. At roughly 2ms per COMMIT, 1,000 commits add 2 seconds to the batch run.
- Reprocessing time: If the program abends, records between the last checkpoint and the failure must be reprocessed. At 0.5ms per record and a checkpoint interval of 10,000 records, maximum reprocessing is 5 seconds.
- Rollback time: Without checkpoints, DB2 must roll back all uncommitted changes. Rolling back 1 million updates at 0.5ms each takes 500 seconds (over 8 minutes), during which the affected tables are locked.
The formula for optimal checkpoint interval is roughly:
Optimal interval = sqrt(Total records * Commit overhead / Record processing time)
For a typical batch program processing 1 million records at 0.5ms each with 2ms commit overhead, the optimal interval is approximately 2,000 records.
At GlobalBank, the standard is 1,000 records for critical financial data and 5,000 records for non-financial data.
32.16.3 Hash Totals for Integrity
Include hash totals in your checkpoints to verify that reprocessed records produce the same results:
*--- Add each key to a hash total ---*
COMPUTE WS-HASH-TOTAL =
FUNCTION MOD(
WS-HASH-TOTAL +
FUNCTION ORD(WS-ACCT-KEY(1:1)) *
FUNCTION ORD(WS-ACCT-KEY(2:1)),
999999999999999999)
On restart, reprocess from the checkpoint and verify that the hash total matches. If it does not, the input data has changed since the original run, and the restart may produce inconsistent results.
32.17 Transaction Design Review Checklist
Before any transaction program goes to production, Maria Chen requires the following review:
- Unit of work boundaries: Are all related operations within a single COMMIT scope?
- Error handling: Is every SQL/DL/I/file status code checked?
- Rollback on failure: Does every failure path issue ROLLBACK before exiting?
- Lock ordering: If multiple resources are locked, is the order consistent?
- Lock duration: How long are exclusive locks held? Can this be reduced?
- Idempotency: Is the transaction safe to retry? What happens if it runs twice?
- Commit frequency: For batch, is the checkpoint interval appropriate?
- Compensation: For long-running processes, are compensating transactions defined?
- Monitoring: Are transaction metrics captured (elapsed time, SQL count, retries)?
- Documentation: Are the transaction boundaries and recovery behavior documented?
This checklist has prevented dozens of production incidents at GlobalBank. "Transaction design is not something you get right by luck," Maria says. "You get it right by discipline."
32.18 Transaction Monitoring and Diagnostics
Production transaction problems require diagnostic data. Design your transactions to be observable:
01 WS-TRANSACTION-METRICS.
05 WS-TXN-START-TIME PIC X(26).
05 WS-TXN-END-TIME PIC X(26).
05 WS-SQL-COUNT PIC 9(5).
05 WS-LOCK-WAIT-TIME PIC 9(9).
05 WS-RETRY-COUNT PIC 9(2).
05 WS-COMMIT-COUNT PIC 9(3).
1000-INITIALIZE.
MOVE FUNCTION CURRENT-DATE TO WS-TXN-START-TIME
MOVE ZEROS TO WS-SQL-COUNT
MOVE ZEROS TO WS-LOCK-WAIT-TIME
MOVE ZEROS TO WS-RETRY-COUNT
MOVE ZEROS TO WS-COMMIT-COUNT.
9000-FINALIZE.
MOVE FUNCTION CURRENT-DATE TO WS-TXN-END-TIME
DISPLAY 'TXN METRICS:'
DISPLAY ' ELAPSED: ' WS-TXN-START-TIME
' TO ' WS-TXN-END-TIME
DISPLAY ' SQL CALLS: ' WS-SQL-COUNT
DISPLAY ' RETRIES: ' WS-RETRY-COUNT
DISPLAY ' COMMITS: ' WS-COMMIT-COUNT.
At GlobalBank, Priya Kapoor's monitoring system automatically flags transactions that exceed baseline metrics — a sudden increase in retries might indicate a new deadlock pattern, and a rising SQL count might indicate a missing index.
32.18.1 Logging Transaction Metrics to DB2
For production monitoring, write transaction metrics to a database table rather than SYSOUT:
9100-LOG-METRICS.
EXEC SQL
INSERT INTO TRANSACTION_METRICS
(TXN_ID, TXN_TYPE, START_TIME, END_TIME,
SQL_COUNT, RETRY_COUNT, COMMIT_COUNT,
RESULT_STATUS, ERROR_CODE)
VALUES
(:WS-TXN-ID, :WS-TXN-TYPE,
:WS-TXN-START-TIME, :WS-TXN-END-TIME,
:WS-SQL-COUNT, :WS-RETRY-COUNT,
:WS-COMMIT-COUNT,
:WS-RESULT-STATUS, :WS-ERROR-CODE)
END-EXEC.
This metrics table becomes a goldmine for performance analysis and capacity planning. Priya Kapoor runs weekly queries to identify: - Transactions with rising average SQL counts (potential index degradation) - Transactions with increasing retry rates (deadlock patterns emerging) - Batch jobs with growing elapsed times (data volume growth) - Error rate trends by transaction type
32.18.2 Real-Time Alerting
For CICS online transactions, response time monitoring is critical. CICS provides transaction-level statistics through the CICS Monitoring Facility (CMF). Additionally, you can implement application-level thresholds:
9200-CHECK-THRESHOLDS.
COMPUTE WS-ELAPSED-MS =
WS-END-STCK - WS-START-STCK
IF WS-ELAPSED-MS > WS-RESPONSE-THRESHOLD
EXEC CICS WRITEQ TD
QUEUE('CSMT')
FROM(WS-ALERT-MSG)
LENGTH(LENGTH OF WS-ALERT-MSG)
END-EXEC
DISPLAY 'THRESHOLD EXCEEDED: '
WS-ELAPSED-MS 'ms FOR '
WS-TXN-TYPE
END-IF.
32.19 Transaction Testing Strategies
Testing transaction correctness requires more than verifying that the program produces the right output. You must also verify that the program behaves correctly under failure conditions.
32.19.1 The Abend-and-Verify Test
Deliberately abend the program at various points in the transaction and verify that the database is in a consistent state:
- Run the transfer program
- Kill it after the debit but before the credit
- Check both account balances — they should be unchanged (rollback occurred)
- Verify the transfer log shows no completed entry
32.19.2 The Concurrent Access Test
Run multiple instances of the same program simultaneously, accessing the same records:
- Start two transfer programs that both transfer between accounts A and B
- Verify that both complete without deadlock (or that deadlock retry works)
- Verify that the final balances are correct (no lost updates)
- Run this test 100 times to check for race conditions
32.19.3 The Duplicate Message Test
Simulate message redelivery:
- Process a payment message
- Process the same message again (identical ID)
- Verify that the payment is applied only once (idempotency)
- Verify that the second processing returns the cached result
32.19.4 The Compensation Test
Trigger each saga step failure and verify compensation:
- Fail at step 1 — verify nothing is persisted
- Fail at step 2 — verify step 1 is compensated
- Fail at step 3 — verify steps 2 and 1 are compensated in reverse order
- Run each compensation twice — verify idempotency
✅ Try It Yourself: Design a test suite for the XFER0200 transfer program that includes all four test categories above. For each test, specify: (a) the setup conditions, (b) the action to perform, (c) the expected database state after the test, (d) how to verify the expected state. This exercise teaches you that testing transactions is fundamentally different from testing sequential logic.
32.20 Transaction Isolation Levels
DB2 provides four isolation levels that control how much a transaction can see of other transactions' uncommitted changes. Choosing the right isolation level is a critical transaction design decision.
32.20.1 The Four Isolation Levels
| Level | DB2 Name | Read Behavior | Lock Duration |
|---|---|---|---|
| Read Uncommitted | UR (Uncommitted Read) | Can see uncommitted changes from other transactions | No read locks |
| Read Committed | CS (Cursor Stability) | Sees only committed data; locks released after cursor moves | Short |
| Repeatable Read | RS (Read Stability) | Re-reading returns the same data; locked rows cannot change | Medium |
| Serializable | RR (Repeatable Read) | Full serialization; phantom reads prevented | Long |
32.20.2 Choosing an Isolation Level
For most COBOL batch programs, CS (Cursor Stability) is the default and appropriate choice. It provides a good balance between data consistency and concurrency.
For reporting programs that can tolerate slightly stale data, UR (Uncommitted Read) reduces lock contention dramatically:
EXEC SQL
SELECT CUST_NAME, ACCT_BALANCE
INTO :WS-CUST-NAME, :WS-ACCT-BALANCE
FROM CUSTOMER C
JOIN ACCOUNT A ON C.CUST_ID = A.CUST_ID
WHERE C.CUST_ID = :WS-CUST-ID
WITH UR
END-EXEC.
The WITH UR clause tells DB2 to read without acquiring any locks and without waiting for other transactions to commit. The trade-off: you might read a balance that is in the process of being updated. For a display-only inquiry, this is usually acceptable.
For financial transactions that must guarantee consistency (like the funds transfer), RS (Read Stability) or explicit FOR UPDATE locking is appropriate. This ensures that once you read a balance, no other transaction can change it until your transaction completes.
*--- For the transfer: use explicit locking ---*
EXEC SQL
SELECT ACCT_BALANCE
INTO :WS-BALANCE
FROM ACCOUNT
WHERE ACCT_NUMBER = :WS-ACCT-NUM
FOR UPDATE OF ACCT_BALANCE
WITH RS USE AND KEEP EXCLUSIVE LOCKS
END-EXEC.
32.20.3 Isolation Level Impact on Performance
At GlobalBank, Priya Kapoor measured the impact of isolation levels on their daily reporting batch:
| Isolation | Elapsed Time | Lock Waits | Lock Timeouts |
|---|---|---|---|
| RR | 4.2 hours | 12,847 | 23 |
| RS | 3.1 hours | 4,291 | 5 |
| CS | 2.4 hours | 987 | 0 |
| UR | 1.8 hours | 0 | 0 |
The reporting batch switched from CS to UR, saving 30 minutes per night. Since the report reflects point-in-time balances and the data does not need to be perfectly current, UR is the appropriate choice.
⚖️ Defensive Programming: Never use UR for transactions that update data. The risk of reading uncommitted data and then making decisions based on it is too high. UR is only appropriate for read-only operations where slight inconsistency is acceptable.
32.21 Putting It All Together: The Transaction Design Spectrum
Throughout this chapter, we have covered many patterns and techniques. Here is how they fit together in a typical enterprise system:
Simple batch update (single file, single DB2 table): - Periodic COMMIT (every 1,000-5,000 records) - Checkpoint/restart for recoverability - Error logging with graceful degradation
Online inquiry (CICS, read-only): - Isolation level UR or CS for reads - No explicit COMMIT needed (no updates) - Response time monitoring
Online update (CICS, single resource): - Pessimistic or optimistic locking depending on contention - Single SYNCPOINT after all updates - RESP/RESP2 checking on all EXEC CICS commands
Online update (CICS, multiple resources — DB2 + VSAM): - Two-phase commit via SYNCPOINT - Resource ordering to prevent deadlocks - Deadlock retry with maximum attempts - Idempotency via unique transaction ID
Multi-day business process (saga): - Each step is an independent committed transaction - Compensating transactions for each step - Saga state table tracking progress - Idempotent steps and compensations - Automated compensation orchestrator
Cross-system integration (MQ, external APIs): - Outbox pattern for guaranteed delivery - Transactional MQ for internal messaging - Idempotent receivers for at-least-once delivery - Request-response logging for duplicate detection
Each pattern builds on the fundamentals of atomicity, consistency, isolation, and durability. The complexity increases as you add more resources, longer durations, and more failure modes. But the core principle never changes: either everything succeeds, or the system recovers to a consistent state.
32.22 Chapter Summary
Transaction design is the art of making concurrent, failure-prone systems behave correctly. In this chapter, you learned:
- Unit of work: A sequence of operations that must all succeed or all be undone — the foundation of data integrity
- COMMIT and ROLLBACK: Explicit transaction boundaries in DB2, CICS, and IMS
- Two-phase commit: Coordinating commits across multiple resource managers (DB2 + VSAM, DB2 + IMS)
- Deadlock prevention: Resource ordering eliminates most deadlocks; timeout and retry handle the rest
- Optimistic vs. pessimistic locking: Choose based on contention level — pessimistic for hot resources, optimistic for cold
- Long-running transactions: The saga pattern breaks multi-day processes into committed steps with compensating transactions
- Compensation logic: Business-appropriate reversal, designed to be idempotent and auditable
- Idempotent design: Unique transaction IDs and state checks prevent duplicate processing
- Performance optimization: Minimize lock duration, choose appropriate commit intervals, reduce I/O calls
These patterns apply regardless of whether you are working with DB2, IMS, VSAM, or a combination. They are the engineering discipline that separates programs that "usually work" from programs that "always work — even when things go wrong."
In the next chapter, we shift from data integrity to code correctness: debugging strategies for finding and fixing the bugs that inevitably creep into even the most carefully designed programs.
"The junior developer writes code that works. The senior developer writes code that fails safely." — Maria Chen's advice to Derek Washington after his first production incident