
Chapter 8: DB2 Locking, Concurrency, and Deadlock Resolution: The Chapter That Prevents 2am Pages

"Every senior mainframe programmer has a deadlock story. The good ones only have one." — Kwame Asante, CNB Lead Systems Programmer


8.1 The 2am Page That Teaches Locking

Lisa Tran's phone buzzed at 2:07am on a Tuesday in March. She didn't need to look at the screen to know what it was. The on-call rotation had her number, and the message from the automated monitoring system was exactly three words long: BATCH CYCLE ABENDED.

Lisa had been a DB2 application programmer at Continental National Bank for six years. She'd seen plenty of abends. But this one was different. The nightly batch cycle — the one that processed 500 million transactions per day across CNB's core banking platform — had ground to a halt. Not because of a program bug. Not because of a disk failure. Because two programs, each doing exactly what they were designed to do, had locked each other out of the data they needed.

The SQLCODE was -911. Deadlock or timeout. The message in the DB2 log read:

DSNT501I RESOURCE UNAVAILABLE
REASON 00C90088 TYPE 00000200
DEADLOCK DETECTED - VICTIM IS PLAN CNBBATCH

By 2:15am, Lisa had Rob Chen on the phone. Rob was the DBA on call, and he was already pulling the IFCID 0172 trace records. What they found over the next four hours would fundamentally change how CNB designed its concurrent processing.

Here's the short version: The nightly batch cycle ran a program called CNBBATCH that updated account balances in the ACCT_MASTER table. It processed accounts sequentially by account number. At the same time, the online system — which CNB kept running 23.5 hours a day — was processing real-time wire transfers through a program called CNBWIRE. The wire transfer program updated the same ACCT_MASTER table, but it accessed accounts in the order the wire requests arrived, which was effectively random.

CNBBATCH held a lock on account 1000447 and needed account 1000892. CNBWIRE held a lock on account 1000892 and needed account 1000447. Neither could proceed. DB2's Internal Resource Lock Manager (IRLM) detected the deadlock and chose CNBBATCH as the victim — the program that would be rolled back.

The batch cycle restarted. Forty minutes later, it hit another deadlock. Then another. The 4-hour batch window was blown. Morning processing was delayed. The branch network opened late. Kwame Asante's phone started ringing at 6am.

This chapter exists because of nights like that one. Every concept we cover — lock hierarchy, isolation levels, lock escalation, deadlock detection — is a tool that could have prevented Lisa's 2am page. By the end of this chapter, you'll be able to look at a COBOL program and identify the locking problems before they become production incidents.

What Lisa's incident teaches us:

  1. Locking is not optional. DB2 enforces data integrity through locking. You don't choose whether locking happens — you choose how it happens.
  2. Concurrent access patterns create deadlocks. When two programs access the same resources in different orders, deadlocks are inevitable, not unlikely.
  3. The fix is design, not tuning. You can't tune your way out of a fundamentally broken locking strategy. The solution is understanding how DB2 locks work and designing your programs accordingly.

Let's get into the mechanics.


8.2 DB2 Lock Hierarchy and Lock Modes

8.2.1 The Lock Hierarchy

DB2 uses a hierarchical locking system. Think of it like a building security system: you can lock the entire building, a floor, a suite, or an individual office. DB2's hierarchy has four levels:

| Level | Granularity | Analogy | When Used |
|---|---|---|---|
| Tablespace | Entire tablespace | Lock the building | LOCK TABLE, mass utilities, lock escalation |
| Table | Single table (segmented/universal tablespaces) | Lock a floor | LOCK TABLE, lock escalation |
| Page | 4K, 8K, 16K, or 32K page | Lock a suite | Default for most access (LOCKSIZE PAGE) |
| Row | Single row | Lock an office | LOCKSIZE ROW tablespaces |

The hierarchy matters because DB2 uses intent locks at higher levels to signal what's happening at lower levels. When your COBOL program reads a row with cursor stability, DB2 doesn't just lock the row — it places an intent lock on the table and tablespace to tell other processes "someone is doing something in here."

This is critical for understanding lock escalation, which we'll hit in Section 8.4.

8.2.2 Lock Modes

DB2 supports six lock modes. If you remember nothing else from this section, remember the compatibility matrix.

The Six Lock Modes:

| Mode | Name | Purpose | Compatibility |
|---|---|---|---|
| IS | Intent Share | "I intend to read something inside this object" | Compatible with IS, IX, S, U, SIX |
| IX | Intent Exclusive | "I intend to modify something inside this object" | Compatible with IS, IX |
| S | Share | "I'm reading this entire object — nobody modify it" | Compatible with IS, S, U |
| U | Update | "I'm reading now, but I plan to modify" | Compatible with IS, S |
| SIX | Share with Intent Exclusive | "I'm reading the whole thing, modifying some of it" | Compatible with IS |
| X | Exclusive | "I own this. Go away." | Compatible with nothing |

The Lock Compatibility Matrix:

         IS   IX    S    U   SIX   X
IS        Y    Y    Y    Y    Y    N
IX        Y    Y    N    N    N    N
S         Y    N    Y    Y    N    N
U         Y    N    Y    N    N    N
SIX       Y    N    N    N    N    N
X         N    N    N    N    N    N

Read this matrix until you can reproduce it from memory. Every deadlock diagnosis starts here.
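If you'd rather check compatibility mechanically than from memory, the matrix is small enough to encode directly. A Python sketch (illustrative only, not anything DB2 ships):

```python
# The lock compatibility matrix from Section 8.2.2, encoded as:
# held mode -> set of requested modes that can be granted alongside it.
MODES = ["IS", "IX", "S", "U", "SIX", "X"]

COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "U", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S", "U"},
    "U":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),          # X is compatible with nothing
}

def compatible(held: str, requested: str) -> bool:
    """True if `requested` can be granted while `held` is already held."""
    return requested in COMPATIBLE[held]
```

Note that the matrix is symmetric: if mode A tolerates mode B, then B tolerates A. That symmetry is a quick sanity check when you reproduce the table from memory during a deadlock diagnosis.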

8.2.3 How Locks Flow in a COBOL Program

Let's trace what happens when a COBOL program executes a simple SELECT:

       EXEC SQL
           SELECT ACCT_BAL
           INTO :WS-ACCT-BAL
           FROM ACCT_MASTER
           WHERE ACCT_ID = :WS-ACCT-ID
       END-EXEC

Under cursor stability (the default isolation level), DB2 does the following:

  1. Acquires an IS (Intent Share) lock on the tablespace containing ACCT_MASTER.
  2. Acquires an IS lock on the ACCT_MASTER table (if segmented/universal tablespace).
  3. Acquires an S (Share) lock on the page (or row, if LOCKSIZE ROW) containing the target row.
  4. Reads the data into WS-ACCT-BAL.
  5. Releases the S lock on the page/row (because cursor stability only holds locks on the current cursor position).
  6. Retains the IS locks on the tablespace and table until COMMIT.

Now, an UPDATE:

       EXEC SQL
           UPDATE ACCT_MASTER
           SET ACCT_BAL = :WS-NEW-BAL
           WHERE ACCT_ID = :WS-ACCT-ID
       END-EXEC

  1. Acquires an IX (Intent Exclusive) lock on the tablespace.
  2. Acquires an IX lock on the table.
  3. Acquires an X (Exclusive) lock on the page/row.
  4. Modifies the data.
  5. Holds the X lock until COMMIT. This is the crucial difference. Exclusive locks are held until commit because DB2 must be able to roll back the change.

This is why COMMIT frequency matters so much. Every uncommitted UPDATE is holding an exclusive lock. The longer you wait to commit, the more locks you hold, and the more likely you are to collide with other programs.

8.2.4 The Update Lock — The Misunderstood Middle Child

The U (Update) lock deserves special attention because it prevents a specific type of deadlock called a conversion deadlock.

Consider this pattern:

       EXEC SQL
           DECLARE CSR1 CURSOR FOR
           SELECT ACCT_BAL
           FROM ACCT_MASTER
           WHERE ACCT_STATUS = 'ACTIVE'
           FOR UPDATE OF ACCT_BAL
       END-EXEC

Without the FOR UPDATE OF clause, DB2 would acquire an S lock when reading each row. When you later execute an UPDATE ... WHERE CURRENT OF CSR1, DB2 would need to convert the S lock to an X lock. But if two programs are both holding S locks on the same page (which is allowed — S is compatible with S), neither can convert to X. Deadlock.

The FOR UPDATE OF clause tells DB2 to acquire a U lock instead of an S lock. U is compatible with S (readers can still read), but U is not compatible with U. This means only one program at a time can hold a U lock on a resource, which prevents the conversion deadlock.

Rule: If your cursor will be used for positioned updates, ALWAYS use FOR UPDATE OF. No exceptions. This isn't a performance optimization — it's a deadlock prevention mechanism.
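The conversion-deadlock mechanics follow directly from the compatibility rules. This Python sketch (a deliberately simplified single-resource model, not IRLM) replays both paths:

```python
# held mode -> requested modes that can coexist with it (Section 8.2.2)
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "U", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S", "U"},
    "U":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

def can_grant(holders, requester, mode):
    """Grant `mode` to `requester` only if compatible with every OTHER holder."""
    return all(mode in COMPATIBLE[held]
               for who, held in holders.items() if who != requester)

# --- S-lock path: two cursors read the same page under S (S is compatible
# --- with S, so both are granted), then each tries to convert S -> X.
holders = {"PGM-A": "S", "PGM-B": "S"}
a_converts = can_grant(holders, "PGM-A", "X")   # blocked by B's S lock
b_converts = can_grant(holders, "PGM-B", "X")   # blocked by A's S lock
# Neither conversion can proceed: each waits on the other -> deadlock.

# --- U-lock path: FOR UPDATE OF acquires U instead of S. U vs U is
# --- incompatible, so the second program simply waits at acquisition time.
holders = {"PGM-A": "U"}
b_gets_u = can_grant(holders, "PGM-B", "U")     # B waits here, no cycle
a_converts_u = can_grant(holders, "PGM-A", "X") # A converts U -> X freely
```

The key design point: U turns a circular wait (deadlock) into a simple one-way wait (at worst, a timeout), which is exactly the failure mode you'd rather debug.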

8.2.5 Latches vs. Locks

Don't confuse locks with latches. They're different mechanisms:

| | Locks | Latches |
|---|---|---|
| Purpose | Protect data from concurrent access | Protect internal DB2 structures |
| Duration | Held until COMMIT/ROLLBACK | Held for microseconds |
| Managed by | IRLM | DB2 internally |
| Visible to | Application programs, DBA monitoring | Only DB2 internals, performance traces |
| Deadlock detection | Yes (IRLM detects) | No (ordered acquisition prevents) |

You can't control latches from COBOL. You can control locks. Focus on locks.


8.3 Isolation Levels for COBOL Programs

DB2 provides four isolation levels. Each represents a different tradeoff between data consistency and concurrency. Choosing the right one is an engineering decision, not a religious one.

8.3.1 The Four Isolation Levels

Uncommitted Read (UR)

       EXEC SQL
           SELECT ACCT_BAL
           INTO :WS-ACCT-BAL
           FROM ACCT_MASTER
           WHERE ACCT_ID = :WS-ACCT-ID
           WITH UR
       END-EXEC

UR acquires no row/page locks for read operations. It can read data that another program has modified but not yet committed ("dirty reads"). Use UR when:

  • You're running reports where approximate data is acceptable
  • You're doing existence checks where the row won't disappear
  • You need to avoid lock contention at all costs

Never use UR for data that drives financial calculations or regulatory reporting. Sandra Chen at Federal Benefits learned this the hard way when a UR-based report showed a beneficiary's status as "active" during the fraction of a second when a batch program was mid-update. The report went to an auditor.

Cursor Stability (CS)

CS is the default. It holds a read lock only on the row/page the cursor is currently positioned on. When the cursor moves, the previous lock is released. Write locks are held until COMMIT.

CS prevents dirty reads but allows non-repeatable reads — if you read a row, do some processing, and read it again, the value might have changed because someone else modified and committed it between your two reads.

CS is the right choice for most OLTP programs. It provides good data consistency with minimal lock contention.

Read Stability (RS)

RS holds read locks on all rows that qualify for the result set for the duration of the unit of work. Rows that were scanned but didn't qualify are released (unlike RR).

RS was introduced in DB2 V4 and is the sweet spot for many applications. It prevents non-repeatable reads for qualified rows without the overhead of locking every row you touch. Use RS when:

  • You need to re-read qualified rows within a unit of work and get the same values
  • You can tolerate phantom inserts but not changes to rows you've already read
  • RR's overhead is too high

Repeatable Read (RR)

RR holds read locks on every row accessed — whether it qualified for the result set or not — until COMMIT. This prevents non-repeatable reads and phantom reads, but at a significant cost: you're locking rows you don't even care about.

       EXEC SQL
           SELECT COUNT(*)
           INTO :WS-COUNT
           FROM ACCT_MASTER
           WHERE ACCT_STATUS = 'DELINQUENT'
           WITH RR
       END-EXEC

If DB2 scans 10,000 rows to find 50 delinquent accounts, RR locks all 10,000 rows. RS would lock only the 50.

Use RR only when you absolutely must guarantee that no other program can insert, update, or delete any row in the range you've accessed. Financial close processing, regulatory snapshots, and audit extracts are legitimate use cases. General OLTP is not.

8.3.2 Isolation Level Summary Matrix

| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Lock Overhead |
|---|---|---|---|---|
| UR | Yes | Yes | Yes | Minimal |
| CS | No | Yes | Yes | Low |
| RS | No | No (qualified rows) | Possible | Medium |
| RR | No | No | No | High |
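The matrix doubles as a decision procedure. This Python sketch (the level names are DB2's; the helper function is ours, purely illustrative) picks the cheapest isolation level that prevents a given set of anomalies:

```python
# Each level maps to the read anomalies it still PERMITS (Section 8.3.2).
# Simplification: RS's "phantom" entry covers its "Possible" column.
PHENOMENA = {
    "UR": {"dirty", "nonrepeatable", "phantom"},
    "CS": {"nonrepeatable", "phantom"},
    "RS": {"phantom"},        # qualified rows stay stable; inserts can appear
    "RR": set(),              # nothing slips through, at high lock cost
}

def minimal_isolation(must_prevent):
    """Cheapest level (UR < CS < RS < RR) preventing every listed anomaly."""
    for level in ("UR", "CS", "RS", "RR"):
        if not (set(must_prevent) & PHENOMENA[level]):
            return level
```

For example, a settlement query that must not see dirty or changing rows, but can tolerate late-arriving inserts, lands on RS, which is exactly the choice made in the code example below.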

8.3.3 Specifying Isolation Levels in COBOL

You can set isolation levels at three scopes:

  1. Statement level (most granular): Use WITH UR, WITH CS, WITH RS, or WITH RR on individual SQL statements.
  2. Plan/package level: BIND with ISOLATION(CS), ISOLATION(RR), etc. This sets the default for all SQL in the plan or package.
  3. Subsystem level: ZPARM ISOLATION sets the default for all plans that don't specify one. Don't rely on this — always specify explicitly.

Best practice: Bind your plan or package with ISOLATION(CS) and override at the statement level where needed. This gives you the best balance of concurrency (CS default) with the ability to tighten up for specific queries.

      *---------------------------------------------------------
      * Default plan isolation is CS. Override for this
      * query because we need a stable count for the
      * end-of-day settlement report.
      *---------------------------------------------------------
       EXEC SQL
           SELECT SUM(TXN_AMOUNT)
           INTO :WS-SETTLE-TOTAL
           FROM DAILY_TXNS
           WHERE TXN_DATE = :WS-TODAY
             AND TXN_STATUS = 'POSTED'
           WITH RS
       END-EXEC

8.3.4 The Currently Committed Semantics (DB2 12+)

Starting with DB2 12, DB2 introduced currently committed semantics for CS isolation. When a reader encounters a row that's being modified (X-locked) by another process, instead of waiting for the lock to be released, DB2 can return the committed version of the data from the log.

This is a game-changer for high-concurrency systems. It means readers don't block on writers under CS. Enable it with ZPARM EVALUNC = YES and CONCURRENTACCESSRESOLUTION at bind time.

At CNB, enabling currently committed reduced CS lock waits by 73% on their account inquiry transactions. The cost? A few extra log reads. The benefit? Hundreds of thousands of avoided lock waits per day.

How currently committed works under the hood. When a CS reader encounters an X-locked row, DB2 does not return the in-flight uncommitted data. Instead, it reads the log to reconstruct the last committed version of that row. DB2 locates the undo log record for the active update, applies the inverse operation to the current row image, and returns the resulting pre-update values to the reader. This means the reader always sees a consistent, committed state — never dirty data.

Enabling currently committed requires two settings:

  1. ZPARM EVALUNC = YES at the subsystem level. This enables the currently committed infrastructure globally.
  2. CONCURRENTACCESSRESOLUTION(USECURRENTLYCOMMITTED) at BIND time. This opts individual packages into the behavior.

You can also specify it at the statement level using WITH CURRENTLY COMMITTED:

       EXEC SQL
           SELECT ACCT_BAL
           INTO :WS-ACCT-BAL
           FROM ACCT_MASTER
           WHERE ACCT_ID = :WS-ACCT-ID
           WITH CS USE CURRENTLY COMMITTED
       END-EXEC

When currently committed does not help. Currently committed applies only to CS isolation. It does not help with RS or RR, where read locks are held for the duration of the unit of work regardless. It also does not help when the reader needs to acquire an update lock (FOR UPDATE OF) — the reader must still wait for the writer to release the X lock before acquiring the U lock. Finally, if the log records needed to reconstruct the committed version have been archived (very rare in normal operations, but possible during extended batch windows), DB2 falls back to waiting for the lock.

Rob Chen at CNB tracks "currently committed fallback" events using the IFCID 0318 trace records. In six months of operation, they saw zero fallbacks — the active log always contained the necessary undo records because their commit intervals were short enough that no transaction's log records aged out of the active log.

Make currently committed the default, not a per-developer decision. At CNB, USECURRENTLYCOMMITTED is the standard bind option for all online packages — Rob's team added it to the standard BIND parameters in their deployment automation so no developer has to remember it. For batch packages that need strict consistency (e.g., end-of-day reconciliation), they bind with WAITFOROUTCOME instead, accepting the lock waits in exchange for guaranteed real-time accuracy.


8.4 Lock Escalation: The Silent Performance Killer

This is the threshold concept for this chapter. If you understand lock escalation, you understand why seemingly well-designed programs can suddenly destroy system performance.

8.4.1 What Lock Escalation Is

Every lock DB2 holds consumes storage in the IRLM lock structure, and that structure has a finite size. The tablespace parameter LOCKMAX — whose system-wide default comes from the LOCKS PER TABLE(SPACE) install ZPARM, NUMLKTS — caps the number of page or row locks a single connection can hold on a single tablespace. When that count exceeds LOCKMAX, DB2 escalates — it replaces all the individual page/row locks with a single tablespace lock (or table lock in segmented/universal tablespaces).

Before escalation:

Row lock on ACCT_MASTER row 1
Row lock on ACCT_MASTER row 2
Row lock on ACCT_MASTER row 3
... (10,000 more)
Row lock on ACCT_MASTER row 10,003

After escalation:

X lock on ACCT_MASTER tablespace

That single tablespace-level X lock blocks every other program trying to access any row in the tablespace. Your 500M-transaction-per-day banking system just became single-threaded.

8.4.2 LOCKMAX and LOCKSIZE

Two DDL parameters control lock behavior at the tablespace level:

LOCKSIZE: Determines the default lock granularity.

  • LOCKSIZE PAGE — Lock at the 4K/8K/16K/32K page level (default)
  • LOCKSIZE ROW — Lock at the individual row level
  • LOCKSIZE TABLE — Always lock the entire table
  • LOCKSIZE TABLESPACE — Always lock the entire tablespace
  • LOCKSIZE ANY — DB2 chooses (starts at page, may escalate)

LOCKMAX: Maximum number of page/row locks per tablespace per connection before escalation. Set to 0 to disable escalation (dangerous — IRLM can run out of storage). The system-level ZPARM NUMLKTS sets a global threshold.

The interaction between LOCKSIZE and LOCKMAX is where performance tuning lives:

| LOCKSIZE | LOCKMAX | Behavior |
|---|---|---|
| ROW | 5000 | Fine-grained, high concurrency, escalates at 5000 row locks |
| PAGE | 5000 | Medium granularity, escalates at 5000 page locks |
| ROW | 0 | Fine-grained, never escalates, risk of IRLM storage exhaustion |
| TABLESPACE | (ignored) | Always takes tablespace lock, no escalation needed |
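The LOCKMAX arithmetic is worth internalizing. This Python sketch (a deliberately simplified model of per-connection lock counting, not IRLM) shows why commit frequency, not lock granularity, decides whether escalation fires:

```python
def run_batch(rows, lockmax, commit_every):
    """Return True if this batch profile would trigger lock escalation.
    Simplifying assumptions: one new row lock per processed row, and
    COMMIT releases every held lock."""
    held = 0
    for processed in range(1, rows + 1):
        held += 1                        # each updated row adds a row lock
        if held > lockmax:               # count exceeds LOCKMAX: escalate
            return True
        if processed % commit_every == 0:
            held = 0                     # COMMIT releases all row locks
    return False

# CNBBATCH's original profile: 50,000 rows per COMMIT against LOCKMAX 5,000.
escalated = run_batch(50_000, 5_000, commit_every=50_000)
# Committing every 2,000 rows keeps the count well under the threshold.
safe = run_batch(50_000, 5_000, commit_every=2_000)
```

With the original 50,000-row commit interval, escalation is guaranteed on every interval; at 2,000 rows per COMMIT the held-lock count never reaches half of LOCKMAX, leaving headroom for access-path surprises.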

8.4.3 Why Lock Escalation Is Devastating

Lock escalation isn't just a performance problem — it's a cascading failure. Here's the sequence:

  1. Batch program starts processing. Holds 100 row locks. No problem.
  2. Program continues. 500 locks. 1,000 locks. Fine.
  3. LOCKMAX is 5,000. Program hits 5,001 locks.
  4. DB2 escalates: all row locks replaced with tablespace X lock.
  5. Every online transaction trying to access that tablespace now waits.
  6. Online response times spike. Timeouts start.
  7. Timed-out transactions retry (if coded to retry), creating more load.
  8. Monitoring alerts fire. Operators investigate.
  9. Meanwhile, the batch program finishes its unit of work, commits, releases the tablespace lock.
  10. Flood of waiting online transactions all execute simultaneously.
  11. Lock contention spikes again. Some of those escalate.
  12. System spirals.

Lisa Tran saw exactly this pattern during CNB's investigation. The CNBBATCH program was processing 50,000 accounts per COMMIT. With LOCKSIZE ROW and LOCKMAX set at the default, escalation was inevitable. Each escalation locked the entire ACCT_MASTER tablespace, blocking online wire transfers for the duration of the batch commit interval.

8.4.4 Designing COBOL Programs to Prevent Lock Escalation

Rule 1: COMMIT frequently.

The single most effective way to prevent lock escalation is to COMMIT before you reach LOCKMAX. If LOCKMAX is 5,000, commit every 2,000–3,000 rows.

       01  WS-COMMIT-COUNTER      PIC S9(8) COMP VALUE 0.
       01  WS-COMMIT-FREQUENCY    PIC S9(8) COMP VALUE 2000.

       PERFORM UNTIL SQLCODE = 100
           EXEC SQL
               FETCH CSR-BATCH INTO :WS-ACCT-ROW
           END-EXEC
           IF SQLCODE = 0
               PERFORM PROCESS-ACCOUNT
               ADD 1 TO WS-COMMIT-COUNTER
               IF WS-COMMIT-COUNTER >= WS-COMMIT-FREQUENCY
                   EXEC SQL COMMIT END-EXEC
                   MOVE 0 TO WS-COMMIT-COUNTER
      *            Re-open cursors declared WITH HOLD
      *            or re-position non-HOLD cursors
               END-IF
           END-IF
       END-PERFORM

Rule 2: Use WITH HOLD cursors for batch programs.

When you COMMIT, all cursors that are not declared WITH HOLD are closed. Re-opening and re-positioning them is expensive. Declare your batch cursors WITH HOLD:

       EXEC SQL
           DECLARE CSR-BATCH CURSOR WITH HOLD FOR
           SELECT ACCT_ID, ACCT_BAL
           FROM ACCT_MASTER
           WHERE ACCT_STATUS = 'ACTIVE'
           ORDER BY ACCT_ID
           FOR UPDATE OF ACCT_BAL
       END-EXEC

WITH HOLD keeps the cursor open across COMMITs. The cursor position is retained. Locks acquired before the COMMIT are released; the cursor simply points to the last fetched row. The next FETCH acquires a new lock on the next row.

Rule 3: Process in primary key order.

This reduces the chance of deadlocks with other programs and makes restart/recovery simpler. If a batch program abends after processing 50,000 accounts, you know the restart point: account 50,001.

Rule 4: Consider LOCKSIZE ROW for high-contention tables.

Row-level locking means more locks (one per row instead of one per page with potentially dozens of rows), but each lock blocks less data. For tables with heavy concurrent access from online and batch, row-level locking often reduces contention enough to justify the increased IRLM overhead.

Rule 5: Know your LOCKMAX.

If you don't know the LOCKMAX for your tablespace, find out before you write your batch program. Run:

SELECT LOCKMAX, LOCKRULE
FROM SYSIBM.SYSTABLESPACE
WHERE NAME = 'your-tablespace-name'
  AND DBNAME = 'your-database-name';

(The LOCKRULE column records the LOCKSIZE setting: A = any, P = page, R = row, S = tablespace, T = table.)

8.4.5 Additional Lock Escalation Prevention Techniques

Beyond the five rules above, experienced shops use several additional techniques to keep escalation at bay:

Technique: Partition-aware COMMIT strategies. If your table is partitioned and your batch processes across multiple partitions, track lock counts per partition. LOCKMAX applies per tablespace, but if you know you're crossing partition boundaries, you can issue targeted COMMITs at partition transitions even if your normal commit frequency hasn't been reached.

Technique: MAXROWS tuning for page-level contention. The DDL parameter MAXROWS controls the maximum number of rows DB2 will place on a single page. Setting MAXROWS to a lower value (e.g., 1 or 4) spreads rows across more pages, which seems counterintuitive for space efficiency but reduces contention: fewer rows share any one page, so wherever page-level locks are in play (LOCKSIZE PAGE, or LOCKSIZE ANY choosing page), each lock blocks fewer rows and two transactions are less likely to need the same page.

Technique: Application-level lock counting. For critical batch programs, maintain an explicit lock counter in WORKING-STORAGE. Increment it for every SQL statement that acquires a lock (UPDATE, DELETE, INSERT, SELECT FOR UPDATE). When the counter approaches 70% of LOCKMAX, force a COMMIT regardless of your normal commit frequency. This provides a safety margin against unexpected query plan changes that might acquire more locks per statement than you anticipated.

       01  WS-LOCK-ESTIMATE       PIC S9(8) COMP VALUE 0.
       01  WS-LOCKMAX-THRESHOLD   PIC S9(8) COMP VALUE 3500.
      *    (70% of LOCKMAX 5000)

       PERFORM PROCESS-ROW
       ADD 1 TO WS-LOCK-ESTIMATE
       IF WS-LOCK-ESTIMATE >= WS-LOCKMAX-THRESHOLD
           EXEC SQL COMMIT END-EXEC
           MOVE 0 TO WS-LOCK-ESTIMATE
       END-IF

8.4.6 Monitoring Lock Escalation

Lock escalation produces IFCID 0196 trace records and increments the ESCALATIONS counter in DB2PM / OMEGAMON statistics. If you see escalations in production, treat it as a design defect, not a tuning opportunity. Escalation means your program is holding too many locks for too long.

Rob Chen at CNB set up a real-time alert on the escalation counter. Any non-zero value triggers an investigation within 24 hours. Their target: zero escalations, 365 days a year.


8.5 Deadlock Detection, Prevention, and Resolution

8.5.1 What a Deadlock Is

A deadlock is a cycle of lock waits where each participant is waiting for a resource held by another participant. The simplest deadlock involves two programs:

Program A holds lock on Resource 1, waits for Resource 2
Program B holds lock on Resource 2, waits for Resource 1

Neither can proceed. Without intervention, they'd wait forever.

DB2's IRLM runs a deadlock detection cycle at a configurable interval (ZPARM DEADLOCK TIME, typically 1–5 seconds). When it detects a cycle, it chooses a victim — one of the participants — and rolls back that participant's current unit of work. The victim receives SQLCODE -911 with reason code 00C90088.
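Conceptually, each detection cycle is a hunt for a cycle in the wait-for graph. This Python sketch is the textbook version (IRLM's actual algorithm is internal and more sophisticated):

```python
def find_deadlock(waits_for):
    """waits_for maps each waiting program to the program holding the lock
    it needs. Returns the programs forming a wait cycle, or None."""
    for start in waits_for:
        seen, node = [], start
        while node in waits_for:         # follow the chain of waits
            if node in seen:
                return seen[seen.index(node):]   # chain loops back: deadlock
            seen.append(node)
            node = waits_for[node]
    return None                          # every chain ends at a non-waiter

# The CNB incident: each program waits on a row the other holds.
cycle = find_deadlock({"CNBBATCH": "CNBWIRE", "CNBWIRE": "CNBBATCH"})
```

A chain that ends at a program not waiting on anyone is an ordinary lock wait (a timeout candidate, per Section 8.5.2); only a chain that loops back on itself is a deadlock, and that is when IRLM must pick a victim.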

8.5.2 Timeouts vs. Deadlocks

Don't confuse timeouts with deadlocks. They're related but different:

| | Deadlock | Timeout |
|---|---|---|
| SQLCODE | -911 (reason 00C90088) | -911 (reason 00C9008E) or -913 |
| Cause | Circular lock wait | One-way lock wait exceeding time limit |
| Detection | IRLM deadlock detection cycle | IRLM timeout detection (RESOURCE TIMEOUT ZPARM) |
| Duration before detection | DEADLOCK TIME (seconds) | RESOURCE TIMEOUT (seconds, typically 30–120) |
| Action | One victim rolled back | Waiting program rolled back |

In practice, timeouts are more common than deadlocks. A timeout usually means some program is holding locks for too long (often because of a missing COMMIT, a long-running utility, or a program that's suspended waiting for operator input while holding locks).

The critical distinction for your COBOL error handling: -911 means DB2 has already rolled back your unit of work; -913 means it has not — the statement failed, but your earlier updates and locks are still in place, so your program must issue an explicit ROLLBACK itself. Either way the recovery path ends the same: once the unit of work is rolled back, reopen cursors, re-read data, and retry.
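The retry discipline can be sketched outside COBOL. This Python sketch (hypothetical helper names, not a DB2 API) captures the essential shape: the whole unit of work is a retryable function that re-reads its data on every attempt:

```python
import time

# -911 (deadlock victim / timeout with rollback) and -913 (deadlock or
# timeout without automatic rollback) are the retryable SQLCODEs.
RETRYABLE = {-911, -913}

def run_unit_of_work(uow, max_retries=3, backoff_secs=0.5):
    """Run `uow` until it succeeds or retries are exhausted.

    Assumption: `uow` is a callable that performs any needed ROLLBACK
    (required after -913), re-reads its data, redoes ALL of its work,
    and returns an SQLCODE-style int (0 on success)."""
    for attempt in range(max_retries + 1):
        sqlcode = uow()
        if sqlcode not in RETRYABLE:
            return sqlcode                    # success or non-retryable error
        time.sleep(backoff_secs * attempt)    # brief, growing backoff
    return sqlcode                            # retries exhausted
```

The non-obvious part is in the docstring: retrying just the failed statement is wrong, because the rollback undid every update in the unit of work, not only the last one.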

8.5.3 Reading IFCID 0172 — Deadlock Trace Records

When DB2 detects a deadlock, it writes an IFCID 0172 trace record. This record is your forensic tool. It tells you:

  • Who was involved (plan names, correlation IDs, connection IDs)
  • What they were waiting for (resource name, lock mode requested)
  • What they were holding (resource names, lock modes held)
  • Who was chosen as victim (and why)

See code/example-01-deadlock-trace.txt for an annotated example. The key fields to examine:

  1. Plan Name / Package Name: Identifies the programs involved. This tells you which COBOL source to examine.
  2. Resource Name: Encoded tablespace DBID, OBID, and page/row identifier. Decode using SYSIBM.SYSTABLESPACE and SYSIBM.SYSTABLES.
  3. Lock State Held: What lock mode each participant already had.
  4. Lock State Requested: What lock mode each participant was trying to acquire.
  5. Victim: Which participant DB2 rolled back.

8.5.3.1 Detailed Deadlock Trace Analysis — A Walk-Through

Let's trace through the actual IFCID 0172 output Rob Chen pulled during the 2am incident. The raw trace record looks like a wall of hex gibberish until you know the field offsets. Here's the annotated version:

IFCID 0172 - DEADLOCK DETECTED
  TIMESTAMP: 2026-03-12-02.13.47.893217
  NUMBER OF PARTICIPANTS: 2

  PARTICIPANT 1:
    PLAN NAME:    CNBBATCH
    CORR ID:      CNBBATCH
    CONN ID:      BATCH
    LUW TOKEN:    =X'B4E2A7001C3F'
    RESOURCE HELD:
      DBID=0042 OBID=0003 TYPE=ROW
      RESOURCE=X'000004E200000A1F'
      LOCK STATE: X  (EXCLUSIVE)
    RESOURCE WAITING FOR:
      DBID=0042 OBID=0003 TYPE=ROW
      RESOURCE=X'0000061B00000B3C'
      LOCK STATE REQUESTED: X  (EXCLUSIVE)
    ** THIS PARTICIPANT IS THE VICTIM **

  PARTICIPANT 2:
    PLAN NAME:    CNBWIRE
    CORR ID:      W9847231
    CONN ID:      CICSAOR1
    LUW TOKEN:    =X'B4E2A70023AF'
    RESOURCE HELD:
      DBID=0042 OBID=0003 TYPE=ROW
      RESOURCE=X'0000061B00000B3C'
      LOCK STATE: X  (EXCLUSIVE)
    RESOURCE WAITING FOR:
      DBID=0042 OBID=0003 TYPE=ROW
      RESOURCE=X'000004E200000A1F'
      LOCK STATE REQUESTED: X  (EXCLUSIVE)

Decoding this step by step:

Step 1: Identify the tablespace. DBID=0042 maps to the database, OBID=0003 maps to the tablespace. Query the catalog:

SELECT NAME, DBNAME
FROM SYSIBM.SYSTABLESPACE
WHERE DBID = 66 AND OBID = 3;
-- Returns: NAME=TSACCTMS, DBNAME=DBCNB01

Both participants are contending on the same tablespace — ACCT_MASTER.

Step 2: Decode the resource identifiers. The RESOURCE field X'000004E200000A1F' encodes the partition number and the internal row identifier (RID). The first four bytes give you the page number; the remaining bytes give the slot within the page. Using DB2's internal mappings (or IBM's diagnostic reference), Rob determined that X'000004E2' corresponded to page 1250, which contained account number 1000447, and X'0000061B' corresponded to page 1563, containing account number 1000892.
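Step 2's page-number arithmetic is easy to verify. A Python sketch (assuming, as the walk-through does, that the 8-byte RESOURCE field splits into a 4-byte page number followed by a 4-byte row identifier):

```python
def decode_resource(hex_resource):
    """Split an IFCID 0172 RESOURCE hex string into (page, slot).
    Layout assumption from the walk-through: first 4 bytes = page number,
    remaining 4 bytes = slot/row identifier within the page."""
    page = int(hex_resource[:8], 16)
    slot = int(hex_resource[8:], 16)
    return page, slot

page_a, _ = decode_resource("000004E200000A1F")  # row CNBBATCH holds
page_b, _ = decode_resource("0000061B00000B3C")  # row CNBWIRE holds
```

X'000004E2' is decimal 1250 and X'0000061B' is decimal 1563, matching the page numbers Rob decoded by hand.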

Step 3: Confirm the deadlock cycle. CNBBATCH holds an X lock on account 1000447 (page 1250) and wants an X lock on account 1000892 (page 1563). CNBWIRE holds an X lock on account 1000892 and wants an X lock on account 1000447. Classic two-participant cycle.

Step 4: Determine why CNBBATCH was the victim. IRLM chooses the victim by comparing the "worth" of each participant — a value derived from attributes of the threads and the work each would lose on rollback. CNBBATCH had been running for 47 minutes with 23,000 uncommitted updates, an expensive rollback by any measure, yet the trace shows it was selected. Victim selection weighs more than row counts, and the choice can look counterintuitive from the application side; the trace records the outcome, not the full reasoning, which is why you verify it against the PARTICIPANT entries rather than guessing.

This trace analysis took Rob and Lisa four hours at 2am because they hadn't practiced it. After the incident, Kwame made IFCID 0172 trace analysis a required skill for every DB2 application developer at CNB. They now run quarterly "deadlock fire drills" using intentionally created deadlock scenarios in the test environment.

8.5.4 The Five Deadlock Prevention Strategies

Strategy 1: Access resources in the same order.

The single most effective deadlock prevention technique. If every program that accesses ACCT_MASTER and TXN_HISTORY always accesses ACCT_MASTER first, no deadlock between these two tables is possible.

For row-level deadlocks within the same table, process rows in primary key order. This is exactly what CNB should have done: both CNBBATCH and CNBWIRE should access accounts in account-number order.

For wire transfers, this means: if a transfer moves money from account A to account B, always lock the lower-numbered account first, regardless of which is the sender and which is the receiver.

      *---------------------------------------------------------
      * ALWAYS lock lower account number first to prevent
      * deadlocks with any other program accessing these
      * accounts.
      *---------------------------------------------------------
       IF WS-FROM-ACCT < WS-TO-ACCT
           PERFORM LOCK-AND-UPDATE-FROM-ACCT
           PERFORM LOCK-AND-UPDATE-TO-ACCT
       ELSE
           PERFORM LOCK-AND-UPDATE-TO-ACCT
           PERFORM LOCK-AND-UPDATE-FROM-ACCT
       END-IF

Strategy 2: Minimize lock duration.

Acquire locks as late as possible. COMMIT as early as possible. Don't hold locks while doing non-database work (file I/O, network calls, MQ operations).

Bad pattern:

       EXEC SQL SELECT ... INTO :data FROM TABLE1 END-EXEC
       PERFORM COMPLEX-CALCULATION          (100ms)
       PERFORM WRITE-TO-MQ-QUEUE            (50ms)
       EXEC SQL UPDATE TABLE1 SET ... END-EXEC
       EXEC SQL COMMIT END-EXEC

If the SELECT's lock persists (as it does under RS or RR, or when the read takes a U lock), it is held through 150ms of non-database work; under cursor stability it may release as soon as the SELECT completes. Either way, the UPDATE's lock is held from the UPDATE through COMMIT, so if WRITE-TO-MQ-QUEUE sat between the UPDATE and the COMMIT, that would be 50ms of unnecessary lock holding.

Better pattern:

       EXEC SQL SELECT ... INTO :data FROM TABLE1 END-EXEC
       PERFORM COMPLEX-CALCULATION
       EXEC SQL UPDATE TABLE1 SET ... END-EXEC
       EXEC SQL COMMIT END-EXEC
       PERFORM WRITE-TO-MQ-QUEUE

Move non-database work outside the lock window. The MQ write happens after COMMIT, so no database locks are held during it.

Strategy 3: Use the minimum necessary isolation level.

Don't use RR when RS will do. Don't use RS when CS will do. Each step up the isolation hierarchy holds more locks for longer.

Strategy 4: Use LOCK TABLE for batch programs that will access most of the table.

This sounds counterintuitive — why take a big lock? Because a single tablespace-level S or X lock is better than 10 million row locks that escalate unpredictably. If your batch program will access 80%+ of the table, consider:

       EXEC SQL
           LOCK TABLE ACCT_MASTER IN SHARE MODE
       END-EXEC

This prevents escalation, makes lock management trivial, and clearly communicates the program's intent. The tradeoff: no other program can modify the table until you COMMIT. Schedule accordingly.

Strategy 5: Use retry logic for -911 and -913.

Deadlocks and timeouts will happen, even in well-designed systems. Your COBOL programs must handle them gracefully:

       01  WS-RETRY-COUNT        PIC S9(4) COMP VALUE 0.
       01  WS-MAX-RETRIES        PIC S9(4) COMP VALUE 3.
       01  WS-RETRY-NEEDED       PIC X VALUE 'N'.

       PERFORM UPDATE-ACCOUNT-BALANCE
       IF WS-RETRY-NEEDED = 'Y'
           PERFORM VARYING WS-RETRY-COUNT FROM 1 BY 1
               UNTIL WS-RETRY-COUNT > WS-MAX-RETRIES
                  OR WS-RETRY-NEEDED = 'N'
               PERFORM UPDATE-ACCOUNT-BALANCE
           END-PERFORM
       END-IF
       IF WS-RETRY-NEEDED = 'Y'
           PERFORM HANDLE-PERMANENT-FAILURE
       END-IF

       UPDATE-ACCOUNT-BALANCE.
           MOVE 'N' TO WS-RETRY-NEEDED
           EXEC SQL
               UPDATE ACCT_MASTER
               SET ACCT_BAL = :WS-NEW-BAL
               WHERE ACCT_ID = :WS-ACCT-ID
           END-EXEC
           EVALUATE SQLCODE
               WHEN 0
                   EXEC SQL COMMIT END-EXEC
               WHEN -911
                   MOVE 'Y' TO WS-RETRY-NEEDED
      *            -911 means unit of work was rolled back
      *            Must re-read data before retrying
                   PERFORM RE-READ-ACCOUNT-DATA
               WHEN -913
                   MOVE 'Y' TO WS-RETRY-NEEDED
                   PERFORM RE-READ-ACCOUNT-DATA
               WHEN OTHER
                   PERFORM HANDLE-SQL-ERROR
           END-EVALUATE.

Critical point: After -911 or -913, DB2 has rolled back your unit of work. All your uncommitted changes are gone. All your cursors are closed. You must start over from the beginning of the unit of work, not just retry the failed statement. This is the most common mistake in deadlock retry logic.
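In other words, the retry boundary is the unit of work, not the statement. A hedged Python sketch of that shape (the exception type and the unit-of-work callback are hypothetical stand-ins for the -911/-913 path):

```python
import time

class DeadlockVictim(Exception):
    """Stand-in for SQLCODE -911/-913: DB2 has already rolled back
    the unit of work before the application sees the error."""

def run_unit_of_work(uow, max_retries=3, delay_seconds=0):
    # Retry the WHOLE unit of work: re-read inputs, redo every
    # update. Never retry just the statement that failed.
    for attempt in range(1, max_retries + 1):
        try:
            return uow()          # must re-read and re-apply everything
        except DeadlockVictim:
            if attempt == max_retries:
                raise             # permanent-failure path
            time.sleep(delay_seconds)

# Demo: a unit of work that is victimized twice, then succeeds.
attempts = []
def uow():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockVictim()
    return "committed"

result = run_unit_of_work(uow)    # succeeds on the 3rd attempt
```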


8.6 Concurrent Online and Batch: The Most Common Deadlock Scenario

8.6.1 The Problem

The single most common source of locking problems in mainframe DB2 environments is the collision between online and batch processing against the same tables. This is exactly what happened at CNB.

The pattern:

  • Online programs access rows randomly based on user input or external requests. Each transaction touches a small number of rows but there are thousands of concurrent transactions.
  • Batch programs access rows sequentially, often processing large portions of a table. Each batch thread holds many locks simultaneously.

When online and batch run concurrently, three problems emerge:

  1. Direct deadlocks: Online and batch lock the same rows in different orders.
  2. Lock escalation: Batch programs accumulate enough locks to trigger escalation, which blocks all online access.
  3. Timeout cascades: Batch holds long-duration locks that cause online transactions to timeout, which causes retries, which causes more contention.

8.6.2 CNB's Solution

After the 2am incident, Kwame Asante convened a design review. Lisa Tran, Rob Chen, and the architecture team implemented a multi-pronged solution:

Change 1: Batch COMMIT frequency

CNBBATCH was changed from committing every 50,000 rows to committing every 500 rows. This reduced the maximum number of locks held at any time from 50,000 to 500 — well below LOCKMAX.

      *---------------------------------------------------------
      * Old: COMMIT every 50,000 (caused escalation)
      * New: COMMIT every 500 (prevents escalation,
      *      keeps lock window small)
      *---------------------------------------------------------
       01  WS-COMMIT-FREQ    PIC S9(8) COMP VALUE 500.
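The loop shape behind a commit interval is worth seeing once: the number of locks held at any instant is bounded by the interval. A Python sketch (the row source and the apply/commit callbacks are placeholders for the SQL):

```python
def process_with_commits(rows, commit_freq=500, apply=lambda r: None,
                         commit=lambda: None):
    """Apply an update per row, committing every commit_freq rows.
    At most commit_freq locks are ever held at once."""
    pending = 0
    commits = 0
    for row in rows:
        apply(row)                # e.g. UPDATE ACCT_MASTER ...
        pending += 1
        if pending == commit_freq:
            commit()              # releases every lock in this UOW
            commits += 1
            pending = 0
    if pending:                   # final partial interval
        commit()
        commits += 1
    return commits

# 1,234 rows at a 500-row interval: commits at 500, 1000, and end.
n_commits = process_with_commits(range(1234))  # 3
```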

Change 2: Access order alignment

Both CNBBATCH and CNBWIRE were modified to always access accounts in ascending account-number order. For wire transfers involving two accounts, the lower account number is locked first:

      *---------------------------------------------------------
      * Wire transfer: lock lower account first
      *---------------------------------------------------------
       IF WS-DEBIT-ACCT < WS-CREDIT-ACCT
           PERFORM DEBIT-ACCOUNT
           PERFORM CREDIT-ACCOUNT
       ELSE
           PERFORM CREDIT-ACCOUNT
           PERFORM DEBIT-ACCOUNT
       END-IF

Change 3: Row-level locking

Rob changed ACCT_MASTER from LOCKSIZE PAGE to LOCKSIZE ROW. This increased the number of locks (one per row instead of one per page holding ~40 rows), but dramatically reduced false contention where two programs needed different rows on the same page.

Change 4: Currently committed

For read-only online transactions (account inquiries, balance checks), the team enabled currently committed semantics. These transactions no longer wait for batch locks.

Result: Zero deadlocks in the first 6 months after the changes. Batch elapsed time increased 4% due to more frequent COMMITs, but online response time improved 31% due to reduced lock contention. Kwame called it "the best 4% we ever spent."

8.6.3 The Batch Window Is Dead — Design Accordingly

Twenty years ago, most mainframe shops had a "batch window" — a period (usually overnight) when online systems were shut down and batch ran exclusively. No concurrency, no lock contention, no deadlocks.

That window is gone. CNB runs online 23.5 hours a day. Pinnacle Health runs 24/7. Federal Benefits has a 30-minute maintenance window at 4am Sunday. Your batch programs must coexist with online processing, and your locking strategy must account for it.

This means:

  • Design batch for concurrency. Assume online is running. Design your lock strategy for concurrent access.
  • Test under concurrent load. Don't test batch in isolation and declare victory. Test with online and batch running simultaneously. Tools like BMC DB2 MainView or IBM OMEGAMON can show lock contention in real time.
  • Monitor lock waits. Track IFCID 0044 (lock suspension), IFCID 0196 (lock timeout), and IFCID 0337 (lock escalation) in production. Any increase is an early warning.

8.6.4 Partition-Level Locking for Concurrent Batch

If your table is partitioned (and high-volume tables should be), you can design batch programs to process one partition at a time. Combine this with partition-level LOCK TABLE:

      *---------------------------------------------------------
      * Lock only partition 5 of ACCT_MASTER
      * Other partitions remain fully available
      *---------------------------------------------------------
       EXEC SQL
           LOCK TABLE ACCT_MASTER PART 5 IN EXCLUSIVE MODE
       END-EXEC

Online transactions that access accounts in other partitions are completely unaffected. This is the gold standard for high-volume batch/online coexistence.

At Pinnacle Health, Ahmad designed the claims processing batch to lock one partition at a time, processing partitions sequentially. Each partition represents one month of claims. The current month's partition is never locked by batch during business hours — it's processed in the early morning window. Previous months' partitions are processed throughout the day with no impact on online claims submission.
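The scheduling idea behind partition-at-a-time batch can be sketched in a few lines (the partition numbering, the current-month rule, and the business-hours flag are illustrative):

```python
def partition_schedule(partitions, current, business_hours):
    """Order batch work so the hot (current-month) partition is
    touched only outside business hours; cold partitions can be
    processed any time without blocking online work."""
    cold = [p for p in partitions if p != current]
    return cold if business_hours else [current] + cold

parts = list(range(1, 13))          # 12 monthly partitions
# During the day: skip partition 3 (the current month).
day_plan = partition_schedule(parts, current=3, business_hours=True)
# Early-morning window: do the current month first.
night_plan = partition_schedule(parts, current=3, business_hours=False)
```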


8.7 Lock Avoidance and Optimistic Concurrency

8.7.1 Lock Avoidance

DB2 doesn't always need to take a lock. Lock avoidance is DB2's ability to skip acquiring a read lock when it can prove that the data being read is already committed, so no lock is required to guarantee a clean read.

Two mechanisms make that proof cheap:

  • The commit log sequence number (CLSN). Each page records the log point of its most recent update. If that point is older than the begin point of the oldest uncommitted unit of work in the system, everything on the page is committed, and reads of the page need no lock.
  • Possibly-uncommitted (PUNC) bits. When the page-level check fails, DB2 can still skip the lock for an individual row whose PUNC bit is off, meaning that row has no possibly-uncommitted change.

(Claims and drains are a related but separate mechanism: every SQL access registers a lightweight claim on the objects it touches, and utilities that need exclusive access drain those claims. Claims are far cheaper than locks, but they serialize SQL against utilities; they are not what makes lock avoidance work.)

When your COBOL program reads a page under CS and the avoidance checks succeed, the read takes no lock, uses no IRLM storage, and can't contribute to escalation.

Lock avoidance is automatic; you don't code for it. But you influence it: long-running units of work age the oldest-uncommitted point and defeat the CLSN check, which is one more argument for frequent COMMITs. You can also measure it: DB2's accounting and statistics traces report lock avoidance rates. A healthy system sees 90%+ lock avoidance for read operations. If your avoidance rate is below 80%, you have a contention problem.
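The decision itself, reduced to a toy check: skip the read lock when the page's contents are provably committed. This sketch uses illustrative field names, not DB2 internals:

```python
def needs_read_lock(page_last_update_lsn, oldest_uncommitted_lsn):
    """A read lock is avoidable when the page's last update is older
    than every uncommitted unit of work in the system; everything on
    the page is then known to be committed data."""
    return page_last_update_lsn >= oldest_uncommitted_lsn

# Page last updated at log point 900; the oldest still-uncommitted
# unit of work began writing at 1000: the page is all committed.
lock_needed = needs_read_lock(900, 1000)       # False -- lock avoided
# Page updated at 1200 while that unit of work is still open: the
# data may be uncommitted, so the lock must be taken.
lock_needed_hot = needs_read_lock(1200, 1000)  # True
```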

8.7.2 Optimistic Concurrency in COBOL

Optimistic concurrency is a design pattern that avoids holding locks during "think time" — the period when a user is looking at data on their screen before deciding to update it.

The traditional (pessimistic) pattern:

1. Read row with FOR UPDATE (acquires U lock)
2. Display to user
3. User thinks for 30 seconds
4. User makes changes
5. Update row
6. COMMIT

The U lock is held for 30+ seconds. In a system with 5,000 concurrent users, this is untenable.

The optimistic pattern:

1. Read row (no FOR UPDATE, acquires S lock, released under CS)
2. Save a "version" value (timestamp, hash, or version number)
3. Display to user
4. User thinks for 30 seconds (no locks held)
5. User makes changes
6. Re-read row with FOR UPDATE
7. Compare current version to saved version
8. If same: apply update, COMMIT
9. If different: someone else changed it — notify user, start over

Implementation in COBOL:

      *---------------------------------------------------------
      * OPTIMISTIC CONCURRENCY PATTERN
      * Step 1: Read without locks (CS releases after read)
      *---------------------------------------------------------
       EXEC SQL
           SELECT ACCT_BAL, LAST_UPDATE_TS
           INTO :WS-ACCT-BAL, :WS-SAVED-TIMESTAMP
           FROM ACCT_MASTER
           WHERE ACCT_ID = :WS-ACCT-ID
       END-EXEC

      *  ... user interaction happens here, no locks held ...

      *---------------------------------------------------------
      * Step 2: Re-read with an update lock, check version.
      * (FOR UPDATE is a cursor clause; a singleton SELECT INTO
      * acquires its U lock via USE AND KEEP UPDATE LOCKS.)
      *---------------------------------------------------------
       EXEC SQL
           SELECT ACCT_BAL, LAST_UPDATE_TS
           INTO :WS-CURRENT-BAL, :WS-CURRENT-TIMESTAMP
           FROM ACCT_MASTER
           WHERE ACCT_ID = :WS-ACCT-ID
           WITH RS USE AND KEEP UPDATE LOCKS
       END-EXEC

       IF WS-CURRENT-TIMESTAMP NOT = WS-SAVED-TIMESTAMP
      *    Someone else modified this row
           EXEC SQL ROLLBACK END-EXEC
           MOVE 'Y' TO WS-CONFLICT-FLAG
       ELSE
           EXEC SQL
               UPDATE ACCT_MASTER
               SET ACCT_BAL = :WS-NEW-BAL,
                   LAST_UPDATE_TS = CURRENT TIMESTAMP
               WHERE ACCT_ID = :WS-ACCT-ID
           END-EXEC
           EXEC SQL COMMIT END-EXEC
       END-IF

The version column is essential. Without it, you have no way to detect that someone else changed the row between your initial read and your update. Use a TIMESTAMP column updated on every modification, or a simple integer version counter incremented on each update.
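The integer-counter variant can be compressed even further: instead of re-reading and comparing, issue the UPDATE guarded by the saved version and treat zero updated rows as the conflict signal. A sketch of this common variant, using SQLite purely to show the shape (table, columns, and values are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE acct (id INTEGER PRIMARY KEY,"
             " bal REAL, version INTEGER)")
conn.execute("INSERT INTO acct VALUES (1, 100.0, 1)")

def optimistic_update(conn, acct_id, new_bal, saved_version):
    """Apply the update only if nobody bumped the version since we
    read it; zero affected rows means someone else got there first."""
    cur = conn.execute(
        "UPDATE acct SET bal = ?, version = version + 1"
        " WHERE id = ? AND version = ?",
        (new_bal, acct_id, saved_version))
    conn.commit()
    return cur.rowcount == 1

# Users A and B both read version 1 during "think time".
ok_a = optimistic_update(conn, 1, 150.0, saved_version=1)  # True
ok_b = optimistic_update(conn, 1, 175.0, saved_version=1)  # False: conflict
```

The guarded UPDATE collapses the re-read, the compare, and the update into one statement, shrinking the window in which a third writer could sneak in.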

Yuki Tanaka at SecureFirst Insurance implemented optimistic concurrency across all their CICS policy-update transactions. Before the change, lock wait times for policy updates averaged 450ms during peak hours. After: 12ms. The conflict rate (someone else modified the row during think time) was 0.3%, and those users simply got a message saying "This policy was updated by another user. Please review and resubmit."

8.7.3 Optimistic Locking with Row Change Timestamp

DB2 provides a built-in mechanism for optimistic concurrency: the ROW CHANGE TIMESTAMP expression.

       EXEC SQL
           SELECT ACCT_BAL,
                  ROW CHANGE TIMESTAMP FOR ACCT_MASTER
           INTO :WS-ACCT-BAL, :WS-ROW-TIMESTAMP
           FROM ACCT_MASTER
           WHERE ACCT_ID = :WS-ACCT-ID
       END-EXEC

This returns the timestamp of the last modification to the row without requiring you to maintain your own version column. It does require the table to have a row change timestamp column, defined as GENERATED ALWAYS FOR EACH ROW ON UPDATE AS ROW CHANGE TIMESTAMP (in CREATE TABLE, or added later with ALTER TABLE ADD COLUMN); DB2 then maintains the value automatically on every update.


8.8 HA Banking System Project Checkpoint: Locking Strategy

It's time to apply everything in this chapter to the HA Banking Transaction Processing System you've been building throughout this textbook.

8.8.1 Requirements Recap

The HA Banking system must:

  • Process online transactions 23 hours/day with sub-second response times
  • Run batch account maintenance nightly during a 4-hour window (with online still active)
  • Support 100,000 concurrent online users
  • Maintain ACID compliance for all financial transactions
  • Survive a single-site failure without data loss (from Chapter 1's Parallel Sysplex design)

8.8.2 Locking Strategy Design

Based on what we've covered, here's the locking strategy:

Table: HACCT_MASTER (core account table)

  • LOCKSIZE ROW — Minimizes false contention between online transactions
  • LOCKMAX 10000 — High threshold (batch will COMMIT well before this)
  • Partitioned by account-number range (12 partitions)
  • Batch processes one partition at a time, partitions in sequence
  • Online uses CS isolation with currently committed enabled
  • Wire transfers lock lower account number first

Table: HTXN_HISTORY (transaction log)

  • LOCKSIZE PAGE — Insert-heavy, row locking overhead not justified
  • LOCKMAX 5000
  • Partitioned by date (monthly)
  • Online inserts into current partition only
  • Batch reads from previous partitions (no contention with online inserts)
  • Online reads use UR (historical queries don't need current accuracy)

Table: HACCT_HOLD (fraud holds)

  • LOCKSIZE ROW — High contention from fraud detection batch + online customer service
  • LOCKMAX 2000
  • Not partitioned (small table, < 1M rows)
  • Fraud batch commits every 200 rows
  • Online uses optimistic concurrency for hold modifications

Cross-table access order (enforced by coding standards):

  1. HACCT_HOLD (check for holds first)
  2. HACCT_MASTER (then access account)
  3. HTXN_HISTORY (then log transaction)

This order is documented, enforced by code review, and verified by a static analysis rule that flags any program accessing these tables in a different sequence.
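A static-analysis rule of that kind reduces to a subsequence check: every table-access sequence a program can execute must be non-decreasing in the canonical order. A Python sketch (the checker and its input format are illustrative; the canonical list mirrors the standard above):

```python
CANONICAL_ORDER = ["HACCT_HOLD", "HACCT_MASTER", "HTXN_HISTORY"]

def follows_order(accesses, canonical=CANONICAL_ORDER):
    """True if the tables appear in an order consistent with the
    canonical sequence (repeats and omissions are allowed)."""
    rank = {t: i for i, t in enumerate(canonical)}
    ranks = [rank[t] for t in accesses if t in rank]
    return all(a <= b for a, b in zip(ranks, ranks[1:]))

ok = follows_order(["HACCT_HOLD", "HACCT_MASTER", "HTXN_HISTORY"])  # True
bad = follows_order(["HACCT_MASTER", "HACCT_HOLD"])                 # False
```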

COMMIT strategy:

  • Online: COMMIT after each transaction (CICS pseudo-conversational, one UOW per transaction)
  • Batch account maintenance: COMMIT every 500 rows
  • Batch fraud detection: COMMIT every 200 rows
  • End-of-day settlement: Uses RS isolation, COMMIT after full settlement (this runs during the 1-hour reduced-activity window)

Retry logic:

  • All programs implement retry for -911 and -913
  • Online: 3 retries with no delay (transactions are fast)
  • Batch: 3 retries with 1-second delay between retries (allows contention to clear)
  • If all retries fail: online returns error to user; batch writes to exception file and continues

8.8.3 Your Checkpoint Task

Review the locking strategy above and answer the following in your project documentation:

  1. What happens if the fraud detection batch is running when a customer service rep needs to release a hold? Trace the lock interactions.
  2. The end-of-day settlement uses RS isolation and commits after the full settlement. What's the risk? How would you mitigate it?
  3. A new requirement arrives: real-time account-to-account transfers must complete in under 200ms. Both accounts are in HACCT_MASTER. Design the lock strategy for this transaction, including deadlock prevention.
  4. The operations team reports that batch account maintenance is occasionally timing out on partition 7 (the most active partition, containing the most popular account ranges). Diagnose three possible causes and propose solutions for each.

8.9 Spaced Review

From Chapter 1: Parallel Sysplex and Data Sharing

In a data sharing environment, locking becomes more complex because multiple DB2 subsystems can access the same data simultaneously. The IRLM on each member communicates through the Coupling Facility (CF) to maintain a global lock table.

Review question: If Member 1 holds an X lock on a page in ACCT_MASTER and Member 2 requests an S lock on the same page, what happens? (Answer: The CF detects the inter-system lock conflict. Member 2's request is suspended until Member 1 releases the X lock. This is called a global lock contention event and is tracked by IFCID 0261.)

The locking concepts in this chapter apply to both single-member and data-sharing environments. In data sharing, the cost of each lock is higher (global lock requests involve CF communication, adding microseconds of latency). This makes all of our optimization strategies — frequent COMMITs, lock avoidance, proper isolation levels — even more important.

From Chapter 6: Optimizer and Access Paths

The optimizer's access path decision directly affects locking behavior:

  • A tablespace scan acquires locks on every page it reads (under CS, it locks and releases as it scans; under RR, it holds all locks).
  • An index access acquires locks only on the qualifying pages.
  • A list prefetch acquires locks in the order the rows are accessed, which may differ from the order an application developer expects.

Review question: You have a query bound with RR isolation that does a tablespace scan of a 10,000-page table. How many page locks does it acquire? (Answer: 10,000 — one for every page scanned, because RR holds locks on all accessed pages, not just qualifying pages. This is why RR + tablespace scan is almost always a mistake for large tables. Ensure proper indexes exist for queries bound with RR.)

Understanding this interaction between optimizer decisions and lock behavior is why Chapter 6 is a prerequisite for this chapter. The optimizer decides how data is accessed; the lock manager decides what locks are acquired for that access pattern. A change in access path (e.g., due to a RUNSTATS update) can change lock behavior even if the COBOL program hasn't changed.


Chapter Summary

DB2 locking is not something that happens to your programs. It's something you design. The difference between a system that processes 500 million transactions a day and one that wakes people up at 2am is a deliberate locking strategy.

The key principles:

  1. Understand the lock hierarchy and modes. You can't design what you don't understand.
  2. Choose isolation levels deliberately. CS for most OLTP. RS when you need repeatable reads on qualified rows. RR only when absolutely necessary. UR for read-only queries where dirty reads are acceptable.
  3. Prevent lock escalation through COMMIT frequency. If your batch program reaches LOCKMAX, you've already failed.
  4. Prevent deadlocks through access ordering. If every program accesses resources in the same order, deadlocks between those resources are impossible.
  5. Handle -911 and -913 in every program. Deadlocks and timeouts will happen. Your program must retry gracefully.
  6. Design for concurrent online/batch. The batch window is dead. Your programs must coexist.
  7. Use optimistic concurrency for user-facing transactions. Don't hold locks during think time.

Lisa Tran doesn't get 2am pages about deadlocks anymore. Not because deadlocks are impossible — they're an inherent risk of concurrent data access. But because CNB's programs are designed to minimize their likelihood and handle them gracefully when they occur. That's the standard we're building toward in this textbook.

Next chapter, we'll build on this foundation to examine DB2 program preparation and package management — because the way you BIND your programs determines the default locking behavior for every SQL statement they contain.


From Kwame's whiteboard: "If you don't design your locking strategy, DB2 will choose one for you. You won't like its choice."