In This Chapter
- Learning Objectives
- 26.1 Why Locking Exists
- 26.2 DB2 Lock Types and Modes
- 26.3 Lock Granularity and Escalation
- 26.4 Isolation Levels — The Four Guarantees
- 26.5 Currently Committed Semantics and Lock Avoidance
- 26.6 Deadlock Detection and Resolution
- 26.7 Lock Waits and Timeout Tuning
- 26.8 Optimistic vs. Pessimistic Concurrency
- 26.9 SKIP LOCKED DATA
- 26.10 Locking in Batch vs. OLTP
- 26.11 Meridian Bank Concurrency Design
- Spaced Review: Connecting to Earlier Chapters
- Common Locking Anti-Patterns
- Summary
Chapter 26: Locking, Concurrency, and Isolation Levels — Balancing Throughput and Consistency
"The lock manager is the silent arbiter of every multi-user database. Ignore it and your application will teach you about it — at 2 AM on a Saturday, with the pager screaming." — A senior DBA who learned the hard way
Learning Objectives
After completing this chapter you will be able to:
- Explain DB2's lock types (share, update, exclusive) and lock modes.
- Understand lock escalation and how to prevent unwanted escalation.
- Implement the four isolation levels (UR, CS, RS, RR) with real behavior examples.
- Detect and resolve deadlocks.
- Apply lock avoidance and optimistic concurrency techniques.
- Design a concurrency strategy for Meridian National Bank's OLTP workload.
26.1 Why Locking Exists
Imagine two bank tellers at Meridian National Bank. Teller A reads an account balance of $10,000 and begins processing a $3,000 withdrawal. At the same instant, Teller B reads the same balance — $10,000 — and processes a $5,000 withdrawal. Both compute the new balance independently: Teller A writes $7,000, Teller B writes $5,000. One of those writes obliterates the other. The customer either gained or lost money depending on who wrote last.
This is the lost update problem, and it is the most intuitive reason that databases require a concurrency control mechanism. Locking is DB2's primary weapon against data anomalies in a multi-user environment.
The Concurrency Anomalies
The SQL standard defines four phenomena that can occur when transactions execute concurrently:
Dirty Read. Transaction A modifies a row but has not committed. Transaction B reads the uncommitted value. If Transaction A rolls back, Transaction B has acted on data that never officially existed. In a banking context, a balance inquiry might show funds from a deposit that is about to be reversed.
Non-Repeatable Read. Transaction A reads a row. Transaction B updates that row and commits. Transaction A reads the same row again and gets a different value. A report that reads the same account twice within a single transaction could see two different balances.
Phantom Read. Transaction A runs a query with a range predicate — say, all accounts with balances above $100,000. Transaction B inserts a new account with a balance of $200,000 and commits. Transaction A re-runs the identical query and finds an extra row that was not there before. The new row is a "phantom."
Lost Update. Two transactions read the same row, compute a new value based on the old one, and both write. The first write is overwritten by the second. This is the teller scenario above.
Every concurrency control strategy exists to prevent some or all of these anomalies. DB2 uses a lock-based approach (with some lock avoidance optimizations) rather than a purely multi-version approach, although it incorporates multi-version ideas in specific features like Currently Committed on LUW.
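The teller scenario can be reduced to a few lines of code. This is a toy in-memory sketch (plain Python, not DB2): two "tellers" each read the balance, compute a new value from their stale snapshot, and write. The second write silently discards the first.

```python
# Illustrative only: a naive read-modify-write with no concurrency control.
balance = {"acct_1001": 10_000}

def naive_withdraw(snapshot, amount):
    # New value computed from a stale snapshot: the lost-update recipe.
    return snapshot - amount

# Both tellers read the same starting balance of 10,000.
snap_a = balance["acct_1001"]
snap_b = balance["acct_1001"]

balance["acct_1001"] = naive_withdraw(snap_a, 3_000)   # Teller A writes 7,000
balance["acct_1001"] = naive_withdraw(snap_b, 5_000)   # Teller B writes 5,000

# Final balance is 5,000: Teller A's withdrawal has vanished.
print(balance["acct_1001"])
```

A U or X lock on the row (or an optimistic version check, Section 26.8) forces the second teller to re-read before writing, which is exactly what the rest of this chapter is about.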
The Fundamental Tradeoff
More locking means more consistency but less concurrency. Fewer locks mean higher throughput but greater risk of data anomalies. The art of DB2 performance tuning is finding the point on this spectrum that satisfies your application's correctness requirements without destroying throughput. There is no universal "right answer." A high-frequency trading system has different requirements than a data warehouse refresh.
Historical Context: How DB2 Evolved Its Locking Strategy
Early versions of DB2 used page-level locking exclusively. Every lock covered an entire 4 KB page, meaning that two transactions updating different rows on the same page would conflict. This was acceptable in the 1980s when transaction volumes were lower and hardware was expensive — fewer locks meant less memory overhead.
Row-level locking was introduced in DB2 for z/OS Version 3 (early 1990s) and has been the default on LUW since its inception. Row-level locking provides maximum concurrency because two transactions can update different rows on the same page simultaneously, but it comes at the cost of higher memory usage (each row lock requires its own lock control block in memory).
The evolution continued with lock avoidance on z/OS (reducing the number of locks acquired in the first place), Currently Committed on LUW (allowing readers to bypass writer locks), and SKIP LOCKED DATA (allowing workers to bypass each other's locks for queue processing). Each enhancement represents a step toward the ideal of maximum concurrency with zero data anomalies — an ideal that is theoretically impossible but practically approachable.
Understanding this evolution helps explain why z/OS and LUW handle locking differently in certain areas: z/OS retains page-level locking as an option (for backward compatibility and for workloads where per-row tracking overhead exceeds the concurrency benefit), while LUW offers only row-level locking but supplements it with Currently Committed semantics that z/OS does not need (because z/OS has its own lock avoidance mechanism).
26.2 DB2 Lock Types and Modes
Lock Objects: What Gets Locked
DB2 can acquire locks at multiple levels of granularity:
| Lock Object | z/OS | LUW |
|---|---|---|
| Row | Yes | Yes |
| Page | Yes | No (LUW locks at row level) |
| Table | Yes | Yes |
| Tablespace | Yes | N/A (LUW does not have tablespace locks in the same sense) |
| Partition | Yes (data partitioned tablespaces) | Yes (range-partitioned tables) |
On z/OS, the default locking granularity is determined by the LOCKSIZE clause of the tablespace definition. On LUW, row-level locking is the default and there is no page-level locking.
Lock Modes: How It Gets Locked
DB2 defines a hierarchy of lock modes. Understanding these modes and their compatibility is essential for diagnosing concurrency problems.
Intent Locks (acquired at the table or tablespace level to signal intent to lock at a finer granularity):
- IS (Intent Share): "I intend to read some rows in this table." Acquired when a transaction reads rows under CS, RS, or RR isolation.
- IX (Intent Exclusive): "I intend to modify some rows in this table." Acquired when a transaction updates, inserts, or deletes rows.
- SIX (Share with Intent Exclusive): "I am reading the entire table and intend to modify some rows." This is a combination of S and IX. It occurs when a transaction holds a share lock on the entire table but also needs to update individual rows.
Row/Page-Level Locks:
- S (Share): "I am reading this row. Others may also read it, but no one may modify it until I release my lock."
- U (Update): "I am reading this row with the intention of updating it. Others may read it (S locks are compatible), but no other transaction may acquire a U or X lock." The U lock is critical for preventing a common deadlock pattern — more on this in Section 26.6.
- X (Exclusive): "I am modifying this row. No other transaction may read or modify it." (Exception: UR isolation can read X-locked rows.)
The Lock Compatibility Matrix
This matrix determines whether two concurrent lock requests on the same object can coexist:
| Held ↓ \ Requested → | IS | IX | S | U | X | SIX |
|---|---|---|---|---|---|---|
| IS | Yes | Yes | Yes | Yes | No | Yes |
| IX | Yes | Yes | No | No | No | No |
| S | Yes | No | Yes | Yes | No | No |
| U | Yes | No | Yes | No | No | No |
| X | No | No | No | No | No | No |
| SIX | Yes | No | No | No | No | No |
Read this matrix carefully. When Transaction A holds a lock (row on the left) and Transaction B requests a lock (column on the top), the cell tells you whether B can proceed (Yes) or must wait (No).
Key observations:
- IS is the most permissive. It is compatible with everything except X. This is why read-only transactions under CS isolation are generally light on locking overhead.
- X is the most restrictive. Nothing is compatible with an exclusive lock except UR readers (which bypass locks entirely).
- U prevents the symmetric deadlock. Two transactions that both read with S locks and then try to upgrade to X will deadlock (S is compatible with S, but neither can upgrade to X while the other holds S). By using U, only one transaction can hold the "intent to update" lock at a time, serializing the upgrade path.
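The compatibility matrix is small enough to encode directly, which makes the observations above easy to verify. A minimal sketch (the function name `can_grant` is ours, not a DB2 API):

```python
# The lock compatibility matrix as a lookup table.
# True means the requested lock can be granted while the held lock is in place.
COMPATIBLE = {
    "IS":  {"IS": True,  "IX": True,  "S": True,  "U": True,  "X": False, "SIX": True},
    "IX":  {"IS": True,  "IX": True,  "S": False, "U": False, "X": False, "SIX": False},
    "S":   {"IS": True,  "IX": False, "S": True,  "U": True,  "X": False, "SIX": False},
    "U":   {"IS": True,  "IX": False, "S": True,  "U": False, "X": False, "SIX": False},
    "X":   {"IS": False, "IX": False, "S": False, "U": False, "X": False, "SIX": False},
    "SIX": {"IS": True,  "IX": False, "S": False, "U": False, "X": False, "SIX": False},
}

def can_grant(held, requested):
    """Can `requested` be granted while `held` is in place?"""
    return COMPATIBLE[held][requested]

print(can_grant("S", "S"))   # True  -- two readers coexist
print(can_grant("S", "X"))   # False -- a reader blocks a writer
print(can_grant("U", "U"))   # False -- only one "intent to update" at a time
```

Note that the matrix is symmetric: whether A holds and B requests, or vice versa, the answer is the same.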
Lock Duration
How long DB2 holds a lock depends on the isolation level:
- UR: No row locks for reads (the row is read without acquiring a lock, or the lock is acquired and released instantly).
- CS: The lock is released when the cursor moves off the row (or when the transaction commits, depending on whether the lock was acquired for an update).
- RS: Row locks on qualifying rows are held until COMMIT.
- RR: All row locks — including those on rows that were scanned but did not qualify — are held until COMMIT.
We will explore each isolation level in depth in Section 26.4.
26.3 Lock Granularity and Escalation
Choosing Lock Granularity
On z/OS, you control the default locking granularity at the tablespace level:
-- z/OS: Create tablespace with row-level locking
CREATE TABLESPACE TS_ACCOUNTS
IN MERIDIANDB
USING STOGROUP MERSG01
LOCKSIZE ROW
MAXROWS 255;
-- z/OS: Create tablespace with page-level locking
CREATE TABLESPACE TS_HISTORY
IN MERIDIANDB
USING STOGROUP MERSG01
LOCKSIZE PAGE;
-- z/OS: Create tablespace with table-level locking (rare for OLTP)
CREATE TABLESPACE TS_LOOKUP
IN MERIDIANDB
USING STOGROUP MERSG01
LOCKSIZE TABLE;
On LUW, row-level locking is the default and the only fine-grained granularity. You can use the LOCK TABLE statement to acquire a table-level lock explicitly, but there is no page-level locking.
Row-level locking provides the highest concurrency. Each transaction locks only the specific rows it touches. The cost is more memory for lock tracking (each lock consumes memory in the IRLM on z/OS or the lock list on LUW).
Page-level locking (z/OS only) is a compromise. A single lock covers all rows on a page (typically 4 KB or 32 KB). If transactions tend to access different pages, this is nearly as concurrent as row locking but with lower overhead. If transactions frequently access the same page, page locking can cause unnecessary contention.
Table-level locking serializes all access to the table. Use this only for small lookup tables that are rarely updated or for batch operations where exclusive access is acceptable.
Lock Escalation
DB2 does not have unlimited memory for locks. When a single transaction accumulates too many locks on a single object, DB2 escalates — it replaces the many fine-grained locks with a single coarser lock (typically a table or tablespace lock).
On z/OS:
Lock escalation is controlled by the subsystem parameters (DSNZPARMs) NUMLKTS (maximum locks per tablespace) and NUMLKUS (maximum locks per user). When a transaction's lock count on a single tablespace exceeds NUMLKTS, DB2 escalates to a tablespace-level lock. Exceeding NUMLKUS does not escalate; the lock request fails with a resource-unavailable error (-904).
-- z/OS subsystem parameters (set via DSNZPARM)
NUMLKTS = 2000 -- Max locks per tablespace per transaction
NUMLKUS = 10000 -- Max locks per user (connection)
On LUW:
Escalation is controlled by two database configuration parameters:
-- LUW: View current settings
db2 get db cfg for MERIDIANDB | grep -i lock
-- Key parameters:
-- LOCKLIST = Memory allocated for lock storage (4KB pages)
-- MAXLOCKS = Percentage of LOCKLIST a single application can use
-- before escalation is triggered
-- LUW: Increase lock memory to reduce escalation
db2 update db cfg for MERIDIANDB using LOCKLIST 8192
db2 update db cfg for MERIDIANDB using MAXLOCKS 22
When an application's locks consume more than MAXLOCKS percent of LOCKLIST, DB2 escalates the table with the most locks to a table-level lock. If the total lock list usage exceeds a system threshold, DB2 escalates locks across all applications.
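The MAXLOCKS arithmetic is worth making concrete. This back-of-envelope sketch estimates how many locks one application can hold before escalation; the 128-bytes-per-lock figure is an illustrative assumption (actual lock-entry sizes vary by release and whether it is the first lock on an object), so treat the result as an order of magnitude, not a guarantee:

```python
# Rough estimate of the LUW escalation trigger: MAXLOCKS percent of the
# LOCKLIST memory, divided by an assumed per-lock entry size.
PAGE_BYTES = 4096  # LOCKLIST is configured in 4 KB pages

def escalation_threshold(locklist_pages, maxlocks_pct, bytes_per_lock=128):
    """Approximate lock count one application may hold before DB2
    escalates its most-locked table to a table-level lock."""
    locklist_bytes = locklist_pages * PAGE_BYTES
    budget = locklist_bytes * maxlocks_pct / 100
    return int(budget // bytes_per_lock)

# With LOCKLIST 8192 pages and MAXLOCKS 22 (the values used above),
# one application can hold tens of thousands of row locks before escalating:
print(escalation_threshold(8192, 22))
```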
Why Escalation Is Dangerous
Lock escalation transforms many surgical row locks into a single blunt table lock. A transaction that was peacefully coexisting with hundreds of other transactions suddenly locks the entire table. Every other transaction that needs that table — for any row — now waits.
In an OLTP environment, lock escalation is one of the most common causes of sudden throughput collapse. The system runs fine for hours, then a single batch query triggers escalation, and response times spike from milliseconds to minutes.
Preventing Escalation
- COMMIT frequently. Locks are released at COMMIT. A transaction that reads 100,000 rows and commits only at the end holds 100,000 locks simultaneously. One that commits every 1,000 rows never holds more than 1,000.
- Increase lock memory. On z/OS, increase NUMLKTS. On LUW, increase LOCKLIST and MAXLOCKS. But this is a band-aid — you are trading memory for deferred escalation, not solving the underlying problem.
- Use appropriate isolation. Under CS isolation, read locks are released when the cursor moves. Under RS or RR, they are held until COMMIT. If your application does not require RS or RR, using CS dramatically reduces lock accumulation.
- Partition large tables. On z/OS, partition-level locking allows escalation to affect only one partition rather than the entire tablespace. A batch job scanning partition 3 can escalate without affecting transactions on partitions 1, 2, and 4.
- Use LOCK TABLE explicitly for batch. If you know a batch job needs to process an entire table, it is better to acquire a table lock explicitly at the start than to let DB2 escalate mid-flight, which causes lock-wait storms.
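The "COMMIT frequently" rule is a simple loop shape. A minimal sketch, with `apply_change` and `commit` as hypothetical stand-ins for your driver's statement execution and commit calls; the point is that the lock count is bounded by the batch size:

```python
# Interval committing for a long-running batch update.
def process_in_batches(rows, apply_change, commit, batch_size=1000):
    """Apply a change to each row, committing every `batch_size` rows so
    the transaction never holds more than `batch_size` row locks."""
    pending = 0
    for row in rows:
        apply_change(row)
        pending += 1
        if pending >= batch_size:
            commit()          # releases all locks held so far
            pending = 0
    if pending:
        commit()              # final partial batch

# Toy harness: count commits over 2,500 rows with batches of 1,000.
commits = []
process_in_batches(range(2500), lambda r: None, lambda: commits.append(1))
print(len(commits))  # 3 commits: 1000 + 1000 + 500
```

A real batch job would also need restart logic (Chapter-level concern: a failure between commits must resume from the last committed row, not from the start).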
Monitoring Escalation
z/OS:
-- Check for lock escalation events in trace record IFCID 0337 (lock escalation)
-- Or use the DB2 command:
-DISPLAY DATABASE(MERIDIANDB) SPACENAM(*) LOCKS
LUW:
-- Monitor lock escalations
SELECT LOCK_ESCALS, LOCK_TIMEOUTS, DEADLOCKS
FROM TABLE(MON_GET_DATABASE(-2)) AS T;
-- Per-table escalation
SELECT TABSCHEMA, TABNAME, LOCK_ESCALS
FROM TABLE(MON_GET_TABLE('', '', -2)) AS T
WHERE LOCK_ESCALS > 0
ORDER BY LOCK_ESCALS DESC;
26.4 Isolation Levels — The Four Guarantees
This is the threshold concept of the chapter. Most developers have a vague sense that "higher isolation = more locking," but few can articulate the precise behavioral differences. We will walk through each level with concrete, step-by-step concurrent transaction examples.
Overview
| Isolation Level | DB2 Name | Dirty Read | Non-Repeatable Read | Phantom Read |
|---|---|---|---|---|
| Read Uncommitted | UR | Possible | Possible | Possible |
| Read Committed | CS | Prevented | Possible | Possible |
| Repeatable Read | RS | Prevented | Prevented | Possible |
| Serializable | RR | Prevented | Prevented | Prevented |
26.4.1 Uncommitted Read (UR)
What it does: The transaction reads data without acquiring row locks. It can see uncommitted changes made by other transactions.
When to use it: Reports where approximate data is acceptable, dashboards showing "live" counts, monitoring queries where freshness matters more than precision.
Concrete Example:
Time Transaction A Transaction B
---- ------------- -------------
T1 UPDATE accounts
SET balance = balance + 5000
WHERE acct_id = 1001;
-- balance was 10000, now 15000
-- NOT YET COMMITTED
T2 SELECT balance
FROM accounts
WHERE acct_id = 1001
WITH UR;
-- Returns 15000 (uncommitted!)
T3 ROLLBACK;
-- balance reverts to 10000
T4 -- Transaction B acted on a
-- balance of 15000 that
-- never actually existed.
Transaction B read a "dirty" value. If it made a decision based on that value — approving a withdrawal, for instance — the decision was based on data that never officially existed.
z/OS Syntax:
SELECT * FROM accounts WHERE acct_id = 1001
WITH UR;
LUW Syntax:
-- Statement level
SELECT * FROM accounts WHERE acct_id = 1001 WITH UR;
-- Session level
SET CURRENT ISOLATION = UR;
Performance Impact: Minimal lock overhead. UR readers never wait for writers and never cause writers to wait. This makes UR ideal for long-running read-only queries that must not interfere with OLTP traffic.
26.4.2 Cursor Stability (CS)
What it does: CS is DB2's default isolation level. The transaction acquires a share lock on each row as it reads it, and releases the lock when the cursor moves to the next row. Only the "current" row is locked at any moment.
What it prevents: Dirty reads. You will never see uncommitted data.
What it allows: Non-repeatable reads and phantom reads. Because the lock is released when the cursor moves, another transaction can modify the row after you release it.
Concrete Example:
Time Transaction A (CS) Transaction B
---- ---------------- -------------
T1 SELECT balance FROM accounts
WHERE acct_id = 1001;
-- Returns 10000
-- S lock on row acquired, then
-- released when cursor moves
T2 UPDATE accounts
SET balance = 7000
WHERE acct_id = 1001;
COMMIT;
T3 SELECT balance FROM accounts
WHERE acct_id = 1001;
-- Returns 7000!
-- Different from T1 read.
Transaction A sees a different balance the second time it reads. This is a non-repeatable read. Under CS, this is expected and permitted behavior.
Why CS is usually the right choice: Most OLTP transactions are short — read a row, process it, update it, commit. Within such a short window, the probability of a non-repeatable read is low, and the reduced locking overhead means higher throughput. For 90% of OLTP applications, CS provides the right balance.
26.4.3 Read Stability (RS)
What it does: RS acquires share locks on all rows that the transaction reads and that qualify for the result set, and holds those locks until COMMIT. This ensures that any row you have read will not be modified by another transaction until your transaction completes.
What it prevents: Dirty reads and non-repeatable reads.
What it allows: Phantom reads. Other transactions can insert new rows that would qualify for your predicate, because RS only locks rows you have already read, not the "gaps" between them.
Concrete Example:
Time Transaction A (RS) Transaction B
---- ---------------- -------------
T1 SELECT balance FROM accounts
WHERE acct_id = 1001;
-- Returns 10000
-- S lock held until COMMIT
T2 UPDATE accounts
SET balance = 7000
WHERE acct_id = 1001;
-- BLOCKED! Waits for
-- Transaction A's S lock.
T3 SELECT balance FROM accounts
WHERE acct_id = 1001;
-- Returns 10000 again.
-- Non-repeatable read prevented.
T4 COMMIT;
-- S lock released.
T5 -- Transaction B's UPDATE
-- now proceeds.
But observe the phantom scenario:
Time Transaction A (RS) Transaction B
---- ---------------- -------------
T1 SELECT COUNT(*) FROM accounts
WHERE branch_id = 100;
-- Returns 500
-- S locks on all 500 rows
T2 INSERT INTO accounts
(acct_id, branch_id, balance)
VALUES (9999, 100, 1000);
COMMIT;
-- Succeeds! RS did not lock
-- the "gap" for new rows.
T3 SELECT COUNT(*) FROM accounts
WHERE branch_id = 100;
-- Returns 501!
-- Phantom read occurred.
When to use RS: Transactions that read data and then make decisions based on what they read, where the correctness of the decision depends on the data not changing. For example, reading an account balance, checking business rules, and then posting a transaction against that balance. RS ensures the balance does not change between the read and the update.
26.4.4 Repeatable Read (RR)
What it does: RR locks every row the transaction touches — even rows that were examined but did not qualify for the result set. On z/OS, this effectively means the predicate evaluation range is locked. No other transaction can insert, update, or delete rows in that range.
What it prevents: Dirty reads, non-repeatable reads, and phantom reads. RR provides full serializability.
What it costs: Dramatically more locking. Because RR locks even non-qualifying rows, it can lock vast swathes of a table. This is the most restrictive and most expensive isolation level.
Concrete Example:
Time Transaction A (RR) Transaction B
---- ---------------- -------------
T1 SELECT COUNT(*) FROM accounts
WHERE branch_id = 100;
-- Returns 500
-- Locks ALL rows examined during
-- the scan, including rows where
-- branch_id != 100 if they were
-- scanned.
T2 INSERT INTO accounts
(acct_id, branch_id, balance)
VALUES (9999, 100, 1000);
-- BLOCKED! The range is locked.
T3 SELECT COUNT(*) FROM accounts
WHERE branch_id = 100;
-- Returns 500 again.
-- Phantom prevented.
T4 COMMIT;
T5 -- INSERT now proceeds.
When to use RR: Rarely. RR is appropriate for audit queries, regulatory reports, or any scenario where the result set must be absolutely stable and complete for the duration of the transaction. The performance cost is significant, and in most OLTP scenarios, RS is sufficient.
Specifying Isolation Levels
Per-statement (z/OS and LUW):
SELECT * FROM accounts WITH UR;
SELECT * FROM accounts WITH CS;
SELECT * FROM accounts WITH RS;
SELECT * FROM accounts WITH RR;
Per-plan/package (z/OS BIND):
BIND PACKAGE(COLL1) MEMBER(PROG01) ISOLATION(CS)
Per-session (LUW):
SET CURRENT ISOLATION = RS;
-- All subsequent statements use RS until changed
Per-application (LUW CLI/JDBC):
connection.setTransactionIsolation(
    Connection.TRANSACTION_REPEATABLE_READ);
// Note the naming offset: JDBC TRANSACTION_READ_COMMITTED maps to DB2 CS,
// TRANSACTION_REPEATABLE_READ maps to DB2 RS, and
// TRANSACTION_SERIALIZABLE maps to DB2 RR.
Practical Guidance
For Meridian National Bank:
| Transaction Type | Recommended Isolation | Rationale |
|---|---|---|
| Balance inquiry | CS | Fast, non-blocking; momentary staleness is acceptable for display |
| Funds transfer | RS | Must ensure source balance does not change between read and debit |
| End-of-day reconciliation report | RR or RS | Depends on regulatory requirement for point-in-time consistency |
| Dashboard / monitoring | UR | Must never block OLTP transactions |
| Statement generation (batch) | CS with COMMIT every N rows | Minimize lock duration during long batch run |
26.5 Currently Committed Semantics and Lock Avoidance
Currently Committed (LUW)
Introduced in DB2 9.7 for LUW, Currently Committed (CC) changes the behavior of CS isolation when a reader encounters a row that is locked by an uncommitted writer.
Without CC (traditional CS behavior): the reader waits for the writer to commit or rollback, then reads the committed value.
With CC: the reader does not wait. Instead, it reads the previously committed version of the row from the log. The reader sees consistent, committed data without waiting for the in-flight transaction.
-- LUW: Enable Currently Committed (on by default since DB2 9.7)
db2 update db cfg for MERIDIANDB using CUR_COMMIT ON
This is conceptually similar to Oracle's multi-version read consistency, but implemented differently (DB2 retrieves the old value from the transaction log rather than maintaining a separate undo tablespace).
Impact: CC dramatically reduces lock waits in mixed read/write workloads. Readers and writers no longer block each other under CS isolation. This is one of the most significant concurrency improvements in DB2 LUW history.
Limitation: CC only applies to CS isolation. Under RS and RR, readers still wait for writers because those isolation levels require stability guarantees that CC cannot provide.
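The CC behavior is easy to model conceptually. This is a purely illustrative in-memory sketch (not DB2 internals; DB2 actually retrieves the old value from the log): each row keeps its last committed value, and a CC reader returns that version instead of waiting on the in-flight writer.

```python
# Conceptual model of Currently Committed semantics.
class CcRow:
    def __init__(self, value):
        self.committed = value      # last committed version
        self.inflight = None        # uncommitted change, if any

    def write(self, value):
        # Writer updates the row; the X lock it would hold is not modeled.
        self.inflight = value

    def read_cc(self):
        # CC reader: never waits, always sees the last committed version.
        return self.committed

    def commit(self):
        if self.inflight is not None:
            self.committed = self.inflight
            self.inflight = None

row = CcRow(10_000)
row.write(15_000)          # writer has updated but not committed
print(row.read_cc())       # 10000 -- reader proceeds without waiting
row.commit()
print(row.read_cc())       # 15000 -- committed value now visible
```

Contrast this with traditional CS, where `read_cc` would block until `commit` (or rollback) released the writer's lock.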
Lock Avoidance (z/OS)
On z/OS, DB2 implements lock avoidance — a technique where the system determines that a lock is unnecessary and skips it entirely. Lock avoidance checks the log to determine whether a row has been committed and whether it is currently being modified. If the row is committed and not in flight, no lock is acquired.
Lock avoidance is automatic and transparent. It applies primarily to read operations under CS isolation. You can monitor its effectiveness using IFCID 0107 or by examining accounting trace fields:
-- z/OS: Lock avoidance statistics in accounting trace
-- QXLALOCK = Number of lock requests avoided
-- QXLACOMP = Number of lock avoidance comparisons
Lock avoidance can dramatically reduce the number of lock requests in read-heavy workloads. In well-tuned systems, 90% or more of read lock requests may be avoided.
Combining CC and Lock Avoidance
On LUW, when CC is enabled, the system first attempts to read without waiting. If the row is uncommitted, it reads the previously committed version. On z/OS, lock avoidance checks whether a lock is needed at all.
Both mechanisms serve the same goal: reducing the cost of the concurrency control mechanism without sacrificing consistency. They represent DB2's pragmatic approach to the fundamental tradeoff discussed in Section 26.1.
26.6 Deadlock Detection and Resolution
What Is a Deadlock?
A deadlock occurs when two or more transactions are each waiting for a lock held by the other, creating a circular dependency that can never resolve on its own.
Classic Deadlock Scenario:
Time Transaction A Transaction B
---- ------------- -------------
T1 UPDATE accounts
SET balance = balance - 100
WHERE acct_id = 1001;
-- X lock on row 1001
T2 UPDATE accounts
SET balance = balance - 200
WHERE acct_id = 2002;
-- X lock on row 2002
T3 UPDATE accounts
SET balance = balance + 100
WHERE acct_id = 2002;
-- WAITS for Transaction B's
-- X lock on row 2002
T4 UPDATE accounts
SET balance = balance + 200
WHERE acct_id = 1001;
-- WAITS for Transaction A's
-- X lock on row 1001
-- DEADLOCK! A waits for B,
-- B waits for A. Neither
-- can proceed.
DB2's Deadlock Detector
DB2 runs a background process that periodically scans the lock wait graph for cycles:
z/OS: The IRLM deadlock detector runs at an interval controlled by the DEADLOK parameter (default: 5 seconds for local deadlocks, 15 seconds for global deadlocks in data sharing).
LUW: The deadlock detector runs at an interval controlled by the DLCHKTIME database configuration parameter (default: 10000 milliseconds = 10 seconds).
When a deadlock is detected, DB2 chooses a victim — the transaction that has done the least work (fewest log bytes written) — and rolls it back. The victim receives:
- SQLCODE -911 (deadlock: reason code 2 on LUW, reason code 00C90088 on z/OS)
- SQLSTATE 40001
The other transaction(s) in the deadlock proceed normally.
Deadlock vs. Timeout
A timeout occurs when a transaction waits for a lock longer than the configured threshold, regardless of whether a deadlock exists:
- z/OS: Controlled by IRLMRWT (resource wait time) in the DSNZPARMs. A timeout returns SQLCODE -911 with reason code 00C9008E (or -913 when only the statement, not the whole unit of work, is rolled back).
- LUW: Controlled by the LOCKTIMEOUT database configuration parameter. A timeout returns SQLCODE -911, reason code 68.
The distinction matters: a deadlock is a logical impossibility (circular wait), while a timeout is simply impatience (a linear wait that exceeded the threshold). Both produce -911, but the reason code tells you which occurred.
The -911 Retry Pattern
Any application that accesses DB2 must handle -911. The standard pattern is retry with backoff:
-- Pseudocode for -911 retry logic
max_retries = 3
retry_count = 0
LOOP:
BEGIN TRANSACTION
-- Your SQL operations here
UPDATE accounts SET balance = balance - 100 WHERE acct_id = 1001;
UPDATE accounts SET balance = balance + 100 WHERE acct_id = 2002;
COMMIT
IF SQLCODE = -911 THEN
ROLLBACK -- Ensure clean state
retry_count = retry_count + 1
IF retry_count > max_retries THEN
RAISE ERROR 'Transaction failed after max retries'
END IF
WAIT (0.5 * 2**(retry_count - 1) seconds) -- Exponential backoff
GOTO LOOP
END IF
Critical: A -911 means DB2 has already rolled back the entire unit of work, for deadlocks and timeouts alike. (On z/OS, a timeout may instead surface as -913, in which case only the statement failed and the application must decide whether to retry the statement or roll back.) Your application must handle both codes and restart cleanly.
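The pseudocode above can be made concrete. A minimal sketch of the retry harness: `run_txn` is a hypothetical stand-in for your transaction body (a real driver would raise its own exception type carrying the SQLCODE), and the sleep function is injectable so the backoff can be tested without waiting.

```python
import time

class DeadlockError(Exception):
    """Stand-in for SQLCODE -911 / SQLSTATE 40001."""

def with_retry(run_txn, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Run a transaction body, retrying with exponential backoff on -911."""
    attempt = 0
    while True:
        try:
            return run_txn()          # body must COMMIT on success
        except DeadlockError:
            # DB2 has already rolled back the unit of work; safe to retry.
            attempt += 1
            if attempt > max_retries:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...

# Toy demo: the body "deadlocks" twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DeadlockError
    return "committed"

print(with_retry(flaky_txn, sleep=lambda s: None))  # committed
```

The backoff jitter (randomizing the delay) is a common refinement: it prevents two victims of the same deadlock from retrying in lockstep and colliding again.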
Preventing Deadlocks
- Access objects in a consistent order. If every transaction that needs to lock accounts 1001 and 2002 always locks 1001 first, the circular dependency in our example cannot form. This is the single most effective deadlock prevention technique.
- Use the UPDATE lock mode. When reading a row that you intend to update, use SELECT ... FOR UPDATE (on z/OS, optionally FOR UPDATE OF column-list). This acquires a U lock rather than an S lock, preventing the symmetric S-to-X upgrade deadlock.
- Keep transactions short. The longer a transaction holds locks, the wider the window for deadlock. COMMIT early and often.
- Avoid lock escalation. Escalation changes the shape of the lock graph and can introduce deadlocks that would not occur with row-level locking.
- Use appropriate isolation. Higher isolation levels hold locks longer, increasing deadlock probability.
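Consistent lock ordering is a one-line discipline. A sketch of the first rule: `lock_row` is a hypothetical stand-in for a SELECT ... FOR UPDATE on one account; the transfer locks both accounts in ascending ID order regardless of direction, so the A-waits-for-B, B-waits-for-A cycle from the example cannot form.

```python
# Canonical lock ordering for a two-account transfer.
def transfer(lock_row, debit_acct, credit_acct):
    """Lock both accounts in ascending ID order, regardless of which
    side is debited, then apply the transfer (debit/credit not modeled)."""
    first, second = sorted((debit_acct, credit_acct))
    lock_row(first)
    lock_row(second)
    # ... debit debit_acct, credit credit_acct, COMMIT ...

# Record the lock order each transaction would use.
order = []
transfer(order.append, 2002, 1001)   # A: debit 2002, credit 1001
transfer(order.append, 1001, 2002)   # B: debit 1001, credit 2002
print(order)  # both transactions lock 1001 before 2002
```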
26.7 Lock Waits and Timeout Tuning
The Cost of Waiting
Every millisecond a transaction spends waiting for a lock is a millisecond of wasted response time. In a high-throughput OLTP system, lock waits are often the dominant contributor to response time degradation.
Configuring Timeouts
z/OS:
-- DSNZPARM parameter
IRLMRWT = 30 -- Wait up to 30 seconds for a lock
-- before timing out
A common z/OS practice is to keep IRLMRWT short (10-30 seconds) so that OLTP transactions fail fast rather than queue behind a blocker. Note that IRLMRWT is a subsystem-wide setting; the ACQUIRE/RELEASE bind options control when table and tablespace locks are acquired and released, not how long a transaction waits for them.
LUW:
-- Database-level timeout
db2 update db cfg for MERIDIANDB using LOCKTIMEOUT 30
-- Session-level override
SET CURRENT LOCK TIMEOUT = 10;
-- Wait forever (dangerous!)
SET CURRENT LOCK TIMEOUT = -1;
-- No wait (immediate timeout if lock unavailable)
SET CURRENT LOCK TIMEOUT = 0;
Monitoring Lock Waits
z/OS:
-- Display current lock holders and waiters
-DISPLAY DATABASE(MERIDIANDB) LOCKS
-- Accounting trace class 3 fields:
-- QXLOCK = Number of lock requests
-- QXLKWT = Number of times the transaction waited for a lock
-- QXLKWTM = Total lock wait time (microseconds)
LUW:
-- Find applications currently waiting for locks.
-- MON_GET_APPL_LOCKWAIT reports both the waiter and the holder,
-- so no self-join is needed.
SELECT
REQ.REQ_APPLICATION_HANDLE AS WAITER_HANDLE,
REQ.HLD_APPLICATION_HANDLE AS HOLDER_HANDLE,
REQ.LOCK_OBJECT_TYPE,
REQ.LOCK_MODE_REQUESTED,
REQ.LOCK_WAIT_ELAPSED_TIME
FROM TABLE(MON_GET_APPL_LOCKWAIT(NULL, -2)) AS REQ
WHERE REQ.LOCK_WAIT_ELAPSED_TIME > 5000; -- Waiting > 5 seconds
-- Snapshot of lock waits (simpler approach)
SELECT APPLICATION_HANDLE, LOCK_WAITS, LOCK_WAIT_TIME,
LOCK_ESCALS, DEADLOCKS, LOCK_TIMEOUTS
FROM TABLE(MON_GET_CONNECTION(NULL, -2)) AS T
WHERE LOCK_WAITS > 0
ORDER BY LOCK_WAIT_TIME DESC;
Identifying Lock Holders
When a transaction is waiting, you need to know who holds the lock:
z/OS:
-- Display lock detail
-DISPLAY DATABASE(MERIDIANDB) SPACENAM(TS_ACCOUNTS) LOCKS DETAIL
LUW:
-- Detailed lock wait information
SELECT
REQ.AGENT_ID AS WAITER,
REQ.LOCK_MODE_REQUESTED,
REQ.LOCK_OBJECT_TYPE,
REQ.TABSCHEMA,
REQ.TABNAME,
HLD.AGENT_ID AS HOLDER,
HLD.LOCK_MODE AS HOLDER_LOCK_MODE
FROM TABLE(MON_GET_APPL_LOCKWAIT(NULL, -2)) AS REQ
INNER JOIN TABLE(MON_GET_LOCKS(NULL, -2)) AS HLD
ON REQ.LOCK_NAME = HLD.LOCK_NAME
AND HLD.LOCK_STATUS = 'G'; -- Granted locks only
26.8 Optimistic vs. Pessimistic Concurrency
Pessimistic Concurrency: Lock It Now
Pessimistic concurrency assumes that conflicts are likely. The application locks the data as soon as it reads it, holding the lock until the transaction completes.
Pattern: SELECT FOR UPDATE
-- Pessimistic: Lock the row immediately
SELECT balance INTO :balance
FROM accounts
WHERE acct_id = 1001
FOR UPDATE;
-- Process the balance...
-- The row is U-locked: other transactions may still read it with S locks,
-- but none may acquire a U or X lock until we commit.
UPDATE accounts
SET balance = :balance - 500
WHERE acct_id = 1001;
COMMIT;
Pros: Simple to understand. Guarantees no conflicts. Data integrity is assured by the database lock manager.
Cons: Locks are held for the duration of processing. If processing involves a user think-time (web form, API call, approval workflow), locks can be held for seconds or minutes, devastating concurrency.
Optimistic Concurrency: Check Before You Write
Optimistic concurrency assumes conflicts are rare. The application reads data without holding locks (or releases them immediately), processes the data, and then checks whether the data has changed before writing.
Pattern: Timestamp-Based Optimistic Locking
-- Step 1: Read the data (no FOR UPDATE)
SELECT balance, last_modified
INTO :balance, :last_mod
FROM accounts
WHERE acct_id = 1001
WITH CS;
-- Under CS, any lock taken for the read is released as soon as the statement completes
-- Step 2: Process (potentially long time, even user think-time)
-- No lock is held during this phase!
-- Step 3: Update with version check
UPDATE accounts
SET balance = :balance - 500,
last_modified = CURRENT TIMESTAMP
WHERE acct_id = 1001
AND last_modified = :last_mod;
-- Step 4: Check if update succeeded
IF SQLERRD(3) = 0 THEN -- No rows updated
-- Someone else modified the row!
-- Re-read and retry, or notify the user
END IF
COMMIT;
Pattern: Row Version Number
-- Add a version column to the table
ALTER TABLE accounts ADD COLUMN version_num INTEGER DEFAULT 0;
-- Read
SELECT balance, version_num INTO :balance, :version
FROM accounts WHERE acct_id = 1001;
-- Process...
-- Update with version check
UPDATE accounts
SET balance = :new_balance,
version_num = :version + 1
WHERE acct_id = 1001
AND version_num = :version;
-- Check row count to detect conflict
Pros: Locks are held for the minimum possible time. Excellent for web applications, APIs, and any scenario with user think-time.
Cons: Requires application logic to handle conflicts. In high-contention scenarios (many users updating the same row), the retry rate can become unacceptable.
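The version-number pattern can be exercised end-to-end against any SQL engine. Below is a minimal sketch in Python using the standard sqlite3 module as a stand-in for DB2; the table layout mirrors the chapter's examples, and the helper names (read_account, write_if_unchanged) are illustrative, not part of any DB2 API:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, "
             "balance INTEGER, version_num INTEGER NOT NULL DEFAULT 0)")
conn.execute("INSERT INTO accounts (acct_id, balance) VALUES (1001, 10000)")
conn.commit()

def read_account(conn, acct_id):
    # Step 1: read; no lock is held afterwards (CS-like behavior)
    return conn.execute(
        "SELECT balance, version_num FROM accounts WHERE acct_id = ?",
        (acct_id,)).fetchone()

def write_if_unchanged(conn, acct_id, new_balance, expected_version):
    # Step 3: write only if the row is still at the version we read
    cur = conn.execute(
        "UPDATE accounts SET balance = ?, version_num = ? "
        "WHERE acct_id = ? AND version_num = ?",
        (new_balance, expected_version + 1, acct_id, expected_version))
    conn.commit()
    return cur.rowcount == 1    # 0 rows: someone else got there first

# Two sessions read the same row during their "think-time"...
bal_a, ver_a = read_account(conn, 1001)
bal_b, ver_b = read_account(conn, 1001)
# ...session A's write succeeds and bumps the version...
assert write_if_unchanged(conn, 1001, bal_a - 500, ver_a)
# ...so session B's stale write is detected and rejected (no lost update).
assert not write_if_unchanged(conn, 1001, bal_b - 300, ver_b)
```

The key property: the conflict is detected by the version_num predicate in the WHERE clause, not by a lock held across the think-time.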
When to Use Each
| Scenario | Approach | Reasoning |
|---|---|---|
| ATM withdrawal | Pessimistic | Short transaction, high contention on the same account is unlikely, correctness is critical |
| Web-based account profile update | Optimistic | Long think-time, low contention (each user updates their own profile) |
| Batch fund transfer processing | Pessimistic | No think-time, COMMIT every N rows |
| Auction bidding system | Optimistic | Bids are conditional updates (accept only if higher than the current bid); a failed version check means the bidder was simply outbid, a valid business outcome requiring no retry. Pessimistic locking would serialize all bids |
26.9 SKIP LOCKED DATA
The Queue Processing Pattern
Consider a work queue table:
CREATE TABLE work_queue (
item_id INTEGER NOT NULL PRIMARY KEY,
status CHAR(1) NOT NULL DEFAULT 'N', -- N=new, P=processing, D=done
payload VARCHAR(4000),
worker_id VARCHAR(50),
created_ts TIMESTAMP DEFAULT CURRENT TIMESTAMP
);
Multiple worker threads need to pick up the next available item. Without SKIP LOCKED, workers would contend on the same "next" row:
-- Without SKIP LOCKED: Worker 1 and Worker 2 both try to get the
-- same first row, and one waits for the other.
-- (ORDER BY makes a FOR UPDATE cursor read-only, so the update-lock
-- clause is used to lock the fetched row instead.)
SELECT item_id, payload
FROM work_queue
WHERE status = 'N'
ORDER BY created_ts
FETCH FIRST 1 ROW ONLY
WITH RS USE AND KEEP UPDATE LOCKS;
With SKIP LOCKED DATA, each worker automatically skips rows that are locked by other workers:
-- With SKIP LOCKED: Each worker gets a different row
SELECT item_id, payload
FROM work_queue
WHERE status = 'N'
ORDER BY created_ts
FETCH FIRST 1 ROW ONLY
WITH RS USE AND KEEP UPDATE LOCKS
SKIP LOCKED DATA;
-- Update the row to mark it as being processed
UPDATE work_queue
SET status = 'P', worker_id = :my_worker_id
WHERE item_id = :item_id;
This transforms a serialized queue into a parallel work distribution system with minimal lock contention.
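On platforms without SKIP LOCKED DATA, the same work-distribution effect is often approximated with an atomic compare-and-set claim: each worker selects a candidate, then claims it with an UPDATE conditioned on the status it read, moving on if another worker won the race. A sketch in Python with sqlite3; the claim_next helper and the simplified schema are inventions for this illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE work_queue (item_id INTEGER PRIMARY KEY, "
             "status TEXT NOT NULL DEFAULT 'N', payload TEXT, worker_id TEXT)")
conn.executemany("INSERT INTO work_queue (item_id, payload) VALUES (?, ?)",
                 [(i, "job-%d" % i) for i in range(1, 6)])
conn.commit()

def claim_next(conn, worker_id):
    """Claim the oldest available item. The conditional UPDATE plays the
    role SKIP LOCKED DATA plays in DB2: an item already claimed by another
    worker fails the status check and is simply passed over."""
    while True:
        row = conn.execute("SELECT item_id FROM work_queue WHERE status = 'N' "
                           "ORDER BY item_id LIMIT 1").fetchone()
        if row is None:
            return None                       # queue drained
        cur = conn.execute("UPDATE work_queue SET status = 'P', worker_id = ? "
                           "WHERE item_id = ? AND status = 'N'",
                           (worker_id, row[0]))
        conn.commit()
        if cur.rowcount == 1:
            return row[0]                     # claim succeeded
        # Lost the race for this item; loop and try the next one.

first = claim_next(conn, "worker-1")
second = claim_next(conn, "worker-2")
assert first == 1 and second == 2             # distinct items, oldest first
```

Note the trade-off: this approximation costs an extra round trip whenever a race is lost, whereas SKIP LOCKED DATA resolves the skip inside the engine.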
SKIP LOCKED DATA Syntax and Behavior
z/OS (DB2 9 for z/OS and later):
SELECT ... FROM table
WHERE ...
FOR UPDATE
SKIP LOCKED DATA;
-- Can also be used with READ ONLY:
SELECT ... FROM table
WHERE ...
FOR READ ONLY
SKIP LOCKED DATA;
LUW:
-- DB2 LUW does not support the SKIP LOCKED DATA clause. Comparable
-- lock-avoidance behavior is enabled through registry variables,
-- which take effect for CS and RS scans:
db2set DB2_SKIPINSERTED=ON    -- scans skip uncommitted inserted rows
db2set DB2_SKIPDELETED=ON     -- scans skip uncommitted deleted rows/keys
db2set DB2_EVALUNCOMMITTED=ON -- predicates evaluated on uncommitted data
Important behaviors:
- Skipped rows are silently omitted from the result set. There is no indication of how many rows were skipped.
- Aggregate functions (COUNT, SUM) produce results that exclude skipped rows. This is by design but can be confusing.
- SKIP LOCKED DATA interacts with isolation levels: under RR, its use is restricted because RR requires range-level stability.
Use Cases Beyond Queues
- Parallel batch processing: Multiple batch threads each grab a chunk of work and process it independently.
- Load balancing: Distribute work across application servers without a central coordinator.
- Retry queues: Failed items remain locked by the retry processor while other workers process new items.
SKIP LOCKED DATA and Aggregate Gotchas
A subtlety that catches many developers: when SKIP LOCKED DATA is used with aggregate functions, the skipped rows are excluded from the aggregate calculation. Consider:
SELECT COUNT(*) FROM work_queue WHERE status = 'N' SKIP LOCKED DATA;
If there are 100 new items and 5 are currently locked by active workers, the COUNT returns 95, not 100. This is correct behavior (you are counting available items), but if you need the total count including items being processed, you must run a separate query without SKIP LOCKED DATA.
Similarly, SUM and AVG calculations will exclude skipped rows. If you are computing a total dollar amount from a queue of pending payments, the total will be lower than expected whenever workers are actively processing items. Design your monitoring queries and business logic with this behavior in mind.
SKIP LOCKED DATA with ORDER BY
When SKIP LOCKED DATA is combined with ORDER BY and FETCH FIRST N ROWS, the behavior is well-defined but may not be intuitive. DB2 evaluates rows in the order specified by ORDER BY, skips any that are locked, and returns the first N unlocked rows. This means the "first" row returned may not be the first row in the ordering — it is the first unlocked row.
For queue processing, this is exactly the desired behavior: each worker gets the oldest available (unlocked) item. But for reporting queries, this can produce surprising results. As a rule, use SKIP LOCKED DATA only in work-distribution patterns, not in reporting or business logic queries.
26.10 Locking in Batch vs. OLTP
The Batch-OLTP Conflict
One of the most common production problems in DB2 shops is the conflict between batch jobs and OLTP transactions. Batch jobs tend to:
- Read or update large numbers of rows
- Hold locks for extended periods
- Trigger lock escalation
- Consume buffer pool space, evicting OLTP working sets
OLTP transactions tend to:
- Touch a small number of rows
- Require fast response times
- Be highly sensitive to lock contention
When a batch job escalates a table lock and an OLTP transaction needs to update a single row in that table, the OLTP transaction waits — and so does every other OLTP transaction, and every one behind those, cascading into a throughput collapse.
Strategies for Peaceful Coexistence
1. COMMIT Frequency in Batch
The single most important batch tuning parameter is COMMIT frequency. A batch job that commits every 1,000 rows holds at most 1,000 locks. A batch job that commits only at the end can hold millions.
-- Batch processing with periodic COMMIT
DECLARE batch_cursor CURSOR WITH HOLD FOR
SELECT acct_id, balance FROM accounts
WHERE last_interest_date < CURRENT DATE - 30 DAYS
FOR UPDATE OF balance;
OPEN batch_cursor;
SET :commit_count = 0;
FETCH_LOOP:
FETCH NEXT FROM batch_cursor INTO :acct_id, :balance;
IF SQLCODE = 100 GOTO DONE;
UPDATE accounts
SET balance = :balance * 1.005,
last_interest_date = CURRENT DATE
WHERE CURRENT OF batch_cursor;
SET :commit_count = :commit_count + 1;
IF MOD(:commit_count, 1000) = 0 THEN
COMMIT;
-- The cursor must be declared WITH HOLD to remain open across
-- COMMIT; otherwise it closes and must be re-opened and repositioned
END IF;
GOTO FETCH_LOOP;
DONE:
COMMIT;
CLOSE batch_cursor;
2. Partition-Level Locking (z/OS)
On z/OS with partitioned tablespaces, lock escalation occurs at the partition level. A batch job that processes one partition at a time will only escalate locks on that partition, leaving other partitions available for OLTP.
-- Process one partition at a time
-- Partition 1: accounts with acct_id 1-100000
-- Partition 2: accounts with acct_id 100001-200000
-- etc.
-- Batch processes partition 3
SELECT * FROM accounts
WHERE acct_id BETWEEN 200001 AND 300000
FOR UPDATE;
-- Escalation, if it occurs, only locks partition 3
3. Utility Locking (z/OS)
DB2 utilities on z/OS (REORG, RUNSTATS, COPY) acquire their own locks. Understanding utility locking is critical for maintaining availability:
- REORG SHRLEVEL NONE: Exclusive access. No other access during REORG.
- REORG SHRLEVEL REFERENCE: Read-only access allowed during REORG.
- REORG SHRLEVEL CHANGE: Read and write access allowed during REORG (online REORG).
-- Online REORG that allows concurrent access
REORG TABLESPACE MERIDIANDB.TS_ACCOUNTS
SHRLEVEL CHANGE
LOG YES
KEEPDICTIONARY;
4. Batch Window Design
At Meridian National Bank, the batch window runs from 11 PM to 5 AM. During this window:
- Interest calculation batch jobs run against the ACCOUNTS table
- Statement generation batch jobs run against the TRANSACTIONS table
- End-of-day reconciliation runs against multiple tables
Concurrency strategy:
- Interest calculation uses CS isolation with a COMMIT every 5,000 rows and processes one branch at a time (partition-based)
- Statement generation uses CS isolation, with UR for read-only lookups against reference tables
- Reconciliation runs after the other batches and uses RR isolation for its final verification pass (acceptable because it runs when OLTP volume is lowest)
26.11 Meridian Bank Concurrency Design
Transaction Inventory
Before choosing isolation levels and locking strategies, we inventory every transaction type:
| Transaction | Frequency | Tables Touched | Sensitivity | Recommended Isolation |
|---|---|---|---|---|
| Balance Inquiry | 50,000/hour | ACCOUNTS (read) | Low — display only | CS |
| Funds Transfer | 10,000/hour | ACCOUNTS (update x2), TRANSACTIONS (insert) | High — money movement | RS |
| Account Open | 500/hour | ACCOUNTS (insert), CUSTOMERS (update) | Medium | CS |
| Loan Payment | 2,000/hour | LOANS (update), ACCOUNTS (update), TRANSACTIONS (insert) | High — money movement | RS |
| Dashboard (internal) | 100/hour | Various (read-only) | None — approximate OK | UR |
| Interest Calculation (batch) | 1/day | ACCOUNTS (update all) | High — but batch window | CS + frequent COMMIT |
| Statement Generation (batch) | 1/day | TRANSACTIONS (read), ACCOUNTS (read) | Medium | CS |
| Month-End Reconciliation | 1/month | All financial tables | Very High | RR (final pass only) |
Deadlock Prevention Strategy
Rule 1: Consistent Lock Ordering. All transactions that touch multiple tables must lock them in this order: CUSTOMERS → ACCOUNTS → LOANS → TRANSACTIONS. This is enforced at the application framework level.
Rule 2: Funds Transfer Pattern. To prevent deadlocks when transferring between two accounts, always lock the lower account number first:
-- Application code for funds transfer
-- (SELECT INTO takes its lock via the lock-clause, not FOR UPDATE)
IF :from_acct < :to_acct THEN
-- Lock source first (lower number)
SELECT balance INTO :from_balance FROM accounts
WHERE acct_id = :from_acct
WITH RS USE AND KEEP UPDATE LOCKS;
SELECT balance INTO :to_balance FROM accounts
WHERE acct_id = :to_acct
WITH RS USE AND KEEP UPDATE LOCKS;
ELSE
-- Lock destination first (lower number)
SELECT balance INTO :to_balance FROM accounts
WHERE acct_id = :to_acct
WITH RS USE AND KEEP UPDATE LOCKS;
SELECT balance INTO :from_balance FROM accounts
WHERE acct_id = :from_acct
WITH RS USE AND KEEP UPDATE LOCKS;
END IF;
Rule 3: Retry Logic Mandatory. All application code includes -911 retry logic with a maximum of 3 retries and exponential backoff. This is implemented in the database access layer, not in individual transactions.
Batch Window Locking Plan
23:00 - 01:00 Interest Calculation
- Process by partition (branch ranges)
- COMMIT every 5,000 rows
- CS isolation
- OLTP continues normally
01:00 - 03:00 Statement Generation
- CS isolation with UR on reference tables
- Read-only — no lock conflicts with OLTP
- OLTP continues normally
03:00 - 05:00 End-of-Day Reconciliation
- Phase 1 (03:00-04:30): RS isolation, cross-check balances
- Phase 2 (04:30-05:00): RR isolation, final verification
* RR acceptable because OLTP volume < 100 TPS at this hour
- Any discrepancy triggers alert, manual review before business day
Monitoring and Alerting
-- LUW: Alert if lock escalation exceeds threshold
-- (Run from monitoring scheduled task)
SELECT LOCK_ESCALS
FROM TABLE(MON_GET_DATABASE(-2)) AS T;
-- Alert if LOCK_ESCALS increased by > 10 since last check
-- LUW: Alert if deadlocks exceed threshold
SELECT DEADLOCKS
FROM TABLE(MON_GET_DATABASE(-2)) AS T;
-- Alert if DEADLOCKS increased by > 5 in 1 hour
-- z/OS: Monitor via IFCID trace records
-- IFCID 0172 = Deadlock
-- IFCID 0196 = Lock timeout
-- IFCID 0337 = Lock escalation
Spaced Review: Connecting to Earlier Chapters
From Chapter 9 (Transaction Management)
Question: In Chapter 9, you learned that COMMIT releases locks and makes changes permanent. Given what you now know about isolation levels, explain why a transaction running under RS isolation that reads 10,000 rows and commits every 100 rows might not achieve the "repeatable read" guarantee it expects.
Answer: RS holds read locks until COMMIT. If the transaction commits every 100 rows, it releases its locks on the first 100 rows at the first COMMIT. Another transaction can then modify those rows. When the first transaction continues processing (after COMMIT), it no longer has the stability guarantee for the already-committed rows. The frequent COMMIT pattern breaks the RS guarantee. This is why RS is best used for short transactions, not long batch processes with intermediate COMMITs.
From Chapter 17 (Indexing Strategies)
Question: How do indexes affect locking? If a query uses a tablespace scan instead of an index access, what happens to locking under RR isolation?
Answer: Under RR, DB2 locks every row it examines, not just rows that qualify. A tablespace scan examines every row in the table, so RR isolation on a tablespace scan locks the entire table — effectively a table lock via row locks, which may then escalate. With an index access, DB2 only examines qualifying rows (and perhaps a few extra), so far fewer locks are acquired. Good indexing reduces not only I/O but also lock contention under high isolation levels.
From Chapter 22 (EXPLAIN and Access Paths)
Question: You are diagnosing lock contention on a table. The application uses RS isolation, and you see frequent lock escalation. You run EXPLAIN on the problematic query and see a tablespace scan. How do these findings connect?
Answer: The tablespace scan under RS means DB2 acquires share locks on every qualifying row — potentially thousands or millions. These locks accumulate until COMMIT because RS holds them. When the count exceeds the escalation threshold, DB2 escalates to a table lock, blocking all writers. The fix is likely to create an index that allows DB2 to access only the needed rows, reducing the lock count below the escalation threshold. This connects access path optimization (Chapter 22) directly to concurrency behavior (this chapter).
Common Locking Anti-Patterns
Before we summarize, it is worth cataloging the locking mistakes that appear repeatedly in production DB2 systems. Recognizing these patterns can save hours of debugging.
Anti-Pattern 1: The "SELECT *, Then Filter in the Application" Pattern
-- Application reads ALL accounts, then filters in Java/COBOL
SELECT * FROM accounts WITH RS;
-- Application loops through results, processing only branch 100
Under RS, this query locks every qualifying row in the ACCOUNTS table until COMMIT — potentially millions of rows — when the application only needs a few hundred. The fix is to push the filter into the SQL:
SELECT * FROM accounts WHERE branch_id = 100 WITH RS;
This locks only the qualifying rows, dramatically reducing the lock footprint. This anti-pattern is especially dangerous because it appears to work correctly in development (low volume, no contention) but causes lock escalation and throughput collapse in production.
Anti-Pattern 2: The "Open Cursor, Go to Lunch" Pattern
An interactive application opens a cursor with FOR UPDATE, displays the data to the user, and waits for the user to make changes. The cursor (and its locks) remain open for minutes or hours. Every other transaction that needs those rows waits.
The fix is optimistic locking: read the data (releasing the lock), let the user edit, and use a version check when writing.
Anti-Pattern 3: The "RR for Everything" Pattern
A well-meaning developer sets the application's default isolation to RR "for safety." Under RR, every query locks all examined rows (including non-qualifying rows from index scans and tablespace scans) until COMMIT. This creates massive lock footprints, frequent escalation, and deadlocks.
The fix is to use CS as the default and RS only for transactions that require read stability. RR should be reserved for specific regulatory or audit requirements.
Anti-Pattern 4: The "Fire and Forget" Batch Pattern
A batch job processes millions of rows with a single COMMIT at the end. If the job fails at row 999,999, all 999,999 rows of work are rolled back. If the job runs during OLTP hours, it accumulates locks until escalation. If another batch job needs the same table, it waits for hours.
The fix is COMMIT every N rows (typically 1,000-10,000) with a restartable design that tracks progress and can resume from the last committed point.
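The restartable design can be sketched concretely: persist a checkpoint of the last committed key in the same unit of work as the data changes, so a crashed run resumes from the checkpoint instead of reprocessing (or rolling back) everything. A small Python/sqlite3 illustration; the batch_checkpoint table and run_interest_batch helper are inventions for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (acct_id INTEGER PRIMARY KEY, balance REAL);
    CREATE TABLE batch_checkpoint (job TEXT PRIMARY KEY, last_acct INTEGER);
""")
conn.executemany("INSERT INTO accounts VALUES (?, 1000.0)",
                 [(i,) for i in range(1, 10001)])
conn.commit()

def run_interest_batch(conn, job="interest", commit_every=1000):
    row = conn.execute("SELECT last_acct FROM batch_checkpoint WHERE job = ?",
                       (job,)).fetchone()
    last = row[0] if row else 0               # resume point after a crash
    acct_id = last
    pending = 0
    for (acct_id,) in conn.execute(
            "SELECT acct_id FROM accounts WHERE acct_id > ? ORDER BY acct_id",
            (last,)).fetchall():
        conn.execute("UPDATE accounts SET balance = balance * 1.005 "
                     "WHERE acct_id = ?", (acct_id,))
        pending += 1
        if pending % commit_every == 0:
            # Checkpoint and data changes commit atomically together.
            conn.execute("INSERT OR REPLACE INTO batch_checkpoint VALUES (?, ?)",
                         (job, acct_id))
            conn.commit()                     # bounds both locks and rollback
    conn.execute("INSERT OR REPLACE INTO batch_checkpoint VALUES (?, ?)",
                 (job, acct_id))
    conn.commit()

run_interest_batch(conn)
```

Because the checkpoint rides in the same transaction as the last batch of updates, a failure can never leave the checkpoint ahead of (or behind) the committed data.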
Anti-Pattern 5: The "Ignore -911" Pattern
An application does not handle SQLCODE -911 (deadlock/timeout). When the error occurs, the application crashes or returns a generic error to the user. The user retries manually, adding load to an already-contended system.
The fix is automatic retry with exponential backoff, built into the data access layer so every transaction benefits without individual coding.
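A data-access-layer retry wrapper of the kind described above might look like the following sketch. The DB2Error class and with_retry helper are hypothetical names for illustration; only the shape of the logic — bounded retries with exponential backoff and jitter on -911 — is the point:

```python
import random
import time

SQLCODE_DEADLOCK_OR_TIMEOUT = -911      # DB2: unit of work rolled back

class DB2Error(Exception):
    """Stand-in for whatever exception your driver raises."""
    def __init__(self, sqlcode):
        super().__init__(sqlcode)
        self.sqlcode = sqlcode

def with_retry(txn, max_retries=3, base_delay=0.05):
    """Run txn(); on -911, back off exponentially (with jitter, so
    colliding transactions do not retry in lockstep) and try again."""
    for attempt in range(max_retries + 1):
        try:
            return txn()
        except DB2Error as e:
            if e.sqlcode != SQLCODE_DEADLOCK_OR_TIMEOUT or attempt == max_retries:
                raise                    # not retryable, or retries exhausted
            time.sleep(base_delay * (2 ** attempt) * (1 + random.random()))

# Usage: a transaction that deadlocks twice, then commits on the third try.
calls = {"n": 0}
def flaky_txn():
    calls["n"] += 1
    if calls["n"] < 3:
        raise DB2Error(-911)
    return "committed"

assert with_retry(flaky_txn) == "committed"
```

Because the wrapper lives in the access layer, every transaction gets the retry behavior without per-transaction coding, as Rule 3 of the Meridian design requires.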
Summary
Locking is the mechanism by which DB2 ensures data consistency in a multi-user environment. The four isolation levels — UR, CS, RS, and RR — provide progressively stronger consistency guarantees at progressively higher concurrency costs. CS is the right choice for most OLTP transactions. RS is appropriate when read stability is required for correctness. RR should be reserved for scenarios requiring absolute consistency. UR is valuable for read-only queries that must not interfere with OLTP traffic.
Lock escalation, deadlocks, and lock waits are the three primary locking problems in production systems. Preventing escalation requires frequent COMMITs, appropriate isolation levels, and adequate lock memory. Preventing deadlocks requires consistent lock ordering and short transactions. Managing lock waits requires proper timeout configuration and monitoring.
The combination of lock avoidance (z/OS) and Currently Committed semantics (LUW) represents DB2's evolution toward reducing lock overhead without sacrificing consistency. These features, combined with SKIP LOCKED DATA for queue processing patterns, give DBAs a rich toolkit for designing high-concurrency systems.
In the next chapter, we will build on this foundation by developing a systematic methodology for diagnosing performance problems — including lock-bound problems — in production DB2 systems.