Case Study 02: Concurrency Design for High-Volume OLTP
Background
PayStream Financial is a payment processing company that handles credit card authorizations for 15,000 merchants. They are migrating from a legacy mainframe system to DB2 LUW and need to design a concurrency strategy that supports their peak throughput of 8,000 authorization requests per second with a 99th percentile response time requirement of 100ms.
The System
Transaction Types
| Transaction | Volume | Tables | Operation |
|---|---|---|---|
| Authorization Request | 8,000/sec peak | MERCHANT_ACCOUNTS, AUTH_LOG, DAILY_LIMITS | Read merchant, validate, log auth, update daily total |
| Settlement (batch) | 1/day | AUTH_LOG, MERCHANT_ACCOUNTS, SETTLEMENT_RECORDS | Read all authorized transactions, compute settlement amounts, create settlement records |
| Chargeback | 50/hour | AUTH_LOG, CHARGEBACKS, MERCHANT_ACCOUNTS | Create chargeback record, adjust merchant balance |
| Merchant Balance Inquiry | 200/hour | MERCHANT_ACCOUNTS | Read-only |
| Daily Limit Reset (batch) | 1/day | DAILY_LIMITS | Reset all daily spending limits to zero |
Table Designs
CREATE TABLE merchant_accounts (
merchant_id INTEGER NOT NULL PRIMARY KEY,
merchant_name VARCHAR(200),
balance DECIMAL(15,2) NOT NULL DEFAULT 0.00,
status CHAR(1) NOT NULL DEFAULT 'A',
risk_level CHAR(1) NOT NULL DEFAULT 'L',
last_modified TIMESTAMP DEFAULT CURRENT TIMESTAMP,
version_num INTEGER DEFAULT 0
);
CREATE TABLE auth_log (
auth_id BIGINT GENERATED ALWAYS AS IDENTITY,
merchant_id INTEGER NOT NULL,
card_hash CHAR(64) NOT NULL,
amount DECIMAL(10,2) NOT NULL,
auth_code CHAR(6),
status CHAR(1) NOT NULL, -- A=approved, D=declined, P=pending
auth_timestamp TIMESTAMP DEFAULT CURRENT TIMESTAMP,
settlement_id INTEGER,
PRIMARY KEY (auth_id)
)
PARTITION BY RANGE (auth_timestamp) (
PARTITION p_archive STARTING MINVALUE ENDING '2025-01-01' EXCLUSIVE,
PARTITION p_prev STARTING '2025-01-01' ENDING '2025-01-02' EXCLUSIVE,
PARTITION p_current STARTING '2025-01-02' ENDING '2025-01-03' EXCLUSIVE
);
-- DB2 requires literal partition boundaries (special registers such as
-- CURRENT DATE are not allowed); the dates above are illustrative. A
-- nightly job rolls partitions forward with ALTER TABLE ... ADD PARTITION
-- and ALTER TABLE ... DETACH PARTITION.
CREATE TABLE daily_limits (
merchant_id INTEGER NOT NULL,
limit_date DATE NOT NULL,
daily_total DECIMAL(15,2) NOT NULL DEFAULT 0.00,
daily_max DECIMAL(15,2) NOT NULL,
trans_count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (merchant_id, limit_date)
);
The Concurrency Challenges
Challenge 1: Hot Row on DAILY_LIMITS
The DAILY_LIMITS table has one row per merchant per day. A busy merchant might process 1,000 authorizations per second, all updating the same DAILY_LIMITS row. Under pessimistic locking, these 1,000 transactions would serialize on that single row.
Analysis: If each authorization held its X lock for only 1ms, the theoretical maximum throughput for a single merchant would be 1,000 authorizations per second. But the X lock is held until commit, so the actual hold time includes the network round-trip, commit processing, and the log write — realistically 5-10ms. This limits a single merchant to 100-200 authorizations per second.
Solution: Partitioned Counter
Instead of a single DAILY_LIMITS row per merchant, create N rows (one per "slot"):
CREATE TABLE daily_limits_partitioned (
merchant_id INTEGER NOT NULL,
limit_date DATE NOT NULL,
slot_id SMALLINT NOT NULL, -- 0-9
daily_total DECIMAL(15,2) NOT NULL DEFAULT 0.00,
trans_count INTEGER NOT NULL DEFAULT 0,
PRIMARY KEY (merchant_id, limit_date, slot_id)
);
-- Initialize 10 slots per merchant per day
INSERT INTO daily_limits_partitioned
SELECT merchant_id, CURRENT DATE, slot_id, 0.00, 0
FROM merchant_accounts,
(VALUES 0,1,2,3,4,5,6,7,8,9) AS slots(slot_id);
Each authorization randomly picks a slot:
UPDATE daily_limits_partitioned
SET daily_total = daily_total + :auth_amount,
trans_count = trans_count + 1
WHERE merchant_id = :merchant_id
AND limit_date = CURRENT DATE
AND slot_id = MOD(:random_number, 10);
To check the daily total:
SELECT SUM(daily_total) AS total, SUM(trans_count) AS count
FROM daily_limits_partitioned
WHERE merchant_id = :merchant_id
AND limit_date = CURRENT DATE;
With 10 slots, X lock contention on any single row is reduced by roughly a factor of 10, supporting 1,000-2,000 authorizations per second per merchant.
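If the merchant's cap must also be enforced, the check can run just before the slot update. A sketch — the :daily_max host variable is an assumption, since the partitioned table no longer carries the cap:

```sql
-- Project the total as if this authorization were approved. Because the
-- SUM and the slot UPDATE are separate statements, concurrent slots can
-- still overshooot the cap slightly; the business decides the tolerance.
SELECT SUM(daily_total) + :auth_amount AS projected_total
FROM daily_limits_partitioned
WHERE merchant_id = :merchant_id
  AND limit_date = CURRENT DATE;
-- Application logic: if projected_total > :daily_max, decline;
-- otherwise run the slot UPDATE shown above and commit.
```

This is the accuracy-for-concurrency trade-off revisited in the discussion questions: the window between the SUM and the UPDATE is where a limit breach can slip through.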
Challenge 2: Authorization vs. Settlement Conflict
The settlement batch job reads millions of AUTH_LOG rows to compute daily settlements. If it uses RS isolation, it holds S locks on every row it reads; with millions of rows, lock escalation converts those row locks to a single table-level S lock, which conflicts with the IX locks that new authorizations need in order to insert into AUTH_LOG.
Solution: Time-Partitioned Processing
AUTH_LOG is partitioned by date. The settlement batch processes only yesterday's partition, while current authorizations write to today's partition. Because the two workloads touch disjoint partitions, they do not compete for the same row or page locks.
-- Settlement reads yesterday's data (no conflict with today's inserts)
SELECT merchant_id, SUM(amount) AS settlement_amount, COUNT(*) AS auth_count
FROM auth_log
WHERE auth_timestamp >= CURRENT DATE - 1 DAY
AND auth_timestamp < CURRENT DATE
AND status = 'A'
GROUP BY merchant_id
WITH CS; -- CS is sufficient because data is frozen (yesterday's partition)
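To persist the results, the same aggregate can feed SETTLEMENT_RECORDS directly. The column names below are assumptions, since that table's DDL is not shown in this case study:

```sql
-- Settlement insert sketch (settlement_records columns assumed):
INSERT INTO settlement_records (merchant_id, settlement_date, amount, auth_count)
SELECT merchant_id, CURRENT DATE - 1 DAY, SUM(amount), COUNT(*)
FROM auth_log
WHERE auth_timestamp >= CURRENT DATE - 1 DAY
  AND auth_timestamp < CURRENT DATE
  AND status = 'A'
GROUP BY merchant_id;
```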
Challenge 3: Chargeback During Authorization
A chargeback adjusts a merchant's balance in MERCHANT_ACCOUNTS. Authorizations read (but do not update) MERCHANT_ACCOUNTS to check merchant status. Under CS, the chargeback's X lock blocks the authorization's S lock on the same row.
Solution: Currently Committed + Optimistic Validation
Enable Currently Committed so authorization reads see the previously committed merchant status without waiting for the chargeback to complete:
-- Database configuration
-- db2 update db cfg using CUR_COMMIT ON
-- Authorization reads merchant status (CC allows reading without waiting)
SELECT status, risk_level
FROM merchant_accounts
WHERE merchant_id = :merchant_id
WITH CS;
-- If merchant is in a chargeback, CC returns the pre-chargeback status
-- This is acceptable because the chargeback does not change the
-- merchant's active/inactive status in most cases
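The "optimistic validation" half of the solution can use the version_num column already defined on MERCHANT_ACCOUNTS. A sketch — the retry policy is an assumption, not part of the original design:

```sql
-- Chargeback: re-validate the version read earlier before applying the
-- balance adjustment. An update count of zero (SQLCODE +100) means another
-- transaction changed the row first; the application re-reads and retries.
UPDATE merchant_accounts
SET balance = balance - :chargeback_amount,
    version_num = version_num + 1,
    last_modified = CURRENT TIMESTAMP
WHERE merchant_id = :merchant_id
  AND version_num = :version_num_read;
```

The X lock is taken only at the moment of the update, so the window in which a chargeback can block an authorization shrinks to the single-statement update rather than the whole chargeback transaction.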
Challenge 4: Daily Limit Reset vs. Active Authorizations
At midnight, the daily limit reset batch sets all DAILY_LIMITS rows to zero. Active authorizations are still updating these rows.
Solution: Rolling Date Design
Instead of resetting rows, create new rows for each day. Authorizations always write to the current date's rows. There is no reset needed — old date rows are simply archived.
-- Authorization: writes to today's row (creates it if needed via MERGE)
MERGE INTO daily_limits_partitioned AS target
USING (VALUES (:merchant_id, CURRENT DATE, MOD(:random, 10), :amount))
AS source (merchant_id, limit_date, slot_id, amount)
ON target.merchant_id = source.merchant_id
AND target.limit_date = source.limit_date
AND target.slot_id = source.slot_id
WHEN MATCHED THEN UPDATE SET
daily_total = target.daily_total + source.amount,
trans_count = target.trans_count + 1
WHEN NOT MATCHED THEN INSERT
(merchant_id, limit_date, slot_id, daily_total, trans_count)
VALUES (source.merchant_id, source.limit_date, source.slot_id,
source.amount, 1);
No batch reset is needed. A nightly cleanup archives rows older than 7 days.
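The nightly cleanup can run in small committed batches so it never holds enough locks to escalate; deleting over a fullselect with FETCH FIRST is a common DB2 idiom for this (the batch size of 10,000 is an assumption):

```sql
-- Delete up to 10,000 old counter rows per batch. The application loops,
-- committing after each batch, until no rows qualify (SQLCODE +100).
DELETE FROM (
  SELECT 1 FROM daily_limits_partitioned
  WHERE limit_date < CURRENT DATE - 7 DAYS
  FETCH FIRST 10000 ROWS ONLY
);
```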
Complete Concurrency Design
Isolation Levels
| Transaction | Isolation | Rationale |
|---|---|---|
| Authorization Request | CS | Short transaction, CC handles read-write conflicts |
| Settlement (batch) | CS | Processes frozen (yesterday's) data; no conflict with OLTP |
| Chargeback | CS | Short transaction, updates a single merchant row |
| Merchant Balance Inquiry | UR | Read-only, approximate balance acceptable |
| Daily Limit Reset | N/A | Eliminated by rolling date design |
Lock Ordering
All transactions that touch multiple tables follow this order:
1. MERCHANT_ACCOUNTS (read or update)
2. DAILY_LIMITS_PARTITIONED (update)
3. AUTH_LOG (insert)
4. SETTLEMENT_RECORDS (insert)
5. CHARGEBACKS (insert)
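As a concrete illustration, the authorization path visits the shared tables in exactly this order. A sketch assembled from the earlier examples (host variables assumed; treat it as a sketch, not the production code):

```sql
-- 1. MERCHANT_ACCOUNTS: read status (Currently Committed avoids blocking)
SELECT status, risk_level
FROM merchant_accounts
WHERE merchant_id = :merchant_id
WITH CS;

-- 2. DAILY_LIMITS_PARTITIONED: update one randomly chosen slot
UPDATE daily_limits_partitioned
SET daily_total = daily_total + :auth_amount,
    trans_count = trans_count + 1
WHERE merchant_id = :merchant_id
  AND limit_date = CURRENT DATE
  AND slot_id = MOD(:random_number, 10);

-- 3. AUTH_LOG: record the authorization
INSERT INTO auth_log (merchant_id, card_hash, amount, auth_code, status)
VALUES (:merchant_id, :card_hash, :auth_amount, :auth_code, 'A');

COMMIT; -- releases all locks; total hold time stays in the low milliseconds
```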
Configuration
LOCKLIST = 32768 -- 128 MB for lock memory
MAXLOCKS = 10 -- 10% per application before escalation
LOCKTIMEOUT = 10 -- 10 seconds (short for OLTP)
DLCHKTIME = 5000 -- 5-second deadlock detection
CUR_COMMIT = ON -- Enable Currently Committed
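These parameters would be applied from the DB2 command line, in the same style as the CUR_COMMIT example above; the database name PAYDB is a placeholder:

```sql
-- db2 update db cfg for PAYDB using LOCKLIST 32768 MAXLOCKS 10
-- db2 update db cfg for PAYDB using LOCKTIMEOUT 10 DLCHKTIME 5000
-- db2 update db cfg for PAYDB using CUR_COMMIT ON
-- (LOCKLIST is in 4 KB pages: 32768 pages = 128 MB;
--  DLCHKTIME is in milliseconds: 5000 = 5 seconds)
```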
Monitoring Thresholds
| Metric | Warning | Critical | Action |
|---|---|---|---|
| Deadlocks/hour | > 10 | > 50 | Investigate lock ordering violation |
| Lock timeouts/hour | > 20 | > 100 | Find long-running transaction |
| Lock escalations/hour | > 5 | > 20 | Review batch commit frequency |
| Avg lock wait (ms) | > 5 | > 20 | Identify contention source |
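These counters can be sampled from the DB2 monitoring table functions; a sketch using MON_GET_WORKLOAD, with hourly rates computed by differencing successive samples (lock_wait_time is reported in milliseconds):

```sql
-- Snapshot the lock metrics across all workloads and members (-2 = all)
SELECT SUM(deadlocks)      AS deadlocks,
       SUM(lock_timeouts)  AS lock_timeouts,
       SUM(lock_escals)    AS lock_escalations,
       SUM(lock_waits)     AS lock_waits,
       SUM(lock_wait_time) AS lock_wait_ms -- avg wait = lock_wait_ms / lock_waits
FROM TABLE(MON_GET_WORKLOAD(NULL, -2)) AS t;
```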
Performance Verification
Load Test Results
Configuration: 200 concurrent authorization threads, 8,000 TPS target
| Metric | Before Optimization | After Optimization |
|---|---|---|
| Throughput (TPS) | 3,200 | 8,500 |
| 99th percentile latency | 450 ms | 62 ms |
| Deadlocks/hour | 340 | 0 |
| Lock escalations/hour | 45 | 0 |
| Lock waits/hour | 85,000 | 4,200 |
| Avg lock wait time | 28 ms | 1.2 ms |
The optimized design exceeds the 8,000 TPS target with the 99th percentile latency well within the 100ms requirement.
Key Design Principles
- Partition hot rows to distribute lock contention across multiple physical rows.
- Use time-based partitioning to isolate batch reads from OLTP writes.
- Enable Currently Committed to prevent reader-writer blocking under CS.
- Design away the conflict rather than tuning parameters. The daily limit reset was eliminated entirely by using a rolling date design.
- Keep transactions short. The authorization transaction touches 3 rows and commits. Total lock hold time is under 2ms.
Discussion Questions
- The partitioned counter approach (10 slots per merchant) trades accuracy for concurrency. During the brief window between checking the daily total and updating the slot, could a merchant exceed their daily limit? How would you handle this?
- If PayStream needs to support real-time balance updates (not "currently committed" approximate reads), how would the concurrency design change for the authorization transaction?
- The settlement batch uses CS because it processes "frozen" data (yesterday's partition). Under what circumstances could this assumption break, and how would you guard against it?
- How would this design differ on z/OS with page-level locking? Which aspects would need to change?