Case Study 02: Concurrency Design for High-Volume OLTP

Background

PayStream Financial is a payment processing company that handles credit card authorizations for 15,000 merchants. They are migrating from a legacy mainframe system to DB2 LUW and need to design a concurrency strategy that supports their peak throughput of 8,000 authorization requests per second with a 99th percentile response time requirement of 100ms.

The System

Transaction Types

Transaction               | Volume         | Tables                                          | Operation
Authorization Request     | 8,000/sec peak | MERCHANT_ACCOUNTS, AUTH_LOG, DAILY_LIMITS       | Read merchant, validate, log auth, update daily total
Settlement (batch)        | 1/day          | AUTH_LOG, MERCHANT_ACCOUNTS, SETTLEMENT_RECORDS | Read all authorized transactions, compute settlement amounts, create settlement records
Chargeback                | 50/hour        | AUTH_LOG, CHARGEBACKS, MERCHANT_ACCOUNTS        | Create chargeback record, adjust merchant balance
Merchant Balance Inquiry  | 200/hour       | MERCHANT_ACCOUNTS                               | Read-only
Daily Limit Reset (batch) | 1/day          | DAILY_LIMITS                                    | Reset all daily spending limits to zero

Table Designs

CREATE TABLE merchant_accounts (
    merchant_id     INTEGER NOT NULL PRIMARY KEY,
    merchant_name   VARCHAR(200),
    balance         DECIMAL(15,2) NOT NULL DEFAULT 0.00,
    status          CHAR(1) NOT NULL DEFAULT 'A',
    risk_level      CHAR(1) NOT NULL DEFAULT 'L',
    last_modified   TIMESTAMP DEFAULT CURRENT TIMESTAMP,
    version_num     INTEGER DEFAULT 0
);

CREATE TABLE auth_log (
    auth_id         BIGINT GENERATED ALWAYS AS IDENTITY,
    merchant_id     INTEGER NOT NULL,
    card_hash       CHAR(64) NOT NULL,
    amount          DECIMAL(10,2) NOT NULL,
    auth_code       CHAR(6),
    status          CHAR(1) NOT NULL,  -- A=approved, D=declined, P=pending
    auth_timestamp  TIMESTAMP DEFAULT CURRENT TIMESTAMP,
    settlement_id   INTEGER,
    PRIMARY KEY (auth_id)
)
PARTITION BY RANGE (auth_timestamp) (
    -- DB2 requires constant boundary values (no CURRENT DATE) and
    -- ascending ranges; the literals below are examples that a nightly
    -- maintenance job rolls forward
    PARTITION p_archive  STARTING MINVALUE     ENDING '2024-01-15' EXCLUSIVE,
    PARTITION p_prev     STARTING '2024-01-15' ENDING '2024-01-16' EXCLUSIVE,
    PARTITION p_current  STARTING '2024-01-16' ENDING '2024-01-17' EXCLUSIVE
);
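Because DB2 range-partition boundaries must be constants, a nightly maintenance job has to roll the partitions forward. A sketch, with illustrative partition and archive-table names:

```sql
-- Add tomorrow's partition before midnight (names and dates illustrative)
ALTER TABLE auth_log
  ADD PARTITION p_20240117
  STARTING '2024-01-17' ENDING '2024-01-18' EXCLUSIVE;

-- Detach a fully settled partition into an archive table for later pruning
ALTER TABLE auth_log
  DETACH PARTITION p_20240110 INTO auth_log_archive_20240110;
```

DETACH is an attractive archival mechanism here because it is a catalog operation, not a mass delete, so it holds no row locks on the hot table.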

CREATE TABLE daily_limits (
    merchant_id     INTEGER NOT NULL,
    limit_date      DATE NOT NULL,
    daily_total     DECIMAL(15,2) NOT NULL DEFAULT 0.00,
    daily_max       DECIMAL(15,2) NOT NULL,
    trans_count     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (merchant_id, limit_date)
);

The Concurrency Challenges

Challenge 1: Hot Row on DAILY_LIMITS

The DAILY_LIMITS table has one row per merchant per day. A busy merchant might process 1,000 authorizations per second, all updating the same DAILY_LIMITS row. Under pessimistic locking, these 1,000 transactions would serialize on that single row.

Analysis: If each authorization holds an X lock for 1ms (the time to update the row and release the lock), the theoretical maximum throughput for a single merchant is 1,000 authorizations per second. But the actual X lock hold time includes network round-trip, commit processing, and log write — realistically 5-10ms. This limits a single merchant to 100-200 authorizations per second.

Solution: Partitioned Counter

Instead of a single DAILY_LIMITS row per merchant, create N rows (one per "slot"):

CREATE TABLE daily_limits_partitioned (
    merchant_id     INTEGER NOT NULL,
    limit_date      DATE NOT NULL,
    slot_id         SMALLINT NOT NULL,  -- 0-9
    daily_total     DECIMAL(15,2) NOT NULL DEFAULT 0.00,
    trans_count     INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (merchant_id, limit_date, slot_id)
);

-- Initialize 10 slots per merchant per day
INSERT INTO daily_limits_partitioned
SELECT merchant_id, CURRENT DATE, slot_id, 0.00, 0
FROM merchant_accounts,
     (VALUES 0,1,2,3,4,5,6,7,8,9) AS slots(slot_id);

Each authorization randomly picks a slot:

UPDATE daily_limits_partitioned
SET daily_total = daily_total + :auth_amount,
    trans_count = trans_count + 1
WHERE merchant_id = :merchant_id
  AND limit_date = CURRENT DATE
  AND slot_id = MOD(:random_number, 10);

To check the daily total:

SELECT SUM(daily_total) AS total, SUM(trans_count) AS count
FROM daily_limits_partitioned
WHERE merchant_id = :merchant_id
  AND limit_date = CURRENT DATE;

With 10 slots, X lock contention on any single row is cut by roughly a factor of 10, supporting 1,000-2,000 authorizations per second per merchant at the same per-lock hold times.
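The limit check itself aggregates the slots against the merchant's cap. A sketch, assuming daily_max stays in the original DAILY_LIMITS row (the slot table does not carry it):

```sql
-- Compute remaining headroom before issuing the slot UPDATE
SELECT l.daily_max - COALESCE(SUM(p.daily_total), 0) AS remaining
FROM daily_limits l
LEFT JOIN daily_limits_partitioned p
       ON p.merchant_id = l.merchant_id
      AND p.limit_date  = l.limit_date
WHERE l.merchant_id = :merchant_id
  AND l.limit_date  = CURRENT DATE
GROUP BY l.daily_max;
```

Because the check and the slot update are separate statements, concurrent authorizations can briefly overshoot the limit (see Discussion Question 1).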

Challenge 2: Authorization vs. Settlement Conflict

The settlement batch job reads millions of AUTH_LOG rows to compute daily settlements. If it uses RS isolation, it holds S locks on every row it reads; once lock escalation kicks in, those row locks are exchanged for a table-level S lock, which conflicts with the table-level IX lock that new authorizations need in order to insert into AUTH_LOG.

Solution: Time-Partitioned Processing

AUTH_LOG is partitioned by date. The settlement batch processes only yesterday's partition, while current authorizations write to today's partition. There is zero lock conflict because the operations touch different partitions.

-- Settlement reads yesterday's data (no conflict with today's inserts)
SELECT merchant_id, SUM(amount) AS settlement_amount, COUNT(*) AS auth_count
FROM auth_log
WHERE auth_timestamp >= CURRENT DATE - 1 DAY
  AND auth_timestamp < CURRENT DATE
  AND status = 'A'
GROUP BY merchant_id
WITH CS;  -- CS is sufficient because data is frozen (yesterday's partition)
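The aggregate feeds the settlement records. A hedged sketch, assuming a column layout for SETTLEMENT_RECORDS (its DDL is not shown above):

```sql
-- One settlement row per merchant, sourced from yesterday's frozen partition
INSERT INTO settlement_records (merchant_id, settlement_date, amount, auth_count)
SELECT merchant_id, CURRENT DATE - 1 DAY, SUM(amount), COUNT(*)
FROM auth_log
WHERE auth_timestamp >= CURRENT DATE - 1 DAY
  AND auth_timestamp <  CURRENT DATE
  AND status = 'A'
GROUP BY merchant_id;
```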

Challenge 3: Chargeback During Authorization

A chargeback adjusts a merchant's balance in MERCHANT_ACCOUNTS. Authorizations read (but do not update) MERCHANT_ACCOUNTS to check merchant status. Under CS, the chargeback's X lock blocks the authorization's S lock on the same row.

Solution: Currently Committed + Optimistic Validation

Enable Currently Committed so authorization reads see the previously committed merchant status without waiting for the chargeback to complete:

-- Database configuration
-- db2 update db cfg using CUR_COMMIT ON

-- Authorization reads merchant status (CC allows reading without waiting)
SELECT status, risk_level
FROM merchant_accounts
WHERE merchant_id = :merchant_id
WITH CS;
-- If merchant is in a chargeback, CC returns the pre-chargeback status
-- This is acceptable because the chargeback does not change the
-- merchant's active/inactive status in most cases
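The "optimistic validation" half of the solution can lean on the version_num column already present in MERCHANT_ACCOUNTS. A sketch of the chargeback's balance adjustment; the retry loop lives in the application:

```sql
-- Optimistic update: succeeds only if no one changed the row since it
-- was read; if zero rows are updated, re-read and retry
UPDATE merchant_accounts
SET balance       = balance + :adjustment,
    version_num   = version_num + 1,
    last_modified = CURRENT TIMESTAMP
WHERE merchant_id = :merchant_id
  AND version_num = :expected_version;
```

This keeps the chargeback's X lock window to a single short statement, which is what lets CC readers proceed with minimal staleness.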

Challenge 4: Daily Limit Reset vs. Active Authorizations

At midnight, the daily limit reset batch sets all DAILY_LIMITS rows to zero. Active authorizations are still updating these rows.

Solution: Rolling Date Design

Instead of resetting rows, create new rows for each day. Authorizations always write to the current date's rows. There is no reset needed — old date rows are simply archived.

-- Authorization: writes to today's row (creates it if needed via MERGE)
MERGE INTO daily_limits_partitioned AS target
USING (VALUES (:merchant_id, CURRENT DATE, MOD(:random, 10), :amount))
  AS source (merchant_id, limit_date, slot_id, amount)
ON target.merchant_id = source.merchant_id
  AND target.limit_date = source.limit_date
  AND target.slot_id = source.slot_id
WHEN MATCHED THEN UPDATE SET
  daily_total = target.daily_total + source.amount,
  trans_count = target.trans_count + 1
WHEN NOT MATCHED THEN INSERT
  (merchant_id, limit_date, slot_id, daily_total, trans_count)
  VALUES (source.merchant_id, source.limit_date, source.slot_id,
          source.amount, 1);

No batch reset is needed. A nightly cleanup archives rows older than 7 days.
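The cleanup is a single range delete against cold rows (7 days is the retention stated above):

```sql
-- Nightly cleanup; rows older than the retention window are archived
-- elsewhere first, then deleted here
DELETE FROM daily_limits_partitioned
WHERE limit_date < CURRENT DATE - 7 DAYS;
```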

Complete Concurrency Design

Isolation Levels

Transaction              | Isolation | Rationale
Authorization Request    | CS        | Short transaction; CC handles read-write conflicts
Settlement (batch)       | CS        | Processes frozen (yesterday's) data; no conflict with OLTP
Chargeback               | CS        | Short transaction; updates a single merchant row
Merchant Balance Inquiry | UR        | Read-only; approximate balance acceptable
Daily Limit Reset        | N/A       | Eliminated by rolling date design

Lock Ordering

All transactions that touch multiple tables acquire locks in this order:

  1. MERCHANT_ACCOUNTS (read or update)
  2. DAILY_LIMITS_PARTITIONED (update)
  3. AUTH_LOG (insert)
  4. SETTLEMENT_RECORDS (insert)
  5. CHARGEBACKS (insert)

Configuration

LOCKLIST = 32768        -- 128 MB for lock memory
MAXLOCKS = 10           -- 10% per application before escalation
LOCKTIMEOUT = 10        -- 10 seconds (short for OLTP)
DLCHKTIME = 5000        -- 5-second deadlock detection
CUR_COMMIT = ON         -- Enable Currently Committed
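Applied from the CLP in the same style as the CUR_COMMIT example above (the database name PAYDB is a placeholder):

```sql
-- db2 update db cfg for PAYDB using LOCKLIST 32768 MAXLOCKS 10
-- db2 update db cfg for PAYDB using LOCKTIMEOUT 10 DLCHKTIME 5000
-- db2 update db cfg for PAYDB using CUR_COMMIT ON
```

Note that LOCKLIST is expressed in 4 KB pages (32768 pages = 128 MB) and DLCHKTIME in milliseconds.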

Monitoring Thresholds

Metric                | Warning | Critical | Action
Deadlocks/hour        | > 10    | > 50     | Investigate lock ordering violation
Lock timeouts/hour    | > 20    | > 100    | Find long-running transaction
Lock escalations/hour | > 5     | > 20     | Review batch commit frequency
Avg lock wait (ms)    | > 5     | > 20     | Identify contention source
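These counters map to standard monitor elements. A sketch using the MON_GET_WORKLOAD table function (values are cumulative since activation, so per-hour rates come from differencing two samples):

```sql
-- Lock health across all workloads and members; LOCK_WAIT_TIME is in ms
SELECT SUM(deadlocks)      AS deadlocks,
       SUM(lock_timeouts)  AS lock_timeouts,
       SUM(lock_escals)    AS lock_escalations,
       SUM(lock_waits)     AS lock_waits,
       CASE WHEN SUM(lock_waits) > 0
            THEN SUM(lock_wait_time) / SUM(lock_waits)
       END                 AS avg_lock_wait_ms
FROM TABLE(MON_GET_WORKLOAD(NULL, -2)) AS t;
```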

Performance Verification

Load Test Results

Configuration: 200 concurrent authorization threads, 8,000 TPS target

Metric                  | Before Optimization | After Optimization
Throughput (TPS)        | 3,200               | 8,500
99th percentile latency | 450 ms              | 62 ms
Deadlocks/hour          | 340                 | 0
Lock escalations/hour   | 45                  | 0
Lock waits/hour         | 85,000              | 4,200
Avg lock wait time      | 28 ms               | 1.2 ms

The optimized design exceeds the 8,000 TPS target with the 99th percentile latency well within the 100ms requirement.

Key Design Principles

  1. Partition hot rows to distribute lock contention across multiple physical rows.
  2. Use time-based partitioning to isolate batch reads from OLTP writes.
  3. Enable Currently Committed to prevent reader-writer blocking under CS.
  4. Design away the conflict rather than tuning parameters. The daily limit reset was eliminated entirely by using a rolling date design.
  5. Keep transactions short. The authorization transaction touches 3 rows and commits. Total lock hold time is under 2ms.

Discussion Questions

  1. The partitioned counter approach (10 slots per merchant) trades accuracy for concurrency. During the brief window between checking the daily total and updating the slot, could a merchant exceed their daily limit? How would you handle this?

  2. If PayStream needs to support real-time balance updates (not "currently committed" approximate reads), how would the concurrency design change for the authorization transaction?

  3. The settlement batch uses CS because it processes "frozen" data (yesterday's partition). Under what circumstances could this assumption break, and how would you guard against it?

  4. How would this design differ on z/OS with page-level locking? Which aspects would need to change?