Case Study 1: GlobalBank Quick Lookup Table Performance Optimization

Background

GlobalBank's nightly batch cycle processes 2.3 million transactions against the ACCT-MASTER VSAM KSDS file. The batch window — the time between the close of business and the morning's online systems coming up — is 6 hours. By late 2023, the transaction processing job (TXN-PROC) consumed 3 hours and 47 minutes of that window, leaving uncomfortably little margin for the downstream report generation, reconciliation, and backup jobs.

Maria Chen analyzed the I/O profile and discovered that 60% of transactions (approximately 1.38 million) targeted only 50,000 accounts — the "hot accounts." These were corporate accounts with multiple daily transactions, high-activity checking accounts, and accounts receiving automated payments. Every lookup against the KSDS required 3-4 I/O operations for index traversal plus data read.

"We're doing roughly 5 million I/Os just for the hot account lookups," Maria reported to the architecture review board. "If we can get those down to 1 I/O each, we save 4 million I/Os and cut the job by at least 20 minutes."

Design Phase

Option Analysis

Option                    Estimated Improvement    Complexity  Risk
Add VSAM LSR buffering    10-15% (index caching)   Low         Low
Increase BUFND/BUFNI      5-10%                    Low         Low
RRDS quick lookup cache   25-30% for hot accounts  Medium      Medium
In-memory table (OCCURS)  40%+                     High        High (memory)

Priya Kapoor recommended the RRDS approach: "LSR buffering helps but won't solve the fundamental problem of 3-4 I/Os per lookup. An in-memory table for 50,000 records would consume 17 MB of working storage — too much for our region size. The RRDS gives us single-I/O access without the memory pressure."

Hash Function Selection

Derek Washington was assigned to prototype hash functions. He tested three approaches against the actual hot-account key distribution:

Hash Function               Collisions (of 50,000)  Max Chain  Avg Probes
Division by 64,997 (prime)   8,247                  7          1.17
Folding (2-digit groups)    12,891                  12         1.26
Mid-square                   9,103                  9          1.19

Division by prime produced the fewest collisions and shortest chains. "The account numbers have patterns — branches assign them in blocks," Derek explained. "The prime divisor breaks up those patterns better than folding or mid-square."
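As an illustration (in Python rather than the shop's COBOL), the division/remainder hash can be sketched as follows; the branch block numbers are hypothetical:

```python
TABLE_SIZE = 64_997  # prime number of RRDS slots (from the sizing section)

def division_hash(account_number: int) -> int:
    """Map an account number to a relative record number, 0..64996."""
    return account_number % TABLE_SIZE

# Branch-assigned account numbers share leading digits, but the prime
# divisor still spreads them across the table (block values hypothetical):
block = [40_010_000 + i * 10 for i in range(5)]
slots = [division_hash(a) for a in block]   # five distinct slots
```

Because 64,997 shares no factors with the block increments branches use, sequentially assigned keys land in distinct slots instead of piling onto a few chains.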

File Sizing

With 50,000 records and a target load factor of 77%:

Slots needed = 50,000 / 0.77 = 64,935
Table size chosen = 64,997 (a prime at or above 64,935; actual load factor 50,000 / 64,997 ≈ 76.9%)

Record size: 100 bytes (account number, name, balance, status, last-access date). Total file size: 64,997 * 100 = ~6.5 MB — trivial for DASD allocation.
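The sizing arithmetic can be checked with a few lines of Python (a sketch; the `is_prime` helper is ours):

```python
def is_prime(n: int) -> bool:
    """Trial division; fine for numbers this small."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    f = 3
    while f * f <= n:
        if n % f == 0:
            return False
        f += 2
    return True

records, target_load = 50_000, 0.77
slots_needed = int(records / target_load)   # 64,935
table_size = 64_997                         # prime chosen for the RRDS
assert is_prime(table_size) and table_size >= slots_needed
actual_load = records / table_size          # ~0.769
file_mb = table_size * 100 / 1_000_000      # 100-byte records -> ~6.5 MB
```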

Implementation

The implementation followed three stages:

Stage 1: Hot Account Identification

A new batch step ran before TXN-PROC, analyzing the previous 30 days of transaction history to identify the 50,000 most-accessed accounts:

//HOTIDENT EXEC PGM=HOTACCTS
//TXNHIST  DD DSN=GLOBANK.TXN.HIST.VSAM,DISP=SHR
//HOTLIST  DD DSN=GLOBANK.HOT.ACCT.LIST,DISP=(,CATLG,...),
//            SPACE=(CYL,(1,1)),DCB=(RECFM=FB,LRECL=55)
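The selection logic inside HOTACCTS amounts to a top-N frequency count over the history file. A Python sketch of the idea (the real program is COBOL; the names below are ours):

```python
from collections import Counter

def hot_accounts(history, n=50_000):
    """Return the n most-accessed account numbers from a transaction log.

    history: iterable of account numbers, one per historical transaction.
    """
    counts = Counter(history)
    return [acct for acct, _ in counts.most_common(n)]

# Tiny example; in production the result list is what gets written
# to GLOBANK.HOT.ACCT.LIST.
sample = [101, 202, 202, 303, 303, 303]
top2 = hot_accounts(sample, n=2)   # [303, 202]
```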

Stage 2: RRDS Load

The QUICK-LOAD program (presented in Section 13.6) loaded the hot accounts into the RRDS using the division/remainder hash with linear probing.
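The load algorithm can be sketched in Python, with a fixed slot array standing in for the RRDS (the actual QUICK-LOAD is COBOL):

```python
TABLE_SIZE = 64_997  # prime slot count from the sizing section

def load_cache(records):
    """Load (account_number, payload) pairs; return slots and collision count."""
    slots = [None] * TABLE_SIZE
    collisions = 0
    for acct, payload in records:
        rrn = acct % TABLE_SIZE            # home slot from the division hash
        while slots[rrn] is not None:      # occupied: linear probe
            collisions += 1
            rrn = (rrn + 1) % TABLE_SIZE   # wrap past the last slot
        slots[rrn] = (acct, payload)
    return slots, collisions

# Two accounts with the same home slot illustrate one probe step:
slots, collisions = load_cache([(1, "A"), (64_998, "B")])  # both hash to 1
```

The collision counter is the figure that feeds the load-statistics report shown later.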

Stage 3: TXN-PROC Modification

The existing TXN-PROC program was modified to check the RRDS first:

For each transaction:
  1. Compute hash of account number
  2. Probe RRDS (1-2 I/Os typical)
  3. If found → use cached data
  4. If not found → fall back to KSDS (3-4 I/Os)
  5. Process transaction
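The five steps above can be sketched in Python (the production code is COBOL; `ksds_read` stands in for the 3-4 I/O fallback lookup):

```python
TABLE_SIZE = 64_997

def lookup(acct, slots, ksds_read):
    """Probe the cache along its linear-probe chain; fall back on a miss."""
    rrn = acct % TABLE_SIZE                 # step 1: compute hash
    probes = 0
    while probes < TABLE_SIZE and slots[rrn] is not None:
        if slots[rrn][0] == acct:
            return slots[rrn][1]            # step 3: cache hit
        rrn = (rrn + 1) % TABLE_SIZE        # step 2: keep probing the chain
        probes += 1
    return ksds_read(acct)                  # step 4: fall back to the KSDS

# Usage with a tiny table:
slots = [None] * TABLE_SIZE
slots[5] = (5, "cached-balance")
hit = lookup(5, slots, ksds_read=lambda a: "from-ksds")
miss = lookup(6, slots, ksds_read=lambda a: "from-ksds")
```

Hitting an empty slot ends the probe chain early, which is why keeping the load factor under ~77% matters: full tables make misses expensive.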

Testing Results

Pilot Run (10% of transactions)

Metric                  Before (KSDS only)  After (RRDS + KSDS)
I/Os for hot accounts      838,000             247,000
I/Os for cold accounts     368,000             368,000
Total I/Os               1,206,000             615,000
Elapsed time                23 min              17 min

Full Production Run

Metric              Before     After      Improvement
Hot account I/Os    4,830,000  1,590,000  -67%
Cold account I/Os   3,680,000  3,680,000  No change
Total I/Os          8,510,000  5,270,000  -38%
TXN-PROC elapsed    3h 47min   3h 25min   -22 min
QUICK-LOAD elapsed  N/A        3 min      New step
Net time savings    --         --         19 min

Cache Statistics

QUICK LOOKUP LOAD COMPLETE
Records loaded:  0050000
Collisions:      0008247
Overflows:       0000000
Load factor:     0000076%

TRANSACTION PROCESSING COMPLETE
Cache hits:      1,241,837
Cache misses:      138,163 (accounts probed but not present in the cache)
KSDS fallbacks:    920,000 (cold accounts)
Cache hit ratio: 54.0% (1,241,837 of 2,300,000 transactions)

Production Issues

Issue 1: Stale Cache Data

During the first week of production, the team discovered that accounts whose balances changed during batch processing had stale data in the RRDS cache. The TXN-PROC program read the cached balance, processed a transaction, then needed to write the updated balance back to the KSDS master — but the cache was now out of sync.

Fix: After each KSDS REWRITE (balance update), also REWRITE the RRDS cache record with the new balance. This added minimal overhead (one extra I/O per update) but kept the cache consistent throughout the batch run.
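A sketch of the write-through fix (the two rewrite callables stand in for the actual REWRITE operations; this is not the production COBOL):

```python
def update_balance(acct, new_balance, rewrite_ksds, rewrite_rrds, in_cache):
    """Write-through update: the KSDS master stays authoritative, and the
    RRDS cache record is rewritten only when the account is cached."""
    rewrite_ksds(acct, new_balance)      # master update (as before the fix)
    if in_cache:
        rewrite_rrds(acct, new_balance)  # one extra I/O keeps cache in sync

# Usage: record which files get written for a hot (cached) account.
writes = []
update_balance(101, 250.00,
               rewrite_ksds=lambda a, b: writes.append(("ksds", a, b)),
               rewrite_rrds=lambda a, b: writes.append(("rrds", a, b)),
               in_cache=True)
```

Ordering matters: writing the master first means a failure between the two REWRITEs leaves the cache stale (safe, since the KSDS fallback still works) rather than the master wrong.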

Issue 2: Hash Function Drift

After six months, the hot account profile shifted. New corporate accounts had different number patterns, causing more collisions:

Month    Collisions  Avg Probes
Month 1   8,247      1.17
Month 3   9,412      1.19
Month 6  14,823      1.31

Fix: The QUICK-LOAD program was enhanced to log collision statistics. A monitoring job alerts operations when average probes exceed 1.5, triggering a review of the hash function parameters.
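The monitoring check reduces to comparing average probes per load against the threshold; a sketch (the function name and probe totals below are illustrative, not from the actual monitoring job):

```python
def probe_alert(records_loaded, total_probes, threshold=1.5):
    """Return (alert?, average probes); 1 probe = a direct home-slot hit."""
    avg_probes = total_probes / records_loaded
    return avg_probes > threshold, round(avg_probes, 2)

# Month 6 from the table above: 50,000 records at an average of 1.31 probes
# (roughly 65,500 total probes) is still under the 1.5 alert threshold.
alert, avg = probe_alert(50_000, 65_500)
```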

Lessons Learned

  1. Profile before optimizing: The 60/15 ratio (60% of transactions hitting 15% of accounts) made the cache viable. Without that skewed distribution, the approach would not have been worthwhile.

  2. Cache coherence matters: Even in batch processing, cache consistency must be managed. The stale data issue could have caused balance discrepancies.

  3. Monitor hash performance over time: Account number patterns change as new accounts are opened. The hash function's effectiveness can degrade without anyone noticing.

  4. Keep the fallback path: The KSDS lookup must always work, even if the cache fails completely. The cache is an optimization, not a replacement.

Discussion Questions

  1. What would happen if the hot account profile changed dramatically — say, from 50,000 accounts to 200,000? How would you adjust the design?

  2. Instead of rebuilding the cache nightly, could the cache be maintained continuously (updated as accounts change)? What are the trade-offs?

  3. The team chose linear probing for collision handling. Would quadratic probing or double hashing have been significantly better for this use case? Why or why not?

  4. At what point should GlobalBank consider replacing the RRDS cache with an in-memory approach (larger COBOL tables, or a data-grid product such as IBM WebSphere eXtreme Scale)?