Case Study: MedClaim — Tuning Claim Adjudication Throughput
Background
MedClaim Health Services signed a contract with HealthNet Alliance, a network of 2,000 providers. The contract required MedClaim to process claims within 24 hours of electronic submission — a significant improvement over their existing 72-hour turnaround. To meet this SLA, the CLM-ADJUD program needed to process at least 20,000 claims per hour, up from the current throughput of 8,000.
James Okafor was given six weeks to achieve the throughput increase without replacing the existing COBOL program.
Profiling
James ran a Strobe profile during a peak processing period:
CLM-ADJUD Performance Profile
================================
Total Elapsed: 450.0 seconds (for 2,000 claims)
Total CPU: 68.2 seconds
CPU/Elapsed: 15.2%
Time Breakdown:
DB2 Operations: 292.5 sec (65.0%)
VSAM I/O: 90.0 sec (20.0%)
CPU Processing: 45.0 sec (10.0%)
CICS Overhead: 22.5 sec (5.0%)
Top DB2 Queries:
Fee schedule lookup: 180.0 sec (40.0% of total)
- 2,000 executions, avg 90ms each
Member eligibility: 67.5 sec (15.0%)
- 2,000 executions, avg 34ms each
Provider validation: 45.0 sec (10.0%)
- 2,000 executions, avg 22ms each
The fee schedule lookup alone consumed 40% of total elapsed time. Each lookup was a moderately complex query with a date range predicate and an ORDER BY.
Root Cause Analysis
James traced the fee schedule query:
SELECT ALLOWED_AMOUNT, MODIFIER_FACTOR
FROM FEE_SCHEDULE
WHERE PROVIDER_ID = :hv-provider
AND PROCEDURE_CODE = :hv-procedure
AND EFFECTIVE_DATE <= :hv-service-date
ORDER BY EFFECTIVE_DATE DESC
FETCH FIRST 1 ROW ONLY
Taken on its own, the query was well written: it used FETCH FIRST 1 ROW ONLY and had an appropriate WHERE clause. But it executed once per claim, and the combination of the date range predicate with the ORDER BY prevented simple index-only access. Each execution required at least two DB2 I/O operations (index plus data page).
The Cache Opportunity
James analyzed claim data patterns:
- 50,000 unique provider/procedure combinations existed in the fee schedule
- In any given batch, 80% of claims involved the top 500 combinations
- Fee schedule entries changed at most monthly
- The active fee schedule for any combination was the row with the most recent EFFECTIVE_DATE
This meant that 80% of DB2 calls were looking up the same 500 answers repeatedly.
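The effect of this kind of skew is easy to model: when lookups concentrate on a small hot set, a cache covering only that hot set answers most of the stream. A minimal Python sketch with a hypothetical lookup stream (keys and counts invented for illustration):

```python
from collections import Counter

# Hypothetical batch of fee schedule lookups drawn from a skewed
# distribution: one provider/procedure combination dominates.
lookups = ["P001:99213"] * 8 + ["P002:71045"] * 1 + ["P003:80053"] * 1

counts = Counter(lookups)
top = {k for k, _ in counts.most_common(1)}   # the "top N" hot set (N=1 here)
hits = sum(1 for k in lookups if k in top)
hit_rate = hits / len(lookups)                # 0.8 for this toy stream
```

The same calculation over real claim history is what justified sizing Level 1 at 500 entries.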
The Three-Level Cache
Level 1: Pre-loaded Top 500
At program initialization, James loaded the top 500 provider/procedure combinations into a sorted WORKING-STORAGE table:
01 WS-FEE-CACHE.
05 WS-CACHE-COUNT PIC 9(4) COMP VALUE 0.
05 WS-CACHE-MAX PIC 9(4) COMP VALUE 500.
05 WS-CACHE-ENTRY OCCURS 500 TIMES
ASCENDING KEY IS WS-CACHE-KEY
INDEXED BY WS-CACHE-IDX.
10 WS-CACHE-KEY.
15 WS-CACHE-PROV PIC X(8).
15 WS-CACHE-PROC PIC X(5).
10 WS-CACHE-ALLOWED PIC S9(7)V99 COMP-3.
10 WS-CACHE-MODIFIER PIC S9(3)V99 COMP-3.
10 WS-CACHE-HIT-COUNT PIC 9(7) COMP.
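Because the keys are kept in ascending order, lookups can use a binary search (COBOL's SEARCH ALL). An illustrative Python sketch of the same idea, with hypothetical keys and amounts:

```python
from bisect import bisect_left

# Hypothetical L1 cache: keys sorted ascending, mirroring the COBOL
# table's ASCENDING KEY clause so a binary search applies.
cache_keys = ["P001:99213", "P002:71045", "P003:80053"]
cache_vals = [(85.00, 1.00), (42.50, 1.00), (12.75, 0.95)]

def l1_lookup(provider_id, procedure_code):
    key = f"{provider_id}:{procedure_code}"
    i = bisect_left(cache_keys, key)          # O(log n), like SEARCH ALL
    if i < len(cache_keys) and cache_keys[i] == key:
        return cache_vals[i]                  # cache hit
    return None                               # fall through to Level 2
```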
The preload query ran once at startup:
SELECT F.PROVIDER_ID, F.PROCEDURE_CODE,
F.ALLOWED_AMOUNT, F.MODIFIER_FACTOR
FROM FEE_SCHEDULE F
INNER JOIN (
SELECT PROVIDER_ID, PROCEDURE_CODE,
MAX(EFFECTIVE_DATE) AS MAX_DATE
FROM FEE_SCHEDULE
WHERE EFFECTIVE_DATE <= CURRENT DATE
GROUP BY PROVIDER_ID, PROCEDURE_CODE
) M ON F.PROVIDER_ID = M.PROVIDER_ID
AND F.PROCEDURE_CODE = M.PROCEDURE_CODE
AND F.EFFECTIVE_DATE = M.MAX_DATE
ORDER BY (SELECT COUNT(*)
FROM CLAIMS_HISTORY CH
WHERE CH.PROVIDER_ID = F.PROVIDER_ID
AND CH.PROCEDURE_CODE = F.PROCEDURE_CODE
AND CH.CLAIM_DATE >= CURRENT DATE - 90 DAYS) DESC
FETCH FIRST 500 ROWS ONLY
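The inner join in the preload query computes a groupwise maximum: for each provider/procedure pair, keep only the row with the latest effective date. A toy Python sketch of that step, with hypothetical rows:

```python
# Hypothetical fee schedule rows: (provider, procedure, effective date, amount).
rows = [
    ("P001", "99213", "2024-01-01", 80.00),
    ("P001", "99213", "2024-03-01", 85.00),
    ("P002", "71045", "2023-11-15", 42.50),
]

# Keep only the most recent effective row per provider/procedure pair,
# mirroring the GROUP BY ... MAX(EFFECTIVE_DATE) join in the SQL.
latest = {}
for prov, proc, eff, amount in rows:
    key = (prov, proc)
    if key not in latest or eff > latest[key][0]:
        latest[key] = (eff, amount)
```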
Level 2: Dynamic Cache with LRU Eviction
Claims that missed the top-500 cache fell through to a secondary dynamic cache. When a DB2 lookup was needed, the result was stored in the least-recently-used (LRU) cache slot:
01 WS-DYN-CACHE.
05 WS-DYN-COUNT PIC 9(4) COMP VALUE 0.
05 WS-DYN-MAX PIC 9(4) COMP VALUE 200.
05 WS-DYN-ENTRY OCCURS 200 TIMES.
10 WS-DYN-KEY.
15 WS-DYN-PROV PIC X(8).
15 WS-DYN-PROC PIC X(5).
10 WS-DYN-ALLOWED PIC S9(7)V99 COMP-3.
10 WS-DYN-MODIFIER PIC S9(3)V99 COMP-3.
10 WS-DYN-ACCESS-TS PIC 9(14).
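The eviction policy works off the access timestamp: when the table is full, the slot with the oldest WS-DYN-ACCESS-TS is overwritten. An illustrative Python model of a fixed-slot LRU cache (not the original COBOL):

```python
import itertools

_tick = itertools.count()  # monotonic counter standing in for WS-DYN-ACCESS-TS

class DynCache:
    """Fixed-size Level 2 cache; evicts the least recently used slot."""
    def __init__(self, max_slots=200):
        self.max_slots = max_slots
        self.slots = {}                # key -> (value, last_access)

    def get(self, key):
        if key in self.slots:
            value, _ = self.slots[key]
            self.slots[key] = (value, next(_tick))   # refresh timestamp
            return value
        return None

    def put(self, key, value):
        if len(self.slots) >= self.max_slots and key not in self.slots:
            # Evict the slot with the oldest access timestamp.
            victim = min(self.slots, key=lambda k: self.slots[k][1])
            del self.slots[victim]
        self.slots[key] = (value, next(_tick))
```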
Level 3: Direct DB2 Lookup
Claims that missed both caches fell through to the direct DB2 query. James optimized this query by adding a composite index:
CREATE INDEX IX_FEE_PROV_PROC_DATE
ON FEE_SCHEDULE (PROVIDER_ID, PROCEDURE_CODE, EFFECTIVE_DATE DESC)
This allowed the query to satisfy the WHERE and ORDER BY from a single index scan.
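Why the index eliminates the sort can be modeled simply: when rows are already ordered by (provider, procedure, effective date descending), the first qualifying row is the most recent effective one. An illustrative Python sketch with hypothetical rows:

```python
# Hypothetical fee schedule rows, pre-sorted the way the composite index
# (PROVIDER_ID, PROCEDURE_CODE, EFFECTIVE_DATE DESC) orders them.
rows = [
    ("P001", "99213", "2024-03-01", 85.00),
    ("P001", "99213", "2024-01-01", 80.00),
    ("P001", "99213", "2023-06-01", 75.00),
]

def lookup(prov, proc, service_date):
    # Scan in index order: the first qualifying row is already the
    # most recent effective one, so no sort step is needed.
    for r_prov, r_proc, eff, amount in rows:
        if r_prov == prov and r_proc == proc and eff <= service_date:
            return amount
    return None
```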
Additional Optimizations
Member Eligibility Caching
James applied the same caching pattern to member eligibility lookups. Since members were rarely checked more than once per batch (unless they had multiple claims), the cache was less effective here (32% hit rate), but still saved time.
Multi-Row FETCH
For the provider validation query, which was already simple (single-row lookup by key), James switched to multi-row FETCH by batching claims by provider. When 10 claims for the same provider arrived in sequence, one DB2 call handled all 10.
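The batching idea can be sketched as: group consecutive claims by provider and issue one validation call per group instead of one per claim. An illustrative Python model (claim data hypothetical):

```python
from itertools import groupby

# Hypothetical claim stream; claims for the same provider arrive together.
claims = [
    {"claim_id": 1, "provider_id": "P001"},
    {"claim_id": 2, "provider_id": "P001"},
    {"claim_id": 3, "provider_id": "P002"},
]

db2_calls = 0

def validate_provider(provider_id):
    # Stand-in for one DB2 call that validates a provider once
    # on behalf of every claim in the group.
    global db2_calls
    db2_calls += 1
    return True

for provider_id, group in groupby(claims, key=lambda c: c["provider_id"]):
    batch = list(group)
    validate_provider(provider_id)   # one call covers the whole batch
```

Here three claims cost two DB2 calls; ten consecutive claims for one provider would cost one.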
VSAM Buffer Tuning
James increased BUFND from 5 to 25 for the claims master file, and BUFNI from 3 to 15 for the index component.
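The memory cost of the larger buffer pools is modest. A rough estimate, assuming (hypothetically) a 4 KiB data control interval and a 2 KiB index control interval:

```python
# Extra virtual storage for the buffer increase, under assumed CI sizes.
data_ci, index_ci = 4096, 2048           # hypothetical CI sizes in bytes
extra = (25 - 5) * data_ci + (15 - 3) * index_ci
extra_kib = extra / 1024                 # ~104 KiB of additional buffers
```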
Results
Throughput Comparison
| Metric | Before | After | Improvement |
|---|---|---|---|
| Claims per hour | 8,000 | 26,000 | 3.25x |
| Avg time per claim | 450ms | 138ms | 3.26x |
| DB2 calls per claim | 3.0 | 0.68 | 77% reduction |
| Fee schedule cache hit rate | N/A | 78% (L1) + 9% (L2) = 87% | |
| Member cache hit rate | N/A | 32% | |
Cost Breakdown (Per 10,000 Claims)
| Component | Before | After |
|---|---|---|
| DB2 elapsed | 325 sec | 72 sec |
| VSAM I/O | 100 sec | 35 sec |
| CPU | 50 sec | 42 sec |
| Other | 25 sec | 15 sec |
| Total | 500 sec | 164 sec |
Cache Statistics (Typical Run)
Fee Schedule Cache Report
========================
Level 1 (Top 500): 78,432 hits / 100,000 lookups (78.4%)
Level 2 (Dynamic): 8,891 hits / 21,568 lookups (41.2%)
Level 3 (DB2): 12,677 queries
Total DB2 calls: 12,677 (was 100,000)
DB2 call reduction: 87.3%
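The report's figures are internally consistent, which can be checked directly from the raw counts:

```python
# Recompute the cache report figures from the raw hit counts.
total_lookups = 100_000
l1_hits = 78_432
l2_lookups = total_lookups - l1_hits          # L1 misses fall through: 21,568
l2_hits = 8_891
l3_queries = l2_lookups - l2_hits             # 12,677 lookups reach DB2
reduction = 1 - l3_queries / total_lookups    # fraction of DB2 calls avoided
```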
Lessons Learned
- Caching is the most powerful optimization when access patterns are skewed. The Pareto principle (80/20 rule) applied almost exactly: 80% of claims hit the top 500 fee schedule entries.
- Pre-loading is worth the startup cost. The preload query took 8 seconds to run and saved an average of 260 seconds per 10,000-claim batch, a 32x return.
- Index design matters as much as query design. The composite index on FEE_SCHEDULE eliminated the sort operation that was costing 90ms per query.
- The residual 13% of cache misses still matter. Level 3 (direct DB2) handled 12,677 queries per batch. Without the index optimization, those queries alone would have taken 19 minutes.
Discussion Questions
- The LRU dynamic cache (Level 2) had a 41% hit rate. Is this worth the complexity of maintaining it? Under what circumstances would you remove it?
- James chose a cache size of 500 for Level 1. How would you determine the optimal cache size? What data would you need to analyze?
- The fee schedule changes monthly. What happens if a fee schedule update is applied mid-batch? How would you handle cache invalidation?
- This solution trades memory (WORKING-STORAGE for caches) for I/O (fewer DB2 calls). What are the limits of this trade-off? At what point does the cache itself become a performance problem?