Case Study: MedClaim — Tuning Claim Adjudication Throughput

Background

MedClaim Health Services signed a contract with HealthNet Alliance, a network of 2,000 providers. The contract required MedClaim to process claims within 24 hours of electronic submission — a significant improvement over their existing 72-hour turnaround. To meet this SLA, the CLM-ADJUD program needed to process at least 20,000 claims per hour, up from the current throughput of 8,000.

James Okafor was given six weeks to achieve the throughput increase without replacing the existing COBOL program.

Profiling

James ran a Strobe profile during a peak processing period:

CLM-ADJUD Performance Profile
================================
Total Elapsed: 450.0 seconds (for 2,000 claims)
Total CPU:      68.2 seconds
CPU/Elapsed:    15.2%

Time Breakdown:
  DB2 Operations:      292.5 sec  (65.0%)
  VSAM I/O:             90.0 sec  (20.0%)
  CPU Processing:        45.0 sec  (10.0%)
  CICS Overhead:         22.5 sec   (5.0%)

Top DB2 Queries:
  Fee schedule lookup:   180.0 sec  (40.0% of total)
    - 2,000 executions, avg 90ms each
  Member eligibility:     67.5 sec  (15.0%)
    - 2,000 executions, avg 34ms each
  Provider validation:    45.0 sec  (10.0%)
    - 2,000 executions, avg 22ms each

The fee schedule lookup alone consumed 40% of total elapsed time. Each lookup was a moderately complex query with a date range predicate and an ORDER BY.

Root Cause Analysis

James traced the fee schedule query:

SELECT ALLOWED_AMOUNT, MODIFIER_FACTOR
FROM FEE_SCHEDULE
WHERE PROVIDER_ID = :hv-provider
  AND PROCEDURE_CODE = :hv-procedure
  AND EFFECTIVE_DATE <= :hv-service-date
ORDER BY EFFECTIVE_DATE DESC
FETCH FIRST 1 ROW ONLY

This query was well-written individually — it used FETCH FIRST, it had an appropriate WHERE clause. But it executed once per claim, and the combination of the date range predicate with the ORDER BY prevented simple index-only access. Each execution required at least 2 DB2 I/O operations (index + data page).

The Cache Opportunity

James analyzed claim data patterns: - 50,000 unique provider/procedure combinations existed in the fee schedule - In any given batch, 80% of claims involved the top 500 combinations - Fee schedule entries changed at most monthly - The active fee schedule for any combination was the row with the most recent EFFECTIVE_DATE

This meant that 80% of DB2 calls were looking up the same 500 answers repeatedly.

The Three-Level Cache

Level 1: Pre-loaded Top 500

At program initialization, James loaded the top 500 provider/procedure combinations into a sorted WORKING-STORAGE table:

       01  WS-FEE-CACHE.
           05 WS-CACHE-COUNT      PIC 9(4) COMP VALUE 0.
           05 WS-CACHE-MAX        PIC 9(4) COMP VALUE 500.
           05 WS-CACHE-ENTRY OCCURS 500 TIMES
                ASCENDING KEY IS WS-CACHE-KEY
                INDEXED BY WS-CACHE-IDX.
              10 WS-CACHE-KEY.
                 15 WS-CACHE-PROV   PIC X(8).
                 15 WS-CACHE-PROC   PIC X(5).
              10 WS-CACHE-ALLOWED   PIC S9(7)V99 COMP-3.
              10 WS-CACHE-MODIFIER  PIC S9(3)V99 COMP-3.
              10 WS-CACHE-HIT-COUNT PIC 9(7)     COMP.

The preload query ran once at startup:

SELECT F.PROVIDER_ID, F.PROCEDURE_CODE,
       F.ALLOWED_AMOUNT, F.MODIFIER_FACTOR
FROM FEE_SCHEDULE F
INNER JOIN (
    SELECT PROVIDER_ID, PROCEDURE_CODE,
           MAX(EFFECTIVE_DATE) AS MAX_DATE
    FROM FEE_SCHEDULE
    WHERE EFFECTIVE_DATE <= CURRENT DATE
    GROUP BY PROVIDER_ID, PROCEDURE_CODE
) M ON F.PROVIDER_ID = M.PROVIDER_ID
   AND F.PROCEDURE_CODE = M.PROCEDURE_CODE
   AND F.EFFECTIVE_DATE = M.MAX_DATE
ORDER BY (SELECT COUNT(*)
          FROM CLAIMS_HISTORY CH
          WHERE CH.PROVIDER_ID = F.PROVIDER_ID
            AND CH.PROCEDURE_CODE = F.PROCEDURE_CODE
            AND CH.CLAIM_DATE >= CURRENT DATE - 90 DAYS) DESC
FETCH FIRST 500 ROWS ONLY

Level 2: Dynamic Cache with LRU Eviction

Claims that missed the top-500 cache fell through to a secondary dynamic cache. When a DB2 lookup was needed, the result was stored in the least-recently-used (LRU) cache slot:

       01  WS-DYN-CACHE.
           05 WS-DYN-COUNT        PIC 9(4) COMP VALUE 0.
           05 WS-DYN-MAX          PIC 9(4) COMP VALUE 200.
           05 WS-DYN-ENTRY OCCURS 200 TIMES.
              10 WS-DYN-KEY.
                 15 WS-DYN-PROV    PIC X(8).
                 15 WS-DYN-PROC    PIC X(5).
              10 WS-DYN-ALLOWED    PIC S9(7)V99 COMP-3.
              10 WS-DYN-MODIFIER   PIC S9(3)V99 COMP-3.
              10 WS-DYN-ACCESS-TS  PIC 9(14).

Level 3: Direct DB2 Lookup

Claims that missed both caches fell through to the direct DB2 query. James optimized this query by adding a composite index:

CREATE INDEX IX_FEE_PROV_PROC_DATE
ON FEE_SCHEDULE (PROVIDER_ID, PROCEDURE_CODE, EFFECTIVE_DATE DESC)

This allowed the query to satisfy the WHERE and ORDER BY from a single index scan.

Additional Optimizations

Member Eligibility Caching

James applied the same caching pattern to member eligibility lookups. Since members were rarely checked more than once per batch (unless they had multiple claims), the cache was less effective here (32% hit rate), but still saved time.

Multi-Row FETCH

For the provider validation query, which was already simple (single-row lookup by key), James switched to multi-row FETCH by batching claims by provider. When 10 claims for the same provider arrived in sequence, one DB2 call handled all 10.

VSAM Buffer Tuning

James increased BUFND from 5 to 25 for the claims master file, and BUFNI from 3 to 15 for the index component.

Results

Throughput Comparison

Metric Before After Improvement
Claims per hour 8,000 26,000 3.25x
Avg time per claim 450ms 138ms 3.26x
DB2 calls per claim 3.0 0.68 77% reduction
Fee schedule cache hit rate N/A 78% (L1) + 9% (L2) = 87%
Member cache hit rate N/A 32%

Cost Breakdown (Per 10,000 Claims)

Component Before After
DB2 elapsed 325 sec 72 sec
VSAM I/O 100 sec 35 sec
CPU 50 sec 42 sec
Other 25 sec 15 sec
Total 500 sec 164 sec

Cache Statistics (Typical Run)

Fee Schedule Cache Report
========================
Level 1 (Top 500):   78,432 hits / 100,000 lookups (78.4%)
Level 2 (Dynamic):    8,891 hits / 21,568 lookups  (41.2%)
Level 3 (DB2):       12,677 queries
Total DB2 calls:     12,677 (was 100,000)
DB2 call reduction:  87.3%

Lessons Learned

  1. Caching is the most powerful optimization when access patterns are skewed. The Pareto principle (80/20 rule) applied perfectly — 80% of claims hit the top 500 fee schedule entries.

  2. Pre-loading is worth the startup cost. The preload query took 8 seconds to run. It saved an average of 260 seconds per 10,000-claim batch. The ROI was 32x.

  3. Index design matters as much as query design. The composite index on FEE_SCHEDULE eliminated the sort operation that was costing 90ms per query.

  4. The residual 13% of cache misses still matter. Level 3 (direct DB2) handled 12,677 queries per batch. Without the index optimization, those queries alone would have taken 19 minutes.

Discussion Questions

  1. The LRU dynamic cache (Level 2) had a 41% hit rate. Is this worth the complexity of maintaining it? Under what circumstances would you remove it?

  2. James chose a cache size of 500 for Level 1. How would you determine the optimal cache size? What data would you need to analyze?

  3. The fee schedule changes monthly. What happens if a fee schedule update is applied mid-batch? How would you handle cache invalidation?

  4. This solution trades memory (WORKING-STORAGE for caches) for I/O (fewer DB2 calls). What are the limits of this trade-off? At what point does the cache itself become a performance problem?