Case Study 2: MedClaim's Provider Database Performance Crisis

Background

MedClaim's provider database stores information about 85,000 healthcare providers. The IMS hierarchy:

PROVIDER (root) — 85,000 records
├── SPECIALTY — avg 1.8 per provider
├── LOCATION — avg 2.3 per provider
│   └── SCHEDULE — avg 5 per location (one per weekday)
├── CONTRACT — avg 3.1 per provider
│   └── FEE-SCHEDULE — avg 180 per contract (one per procedure code)
└── CREDENTIAL — avg 4 per provider

The database is accessed by the claim adjudication batch job (CLM-ADJUD) that processes 500,000 claims per night.

The Crisis

In Q2 2024, MedClaim onboarded 12,000 new providers from a network expansion. The nightly batch window expanded from 3 hours to 7.5 hours, exceeding the 6-hour maximum. Claims were not being processed in time for the morning payment run.

Investigation

James Okafor analyzed the CLM-ADJUD program and found the problem in the provider verification logic. For each claim, the program:

  1. GU to PROVIDER (1 call)
  2. GNP to scan all CREDENTIALs looking for active one (avg 4 calls)
  3. GU back to PROVIDER, then navigate to CONTRACT with matching plan (avg 2 calls)
  4. GNP to scan FEE-SCHEDULE for matching procedure code (avg 90 calls!)

Total: ~97 DL/I calls per claim for provider verification alone.

For 500,000 claims: 48.5 million DL/I calls just for provider verification. At 1,000 calls/second (with I/O), that is 13.5 hours — clearly the bottleneck.

The FEE-SCHEDULE scan was the killer. With 180 fee schedule entries per contract, and the program scanning sequentially for a match, each lookup averaged 90 calls (half the entries).

The Fix

James implemented three optimizations:

Optimization 1: Qualified SSA for FEE-SCHEDULE

Before (unqualified — sequential scan):

CALL 'CBLTDLI' USING DLI-GNP
    PROVIDER-PCB
    FEE-IO-AREA
    FEE-UNQUAL-SSA.
*> Loop until matching procedure code found

After (qualified — direct access):

CALL 'CBLTDLI' USING DLI-GNP
    PROVIDER-PCB
    FEE-IO-AREA
    FEE-QUAL-SSA.
*> FEE-QUAL-SSA includes: PROC-CODE = :CLM-PROC-CODE

Result: Reduced fee schedule lookup from avg 90 calls to 1 call.

Optimization 2: Path Call with D Command Code

Combined the PROVIDER and CONTRACT retrieval into a single call:

CALL 'CBLTDLI' USING DLI-GU
    PROVIDER-PCB
    PATH-IO-AREA
    PROVIDER-PATH-SSA    *> with D command code
    CONTRACT-QUAL-SSA.   *> with D command code

Result: Reduced 3 calls to 1 call.

Optimization 3: Caching Frequently Accessed Providers

Many claims reference the same providers. James added a 500-entry cache in WORKING-STORAGE:

01  WS-PROVIDER-CACHE.
    05  WS-CACHE-ENTRY OCCURS 500 TIMES.
        10  WS-CACHE-PROV-ID    PIC X(10).
        10  WS-CACHE-ACTIVE     PIC X(1).
        10  WS-CACHE-HIT-COUNT  PIC 9(5).

Before each DL/I call, the program checks the cache. Cache hit rate: 73%.

Results

Metric Before After Improvement
DL/I calls per claim 97 5 (avg, with cache) 95% reduction
Total DL/I calls 48.5M 2.5M 95% reduction
Batch runtime 7.5 hours 1.8 hours 76% reduction
Avg call time 1.0ms 0.8ms 20% (better locality)

Discussion Questions

  1. Why did the original programmer use unqualified SSAs for the FEE-SCHEDULE scan? What assumptions might have been reasonable when the program was first written?
  2. The cache introduces the risk of stale data. Under what circumstances could this cause a problem? How would you mitigate it?
  3. Would migrating the FEE-SCHEDULE to a DB2 table have solved the performance problem? What trade-offs would that introduce?
  4. James considered adding a secondary index to the FEE-SCHEDULE segment. Why might a DBA resist this? What are the trade-offs of IMS secondary indexes?
  5. How does this case study illustrate the importance of understanding DL/I call patterns for IMS performance?

Key Takeaway

In IMS, the number of DL/I calls is the dominant performance factor. Reducing calls through qualified SSAs, path calls, and caching can yield order-of-magnitude improvements. The hierarchy structure that enables fast parent-child navigation can become a liability when programs scan large numbers of child segments — but qualified SSAs and command codes provide the tools to avoid those scans.