Case Study 2: MedClaim's Saga-Based Claim Lifecycle
Background
MedClaim processes 500,000 health insurance claims per month. A claim's lifecycle spans multiple days:
| Day | Step | Program | Action |
|---|---|---|---|
| 1 | Intake | CLM-INTAKE | Receive and validate claim |
| 1 | Eligibility | CLM-ELIG | Verify member eligibility |
| 2 | Adjudication | CLM-ADJUD | Apply business rules, determine payment |
| 3 | Payment approval | CLM-APPR | Manager approval for high-dollar claims |
| 3-5 | Payment | CLM-PAY | Generate payment to provider |
| 5 | EOB | CLM-EOB | Send Explanation of Benefits to member |
Each step is a separate batch program running on its own schedule. The entire lifecycle cannot run as a single database transaction, because that would mean holding locks for five days.
The Problem
Before the saga implementation, each program updated claim status independently. When a step failed, there was no systematic way to undo previous steps. Common problems included:
- Claims stuck in "ADJUDICATED" status because CLM-PAY failed but no one reversed the adjudication
- Duplicate payments when CLM-PAY was restarted without idempotency checks
- Provider balance discrepancies when CLM-ADJUD updated the provider pending balance but CLM-PAY never issued the payment
James Okafor estimated that 2-3% of claims required manual intervention each month: approximately 12,000 of the 500,000 monthly claims (about 2.4%) needed human review.
The Saga Implementation
James designed a saga orchestrator that tracks the state of each claim's lifecycle:
Saga State Table
    CREATE TABLE SAGA_STATE (
        CLAIM_ID     CHAR(15)   NOT NULL,
        SAGA_STEP    SMALLINT   NOT NULL,
        STEP_NAME    CHAR(20)   NOT NULL,
        STEP_STATUS  CHAR(10)   NOT NULL,
        STEP_TIME    TIMESTAMP  NOT NULL,
        COMP_STATUS  CHAR(10),
        COMP_TIME    TIMESTAMP,
        PRIMARY KEY (CLAIM_ID, SAGA_STEP)
    );
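To make the table's role concrete, the following is a minimal sketch in Python, using an in-memory SQLite database purely as a runnable stand-in for the production database. The helper names (`open_saga_db`, `record_step`, `claim_history`), the sample claim ID, and the step names are assumptions for illustration, not part of MedClaim's code.

```python
import sqlite3

# Same columns as SAGA_STATE; SQLite is only a stand-in for the real database.
DDL = """
CREATE TABLE SAGA_STATE (
    CLAIM_ID    CHAR(15)  NOT NULL,
    SAGA_STEP   SMALLINT  NOT NULL,
    STEP_NAME   CHAR(20)  NOT NULL,
    STEP_STATUS CHAR(10)  NOT NULL,
    STEP_TIME   TIMESTAMP NOT NULL,
    COMP_STATUS CHAR(10),
    COMP_TIME   TIMESTAMP,
    PRIMARY KEY (CLAIM_ID, SAGA_STEP)
)
"""

def open_saga_db():
    """Create an empty in-memory saga state table (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn

def record_step(conn, claim_id, step, name, status):
    """Insert one row per (claim, step); the primary key rejects duplicates."""
    conn.execute(
        "INSERT INTO SAGA_STATE "
        "(CLAIM_ID, SAGA_STEP, STEP_NAME, STEP_STATUS, STEP_TIME) "
        "VALUES (?, ?, ?, ?, CURRENT_TIMESTAMP)",
        (claim_id, step, name, status),
    )
    conn.commit()

def claim_history(conn, claim_id):
    """Return the claim's steps in lifecycle order."""
    rows = conn.execute(
        "SELECT SAGA_STEP, STEP_NAME, STEP_STATUS FROM SAGA_STATE "
        "WHERE CLAIM_ID = ? ORDER BY SAGA_STEP",
        (claim_id,),
    )
    return rows.fetchall()

conn = open_saga_db()
record_step(conn, "CLM000000000001", 1, "INTAKE", "COMPLETED")
record_step(conn, "CLM000000000001", 2, "ELIGIBILITY", "COMPLETED")
record_step(conn, "CLM000000000001", 3, "ADJUDICATION", "FAILED")
print(claim_history(conn, "CLM000000000001"))
```

Because the primary key is (CLAIM_ID, SAGA_STEP), each claim has at most one row per step, so a single query shows exactly how far a claim has progressed and where it stopped.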
Each Step's Contract
Every program in the lifecycle follows the same contract:
- Check saga state — is this step appropriate for the claim's current state?
- Execute the step's business logic within a single unit of work (one COMMIT)
- Update the saga state table
- If the step fails, update saga state with FAILED status
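The contract above can be sketched as a small step runner. This is an illustrative Python sketch against an in-memory SQLite stand-in, not MedClaim's COBOL. It makes several assumptions: "appropriate for the current state" is read as "the previous step is COMPLETED", the saga state update shares the step's unit of work, and the names `run_step`, `PROVIDER_PENDING`, and the sample business functions are all hypothetical.

```python
import sqlite3

def run_step(conn, claim_id, step, name, business_logic):
    """Run one lifecycle step under the four-part contract; return the outcome."""
    # 1. Check saga state: the previous step must be COMPLETED (step 1 has none).
    if step > 1:
        prev = conn.execute(
            "SELECT STEP_STATUS FROM SAGA_STATE "
            "WHERE CLAIM_ID = ? AND SAGA_STEP = ?",
            (claim_id, step - 1),
        ).fetchone()
        if prev is None or prev[0] != "COMPLETED":
            return "SKIPPED"
    try:
        # 2 and 3. Business logic and saga state update share one unit of work.
        business_logic(conn, claim_id)
        conn.execute(
            "INSERT OR REPLACE INTO SAGA_STATE "
            "VALUES (?, ?, ?, 'COMPLETED', CURRENT_TIMESTAMP)",
            (claim_id, step, name),
        )
        conn.commit()
        return "COMPLETED"
    except Exception:
        # 4. Undo the partial work, then record the failure in its own commit.
        conn.rollback()
        conn.execute(
            "INSERT OR REPLACE INTO SAGA_STATE "
            "VALUES (?, ?, ?, 'FAILED', CURRENT_TIMESTAMP)",
            (claim_id, step, name),
        )
        conn.commit()
        return "FAILED"

# Simplified stand-in tables (a subset of the real SAGA_STATE columns).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SAGA_STATE (CLAIM_ID TEXT, SAGA_STEP INTEGER, STEP_NAME TEXT, "
    "STEP_STATUS TEXT, STEP_TIME TEXT, PRIMARY KEY (CLAIM_ID, SAGA_STEP))"
)
conn.execute("CREATE TABLE PROVIDER_PENDING (CLAIM_ID TEXT, AMOUNT REAL)")

def no_op(conn, claim_id):
    pass  # intake or eligibility logic would go here

def broken_adjudication(conn, claim_id):
    conn.execute("INSERT INTO PROVIDER_PENDING VALUES (?, 125.0)", (claim_id,))
    raise RuntimeError("simulated failure after a partial update")

outcomes = [
    run_step(conn, "CLM-1", 1, "INTAKE", no_op),
    run_step(conn, "CLM-1", 2, "ELIGIBILITY", no_op),
    run_step(conn, "CLM-1", 3, "ADJUDICATION", broken_adjudication),
    run_step(conn, "CLM-1", 4, "APPROVAL", no_op),
]
print(outcomes)
```

In this sketch the failed adjudication leaves no pending-balance row behind, because the rollback happens before the FAILED status is recorded, and the approval step declines to run because its predecessor did not complete.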
Compensation Orchestrator
A separate program, CLM-COMP, runs every 30 minutes. It queries SAGA_STATE for claims that have a step in FAILED status and, for each claim, executes compensations in reverse step order:
       2000-PROCESS-FAILED-CLAIMS.
      *--- WITH HOLD keeps the cursor open if a step COMMITs ---*
           EXEC SQL DECLARE FAILED-CURSOR CURSOR WITH HOLD FOR
               SELECT DISTINCT CLAIM_ID
                 FROM SAGA_STATE
                WHERE STEP_STATUS = 'FAILED'
                ORDER BY CLAIM_ID
           END-EXEC
           EXEC SQL OPEN FAILED-CURSOR END-EXEC
      *--- Loop on a flag (WS-EOF-FLAG, assumed PIC X), because ---*
      *--- 2100-COMPENSATE-CLAIM overwrites SQLCODE             ---*
           MOVE 'N' TO WS-EOF-FLAG
           PERFORM UNTIL WS-EOF-FLAG = 'Y'
               EXEC SQL FETCH FAILED-CURSOR
                   INTO :WS-CLAIM-ID
               END-EXEC
               IF SQLCODE = 0
                   PERFORM 2100-COMPENSATE-CLAIM
               ELSE
      *---         +100 (end of data) or an SQL error: stop ---*
                   MOVE 'Y' TO WS-EOF-FLAG
               END-IF
           END-PERFORM
           EXEC SQL CLOSE FAILED-CURSOR END-EXEC.
       2100-COMPENSATE-CLAIM.
      *--- Find the highest step that completed or failed ---*
           EXEC SQL
               SELECT MAX(SAGA_STEP) INTO :WS-MAX-STEP
                 FROM SAGA_STATE
                WHERE CLAIM_ID = :WS-CLAIM-ID
                  AND STEP_STATUS IN ('COMPLETED', 'FAILED')
           END-EXEC
      *--- Compensate in reverse order, if the lookup worked ---*
           IF SQLCODE = 0
               PERFORM VARYING WS-STEP
                       FROM WS-MAX-STEP BY -1
                       UNTIL WS-STEP < 1
                   PERFORM 2200-COMPENSATE-STEP
               END-PERFORM
           END-IF.
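The reverse-order walk can also be sketched outside COBOL. The Python below is an illustrative model only, run against an in-memory SQLite stand-in with a reduced set of SAGA_STATE columns. The `compensations` registry, the `'DONE'` value for COMP_STATUS, and the `COMP_STATUS IS NULL` filter (which keeps a later 30-minute cycle from compensating the same step twice) are assumptions; the source does not say how CLM-COMP records completed compensations.

```python
import sqlite3

def compensate_failed_claims(conn, compensations):
    """Undo every claim that has a FAILED step, highest step first."""
    failed = conn.execute(
        "SELECT DISTINCT CLAIM_ID FROM SAGA_STATE "
        "WHERE STEP_STATUS = 'FAILED' ORDER BY CLAIM_ID"
    ).fetchall()  # read the whole list first, before any commits
    order = []
    for (claim_id,) in failed:
        steps = conn.execute(
            "SELECT SAGA_STEP FROM SAGA_STATE "
            "WHERE CLAIM_ID = ? AND STEP_STATUS IN ('COMPLETED', 'FAILED') "
            "AND COMP_STATUS IS NULL ORDER BY SAGA_STEP DESC",
            (claim_id,),
        ).fetchall()
        for (step,) in steps:
            compensations[step](conn, claim_id)  # business-level undo
            conn.execute(
                "UPDATE SAGA_STATE SET COMP_STATUS = 'DONE', "
                "COMP_TIME = CURRENT_TIMESTAMP "
                "WHERE CLAIM_ID = ? AND SAGA_STEP = ?",
                (claim_id, step),
            )
            conn.commit()  # one unit of work per compensated step
            order.append((claim_id, step))
    return order

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SAGA_STATE (CLAIM_ID TEXT, SAGA_STEP INTEGER, "
    "STEP_STATUS TEXT, COMP_STATUS TEXT, COMP_TIME TEXT, "
    "PRIMARY KEY (CLAIM_ID, SAGA_STEP))"
)
conn.executemany(
    "INSERT INTO SAGA_STATE (CLAIM_ID, SAGA_STEP, STEP_STATUS) VALUES (?, ?, ?)",
    [
        ("CLM-A", 1, "COMPLETED"),
        ("CLM-A", 2, "COMPLETED"),
        ("CLM-A", 3, "FAILED"),
        ("CLM-B", 1, "COMPLETED"),  # healthy claim: must be left alone
    ],
)
conn.commit()

def undo_nothing(conn, claim_id):
    pass  # a real compensation would reverse the step's business updates

compensations = {step: undo_nothing for step in (1, 2, 3)}
first_run = compensate_failed_claims(conn, compensations)
second_run = compensate_failed_claims(conn, compensations)
print(first_run, second_run)
```

Running the function twice mirrors two consecutive CLM-COMP cycles: under the stated assumptions, the first pass undoes steps 3, 2, and 1 of the failed claim, and the second pass finds nothing left to do.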
Idempotent Step Design
Each step checks whether it has already processed this claim:
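       3000-ADJUDICATE-CLAIM.
      *--- Idempotency check (step 3 = adjudication) ---*
           EXEC SQL
               SELECT STEP_STATUS INTO :WS-STEP-STATUS
                 FROM SAGA_STATE
                WHERE CLAIM_ID = :WS-CLAIM-ID
                  AND SAGA_STEP = 3
           END-EXEC
           IF SQLCODE = 0
               IF WS-STEP-STATUS = 'COMPLETED'
                   DISPLAY 'CLAIM ALREADY ADJUDICATED: '
                           WS-CLAIM-ID
                   GO TO 3000-EXIT
               END-IF
           END-IF
      *--- Do not adjudicate if the state lookup itself failed ---*
           IF SQLCODE < 0
               DISPLAY 'SAGA STATE LOOKUP FAILED: '
                       WS-CLAIM-ID
               GO TO 3000-EXIT
           END-IF
      *--- SQLCODE +100 (no row yet) falls through: first run ---*
      *--- Proceed with adjudication ---*
           ...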
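The same check is what prevents duplicate payments when CLM-PAY is restarted. The following Python sketch illustrates the idea against an in-memory SQLite stand-in. It assumes steps are numbered in lifecycle order (so payment is step 5, consistent with adjudication being step 3), and the `pay_claim` function and `PAYMENTS` table are hypothetical names used only for this example.

```python
import sqlite3

PAYMENT_STEP = 5  # assumption: steps are numbered in lifecycle order

def pay_claim(conn, claim_id, amount):
    """Issue a payment at most once per claim; return True if one was made."""
    row = conn.execute(
        "SELECT STEP_STATUS FROM SAGA_STATE "
        "WHERE CLAIM_ID = ? AND SAGA_STEP = ?",
        (claim_id, PAYMENT_STEP),
    ).fetchone()
    if row is not None and row[0] == "COMPLETED":
        return False  # already paid: a restart must not pay again
    # The payment and the saga state change commit together, so a crash
    # between the two statements leaves neither behind.
    conn.execute("INSERT INTO PAYMENTS VALUES (?, ?)", (claim_id, amount))
    conn.execute(
        "INSERT OR REPLACE INTO SAGA_STATE VALUES (?, ?, 'PAYMENT', 'COMPLETED')",
        (claim_id, PAYMENT_STEP),
    )
    conn.commit()
    return True

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE SAGA_STATE (CLAIM_ID TEXT, SAGA_STEP INTEGER, "
    "STEP_NAME TEXT, STEP_STATUS TEXT, PRIMARY KEY (CLAIM_ID, SAGA_STEP))"
)
conn.execute("CREATE TABLE PAYMENTS (CLAIM_ID TEXT, AMOUNT REAL)")

first_attempt = pay_claim(conn, "CLM-1", 250.0)
restart_attempt = pay_claim(conn, "CLM-1", 250.0)  # simulated job restart
payment_count = conn.execute("SELECT COUNT(*) FROM PAYMENTS").fetchone()[0]
print(first_attempt, restart_attempt, payment_count)
```

The design choice worth noticing is that the idempotency check only works if the saga state row is written in the same unit of work as the business update; if the two were committed separately, a failure between them could still produce a second payment on restart.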
Results
| Metric | Before Saga | After Saga | Improvement |
|---|---|---|---|
| Claims requiring manual intervention | 2-3% (12,000/mo) | 0.1% (500/mo) | 96% reduction |
| Duplicate payments | ~200/month | 0 | 100% elimination |
| Provider balance discrepancies | ~500/month | ~10/month | 98% reduction |
| Time to resolve stuck claims | 2-5 days | Automatic (30 min) | ~99% faster |
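The improvement column can be re-derived from the before and after figures. The short Python check below is only a worked calculation; the function name `reduction_pct` is an assumption, and the resolution-time comparison treats "Automatic (30 min)" as one full 30-minute compensation cycle.

```python
def reduction_pct(before, after):
    """Percentage reduction from before to after, to one decimal place."""
    return round(100 * (before - after) / before, 1)

MONTHLY_CLAIMS = 500_000

manual = reduction_pct(12_000, 500)      # manual interventions per month
duplicates = reduction_pct(200, 0)       # duplicate payments per month
discrepancies = reduction_pct(500, 10)   # balance discrepancies per month

# Resolution time in minutes: 2 to 5 days before, one 30-minute cycle after.
vs_two_days = reduction_pct(2 * 24 * 60, 30)
vs_five_days = reduction_pct(5 * 24 * 60, 30)

# Share of monthly claims that still need human review after the change.
remaining_share = 100 * 500 / MONTHLY_CLAIMS

print(manual, duplicates, discrepancies)
print(vs_two_days, vs_five_days, remaining_share)
```

The results line up with the table: the manual-intervention figure rounds to 96%, and the resolution-time improvement sits at roughly 99% whichever end of the 2-5 day range is used as the baseline.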
Discussion Questions
- Why is a compensation orchestrator (separate program) better than having each step attempt its own compensation on failure?
- The saga state table creates a single point of truth for claim lifecycle status. What happens if the saga state table itself becomes corrupted or unavailable?
- James chose a 30-minute compensation cycle. What are the trade-offs of running compensation more frequently (every 1 minute) vs. less frequently (every 4 hours)?
- Some compensations are not perfectly reversible (e.g., a notification has already been sent). How should the saga handle these "non-compensatable" steps?
- How does this saga design support the "Defensive Programming" theme from the textbook?
Key Takeaway
The saga pattern turns unreliable, multi-step business processes into largely self-healing systems. By breaking a long-running process into committed steps with explicit compensations, and by making every step idempotent, you create a system that recovers from most failures without human intervention. At MedClaim, manual intervention fell from about 12,000 claims per month to about 500 and duplicate payments were eliminated, which is how the initial investment in saga infrastructure pays for itself.