Case Study 2: MedClaim's Saga-Based Claim Lifecycle

Background

MedClaim processes 500,000 health insurance claims per month. A claim's lifecycle spans multiple days:

Day  Step              Program     Action
1    Intake            CLM-INTAKE  Receive and validate claim
1    Eligibility       CLM-ELIG    Verify member eligibility
2    Adjudication      CLM-ADJUD   Apply business rules, determine payment
3    Payment approval  CLM-APPR    Manager approval for high-dollar claims
3-5  Payment           CLM-PAY     Generate payment to provider
5    EOB               CLM-EOB     Send Explanation of Benefits to member

Each step is a separate batch program running on a different schedule. The entire lifecycle cannot be a single database transaction — you cannot hold locks for five days.

The Problem

Before the saga implementation, each program updated claim status independently. When a step failed, there was no systematic way to undo previous steps. Common problems included:

  • Claims stuck in "ADJUDICATED" status because CLM-PAY failed but no one reversed the adjudication
  • Duplicate payments when CLM-PAY was restarted without idempotency checks
  • Provider balance discrepancies when CLM-ADJUD updated the provider pending balance but CLM-PAY never issued the payment

James Okafor estimated that 2-3% of claims required manual intervention each month: approximately 12,000 of the 500,000 claims processed needed human review.

The Saga Implementation

James designed a saga orchestrator that tracks the state of each claim's lifecycle:

Saga State Table

CREATE TABLE SAGA_STATE (
    CLAIM_ID      CHAR(15) NOT NULL,
    SAGA_STEP     SMALLINT NOT NULL,
    STEP_NAME     CHAR(20) NOT NULL,
    STEP_STATUS   CHAR(10) NOT NULL,
    STEP_TIME     TIMESTAMP NOT NULL,
    COMP_STATUS   CHAR(10),
    COMP_TIME     TIMESTAMP,
    PRIMARY KEY (CLAIM_ID, SAGA_STEP)
);
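
To make the table concrete, here is a minimal sketch in Python, using the standard-library sqlite3 module as a stand-in for the production database (an assumption; MedClaim's programs are COBOL with embedded SQL). The claim ID, the timestamps, the step-name values, and the current_state helper are all hypothetical. The point it illustrates is that one row per (claim, step) lets any program ask where a claim currently stands.

```python
import sqlite3

# In-memory SQLite stands in for the production database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE SAGA_STATE (
        CLAIM_ID    CHAR(15)  NOT NULL,
        SAGA_STEP   SMALLINT  NOT NULL,
        STEP_NAME   CHAR(20)  NOT NULL,
        STEP_STATUS CHAR(10)  NOT NULL,
        STEP_TIME   TIMESTAMP NOT NULL,
        COMP_STATUS CHAR(10),
        COMP_TIME   TIMESTAMP,
        PRIMARY KEY (CLAIM_ID, SAGA_STEP)
    )
""")

# Hypothetical history for one claim: two steps done, the third failed.
rows = [
    ("CLM000000000001", 1, "INTAKE",       "COMPLETED", "2024-01-01 08:00:00"),
    ("CLM000000000001", 2, "ELIGIBILITY",  "COMPLETED", "2024-01-01 20:00:00"),
    ("CLM000000000001", 3, "ADJUDICATION", "FAILED",    "2024-01-02 02:00:00"),
]
with conn:  # commit the inserts as one unit
    conn.executemany(
        "INSERT INTO SAGA_STATE"
        " (CLAIM_ID, SAGA_STEP, STEP_NAME, STEP_STATUS, STEP_TIME)"
        " VALUES (?, ?, ?, ?, ?)",
        rows,
    )

def current_state(conn, claim_id):
    """Return (step number, step name, status) of the latest recorded step,
    or None if the claim has no saga rows yet."""
    return conn.execute(
        "SELECT SAGA_STEP, STEP_NAME, STEP_STATUS FROM SAGA_STATE"
        " WHERE CLAIM_ID = ? ORDER BY SAGA_STEP DESC LIMIT 1",
        (claim_id,),
    ).fetchone()

print(current_state(conn, "CLM000000000001"))  # (3, 'ADJUDICATION', 'FAILED')
```

Because the primary key is (CLAIM_ID, SAGA_STEP), a second insert for the same claim and step is rejected, so each step has exactly one authoritative row.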

Each Step's Contract

Every program in the lifecycle follows the same contract:

  1. Check saga state — is this step appropriate for the claim's current state?
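  2. Execute the step's business logic within a single unit of work
  3. Update the saga state table in that same unit of work, then COMMIT, so the business data and the saga state cannot disagree
  4. If the step fails, roll back the unit of work, then record FAILED status in the saga state table and commit that update on its own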
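
The contract can be sketched as a single wrapper that every step calls. This is an illustration in Python with sqlite3, not MedClaim's COBOL: the run_step and record helpers, the CLAIM table, the return codes, and the rule that a step is ready only when its predecessor is COMPLETED are all assumptions made for the example.

```python
import sqlite3

def record(conn, claim_id, step_no, step_name, status):
    """Insert or overwrite this step's row in SAGA_STATE."""
    conn.execute(
        "INSERT OR REPLACE INTO SAGA_STATE"
        " (CLAIM_ID, SAGA_STEP, STEP_NAME, STEP_STATUS, STEP_TIME)"
        " VALUES (?, ?, ?, ?, datetime('now'))",
        (claim_id, step_no, step_name, status),
    )

def run_step(conn, claim_id, step_no, step_name, business_logic):
    """Run one saga step under the four-point contract (hypothetical helper)."""
    # 1. Check saga state before doing anything.
    state = dict(conn.execute(
        "SELECT SAGA_STEP, STEP_STATUS FROM SAGA_STATE WHERE CLAIM_ID = ?",
        (claim_id,),
    ))
    if state.get(step_no) == "COMPLETED":
        return "SKIPPED"        # already done on an earlier run
    if step_no > 1 and state.get(step_no - 1) != "COMPLETED":
        return "NOT_READY"      # predecessor has not finished
    try:
        # 2 and 3. Business update and saga update share one commit;
        # "with conn" commits on success and rolls back on an exception.
        with conn:
            business_logic(conn, claim_id)
            record(conn, claim_id, step_no, step_name, "COMPLETED")
        return "COMPLETED"
    except Exception:
        # 4. The partial work is already rolled back; record FAILED alone.
        with conn:
            record(conn, claim_id, step_no, step_name, "FAILED")
        return "FAILED"

# --- Demonstration with a hypothetical CLAIM table ---
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE SAGA_STATE (CLAIM_ID TEXT, SAGA_STEP INTEGER,"
             " STEP_NAME TEXT, STEP_STATUS TEXT, STEP_TIME TEXT,"
             " PRIMARY KEY (CLAIM_ID, SAGA_STEP))")
conn.execute("CREATE TABLE CLAIM (CLAIM_ID TEXT PRIMARY KEY, STATUS TEXT)")
with conn:
    conn.execute("INSERT INTO CLAIM VALUES ('C1', 'NEW')")

def intake(conn, claim_id):
    conn.execute("UPDATE CLAIM SET STATUS = 'RECEIVED' WHERE CLAIM_ID = ?",
                 (claim_id,))

def broken_eligibility(conn, claim_id):
    conn.execute("UPDATE CLAIM SET STATUS = 'ELIGIBLE' WHERE CLAIM_ID = ?",
                 (claim_id,))
    raise RuntimeError("simulated failure after a partial update")

outcomes = [
    run_step(conn, "C1", 1, "INTAKE", intake),
    run_step(conn, "C1", 1, "INTAKE", intake),
    run_step(conn, "C1", 3, "ADJUDICATION", intake),
    run_step(conn, "C1", 2, "ELIGIBILITY", broken_eligibility),
]
claim_status = conn.execute(
    "SELECT STATUS FROM CLAIM WHERE CLAIM_ID = 'C1'").fetchone()[0]
print(outcomes, claim_status)
```

The run prints COMPLETED, SKIPPED, NOT_READY, FAILED, and the claim's status stays RECEIVED: the failed eligibility step's partial update was rolled back, while the FAILED marker was committed for the compensation run to find.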

Compensation Orchestrator

A separate program, CLM-COMP, runs every 30 minutes. It queries for claims in FAILED status and executes compensations in reverse order:

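       2000-PROCESS-FAILED-CLAIMS.
      *--- WITH HOLD keeps the cursor open across COMMITs ---*
      *--- (assumes the compensation paragraphs commit) ---*
           EXEC SQL DECLARE FAILED-CURSOR CURSOR WITH HOLD FOR
               SELECT DISTINCT CLAIM_ID
               FROM SAGA_STATE
               WHERE STEP_STATUS = 'FAILED'
               ORDER BY CLAIM_ID
           END-EXEC

      *--- WS-EOF-SW: assumed PIC X switch in WORKING-STORAGE ---*
           MOVE 'N' TO WS-EOF-SW
           EXEC SQL OPEN FAILED-CURSOR END-EXEC
           IF SQLCODE NOT = 0
               DISPLAY 'OPEN FAILED-CURSOR ERROR, SQLCODE: ' SQLCODE
               MOVE 'Y' TO WS-EOF-SW
           END-IF

      *--- Loop on the switch, not on SQLCODE: the compensation ---*
      *--- paragraphs run their own SQL and overwrite SQLCODE ---*
           PERFORM UNTIL WS-EOF-SW = 'Y'
               EXEC SQL FETCH FAILED-CURSOR
                   INTO :WS-CLAIM-ID
               END-EXEC

               EVALUATE SQLCODE
                   WHEN 0
                       PERFORM 2100-COMPENSATE-CLAIM
                   WHEN +100
                       MOVE 'Y' TO WS-EOF-SW
                   WHEN OTHER
                       DISPLAY 'FETCH ERROR, SQLCODE: ' SQLCODE
                       MOVE 'Y' TO WS-EOF-SW
               END-EVALUATE
           END-PERFORM

           EXEC SQL CLOSE FAILED-CURSOR END-EXEC.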

       2100-COMPENSATE-CLAIM.
      *--- Find the highest step attempted (completed or failed) ---*
           EXEC SQL
               SELECT MAX(SAGA_STEP) INTO :WS-MAX-STEP
               FROM SAGA_STATE
               WHERE CLAIM_ID = :WS-CLAIM-ID
                 AND STEP_STATUS IN ('COMPLETED', 'FAILED')
           END-EXEC

      *--- Compensate in reverse order ---*
           PERFORM VARYING WS-STEP
                       FROM WS-MAX-STEP BY -1
                       UNTIL WS-STEP < 1
               PERFORM 2200-COMPENSATE-STEP
           END-PERFORM.
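
The reverse-order walk can be modeled in a few lines of Python. This is a sketch under stated assumptions, not the logic of the paragraph 2200-COMPENSATE-STEP, which is not shown here: the compensating actions, the "DONE" marker standing in for a COMP_STATUS value, and the in-memory dictionary replacing SAGA_STATE are all hypothetical.

```python
ATTEMPTED = ("COMPLETED", "FAILED")

def compensate_claim(claim_id, steps, compensators):
    """Undo a claim's recorded steps, highest step number first.

    steps maps step number -> {"status": ..., "comp": ...} for one claim.
    Returns the step numbers compensated on this run, in execution order.
    """
    attempted = [n for n, s in steps.items() if s["status"] in ATTEMPTED]
    order = []
    # Mirrors PERFORM VARYING WS-STEP FROM WS-MAX-STEP BY -1 UNTIL WS-STEP < 1.
    for step_no in range(max(attempted, default=0), 0, -1):
        state = steps.get(step_no)
        if (state is None
                or state["status"] not in ATTEMPTED
                or state["comp"] == "DONE"):
            continue  # nothing to undo, or undone on an earlier run
        compensators[step_no](claim_id)
        state["comp"] = "DONE"  # "DONE" is an assumed COMP_STATUS value
        order.append(step_no)
    return order

# Hypothetical compensating actions; each one just logs what it would do.
log = []
compensators = {
    1: lambda c: log.append(f"withdraw intake record for {c}"),
    2: lambda c: log.append(f"clear eligibility result for {c}"),
    3: lambda c: log.append(f"reverse provider pending balance for {c}"),
}
steps = {
    1: {"status": "COMPLETED", "comp": None},
    2: {"status": "COMPLETED", "comp": None},
    3: {"status": "FAILED", "comp": None},
}

first_run = compensate_claim("C1", steps, compensators)
second_run = compensate_claim("C1", steps, compensators)
print(first_run, second_run)  # [3, 2, 1] []
```

Recording that a step has been compensated matters as much as the ordering: without that marker, the next 30-minute cycle would find the same FAILED row and undo the claim a second time.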

Idempotent Step Design

Each step checks whether it has already processed this claim:

       3000-ADJUDICATE-CLAIM.
      *--- Idempotency check ---*
           EXEC SQL
               SELECT STEP_STATUS INTO :WS-STEP-STATUS
               FROM SAGA_STATE
               WHERE CLAIM_ID = :WS-CLAIM-ID
                 AND SAGA_STEP = 3
           END-EXEC

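           EVALUATE SQLCODE
               WHEN 0
                   IF WS-STEP-STATUS = 'COMPLETED'
                       DISPLAY 'CLAIM ALREADY ADJUDICATED: '
                               WS-CLAIM-ID
                       GO TO 3000-EXIT
                   END-IF
               WHEN +100
      *--- No row yet: first attempt at this step ---*
                   CONTINUE
               WHEN OTHER
      *--- Saga state unreadable: do not adjudicate blind ---*
                   DISPLAY 'SAGA_STATE READ ERROR, SQLCODE: '
                           SQLCODE
                   GO TO 3000-EXIT
           END-EVALUATE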

      *--- Proceed with adjudication ---*
           ...
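
The same check is what stops duplicate payments when a batch job is restarted from the top of its input. The following Python sketch simulates that restart. It is illustrative only: the pay_claim and run_batch helpers, the in-memory saga_state dictionary, the claim amounts, and the assumption that Payment is saga step 5 (its position in the lifecycle table) are not taken from MedClaim's code.

```python
PAYMENT_STEP = 5  # assumed numbering: fifth row of the lifecycle table

def pay_claim(claim_id, amount, saga_state, payments):
    """Issue a payment unless the saga already shows this step COMPLETED."""
    if saga_state.get((claim_id, PAYMENT_STEP)) == "COMPLETED":
        return False                      # already paid on an earlier run
    payments.append((claim_id, amount))   # stands in for the real payment
    saga_state[(claim_id, PAYMENT_STEP)] = "COMPLETED"
    return True

def run_batch(claims, saga_state, payments, abend_after=None):
    """Process the whole input; optionally simulate a mid-run abend."""
    for count, (claim_id, amount) in enumerate(claims, start=1):
        pay_claim(claim_id, amount, saga_state, payments)
        if count == abend_after:
            raise RuntimeError("simulated abend")

claims = [("C1", 120.00), ("C2", 75.50), ("C3", 310.25)]
saga_state, payments = {}, []

try:
    run_batch(claims, saga_state, payments, abend_after=2)  # dies after C2
except RuntimeError:
    pass

run_batch(claims, saga_state, payments)  # operator restarts from the top

print(len(payments))  # 3, not 5
```

In a real program the payment and the state update would have to commit together; here both are plain in-memory updates, so the sketch shows only the check itself.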

Results

Metric                                Before Saga       After Saga          Improvement
Claims requiring manual intervention  2-3% (12,000/mo)  0.1% (500/mo)       96% reduction
Duplicate payments                    ~200/month        0                   100% elimination
Provider balance discrepancies        ~500/month        ~10/month           98% reduction
Time to resolve stuck claims          2-5 days          Automatic (30 min)  ~99% faster
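
The improvement column follows directly from the before and after figures. As a quick check, this short Python sketch recomputes each percentage; the reduction_pct helper is hypothetical, and the resolution-time rows assume 30 minutes is compared against the 2-day and 5-day ends of the earlier range.

```python
def reduction_pct(before, after):
    """Percentage reduction from before to after, to one decimal place."""
    return round(100 * (before - after) / before, 1)

MONTHLY_CLAIMS = 500_000

manual = reduction_pct(12_000, 500)            # 95.8, reported as 96%
duplicates = reduction_pct(200, 0)             # 100.0
balances = reduction_pct(500, 10)              # 98.0
time_vs_2_days = reduction_pct(2 * 24 * 60, 30)  # 99.0 (minutes)
time_vs_5_days = reduction_pct(5 * 24 * 60, 30)  # 99.6 (minutes)

# Share of all claims needing manual work, before and after.
before_share = round(100 * 12_000 / MONTHLY_CLAIMS, 1)  # 2.4
after_share = round(100 * 500 / MONTHLY_CLAIMS, 1)      # 0.1

print(manual, duplicates, balances, time_vs_2_days, time_vs_5_days)
print(before_share, after_share)
```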

Discussion Questions

  1. Why is a compensation orchestrator (separate program) better than having each step attempt its own compensation on failure?
  2. The saga state table creates a single point of truth for claim lifecycle status. What happens if the saga state table itself becomes corrupted or unavailable?
  3. James chose a 30-minute compensation cycle. What are the trade-offs of running compensation more frequently (every 1 minute) vs. less frequently (every 4 hours)?
  4. Some compensations are not perfectly reversible (e.g., a notification has already been sent). How should the saga handle these "non-compensatable" steps?
  5. How does this saga design support the "Defensive Programming" theme from the textbook?

Key Takeaway

The saga pattern turns unreliable, multi-step business processes into largely self-healing systems. By breaking a long-running process into committed steps with explicit compensations, and by making every step idempotent, you create a system that recovers from most failures without human help. At MedClaim, the investment in saga infrastructure cut manual intervention from about 12,000 claims per month to about 500 and eliminated duplicate payments.