Case Study 2: Pinnacle Health's Monthly Claims Processing Pipeline
"Fifty million claims. Thirty-one days. One shot at getting the numbers right."
— Diane Okoye, Systems Architect, Pinnacle Health Insurance
The Context
Pinnacle Health Insurance processes 50 million medical claims per month across commercial, Medicare Advantage, and Medicaid managed care lines of business. Unlike CNB's nightly batch window — which runs every day and must complete by morning — Pinnacle's biggest batch challenge is the monthly claims cycle: a multi-day batch processing event that runs during the last three days of each month and produces the financial, actuarial, and regulatory outputs that drive the entire business.
The monthly cycle isn't just a bigger version of a nightly batch. It's a different beast entirely — one where the batch window spans 72 hours, the dependencies cross business domains, regulatory deadlines are measured in calendar days (not hours), and a failure on Day 1 can cascade into a missed CMS (Centers for Medicare & Medicaid Services) filing deadline that triggers federal audit action.
Diane Okoye has architected Pinnacle's batch processing for nine years. She inherited a monthly cycle that was designed for 20 million claims and has nursed it through to 50 million without a complete re-architecture — yet. This case study examines how she manages the monthly cycle, the crisis that forced a partial re-architecture in 2025, and the lessons that apply to any large-scale batch processing environment.
The Monthly Cycle Architecture
The Three-Day Window
Pinnacle's monthly cycle runs on the 28th, 29th, and 30th of each month (adjusted for short months and weekends):
Day 1 (28th): Claim Finalization and Adjudication
Window: 8:00 PM – 6:00 AM (10 hours)
Jobs: 312
Critical path: 8.4 hours
Day 2 (29th): Financial Processing and Provider Payments
Window: 8:00 PM – 6:00 AM (10 hours)
Jobs: 247
Critical path: 7.3 hours
Day 3 (30th): Regulatory Reporting and Reconciliation
Window: 8:00 PM – 6:00 AM (10 hours)
Jobs: 198
Critical path: 6.8 hours
Each day's batch has its own DAG, but Day 2 depends on Day 1's successful completion, and Day 3 depends on Day 2. The three-day window is itself a DAG with three macro-nodes chained serially.
Day 1: Claim Finalization
The Day 1 batch is the most complex. It processes all claims in "pending" status, applies final adjudication rules, reprices services, and produces the adjudicated claims master file that drives everything else.
Day 1 Critical Path:
CLAIM-EXTRACT (extract 50M claims from DB2) ────── 95 min
│
CLAIM-SORT (sort by provider/service date) ─────── 40 min
│
ADJUD-PHASE1 (primary adjudication rules) ──────── 110 min ★
│
ADJUD-PHASE2 (secondary rules/overrides) ───────── 65 min
│
REPRICE-01 (apply fee schedule repricing) ──────── 85 min ★
│
MERGE-ADJ (merge adjudication results) ─────────── 25 min
│
CLAIM-POST (post to claims master DB2) ─────────── 70 min ★
│
CLAIM-VERFY (verification/reconciliation) ──────── 15 min
│
Total: 505 min (8.4 hours)
★ marks the three jobs with the highest volume elasticity — the ones that grow fastest as claims volume increases.
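The serial chain above makes the critical-path arithmetic trivial: with no parallelism, the path length is simply the sum of the job times. A minimal sketch (job names and minutes taken from the diagram):

```python
# Day 1 critical-path jobs and their elapsed minutes, from the diagram above.
DAY1_CHAIN = [
    ("CLAIM-EXTRACT", 95),
    ("CLAIM-SORT", 40),
    ("ADJUD-PHASE1", 110),
    ("ADJUD-PHASE2", 65),
    ("REPRICE-01", 85),
    ("MERGE-ADJ", 25),
    ("CLAIM-POST", 70),
    ("CLAIM-VERFY", 15),
]

def critical_path_minutes(chain):
    """A serial chain's critical path is just the sum of its job times."""
    return sum(minutes for _, minutes in chain)

total = critical_path_minutes(DAY1_CHAIN)
print(total, round(total / 60, 1))  # 505 minutes, 8.4 hours
```

The same sum is what grows when the ★-marked jobs stretch with claims volume.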
Parallel Streams on Day 1
Not everything is serial. Day 1 has five parallel streams:
Stream A: Commercial claims (22M claims)
Stream B: Medicare Advantage (18M claims)
Stream C: Medicaid managed care (10M claims)
Stream D: Dental/Vision (auxiliary, 3M claims)
Stream E: Pharmacy (separate adjudication engine, 8M claims)
Streams A, B, and C share the same adjudication engine but process different claim populations. They can run in parallel if — and this is a critical constraint — they don't update the same provider records simultaneously.
Parallel execution plan:
8:00 PM ─ Stream A (Commercial): Extract → Sort → Adjud → Reprice
Stream D (Dental): Extract → Sort → Adjud → Reprice
Stream E (Pharmacy): Extract → Adjud (different engine)
9:30 PM ─ Stream B (Medicare): Extract → Sort → Adjud → Reprice
(delayed to avoid provider table contention with Stream A)
10:15 PM─ Stream C (Medicaid): Extract → Sort → Adjud → Reprice
(delayed further — Medicaid repricing uses same fee tables)
1:00 AM ─ MERGE-ALL: Combine all stream outputs
POST: Update claims master
VERIFY: Reconcile counts and amounts
The critical path is Stream A (the largest at 22M claims), followed by the merge and post steps that require all streams to complete.
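The staggered plan can be sanity-checked as interval arithmetic: MERGE-ALL cannot begin until the latest stream finishes. The sketch below uses the documented start times; the per-stream durations are assumptions chosen to illustrate the 1:00 AM convergence, not Pinnacle's actual figures:

```python
from datetime import datetime, timedelta

# Start times are from the execution plan above; durations (minutes through
# extract -> reprice) are illustrative assumptions, not measured values.
streams = {
    "A (Commercial)": ("20:00", 300),
    "B (Medicare)":   ("21:30", 210),
    "C (Medicaid)":   ("22:15", 165),
    "D (Dental)":     ("20:00", 150),
    "E (Pharmacy)":   ("20:00", 180),
}

base = datetime(2025, 2, 28)  # any cycle date works for the arithmetic

def finish(start_hhmm, minutes):
    """Finish timestamp for a stream given its start time and duration."""
    h, m = map(int, start_hhmm.split(":"))
    return base + timedelta(hours=h, minutes=m + minutes)

# The merge is gated on every stream completing.
merge_start = max(finish(s, d) for s, d in streams.values())
print(merge_start.strftime("%H:%M"))  # 01:00
```

With these assumed durations the three delayed adjudication streams all land at 1:00 AM, matching the MERGE-ALL slot in the plan.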
Day 2: Financial Processing
Day 2 takes the adjudicated claims from Day 1 and produces:
- Provider payment files (EFT and check)
- Member liability statements
- Accounts payable journal entries
- Reserve adjustments
- Reinsurance calculations
The critical path runs through the provider payment stream:
PAY-EXTRACT (claims by provider) ──────── 45 min
│
PAY-CALC (calculate payment amounts) ──── 75 min
│
PAY-DEDUCT (apply withholdings) ────────── 35 min
│
PAY-EFT (generate EFT files) ─────────── 30 min
│
PAY-REMIT (generate remittance advice) ── 55 min
│
PAY-POST (post to AP ledger) ─────────── 40 min
│
PAY-RECON (reconcile payments) ────────── 20 min
│
RESERVE (calculate reserves) ──────────── 65 min
│
REINS (reinsurance calculations) ──────── 40 min
│
FIN-CLOSE (financial period close) ────── 35 min
│
Total: 440 min (7.3 hours)
Day 3: Regulatory Reporting
Day 3 produces federally mandated reports:
CMS-835 (Medicare remittance) ───────── 40 min
CMS-837 (claim submission to CMS) ──── 55 min
MLR-CALC (Medical Loss Ratio) ──────── 85 min
STATE-RPT (state regulatory reports) ── 45 min
AUDIT-EXT (audit trail extraction) ──── 60 min
RECON-FINAL (full cycle reconciliation)─ 35 min
ARCHIVE (cycle archive to tape/cloud) ── 90 min
Many Day 3 jobs can run in parallel — the CMS files, state reports, and audit extracts are independent. The critical path runs through the MLR calculation (which needs all financial data) and the final reconciliation.
The Crisis: February 2025
What Happened
Pinnacle acquired a regional health plan in January 2025, adding 8 million members and approximately 12 million claims per month. The integration plan called for a 6-month migration, but regulatory pressure from the state insurance commissioner accelerated the timeline — all claims had to be processed on Pinnacle's platform by the February month-end cycle.
February 28th. The monthly cycle began at 8:00 PM.
Day 1 (February 28th):
Claim extract ran normally — 62 million claims instead of the expected 50 million (the acquisition's 12M claims were already loaded into the pending queue). The extract took 118 minutes instead of 95.
ADJUD-PHASE1, the primary adjudication engine, started processing. The acquired plan's claims used different coding conventions (ICD-10 mapping variations, non-standard place-of-service codes), triggering the "manual review" exception path at 3x the normal rate. The program's exception handling wrote each exception to a DB2 audit table — and the high exception rate caused DB2 lock escalation on the audit table, which blocked other adjudication streams.
By 1:30 AM, Stream A (Commercial) was at 65% complete instead of the expected 90%. Streams B and C hadn't started because the DB2 lock escalation was blocking the shared provider tables.
Diane got the call at 1:45 AM.
The Decision Tree
Diane's assessment at 1:45 AM:
Day 1 status:
Stream A: 65% complete, running 90 min behind
Stream B: Queued (DB2 contention)
Stream C: Queued (DB2 contention)
Stream D: Complete (dental claims unaffected)
Stream E: 80% complete (pharmacy uses different engine)
Projected Day 1 completion: 9:30 AM (3.5 hours late)
Day 2 cannot start until Day 1 completes.
Day 3 must complete by 11:59 PM March 2nd (CMS filing deadline).
Time budget:
Available: 52 hours (Feb 28 8PM to Mar 2 midnight)
Day 1 projected: 13.5 hours (3.5 late)
Day 2 expected: 7.5 hours
Day 3 expected: 6.8 hours
Inter-day buffers: 2 hours (two transitions)
Total projected: 29.8 hours
Remaining: 22.2 hours
The math said they'd finish with 22 hours of margin. But Diane didn't trust the math — the Day 1 projection assumed the DB2 contention would be resolved, and it hadn't been yet.
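The budget is plain interval arithmetic and can be recomputed directly from the timestamps in the narrative:

```python
from datetime import datetime

# Timestamps from the narrative: cycle start and the CMS filing deadline.
cycle_start = datetime(2025, 2, 28, 20, 0)   # Feb 28, 8:00 PM
cms_deadline = datetime(2025, 3, 2, 23, 59)  # Mar 2, 11:59 PM

available_h = (cms_deadline - cycle_start).total_seconds() / 3600

# Day 1 (projected late) + Day 2 + Day 3 + two inter-day buffers.
projected_h = 13.5 + 7.5 + 6.8 + 2.0
margin_h = available_h - projected_h

print(round(available_h, 1), round(projected_h, 1), round(margin_h, 1))
```

Just under 52 hours were available from the 8:00 PM start, leaving roughly 22 hours of margin over the 29.8 projected hours.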
The Recovery
Diane's team executed a three-part recovery:
Part 1 (Immediate): Fix the DB2 lock escalation.
Ahmad Rashidi (compliance) confirmed that the audit table writes weren't needed in real time; they could be deferred to a post-processing step. Pinnacle's lead DB2 DBA, the counterpart of CNB's Lisa Tran from Case Study 1, added a secondary audit table for the batch insert load, bypassing the contended table.
-- Redirect audit writes to staging table (no contention)
INSERT INTO CLAIM_AUDIT_STAGING
(CLAIM_ID, EXCEPTION_CODE, EXCEPTION_DESC, AUDIT_TS)
VALUES
(:WS-CLAIM-ID, :WS-EXCEPT-CD, :WS-EXCEPT-DESC, CURRENT TIMESTAMP);
-- After batch completes, merge staging to main audit table
INSERT INTO CLAIM_AUDIT_MASTER
SELECT * FROM CLAIM_AUDIT_STAGING
WHERE AUDIT_TS >= :WS-CYCLE-START;
This required an emergency code change — a 4-line COBOL modification to redirect the audit INSERT to the staging table. The change was tested on a parallel LPAR at 2:15 AM, promoted to production at 2:30 AM, and the adjudication jobs were restarted from their last checkpoint.
Part 2 (Parallel recovery): Start Streams B and C immediately.
With the audit table contention resolved, Streams B and C could start while Stream A continued. Stream A had already processed 65% of its claims; the remaining 35% would take approximately 40 minutes. Streams B and C would each take their normal 90 minutes, and because they ran in parallel, that was 90 minutes of elapsed time.
Part 3 (Schedule compression): Overlap Day 1 tail with Day 2 start.
Normally, Day 2 waits for ALL of Day 1 to complete, including the merge and post steps. But Diane realized that the provider payment calculation (Day 2's biggest job) only needed the adjudicated claims — it didn't need the merge to be complete. She could start Day 2's payment extraction as soon as each stream's adjudication finished, rather than waiting for the final merge.
Normal flow:
Stream A adjud → Stream B adjud → Stream C adjud → MERGE → Day 2 start
Compressed flow:
Stream A adjud ──→ Day 2 payment calc (Stream A claims) ──→
Stream B adjud ──→ Day 2 payment calc (Stream B claims) ──→ MERGE payments
Stream C adjud ──→ Day 2 payment calc (Stream C claims) ──→
Savings: ~2 hours (overlap of Day 1 merge with Day 2 processing)
The Outcome
Actual timing:
Day 1 completion: 8:45 AM Mar 1 (2.75 hours late)
Day 2 completion: 11:00 PM Mar 1 (started at 2:00 PM using overlap)
Day 3 completion: 7:30 AM Mar 2 (well within deadline)
CMS filing: 9:15 AM Mar 2 (deadline: 11:59 PM Mar 2)
Margin: 14 hours 45 minutes
Claims processed: 62 million (all adjudicated, posted, and reported)
Errors: 0 (after the audit table fix)
The Post-Crisis Re-architecture
Diane didn't wait for the next crisis. She launched a batch architecture review the following week.
Finding 1: The Adjudication Engine Wasn't Designed for Exceptions
The adjudication COBOL program was written when exception rates were 2–3%. At 9% (the acquired plan's rate), the exception handling path — especially the synchronous DB2 audit writes — became a bottleneck. The fix:
- Asynchronous audit logging: Exceptions are written to a sequential file during adjudication. A separate post-processing job loads them to DB2 in bulk. This eliminated all DB2 contention from the adjudication path.
- Exception rate monitoring: A threshold check after every 100,000 claims. If the exception rate exceeds 5%, the job writes a warning message and (optionally) activates a "high-exception" processing mode that batches audit writes more aggressively.
2500-CHECK-EXCEPTION-RATE.
DIVIDE WS-EXCEPTION-COUNT BY WS-RECORD-COUNT
GIVING WS-EXCEPTION-RATE ROUNDED
IF WS-EXCEPTION-RATE > 0.05
DISPLAY 'WARNING: Exception rate '
WS-EXCEPTION-RATE
' exceeds 5% threshold at record '
WS-RECORD-COUNT
SET WS-HIGH-EXCEPTION-MODE TO TRUE
END-IF.
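The asynchronous audit logging fix is a generic pattern: buffer exception records to a local sequential file during the run, then bulk-load them afterward, so the hot path never touches a contended database table. A Python sketch of the pattern (the file name, record layout, and claim data are illustrative, not Pinnacle's formats):

```python
import csv
import os
import tempfile

def adjudicate(claims, audit_path):
    """Process claims; log exceptions to a sequential file, not the database.

    claims is a list of (claim_id, is_clean) pairs; anything not clean
    takes the exception path and gets an audit record.
    """
    with open(audit_path, "w", newline="") as f:
        writer = csv.writer(f)
        for claim_id, is_clean in claims:
            if not is_clean:
                writer.writerow([claim_id, "E-MAP", "coding mismatch"])

def bulk_load(audit_path):
    """Post-processing step: read the whole file for one batched insert."""
    with open(audit_path, newline="") as f:
        return list(csv.reader(f))

claims = [(1, True), (2, False), (3, False), (4, True)]
path = os.path.join(tempfile.mkdtemp(), "audit.seq")
adjudicate(claims, path)
rows = bulk_load(path)
print(len(rows))  # two exception records queued for the bulk insert
```

The key property is that exception volume no longer affects lock behavior during adjudication; it only affects the size of the post-processing load.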
Finding 2: The Three-Day Model Was Fragile
A single bad night on Day 1 cascaded through the entire cycle. Diane re-architected to allow partial overlap between days:
Old model (serial):
Day 1 ────────────→ Day 2 ────────────→ Day 3
New model (pipelined):
Day 1 Stream A ──→ Day 2 payment A ──→ Day 3 CMS A
Day 1 Stream B ──→ Day 2 payment B ──→ Day 3 CMS B
Day 1 Stream C ──→ Day 2 payment C ──→ Day 3 CMS C
↓ ↓
Day 2 merge ──→ Day 3 final recon
Each stream flows through all three days independently, converging only for cross-stream operations (GL posting, final reconciliation, MLR calculation). This reduced the end-to-end critical path from 22.5 hours (serial) to 15.2 hours (pipelined).
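The gain from pipelining can be illustrated with toy numbers: serially, each day waits for the whole previous day, so the path is the sum of all three days; pipelined, the makespan is governed by the slowest stream's end-to-end flow plus the converging steps. The per-stream hours below are illustrative assumptions, not Pinnacle's measured figures:

```python
# Serial model: each day's full critical path, chained (hours from the text).
day = {"day1": 8.4, "day2": 7.3, "day3": 6.8}
serial = sum(day.values())  # each day waits for the previous one

# Pipelined model: each stream flows through all three days independently.
# Tuples are assumed (day1, day2, day3) hours per stream.
streams = {"A": (5.0, 4.0, 3.0), "B": (4.0, 3.5, 2.5), "C": (3.5, 3.0, 2.0)}
converge = 2.0  # cross-stream steps (GL posting, final recon, MLR)

pipelined = max(sum(s) for s in streams.values()) + converge
print(round(serial, 1), pipelined)
```

The structure, not the specific numbers, is the point: the serial model pays for every stream's slack on every day, while the pipeline only pays for the slowest stream once.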
Finding 3: Acquisition Integration Needs Batch Impact Assessment
The 12 million new claims were loaded into the production pending queue with no batch capacity analysis. The coding convention differences weren't flagged because nobody ran the new claims through a test adjudication cycle before month-end.
Diane implemented a mandatory "batch integration test" for any data migration that adds more than 5% to existing volume:
- Extract a representative sample (10%) of new data
- Run it through the batch pipeline on a test LPAR
- Measure throughput, exception rates, and DB2 resource consumption
- Project elapsed time impact on the production critical path
- If projected impact exceeds 10% of margin, require architecture review before go-live
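The projection step above (scale the sample's throughput to the full migration volume, then compare against 10% of the available margin) can be sketched as follows; all input numbers here are hypothetical, not figures from the Pinnacle migration:

```python
def project_impact(sample_claims, sample_minutes, new_volume,
                   window_margin_min):
    """Scale sample throughput to full volume; flag if the added elapsed
    time exceeds 10% of the batch window's margin (the review trigger)."""
    per_claim_min = sample_minutes / sample_claims
    added_min = per_claim_min * new_volume
    needs_review = added_min > 0.10 * window_margin_min
    return added_min, needs_review

# Hypothetical inputs: a 1.2M-claim sample ran in 15 minutes on the test
# LPAR, the migration adds 12M claims, and the window margin is 96 minutes.
added, review = project_impact(1_200_000, 15, 12_000_000, 96)
print(round(added), review)
```

In this hypothetical, the projected 150 added minutes dwarfs 10% of the margin, so the migration would be held for architecture review before go-live.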
Finding 4: Checkpoint Granularity Was Too Coarse
The adjudication program took checkpoints every 500,000 claims. When the job was restarted at 2:30 AM, it had to re-process 340,000 claims that had already been adjudicated since the last checkpoint. That's 25 minutes of wasted re-processing.
New checkpoint frequency: every 50,000 claims. The checkpoint overhead increased by approximately 2 minutes per run (additional I/O and DB2 commits), but the maximum re-processing on restart dropped from 500,000 claims to 50,000 — saving up to 45 minutes in recovery scenarios.
Cost-benefit analysis:
Checkpoint overhead increase: +2 min per run (every month-end)
Recovery time savings: up to 45 min (when failures occur)
Break-even: if failures occur more than once every 22 months
Actual failure frequency: ~3 times per year
ROI: clearly positive
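The cost-benefit figures are simple arithmetic and can be reproduced directly from the numbers in the text:

```python
# Checkpoint trade-off figures from the re-architecture review.
overhead_min_per_run = 2      # extra checkpoint I/O, one month-end run/month
savings_min_per_failure = 45  # worst-case re-processing avoided on restart
failures_per_year = 3         # observed failure frequency

yearly_cost = overhead_min_per_run * 12
yearly_savings = savings_min_per_failure * failures_per_year
breakeven_months = savings_min_per_failure / overhead_min_per_run

print(yearly_cost, yearly_savings, round(breakeven_months, 1))
# 24 min/year of overhead vs 135 min/year saved;
# break-even at roughly one failure per 22.5 months
```

At three failures per year against a break-even of one failure per ~22 months, the finer checkpoints pay for themselves several times over.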
Regulatory Dimensions
Ahmad Rashidi flagged a critical compliance consideration during the post-crisis review: the CMS filing deadline isn't just a business target — it's a contractual obligation under Pinnacle's Medicare Advantage contract. Missing it triggers:
- Corrective Action Plan (CAP) requirement from CMS
- Financial penalties up to $25,000 per day of late filing
- Audit risk — late filers get flagged for enhanced compliance review
- Enrollment sanctions — repeat offenders can be barred from enrolling new members
The February crisis finished with 14 hours of margin. But if the DB2 fix had taken longer, or if the overlap technique hadn't worked, the margin would have been thin enough to trigger Ahmad's escalation protocol — which includes pre-positioning a regulatory notification letter and engaging outside counsel.
"The batch window isn't just an IT problem," Ahmad told the post-crisis review meeting. "It's a regulatory compliance obligation. When the window breaks, my phone rings next."
Key Metrics: Before and After Re-architecture
Metric Before After Change
──────────────────────────────────────────────────────────────────────
Monthly cycle elapsed 22.5 hours 15.2 hours -32%
Day 1 critical path 8.4 hours 6.1 hours -27%
Day 2 critical path 7.3 hours 5.4 hours -26%
Day 3 critical path 6.8 hours 5.8 hours -15%
Max claims capacity 55M 80M +45%
Recovery time (worst case) 4.2 hours 1.8 hours -57%
Checkpoint frequency 500K claims 50K claims 10x
Pipeline overlap None Full N/A
Batch integration tests None Mandatory N/A
Discussion Questions
1. Diane's emergency code change at 2:30 AM bypassed normal change management. Under what circumstances is this acceptable? What controls should exist for emergency batch changes?
2. The pipeline model (streams flowing independently through all three days) introduces complexity — more jobs, more dependencies, more potential failure points. How would you manage this increased complexity?
3. Ahmad Rashidi's compliance concerns add a dimension that pure technical analysis misses. How should regulatory deadlines be represented in the batch DAG? Should they be time dependencies, or something else?
4. Pinnacle's adjudication engine couldn't handle a 3x increase in exception rates. What design principles would make batch programs more resilient to unexpected data characteristics?
5. The "batch integration test" process adds time to data migration projects. Business stakeholders may push back. How would you justify the additional lead time?
6. Compare CNB's crisis (Case Study 1) with Pinnacle's. Both involved volume growth that broke the batch window. What are the common patterns? What's different about a monthly cycle versus a nightly cycle?
7. If Pinnacle's claims volume reaches 100M per month (double current), is the re-architected pipeline sufficient? What would need to change?