Case Study 2: Pinnacle Health Insurance's GDG Strategy for Monthly Claims Processing
Tiered Generation Data Groups for 50 Million Claims Per Month
Background
Pinnacle Health Insurance processes approximately 50 million medical, dental, and pharmacy claims per month across its national network of 200,000 providers and 12 million members. Their mainframe batch processing environment handles the full claims lifecycle: intake, validation, adjudication, pricing, payment, and regulatory reporting.
Diane Okoye, Pinnacle's chief mainframe architect, inherited a batch environment that had grown organically over 15 years. When she started her architecture review, she found 847 GDG bases — many with conflicting LIMIT values, inconsistent naming, and no documented relationship between them.
"The worst part wasn't the number of GDGs," Diane says. "It was that nobody could tell me the data lineage. When a regulator asked, 'Show me every dataset that contains member Social Security numbers,' we couldn't answer that question from the dataset architecture. We had to read JCL and COBOL source code for 400 programs. That's when I knew we needed to redesign from scratch."
The Claims Processing Pipeline
Pinnacle's nightly batch cycle runs in seven stages, each producing GDG generations that feed the next:
Stage 1: INTAKE
└─ PIN.CLAIMS.DAILY.INTAKE(+1) Raw claims from providers
↓
Stage 2: VALIDATION
└─ PIN.CLAIMS.DAILY.VALID(+1) Validated claims
└─ PIN.CLAIMS.DAILY.REJECT(+1) Rejected claims
↓
Stage 3: ADJUDICATION
└─ PIN.CLAIMS.DAILY.ADJUD(+1) Adjudicated claims
└─ PIN.CLAIMS.DAILY.PEND(+1) Pended (needs review)
↓
Stage 4: PRICING
└─ PIN.CLAIMS.DAILY.PRICED(+1) Priced claims
↓
Stage 5: PAYMENT
└─ PIN.CLAIMS.DAILY.PAYMENT(+1) Payment transactions
└─ PIN.CLAIMS.DAILY.REMIT(+1) Remittance advices
↓
Stage 6: RECONCILIATION
└─ PIN.CLAIMS.DAILY.RECON(+1) Reconciliation bridge
↓
Stage 7: REPORTING
└─ PIN.CLAIMS.DAILY.RPTDATA(+1) Report-ready data
Each stage reads the previous stage's output using relative GDG references. Stage 2 reads PIN.CLAIMS.DAILY.INTAKE(+1) (created in Stage 1 of the same job stream). Stage 3 reads PIN.CLAIMS.DAILY.VALID(+1) from Stage 2. And so on.
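This within-job resolution is worth making explicit. A minimal JCL sketch (the job, program, and DD names are illustrative, following the dataset conventions above) shows a (+1) created by one step being read back as (+1) by a later step of the same job:

```jcl
//PINNITE  JOB (ACCT),'NIGHTLY CYCLE',CLASS=A
//* Stage 1 creates the new daily generation
//STAGE1   EXEC PGM=PININT01
//OUTPUT   DD DSN=PIN.CLAIMS.DAILY.INTAKE(+1),
//            DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(500,100),RLSE),
//            DCB=(RECFM=FB,LRECL=500,BLKSIZE=27500)
//* Stage 2, same job: (+1) still points at the generation
//* created above, because relative generation numbers are
//* fixed at job initiation, not re-evaluated per step
//STAGE2   EXEC PGM=PINVAL01
//INPUT    DD DSN=PIN.CLAIMS.DAILY.INTAKE(+1),DISP=SHR
```

If the stages ran as separate jobs instead, Stage 2 would read the most recent generation as (0); the same-job-stream design lets the whole pipeline use (+1) consistently.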
The Tiered GDG Architecture
Diane designed a four-tier GDG strategy that balances operational flexibility, recovery capability, regulatory compliance, and storage cost:
Tier 1 — Daily Processing (LIMIT=14, SCRATCH)
//*-----------------------------------------------------------
//* TIER 1: DAILY GDG DEFINITIONS
//*-----------------------------------------------------------
//DEFGDG1 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.INTAKE) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.VALID) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.REJECT) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.ADJUD) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.PEND) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.PRICED) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.PAYMENT) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.REMIT) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.RECON) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.DAILY.RPTDATA) -
         LIMIT(14) NOEMPTY SCRATCH PURGE)
/*
Design rationale for Tier 1:
- LIMIT(14): Two weeks of daily generations. Provides enough history for restart/recovery of any day within the past two weeks and for trend comparison.
- SCRATCH: When a generation rolls off at day 15, delete it. The data has been rolled up into the weekly summary by then.
- NOEMPTY: Only roll off the oldest generation. Never empty the entire GDG.
Diane considered LIMIT(7) to save space but rejected it: "When the nightly cycle fails on a Friday and we can't fix it until Monday, we need Thursday's data. With LIMIT(7), Monday's run would have rolled off Thursday. With LIMIT(14), we have cushion."
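Before any restart decision, the operators confirm which generations are actually cataloged. A routine IDCAMS inquiry (the step name is illustrative) lists a GDG base and its generations:

```jcl
//LISTGDG  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* List the base and every cataloged generation */
  LISTCAT ENTRIES(PIN.CLAIMS.DAILY.INTAKE) ALL
  /* Or survey all ten daily bases at once */
  LISTCAT LEVEL(PIN.CLAIMS.DAILY)
/*
```

The ENTRIES form shows the base's associations (its generation datasets); the LEVEL form sweeps everything under a qualifier, which is useful when checking the whole daily tier after a failed cycle.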
Tier 2 — Weekly Rollup (LIMIT=8, SCRATCH)
//*-----------------------------------------------------------
//* TIER 2: WEEKLY GDG DEFINITIONS
//*-----------------------------------------------------------
//DEFGDG2 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
  DEFINE GDG (NAME(PIN.CLAIMS.WEEKLY.SUMMARY) -
         LIMIT(8) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.WEEKLY.RECON) -
         LIMIT(8) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.WEEKLY.PROVSTAT) -
         LIMIT(8) NOEMPTY SCRATCH PURGE)
  DEFINE GDG (NAME(PIN.CLAIMS.WEEKLY.MEMSTAT) -
         LIMIT(8) NOEMPTY SCRATCH PURGE)
/*
Every Sunday night, after the daily cycle completes, a weekly rollup job consolidates the week's daily generations:
//*-----------------------------------------------------------
//* WEEKLY ROLLUP - CONSOLIDATE 7 DAILY GENS
//*-----------------------------------------------------------
//ROLLUP EXEC PGM=PINWKRL1
//DAY1 DD DSN=PIN.CLAIMS.DAILY.ADJUD(-6),DISP=SHR
//DAY2 DD DSN=PIN.CLAIMS.DAILY.ADJUD(-5),DISP=SHR
//DAY3 DD DSN=PIN.CLAIMS.DAILY.ADJUD(-4),DISP=SHR
//DAY4 DD DSN=PIN.CLAIMS.DAILY.ADJUD(-3),DISP=SHR
//DAY5 DD DSN=PIN.CLAIMS.DAILY.ADJUD(-2),DISP=SHR
//DAY6 DD DSN=PIN.CLAIMS.DAILY.ADJUD(-1),DISP=SHR
//DAY7 DD DSN=PIN.CLAIMS.DAILY.ADJUD(0),DISP=SHR
//WKLYOUT DD DSN=PIN.CLAIMS.WEEKLY.SUMMARY(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(1000,200),RLSE),
// DCB=(RECFM=FB,LRECL=500,BLKSIZE=27500)
Design rationale for Tier 2:
- LIMIT(8): Eight weeks of weekly summaries — nearly two months. Covers any month-end processing window and supports month-over-month comparison.
- The COBOL rollup program aggregates daily detail into weekly summary records, reducing record count by roughly 5:1 (multiple claim transactions per claim per week are consolidated).
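One JCL detail the rollup design deliberately avoids: coding a GDG base name with no generation number allocates every cataloged generation as a single concatenation, newest first. For a hypothetical variant of the rollup program that accepted one input stream, that would read up to 14 generations here (the full LIMIT), not the intended seven, which is why Pinnacle codes seven explicit relative references instead:

```jcl
//ROLLALT  EXEC PGM=PINWKRL2
//* Base name with no (n): z/OS concatenates ALL cataloged
//* generations, newest to oldest -- with LIMIT(14) that can
//* be two weeks of data, not the intended seven days
//ALLDAYS  DD DSN=PIN.CLAIMS.DAILY.ADJUD,DISP=SHR
```

The all-generations form remains handy for true full-history scans, such as ad hoc audit extracts across the whole two-week window.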
Tier 3 — Monthly Archive (LIMIT=24, NOSCRATCH)
//*-----------------------------------------------------------
//* TIER 3: MONTHLY GDG DEFINITIONS
//*-----------------------------------------------------------
//DEFGDG3 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
  DEFINE GDG (NAME(PIN.CLAIMS.MONTHLY.ARCHIVE) -
         LIMIT(24) NOEMPTY NOSCRATCH)
  DEFINE GDG (NAME(PIN.CLAIMS.MONTHLY.PROVPAY) -
         LIMIT(24) NOEMPTY NOSCRATCH)
  DEFINE GDG (NAME(PIN.CLAIMS.MONTHLY.REGSNAP) -
         LIMIT(24) NOEMPTY NOSCRATCH)
/*
Design rationale for Tier 3:
- LIMIT(24): Two years of monthly snapshots. Covers regulatory audit cycles (typically 12-18 months of lookback).
- NOSCRATCH: When a generation rolls off at month 25, it's uncataloged but NOT deleted. DFSMShsm migrates the uncataloged dataset to tape based on the management class. Ahmad Rashidi insisted: "If a regulator requests data from 30 months ago, I need to be able to recall it from tape. NOSCRATCH ensures the data survives uncataloging."
Ahmad's compliance perspective: "HIPAA requires us to retain certain records for six years. The monthly archive GDG handles the first two years on DASD. After a generation rolls off at month 25, HSM migrates it to tape, where it stays for the remaining four years under management class MCREGULATORY. The GDG catalog no longer tracks it, but our tape management system (DFSMSrmm) maintains the retention for the full six years."
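Recalling one of those rolled-off generations is done by its absolute name, since the GDG no longer tracks it. A sketch using the TSO batch terminal monitor (the generation number is illustrative, and sites differ in how they drive DFSMShsm recalls):

```jcl
//RECALL   EXEC PGM=IKJEFT01
//SYSTSPRT DD SYSOUT=*
//SYSTSIN  DD *
  HRECALL 'PIN.CLAIMS.MONTHLY.ARCHIVE.G0012V00' WAIT
/*
```

WAIT holds the step until the recall completes, so a following step can read the dataset; without it the recall is queued asynchronously.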
Tier 4 — Annual Regulatory (LIMIT=10, NOSCRATCH)
//*-----------------------------------------------------------
//* TIER 4: ANNUAL GDG DEFINITIONS
//*-----------------------------------------------------------
//DEFGDG4 EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN DD *
  DEFINE GDG (NAME(PIN.CLAIMS.YEARLY.REGCOMP) -
         LIMIT(10) NOEMPTY NOSCRATCH)
  DEFINE GDG (NAME(PIN.CLAIMS.YEARLY.ACTUARIAL) -
         LIMIT(10) NOEMPTY NOSCRATCH)
  DEFINE GDG (NAME(PIN.CLAIMS.YEARLY.TAXRPT) -
         LIMIT(10) NOEMPTY NOSCRATCH)
/*
Design rationale for Tier 4:
- LIMIT(10): Ten years of annual snapshots. Covers the longest regulatory retention requirement (IRS records, 7 years; some state insurance regulations, 10 years).
- NOSCRATCH: Same reasoning as Tier 3 — data must survive beyond the GDG catalog lifetime.
- Annual generations are created during the January year-end close processing. Each is a comprehensive snapshot of the entire year's claims activity.
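Retention rules change over time, and on current z/OS releases a GDG's limit can be raised in place with IDCAMS ALTER, without disturbing existing generations (the new value shown is illustrative):

```jcl
//ALTLIM   EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Extend the annual tier from 10 to 12 generations, */
  /* e.g. if a state regulator lengthens its lookback  */
  ALTER PIN.CLAIMS.YEARLY.REGCOMP LIMIT(12)
/*
```

Lowering a limit is the riskier direction: excess generations roll off at the next generation creation, so the SCRATCH/NOSCRATCH setting determines whether that roll-off destroys data.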
The Restart/Recovery Strategy
The tiered GDG architecture enables a powerful restart strategy. Diane's operations team (led by her batch operations manager, who occupies a similar role to Rob Calloway at CNB) follows a decision tree:
Scenario 1: Nightly cycle fails at Stage 4 (Pricing)
- The output of Stages 1-3 is already cataloged in GDG generations
- Restart from Stage 4, reading the generations that Stages 1-3 created
- Because relative GDG references are resolved at job initiation, the generations created as (+1) by the failed job appear as (0) to a new job. To remove any ambiguity, the restart job uses absolute generation names (for example, G0047V00) for the already-created input and (+1) only for the new output
//*-----------------------------------------------------------
//* RESTART FROM STAGE 4 - USE ABSOLUTE NAMES FOR INPUT
//*-----------------------------------------------------------
//STAGE4 EXEC PGM=PINPRC01
//INPUT DD DSN=PIN.CLAIMS.DAILY.ADJUD.G0047V00,DISP=SHR
//OUTPUT DD DSN=PIN.CLAIMS.DAILY.PRICED(+1),
// DISP=(NEW,CATLG,DELETE),
// SPACE=(CYL,(500,100),RLSE),
// DCB=(RECFM=FB,LRECL=500,BLKSIZE=27500)
Scenario 2: Nightly cycle never ran (system outage)
- All of yesterday's data is in (0) for each GDG
- Today's intake has accumulated
- Run two nightly cycles back-to-back using different restart procedures

Scenario 3: Weekly rollup fails
- Daily generations are still available (LIMIT=14 provides a two-week window)
- Restart the weekly rollup referencing the correct daily generations
- The daily generations won't roll off for another week, so there's no urgency

Scenario 4: Month-end close error discovered 3 weeks later
- Weekly summaries for the affected weeks are still in the weekly GDG (LIMIT=8)
- Can reprocess from weekly to monthly without rerunning the entire month of daily cycles
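One cleanup step underpins several of these scenarios: if a failed step already cataloged its (+1) output, rerunning the full cycle would stack a second new generation on top of the partial one. A common practice (the generation number shown is hypothetical) is to delete the partial generation before the rerun:

```jcl
//CLEANUP  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  /* Remove the partial generation from the failed run so */
  /* the rerun's (+1) lands in a clean generation slot     */
  DELETE PIN.CLAIMS.DAILY.PRICED.G0048V00
  /* Tolerate "entry not found" if the step failed before  */
  /* cataloging its output                                 */
  IF LASTCC = 8 THEN SET MAXCC = 0
/*
```

The modal IF/SET pair keeps the cleanup job's return code clean in either case, so the scheduler can release the restart unconditionally.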
VSAM Integration: The Claims Master
The GDG strategy doesn't exist in isolation. The claims master KSDS is the central online file that the GDG batch process updates:
  DEFINE CLUSTER ( -
      NAME(PIN.PROD.CLAIMS.VSAM.MASTER) -
      INDEXED -
      KEYS(30 0) -
      RECORDSIZE(1800 4500) -
      SHAREOPTIONS(2 3) -
      SPEED -
      FREESPACE(15 20) -
    ) -
    DATA ( -
      NAME(PIN.PROD.CLAIMS.VSAM.MASTER.DATA) -
      CONTROLINTERVALSIZE(8192) -
      CYLINDERS(3000 500) -
    ) -
    INDEX ( -
      NAME(PIN.PROD.CLAIMS.VSAM.MASTER.INDEX) -
      CONTROLINTERVALSIZE(2048) -
      CYLINDERS(100 20) -
    )
CI size rationale: Diane chose 8192 for the data CI because the average record is 1,800 bytes. With CI overhead, a 4096 CI holds only 2 records. An 8192 CI holds 4 records. For the nightly batch (sequential scan of all 200 million records for month-end), 4 records per CI means half as many I/Os as 2 records per CI. The online random access penalty is modest — reading 8K instead of 4K per random access adds negligible time on modern storage subsystems with large caches.
Free space rationale: 15% CI / 20% CA. Pinnacle adds 2 million new claims daily. With 200 million total claims, that's a 1% daily growth rate. CI free space of 15% absorbs roughly 15 days of inserts before CI splits begin. CA free space of 20% provides room for CI splits without triggering the more expensive CA splits. Diane schedules a full VSAM reorganization every 30 days to reset the free space.
"I could do a tighter free space and reorg more often," Diane says. "But every reorg takes the file offline for 90 minutes. That means a 90-minute window where online claim adjudication is unavailable. Monthly is the right cadence — frequent enough to prevent degradation, infrequent enough to minimize the availability impact."
Space Consumption Analysis
Diane tracks storage consumption across all four tiers:
| Tier | GDG Bases | Active Gens | Avg Gen Size | Total DASD |
|---|---|---|---|---|
| Daily | 10 | 140 | 45 CYL | 6,300 CYL |
| Weekly | 4 | 32 | 200 CYL | 6,400 CYL |
| Monthly | 3 | 72 | 800 CYL | 57,600 CYL |
| Annual | 3 | 30 | 5,000 CYL | 150,000 CYL |
| Total | 20 | 274 | — | ~220,300 CYL |
At approximately 0.85 MB per cylinder on a 3390 (15 tracks of roughly 56 KB each), the total GDG estate consumes roughly 187 GB of DASD. The annual tier (Tier 4) dominates at about 127 GB. This is why the Tier 4 datasets use NOSCRATCH and HSM migration — after a GDG generation rolls off at year 11, HSM migrates it to tape, freeing the DASD.
"Without the tiered architecture, we'd need everything on primary DASD for the full retention period," Diane explains. "Keeping six years of daily detail online instead of rolling it up would push us past a terabyte. The tiered approach with HSM migration keeps our primary DASD requirement under 200 GB. The rest lives on tape at a fraction of the cost."
Lessons Learned
- Design GDG tiers around your recovery windows, not your retention requirements. Daily LIMIT should match your restart window. Weekly LIMIT should cover your month-end processing. Monthly and annual LIMITs handle regulatory retention — but HSM does the heavy lifting for long-term storage.
- NOSCRATCH on long-retention tiers. If a generation must survive beyond the GDG catalog lifetime, use NOSCRATCH and let HSM manage the data on tape. SCRATCH is fine for short-retention tiers where you genuinely want the data deleted.
- Absolute generation names for restart. Relative references are convenient for normal processing but dangerous for restarts. Always resolve to absolute names in restart JCL.
- Align VSAM reorg cycles with GDG retention. If your monthly reorg window is 90 minutes, make sure your GDG strategy doesn't require file access during that window. Diane schedules the reorg for Sunday 2 AM when no claims processing runs.
- Track space consumption at the tier level, not the dataset level. Individual dataset sizes fluctuate daily. Tier-level consumption is the metric that drives capacity planning.
Discussion Questions
- Diane chose LIMIT(14) for daily GDGs. If Pinnacle Health moved to a 24/7 processing model (eliminating the nightly batch window), how would the GDG strategy need to change?
- The weekly rollup job reads 7 individual daily GDG generations. An alternative design would read the claims master KSDS directly and extract weekly data. Compare the two approaches in terms of I/O cost, recovery complexity, and operational risk.
- Ahmad Rashidi's NOSCRATCH + HSM strategy for regulatory retention relies on the tape management system (DFSMSrmm) tracking datasets that are no longer in the GDG catalog. What happens if the tape management catalog and the GDG catalog become inconsistent? How would you detect and resolve this?
- Diane's claims master VSAM reorg takes 90 minutes. For a system that needs 99.9% availability, a 90-minute monthly outage consumes almost the entire annual budget for downtime (8.76 hours). What alternatives to full offline reorganization exist, and what are their trade-offs?
- If Pinnacle Health acquires another insurer and must merge 80 million additional claims into their system, how would the GDG limits, VSAM configurations, and storage consumption projections need to change? Design the migration plan for the GDG tier architecture.