Case Study 1: Tablespace Strategy for a 10 TB Transaction Database
Background
Pacific Mutual Insurance Company operates a claims processing system on DB2 for z/OS. The system has been running for 12 years, and the original physical design — created when the database was 200 GB — has not kept pace with data growth. The database now holds 10 TB of data across 45 tables, and the DBA team is facing escalating operational problems.
The core challenge: the database was designed in 2012 using segmented tablespaces and classic partitioned tablespaces with 64 partitions. What worked at 200 GB is failing at 10 TB.
The Current State
Infrastructure: - DB2 12 for z/OS on a z15 LPAR with 8 GCPs and 6 zIIPs - DS8900 storage array with Easy Tier (SSD + spinning disk) - 250 GB of buffer pool memory across all pools
The problem tables:
| Table | Rows | Size | Tablespace Type | Issues |
|---|---|---|---|---|
| CLAIM | 180M | 120 GB | Segmented | REORG takes 14 hours; locks all 180M rows |
| CLAIM_LINE | 1.2B | 450 GB | Classic partitioned (64) | Partitions full; DSSIZE 2G hit limit |
| CLAIM_DOCUMENT | 90M LOBs | 6.8 TB | LOB TS (single) | Backup takes 22 hours |
| POLICY | 25M | 18 GB | Segmented | Shares TS with 12 reference tables |
| PAYMENT | 500M | 180 GB | Classic partitioned (64) | Uneven partition sizes (1-8 GB each) |
| PROVIDER | 2M | 1.5 GB | Segmented | No issues (small table) |
| DIAGNOSIS_CODE | 150K | 12 MB | Segmented | Shares TS with PROVIDER; REORG conflict |
Operational symptoms: 1. Monthly REORG of CLAIM takes 14 hours and requires an exclusive lock, pushing the batch window to 18 hours. 2. CLAIM_LINE partitions are approaching the 2 GB DSSIZE limit. Three partitions have already entered STOP status in the past year. 3. CLAIM_DOCUMENT backup exceeds the 24-hour window. The DBA resorted to skipping weekly backups, creating a recovery risk. 4. PAYMENT partitions range from 1 GB to 8 GB because the partitioning key (PAYMENT_ID, a sequential number) does not correlate with data volume over time. Early partitions received fewer rows per range than recent partitions. 5. When POLICY needs a REORG, the 12 reference tables in the same segmented tablespace are also unavailable — blocking online quoting for the duration.
The Redesign
A physical design review team (two DBAs, one data architect, one storage administrator) developed the following plan over a two-week period.
Step 1: Inventory and Classification
They classified every table into one of four categories:
- Category A (Large, partitioned): CLAIM_LINE, PAYMENT, CLAIM_DOCUMENT — need range partitioning with partition-level operations.
- Category B (Large, non-partitioned): CLAIM, POLICY — large enough for their own tablespace but not large enough to require partitioning (under 200 GB).
- Category C (Small, shared): PROVIDER, DIAGNOSIS_CODE, and 30 other reference tables — share a tablespace for administrative simplicity.
- Category D (Temporary/work): 5 work tables used by batch processing — move to temporary table strategy.
Step 2: Target Tablespace Design
CLAIM_LINE — PBR UTS, range-partitioned by CLAIM_DATE:
CREATE TABLESPACE TS_CLAIM_LINE
IN PACMUT_DB
USING STOGROUP SG_FAST
DSSIZE 16G
SEGSIZE 32
NUMPARTS 120 -- 10 years of monthly partitions
BUFFERPOOL BP8K0
LOCKSIZE ROW
COMPRESS YES;
Decision rationale: Monthly partitions by CLAIM_DATE align with the primary query pattern (85% of queries filter by claim date range). At 1.2 billion rows over 10 years, each monthly partition holds ~10 million rows (~3.75 GB) — well within the 16 GB DSSIZE. Partition-level REORG now takes 15 minutes instead of hours.
PAYMENT — PBR UTS, range-partitioned by PAYMENT_DATE:
The team rejected the original PAYMENT_ID partitioning because it created severe size skew. Repartitioning by PAYMENT_DATE creates even monthly partitions (~1.5 GB each) and enables partition elimination for date-range queries.
CLAIM — PBG UTS with compression:
CREATE TABLESPACE TS_CLAIM
IN PACMUT_DB
USING STOGROUP SG_FAST
MAXPARTITIONS 64
DSSIZE 8G
SEGSIZE 32
BUFFERPOOL BP8K0
LOCKSIZE ROW
COMPRESS YES;
At 120 GB (compressed to ~50 GB), CLAIM does not need range partitioning — the entire table REORGs in 2 hours with the new compressed size. PBG lets it grow without manual intervention.
CLAIM_DOCUMENT — PBR UTS with LOB, partitioned by DOCUMENT_DATE:
The critical insight: partition the LOB tablespace by the document date. This enables partition-level backups — the current month's LOB partition (50 GB) backs up in 30 minutes, not 22 hours. Historical partitions only need incremental backups.
Reference tables — PBG UTS (shared):
CREATE TABLESPACE TS_REFERENCE
IN PACMUT_DB
USING STOGROUP SG_STD
MAXPARTITIONS 4
DSSIZE 4G
SEGSIZE 32
BUFFERPOOL BP4K0
LOCKSIZE ROW;
POLICY gets its own PBG tablespace. The remaining 30+ reference tables share TS_REFERENCE. Critically, POLICY is no longer in the same tablespace as DIAGNOSIS_CODE — REORG on one does not affect the other.
Step 3: Migration Execution
The migration was executed over a three-day holiday weekend:
Day 1 (Friday evening): 1. Full image copies of all affected tablespaces (recovery point). 2. UNLOAD CLAIM_LINE, PAYMENT, CLAIM data. 3. DROP old tablespaces.
Day 2 (Saturday): 4. CREATE new tablespaces (PBR UTS for CLAIM_LINE/PAYMENT, PBG UTS for CLAIM/POLICY). 5. LOAD data into new tablespaces with COMPRESS YES (dictionary built during LOAD). 6. CREATE indexes, RUNSTATS.
Day 3 (Sunday): 7. Regression testing — run standard query suite and verify results match pre-migration outputs. 8. Full image copies of new tablespaces. 9. Update application bind plans (REBIND all packages).
CLAIM_DOCUMENT migration was deferred to a separate weekend due to the 6.8 TB volume.
Step 4: Compression Results
| Table | Before | After (Compressed) | Ratio | Savings |
|---|---|---|---|---|
| CLAIM_LINE | 450 GB | 175 GB | 61% | 275 GB |
| CLAIM | 120 GB | 48 GB | 60% | 72 GB |
| PAYMENT | 180 GB | 70 GB | 61% | 110 GB |
| Total | 750 GB | 293 GB | 457 GB |
At $0.12/GB/month for DS8900 storage: $54,840/year in storage savings.
Results After Six Months
| Metric | Before | After | Improvement |
|---|---|---|---|
| CLAIM_LINE REORG time | Failed (too big) | 15 min/partition | Now possible |
| CLAIM REORG time | 14 hours | 1.5 hours | 89% reduction |
| CLAIM_LINE STOP incidents | 3/year | 0 | Eliminated |
| Batch window duration | 18 hours | 6 hours | 67% reduction |
| CLAIM_DOCUMENT backup time | 22 hours | 30 min (current) | 97% reduction |
| Total storage | 10 TB | 7.2 TB | 28% reduction |
| Annual storage cost savings | — | $54,840/year | New savings |
| POLICY REORG availability | Blocked ref tables | Independent | No conflicts |
Lessons Learned
-
Physical design is not "set and forget." What works at 200 GB fails spectacularly at 10 TB. Schedule a physical design review every 2-3 years or whenever data volume doubles.
-
Partitioning key matters more than partition count. The original PAYMENT_ID partitioning created a maintenance nightmare. PAYMENT_DATE partitioning solved the skew problem and enabled date-range partition elimination.
-
Separate unrelated tables into separate tablespaces. The POLICY + reference tables in a shared segmented tablespace caused unnecessary operational coupling. The cost of a few extra tablespaces is trivial compared to the cost of unavailability.
-
Compression is almost always the right answer for large tables. The 457 GB savings paid for the entire migration project in storage cost savings within 10 months.
-
LOB partitioning is a game-changer for backup. The CLAIM_DOCUMENT table — 68% of the database by volume — went from an impossible 22-hour backup to a 30-minute partition-level backup. This single change eliminated the most critical operational risk in the system.
Discussion Questions
-
The team chose DSSIZE 16G for CLAIM_LINE. Under what circumstances would they need to increase this? What would the procedure be?
-
Why did the team choose PBG instead of PBR for the CLAIM table? At what size should they reconsider?
-
The CLAIM_DOCUMENT migration was deferred. Design a migration plan for the 6.8 TB LOB data that minimizes downtime.
-
The original PAYMENT_ID partitioning seemed reasonable at design time. What design review process could have caught the skew problem earlier?
-
If Pacific Mutual migrates to DB2 LUW on cloud infrastructure in 5 years, which aspects of this physical design translate directly, and which require rethinking?