Case Study 1: Tablespace Strategy for a 10 TB Transaction Database

Background

Pacific Mutual Insurance Company operates a claims processing system on DB2 for z/OS. The system has been running for 12 years, and the original physical design — created when the database was 200 GB — has not kept pace with data growth. The database now holds 10 TB of data across 45 tables, and the DBA team is facing escalating operational problems.

The core challenge: the database was designed in 2012 using segmented tablespaces and classic partitioned tablespaces with 64 partitions. What worked at 200 GB is failing at 10 TB.

The Current State

Infrastructure: - DB2 12 for z/OS on a z15 LPAR with 8 GCPs and 6 zIIPs - DS8900 storage array with Easy Tier (SSD + spinning disk) - 250 GB of buffer pool memory across all pools

The problem tables:

Table Rows Size Tablespace Type Issues
CLAIM 180M 120 GB Segmented REORG takes 14 hours; locks all 180M rows
CLAIM_LINE 1.2B 450 GB Classic partitioned (64) Partitions full; DSSIZE 2G hit limit
CLAIM_DOCUMENT 90M LOBs 6.8 TB LOB TS (single) Backup takes 22 hours
POLICY 25M 18 GB Segmented Shares TS with 12 reference tables
PAYMENT 500M 180 GB Classic partitioned (64) Uneven partition sizes (1-8 GB each)
PROVIDER 2M 1.5 GB Segmented No issues (small table)
DIAGNOSIS_CODE 150K 12 MB Segmented Shares TS with PROVIDER; REORG conflict

Operational symptoms: 1. Monthly REORG of CLAIM takes 14 hours and requires an exclusive lock, pushing the batch window to 18 hours. 2. CLAIM_LINE partitions are approaching the 2 GB DSSIZE limit. Three partitions have already entered STOP status in the past year. 3. CLAIM_DOCUMENT backup exceeds the 24-hour window. The DBA resorted to skipping weekly backups, creating a recovery risk. 4. PAYMENT partitions range from 1 GB to 8 GB because the partitioning key (PAYMENT_ID, a sequential number) does not correlate with data volume over time. Early partitions received fewer rows per range than recent partitions. 5. When POLICY needs a REORG, the 12 reference tables in the same segmented tablespace are also unavailable — blocking online quoting for the duration.

The Redesign

A physical design review team (two DBAs, one data architect, one storage administrator) developed the following plan over a two-week period.

Step 1: Inventory and Classification

They classified every table into one of four categories:

  • Category A (Large, partitioned): CLAIM_LINE, PAYMENT, CLAIM_DOCUMENT — need range partitioning with partition-level operations.
  • Category B (Large, non-partitioned): CLAIM, POLICY — large enough for their own tablespace but not large enough to require partitioning (under 200 GB).
  • Category C (Small, shared): PROVIDER, DIAGNOSIS_CODE, and 30 other reference tables — share a tablespace for administrative simplicity.
  • Category D (Temporary/work): 5 work tables used by batch processing — move to temporary table strategy.

Step 2: Target Tablespace Design

CLAIM_LINE — PBR UTS, range-partitioned by CLAIM_DATE:

CREATE TABLESPACE TS_CLAIM_LINE
    IN PACMUT_DB
    USING STOGROUP SG_FAST
    DSSIZE 16G
    SEGSIZE 32
    NUMPARTS 120         -- 10 years of monthly partitions
    BUFFERPOOL BP8K0
    LOCKSIZE ROW
    COMPRESS YES;

Decision rationale: Monthly partitions by CLAIM_DATE align with the primary query pattern (85% of queries filter by claim date range). At 1.2 billion rows over 10 years, each monthly partition holds ~10 million rows (~3.75 GB) — well within the 16 GB DSSIZE. Partition-level REORG now takes 15 minutes instead of hours.

PAYMENT — PBR UTS, range-partitioned by PAYMENT_DATE:

The team rejected the original PAYMENT_ID partitioning because it created severe size skew. Repartitioning by PAYMENT_DATE creates even monthly partitions (~1.5 GB each) and enables partition elimination for date-range queries.

CLAIM — PBG UTS with compression:

CREATE TABLESPACE TS_CLAIM
    IN PACMUT_DB
    USING STOGROUP SG_FAST
    MAXPARTITIONS 64
    DSSIZE 8G
    SEGSIZE 32
    BUFFERPOOL BP8K0
    LOCKSIZE ROW
    COMPRESS YES;

At 120 GB (compressed to ~50 GB), CLAIM does not need range partitioning — the entire table REORGs in 2 hours with the new compressed size. PBG lets it grow without manual intervention.

CLAIM_DOCUMENT — PBR UTS with LOB, partitioned by DOCUMENT_DATE:

The critical insight: partition the LOB tablespace by the document date. This enables partition-level backups — the current month's LOB partition (50 GB) backs up in 30 minutes, not 22 hours. Historical partitions only need incremental backups.

Reference tables — PBG UTS (shared):

CREATE TABLESPACE TS_REFERENCE
    IN PACMUT_DB
    USING STOGROUP SG_STD
    MAXPARTITIONS 4
    DSSIZE 4G
    SEGSIZE 32
    BUFFERPOOL BP4K0
    LOCKSIZE ROW;

POLICY gets its own PBG tablespace. The remaining 30+ reference tables share TS_REFERENCE. Critically, POLICY is no longer in the same tablespace as DIAGNOSIS_CODE — REORG on one does not affect the other.

Step 3: Migration Execution

The migration was executed over a three-day holiday weekend:

Day 1 (Friday evening): 1. Full image copies of all affected tablespaces (recovery point). 2. UNLOAD CLAIM_LINE, PAYMENT, CLAIM data. 3. DROP old tablespaces.

Day 2 (Saturday): 4. CREATE new tablespaces (PBR UTS for CLAIM_LINE/PAYMENT, PBG UTS for CLAIM/POLICY). 5. LOAD data into new tablespaces with COMPRESS YES (dictionary built during LOAD). 6. CREATE indexes, RUNSTATS.

Day 3 (Sunday): 7. Regression testing — run standard query suite and verify results match pre-migration outputs. 8. Full image copies of new tablespaces. 9. Update application bind plans (REBIND all packages).

CLAIM_DOCUMENT migration was deferred to a separate weekend due to the 6.8 TB volume.

Step 4: Compression Results

Table Before After (Compressed) Ratio Savings
CLAIM_LINE 450 GB 175 GB 61% 275 GB
CLAIM 120 GB 48 GB 60% 72 GB
PAYMENT 180 GB 70 GB 61% 110 GB
Total 750 GB 293 GB 457 GB

At $0.12/GB/month for DS8900 storage: $54,840/year in storage savings.

Results After Six Months

Metric Before After Improvement
CLAIM_LINE REORG time Failed (too big) 15 min/partition Now possible
CLAIM REORG time 14 hours 1.5 hours 89% reduction
CLAIM_LINE STOP incidents 3/year 0 Eliminated
Batch window duration 18 hours 6 hours 67% reduction
CLAIM_DOCUMENT backup time 22 hours 30 min (current) 97% reduction
Total storage 10 TB 7.2 TB 28% reduction
Annual storage cost savings $54,840/year New savings
POLICY REORG availability Blocked ref tables Independent No conflicts

Lessons Learned

  1. Physical design is not "set and forget." What works at 200 GB fails spectacularly at 10 TB. Schedule a physical design review every 2-3 years or whenever data volume doubles.

  2. Partitioning key matters more than partition count. The original PAYMENT_ID partitioning created a maintenance nightmare. PAYMENT_DATE partitioning solved the skew problem and enabled date-range partition elimination.

  3. Separate unrelated tables into separate tablespaces. The POLICY + reference tables in a shared segmented tablespace caused unnecessary operational coupling. The cost of a few extra tablespaces is trivial compared to the cost of unavailability.

  4. Compression is almost always the right answer for large tables. The 457 GB savings paid for the entire migration project in storage cost savings within 10 months.

  5. LOB partitioning is a game-changer for backup. The CLAIM_DOCUMENT table — 68% of the database by volume — went from an impossible 22-hour backup to a 30-minute partition-level backup. This single change eliminated the most critical operational risk in the system.

Discussion Questions

  1. The team chose DSSIZE 16G for CLAIM_LINE. Under what circumstances would they need to increase this? What would the procedure be?

  2. Why did the team choose PBG instead of PBR for the CLAIM table? At what size should they reconsider?

  3. The CLAIM_DOCUMENT migration was deferred. Design a migration plan for the 6.8 TB LOB data that minimizes downtime.

  4. The original PAYMENT_ID partitioning seemed reasonable at design time. What design review process could have caught the skew problem earlier?

  5. If Pacific Mutual migrates to DB2 LUW on cloud infrastructure in 5 years, which aspects of this physical design translate directly, and which require rethinking?