Case Study 2: Compression ROI Analysis — Saving $370K a Year in Storage
Background
Continental Federal Credit Union serves 1.8 million members across 4.2 million accounts, with a core banking system running on DB2 11.5 for LUW. The IT infrastructure team has issued a mandate: reduce storage costs by 20% within 12 months without reducing data retention or functionality. The current storage footprint is 18 TB, costing $1.08 million per year at $5/GB/month (enterprise SAN with tiered storage).
The DBA team — two senior DBAs and one junior DBA — was tasked with evaluating DB2 compression as the primary tool to meet this mandate.
Phase 1: Inventory and Assessment (Week 1-2)
The team began by inventorying every table in the production database, focusing on the largest consumers of storage.
Top 15 tables by storage consumption:
| Rank | Table | Rows | Data Size | Index Size | Total | Compressed? |
|---|---|---|---|---|---|---|
| 1 | TRANSACTION_DETAIL | 3.2B | 4,800 GB | 1,200 GB | 6,000 GB | NO |
| 2 | ACCOUNT_BALANCE_HISTORY | 1.5B | 2,100 GB | 480 GB | 2,580 GB | NO |
| 3 | MEMBER_DOCUMENT | 45M (LOBs) | 2,200 GB | 12 GB | 2,212 GB | NO |
| 4 | ACH_TRANSACTION | 800M | 960 GB | 320 GB | 1,280 GB | NO |
| 5 | AUDIT_LOG | 2.1B | 840 GB | 210 GB | 1,050 GB | NO |
| 6 | CHECK_IMAGE | 120M (LOBs) | 780 GB | 8 GB | 788 GB | NO |
| 7 | CARD_TRANSACTION | 600M | 540 GB | 180 GB | 720 GB | NO |
| 8 | WIRE_TRANSFER | 50M | 120 GB | 35 GB | 155 GB | NO |
| 9 | LOAN_AMORTIZATION | 200M | 100 GB | 30 GB | 130 GB | NO |
| 10 | MEMBER | 1.8M | 2.4 GB | 0.8 GB | 3.2 GB | NO |
| 11 | ACCOUNT | 4.2M | 3.1 GB | 1.2 GB | 4.3 GB | NO |
| 12 | BRANCH | 180 | 0.02 GB | 0.01 GB | 0.03 GB | NO |
| 13-15 | (various small tables) | < 1M each | 0.5 GB | 0.3 GB | 0.8 GB | NO |
Key observation: The top 7 tables account for 14.6 TB out of 18 TB — 81% of total storage. The LOB tables (MEMBER_DOCUMENT, CHECK_IMAGE) account for 3.0 TB but contain pre-compressed data (PDFs, TIFF images) that will not benefit from DB2 compression. That leaves 11.6 TB of structured data as compression candidates.
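The arithmetic behind this observation can be reproduced in a few lines (sizes taken from the inventory table above; a verification sketch, not the team's actual tooling):

```python
# Top-7 total sizes (GB) from the inventory table; LOB tables flagged
# because their pre-compressed content will not benefit from row compression.
tables = {
    "TRANSACTION_DETAIL": (6000, False),
    "ACCOUNT_BALANCE_HISTORY": (2580, False),
    "MEMBER_DOCUMENT": (2212, True),   # LOBs: pre-compressed PDFs
    "ACH_TRANSACTION": (1280, False),
    "AUDIT_LOG": (1050, False),
    "CHECK_IMAGE": (788, True),        # LOBs: pre-compressed TIFFs
    "CARD_TRANSACTION": (720, False),
}

top7_gb = sum(size for size, _ in tables.values())
lob_gb = sum(size for size, is_lob in tables.values() if is_lob)

print(f"Top 7 total: {top7_gb / 1000:.1f} TB ({top7_gb / 18000:.0%} of 18 TB)")
print(f"LOB tables:  {lob_gb / 1000:.1f} TB (excluded)")
print(f"Candidates:  {(top7_gb - lob_gb) / 1000:.1f} TB of structured data")
```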
Phase 2: Compression Estimation (Week 3)
The team used the SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO table function to estimate compression ratios without affecting production. (On DB2 10.1 and later, including 11.5, the function takes only schema and table name and reports estimated page savings for both static and adaptive compression; the older three-argument 'ESTIMATE' form belongs to DB2 9.7.)

```sql
SELECT TABNAME,
       ROWCOMPMODE,
       PCTPAGESSAVED_CURRENT,
       PCTPAGESSAVED_STATIC,
       PCTPAGESSAVED_ADAPTIVE
FROM TABLE(SYSPROC.ADMIN_GET_TAB_COMPRESS_INFO(
       'CCFCU', 'TRANSACTION_DETAIL')) AS T;
```
Estimation results:
| Table | Data Size | Est. Ratio | Est. Compressed | Savings |
|---|---|---|---|---|
| TRANSACTION_DETAIL | 4,800 GB | 64% | 1,728 GB | 3,072 GB |
| ACCOUNT_BALANCE_HISTORY | 2,100 GB | 68% | 672 GB | 1,428 GB |
| ACH_TRANSACTION | 960 GB | 61% | 374 GB | 586 GB |
| AUDIT_LOG | 840 GB | 72% | 235 GB | 605 GB |
| CARD_TRANSACTION | 540 GB | 59% | 221 GB | 319 GB |
| WIRE_TRANSFER | 120 GB | 55% | 54 GB | 66 GB |
| LOAN_AMORTIZATION | 100 GB | 66% | 34 GB | 66 GB |
| MEMBER | 2.4 GB | 52% | 1.2 GB | 1.2 GB |
| ACCOUNT | 3.1 GB | 48% | 1.6 GB | 1.5 GB |
| Totals | 9,465 GB | 65% avg | 3,321 GB | 6,144 GB |
The AUDIT_LOG has the highest compression ratio (72%) because it contains highly repetitive data: the same table names, column names, user IDs, and operation types appear in millions of rows. The dictionary-based compression excels at exactly this pattern.
The LOB tables (MEMBER_DOCUMENT, CHECK_IMAGE) were excluded — test compression on a 1 GB sample showed only 2-3% additional compression on already-compressed TIFF and PDF data. Not worth the CPU overhead.
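Applying the estimated savings ratios to the current data sizes reproduces the totals row (figures from the estimation table above; a verification sketch):

```python
# table: (data_gb, estimated fraction of pages saved)
estimates = {
    "TRANSACTION_DETAIL": (4800, 0.64),
    "ACCOUNT_BALANCE_HISTORY": (2100, 0.68),
    "ACH_TRANSACTION": (960, 0.61),
    "AUDIT_LOG": (840, 0.72),
    "CARD_TRANSACTION": (540, 0.59),
    "WIRE_TRANSFER": (120, 0.55),
    "LOAN_AMORTIZATION": (100, 0.66),
    "MEMBER": (2.4, 0.52),
    "ACCOUNT": (3.1, 0.48),
}

total_before = sum(gb for gb, _ in estimates.values())
total_after = sum(gb * (1 - r) for gb, r in estimates.values())

print(f"Before:          {total_before:,.0f} GB")
print(f"After (est.):    {total_after:,.0f} GB")
print(f"Savings (est.):  {total_before - total_after:,.0f} GB")
```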
Phase 3: CPU Impact Analysis (Week 4)
Compression is not free. Every page read requires decompression; every insert requires compression. The team needed to quantify the CPU impact before committing.
Testing methodology:
1. Created a test database with a representative 500 GB subset.
2. Ran the standard production workload benchmark (a 3-hour replay of a production SQL trace).
3. Measured CPU consumption with and without compression.
Results:
| Metric | Without Compression | With Compression | Change |
|---|---|---|---|
| Total CPU seconds (3 hours) | 8,420 | 8,840 | +5.0% |
| Peak CPU utilization | 62% | 65% | +3 pts |
| Average query response time | 12.4 ms | 11.8 ms | -4.8% |
| Buffer pool hit ratio | 94.2% | 97.1% | +2.9 pts |
| Sequential scan throughput | 850 MB/s | 2,150 MB/s | +153% |
| I/O operations (3 hours) | 42M | 18M | -57% |
Critical finding: Although compression added 5% CPU overhead, query response times improved by 4.8% because:
1. The buffer pool hit ratio increased from 94.2% to 97.1% (more data fits in the same buffer pool).
2. Sequential scans read 57% fewer pages from disk.
3. The I/O reduction more than offset the decompression CPU cost.
This result is typical for I/O-bound systems. Only CPU-bound systems (sustained >85% CPU utilization) risk performance degradation from compression.
CPU cost calculation:
- The server has 32 cores at approximately $150,000/year total compute cost.
- 5% additional CPU = $7,500/year.
- Headroom: the system peaks at 65% CPU, well below the 80% threshold where additional capacity would be needed.
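The headroom arithmetic is simple enough to sanity-check in a few lines (figures from the benchmark above; a sketch, not the team's capacity model):

```python
compute_cost_per_year = 150_000   # 32-core server, total annual compute cost
cpu_overhead = 0.05               # measured compression overhead
peak_cpu_before = 0.62            # benchmark peak utilization
cpu_threshold = 0.80              # point at which more capacity is needed

cpu_cost = compute_cost_per_year * cpu_overhead
peak_cpu_after = peak_cpu_before * (1 + cpu_overhead)

print(f"Added CPU cost: ${cpu_cost:,.0f}/year")
print(f"Projected peak: {peak_cpu_after:.0%} (threshold {cpu_threshold:.0%})")
```

The projected peak (about 65%) matches the measured peak in the benchmark table, which is what made the $7,500/year figure credible.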
Phase 4: Implementation Plan (Week 5-8)
The team developed a phased rollout to minimize risk:
Phase 4A: Low-risk, high-impact tables (Week 5-6)
Start with AUDIT_LOG (72% ratio, insert-only, no OLTP reads) and LOAN_AMORTIZATION (small, batch-only):
```sql
-- Enable adaptive row compression
ALTER TABLE CCFCU.AUDIT_LOG COMPRESS YES ADAPTIVE;
ALTER TABLE CCFCU.LOAN_AMORTIZATION COMPRESS YES ADAPTIVE;

-- Compress existing rows with an online (inplace) REORG. The table-level
-- compression dictionary comes from automatic dictionary creation; an
-- inplace REORG does not build one itself.
REORG TABLE CCFCU.AUDIT_LOG INPLACE ALLOW WRITE ACCESS;
REORG TABLE CCFCU.LOAN_AMORTIZATION INPLACE ALLOW WRITE ACCESS;
```
Results validated within one week: AUDIT_LOG compressed from 840 GB to 231 GB (72.5% ratio, slightly better than estimated). No measurable performance impact.
Phase 4B: High-volume transaction tables (Week 6-7)
TRANSACTION_DETAIL (4.8 TB) required partition-level processing:
```sql
ALTER TABLE CCFCU.TRANSACTION_DETAIL COMPRESS YES ADAPTIVE;

-- REORG one partition at a time (monthly partitions),
-- starting with the oldest and working toward current
REORG TABLE CCFCU.TRANSACTION_DETAIL
  ON DATA PARTITION P_2019_01 ALLOW WRITE ACCESS;
REORG TABLE CCFCU.TRANSACTION_DETAIL
  ON DATA PARTITION P_2019_02 ALLOW WRITE ACCESS;
-- ...repeat for each partition
```
The partition-level approach allowed the team to compress 1-2 months per night during the batch window. The full 4.8 TB was compressed over 3 weeks without any application downtime.
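A small helper could generate the per-partition statements for each night's batch window. This is a sketch: the P_YYYY_MM naming follows the commands above, and in practice the partition list would be read from SYSCAT.DATAPARTITIONS rather than generated.

```python
def reorg_statements(table, start_year, start_month, months):
    """Generate partition-level REORG statements for monthly partitions,
    oldest first, assuming a P_YYYY_MM naming convention."""
    stmts = []
    year, month = start_year, start_month
    for _ in range(months):
        stmts.append(
            f"REORG TABLE {table} "
            f"ON DATA PARTITION P_{year}_{month:02d} ALLOW WRITE ACCESS;"
        )
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return stmts

for stmt in reorg_statements("CCFCU.TRANSACTION_DETAIL", 2019, 1, 3):
    print(stmt)
```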
Phase 4C: Remaining tables (Week 7-8)
ACCOUNT_BALANCE_HISTORY, ACH_TRANSACTION, CARD_TRANSACTION, WIRE_TRANSFER, MEMBER, and ACCOUNT were compressed in the final phase.
Phase 5: Results (Week 9-10)
Storage Savings
| Table | Before | After | Savings | Actual Ratio |
|---|---|---|---|---|
| TRANSACTION_DETAIL | 4,800 GB | 1,680 GB | 3,120 GB | 65% |
| ACCOUNT_BALANCE_HISTORY | 2,100 GB | 651 GB | 1,449 GB | 69% |
| ACH_TRANSACTION | 960 GB | 384 GB | 576 GB | 60% |
| AUDIT_LOG | 840 GB | 231 GB | 609 GB | 72.5% |
| CARD_TRANSACTION | 540 GB | 227 GB | 313 GB | 58% |
| WIRE_TRANSFER | 120 GB | 55 GB | 65 GB | 54% |
| LOAN_AMORTIZATION | 100 GB | 33 GB | 67 GB | 67% |
| MEMBER | 2.4 GB | 1.1 GB | 1.3 GB | 54% |
| ACCOUNT | 3.1 GB | 1.6 GB | 1.5 GB | 48% |
| Structured data total | 9,465 GB | 3,264 GB | 6,201 GB | 65.5% avg |
Actual compression ratios matched estimates within 1-3 percentage points — a validation of the estimation methodology.
Financial Summary
Storage before compression: 18,000 GB
Structured-data savings: 6,201 GB
New total storage: 11,799 GB
Annual storage cost before: 18,000 GB × $5/GB/month × 12 = $1,080,000
Annual storage cost after: 11,799 GB × $5/GB/month × 12 = $707,940
Annual storage savings: $372,060
Annual CPU cost increase: -$7,500
Net annual savings: $364,560
Net cost reduction: 33.8%
The mandate was a 20% reduction. The team delivered 33.8% — a net annual savings of $364,560 against a $7,500 CPU cost increase.
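The summary can be re-derived from just the before/after figures; the stated $1,080,000 annual cost over 18,000 GB implies the $/GB/month rate, so nothing else needs to be assumed:

```python
annual_cost_before = 1_080_000
gb_before, gb_saved = 18_000, 6_201
cpu_cost_increase = 7_500

rate_per_gb_month = annual_cost_before / (gb_before * 12)  # implied SAN rate
gb_after = gb_before - gb_saved
annual_cost_after = gb_after * rate_per_gb_month * 12
storage_savings = annual_cost_before - annual_cost_after
net_savings = storage_savings - cpu_cost_increase

print(f"Implied rate:       ${rate_per_gb_month:.2f}/GB/month")
print(f"Cost after:         ${annual_cost_after:,.0f}")
print(f"Net annual savings: ${net_savings:,.0f}")
print(f"Net cost reduction: {net_savings / annual_cost_before:.1%}")
```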
Performance Impact
| Metric | Before | After | Change |
|---|---|---|---|
| Average OLTP response time | 12.4 ms | 11.6 ms | 6.5% faster |
| Batch window duration | 7.2 hours | 5.1 hours | 29% shorter |
| Buffer pool hit ratio (avg) | 94.2% | 97.4% | +3.2 points |
| Monthly storage alerts | 3-5 | 0 | Eliminated |
| Backup duration | 8.5 hours | 5.2 hours | 39% shorter |
Performance improved across every metric. The backup duration reduction was particularly valuable — the compressed data sets are smaller, so backup I/O is proportionally reduced.
Total Cost of Ownership — 5-Year Projection
| | Year 1 | Year 2 | Year 3 | Year 4 | Year 5 | Total |
|---|---|---|---|---|---|---|
| Storage savings | $372,060 | $409,266 | $450,193 | $495,212 | $544,733 | $2,271,464 |
| CPU cost | -$7,500 | -$8,250 | -$9,075 | -$9,983 | -$10,981 | -$45,789 |
| Migration labor | -$25,000 | $0 | $0 | $0 | $0 | -$25,000 |
| Net benefit | $339,560 | $401,016 | $441,118 | $485,229 | $533,752 | $2,200,675 |

(Storage savings and CPU cost grow 10%/year as data grows — compression applies to new data too. Migration labor is one-time: 160 hours of DBA time at $156/hour average.)
Five-year net benefit: $2.2 million from a project that took 10 weeks and required no hardware purchases, no application changes, and zero downtime.
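The projection, and the growth-sensitivity question it raises, can be modeled in a few lines; the 10% growth rate and one-time migration cost are taken from the table above:

```python
def five_year_net(storage_savings=372_060, cpu_cost=7_500,
                  migration=25_000, growth=0.10, years=5):
    """Net benefit over `years`; savings and CPU cost both scale with data."""
    net = -migration
    for y in range(years):
        factor = (1 + growth) ** y
        net += (storage_savings - cpu_cost) * factor
    return net

print(f"Base case (10% growth): ${five_year_net():,.0f}")
print(f"High growth (25%):      ${five_year_net(growth=0.25):,.0f}")
print(f"Low growth (5%):        ${five_year_net(growth=0.05):,.0f}")
```

The base case reproduces the table's $2,200,675 total; changing the `growth` argument answers discussion question 4 directly.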
Lessons Learned
1. Estimate Before You Compress
The ADMIN_GET_TAB_COMPRESS_INFO function on LUW (and DSN1COMP on z/OS) provides reliable estimates. The team's estimates were within 1-3% of actual results. Never compress a production table without first running the estimation tool.
2. LOBs Are Usually Already Compressed
The team wisely excluded MEMBER_DOCUMENT and CHECK_IMAGE. Pre-compressed content (JPEG, TIFF, PDF, ZIP) yields negligible benefit from DB2 compression. The CPU overhead is pure waste. Identify and exclude these tables early.
3. Partition-Level REORG Is Essential for Large Tables
TRANSACTION_DETAIL at 4.8 TB would have taken days to REORG as a single operation. Partition-level REORG (one month at a time) completed each partition in 20-40 minutes, with no impact on concurrent access. Partitioning is a prerequisite for practical compression of very large tables.
4. Performance Often Improves
The intuition that "compression costs CPU and slows things down" is wrong for most workloads. The I/O reduction (57% fewer operations) and buffer pool improvement (+3.2 points of hit ratio) more than compensated for the 5% CPU overhead. Only CPU-saturated systems should be cautious.
5. Compression Benefits Compound Over Time
As data grows, the savings grow proportionally. Year 5 savings ($534K) are 43% higher than Year 1 savings ($372K) because compression applies to all new data automatically. The compression dictionary is maintained through periodic REORG operations.
6. The Hardest Part Is Getting Approval
The technical implementation was straightforward. The hardest part was convincing management that enabling compression would not degrade performance. The test environment benchmark (Phase 3) was the key evidence. Always test, always measure, always present data.
Discussion Questions
1. The team excluded LOB tables from compression. Under what circumstances would compressing LOB data make sense? (Hint: consider XML CLOBs and text-heavy document stores.)
2. The AUDIT_LOG achieved 72.5% compression — the highest ratio in the database. What characteristics of audit data make it highly compressible? Design an audit table structure that would compress even better.
3. If the credit union's CPU utilization were at 82% instead of 62%, how would the analysis change? Would you still recommend compression? What alternatives might you pursue?
4. The 5-year projection assumes 10% annual growth. What happens to the ROI if growth is 25%? What if it is 5%? Calculate the sensitivity.
5. Continental Federal plans to migrate from on-premises to a cloud-hosted DB2 instance. How does cloud pricing (per-GB storage, per-CPU-hour compute) change the compression ROI calculation?