Case Study 1: Index Strategy Overhaul — From 47 Indexes to 12

"We Never Drop Indexes Here" — How a Regional Insurance Company Reclaimed 40% of Its Batch Window


The Company

Great Lakes Mutual Insurance was a regional property and casualty insurer headquartered in Milwaukee, Wisconsin. Founded in 1952, the company served 1.2 million policyholders across six Midwestern states. Their core policy administration system ran on DB2 for z/OS, version 12, on a z15 mainframe. The system had been in continuous operation since 1991, migrating through every DB2 version along the way.

The database was substantial: 380 tables, 2.1 billion rows across the largest tables, and a total data footprint of approximately 4.8 TB. It processed 45,000 OLTP transactions per second during peak hours (weekday mornings when agents were quoting and binding policies) and ran a nightly batch cycle that generated policy documents, processed claims payments, and produced regulatory reports.

The Problem

In early 2025, the nightly batch window was failing. The batch cycle, which had a hard deadline of 6:00 AM (when OLTP workload resumed), was consistently finishing between 6:15 and 6:45 AM. The overlap caused contention between batch jobs and OLTP transactions, degrading morning response times. Agents complained that the quoting system was "sluggish until mid-morning."

The DBA team — three experienced z/OS specialists — had been tuning individual batch jobs for months, squeezing a few minutes here and there. But the fundamental trend was worsening: each quarter, the batch window grew by 10-15 minutes as data volumes increased.

The infrastructure team proposed upgrading to a larger z15 partition with 20% more capacity. The estimated cost: $2.8 million per year in additional software licensing (IBM, CA, BMC, and other ISV products that charged by MSU). Management asked the DBA team to explore alternatives before approving the expenditure.

The Discovery

Senior DBA Margaret Chen began a systematic review of the database's index strategy — something that had never been done comprehensively. The last index review had been in 2009, conducted by a consultant who had since retired.

Margaret's first query was simple:

SELECT TBNAME, COUNT(*) AS INDEX_COUNT
FROM SYSIBM.SYSINDEXES
WHERE TBCREATOR = 'GLMPROD'
GROUP BY TBNAME
HAVING COUNT(*) > 5
ORDER BY INDEX_COUNT DESC;

The results were alarming:

Table          Row Count   Index Count
POLICY         12M         17
CLAIM          8M          14
CLAIM_LINE     45M         9
PAYMENT        30M         7
POLICYHOLDER   4M          11
The top five tables alone accounted for 58 indexes. Across the entire database: 347 indexes on 380 tables. Some tables had more indexes than columns.

Margaret then examined index usage. On z/OS, she analyzed recent EXPLAIN output from the plan management facility and correlated it with index names. She also reviewed the IFCID trace data from the past 90 days. Her findings:

  • 47 indexes had not appeared in any access plan in over 12 months. These indexes were consuming storage, slowing every INSERT and UPDATE, adding to REORG time, and providing zero query benefit.
  • 23 indexes were redundant — their leading columns were a subset of another index on the same table. For example, the POLICY table had indexes on both (AGENT_ID) and (AGENT_ID, EFFECTIVE_DATE). The single-column index served no purpose that the composite index could not.
  • 8 indexes were on low-cardinality columns (STATUS fields with 3-5 distinct values) and were never selected by the optimizer because tablespace scans were cheaper.
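A stale-index check like the one behind the first finding can be sketched against DB2's real-time statistics, where SYSIBM.SYSINDEXSPACESTATS records a LASTUSED date per index (the 'GLMPROD' creator comes from the case study; the 12-month threshold is illustrative):

-- Indexes with no recorded use in the past 12 months
-- (LASTUSED is maintained by DB2 real-time statistics;
--  a NULL means no recorded use since the column was introduced)
SELECT S.CREATOR, S.NAME, S.LASTUSED
FROM SYSIBM.SYSINDEXSPACESTATS S
JOIN SYSIBM.SYSINDEXES I
  ON I.CREATOR = S.CREATOR AND I.NAME = S.NAME
WHERE I.TBCREATOR = 'GLMPROD'
  AND (S.LASTUSED IS NULL
       OR S.LASTUSED < CURRENT DATE - 12 MONTHS)
ORDER BY S.LASTUSED;

LASTUSED alone is not proof of disuse — it misses, for example, indexes that exist only to enforce uniqueness — which is why the case study pairs it with EXPLAIN and source-code evidence.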

The Root Causes

Margaret identified three cultural and process failures:

1. No index governance process. There was no formal review or approval process for creating indexes. Any developer or DBA could create an index on any table without documenting the justification. Over three decades, indexes accumulated like sediment.

2. Fear of dropping. The team had an unwritten rule: "We never drop indexes." A decade earlier, a junior DBA had dropped an index that turned out to support a quarterly regulatory report. The report failed during a state insurance examination, creating a crisis. Since then, the culture was to never remove anything. This was a rational response to a real incident, but it created a permanent accumulation problem.

3. Application changes without index review. The application had been rewritten twice (in 2005 and 2017). Each rewrite changed query patterns, but nobody reviewed whether old indexes were still needed. Indexes created for queries that no longer existed persisted indefinitely.

The Analysis

Margaret conducted a rigorous analysis over four weeks:

Week 1: Catalog mining. She queried SYSIBM.SYSINDEXES, SYSIBM.SYSKEYS, and SYSIBM.SYSINDEXPART to build a complete inventory of every index, including its columns, uniqueness, clustering status, and space consumption. Total index space: 1.9 TB — 40% of the total data footprint.
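An inventory query along these lines can be built from the three catalog tables named above; this is a sketch (LISTAGG requires DB2 12 at function level 500 or later, and SPACEF is reported in KB):

-- One row per index: key columns, uniqueness, clustering, space
SELECT I.TBNAME, I.NAME, I.UNIQUERULE, I.CLUSTERING,
       LISTAGG(K.COLNAME, ',')
         WITHIN GROUP (ORDER BY K.COLSEQ) AS KEY_COLS,
       (SELECT SUM(P.SPACEF)              -- allocated KB, all partitions
          FROM SYSIBM.SYSINDEXPART P
         WHERE P.IXCREATOR = I.CREATOR
           AND P.IXNAME = I.NAME) AS SPACE_KB
FROM SYSIBM.SYSINDEXES I
JOIN SYSIBM.SYSKEYS K
  ON K.IXCREATOR = I.CREATOR AND K.IXNAME = I.NAME
WHERE I.TBCREATOR = 'GLMPROD'
GROUP BY I.CREATOR, I.TBNAME, I.NAME, I.UNIQUERULE, I.CLUSTERING
ORDER BY I.TBNAME, I.NAME;

The scalar subquery keeps the per-partition space rows from being multiplied by the number of key columns in the join.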

Week 2: Usage correlation. She analyzed EXPLAIN output from the dynamic statement cache and static package binds. She matched every index to the queries that used it (or did not use it). For each unused index, she searched application source code to verify no query existed that might use it.
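The correlation step can be approximated with an anti-join between the catalog and collected EXPLAIN rows. This sketch assumes EXPLAIN output for both static binds and the dynamic statement cache has been gathered into a single PLAN_TABLE; the GLMADM qualifier is illustrative:

-- Indexes never referenced as an access path in any captured EXPLAIN row
SELECT I.TBNAME, I.NAME
FROM SYSIBM.SYSINDEXES I
WHERE I.TBCREATOR = 'GLMPROD'
  AND NOT EXISTS
      (SELECT 1
         FROM GLMADM.PLAN_TABLE P
        WHERE P.ACCESSTYPE LIKE 'I%'      -- index scan / one-fetch access
          AND P.ACCESSCREATOR = I.CREATOR
          AND P.ACCESSNAME = I.NAME)
ORDER BY I.TBNAME, I.NAME;

A production version would also need to catch multi-index access (ACCESSTYPE 'M'/'MX' rows), which is one reason the case study backs the query up with a source-code search.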

Week 3: Redundancy analysis. She built a matrix of index leading-column overlaps. For each pair of indexes where one was a prefix of the other, she documented the relationship and verified whether the narrower index was ever selected over the wider one.
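The prefix matrix can be generated rather than built by hand. A sketch under the same assumptions as above (LISTAGG, DB2 12 function level 500+): collapse each index's key into an ordered column list, then flag pairs where one list is a strict leading prefix of another.

-- Candidate redundant pairs: NARROW_IX is a strict leading prefix of WIDE_IX
WITH KEYS AS (
  SELECT I.TBNAME, I.NAME,
         LISTAGG(K.COLNAME, ',')
           WITHIN GROUP (ORDER BY K.COLSEQ) AS COLLIST
  FROM SYSIBM.SYSINDEXES I
  JOIN SYSIBM.SYSKEYS K
    ON K.IXCREATOR = I.CREATOR AND K.IXNAME = I.NAME
  WHERE I.TBCREATOR = 'GLMPROD'
  GROUP BY I.TBNAME, I.NAME
)
SELECT A.TBNAME, A.NAME AS NARROW_IX, B.NAME AS WIDE_IX
FROM KEYS A
JOIN KEYS B
  ON B.TBNAME = A.TBNAME
 AND B.NAME <> A.NAME
 AND B.COLLIST LIKE A.COLLIST || ',%'    -- wide key continues past narrow key
ORDER BY A.TBNAME, A.NAME;

This finds pairs like the POLICY example — (AGENT_ID) flagged as a prefix of (AGENT_ID, EFFECTIVE_DATE) — though uniqueness and key ordering still need a manual check before any drop.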

Week 4: Impact estimation. For each candidate index to drop, she estimated the write overhead reduction. She calculated: each INSERT to the POLICY table currently wrote to the table plus 17 indexes — 18 physical write targets in all. Reducing to 6 indexes would cut the index writes per INSERT from 17 to 6, a 65% reduction in index maintenance overhead.

The Execution

Margaret presented her findings to management with a phased plan:

Phase 1 — Drop 23 obviously redundant indexes. These were indexes whose leading columns were a strict prefix of another index on the same table. Risk: minimal, because the wider index serves all queries the narrower one could. Execution: one weekend maintenance window.

Phase 2 — Drop 16 confirmed-unused indexes. These had not appeared in any access plan for 12+ months AND had no corresponding query in application source code. Risk: low, but Margaret saved every CREATE INDEX statement and documented a rollback plan. Execution: two weekend windows.
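The rollback safety net described for Phase 2 can be as simple as keeping the reconstructed DDL next to each drop, so the index can be recreated in minutes if a dependency surfaces. The index and storage group names below are illustrative, not from the case study:

-- Rollback DDL, reconstructed from the catalog before the drop:
--   CREATE INDEX GLMPROD.XPOLICY07
--     ON GLMPROD.POLICY (AGENT_ID)
--     USING STOGROUP GLMSG01;
DROP INDEX GLMPROD.XPOLICY07;
COMMIT;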

Phase 3 — Drop 8 low-cardinality standalone indexes. These were indexes on columns like STATUS, POLICY_TYPE, and STATE_CODE that the optimizer never selected. Risk: very low. Execution: one weekend window.

Deferred: 12 questionable indexes. These were infrequently used but not clearly unnecessary. Margaret left them in place for further monitoring.

Total indexes dropped: 47 (from 347 to 300 across the database; from 58 to 12 across the five critical tables).

The Results

The impact was measured over the following month:

Batch window: Reduced from an average of 6 hours 25 minutes to 3 hours 50 minutes — a 40% reduction. The batch now finished by 4:00 AM, well within the 6:00 AM deadline. The improvement came from two sources: faster INSERTs/UPDATEs (fewer indexes to maintain) and faster REORGs (fewer indexes to rebuild).

OLTP response times: Average transaction response time improved from 12ms to 9ms — a 25% improvement driven by reduced INSERT overhead. The agents' "sluggish mornings" disappeared entirely because the batch no longer overlapped with OLTP hours.

Storage: 780 GB of index space reclaimed. The storage team reallocated it to other subsystems.

REORG time: The weekly REORG of the POLICY table dropped from 4 hours 10 minutes to 1 hour 35 minutes. With fewer indexes to rebuild, the utility completed much faster.

CPU consumption: Mainframe CPU usage during the batch window dropped by approximately 15%, directly reducing MSU consumption. The $2.8 million hardware upgrade was cancelled.

The Lessons

1. Index accumulation is a silent performance killer. Unlike a missing index (which causes an obvious slow query), unnecessary indexes degrade performance gradually. Each one adds a small amount of overhead to every write. The cumulative effect is significant but hard to attribute to any single index.

2. Index governance is a discipline, not a project. Margaret established a quarterly index review process. New indexes required a documented justification (which query, what improvement, what write overhead). Old indexes were reviewed for continued usage. The team tracked index-to-query mappings in a simple spreadsheet.

3. The fear of dropping must be overcome with data. The 2015 incident (the dropped index for the quarterly report) was real, but the response (never drop anything) was disproportionate. The correct response was to verify usage before dropping, save the DDL for rollback, and test before production deployment — which is exactly what Margaret did.

4. The cost of doing nothing is real. For years, the team added indexes but never removed them. The $2.8 million upgrade that was almost approved would have bought time, but the same accumulation problem would have filled the new capacity within 2-3 years. Addressing the root cause — undisciplined index management — provided a lasting solution.

Epilogue

Six months after the overhaul, Margaret reported zero incidents attributable to the dropped indexes. The quarterly regulatory reports ran correctly. No query experienced a performance regression. The 12 "questionable" indexes that had been deferred during the original analysis were reviewed in the next quarter; 7 were dropped, 5 were retained with documented justifications.

Great Lakes Mutual subsequently hired Margaret to lead a company-wide database governance initiative. The index review became the template for similar reviews of tablespace design, buffer pool allocation, and SQL tuning.
