Case Study 2: Impact Analysis Using the Catalog Before a Major Change

DataField.Dev

Case Study 2: Impact Analysis Using the Catalog Before a Major Change

Background

Sentinel Insurance Group (SIG) is a commercial property and casualty insurer headquartered in Hartford, Connecticut, with $22 billion in annual gross written premium. SIG's policy administration and claims management systems run on DB2 for z/OS 12 within a Parallel Sysplex configuration, processing approximately 95 million transactions per day across peak periods.

The DB2 environment is substantial:

Metric	Value
Total databases	28
Total tables	6,400
Total indexes	14,200
Total views	2,800
Total stored procedures	3,600
Total packages (bound plans)	18,000
Total foreign key relationships	4,200
Total data volume (compressed)	18 TB

The DBA team consists of eight people led by Raymond Zhao, a principal DBA with 19 years of experience at SIG.

The Proposed Change

SIG's Chief Data Officer has mandated a strategic initiative: consolidate customer identity across all product lines. Today, SIG maintains separate customer records in the policy system (POLDB), the claims system (CLMDB), and the billing system (BILDB). A single human customer may have three or more rows across these systems with inconsistent identifiers, addresses, and contact information.

The solution requires a new master customer table (MASTER_CUSTOMER) and changes to the existing CUSTOMER table in POLDB:

Add three new columns to POLDB.CUSTOMER: MASTER_CUSTOMER_ID, IDENTITY_VERIFIED_FLAG, and IDENTITY_VERIFIED_DATE
Add a foreign key from POLDB.CUSTOMER.MASTER_CUSTOMER_ID to the new MASTER.MASTER_CUSTOMER table
Modify the data type of POLDB.CUSTOMER.EMAIL_ADDRESS from CHAR(50) to VARCHAR(254) to support longer email addresses discovered during identity consolidation
Drop the column POLDB.CUSTOMER.LEGACY_SYSTEM_CODE, which has been populated with blanks since a 2018 migration and is no longer referenced

The project sponsor wants to deploy all four changes in a single maintenance window, scheduled for the third Saturday of the month — a 6-hour window from 11:00 PM to 5:00 AM.

Raymond's job is to determine whether this is safe.

The Impact Analysis

Raymond begins by systematically querying the catalog to understand every dependency on the CUSTOMER table.

Step 1: Establish the Scope

First, Raymond needs to know the basic dimensions of the table being modified.

SELECT NAME, CARD, NPAGES, COLCOUNT,
       CREATEDTS, ALTEREDTS, STATSTIME
FROM   SYSIBM.SYSTABLES
WHERE  CREATOR = 'POLDB' AND NAME = 'CUSTOMER';

Results: 14.2 million rows, 847,000 pages, 68 columns, created in 2003, last altered in 2021, last RUNSTATS two days ago. This is a large, mature table. Any change carries significant risk.

Step 2: Foreign Key Dependencies

Raymond queries for all foreign key relationships where CUSTOMER is the parent table — any child tables that reference it.

SELECT R.CREATOR AS CHILD_SCHEMA,
       R.TBNAME AS CHILD_TABLE,
       R.RELNAME AS CONSTRAINT_NAME,
       R.DELETERULE,
       T.CARD AS CHILD_ROWS
FROM   SYSIBM.SYSRELS R
       JOIN SYSIBM.SYSTABLES T
         ON T.CREATOR = R.CREATOR AND T.NAME = R.TBNAME
WHERE  R.REFTBCREATOR = 'POLDB'
  AND  R.REFTBNAME    = 'CUSTOMER'
ORDER BY T.CARD DESC;

Results: 23 child tables across three schemas (POLDB, CLMDB, BILDB). The largest child table has 142 million rows. Seven of the relationships use CASCADE delete rules.

This is the first red flag. The proposed changes include adding a column and modifying a column data type. Adding a column is generally safe, but modifying a data type on a table with 23 dependent child tables requires checking whether any of those relationships involve the EMAIL_ADDRESS column being modified.

Step 3: Foreign Key Column Details

Raymond drills into the foreign key columns to determine which parent columns are referenced.

SELECT R.RELNAME, FK.COLNAME, FK.COLSEQ
FROM   SYSIBM.SYSRELS R
       JOIN SYSIBM.SYSFOREIGNKEYS FK
         ON FK.CREATOR = R.CREATOR
        AND FK.TBNAME  = R.TBNAME
        AND FK.RELNAME = R.RELNAME
WHERE  R.REFTBCREATOR = 'POLDB'
  AND  R.REFTBNAME    = 'CUSTOMER'
ORDER BY R.RELNAME, FK.COLSEQ;

Results: All 23 foreign keys reference either CUSTOMER_ID (the primary key) or a combination of CUSTOMER_ID and PRODUCT_LINE_CODE. None reference EMAIL_ADDRESS or LEGACY_SYSTEM_CODE. The data type change and column drop will not affect foreign key constraints directly.

Step 4: Index Dependencies

Raymond checks every index on the CUSTOMER table to see if any include the columns being modified or dropped.

SELECT I.NAME AS INDEX_NAME,
       K.COLSEQ, K.COLNAME, K.ORDERING,
       I.UNIQUERULE, I.CLUSTERING
FROM   SYSIBM.SYSINDEXES I
       JOIN SYSIBM.SYSKEYS K
         ON K.IXCREATOR = I.CREATOR AND K.IXNAME = I.NAME
WHERE  I.TBCREATOR = 'POLDB'
  AND  I.TBNAME    = 'CUSTOMER'
ORDER BY I.NAME, K.COLSEQ;

Results: 11 indexes on the CUSTOMER table. One index (IX_CUST_EMAIL) includes EMAIL_ADDRESS as its second key column. No index includes LEGACY_SYSTEM_CODE.

This is the second critical finding. The VARCHAR conversion of EMAIL_ADDRESS will affect IX_CUST_EMAIL. On z/OS, ALTER COLUMN changes that affect indexed columns require the index to be rebuilt. Raymond notes this for the implementation plan.

Step 5: View Dependencies

Raymond checks for views that reference the CUSTOMER table.

SELECT D.DCREATOR AS VIEW_SCHEMA,
       D.DNAME    AS VIEW_NAME,
       V.CREATOR  AS BASE_SCHEMA,
       V.NAME     AS BASE_TABLE,
       V.TYPE
FROM   SYSIBM.SYSVIEWDEP D
       JOIN SYSIBM.SYSTABLES V
         ON V.CREATOR = D.DCREATOR AND V.NAME = D.DNAME
WHERE  D.BCREATOR = 'POLDB'
  AND  D.BNAME    = 'CUSTOMER'
ORDER BY D.DNAME;

Results: 34 views reference the CUSTOMER table. Raymond needs to check whether any of these views explicitly reference LEGACY_SYSTEM_CODE (the column being dropped) or EMAIL_ADDRESS (the column being altered).

He writes a follow-up query to check the view definitions stored in SYSIBM.SYSVIEWS:

SELECT V.CREATOR, V.NAME, V.SEQNO, V.TEXT
FROM   SYSIBM.SYSVIEWS V
WHERE  (V.CREATOR, V.NAME) IN (
         SELECT D.DCREATOR, D.DNAME
         FROM   SYSIBM.SYSVIEWDEP D
         WHERE  D.BCREATOR = 'POLDB' AND D.BNAME = 'CUSTOMER'
       )
  AND  (V.TEXT LIKE '%LEGACY_SYSTEM_CODE%'
        OR V.TEXT LIKE '%EMAIL_ADDRESS%')
ORDER BY V.CREATOR, V.NAME, V.SEQNO;

Results: 3 views reference LEGACY_SYSTEM_CODE. 8 views reference EMAIL_ADDRESS. The 3 views referencing LEGACY_SYSTEM_CODE will fail after the column is dropped — they must be altered or recreated as part of the change.

Step 6: Package Dependencies

Raymond checks for compiled packages that reference the CUSTOMER table.

SELECT PD.DCREATOR AS PACKAGE_COLLECTION,
       PD.DNAME    AS PACKAGE_NAME,
       P.VALID, P.OPERATIVE,
       P.BINDTIME, P.OWNER
FROM   SYSIBM.SYSPACKDEP PD
       JOIN SYSIBM.SYSPACKAGE P
         ON P.LOCATION = PD.DLOCATION
        AND P.COLLID   = PD.DCOLLID
        AND P.NAME     = PD.DNAME
WHERE  PD.BCREATOR = 'POLDB'
  AND  PD.BNAME    = 'CUSTOMER'
  AND  PD.BTYPE    = 'T'
ORDER BY PD.DNAME;

Results: 187 packages reference the CUSTOMER table. When the table is altered, all 187 packages will be invalidated and require rebinding. Raymond needs to understand how long rebinding 187 packages will take and whether it fits within the 6-hour maintenance window.

He further filters for packages that reference the specific columns being changed:

-- Packages referencing LEGACY_SYSTEM_CODE
SELECT DISTINCT PD.DCREATOR, PD.DNAME
FROM   SYSIBM.SYSPACKDEP PD
       JOIN SYSIBM.SYSPACKSTMT PS
         ON PS.LOCATION = PD.DLOCATION
        AND PS.COLLID   = PD.DCOLLID
        AND PS.NAME     = PD.DNAME
WHERE  PD.BCREATOR = 'POLDB'
  AND  PD.BNAME    = 'CUSTOMER'
  AND  PS.TEXT LIKE '%LEGACY_SYSTEM_CODE%';

Results: 4 packages explicitly reference LEGACY_SYSTEM_CODE. These packages will not just be invalidated — they will fail at bind time until the SQL is modified to remove the reference to the dropped column. These require code changes before the schema change.

Step 7: Trigger Dependencies

SELECT SCHEMA, NAME, TRIGTIME, TRIGEVENT, GRANULARITY
FROM   SYSIBM.SYSTRIGGERS
WHERE  TBCREATOR = 'POLDB'
  AND  TBNAME    = 'CUSTOMER';

Results: 5 triggers defined on the CUSTOMER table. Two are AFTER INSERT triggers for audit logging. One is a BEFORE UPDATE trigger for data validation. Two are AFTER UPDATE triggers for replication. Raymond reviews the trigger text to determine if any reference the columns being modified.

The Consolidated Impact Report

Raymond compiles his findings:

Dependency Type	Count	Affected by Change	Action Required
Foreign key child tables	23	0 (no FK references modified columns)	None
Indexes	11	1 (IX_CUST_EMAIL uses EMAIL_ADDRESS)	REBUILD INDEX after ALTER
Views	34	11 (3 reference LEGACY_SYSTEM_CODE, 8 reference EMAIL_ADDRESS)	Recreate 3 views; verify 8 views
Packages	187	187 (all invalidated) + 4 (reference dropped column)	REBIND 187 packages; fix 4 packages first
Triggers	5	1 (audit trigger logs EMAIL_ADDRESS changes)	Verify trigger still valid after VARCHAR conversion
Total objects requiring attention	260+

The Recommendation

Raymond presents his analysis to the project team and recommends splitting the change into two maintenance windows:

Window 1 (Week 1): - Add the three new columns (MASTER_CUSTOMER_ID, IDENTITY_VERIFIED_FLAG, IDENTITY_VERIFIED_DATE) - Add the new foreign key to MASTER.MASTER_CUSTOMER - These changes are additive and low-risk; existing code continues to work unchanged

Window 2 (Week 2): - Modify EMAIL_ADDRESS from CHAR(50) to VARCHAR(254) - Drop LEGACY_SYSTEM_CODE - Prerequisite: the 4 packages and 3 views referencing LEGACY_SYSTEM_CODE must be modified and recompiled before the window - During the window: ALTER TABLE, REBUILD IX_CUST_EMAIL, REBIND 187 packages - Estimated time: 4 hours (tight but feasible within the 6-hour window)

The project sponsor initially pushes back: "Why two windows? Can we not do it all at once?" Raymond walks through the risk analysis: if anything goes wrong during the additive changes in Window 1, the rollback is simple (DROP COLUMN, DROP FOREIGN KEY). But if the VARCHAR conversion and column drop are combined with the additive changes and something fails midway, the rollback becomes exponentially more complex — and the 6-hour window does not provide enough buffer for both forward execution and potential rollback.

The sponsor agrees to the two-window approach.

Outcome

Window 1 executes in 47 minutes without incident. All 187 packages remain valid because only new columns and a new foreign key were added — no existing columns were modified or dropped.

Window 2 executes in 3 hours and 22 minutes. The VARCHAR conversion completes in 18 minutes (faster than expected because the column was largely padded blanks). The column drop completes in 4 minutes. IX_CUST_EMAIL is rebuilt in 31 minutes. The 187 package rebinds complete in 2 hours and 29 minutes (the bottleneck, as expected). All 5 triggers remain valid after testing. No application errors are reported when systems resume at 5:00 AM.

Lessons Learned

1. Impact analysis is a catalog problem. Every dependency that Raymond needed to assess — foreign keys, indexes, views, packages, triggers — was available through standard catalog queries. Without the catalog, he would have been guessing based on documentation that may or may not be current.

2. The catalog reveals what documentation does not. The 34 views referencing the CUSTOMER table included 8 views that the application team had not documented. Three of these would have broken if the column drop had proceeded without the view modifications. The catalog is always current; documentation frequently is not.

3. Package count drives maintenance window sizing. The 187 package rebinds consumed 70% of the Window 2 time. Estimating rebind time from the catalog (number of packages, complexity of statements) is essential for accurate window planning.

4. Splitting changes reduces risk without adding cost. The two-window approach added one week to the timeline but dramatically reduced the risk of each window. In regulated industries, this trade-off is almost always worth making.

5. The recursive CTE is the DBA's secret weapon. Raymond's dependency chain queries allowed him to trace not just direct dependencies but indirect ones — views that depend on views, packages that reference views that reference the CUSTOMER table. Without recursive catalog queries, these second-order dependencies are easy to miss.

Discussion Questions

Raymond found 4 packages that explicitly reference LEGACY_SYSTEM_CODE. How would you verify that these are the ONLY packages affected? Could there be dynamic SQL that references this column but does not appear in SYSPACKDEP?
The VARCHAR conversion of EMAIL_ADDRESS required rebuilding IX_CUST_EMAIL. Under what circumstances could an ALTER COLUMN change avoid an index rebuild? Does the answer differ between z/OS and LUW?
If Raymond had been working on DB2 for LUW instead of z/OS, which SYSCAT views would he have used for each step of his analysis? Would the overall approach have been the same?
The project sponsor initially wanted all changes in a single window. Construct a risk argument quantifying the probability and impact of failure for the single-window approach versus the two-window approach. What is the expected cost of each option?
How would you build Raymond's impact analysis into an automated pre-change checklist that runs every time a schema change is proposed? What catalog queries would the checklist include?

Return to Chapter 21 | Continue to Key Takeaways