Case Study 2: DDL Standards and Governance at a Fortune 500 — Harrington Financial Group


Background

Harrington Financial Group (HFG) is a Fortune 200 diversified financial services company headquartered in Boston, Massachusetts. With $420 billion in assets under management, HFG operates across retail banking, investment management, insurance, and commercial lending.

HFG runs DB2 on both platforms:

| Platform | Version | Purpose | Tables | Databases |
|---|---|---|---|---|
| DB2 for z/OS | DB2 13 | Core banking, insurance claims, regulatory reporting | 42,000+ | 180 |
| DB2 for LUW | 11.5 | Digital channels, analytics, customer data platform | 18,000+ | 45 |

The DB2 team consists of 28 DBAs: 16 specializing in z/OS, 8 in LUW, and 4 who work across both platforms. They manage over 60,000 tables, 200,000+ indexes, and handle an average of 120 DDL change requests per week.

The Problem

In the past 18 months, HFG has experienced a series of DDL-related incidents that have cost the company significant money and reputational damage:

Incident 1: The VARCHAR(4000) Query Performance Crisis

A development team created a customer-facing lookup table with 14 VARCHAR(4000) columns. The actual data rarely exceeded 50 characters per column. When the table reached 30 million rows, a critical nightly report that joined this table with three others began timing out. The optimizer grossly overestimated the table's row width, choosing nested-loop joins where hash joins would have been orders of magnitude faster. Resolution required ALTER TABLE to reduce all column lengths, followed by REORG — total downtime: 6 hours.

Incident 2: The Unintended ON DELETE CASCADE

A developer specified ON DELETE CASCADE on a foreign key between a policy table and a policy_notes table. When a data cleanup script deleted 12,000 expired policies, it silently cascaded through to delete 2.4 million policy notes — including notes for active policies that had been incorrectly linked. Recovery from backup took 11 hours.

Incident 3: The DROP TABLE in Production

A contractor running a deployment script intended for the QA environment accidentally executed it against production. The script included DROP TABLE customer_preference. The table had 8 million rows and was referenced by the mobile banking application. The outage lasted 4 hours and generated 3,200 customer complaints.

Incident 4: The Unnamed Constraint Mystery

A batch process failed at 1:47 AM with SQLSTATE 23503, referencing constraint SQL230415082145290. The on-call DBA spent 35 minutes querying the catalog to determine which foreign key relationship was violated. The incident would have been resolved in 5 minutes if the constraint had been named fk_txn_account.
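Resolving a system-generated name means mapping it back to the referential constraint it identifies. A catalog query along these lines is what the on-call DBA ultimately ran (a sketch against the DB2 for z/OS catalog; real queries would also filter on the schema):

```sql
-- Map a system-generated FK constraint name back to its child table,
-- parent table, and delete rule via the z/OS catalog
SELECT CREATOR, TBNAME AS child_table,
       REFTBCREATOR, REFTBNAME AS parent_table,
       DELETERULE
  FROM SYSIBM.SYSRELS
 WHERE RELNAME = 'SQL230415082145290';
```

With a meaningful name like fk_txn_account, this lookup is unnecessary — the failing constraint identifies itself in the error message.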


The Governance Initiative

HFG's Chief Data Officer, Dr. Eleanor Voss, commissioned a DDL governance initiative led by Principal DBA Rajesh Krishnamurthy. After six weeks of analysis, Krishnamurthy's team produced a comprehensive DDL standards document and a governance process. Here are the key elements.

Standard 1: Naming Conventions (Mandatory)

All DDL objects must follow strict naming conventions:

| Object | Pattern | Example | Enforcement |
|---|---|---|---|
| Schema | {app}_{env} abbreviation | RBK (retail banking) | Review |
| Table | Singular noun, uppercase | CUSTOMER, ACCOUNT | Automated |
| Column | lowercase_snake_case | first_name, account_id | Automated |
| Primary key | pk_{table} | pk_customer | Automated |
| Foreign key | fk_{child}_{parent} | fk_account_customer | Automated |
| Unique | uq_{table}_{column} | uq_customer_taxid | Automated |
| Check | chk_{table}_{description} | chk_cust_status | Automated |
| Index | ix_{table}_{column(s)} | ix_customer_name | Automated |
| Sequence | seq_{table_or_purpose} | seq_transaction_id | Automated |
| Tablespace (z/OS) | {app}{type}{seq} | RBKCUS01 (retail banking, customer, space 01) | Automated |
| Tablespace (LUW) | ts_{app}_{purpose} | ts_rbk_customer_data | Automated |

Enforcement: Krishnamurthy's team built a DDL linting tool that parses CREATE and ALTER statements and rejects non-conforming names before they reach the DBA review queue. The tool integrates with the CI/CD pipeline and runs automatically on every DDL change request.
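The linter works pre-deployment, but a periodic catalog audit catches anything that slips through or predates the tool. A sketch for DB2 for LUW, where system-generated constraint names begin with 'SQL' followed by a numeric timestamp:

```sql
-- Audit query (DB2 for LUW): constraints whose names look system-generated,
-- queued for remediation under the naming standard
SELECT TABSCHEMA, TABNAME, CONSTNAME, TYPE
  FROM SYSCAT.TABCONST
 WHERE CONSTNAME LIKE 'SQL%'
   AND TABSCHEMA NOT LIKE 'SYS%'
 ORDER BY TABSCHEMA, TABNAME;
```

A matching z/OS audit would query SYSIBM.SYSRELS and SYSIBM.SYSTABCONST with the same naming test.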

Standard 2: Data Type Rules (Mandatory)

| Rule | Rationale |
|---|---|
| All monetary columns: DECIMAL(15,2) minimum | Prevents floating-point rounding errors in financial calculations |
| All primary keys: INTEGER or BIGINT (no SMALLINT) | SMALLINT's space savings are negligible, and its narrow range invites capacity exhaustion |
| All status/flag columns: CHAR(1) with CHECK constraint | Consistent, compact, constrained |
| All timestamps: TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP | Consistent audit trail across all tables |
| Declared VARCHAR length: no more than 2x the longest observed value | Prevents optimizer row-width overestimation (the root cause of Incident 1) |
| No FLOAT or DOUBLE for business data | Reserved for scientific/statistical columns only, with written justification |
| No VARCHAR for dates, times, or fixed-length codes | Use DATE for dates and CHAR(n) for fixed-length codes |

Standard 3: Constraint Requirements (Mandatory)

| Requirement | Detail |
|---|---|
| Every table must have a PRIMARY KEY | No exceptions |
| Every constraint must have an explicit name | System-generated names are rejected by the linting tool |
| Every foreign key must specify ON DELETE explicitly | No implicit default behavior; the designer must make a conscious choice |
| ON DELETE CASCADE requires written justification and DBA approval | The justification must explain the cascade chain and confirm that cascaded data is genuinely dependent |
| CHECK constraints on all status and code columns | The valid values must be documented in the constraint |
| NOT NULL on all columns except those with a documented business reason to allow NULLs | The default position is NOT NULL; nullability requires justification |
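Putting Standards 1 through 3 together, a conforming table definition might look like the following sketch (the ACCOUNT table, its columns, and the status codes are illustrative, not HFG's actual schema):

```sql
CREATE TABLE ACCOUNT (
    account_id     BIGINT        NOT NULL,  -- surrogate key: BIGINT, never SMALLINT
    customer_id    BIGINT        NOT NULL,
    account_status CHAR(1)       NOT NULL,  -- flag column: CHAR(1) plus CHECK
    balance        DECIMAL(15,2) NOT NULL,  -- monetary: DECIMAL(15,2) minimum
    opened_ts      TIMESTAMP     NOT NULL DEFAULT CURRENT TIMESTAMP,
    CONSTRAINT pk_account PRIMARY KEY (account_id),
    CONSTRAINT fk_account_customer FOREIGN KEY (customer_id)
        REFERENCES CUSTOMER (customer_id)
        ON DELETE RESTRICT,                 -- explicit ON DELETE, never implied
    CONSTRAINT chk_account_status CHECK (account_status IN ('A', 'C', 'F'))
);
```

Every constraint carries an explicit, convention-conforming name, so any SQLSTATE 23503 failure immediately identifies the relationship involved.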

Standard 4: Index Requirements (Mandatory)

| Requirement | Detail |
|---|---|
| Every foreign key column must be indexed | Prevents full table scans during parent-row operations |
| Every table must have a clustering index (z/OS) | Ensures physical ordering for range scans |
| Index naming must follow the ix_{table}_{column} convention | Consistency for catalog queries and operational scripts |
| No more than 12 indexes per OLTP table without DBA approval | Prevents excessive INSERT/UPDATE overhead |
| Index changes on tables over 100 million rows require a performance impact assessment | Large index builds can consume significant resources |
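The first requirement lends itself to automated checking. A sketch for DB2 for LUW that lists foreign keys whose leading column does not lead any index on the child table (it checks ascending leading columns only, which is sufficient for a periodic audit):

```sql
-- Foreign keys whose leading column is not the leading (ascending)
-- column of any index on the child table
SELECT r.TABSCHEMA, r.TABNAME, r.CONSTNAME, k.COLNAME
  FROM SYSCAT.REFERENCES r
  JOIN SYSCAT.KEYCOLUSE k
    ON k.CONSTNAME = r.CONSTNAME
   AND k.TABSCHEMA = r.TABSCHEMA
   AND k.TABNAME   = r.TABNAME
 WHERE k.COLSEQ = 1
   AND NOT EXISTS (
         SELECT 1
           FROM SYSCAT.INDEXES i
          WHERE i.TABSCHEMA = r.TABSCHEMA
            AND i.TABNAME   = r.TABNAME
            -- COLNAMES is stored as '+COL1+COL2...'; match the leading column
            AND (i.COLNAMES = '+' || k.COLNAME
                 OR i.COLNAMES LIKE '+' || k.COLNAME || '+%')
       );
```

Any row returned is a candidate for an ix_{table}_{column} index before the parent table sees heavy DELETE or UPDATE activity.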

Standard 5: Tablespace Standards (z/OS — Mandatory)

| Requirement | Detail |
|---|---|
| All new tablespaces must be Universal Table Space (UTS) | No legacy segmented or classic partitioned types |
| Tables over 10 GB must use partition-by-range UTS | Enables partition-level operations and parallel utilities |
| COMPRESS YES for all tablespaces over 1 GB | Storage savings typically 40-70% |
| LOCKSIZE ROW for OLTP tablespaces | Maximizes concurrency |
| Buffer pool assignment must be reviewed by the DBA team | Prevents resource contention from poor buffer pool choices |
| DSSIZE based on projected 5-year growth plus 50% headroom | Avoids DSSIZE exhaustion requiring tablespace rebuild |
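A tablespace definition conforming to Standard 5 might look like this sketch (the database name RBKDB01, buffer pool BP2, and the sizing numbers are illustrative, not HFG's actual values):

```sql
-- DB2 for z/OS: partition-by-range UTS with the mandated options
CREATE TABLESPACE RBKCUS01
    IN RBKDB01
    NUMPARTS 16        -- with SEGSIZE, yields a partition-by-range UTS
    SEGSIZE 64
    DSSIZE 64 G        -- sized for 5-year growth plus 50% headroom
    COMPRESS YES       -- mandatory above 1 GB
    LOCKSIZE ROW       -- OLTP concurrency
    BUFFERPOOL BP2;    -- assignment reviewed by the DBA team
```

The table created in this tablespace would then carry the matching PARTITION BY RANGE clause on its partitioning key.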

Standard 6: Change Management Process

The governance process for DDL changes has four stages:

Stage 1: Design Review

The developer or architect submits a DDL change request through a web portal. The request includes the DDL script, a business justification, estimated row volumes, and an impact assessment. The DDL linting tool runs automatically and reports any standards violations.

Stage 2: DBA Review

A DBA reviews the DDL for correctness, performance implications, and standards compliance. For complex changes (partition additions, tablespace reorganization, large ALTER operations), two DBAs must approve. Average turnaround: 2 business days.

Stage 3: Test Execution

The DDL is executed in the QA environment. For ALTER operations on large tables, the DBA measures execution time and resource consumption. Any REORG-pending or CHECK-pending states are resolved and documented.

Stage 4: Production Execution

The DDL is executed in production by a DBA (never by a developer or contractor). All production DDL execution is logged with the DBA's ID, timestamp, and the approved change request number. DROP statements require a second DBA to verify the statement before execution.


The DROP TABLE Safeguard

After the Incident 3 post-mortem, Krishnamurthy implemented a "DROP guard" — a technical control that prevents accidental drops:

  1. No developer or contractor has DROP authority on production schemas. Only the 6 senior DBAs have this privilege.

  2. A pre-DROP verification script runs automatically before any DROP TABLE:

     ```sql
     -- Automated pre-DROP check: report dependent views, referencing
     -- foreign keys, and the row count of the table about to be dropped
     SELECT 'DEPENDENT_VIEWS' AS dep_type, COUNT(*) AS dep_count
       FROM SYSIBM.SYSVIEWDEP
      WHERE DCREATOR = 'MERIDIAN' AND DNAME = 'CUSTOMER'
     UNION ALL
     SELECT 'DEPENDENT_FKS', COUNT(*)
       FROM SYSIBM.SYSRELS
      WHERE REFTBCREATOR = 'MERIDIAN' AND REFTBNAME = 'CUSTOMER'
     UNION ALL
     SELECT 'ROW_COUNT', COUNT(*)
       FROM MERIDIAN.CUSTOMER;
     ```

  3. A 24-hour "cooling off" period between DROP approval and execution. This has already caught two mistakes where a DBA realized overnight that the wrong table was targeted.

  4. Backup verification: Before any DROP, the DBA must confirm that a current backup exists and has been tested with a trial restore.


Results After 12 Months

After one year of the governance program:

| Metric | Before | After | Change |
|---|---|---|---|
| DDL-related production incidents | 14 | 2 | -86% |
| Average DDL review turnaround | 5 days | 2 days | -60% |
| DDL standards compliance rate | ~40% (estimated) | 97% (measured) | +57 pts |
| Unnamed constraints in production | 12,400+ | 800 (legacy only) | -94% |
| Tables without primary keys | 340 | 12 (legacy only) | -96% |
| Average incident resolution time | 3.2 hours | 45 minutes | -77% |

The two remaining incidents were both minor: one involved a non-standard index name that slipped through the linting tool, and the other was a REORG-pending state that went unmonitored for 48 hours before being detected.


Lessons Learned

Krishnamurthy presented these takeaways to HFG's technology leadership:

  1. DDL quality is a leading indicator of operational health. Organizations with sloppy DDL practices have more outages, longer recovery times, and higher operational costs. Investing in DDL governance pays for itself many times over.

  2. Automation beats education. Training developers to follow naming conventions is valuable, but a linting tool that rejects non-conforming DDL before it reaches the DBA queue is more reliable. Humans forget conventions; tools do not.

  3. The 24-hour cooling-off period for DROP is the single most effective safeguard. Two of three averted mistakes were caught by the DBA themselves during the cooling period, not by a second reviewer. Time to reconsider is more powerful than an additional pair of eyes.

  4. Cross-platform standards matter. With DB2 on both z/OS and LUW, consistent naming conventions and data type standards make it possible for DBAs to work across platforms without constantly translating mental models. A foreign key named fk_account_customer means the same thing on both platforms.

  5. Legacy remediation is a marathon, not a sprint. Renaming 12,400 unnamed constraints in production is not a weekend project. HFG allocated dedicated capacity to rename constraints in batches over 18 months, prioritizing high-traffic tables first. The linting tool ensures no new unnamed constraints are created while the backlog is cleared.
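DB2 has no RENAME for constraints, so each batch remediation means dropping the system-named constraint and recreating it under its standard name. A sketch of one such rename (the TXN and ACCOUNT tables and columns are illustrative):

```sql
-- Replace a system-generated FK name with a standards-conforming one.
-- Both statements run in one unit of work so the relationship is
-- never left undeclared.
ALTER TABLE TXN
    DROP CONSTRAINT SQL230415082145290;

ALTER TABLE TXN
    ADD CONSTRAINT fk_txn_account
        FOREIGN KEY (account_id) REFERENCES ACCOUNT (account_id)
        ON DELETE RESTRICT;
```

Re-adding a foreign key can trigger CHECK-pending on large tables, which is one reason HFG scheduled these renames in prioritized batches rather than a single sweep.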


Discussion Questions

  1. HFG's standard requires NOT NULL by default, with NULLability requiring justification. What business scenarios would justify a nullable column? Give three specific examples from a banking schema.

  2. The DDL linting tool catches naming violations automatically. What other DDL anti-patterns could be detected automatically? Design five additional lint rules.

  3. Krishnamurthy's team chose a 24-hour cooling-off period for DROP statements. Is this too conservative, not conservative enough, or about right? What factors should determine the appropriate delay?

  4. HFG limits OLTP tables to 12 indexes without DBA approval. How would you determine the right limit for your organization? What metrics would inform this decision?

  5. The governance process adds 2 business days to DDL changes. A development team argues this slows their sprint velocity. How would you balance governance rigor with development speed? Are there DDL changes that could be fast-tracked?


Return to Chapter 11 | Continue to Exercises