Case Study 2: DDL Standards and Governance at a Fortune 500 — Harrington Financial Group
Background
Harrington Financial Group (HFG) is a Fortune 200 diversified financial services company headquartered in Boston, Massachusetts. With $420 billion in assets under management, HFG operates across retail banking, investment management, insurance, and commercial lending.
HFG runs DB2 on both platforms:
| Platform | Version | Purpose | Tables | Databases |
|---|---|---|---|---|
| DB2 for z/OS | DB2 13 | Core banking, insurance claims, regulatory reporting | 42,000+ | 180 |
| DB2 for LUW | 11.5 | Digital channels, analytics, customer data platform | 18,000+ | 45 |
The DB2 team consists of 28 DBAs: 16 specializing in z/OS, 8 in LUW, and 4 who work across both platforms. They manage over 60,000 tables, 200,000+ indexes, and handle an average of 120 DDL change requests per week.
The Problem
In the past 18 months, HFG has experienced a series of DDL-related incidents that have cost the company significant money and reputational damage:
Incident 1: The VARCHAR(4000) Query Performance Crisis
A development team created a customer-facing lookup table with 14 VARCHAR(4000) columns. The actual data rarely exceeded 50 characters per column. When the table reached 30 million rows, a critical nightly report that joined this table with three others began timing out. The optimizer grossly overestimated the table's row width, choosing nested-loop joins where hash joins would have been orders of magnitude faster. Resolution required ALTER TABLE to reduce all column lengths, followed by REORG — total downtime: 6 hours.
Incident 2: The Missing Foreign Key Cascade
A developer specified ON DELETE CASCADE on a foreign key between a policy table and a policy_notes table. When a data cleanup script deleted 12,000 expired policies, it silently cascaded through to delete 2.4 million policy notes — including notes for active policies that had been incorrectly linked. Recovery from backup took 11 hours.
Incident 3: The DROP TABLE in Production
A contractor running a deployment script intended for the QA environment accidentally executed it against production. The script included DROP TABLE customer_preference. The table had 8 million rows and was referenced by the mobile banking application. The outage lasted 4 hours and generated 3,200 customer complaints.
Incident 4: The Unnamed Constraint Mystery
A batch process failed at 1:47 AM with SQLSTATE 23503, referencing constraint SQL230415082145290. The on-call DBA spent 35 minutes querying the catalog to determine which foreign key relationship was violated. The incident would have been resolved in 5 minutes if the constraint had been named fk_txn_account.
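The 35-minute catalog hunt can be reduced to a single query. A sketch against the DB2 for LUW catalog (consistent with the SQL-prefixed generated name in the incident); the constraint name is the one from the incident report:

```sql
-- Resolve a system-generated foreign key name to its child table,
-- parent table, and delete rule via the LUW catalog.
SELECT tabschema, tabname,          -- child (dependent) table
       reftabschema, reftabname,    -- parent table
       deleterule, fk_colnames
  FROM syscat.references
 WHERE constname = 'SQL230415082145290';
```

With a named constraint such as fk_txn_account, this lookup is unnecessary: the violated relationship is readable directly from the error message.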
The Governance Initiative
HFG's Chief Data Officer, Dr. Eleanor Voss, commissioned a DDL governance initiative led by Principal DBA Rajesh Krishnamurthy. After six weeks of analysis, Krishnamurthy's team produced a comprehensive DDL standards document and a governance process. Here are the key elements.
Standard 1: Naming Conventions (Mandatory)
All DDL objects must follow strict naming conventions:
| Object | Pattern | Example | Enforcement |
|---|---|---|---|
| Schema | {app} abbreviation | RBK (retail banking) | Review |
| Table | Singular noun, uppercase | CUSTOMER, ACCOUNT | Automated |
| Column | lowercase_snake_case | first_name, account_id | Automated |
| Primary key | pk_{table} | pk_customer | Automated |
| Foreign key | fk_{child}_{parent} | fk_account_customer | Automated |
| Unique | uq_{table}_{column} | uq_customer_taxid | Automated |
| Check | chk_{table}_{description} | chk_cust_status | Automated |
| Index | ix_{table}_{column(s)} | ix_customer_name | Automated |
| Sequence | seq_{table_or_purpose} | seq_transaction_id | Automated |
| Tablespace (z/OS) | {app}{type}{seq} | RBKCUS01 (retail banking, customer, space 01) | Automated |
| Tablespace (LUW) | ts_{app}_{purpose} | ts_rbk_customer_data | Automated |
Enforcement: Krishnamurthy's team built a DDL linting tool that parses CREATE and ALTER statements and rejects non-conforming names before they reach the DBA review queue. The tool integrates with the CI/CD pipeline and runs automatically on every DDL change request.
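The same conventions can also be audited retroactively against objects already in the catalog. A minimal sketch for DB2 for LUW (the schema filter and prefix patterns are illustrative; unquoted identifiers are stored uppercase, so the standard's lowercase prefixes match as FK_, PK_, and so on):

```sql
-- Find constraints whose names violate the naming standard.
-- TYPE: F = foreign key, P = primary key, U = unique, K = check.
SELECT tabschema, tabname, constname, type
  FROM syscat.tabconst
 WHERE tabschema NOT LIKE 'SYS%'
   AND ((type = 'F' AND constname NOT LIKE 'FK_%')
     OR (type = 'P' AND constname NOT LIKE 'PK_%')
     OR (type = 'U' AND constname NOT LIKE 'UQ_%')
     OR (type = 'K' AND constname NOT LIKE 'CHK_%'));
```

A query like this is how a remediation backlog (such as HFG's 12,400 unnamed constraints) can be enumerated and prioritized.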
Standard 2: Data Type Rules (Mandatory)
| Rule | Rationale |
|---|---|
| All monetary columns: DECIMAL(15,2) minimum | Prevents floating-point rounding in financial calculations |
| All primary keys: INTEGER or BIGINT (no SMALLINT) | Avoids key-range exhaustion; SMALLINT's space savings are negligible |
| All status/flag columns: CHAR(1) with CHECK constraint | Consistent, compact, constrained |
| All timestamps: TIMESTAMP NOT NULL DEFAULT CURRENT TIMESTAMP | Consistent audit trail across all tables |
| VARCHAR lengths must be no more than 2x the longest observed value | Prevents optimizer estimation errors |
| No FLOAT or DOUBLE for business data | Reserved for scientific/statistical columns only, requiring justification |
| No VARCHAR for dates, times, or codes with fixed lengths | DATE for dates, CHAR(n) for fixed-length codes |
Standard 3: Constraint Requirements (Mandatory)
| Requirement | Detail |
|---|---|
| Every table must have a PRIMARY KEY | No exceptions |
| Every constraint must have an explicit name | System-generated names are rejected by the linting tool |
| Every foreign key must specify ON DELETE explicitly | No implicit default behavior — the designer must make a conscious choice |
| ON DELETE CASCADE requires written justification and DBA approval | The justification must explain the cascade chain and confirm that cascaded data is genuinely dependent |
| CHECK constraints on all status and code columns | The valid values must be documented in the constraint |
| NOT NULL on all columns except those with a documented business reason to allow NULLs | The default position is NOT NULL; NULLability requires justification |
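Taken together, Standards 1-3 determine what a conforming table looks like. A sketch built from the document's own naming examples (the RBK schema, column names, and status codes are illustrative, and it assumes a parent RBK.CUSTOMER table already exists):

```sql
-- Conforming table: named constraints, explicit ON DELETE,
-- NOT NULL by default, CHAR(1) status with CHECK, DECIMAL money.
CREATE TABLE rbk.account (
  account_id     BIGINT        NOT NULL,
  customer_id    BIGINT        NOT NULL,
  account_status CHAR(1)       NOT NULL,
  balance_amt    DECIMAL(15,2) NOT NULL,
  created_ts     TIMESTAMP     NOT NULL DEFAULT CURRENT TIMESTAMP,
  closed_date    DATE,  -- nullable: documented reason (open accounts have no close date)
  CONSTRAINT pk_account PRIMARY KEY (account_id),
  CONSTRAINT fk_account_customer FOREIGN KEY (customer_id)
    REFERENCES rbk.customer (customer_id) ON DELETE RESTRICT,
  CONSTRAINT chk_account_status
    CHECK (account_status IN ('A', 'C', 'F'))  -- valid values documented in the constraint
);
```

Note that the designer chose ON DELETE RESTRICT explicitly; under Standard 3 a CASCADE here would require written justification and DBA approval.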
Standard 4: Index Requirements (Mandatory)
| Requirement | Detail |
|---|---|
| Every foreign key column must be indexed | Prevents full table scans during parent-row operations |
| Every table must have a clustering index (z/OS) | Ensures physical ordering for range scans |
| Index naming must follow the ix_{table}_{column} convention | Consistency for catalog queries and operational scripts |
| No more than 12 indexes per OLTP table without DBA approval | Prevents excessive INSERT/UPDATE overhead |
| Index changes on tables over 100 million rows require a performance impact assessment | Large index builds can consume significant resources |
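The first index rule can be enforced by an audit query rather than memory. A sketch for DB2 for LUW that flags foreign keys whose leading column has no index with the same leading column (it deliberately checks only the first key column, so composite edge cases may still need manual review):

```sql
-- Foreign keys whose leading column is not the leading column
-- of any index on the child table.
SELECT k.tabschema, k.tabname, k.constname, k.colname
  FROM syscat.keycoluse k
  JOIN syscat.tabconst c
    ON c.constname = k.constname
   AND c.tabschema = k.tabschema
   AND c.tabname   = k.tabname
 WHERE c.type   = 'F'     -- foreign key constraints only
   AND k.colseq = 1       -- leading column of the key
   AND NOT EXISTS (
         SELECT 1
           FROM syscat.indexes i
           JOIN syscat.indexcoluse ic
             ON ic.indschema = i.indschema
            AND ic.indname   = i.indname
          WHERE i.tabschema = k.tabschema
            AND i.tabname   = k.tabname
            AND ic.colseq   = 1
            AND ic.colname  = k.colname);
```

Any row returned is a candidate for a full table scan during parent-row deletes and updates — exactly the exposure the rule exists to prevent.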
Standard 5: Tablespace Standards (z/OS — Mandatory)
| Requirement | Detail |
|---|---|
| All new tablespaces must be Universal Table Space (UTS) | No legacy segmented or classic partitioned types |
| Tables over 10 GB must use partition-by-range UTS | Enables partition-level operations and parallel utilities |
| COMPRESS YES for all tablespaces over 1 GB | Storage savings typically 40-70% |
| LOCKSIZE ROW for OLTP tablespaces | Maximizes concurrency |
| Buffer pool assignment must be reviewed by the DBA team | Prevents resource contention from poor buffer pool choices |
| DSSIZE based on projected 5-year growth plus 50% headroom | Avoids DSSIZE exhaustion requiring tablespace rebuild |
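A tablespace definition satisfying these rules might look like the following sketch (the database, tablespace, and buffer pool names are hypothetical, and exact clauses vary with the subsystem's DB2 13 configuration):

```sql
-- Range-partitioned UTS per the standard: RBKTXN01 follows the
-- {app}{type}{seq} convention (retail banking, transaction, 01).
CREATE TABLESPACE RBKTXN01
  IN RBKDB01
  NUMPARTS 12          -- partition-by-range UTS; the table supplies PARTITION BY RANGE
  SEGSIZE 32
  DSSIZE 16 G          -- sized for 5-year growth plus 50% headroom
  COMPRESS YES         -- mandatory above 1 GB
  LOCKSIZE ROW         -- OLTP concurrency
  BUFFERPOOL BP2;      -- assignment reviewed by the DBA team
```

Specifying both NUMPARTS and SEGSIZE yields a universal table space, keeping the definition clear of the prohibited legacy segmented and classic partitioned types.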
Standard 6: Change Management Process
The governance process for DDL changes has four stages:
Stage 1: Design Review
The developer or architect submits a DDL change request through a web portal. The request includes the DDL script, a business justification, estimated row volumes, and an impact assessment. The DDL linting tool runs automatically and reports any standards violations.
Stage 2: DBA Review
A DBA reviews the DDL for correctness, performance implications, and standards compliance. For complex changes (partition additions, tablespace reorganization, large ALTER operations), two DBAs must approve. Average turnaround: 2 business days.
Stage 3: Test Execution
The DDL is executed in the QA environment. For ALTER operations on large tables, the DBA measures execution time and resource consumption. Any REORG-pending or CHECK-pending states are resolved and documented.
Stage 4: Production Execution
The DDL is executed in production by a DBA (never by a developer or contractor). All production DDL execution is logged with the DBA's ID, timestamp, and the approved change request number. DROP statements require a second DBA to verify the statement before execution.
The DROP TABLE Safeguard
After the Incident 3 post-mortem, Krishnamurthy implemented a "DROP guard" — a technical control that prevents accidental drops:
- No developer or contractor has DROP authority on production schemas. Only the 6 senior DBAs have this privilege.
- A pre-DROP verification script runs automatically before any DROP TABLE:

```sql
-- Automated pre-DROP check (example target: RBK.CUSTOMER)
SELECT 'DEPENDENT_VIEWS' AS dep_type, COUNT(*) AS dep_count
  FROM SYSIBM.SYSVIEWDEP
 WHERE BCREATOR = 'RBK' AND BNAME = 'CUSTOMER'
UNION ALL
SELECT 'DEPENDENT_FKS', COUNT(*)
  FROM SYSIBM.SYSRELS
 WHERE REFTBCREATOR = 'RBK' AND REFTBNAME = 'CUSTOMER'
UNION ALL
SELECT 'ROW_COUNT', COUNT(*)
  FROM RBK.CUSTOMER;
```

- A 24-hour "cooling off" period between DROP approval and execution. This has already caught two mistakes where a DBA realized overnight that the wrong table was targeted.
- Backup verification: Before any DROP, the DBA must confirm that a current backup exists and has been tested with a trial restore.
Results After 12 Months
After one year of the governance program:
| Metric | Before | After | Change |
|---|---|---|---|
| DDL-related production incidents | 14 | 2 | -86% |
| Average DDL review turnaround | 5 days | 2 days | -60% |
| DDL standards compliance rate | ~40% (estimated) | 97% (measured) | +57 pts |
| Unnamed constraints in production | 12,400+ | 800 (legacy only) | -94% |
| Tables without primary keys | 340 | 12 (legacy only) | -96% |
| Average incident resolution time | 3.2 hours | 45 minutes | -77% |
The two remaining incidents were both minor: one involved a non-standard index name that slipped through the linting tool, and the other was a REORG-pending state that went unmonitored for 48 hours before being detected.
Lessons Learned
Krishnamurthy presented these takeaways to HFG's technology leadership:
- DDL quality is a leading indicator of operational health. Organizations with sloppy DDL practices have more outages, longer recovery times, and higher operational costs. Investing in DDL governance pays for itself many times over.
- Automation beats education. Training developers to follow naming conventions is valuable, but a linting tool that rejects non-conforming DDL before it reaches the DBA queue is more reliable. Humans forget conventions; tools do not.
- The 24-hour cooling-off period for DROP is the single most effective safeguard. Two of three averted mistakes were caught by the DBA themselves during the cooling period, not by a second reviewer. Time to reconsider is more powerful than an additional pair of eyes.
- Cross-platform standards matter. With DB2 on both z/OS and LUW, consistent naming conventions and data type standards make it possible for DBAs to work across platforms without constantly translating mental models. A foreign key named fk_account_customer means the same thing on both platforms.
- Legacy remediation is a marathon, not a sprint. Renaming 12,400 unnamed constraints in production is not a weekend project. HFG allocated dedicated capacity to rename constraints in batches over 18 months, prioritizing high-traffic tables first. The linting tool ensures no new unnamed constraints are created while the backlog is cleared.
Discussion Questions
- HFG's standard requires NOT NULL by default, with NULLability requiring justification. What business scenarios would justify a nullable column? Give three specific examples from a banking schema.
- The DDL linting tool catches naming violations automatically. What other DDL anti-patterns could be detected automatically? Design five additional lint rules.
- Krishnamurthy's team chose a 24-hour cooling-off period for DROP statements. Is this too conservative, not conservative enough, or about right? What factors should determine the appropriate delay?
- HFG limits OLTP tables to 12 indexes without DBA approval. How would you determine the right limit for your organization? What metrics would inform this decision?
- The governance process adds 2 business days to DDL changes. A development team argues this slows their sprint velocity. How would you balance governance rigor with development speed? Are there DDL changes that could be fast-tracked?
Return to Chapter 11 | Continue to Exercises