Part III: Database Design

There is a particular kind of pain that every experienced DBA knows. It is the pain of inheriting a database that was designed poorly — or, more commonly, was not really designed at all. Someone created tables to match an application's object model. Someone else added columns when new requirements appeared. A third person created a denormalized reporting table because the joins were "too slow." A fourth person added a catch-all VARCHAR(4000) column called MISC_DATA because the change request process for adding proper columns took too long.

Five years later, you are staring at a schema that no one fully understands, where half the constraints are enforced in application code (sometimes inconsistently), where data quality problems cascade through downstream systems, and where every performance fix creates two new performance problems. I have seen this story play out at banks, insurers, manufacturers, and government agencies. The details vary but the root cause is always the same: the database was not designed, it accumulated.

This part of the book is about doing it right.

What This Part Covers

Part III spans four chapters that take you from the abstract principles of data modeling through the concrete physical decisions that determine how DB2 stores and retrieves your data.

Chapter 13 covers logical database design — entity-relationship modeling, normalization theory, and the disciplined process of translating business requirements into a relational schema. We work through the normal forms (1NF through 5NF, including BCNF) with enough rigor that you understand not just the rules but the reasoning behind them. You will learn when to normalize, when to deliberately denormalize, and how to document those decisions so that the next person who touches the schema understands the trade-offs that were made.
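To preview the flavor of the decompositions Chapter 13 formalizes, here is a minimal sketch of a third-normal-form violation and its repair. The table and column names are purely illustrative, not part of the Meridian schema:

```sql
-- Denormalized: customer attributes repeat on every order row, so a
-- change of city must touch many rows (the classic update anomaly),
-- because CUSTOMER_NAME and CUSTOMER_CITY depend on CUSTOMER_ID,
-- not on the key ORDER_ID.
CREATE TABLE CUST_ORDER_FLAT (
    ORDER_ID      INTEGER     NOT NULL PRIMARY KEY,
    CUSTOMER_ID   INTEGER     NOT NULL,
    CUSTOMER_NAME VARCHAR(60) NOT NULL,   -- transitive dependency
    CUSTOMER_CITY VARCHAR(40) NOT NULL,   -- transitive dependency
    ORDER_DATE    DATE        NOT NULL
);

-- 3NF decomposition: every non-key attribute depends on the key,
-- the whole key, and nothing but the key.
CREATE TABLE CUSTOMER (
    CUSTOMER_ID   INTEGER     NOT NULL PRIMARY KEY,
    CUSTOMER_NAME VARCHAR(60) NOT NULL,
    CUSTOMER_CITY VARCHAR(40) NOT NULL
);

CREATE TABLE CUST_ORDER (
    ORDER_ID    INTEGER NOT NULL PRIMARY KEY,
    CUSTOMER_ID INTEGER NOT NULL REFERENCES CUSTOMER,
    ORDER_DATE  DATE    NOT NULL
);
```

The decomposition stores each fact exactly once; Chapter 13 treats the cases where this kind of split is deliberately reversed, and why.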

Chapter 14 addresses physical database design — the decisions that map a logical model onto actual DB2 storage structures. Tablespace selection and configuration on both z/OS and LUW. Page size choices and their cascading effects on row length limits, index structure, and buffer pool efficiency. Compression options — row compression and value compression — and the circumstances where each provides benefit. Partitioning strategies, both range partitioning on z/OS and distribution keys on LUW, and the performance implications of each approach. This chapter is where the relational model meets the realities of disk I/O, memory management, and DB2's internal storage engine.
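As a small taste of what those physical decisions look like in DDL, the following sketch uses DB2 for LUW range-partitioning syntax. The table name, columns, and date ranges are hypothetical stand-ins, not the design Chapter 14 develops:

```sql
-- Range partitioning by posting date: each quarterly partition maps
-- to its own storage object, so aged partitions can be detached and
-- archived without disturbing current data.
CREATE TABLE TXN_HISTORY (
    TXN_ID     BIGINT        NOT NULL,
    ACCOUNT_ID INTEGER       NOT NULL,
    POST_DATE  DATE          NOT NULL,
    AMOUNT     DECIMAL(15,2) NOT NULL
)
PARTITION BY RANGE (POST_DATE) (
    STARTING FROM ('2023-01-01') ENDING AT ('2024-12-31')
    EVERY (3 MONTHS)   -- one partition per quarter
);
```

The equivalent z/OS design would use a partitioned tablespace with limit keys rather than this clause; Chapter 14 contrasts the two.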

Chapter 15 is devoted entirely to index design. If this surprises you — a whole chapter just on indexes — then you have not yet experienced the difference a well-designed indexing strategy makes. We cover B-tree indexes in detail: how they are structured, how DB2 traverses them, how they are maintained during inserts, updates, and deletes, and how they degrade over time. We cover index-only access, index ANDing and ORing, clustering indexes on z/OS, and the critical relationship between index design and the optimizer's access path selection. We also address the cost side of the equation — the write overhead, the storage consumption, and the REORG frequency that indexes demand. By the end of this chapter, you will have a systematic methodology for designing indexes rather than the ad hoc approach that most people use.
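To illustrate the kind of reasoning Chapter 15 systematizes, consider a hypothetical lookup pattern and an index shaped to serve it entirely from the index. The names are illustrative, and the INCLUDE clause shown here is a DB2 feature available on unique indexes:

```sql
-- Hypothetical query pattern:
--   SELECT POST_DATE, AMOUNT
--   FROM   TXN_HISTORY
--   WHERE  ACCOUNT_ID = ?
--   ORDER BY POST_DATE DESC;
--
-- The leading column matches the equality predicate, POST_DATE DESC
-- supports the sort without a sort step, and AMOUNT is carried as a
-- non-key column so DB2 can satisfy the query without touching the
-- data pages at all (index-only access).
CREATE UNIQUE INDEX IX_TXN_ACCT_DATE
    ON TXN_HISTORY (ACCOUNT_ID, POST_DATE DESC, TXN_ID)
    INCLUDE (AMOUNT);
```

Every column added to an index buys read performance at the price of insert overhead and storage; Chapter 15 is about making that trade deliberately.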

Chapter 16 deals with schema evolution — how to modify a database design after it is in production. This is the chapter that most database design books skip, and it is the chapter that practicing DBAs need most. Adding columns, changing data types, splitting tables, merging tables, introducing partitioning to an existing tablespace, migrating data between schemas — all of these operations have to happen without downtime in most modern environments, and DB2 provides mechanisms for each of them. We cover online schema changes on both platforms, the use of views to provide backward compatibility during transitions, and the organizational processes that keep schema evolution from descending into chaos.
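One of the transition patterns Chapter 16 develops, a view that preserves the old table shape while the base table evolves, can be sketched in a few statements. The object names and the added column are hypothetical:

```sql
-- Step 1: move the real data to a new name and evolve it there.
RENAME TABLE CUSTOMER TO CUSTOMER_BASE;
ALTER TABLE CUSTOMER_BASE
    ADD COLUMN RISK_RATING CHAR(1) NOT NULL DEFAULT 'U';

-- Step 2: a view with the old name and the old column list keeps
-- existing applications running unchanged, while new code reads the
-- base table (and its new column) directly.
CREATE VIEW CUSTOMER AS
    SELECT CUSTOMER_ID, CUSTOMER_NAME, CUSTOMER_CITY
    FROM CUSTOMER_BASE;
```

Once every dependent application has migrated, the view can be dropped or redefined; the organizational discipline around that sequencing is as much the subject of Chapter 16 as the DDL itself.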

Why It Matters

Design is leverage. A well-designed database makes everything else easier — queries run faster because the optimizer has good structures to work with, administration is simpler because storage is organized logically, recovery is more predictable because related data lives together, and application development is more productive because the schema expresses the business domain clearly.

A poorly designed database fights you at every turn. You write complex SQL to compensate for missing relationships. You build materialized views and summary tables to work around structural inefficiencies. You spend hours tuning individual queries when the real problem is that the data should have been organized differently from the start. You add indexes to fix one query and degrade ten others.

The most expensive database design decision is the one you have to reverse after the system is in production with two years of accumulated data and fifty applications depending on the current schema. Getting the design right — or at least acceptably right — before those dependencies form is worth every hour you invest in the process.

I want to be clear about something: good design does not mean perfect design. Perfect is the enemy of shipped. What good design means is intentional design — design where every table, every relationship, every constraint, and every denormalization exists for a documented reason, and where the consequences of each decision are understood. You can make pragmatic compromises with your eyes open. What you cannot afford is to make them accidentally.

The Meridian Bank Data Model

This is where the Meridian National Bank project gets its definitive data architecture. In Part I, we identified the business entities and relationships. In Part II, we queried a working dataset. Now we design the schema from first principles.

We will model Meridian's core banking entities — customers (individuals and organizations), accounts (checking, savings, money market, certificates of deposit, loans), transactions, branches, and employees. We will model the regulatory reporting structures that banking regulations demand. We will handle temporal data — effective dates, audit trails, and historical versioning — using DB2's temporal table capabilities where appropriate and manual designs where they give us more control.

The physical design chapters will show both the z/OS implementation (using partitioned tablespaces, STOGROUP-managed storage, and clustering indexes) and the LUW implementation (using automatic storage tablespaces, range partitioning, and multi-dimensional clustering). You will see how the same logical model maps to different physical implementations depending on the platform, and why those differences exist.

The index design chapter will work through Meridian's query workload systematically, identifying the access patterns that drive index selection, resolving conflicts between competing queries, and arriving at an indexing strategy that balances read performance against write overhead.

The Relationship Between Design and Performance

I want to flag something that will become fully apparent in Part V but deserves mention here: database design and database performance are not separate topics. They are two views of the same underlying reality. The physical structures you choose in this part — the tablespace configurations, the partitioning schemes, the index selections — directly determine the access paths the optimizer considers in Part V. If you skip Part III or treat it as purely theoretical, you will arrive at the performance chapters without the structural knowledge you need to understand why the optimizer makes the choices it does.

Conversely, the performance knowledge in Part V will sometimes reveal that a design decision needs revision. That feedback loop is normal and healthy. Chapter 16 exists precisely because schema evolution is an expected part of a database's lifecycle, not a failure of initial design.

How to Approach This Part

If you are primarily a developer, you may be tempted to skim the physical design chapters. Resist that temptation. Understanding how DB2 physically stores data transforms your ability to write efficient SQL and to have productive conversations with the DBAs who manage the systems your applications depend on.

If you are primarily a DBA, you may be tempted to skip the logical design chapter because you think data modeling is the analysts' job. It is not — or at least, it should not be exclusively their job. A DBA who cannot evaluate a logical model cannot catch design problems before they become physical reality, and catching them later is an order of magnitude more expensive.

Work through the Meridian Bank design exercises in their entirety. Design is a skill that improves with practice, and the exercises are structured to present the kinds of ambiguities and trade-offs that real projects produce. There are places where reasonable people will disagree about the right answer, and those disagreements are the point. The goal is not to memorize a single correct design but to develop the judgment to evaluate competing alternatives.

Chapters in This Part