Chapter 21: Data Modeling Patterns — Solving Common Design Problems

DataField.Dev

29 min read

> Where you are: Part III, Chapter 21 of 40. You can model, map, normalize, and denormalize. This chapter gives you a vocabulary of reusable solutions to problems that recur in every domain — so you recognize them instantly and don't reinvent (often...

In This Chapter

Patterns: the designer's vocabulary
Audit trails: who changed what, when
Soft deletes: marking instead of removing
Versioning: keeping the history of changes
Multi-tenancy: one database, many customers
Hierarchies: four ways to model a tree
Tagging: the M:N junction (revisited)
Polymorphic associations: relating to "one of several" types
Audit trails in depth
Versioning and temporal data
A few more worth knowing
Soft deletes in depth
Hierarchies: the four models, compared
Polymorphic associations, worked through
Common mistakes
Multi-tenancy in depth
More patterns worth knowing
Patterns as a shared vocabulary
Combining patterns in practice
Progressive project: apply the patterns
Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 21: Data Modeling Patterns — Solving Common Design Problems

Where you are: Part III, Chapter 21 of 40. You can model, map, normalize, and denormalize. This chapter gives you a vocabulary of reusable solutions to problems that recur in every domain — so you recognize them instantly and don't reinvent (often badly) each time.

Learning paths: 💻 🏗️ especially; 🔬 CS students (the hierarchy models are classic); 📊 analysts (audit/versioning/soft-delete shape the data you query).

Patterns: the designer's vocabulary

Just as software developers have design patterns, database designers have data modeling patterns — proven structures for recurring problems. Audit trails, soft deletes, hierarchies, tagging: these show up in almost every application. Knowing the standard solutions (and their trade-offs) makes you faster and steers you away from the common wrong turns (the comma-separated tags, the array-as-junction). This chapter is a tour of the most useful ones.

Audit trails: who changed what, when

Many systems must record a history of changes — for compliance, debugging, or accountability. The pattern: a separate history/audit table that captures each change, populated automatically by a trigger (Chapter 15) so no write path can bypass it.

CREATE TABLE product_audit (
    audit_id   bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    product_id integer NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now(),
    changed_by text,                          -- the DB user or app user
    action     text NOT NULL,                 -- 'INSERT' / 'UPDATE' / 'DELETE'
    old_row    jsonb,                          -- the row before (NULL on insert)
    new_row    jsonb                           -- the row after  (NULL on delete)
);
-- + a trigger on products that writes old_row/new_row as jsonb on every change

Storing the before/after as jsonb (Chapter 16) is a neat trick — one audit table works for any table, capturing the whole row without a column-per-column schema. The trigger guarantees completeness (it fires for app writes, admin tools, and cascades alike — the Case Study 2 lesson from Chapter 20).

Soft deletes: marking instead of removing

Sometimes you must not really delete data — for recoverability, audit, or referential safety. The soft delete pattern marks a row as deleted instead of removing it:

ALTER TABLE products ADD COLUMN deleted_at timestamptz;   -- NULL = active
-- "delete":
UPDATE products SET deleted_at = now() WHERE product_id = 15;
-- queries must exclude deleted rows:
SELECT * FROM products WHERE deleted_at IS NULL;

Pros: recoverable ("undelete" = set deleted_at = NULL); preserves history and referential integrity (old orders still reference the product); enables "deleted" reporting. Cons: every query must remember WHERE deleted_at IS NULL (easy to forget → showing deleted data); unique constraints get tricky (a "deleted" email shouldn't block re-registration — use a partial unique index WHERE deleted_at IS NULL); the table grows forever (may need archival).

Common mistake. Adopting soft deletes and then forgetting the WHERE deleted_at IS NULL filter somewhere, so deleted records reappear. Mitigate with a view (Chapter 15) that pre-filters (CREATE VIEW active_products AS SELECT * FROM products WHERE deleted_at IS NULL) and query the view everywhere.

Versioning: keeping the history of changes

Beyond "who changed it" (audit), sometimes you need every version of a row queryable — wiki page revisions, contract versions, price history. A common pattern: a *_versions table holding each version with a valid_from/valid_to (or version_number), so you can reconstruct the state at any point in time (a "temporal" table):

CREATE TABLE document_versions (
    document_id integer NOT NULL,
    version     integer NOT NULL,
    title       text, body text,
    valid_from  timestamptz NOT NULL,
    valid_to    timestamptz,                  -- NULL = current version
    PRIMARY KEY (document_id, version)
);

PostgreSQL's range types and exclusion constraints (Chapter 16) can enforce non-overlapping validity periods. This is heavier than audit logging — use it when querying past states is a real requirement, not just recording that a change happened.

Multi-tenancy: one database, many customers

SaaS applications serve many customer organizations ("tenants") from shared infrastructure. Three patterns, increasing isolation:

Shared schema, tenant_id column — every table has a tenant_id; every query filters by it. Simplest and cheapest; the risk is forgetting the filter and leaking data across tenants (mitigate with row-level security, Chapter 32). Scales to many small tenants.
Schema per tenant — each tenant gets its own schema (namespace, Chapter 14) with identical tables. Stronger isolation; more objects to manage; good for a moderate number of larger tenants.
Database per tenant — full isolation, easy per-tenant backup/restore, but the most operational overhead. For few, large, or highly-regulated tenants.

The choice trades isolation/operational cost against density/simplicity. Most SaaS starts with shared-schema + tenant_id (+ row-level security) and moves heavier tenants out as needed.

Hierarchies: four ways to model a tree

Trees (categories, org charts, comment threads, folders) recur constantly, and there are four classic models — each trading read vs. write performance differently (this deepens Chapter 11's recursion):

Model	How	Read subtree	Write/move	Notes
Adjacency list	each row has `parent_id`	recursive CTE	easy (one update)	Mercado's `categories`/`employees`; simplest
Materialized path	store the path string `'1/4/9'`	`LIKE '1/4/%'`	moderate	fast subtree reads; path maintenance on move
Nested set	store `lft`/`rgt` numbers	range query	expensive (renumber many rows)	very fast reads, painful writes
Closure table	a table of all ancestor-descendant pairs	simple join	moderate (insert/delete pairs)	fast reads and flexible; extra storage

Adjacency list (a parent_id self-reference) is the default — simplest, and recursive CTEs (Chapter 11) query it well. When subtree reads dominate and the tree is large/deep, a closure table (or materialized path) trades write cost and storage for fast reads — a denormalization (Chapter 20) for hierarchies. Nested sets give the fastest reads but brutal writes (moving a node renumbers much of the tree); use only for near-static trees.

Tagging: the M:N junction (revisited)

Tagging (an item has many tags; a tag applies to many items) is the canonical many-to-many, resolved with a junction table (Chapters 17–18). It's listed here because it's so common and so commonly botched (the comma-separated-tags anti-pattern):

   items ──< item_tags >── tags     (item_tags is the junction)

The junction can carry relationship attributes (who tagged it, when), and a UNIQUE/composite PK prevents duplicate tags. Whenever you see "a list of labels/categories/skills," reach for this pattern, not an array or a CSV column.

Polymorphic associations: relating to "one of several" types

Sometimes a row must relate to one of several entity types — a comment on a post or a photo or a video; an attachment on many kinds of records. This is genuinely tricky. Three approaches:

Separate junction/FK per type — post_comments, photo_comments, etc. Pro: real foreign keys, full integrity. Con: more tables; "all comments" needs a UNION.
Type + id columns (the "polymorphic" shortcut) — commentable_type text, commentable_id integer. Pro: one table. Con: no foreign key possible (the id points to different tables depending on type), so integrity isn't enforced — a known weakness (popular in some ORMs, but it sacrifices referential integrity).
Shared parent table — a commentable supertype table that posts/photos all reference, and comments point to it. Pro: real FKs, "all comments" is easy. Con: an extra layer.

There's no perfect answer; the right choice depends on how much you value enforced integrity (favor #1 or #3) vs. simplicity (#2). The key is to recognize it's a polymorphic association and choose deliberately — not stumble into the type+id shortcut unaware it drops integrity.

Audit trails in depth

Audit trails — recording who changed what and when — are required in a remarkable range of systems (compliance, security, debugging, accountability), and the pattern rewards doing it well. The defining requirement of an audit trail is completeness: it must capture every change, through every path, or it's not trustworthy. An audit log with gaps is worse than none, because it creates false confidence. This completeness requirement is exactly why the audit pattern is built on a trigger rather than application code: a trigger fires on every write to the table — from the application, from an admin tool, from a manual psql session, from a cascade — so no path can bypass it. Application-level audit logging, by contrast, only captures the paths the application code knows about, and the one path that forgets to log is the one that matters in the investigation.

The elegant generic implementation uses a single audit table with jsonb before/after snapshots, which works for any table without a column-per-column audit schema. The trigger captures the whole OLD row and NEW row as JSONB, so one audit_log table can record changes to products, customers, orders, and everything else — each entry holding the table name, the action, the timestamp, the changing user, and the complete before/after state as JSON. This is a beautiful application of JSONB (Chapter 16): the schema-less flexibility of JSON is exactly right for "store an arbitrary row's state," and it means adding audit to a new table is just attaching the trigger, no new audit-table design required. The captured changed_by can be the database user or, better, an application-level user id passed via a session variable, so the audit answers "which person" not just "which database account."

Beyond the structure, real audit trails involve decisions about retention and access. Audit data grows continuously (every change adds a row) and is rarely read but must be kept — often for years, for compliance — so it's a candidate for partitioning by time (Chapter 25) and archiving old partitions to cheaper storage. Access to the audit log is itself sensitive (it reveals the history of all changes) and is typically restricted more tightly than the data it audits. And the audit log should be append-only — even administrators shouldn't be able to alter it, or it loses its evidentiary value; some systems enforce this with permissions that grant INSERT but not UPDATE/DELETE on the audit table. These considerations — completeness via triggers, generic capture via JSONB, time-based retention, restricted append-only access — are what turn a naive "log changes" idea into a trustworthy audit trail. The pattern is common, but doing it right requires thinking through all of them, which is why it's worth treating as a real design problem rather than an afterthought bolted on late.

Versioning and temporal data

Sometimes recording that a change happened (audit) isn't enough — you need every version of the data to remain queryable, so you can reconstruct the state of a record as it was at any past moment. Wiki page histories, contract revisions, price histories, insurance policy versions: these domains need temporal modeling, where the past isn't just logged but is first-class, queryable data. This is a heavier pattern than audit logging, and knowing when each applies is the key judgment.

The common temporal pattern stores each version of a record with a validity period — a valid_from and valid_to timestamp (or a version number) — so the table holds not just the current state but every historical state, each marked with when it was true. "What was this document's title on March 1st?" becomes a query filtering for the version whose validity period contained March 1st. PostgreSQL's range types and exclusion constraints (Chapter 16) shine here: you can store the validity period as a single range column and use an exclusion constraint to guarantee that a given record's versions never have overlapping validity periods — the database enforcing the temporal consistency that would otherwise require careful application logic. This turns "no two versions of the same record can claim to be valid at the same instant" from a hoped-for invariant into a structural guarantee.

The crucial decision is audit versus temporal, because they're often confused and they have very different costs. An audit trail answers "what changed, who changed it, when" — it's a log about changes, and it's relatively cheap (a trigger and a history table). Temporal versioning answers "what was the data at time T" — it makes past states themselves queryable as data, and it's heavier (every version is a row in the live data model, queries must account for validity periods, the current state is one version among many). Use audit logging when you need accountability and the ability to see what happened; use temporal versioning when querying past states is a genuine, ongoing requirement of the application, not just an occasional forensic need. The mistake is reaching for full temporal versioning (with its complexity) when an audit table would have sufficed, or — less commonly — using only an audit log when the application genuinely needs to query historical states routinely. Match the pattern to whether past states are reference data the application uses (temporal) or a record of changes you occasionally inspect (audit). Getting this distinction right avoids both over-engineering (temporal where audit suffices) and under-engineering (audit where temporal is needed), which is the recurring judgment that runs through every pattern in this chapter.

A few more worth knowing

Lookup/reference tables — small tables of valid values (statuses, countries, categories) referenced by FK, instead of free-text or CHECK lists. Easier to extend and to attach attributes (a status's description/color).
State machines — model an entity's status transitions; sometimes a status column + CHECK, sometimes a transitions table enforcing valid moves.
Event sourcing — store an append-only log of events and derive current state from them (advanced; powerful for audit-heavy/temporal domains). Mentioned for awareness.

Soft deletes in depth

Soft deletes are among the most-used and most-misused patterns, so they reward a closer look. The core idea is simple — instead of physically removing a row, mark it deleted with a deleted_at timestamp (NULL meaning active) — but the consequences ripple through your entire query layer, and handling them well is what separates a clean implementation from a buggy one.

The central challenge is that every query against a soft-deleted table must now remember to exclude deleted rows with WHERE deleted_at IS NULL, and forgetting it anywhere causes deleted records to reappear where they shouldn't. Relying on every developer to remember this filter on every query is exactly the kind of fragile, diligence-dependent approach that fails over time. The robust solution is a filtering view: define CREATE VIEW active_products AS SELECT * FROM products WHERE deleted_at IS NULL, and have application code query the view rather than the base table. Now the filter is applied in one place, automatically, and no query can forget it. This is the views-as-interface idea from Chapter 15 applied to soft deletes: the view is the safe default interface, and direct access to the base table (which sees deleted rows too) is reserved for the rare cases that genuinely need the full history.

Soft deletes also complicate unique constraints in a way that catches people. Suppose email is unique on a users table with soft deletes. A user soft-deletes their account, then tries to re-register with the same email — but the soft-deleted row still holds that email, so the unique constraint blocks the re-registration, even though the old account is "deleted." The fix is a partial unique index (Chapter 23): CREATE UNIQUE INDEX ON users (email) WHERE deleted_at IS NULL, which enforces uniqueness only among active rows, allowing a deleted row and a new active row to share an email. This is a recurring soft-delete subtlety — uniqueness should usually apply to active records only — and the partial index is its clean solution. Without it, soft deletes and unique constraints fight each other.

Finally, soft deletes mean the table grows forever, since nothing is ever physically removed. For most tables this is fine (storage is cheap, and the history is valuable), but for high-churn tables it may eventually warrant archiving truly-old soft-deleted rows to a separate table. And soft deletes interact with foreign keys in a subtle way: a soft-deleted parent still physically exists, so foreign keys to it still resolve — which is often desirable (old orders should still reference their now-"deleted" product), and is indeed one of soft delete's advantages over hard delete (it preserves referential integrity with historical records). Weighing all this — the filtering view, the partial unique index, the unbounded growth, the referential behavior — is what implementing soft deletes well requires. Done carelessly (no view, no partial index), soft deletes are a source of bugs; done with these techniques, they're a clean, recoverable, history-preserving alternative to destruction.

Hierarchies: the four models, compared

Trees recur in nearly every domain — categories, org charts, comment threads, folders, bills of materials — and the four classic models for storing them each make a different read-versus-write trade-off. Choosing the right one for your access pattern is a genuine design decision, and understanding all four lets you make it deliberately rather than defaulting blindly.

The adjacency list — each row stores its parent_id — is the simplest and the right default. It's how Mercado stores categories and employees, it's trivial to write (moving a node is one UPDATE of its parent_id), and recursive CTEs (Chapter 11) query it well for most needs. Its weakness is that reading a deep subtree or a full ancestor path requires recursion, which, while elegant, can be costly for very large or very deep trees queried constantly. The materialized path stores each node's full path as a string ('1/4/9'), making subtree queries a simple, index-friendly LIKE '1/4/%' — very fast reads — at the cost of maintaining the path strings when nodes move (a moved subtree needs all its descendants' paths updated). It's a denormalization (Chapter 20) for hierarchies: redundant path data traded for read speed.

The nested set model stores left and right numbers that encode the tree's structure such that a subtree is a contiguous numeric range, making subtree reads extremely fast range queries. But its writes are brutal: inserting or moving a node requires renumbering a large fraction of the tree, so it's appropriate only for near-static hierarchies that are read constantly and almost never edited. The closure table stores, in a separate table, every ancestor-descendant pair (including each node to itself), so finding all descendants or all ancestors is a simple indexed join with no recursion — fast reads and reasonable writes (inserting a node adds a bounded set of pairs), at the cost of extra storage (a tree of N nodes can have many more pairs). The closure table is often the best choice when you need both fast subtree reads and flexible editing, and it's worth knowing how to build:

-- Closure table: one row per (ancestor, descendant) pair, including self-pairs.
CREATE TABLE category_closure (
    ancestor_id   integer NOT NULL REFERENCES categories (category_id),
    descendant_id integer NOT NULL REFERENCES categories (category_id),
    depth         integer NOT NULL,                  -- 0 for self, 1 for direct child, ...
    PRIMARY KEY (ancestor_id, descendant_id)
);
-- "All descendants of category 4" is now a simple, indexed query — no recursion:
SELECT descendant_id FROM category_closure WHERE ancestor_id = 4 AND depth > 0;

The selection logic: start with the adjacency list (simplest, fine for most cases, and recursive CTEs handle the queries). If subtree reads become a measured bottleneck on a large tree, move to a closure table (fast reads, flexible writes) or a materialized path (fast reads, simpler but path-maintenance on moves). Reserve nested sets for trees that are read-heavy and almost never modified. This is the read-write trade-off of denormalization (Chapter 20) applied specifically to hierarchies: the adjacency list is the "normalized" baseline, and the other three are progressively more denormalized structures that buy read speed with write cost and storage. Match the model to how your tree is actually used — mostly read, mostly written, how deep, how large — and you'll avoid both the under-engineering of forcing recursion on a read-hot deep tree and the over-engineering of a nested set on a tree that's edited daily.

Polymorphic associations, worked through

Polymorphic associations — where a row relates to one of several entity types, like a comment that can belong to a post, a photo, or a video — are genuinely tricky, and they're where the integrity-versus-convenience trade-off is sharpest. Walking through the three approaches in detail clarifies a decision people often make unknowingly (and regret).

The type+id shortcut stores two columns, commentable_type (e.g., 'post' or 'photo') and commentable_id (the id within that type's table). It's seductively simple — one comments table handles everything — and it's popular in some ORMs. But it has a fatal weakness: no foreign key is possible, because the commentable_id points to different tables depending on the type, and a foreign key can only reference one table. So referential integrity is gone — nothing stops a comment from referencing a post that doesn't exist, and a deleted post leaves orphaned comments the database doesn't catch. You've traded the relational model's core guarantee for convenience, often without realizing it. This is the approach to reach for only when you've consciously decided integrity matters less than simplicity here, and you'll enforce integrity in application code instead.

The separate junction/FK per type approach creates post_comments, photo_comments, video_comments — each a proper table with a real foreign key to its specific parent type. This preserves full referential integrity (each table's foreign key is real and enforced), at the cost of more tables and the need to UNION across them for an "all comments" query. It's the right choice when integrity is paramount and the number of types is small and stable. The shared parent table approach introduces a commentable supertype table that posts, photos, and videos all reference (a 1:1 from each to a commentable row), and comments point to that shared table with a real foreign key. This restores integrity (the comment's FK to commentable is real) and makes "all comments" easy (they all reference one table), at the cost of an extra layer of indirection and the supertype table to maintain. It's the most "correct" relational solution, favored when you need both integrity and easy cross-type queries.

The crucial lesson is not that one approach is universally right — they trade integrity against simplicity differently — but that you should recognize you're facing a polymorphic association and choose deliberately, fully aware of what each option costs. The common failure is reaching for the type+id shortcut (because an ORM makes it easy) without realizing you've just dropped foreign-key integrity for that relationship. If you choose type+id, choose it knowing integrity is now your application's responsibility, not the database's. If integrity matters, choose the per-type tables or the shared parent. The pattern itself is unavoidable in many domains (anything that can attach to multiple kinds of things — comments, tags, attachments, audit references); what's avoidable is making the integrity-sacrificing choice unconsciously. Recognize the shape, weigh the trade-off, and decide with eyes open — which is, once again, the theme of the whole chapter.

Common mistakes

Reinventing a pattern badly — comma-separated tags instead of a junction; phone1/2/3 instead of a child table; a hand-rolled hierarchy when a closure table exists.
Soft deletes without a filtering view — deleted rows leak into queries.
The type+id polymorphic shortcut without realizing it drops foreign-key integrity.
Nested sets for a frequently-edited tree — writes become agony.
Over-patterning — applying event sourcing or temporal versioning when a simple audit table or updated_at would do.

Multi-tenancy in depth

Multi-tenancy — serving many customer organizations (tenants) from shared infrastructure — is a defining challenge of SaaS applications, and the choice among its three models has profound consequences for isolation, cost, and operations. Understanding the trade-offs in depth lets you make a choice you won't have to painfully reverse later.

The shared-schema model gives every table a tenant_id column, and every query filters by it — all tenants' data coexists in the same tables, distinguished only by that column. It's the simplest and most cost-efficient (one schema, one set of tables, maximum density — thousands of small tenants in one database), which is why most SaaS products start here. Its danger is data leakage: forget the tenant_id filter on a single query, and one tenant sees another's data — a catastrophic security failure. The essential mitigation is row-level security (RLS, Chapter 32), a PostgreSQL feature that enforces tenant isolation at the database level, automatically adding the tenant_id filter to every query so the application physically cannot see another tenant's rows even if a query forgets the filter. Shared-schema with RLS is a strong, common architecture: the density and simplicity of shared tables, with the database guaranteeing isolation rather than trusting every query to filter correctly.

The schema-per-tenant model gives each tenant its own schema (namespace, Chapter 14) containing identical tables. This provides stronger isolation (a tenant's data is in its own namespace, harder to leak across) and allows per-tenant customization, at the cost of more objects to manage (N tenants means N copies of every table) and more complex migrations (a schema change must be applied to every tenant's schema). It suits a moderate number of larger tenants where stronger isolation justifies the operational overhead. The database-per-tenant model goes furthest: each tenant gets an entirely separate database, giving complete isolation, trivial per-tenant backup and restore, and the ability to place different tenants on different servers — but the highest operational overhead (managing many databases) and the lowest density. It suits few, large, or highly-regulated tenants (think enterprise customers with strict data-residency or compliance requirements) where the isolation is worth the cost.

The decision is fundamentally a spectrum from density and simplicity (shared schema) to isolation and operational cost (database per tenant), and the right point depends on your tenant profile: many small tenants favor shared-schema; few large or regulated tenants favor more isolation. Crucially, this is a decision that's expensive to change later — migrating from shared-schema to database-per-tenant once you have thousands of tenants is a major undertaking — so it deserves careful thought up front, with an eye to how your tenant base is likely to evolve. A common pragmatic path is to start shared-schema-with-RLS (cheap, simple, secure enough) and move specific large or demanding tenants to more isolated models as needs arise — a hybrid that gets density for the many and isolation for the few. Whatever you choose, choose it understanding the trade-off and the cost of reversal, because multi-tenancy architecture is one of the most consequential and hardest-to-undo design decisions a SaaS system makes.

More patterns worth knowing

Beyond the major patterns, several smaller ones recur often enough to belong in your vocabulary. Lookup (reference) tables store the valid values for a category — statuses, countries, currencies — in a small table referenced by foreign key, rather than as free text or a CHECK ... IN (...) list. The advantage over a CHECK constraint is extensibility and richness: adding a new status is an INSERT (no schema change), and the lookup table can carry attributes about each value (a status's display label, color, sort order, or whether it's "active"). Use a CHECK list for a small, truly-fixed set (the four loyalty tiers); use a lookup table when the set may grow or the values need their own attributes. State machines model an entity's lifecycle of statuses and the valid transitions between them — an order goes pending → paid → shipped → delivered, but never delivered → pending. Sometimes this is a status column with a CHECK, sometimes a separate transitions table that enforces which moves are legal, depending on how strictly the transitions must be controlled.

Event sourcing is a more advanced pattern where, instead of storing current state and overwriting it on each change, you store an append-only log of events and derive the current state by replaying them. Rather than a balance column you update, you store every deposit and withdrawal event and compute the balance from them. This gives a perfect audit trail (every change is an immutable event), the ability to reconstruct state at any past moment, and natural support for temporal queries — at the cost of significant complexity (deriving current state is work, and querying "current" data requires either replaying events or maintaining a derived view). It's powerful for audit-heavy, temporal, or financial domains, but it's genuine architectural complexity, mentioned here for awareness rather than as a default. The judgment with all these patterns is to match the pattern's complexity to the problem's actual needs — a lookup table where a CHECK suffices is mild over-engineering; event sourcing where an audit table would do is major over-engineering. Reach for the simplest pattern that solves the real problem.

Patterns as a shared vocabulary

The deepest value of knowing these patterns isn't just that they save you from reinventing solutions — it's that they give you a shared vocabulary with other database practitioners, which makes design conversations dramatically more efficient. When you say "we'll use a closure table for the category hierarchy" or "that's a polymorphic association, so let's use a shared parent table to keep integrity" or "soft-delete it with a filtering view," you're communicating a complete design — structure, trade-offs, and maintenance approach — in a few words, to anyone who shares the vocabulary. This is exactly why software design patterns became so influential: not because the patterns were novel, but because naming them let developers discuss complex designs concisely and precisely. Data modeling patterns do the same for database design.

This shared vocabulary also sharpens your own thinking, because recognizing that a problem is an instance of a known pattern immediately brings the pattern's known trade-offs and pitfalls to mind. Seeing "an item has a list of labels" and thinking "that's tagging, the canonical M:N — junction table, not a CSV column" saves you from the comma-separated-tags mistake automatically. Seeing "comments can attach to several types" and thinking "polymorphic association — and the type+id shortcut drops integrity" stops you from stumbling into that trap unaware. The patterns are pre-packaged judgment: each one bundles a recurring problem with its proven solutions and their costs, so recognizing the problem gives you the solutions for free, along with the warnings. That's why the chapter framed patterns as the designer's vocabulary — they're how experienced designers think fast and communicate precisely, turning recurring problems into recognized shapes with known answers.

So the goal of this chapter isn't to memorize a catalog but to develop the recognition: to look at a new domain's requirements and see the patterns latent in them — "there's a hierarchy here, an audit need there, a multi-tenancy decision, an M:N relationship, a polymorphic association" — and to reach for the right pattern, with its trade-offs already in mind, rather than improvising. That recognition is what makes design fast and reliable, and it's what distinguishes someone who designs databases professionally from someone who assembles tables ad hoc. Combined with the modeling (Chapter 17), mapping (18), normalization (19), and denormalization (20) you've already learned, this vocabulary of patterns completes the core toolkit of database design — leaving only the question of how to change a design safely once it's in production, which is where Chapter 22 takes us next.

Combining patterns in practice

Real schemas rarely use patterns in isolation — they combine several, and understanding how patterns interact is part of using them well. A single table might be soft-deleted, audited, and part of a hierarchy all at once, and these patterns layer in ways that are mostly harmonious but occasionally require care. Recognizing the combinations, and the few friction points, rounds out the chapter's vocabulary into working fluency.

Many combinations are natural and reinforcing. Soft deletes pair beautifully with audit trails: the audit log records the "deletion" (really an UPDATE of deleted_at) like any other change, giving you a complete history of when things were soft-deleted and by whom, while the soft delete keeps the row recoverable. Multi-tenancy's tenant_id combines with essentially every other pattern — your hierarchies, junctions, and audited tables all carry the tenant_id and are protected by the same row-level security. Lookup tables underpin state machines (the valid states live in a lookup table the transitions reference). These layer cleanly because each pattern addresses a distinct concern — deletion semantics, change history, isolation, valid values — and concerns that don't conflict can coexist.

The friction points are worth knowing precisely because they're where combinations get subtle. Soft deletes plus unique constraints need the partial unique index (as covered). Soft deletes plus foreign keys raise the question of whether a soft-deleted parent should still be referenceable (usually yes — that's an advantage). Hierarchies plus soft deletes raise "what happens to the children of a soft-deleted node?" (a design decision: orphan them, soft-delete them too, or block the deletion). Temporal versioning plus most other patterns gets complex fast, because "which version" interacts with everything — another reason to use full temporal modeling only when truly needed. Polymorphic associations plus referential integrity is the trade-off already explored. None of these friction points is a reason to avoid combining patterns; they're just the places to slow down and decide deliberately rather than assuming the patterns compose without thought.

The meta-skill is to see a table or a relationship through the lens of all the patterns that might apply, and to decide consciously which to use and how they interact. A products table might need: a hierarchy (via categories), soft deletes (recoverable removal), audit (price-change history), denormalized aggregates (review counts, from Chapter 20), and multi-tenancy (if SaaS). Each is a deliberate choice with its own structure and maintenance, and together they make the table robust — if you've thought through how they layer. This is the culmination of Part III's design skills: you don't just model entities and relationships, normalize, and denormalize — you also apply the right patterns, in the right combinations, each chosen with its trade-offs in mind. That complete design judgment — modeling plus normalization plus denormalization plus patterns, all wielded deliberately — is what produces schemas that serve an application well for years. The patterns are the final layer of vocabulary, and combining them fluently is the final layer of skill.

Progressive project: apply the patterns

For your domain, identify and apply the patterns that fit:

Audit/soft delete: does any table need a change history (audit trigger) or recoverable deletes (deleted_at + a filtering view)?
Hierarchy: if you have a tree (categories, threads, prerequisites, org), choose a model (start with adjacency list) and justify it.
Tagging/M:N: model any "list of labels" as a junction table.
Polymorphic: if something relates to multiple types, pick an approach deliberately and note the integrity trade-off.
Multi-tenancy: if your app is multi-customer, pick a tenancy model and justify it.

Document each pattern choice and why. You're building a design that won't surprise you later.

Summary

Data modeling patterns are reusable solutions to recurring problems. Audit trails (history table + trigger, often jsonb before/after) record who changed what. Soft deletes (deleted_at + a filtering view) mark instead of remove — recoverable, but every query must filter. Versioning/temporal tables keep queryable history of past states. Multi-tenancy ranges from shared-schema+tenant_id (+RLS) to schema- or database-per-tenant, trading density for isolation. Hierarchies have four models — adjacency list (default), materialized path, nested set (read-fast/write-slow), and closure table (read-fast, flexible) — pick by your read/write balance. Tagging is the canonical M:N junction; polymorphic associations (relate to one-of-many types) trade integrity vs. simplicity across three approaches. Recognize the pattern, choose deliberately, and avoid the badly-reinvented versions.

You can now: - Implement audit trails and soft deletes (with a filtering view). - Model versioned/temporal data when past states must be queryable. - Choose a multi-tenancy model by isolation vs. density. - Pick the right hierarchy model (adjacency/path/nested-set/closure) for your read/write mix. - Recognize and correctly model tagging (M:N) and polymorphic associations (with eyes open on integrity).

What's next. Chapter 22 — Schema Migration and Evolution — schemas change; this chapter covers how to evolve them safely in production: migration tools, safe vs. dangerous changes, zero-downtime (expand-contract) migrations, and version control for your database. The close of Part III.

Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.