28 min read

> Where you are: Part II, Chapter 14 of 40. You've queried and modified data inside tables someone else built (the Mercado schema). Now you'll build the tables — turning the relational-model concepts of Chapter 3 into real, well-constrained...

Chapter 14: Data Definition — CREATE, ALTER, DROP, and Building Your Schema

Where you are: Part II, Chapter 14 of 40. You've queried and modified data inside tables someone else built (the Mercado schema). Now you'll build the tables — turning the relational-model concepts of Chapter 3 into real, well-constrained structures.

Learning paths: 💻 📊 🔬 🏗️ — everyone, but 💻 developers and 🏗️ DBAs own this. This chapter is also where your progressive project finally gets a real schema.


DDL: defining the shape of your data

Data Definition Language (DDL)CREATE, ALTER, DROP — defines and changes the structure of your database: tables, columns, types, constraints, indexes, schemas. Where DML (Chapter 13) changes the rows, DDL changes the container. Every design idea from Chapter 3 (keys, domains, integrity constraints) becomes concrete here, in the CREATE TABLE statement.

This is also where the book's first theme — design is the most important skill — meets the keyboard. A CREATE TABLE is a series of decisions: what columns, what types, what rules. Good decisions here make every future query simpler and make whole classes of bad data impossible. Sloppy decisions echo for years.


CREATE TABLE

The anatomy of a table definition (this is mercado's customers, which you've been querying all along):

CREATE TABLE customers (
    customer_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    first_name   text        NOT NULL,
    last_name    text        NOT NULL,
    email        text        NOT NULL UNIQUE,
    phone        text,
    loyalty_tier text        NOT NULL DEFAULT 'standard'
                             CHECK (loyalty_tier IN ('standard','silver','gold','platinum')),
    created_at   timestamptz NOT NULL DEFAULT now()
);

Each line is column_name type [constraints]. Reading it tells you the table's entire contract: an auto-generated primary key, required names, a unique required email, an optional phone, a tier restricted to four values defaulting to 'standard', and a creation timestamp defaulting to now.


Choosing data types (the domains of Chapter 3)

Picking the right type is a correctness decision, not just storage. The essentials (full reference in Appendix D):

Need Use Avoid
Whole numbers integer, bigint
Money / exact decimals numeric(p,s) real/double precision (rounding errors!)
Text text (no length penalty) overusing varchar(n) for arbitrary limits
True/false boolean char(1) flags like 'Y'/'N'
Date only date text dates
Timestamp timestamptz (zone-aware) timestamp unless you truly want no zone
Unique id uuid (or integer identity) text ids
Flexible/nested jsonb a pile of nullable columns
List of values array, or a child table comma-separated text

Common mistake. varchar(255) everywhere (a habit from other databases). In PostgreSQL, text and varchar have identical performance; a length limit should reflect a real rule, not a reflex. And never store money as a float — numeric is the only correct choice (Chapters 3 and 8).

Dialect Difference. Type names differ across databases (PostgreSQL text/timestamptz vs. SQL Server nvarchar/datetime2 vs. Oracle VARCHAR2/NUMBER). The concepts port; the spellings don't (Appendix J).


Constraints in depth: encoding your rules

Constraints are how a good schema makes integrity automatic (Chapter 3). PostgreSQL enforces them on every write, for every client, forever.

CREATE TABLE order_items (
    order_id   integer       NOT NULL REFERENCES orders (order_id) ON DELETE CASCADE,
    product_id integer       NOT NULL REFERENCES products (product_id),
    quantity   integer       NOT NULL CHECK (quantity > 0),
    unit_price numeric(10,2) NOT NULL CHECK (unit_price >= 0),
    discount   numeric(4,3)  NOT NULL DEFAULT 0 CHECK (discount >= 0 AND discount < 1),
    PRIMARY KEY (order_id, product_id)            -- a TABLE constraint (spans columns)
);

The constraint family:

  • NOT NULL — value required.
  • DEFAULT expr — value used when none is supplied (now(), 0, 'standard', etc.).
  • UNIQUE — no two rows share this value (or combination).
  • PRIMARY KEY — unique + not null; one per table; can span columns (composite, as above).
  • CHECK (condition) — any boolean rule per row (quantity > 0, rating BETWEEN 1 AND 5).
  • REFERENCES other(col) — a foreign key (referential integrity).

Column vs. table constraints: a single-column rule can go inline on the column; a rule spanning multiple columns (a composite PK, or a CHECK referencing two columns) must be a table constraint (listed separately, like PRIMARY KEY (order_id, product_id)).

Name your constraints for clearer errors and easier ALTER later:

CONSTRAINT positive_quantity CHECK (quantity > 0)

Referential actions: what happens on delete/update

A foreign key can specify what to do when the referenced row is deleted or its key changes:

Action On delete of parent…
ON DELETE RESTRICT / NO ACTION (default) block the delete if children exist
ON DELETE CASCADE delete the children too
ON DELETE SET NULL set the child's FK to NULL
ON DELETE SET DEFAULT set the child's FK to its default

Mercado uses these deliberately: deleting a customer cascades to their addresses (they're meaningless without the customer) but is restricted by orders (you must not lose order history). Choosing the right action per relationship is a real design decision — CASCADE is convenient but dangerous (it can delete far more than you expect).

Why this matters (design, theme #1). The set of constraints you declare is your data's contract. Every rule you encode here is a bug you can never have. Spend the time: which columns can be null? what values are valid? which references must hold? what should deletes do? Answering these well is the essence of database design (Part III).


IDENTITY and sequences: auto-generating keys

You've seen GENERATED ALWAYS AS IDENTITY on every Mercado primary key. It tells PostgreSQL to assign the value automatically from an internal sequence:

id integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY   -- you cannot supply your own (use OVERRIDING to force)
id integer GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY -- you may supply your own, else auto

ALWAYS is the safer default (prevents accidental manual ids that collide with the sequence). Under the hood is a sequence — a special object that hands out increasing numbers; you can also create and use sequences directly (CREATE SEQUENCE, nextval(...)).

Dialect Difference. GENERATED ... AS IDENTITY is the SQL-standard form (PostgreSQL 10+). The older serial/bigserial pseudo-types still work but are legacy; prefer IDENTITY. MySQL uses AUTO_INCREMENT; SQL Server uses IDENTITY(1,1). See Appendix J.


ALTER TABLE: changing structure

Schemas evolve. ALTER TABLE adds, removes, and modifies columns and constraints:

ALTER TABLE products ADD COLUMN weight_grams integer;                 -- add a column
ALTER TABLE products ALTER COLUMN weight_grams SET DEFAULT 0;         -- set a default
ALTER TABLE products ADD CONSTRAINT positive_weight CHECK (weight_grams >= 0);
ALTER TABLE products RENAME COLUMN weight_grams TO weight_g;          -- rename
ALTER TABLE products DROP COLUMN weight_g;                            -- drop a column

Some alterations are cheap and safe; others are risky on a live, populated table:

  • Adding a nullable column is fast and safe.
  • Adding a NOT NULL column with no default to a populated table fails — existing rows would violate it. Add it nullable (or with a default), backfill, then add NOT NULL.
  • Adding a CHECK or foreign key validates all existing rows (can be slow; blocks writes briefly).
  • Dropping a column discards its data — irreversible (outside a transaction/backup).
  • Changing a column's type may require rewriting the table and can fail if data doesn't convert.

These risks are why schema migrations (Chapter 22) are a discipline of their own — safe, ordered, reversible structural changes. For now, know that ALTER is powerful and that some alters are dangerous on production data.

Common mistake. ALTER TABLE t ADD COLUMN c text NOT NULL; on a table with rows → error (existing rows have no value). The safe pattern: add nullable (or with a DEFAULT), backfill the data, then ALTER COLUMN c SET NOT NULL.


DROP: removing structures

DROP TABLE staging_import;              -- remove a table and its data
DROP TABLE products CASCADE;            -- also drop objects that depend on it (FKs, views) — careful!
DROP TABLE IF EXISTS temp_thing;        -- no error if it doesn't exist

DROP TABLE deletes the table and all its data and dependent objects (with CASCADE). It's the most destructive DDL — there's no WHERE, no row-by-row, just gone. Recall the hierarchy of removal:

  • DELETE — remove some rows (keeps the table).
  • TRUNCATE — remove all rows fast (keeps the table).
  • DROP TABLE — remove the table itself.

Common mistake. DROP TABLE ... CASCADE to "make the error go away." CASCADE silently drops everything that depended on the table — views, foreign keys, sometimes more than you realize. Read what depends on an object (\d+) before cascading.


Schemas: namespaces within a database

A schema (not to be confused with "the schema" meaning your overall design) is a namespace inside a database — a folder for tables. The default is public. Schemas let you group objects and avoid name clashes (e.g., sales.orders vs archive.orders):

CREATE SCHEMA reporting;
CREATE TABLE reporting.monthly_summary (...);
SELECT * FROM reporting.monthly_summary;     -- qualified name
SET search_path TO reporting, public;        -- which schemas unqualified names search

Mercado keeps everything in public for simplicity, but schemas are essential for multi-tenant designs, separating reporting from transactional tables, and managing permissions (Chapter 32). The search_path determines where unqualified names are looked up — recall the "missing table" mystery (Chapter 2 case study), which was partly about schemas.


CREATE INDEX (a preview of Chapter 23)

Indexes are technically DDL too. You'll meet them properly in Chapter 23, but the syntax is simple:

CREATE INDEX idx_orders_customer_id ON orders (customer_id);

An index speeds up lookups on customer_id at the cost of extra storage and slower writes. Mercado pre-creates a few; you'll learn which columns to index (and which not to) in Part IV.


A mental model: the schema is a contract

The deepest way to think about DDL is that a CREATE TABLE statement writes a contract — a set of promises about what the data will always look like, enforced by the database on every write, for every client, forever. Every clause in the definition is a clause in that contract. NOT NULL promises a value will always be present. CHECK (price >= 0) promises a price will never be negative. REFERENCES customers promises every order points at a real customer. UNIQUE promises no two rows collide on that value. Once written, these promises are unbreakable — not by your application, not by an admin script, not by a careless bulk import, not by anyone. The database simply refuses any write that would break them.

This contract framing reframes schema design from "declaring storage" to "declaring guarantees," and the shift is profound. A column without NOT NULL is a promise not made — a quiet admission that this value might be missing, with all the NULL-handling consequences that follow (Chapter 3). A missing CHECK is a rule left unenforced, an invitation for bad data to enter through whatever door your validation forgot. The discipline of good design is, in large part, the discipline of asking — for every column and every relationship — "what is always true here, and have I told the database?" Each true answer you encode becomes a category of bug that can never occur, a class of corruption made structurally impossible rather than merely discouraged.

The alternative — relying on application code to enforce these rules — fails for a reason worth understanding: the database is the one thing every path to the data must pass through. Your web app validates the rating, but the mobile backend has a subtly different check, the admin tool has none, the nightly import bypasses all of them, and the psql session a developer opens at 2 a.m. answers to no validation at all. A CHECK constraint on the table guards every one of those doors simultaneously, because they all ultimately write through the database. This is why "validate in the application" is necessary but never sufficient: application validation improves user experience (a friendly error before the round trip), but only database constraints provide guarantees. The professional instinct is to enforce every rule you can at the database level, and treat application validation as a courtesy layer on top, not the line of defense.

So when you write DDL, you are not merely arranging storage — you are deciding what shall be true of your data for as long as the table exists. That is why theme #1, design is the most important skill, lands most heavily here. The query you write today you can rewrite tomorrow; the schema you design today, once it holds production data, is something you will live with and migrate around (Chapter 22) for years. Spend the care up front. Every promise you encode now is a problem you never have later.


Data types in depth

Choosing a column's type is the first and most consequential design decision, because the type defines the domain of legal values and, often, the correctness of the data itself. PostgreSQL's type system is rich, and using it well is a real skill. The numeric types divide into exact and approximate: integer and bigint for whole numbers, numeric(p,s) for exact decimals, and real/double precision for approximate floating-point. The single most important rule in this entire area is that money and any value requiring exactness must be numeric, never a float — floating-point types cannot represent values like 0.10 exactly, so sums drift by fractions of a cent and eventually someone is billed wrong. The float types are for measurements and scientific data where tiny imprecision is acceptable; they are never for money.

For text, PostgreSQL's text type is the sensible default — it has no length limit and, crucially, no performance penalty compared to varchar(n). The habit of writing varchar(255) everywhere is imported from databases where length mattered for storage; in PostgreSQL it doesn't, so a length limit should reflect a genuine business rule (a country code is exactly two characters) rather than a reflexive cap. For dates and times, prefer timestamptz (timestamp with time zone) over plain timestamp in almost all cases — it stores an unambiguous moment regardless of the server's or client's zone, sparing you the off-by-hours bugs that plague zone-naive timestamps. Use date when you genuinely mean a calendar day with no time, and boolean for true/false rather than the char(1) 'Y'/'N' flags that other eras favored.

PostgreSQL also offers types that prevent whole categories of bad design. An enumerated type (CREATE TYPE loyalty AS ENUM ('standard','silver','gold','platinum')) constrains a column to a fixed set of values at the type level — though a text column with a CHECK ... IN (...) constraint, as Mercado uses, is often more flexible because it's easier to alter. The uuid type stores universally-unique identifiers compactly. jsonb stores nested, semi-structured data efficiently and queryably (Chapter 16) — the right choice for genuinely variable attributes, and far better than a sprawl of mostly-NULL columns or, worse, JSON crammed into a text field. Array types store ordered lists natively. Each of these is a tool for matching the column's type to the data's true shape, which is the essence of using a type system well: the closer the type fits the domain, the more the database can validate, optimize, and protect on your behalf.

The meta-lesson is that type choice is domain modeling, not a clerical step. A thoughtfully-typed schema rejects nonsense automatically — you cannot store "twenty dollars" in a numeric, cannot put a thirteenth month in a date, cannot duplicate a uuid primary key. A carelessly-typed schema (everything text, money as float, dates as strings) pushes all that validation into application code where it's done inconsistently and incompletely. Appendix D is the full type reference; the instinct to cultivate now is to pause at every column and ask "what is the true domain of this value, and which type expresses it most precisely?" That question, asked consistently, is most of what separates a robust schema from a fragile one.


ALTER in production: the danger you must respect

ALTER TABLE is where DDL gets genuinely dangerous, because changing the structure of a table that already holds data — especially a table serving live production traffic — ranges from trivially safe to catastrophically disruptive depending on exactly what you change. Understanding the spectrum is essential, and it's the bridge to the migration discipline of Chapter 22.

At the safe end: adding a nullable column is fast and non-disruptive — PostgreSQL records the new column in the catalog without touching existing rows. Adding a column with a constant default is also fast in modern PostgreSQL (it stores the default in the catalog rather than rewriting every row). At the dangerous end: adding a NOT NULL column with no default to a populated table fails outright, because the existing rows would have no value for it — the safe pattern is to add it nullable, backfill the values, then apply NOT NULL separately. Adding a CHECK constraint or foreign key forces PostgreSQL to validate every existing row, which can take a long time and hold locks on a large table. Changing a column's type may require rewriting the entire table and can fail if existing data doesn't convert. Dropping a column discards its data irreversibly.

The subtle killer is locking. Many ALTER operations take a strong lock on the table while they run, and on a busy production system, a lock held for even a few seconds can queue up every query behind it, cascading into timeouts and an outage — not because the change was wrong, but because it blocked everything else while it ran. An ALTER that's instant on your empty dev database can lock a million-row production table for minutes. This gap between "works in dev" and "safe in production" is exactly why schema changes on live systems are a discipline unto themselves (Chapter 22), with techniques like adding constraints as NOT VALID then validating separately, creating indexes CONCURRENTLY, and the expand-contract pattern for breaking changes.

For now, internalize the principle: ALTER TABLE is not uniformly safe, and the danger scales with table size and traffic. On a fresh table or a dev database, alter freely. On a populated production table, every alter deserves the question "what does this lock, and for how long?" — and anything beyond adding a nullable column deserves a tested migration plan rather than an ad-hoc statement typed into a production console. The same blast-radius thinking from Chapter 13's DML applies to DDL: know what your statement touches before you run it, and respect that structural changes to live tables are among the riskiest operations in all of database work.


Schemas: organizing a larger database

As a database grows beyond a handful of tables, the flat public namespace starts to feel cluttered, and schemas — namespaces within a database — become an organizing tool worth understanding. (Note the unfortunate terminology collision: "schema" here means a namespace, distinct from "the schema" meaning your overall table design. Context disambiguates, but the double meaning trips people up.) A schema is essentially a folder for database objects, letting you group related tables and avoid name clashes: sales.orders and archive.orders can coexist as two different tables, distinguished by their schema prefix.

Schemas serve several real purposes. They organize large databases by domain — a system might keep transactional tables in one schema, reporting tables in another, and staging tables in a third, so the hundreds of objects in a mature database aren't an undifferentiated heap. They enable multi-tenancy patterns where each tenant's data lives in its own schema (Chapter 21). And they're a unit of access control — you can grant a role access to an entire schema at once, or restrict a reporting user to the reporting schema while the transactional tables stay locked down (Chapter 32). The search_path setting controls which schemas unqualified names are looked up in, which is why a bare orders resolves to public.orders by default — and why a misconfigured search_path can cause the baffling "table not found, but I know it exists" mystery (it exists in a schema not on the path).

For your progressive project and for learning, keeping everything in public is perfectly fine and keeps things simple — schemas are an organizational tool you reach for when scale demands it, not a requirement. But knowing they exist matters, because the day you join a team whose database greets you with hr.employees, finance.invoices, and analytics.daily_rollup, you'll understand immediately that these are three namespaces organizing one database, not three separate databases, and that a bare table name resolves according to the search_path. That orientation — schemas as folders, search_path as the lookup order — turns an intimidating enterprise database into a navigable one.


From DDL to migrations

A final, forward-looking point ties this chapter to a discipline you'll meet in Part III. Everything in this chapter — CREATE, ALTER, DROP — you've practiced as statements typed directly into a database. That's exactly right for learning and for your own dev database. But in any serious project, with a team and a production system, you do not change the schema by typing ad-hoc DDL into a console. You change it through migrations: versioned, ordered, reviewed files of DDL that move the schema from one known state to the next, can be applied consistently across every environment (each developer's machine, staging, production), and can be rolled back.

The reason is everything this chapter has hinted at. Ad-hoc DDL on production is dangerous (the locking and blast-radius issues with ALTER and DROP), unrepeatable (the change you made on production exists nowhere else, so the next environment drifts out of sync), and unreviewable (no one checked it before it ran on live data). Migrations solve all three: the DDL lives in version control as a reviewed file, runs identically everywhere, and forms an ordered history of how the schema evolved. The CREATE TABLE and ALTER TABLE statements are exactly the same; what changes is the discipline around them — they're written into versioned files, applied through a tool, and tested on production-like data before they touch production.

So think of this chapter as teaching the vocabulary (the DDL statements themselves) and Chapter 22 as teaching the process (how to apply them safely and repeatably as a team). The skills compound: you can't write good migrations without fluent DDL, and fluent DDL without the migration discipline is a production hazard. For now, practice the statements freely on your own database to build that fluency; just know that the moment real data and teammates are involved, those same statements graduate into the more careful world of migrations. The DDL is the same; the stakes, and therefore the process, change.


Progressive project: build your real schema

This is the moment your project becomes real. Take the entities, keys, and constraints you've been sketching since Chapter 1 and write the CREATE TABLE statements:

  1. Create each table with appropriate types (money as numeric, timestamps as timestamptz, etc.).
  2. Add every constraint your rules require: PRIMARY KEY (prefer a surrogate IDENTITY), NOT NULL, UNIQUE on natural keys, CHECK for valid values, and FOREIGN KEYs with deliberate ON DELETE actions.
  3. Insert a handful of rows (Chapter 13) into each, and try to violate each constraint to confirm it's enforced (insert a duplicate, a bad value, a dangling reference).
  4. Save the full DDL as schema.sql for your project — you now have a real, well-constrained database.

This DDL is the spine of everything you'll build in Part III (where you'll refine it) and the capstone (Chapter 39).


Generated columns, sequences, and identity

PostgreSQL offers several ways to have the database compute or supply values automatically, and choosing among them well is part of schema craft. You've seen GENERATED ALWAYS AS IDENTITY on every Mercado primary key — it draws values from an internal sequence, a special object that hands out ever-increasing numbers. The ALWAYS variant is the safer default: it forbids inserting your own value (you must explicitly say OVERRIDING SYSTEM VALUE to do so), which prevents the classic bug where a manually-supplied id collides with one the sequence later generates. The BY DEFAULT variant permits manual values, useful for data loads that carry their own ids, at the cost of that collision risk. Prefer ALWAYS unless you have a specific reason.

A distinct and underused feature is the generated column — a column whose value is computed from other columns and stored automatically, kept correct on every write:

ALTER TABLE order_items
ADD COLUMN line_total numeric(12,2)
GENERATED ALWAYS AS (quantity * unit_price * (1 - discount)) STORED;

Now line_total is always exactly quantity * unit_price * (1 - discount), recomputed whenever those inputs change, and impossible to set wrong because you can't write to it directly. This is a beautiful tool for derived values that must stay consistent: rather than computing the line total in every query (and risking inconsistency if someone forgets the discount), the database guarantees it. Generated columns are the constraint-style answer to derived data — the value is correct by construction, not by the diligence of every query author. (For values derived from other tables rather than the same row, you'd reach for a trigger or a maintained column instead, since generated columns can only reference their own row.)

The broader principle these features share is let the database supply what the database can guarantee. An identity column guarantees unique keys better than application-generated ids; a DEFAULT now() guarantees a creation timestamp better than hoping every insert remembers to set one; a generated column guarantees a derived value better than recomputing it everywhere. Each moves a correctness responsibility from fallible application code into the database's unbreakable machinery. Recognizing where you can hand a guarantee to the database — and taking it every time — is a quiet but real part of designing schemas that stay correct under the pressure of real, messy, multi-path usage.


A worked scenario: designing a table from requirements

Let's practice the whole DDL skill on a concrete requirement, because design is best learned by doing it. Suppose Mercado needs to add gift cards: each has a unique code, a starting balance, a current balance, an optional expiry date, a status, the customer it was issued to (if any), and a creation timestamp. Watch how each requirement becomes a column-plus-constraint decision — the contract-writing from earlier, applied.

"A unique code" → a text column with UNIQUE NOT NULL (the code is a natural key; codes must be present and distinct). "A starting balance" and "current balance" → numeric(10,2) NOT NULL columns (money, so numeric; always present), each with CHECK (>= 0) (a balance can't go negative). "An optional expiry date" → a nullable date (optional means no NOT NULL; a calendar day means date, not timestamptz). "A status" → a text column with NOT NULL DEFAULT 'active' and CHECK (status IN ('active','redeemed','expired','void')) (constrained to known values, defaulting sensibly). "The customer it was issued to, if any" → a nullable foreign key REFERENCES customers ON DELETE SET NULL (optional, so nullable; if the customer is deleted, the card survives unlinked rather than vanishing). "A creation timestamp" → timestamptz NOT NULL DEFAULT now().

CREATE TABLE gift_cards (
    gift_card_id  integer GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
    code          text          NOT NULL UNIQUE,
    start_balance numeric(10,2) NOT NULL CHECK (start_balance >= 0),
    balance       numeric(10,2) NOT NULL CHECK (balance >= 0),
    expires_on    date,
    status        text          NOT NULL DEFAULT 'active'
                                CHECK (status IN ('active','redeemed','expired','void')),
    customer_id   integer       REFERENCES customers (customer_id) ON DELETE SET NULL,
    created_at    timestamptz   NOT NULL DEFAULT now(),
    CONSTRAINT balance_within_start CHECK (balance <= start_balance)   -- table constraint
);

Notice the final table constraint, balance <= start_balance — a rule spanning two columns, so it can't be inline on either one; it must be a table-level constraint. And notice that surrogate key (gift_card_id identity) alongside the natural key (code UNIQUE), exactly the pattern from Chapter 3: a stable internal id for relationships, with the meaningful natural key protected by UNIQUE. Every requirement became a guarantee the database now enforces. A gift card with a negative balance, a balance exceeding its start, an unknown status, a duplicate code, or no code at all is now unrepresentable — not discouraged, but impossible. That is what it means to design a table well: you translate the requirements into a contract so complete that bad data has nowhere to enter. This is the exact skill your progressive project now needs, and the work Part III will refine into full schema design.


DROP and managing dependencies

DROP is the most destructive DDL — it removes structures entirely, with no WHERE and no row-by-row mercy — and its main subtlety is dependencies. Database objects depend on each other: a view depends on the tables it queries, a foreign key depends on the table it references, an index depends on its table. When you try to drop something other objects depend on, PostgreSQL by default refuses, protecting you from accidentally breaking dependents. This is a safety feature, and the refusal is informative — it tells you exactly what depends on the object you're trying to remove.

The dangerous escape hatch is DROP ... CASCADE, which drops the object and everything that depends on it — the view that used the table, the foreign keys that referenced it, possibly more than you realize. CASCADE is sometimes necessary, but reaching for it to "make the error go away" is how people accidentally drop a dozen views they forgot existed. The disciplined approach is to first understand the dependencies — \d+ objectname shows what depends on an object — and only cascade when you genuinely intend to remove the whole dependency tree. The safer habit is to drop dependents explicitly, in order, so each removal is a conscious decision rather than a silent cascade. The IF EXISTS modifier (DROP TABLE IF EXISTS x) prevents an error when the object might not exist, useful in scripts that need to be re-runnable.

The removal hierarchy is worth keeping clear in your mind, because the three levels are easy to confuse and very different in consequence: DELETE removes some rows and keeps the table; TRUNCATE removes all rows fast and keeps the table; DROP TABLE removes the table itself, structure and data and dependents together. Reaching for a more destructive level than you intend is a classic mistake — DROPping a table when you meant to TRUNCATE it, or TRUNCATEing when you meant a selective DELETE. Each level has its place (selective cleanup, fast emptying of staging tables, removing obsolete structures respectively), but the destructiveness escalates sharply, and DROP sits at the top. Treat it with the caution that finality deserves: in production, a DROP should be a planned, reviewed change (Chapter 22), never an ad-hoc keystroke.


A field guide to DDL pitfalls

A few DDL mistakes recur often enough to warrant a checklist. The NOT NULL-without-default add on a populated table: ALTER TABLE t ADD COLUMN c text NOT NULL fails when rows exist, because they'd have no value. The fix is the three-step dance — add nullable (or with a default), backfill, then set NOT NULL. varchar(255) reflexively: a length limit should encode a real rule, not a habit; text is the right default in PostgreSQL with no performance cost. Money as float: the cardinal sin, causing cent-level drift; always numeric for exact values. Positional reliance anywhere DDL interacts with data (an INSERT INTO t VALUES (...) against a table whose columns you defined positionally): name your columns. DROP ... CASCADE to silence an error: it can remove far more than you intend; inspect dependencies first with \d+. Unnamed constraints: anonymous constraints get auto-generated names that make errors cryptic and later ALTERs awkward; name important constraints (CONSTRAINT positive_balance CHECK (...)) for clear error messages and easier maintenance.

The throughline of these pitfalls is that DDL changes are consequential and often hard to reverse — a dropped column's data is gone, a mis-typed money column corrupts silently, a careless CASCADE removes dependents. The defense is the same blast-radius awareness that governs DML: before running a DDL statement, especially on a table with data, know what it will do, what it locks, and what depends on it. On your own learning database, experiment freely — that's how fluency is built. The caution applies in proportion to whether real data and real users are on the other end of the statement, and it scales all the way up to the formal migration discipline of Chapter 22.


A note on portability

DDL is moderately portable in its concepts and less so in its specifics, which is worth knowing as your career crosses databases. The core idea — CREATE TABLE with typed columns and constraints — is universal, and the constraint kinds (NOT NULL, UNIQUE, PRIMARY KEY, CHECK, FOREIGN KEY) are standard SQL supported everywhere. The structure of a CREATE TABLE you write for PostgreSQL would be recognizable, and largely portable, to anyone working in any relational database.

The divergences are in type names and auto-increment syntax, the two least-standardized corners. PostgreSQL's text, timestamptz, numeric, boolean, jsonb, and uuid have different spellings elsewhere — SQL Server's nvarchar/datetime2/bit, Oracle's VARCHAR2/NUMBER/TIMESTAMP, MySQL's own set. Auto-generated keys vary too: the SQL-standard GENERATED ... AS IDENTITY (which PostgreSQL supports and you should prefer) competes with MySQL's AUTO_INCREMENT, SQL Server's IDENTITY(1,1), and the legacy serial pseudo-type. Referential-action support and ALTER capabilities also differ in detail. Appendix J maps the common translations.

The reassuring part is that the thinking — choosing types to model domains precisely, encoding rules as constraints, designing foreign keys with deliberate referential actions, treating the schema as a contract — transfers completely. A well-designed schema is well-designed in any relational database; only the spelling of the CREATE TABLE changes. Learn to design rigorously on PostgreSQL, and moving to another database is a matter of looking up type names and identity syntax, not relearning how to model data. That conceptual portability is the real asset; the syntax is a lookup away. Master the design thinking once, and every relational database becomes a dialect you can pick up rather than a new language you must learn from scratch.


Summary

DDL (CREATE/ALTER/DROP) defines structure. CREATE TABLE turns the relational model into reality: columns with types (the right type is a correctness decision — numeric for money, timestamptz for time, text over varchar(255)) and constraints (NOT NULL, DEFAULT, UNIQUE, PRIMARY KEY, CHECK, FOREIGN KEY with deliberate ON DELETE actions) that make integrity automatic. IDENTITY auto-generates keys via sequences. ALTER TABLE evolves structure — but some changes are risky on populated tables (a NOT NULL add needs a default or backfill), which is why migrations (Chapter 22) are a discipline. DROP removes structures (with CASCADE's real dangers). Schemas namespace objects. The constraints you choose are your data's contract — declaring them well is the heart of design.

You can now: - Write CREATE TABLE with appropriate types and all constraint kinds (column and table). - Choose correct data types (money, time, text, json) and explain why. - Define foreign keys with the right referential actions (CASCADE/RESTRICT/SET NULL). - Auto-generate keys with GENERATED ... AS IDENTITY. - Use ALTER TABLE and recognize which changes are dangerous on live data. - Use DROP safely and understand schemas/search_path.

What's next. Chapter 15 — Views, Materialized Views, and Functions — reusable SQL objects: views that name and hide complex queries, materialized views that cache results, and user-defined functions and triggers that put logic in the database.


Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.