Chapter 13: Data Modification — INSERT, UPDATE, DELETE, and UPSERT

DataField.Dev

27 min read

> Where you are: Part II, Chapter 13 of 40. For twelve chapters you've read data. Now you'll change it — add rows, modify them, remove them. With that power comes the most famous footgun in all of databases: the UPDATE (or DELETE) without a WHERE.

In This Chapter

Reading is safe; writing is not
INSERT: adding rows
UPDATE: changing existing rows
DELETE: removing rows
UPSERT: insert or update in one step
Transactions: your safety net (preview of Chapter 26)
A mental model: the four verbs and their blast radius
INSERT in depth
RETURNING: closing the loop on a modification
UPDATE patterns that go beyond the basics
The safe-modification workflow, in full
A performance note (preview of Chapters 23 & 31)
DELETE, foreign keys, and cascades in practice
UPSERT in depth, and MERGE
Idempotency: writes you can safely repeat
Batching large modifications
A worked scenario: placing an order safely
Progressive project: change your data
A field guide to DML disasters
DML across the standard and a note on portability
Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 13: Data Modification — INSERT, UPDATE, DELETE, and UPSERT

Where you are: Part II, Chapter 13 of 40. For twelve chapters you've read data. Now you'll change it — add rows, modify them, remove them. With that power comes the most famous footgun in all of databases: the UPDATE (or DELETE) without a WHERE.

Learning paths: 💻 📊 🔬 🏗️ — everyone. 💻 developers and 🏗️ DBAs especially: safe data modification is a professional habit, not a nicety.

Reading is safe; writing is not

Queries (SELECT) are read-only — run a wrong one and you get a wrong answer, but the data is untouched. Data Modification Language (DML) — INSERT, UPDATE, DELETE — changes the database. A mistake here doesn't return a wrong answer; it corrupts or destroys data. So this chapter is as much about habits (transactions, testing your WHERE first) as about syntax.

The good news: every protection you learned in Chapter 3 is still on duty. Constraints (NOT NULL, CHECK, foreign keys, UNIQUE) guard every modification — you cannot INSERT a negative price or UPDATE a rating to 7. The database defends its integrity even as you change it.

INSERT: adding rows

Insert a single row by naming columns and values:

INSERT INTO customers (first_name, last_name, email, loyalty_tier)
VALUES ('Maya', 'Singh', 'maya.singh@example.com', 'silver');

You don't list customer_id (it's GENERATED ALWAYS AS IDENTITY — PostgreSQL assigns it) or created_at (it has a DEFAULT now()). Columns you omit get their default, or NULL if they have none and allow it.

Insert multiple rows in one statement (faster than many separate inserts):

INSERT INTO categories (name, slug, parent_category_id) VALUES
    ('Tablets', 'tablets', 2),
    ('Monitors', 'monitors', 2),
    ('Keyboards', 'keyboards', 2);

Insert the result of a query with INSERT ... SELECT — e.g., copy rows into an archive table:

INSERT INTO archived_orders (order_id, customer_id, order_date)
SELECT order_id, customer_id, order_date
FROM orders
WHERE status = 'cancelled';

Common mistake. Column/value count or order mismatch. Always name the columns in INSERT (INSERT INTO t (a, b) VALUES (...)), never rely on positional INSERT INTO t VALUES (...) — the latter breaks silently when the table's column order changes (and it's unreadable). Naming columns is the same discipline as naming them in SELECT (Chapter 5).

RETURNING: get back what you inserted

A PostgreSQL superpower: RETURNING hands back columns from the rows you just modified — perfect for capturing a generated id without a second query:

INSERT INTO customers (first_name, last_name, email)
VALUES ('Theo', 'Park', 'theo.park@example.com')
RETURNING customer_id, created_at;

 customer_id |        created_at
-------------+------------------------
          13 | 2026-06-22 10:15:00+00

RETURNING works on UPDATE and DELETE too. It saves a round trip and is invaluable in application code (Chapter 29).

Dialect Difference. RETURNING is PostgreSQL (and Oracle); SQL Server uses OUTPUT; MySQL historically had LAST_INSERT_ID(). See Appendix J.

UPDATE: changing existing rows

UPDATE sets new values for rows matching a WHERE:

UPDATE products
SET price = price * 1.10           -- 10% price increase
WHERE category_id = 5;             -- only Audio products

You can set multiple columns at once, and use the row's current values in the expression:

UPDATE customers
SET loyalty_tier = 'gold',
    phone = COALESCE(phone, 'unknown')
WHERE customer_id = 2;

Update from another table with UPDATE ... FROM — e.g., bump the tier of every customer who has spent over a threshold:

UPDATE customers c
SET loyalty_tier = 'platinum'
FROM (
    SELECT o.customer_id, SUM(oi.quantity * oi.unit_price) AS spent
    FROM orders o JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY o.customer_id
) totals
WHERE totals.customer_id = c.customer_id
  AND totals.spent > 5000;

⚠️ The single most dangerous statement in SQL. An UPDATE with no WHERE updates every row in the table: sql UPDATE products SET price = 0; -- sets EVERY product's price to 0. No undo (unless in a transaction). This is not hypothetical — it is a recurring cause of real production disasters. Before running any UPDATE or DELETE, run the same WHERE as a SELECT first to see exactly which rows you'll affect, and wrap destructive changes in a transaction (below) so you can ROLLBACK.

DELETE: removing rows

DELETE removes rows matching a WHERE:

DELETE FROM reviews WHERE rating IS NULL;
DELETE FROM orders WHERE status = 'cancelled' AND order_date < '2023-01-01';

The same lethal rule applies: DELETE FROM orders; with no WHERE deletes every order. Test the WHERE with a SELECT first, every time.

DELETE also respects foreign keys. Deleting a customer that orders reference will fail (or cascade, depending on the FK's ON DELETE rule — Chapter 14). In Mercado, deleting a customer cascades to their addresses (ON DELETE CASCADE) but is blocked if orders reference them — a deliberate safety design.

DELETE vs TRUNCATE

To empty an entire table, TRUNCATE is far faster than DELETE (it doesn't scan and remove rows one by one; it deallocates the storage):

TRUNCATE TABLE staging_import;                 -- empties the table, fast
TRUNCATE TABLE staging_import RESTART IDENTITY;  -- also reset the id counter

	`DELETE FROM t`	`TRUNCATE t`
Removes	rows matching `WHERE` (or all)	all rows (no `WHERE`)
Speed on whole table	slow (row by row)	very fast
Resets identity	no	optionally (`RESTART IDENTITY`)
Fires row triggers	yes	no
Can target some rows	yes (`WHERE`)	no

Common mistake. Reaching for TRUNCATE when you only meant to delete some rows — TRUNCATE has no WHERE and removes everything. Use DELETE ... WHERE for selective removal; TRUNCATE only when you genuinely want the whole table empty (typically staging/temp tables).

UPSERT: insert or update in one step

A common need: "insert this row, but if it already exists, update it instead." Doing it with a check-then-insert is racy (two sessions can both check, both insert, one fails). PostgreSQL's INSERT ... ON CONFLICT ("upsert") does it atomically:

-- Set/refresh inventory for a (product, warehouse); insert if new, update if exists
INSERT INTO inventory (product_id, warehouse_id, quantity)
VALUES (1, 1, 50)
ON CONFLICT (product_id, warehouse_id)        -- the unique/PK columns that may clash
DO UPDATE SET quantity = EXCLUDED.quantity;    -- EXCLUDED = the row we tried to insert

ON CONFLICT (cols) names the unique constraint/columns that might collide.
DO UPDATE SET ... updates the existing row; EXCLUDED refers to the values from the attempted insert.
DO NOTHING silently skips the conflict (insert-if-absent): sql INSERT INTO categories (name, slug) VALUES ('Books', 'books') ON CONFLICT (slug) DO NOTHING;

Upsert is the right tool for idempotent loads, "set this setting," counters, and sync jobs — anywhere you'd otherwise check-then-write.

Dialect Difference. ON CONFLICT is PostgreSQL/SQLite; MySQL uses INSERT ... ON DUPLICATE KEY UPDATE; the SQL standard MERGE is supported by SQL Server, Oracle, and PostgreSQL 15+. See Appendix J.

Transactions: your safety net (preview of Chapter 26)

Every modification happens inside a transaction — by default, each statement is its own auto-committed transaction. But you can group statements and, crucially, undo them:

BEGIN;                                  -- start a transaction
UPDATE products SET price = price * 1.10 WHERE category_id = 5;
SELECT name, price FROM products WHERE category_id = 5;   -- check the result
-- happy? COMMIT;   -- not happy? ROLLBACK;  (undoes everything since BEGIN)
ROLLBACK;

Two habits this enables, both career-saving:

Wrap risky changes in BEGIN ... and verify before COMMIT. If the result is wrong, ROLLBACK and nothing happened.
Atomicity: multiple related changes (debit one account, credit another) either all happen or none do. This is the heart of Chapter 26; for now, know that BEGIN/COMMIT/ROLLBACK are your undo button for data changes.

Why this matters. The difference between a junior who runs UPDATE and prays, and a professional who runs the SELECT first, wraps it in BEGIN, verifies, then COMMITs, is the difference between an occasional catastrophe and a calm career. Make the safe sequence your reflex.

A mental model: the four verbs and their blast radius

Data modification has only four verbs — INSERT, UPDATE, DELETE, and TRUNCATE — and the most useful way to hold them in mind is by their blast radius: how much damage a mistake can do. INSERT is the gentlest; a wrong insert adds bad rows, but the existing data is untouched and the mistake is usually reversible by deleting what you added. UPDATE and DELETE are dangerous in direct proportion to their WHERE clause — a precise WHERE touches a few rows, a missing WHERE touches all of them, irreversibly (outside a transaction). TRUNCATE is the bluntest instrument of all: it empties an entire table by design, with no WHERE to limit it and, in most configurations, no row-level undo.

This blast-radius framing isn't just a mnemonic — it should govern how carefully you treat each statement. An INSERT you can run with relative confidence. An UPDATE or DELETE deserves the full safety ritual every single time: preview the WHERE with a SELECT, wrap in a transaction, verify, then commit. A TRUNCATE deserves a moment of genuine pause and a double-check that you're on the right table in the right database — because it's both fast and final. The amount of caution should scale with the radius, and internalizing that scale is what keeps the dangerous verbs from becoming career-defining mistakes.

The other half of the model is that every modification runs the gauntlet of the database's integrity rules. When you INSERT or UPDATE, PostgreSQL checks the row against every relevant constraint — types, NOT NULL, CHECK, UNIQUE, foreign keys — and rejects the entire statement if any rule is violated. When you DELETE, it checks whether other rows depend on what you're removing (foreign keys) and either blocks, cascades, or nullifies per the referential action. This means your data's integrity is defended even during modification: you cannot, no matter how careless the statement, leave the database in a state that violates its declared rules. The constraints you designed in Chapter 3 are working for you on every write. That safety net is real and valuable — but it protects structural integrity, not intent. A DELETE FROM orders; with no WHERE violates no constraint; it's perfectly valid SQL that simply does something catastrophic. The constraints can't save you from a correct statement that does the wrong thing — only your habits can.

INSERT in depth

INSERT looks simple, but a few capabilities make it far more powerful than "add one row," and knowing them saves you from clumsier alternatives. The first is multi-row insertion: listing several tuples of values in one statement is not just tidier than separate inserts — it's substantially faster, because the whole batch is one statement, one trip to the database, one transaction, rather than N of each. For loading a handful to a few thousand rows, multi-row INSERT is the right tool (beyond that, COPY from Chapter 31 takes over).

The second is INSERT ... SELECT, which inserts the result of a query — the foundation of archiving, copying, and transforming data between tables. "Copy all cancelled orders into an archive table" is a single INSERT INTO archive SELECT ... WHERE status = 'cancelled'. Because the source is a full query, you can join, filter, aggregate, and transform on the way in — INSERT ... SELECT is how a great deal of in-database data movement happens, and it's the seed of the ETL patterns in Chapter 31. The third is defaults and generated values: columns you omit from the column list take their DEFAULT (or NULL), which is why you never list customer_id (it's GENERATED ALWAYS AS IDENTITY) or created_at (it defaults to now()). Letting the database fill these is both convenient and correct — the identity column guarantees uniqueness in a way you couldn't reliably manage by hand.

The cardinal habit for INSERT is to always name your columns explicitly. INSERT INTO customers (first_name, last_name, email) VALUES (...) is self-documenting and robust; the positional INSERT INTO customers VALUES (...) is fragile — it breaks silently the day someone adds or reorders a column, suddenly mapping your values to the wrong fields. This is the same discipline as naming columns in SELECT rather than using *: explicitness costs a few keystrokes and buys resilience against schema change. In production code especially, named-column INSERT is non-negotiable, because the schema will evolve (Chapter 22), and positional inserts are a time bomb waiting for that evolution.

RETURNING: closing the loop on a modification

RETURNING is one of PostgreSQL's most beloved extensions, and it deserves emphasis because it elegantly solves a problem that's awkward in databases without it: knowing what your modification actually did. When you insert a row whose primary key is database-generated, you often need that new id immediately — to insert related rows, to return it to the application, to log it. Without RETURNING, you'd insert and then run a second query to find the id you just created — which is both wasteful and racy (between the insert and the select, another session might insert too). RETURNING hands the generated values straight back from the insert itself, atomically and in one round trip.

-- Insert an order and immediately get its generated id and timestamp:
INSERT INTO orders (customer_id, status)
VALUES (7, 'pending')
RETURNING order_id, order_date;

RETURNING works on all three of INSERT, UPDATE, and DELETE, and it returns the affected rows' columns — including computed and defaulted ones. On an UPDATE it can return both the changed rows and any computed columns; on a DELETE it returns what was removed, which is invaluable for logging or for the "move rows between tables" pattern (delete with RETURNING, feeding an insert — the writable-CTE trick from Chapter 11). In application code (Chapter 29), RETURNING is the clean way to get a newly-created record's server-assigned fields back without a follow-up query, and it's a major reason PostgreSQL feels pleasant to build on. Whenever you find yourself about to write "insert, then select to find what I inserted," RETURNING is the answer.

UPDATE patterns that go beyond the basics

Real UPDATE work often needs more than "set this column to this constant." Three patterns cover most of it. The first is expressions referencing the current value: SET price = price * 1.10 raises each matched row's price by 10%, computing from the row's existing data. The second is conditional updates with CASE: you can set different values per row in a single statement, like tiering customers by spend in one pass with a CASE in the SET. The third, and most powerful, is UPDATE ... FROM, which updates one table using data joined from another (or from a subquery) — the SQL way to "set each customer's tier based on their computed total spend," where the spend comes from aggregating their orders.

-- Update customers' tiers from a computed-spend subquery (UPDATE ... FROM):
UPDATE customers c
SET loyalty_tier = 'platinum'
FROM (SELECT customer_id, SUM(total_amount) AS spent FROM orders GROUP BY customer_id) t
WHERE t.customer_id = c.customer_id AND t.spent > 5000;

UPDATE ... FROM is the workhorse for data-correction and enrichment tasks — propagating a computed value, syncing one table from another, applying a bulk reclassification. Its one subtlety is that the join in the FROM/WHERE must match each target row to at most one source row; if the join produces multiple matches per target, which one "wins" is unspecified, so ensure your join keys are unique on the source side. As with all UPDATEs, the safe-modification ritual applies doubly here, because UPDATE ... FROM can touch many rows in ways that are hard to eyeball: preview with the equivalent SELECT (turn the UPDATE ... FROM ... WHERE into a SELECT ... FROM ... JOIN ... WHERE to see exactly which rows match and what they'd become) before you run the real thing.

The safe-modification workflow, in full

The chapter has mentioned the safety ritual repeatedly; it's worth laying out completely, because it's the single most valuable professional habit in this entire chapter — more valuable than any syntax. Here is the sequence a careful practitioner runs for any non-trivial UPDATE or DELETE, especially on production data.

First, preview with a SELECT. Take the exact WHERE clause of your intended modification and run it as SELECT * FROM table WHERE <your condition>. Look at the rows. Are they the ones you meant? Is the count what you expected? This single step catches the overwhelming majority of mistakes — the forgotten condition, the wrong operator, the NULL interaction that includes or excludes more than you thought — before any data changes. Second, wrap in a transaction. BEGIN; before the modification means nothing is permanent until you say so. Third, run the modification and verify. Execute the UPDATE/DELETE, then run a SELECT to confirm the result is what you intended — check the changed rows look right, and that the row count affected matches your preview. Fourth, commit or roll back. If everything checks out, COMMIT;. If anything looks wrong — wrong rows, wrong count, wrong values — ROLLBACK; and the database is exactly as it was before you started, as if nothing happened.

-- The professional reflex, every time:
SELECT count(*) FROM products WHERE category_id = 5;   -- 1. preview the target
BEGIN;                                                  -- 2. safety net on
UPDATE products SET price = price * 1.10 WHERE category_id = 5;   -- 3. modify
SELECT name, price FROM products WHERE category_id = 5; -- 3. verify
COMMIT;   -- 4. happy → commit;   unhappy → ROLLBACK;

This workflow feels slow the first few times and then becomes second nature, adding only seconds while eliminating the category of mistake that ends careers and causes 3 a.m. incident calls. The difference between a junior who types UPDATE ... and hits enter hopefully, and a senior who previews, wraps, verifies, and commits, is exactly this ritual — and it's entirely learnable. Make it your unbreakable habit now, on Mercado where mistakes are harmless, so that it's automatic later when the data is real and the stakes are high. The constraints protect structural integrity; this workflow protects against valid-but-wrong statements, which is the danger the database can't catch for you.

A performance note (preview of Chapters 23 & 31)

Bulk inserts: many single-row INSERTs are slow (each is a round trip and a transaction). A multi-row INSERT, or better COPY (Chapter 31), loads data orders of magnitude faster.
Indexes cut both ways: indexes speed up the WHERE of an UPDATE/DELETE (finding the rows), but every index must be maintained on every write, so heavily-indexed tables write more slowly (Chapter 23).
Big UPDATE/DELETE on huge tables can lock rows and generate lots of WAL; sometimes batching (delete 10,000 at a time in a loop) is gentler on a busy system (Chapters 27–28).

DELETE, foreign keys, and cascades in practice

DELETE is where the relationships you designed in Chapter 3 assert themselves most visibly, and understanding the interaction is essential to deleting safely. When you delete a row that other rows reference through a foreign key, the database doesn't just let the references dangle — it enforces the referential action declared on that foreign key, and the behavior differs by design.

If the foreign key is ON DELETE RESTRICT (or the default NO ACTION), the delete is blocked whenever dependent rows exist — you cannot delete a customer who has orders, because the orders reference them and deleting the customer would orphan financial records. This is a feature: it prevents you from accidentally destroying the parent of existing children. If the foreign key is ON DELETE CASCADE, the delete propagates — deleting the parent automatically deletes its children. Mercado deletes a customer's addresses this way, because an address has no meaning without its customer. And ON DELETE SET NULL unlinks — the children survive but their foreign key becomes NULL, appropriate when the relationship is optional. The action you encounter is the one designed into the schema, encoding a real decision about the data's lifecycle.

This means a DELETE can have consequences far beyond the rows you named. A single DELETE FROM customers WHERE customer_id = 5 could, through a chain of cascades, remove that customer's addresses, their cart, their saved searches — everything declared to cascade. That ripple is exactly why CASCADE must be designed deliberately (Chapter 14) and why deleting from a table with cascading children warrants extra care: the blast radius includes everything downstream. Before deleting from a table that parents cascading relationships, it's worth knowing what will go with it. The safe-modification ritual extends here: previewing a cascading delete means understanding the whole subtree it will take, not just the row you typed. When in doubt, a transaction lets you delete, inspect the damage across all affected tables, and roll back if the ripple was larger than intended.

UPSERT in depth, and MERGE

The "insert if new, update if it exists" need is so common — and so easy to get wrong with a naive check-then-insert — that it deserves a closer look. The naive approach, "SELECT to check if the row exists, then INSERT or UPDATE based on the result," has a fatal flaw under concurrency: between your check and your write, another session can insert the same row, and your insert then fails (or, worse, you both proceed and corrupt data). This is a race condition, and it's precisely the kind of bug that passes every test and then strikes in production under load.

INSERT ... ON CONFLICT solves it atomically — the check and the action are a single indivisible operation that the database serializes correctly. ON CONFLICT (columns) DO UPDATE updates the existing row when the specified unique constraint would be violated, with the special EXCLUDED pseudo-row giving access to the values you tried to insert. ON CONFLICT (columns) DO NOTHING simply skips the conflict, which is the clean way to express "insert this if it's not already there." The columns named in ON CONFLICT must match an actual unique constraint or primary key — that's what defines "already exists."

-- Idempotent inventory set: insert, or overwrite the quantity if the pair exists.
INSERT INTO inventory (product_id, warehouse_id, quantity)
VALUES (1, 1, 50)
ON CONFLICT (product_id, warehouse_id)
DO UPDATE SET quantity = EXCLUDED.quantity;

PostgreSQL 15 also added the SQL-standard MERGE statement, a more general (and more verbose) tool that can INSERT, UPDATE, and DELETE in one statement based on whether source rows match target rows — useful for complex synchronization where the three actions are all needed at once. For the common "insert-or-update" case, ON CONFLICT is simpler and remains the idiomatic PostgreSQL choice; reach for MERGE when you genuinely need the three-way matched/not-matched logic, such as reconciling a target table against a source feed where some rows are new, some changed, and some gone. Both exist because upsert-style operations are everywhere in real systems — sync jobs, counters, settings, idempotent loaders — and doing them correctly under concurrency requires database-level atomicity, not application-level check-then-act.

Idempotency: writes you can safely repeat

A concept that DML makes concrete, and that matters enormously in real systems, is idempotency: an operation is idempotent if running it twice has the same effect as running it once. This sounds academic until you consider how often writes get retried in practice — a network blip makes a client unsure whether its request succeeded, so it retries; a batch job crashes halfway and restarts; a message queue delivers the same event twice. If your writes aren't idempotent, these ordinary occurrences cause duplicate rows, double-counted totals, and corrupted data.

INSERT ... ON CONFLICT DO NOTHING and DO UPDATE are the primary tools for idempotent writes. "Record that this user accepted the terms" written as a plain INSERT creates a duplicate if it runs twice; written as INSERT ... ON CONFLICT DO NOTHING, the second run is a harmless no-op. "Set this product's inventory to 50" written as ON CONFLICT DO UPDATE produces the same final state whether it runs once or five times. Designing writes to be idempotent — so that retries, restarts, and duplicate deliveries are safe — is a hallmark of robust systems, and the database features in this chapter are what make it achievable. Chapter 31 (ETL) leans heavily on this: a well-built data pipeline is idempotent precisely so that re-running it after a failure cleans up rather than duplicates.

The broader lesson is to think about writes not as one-time events but as operations that might happen more than once, and to design them so that's safe. Sometimes that means ON CONFLICT; sometimes it means a natural key that prevents duplicates; sometimes it means recording an "already processed" marker. The instinct to ask "what happens if this runs twice?" before deploying a write is one that separates systems that survive contact with the messy real world from those that quietly accumulate corruption. DML is where that instinct gets exercised, and the tools to satisfy it are exactly the upsert features above.

Batching large modifications

A subtlety that bites at scale: a single UPDATE or DELETE affecting millions of rows is not always the right approach, even when it's correct. Such a statement locks all the affected rows for the duration of the transaction, generates a large amount of write-ahead log (Chapter 28), and holds those locks until it commits — which on a busy production system can block other queries, balloon disk usage, and cause timeouts. The correct answer is sometimes correct in logic but wrong in operations.

The remedy is batching: instead of one giant statement, run many small ones, each affecting a manageable chunk (say, 10,000 rows), committing between them. This keeps each transaction short, releases locks frequently so other work can proceed, and keeps WAL generation steady rather than spiking. A typical pattern deletes or updates in a loop, processing a bounded number of rows per iteration until none remain:

-- Delete old rows in batches of 10,000, committing each batch (run repeatedly,
-- or in a procedural loop), so no single transaction locks the whole table.
DELETE FROM events
WHERE event_id IN (
    SELECT event_id FROM events WHERE created_at < '2020-01-01' LIMIT 10000
);
-- repeat until 0 rows affected

This is gentler on a live system than a single DELETE FROM events WHERE created_at < '2020-01-01' that might lock the table for minutes and generate gigabytes of WAL at once. Batching trades a little total throughput for dramatically better concurrency behavior — other users barely notice the maintenance happening. It's a preview of the operational thinking in Part IV (transactions, locking) and Chapter 22 (safe migrations), where "the statement is correct" and "the statement is safe to run on production at 2 p.m." turn out to be different bars. For now, register the principle: huge single modifications are a smell on busy systems, and batching is the usual cure.

A worked scenario: placing an order safely

Let's tie the chapter together with the operation Mercado performs most: placing an order. It's a perfect DML case study because it involves multiple related writes that must all succeed or all fail together — the very definition of why transactions exist. Placing an order means: insert the order, insert its line items (referencing the new order's id), decrement inventory for each product, and record a payment. If any step fails — say, an item is out of stock — none of it should happen; a half-placed order is worse than no order.

The solution wraps everything in a transaction and uses RETURNING to thread the new order's id into the line items:

BEGIN;

-- 1. Create the order, capturing its generated id.
WITH new_order AS (
    INSERT INTO orders (customer_id, status)
    VALUES (7, 'pending')
    RETURNING order_id
)
-- 2. Insert the line items, referencing the new order id.
INSERT INTO order_items (order_id, product_id, quantity, unit_price)
SELECT new_order.order_id, p.product_id, v.qty, p.price
FROM new_order
JOIN (VALUES (1, 2), (7, 1)) AS v(product_id, qty) ON true
JOIN products p ON p.product_id = v.product_id;

-- 3. Decrement inventory (would fail a CHECK if it went negative — good).
UPDATE inventory SET quantity = quantity - 2 WHERE product_id = 1 AND warehouse_id = 1;
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 7 AND warehouse_id = 1;

-- 4. Record the payment, then finalize.
INSERT INTO payments (order_id, amount, method)
SELECT order_id, /* computed total */ 0, 'card' FROM orders WHERE customer_id = 7 AND status = 'pending';

COMMIT;   -- all four steps commit together; any failure → ROLLBACK undoes everything

The crucial property is atomicity: because all four steps are inside one BEGIN ... COMMIT, they form a single unit. If the inventory UPDATE violated a CHECK (quantity >= 0) constraint because an item was oversold, the entire transaction aborts — no order, no line items, no inventory change, no payment. The customer simply gets "out of stock," and the database is exactly as it was before. This is the bedrock guarantee that makes databases trustworthy for business operations, and it's why the writable-CTE and RETURNING features matter: they let you compose a multi-step operation that's also a single atomic unit. Chapter 26 formalizes the transaction theory; here you've seen why it's indispensable — real operations are almost always several writes that must stand or fall together.

Progressive project: change your data

In your project:

INSERT a few rows (single and multi-row), using RETURNING to capture a generated id.
Write an UPDATE — and first run its WHERE as a SELECT to confirm the target rows.
Write a DELETE with a WHERE, wrapped in BEGIN ... ROLLBACK so you can test it safely.
Use an upsert (ON CONFLICT) for an idempotent operation (e.g., "set or update a setting/inventory count").
Note where you'd use TRUNCATE (staging tables only).

Add these to project-notes.md.

A field guide to DML disasters

Some mistakes in data modification are common enough, and damaging enough, to warrant a dedicated catalogue. Each has a clear cause and a clear prevention, and recognizing them is what keeps DML from biting you.

The missing WHERE. The archetypal disaster: UPDATE products SET price = 0 or DELETE FROM orders with no WHERE changes or destroys every row. It's caused by running a statement before finishing it, or by deleting the WHERE while editing. Prevention is the safe-modification ritual — preview with a SELECT, work inside a transaction — plus a healthy fear of the bare UPDATE/DELETE. Many teams configure their SQL clients to refuse UPDATE/DELETE without a WHERE as a backstop.

The wrong WHERE. Subtler and more common than the missing WHERE: a condition that matches more (or fewer) rows than intended — an OR where you meant AND, a NULL interaction that includes unexpected rows (three-valued logic, Chapter 5), a join in UPDATE ... FROM that matches multiple source rows per target. Prevention is the same preview-SELECT, with attention to the row count: if you expected to update 12 rows and the preview shows 1,200, stop.

Positional INSERT after a schema change. INSERT INTO t VALUES (...) without named columns silently maps values to the wrong columns the day someone reorders or adds a column. Prevention: always name columns in INSERT.

TRUNCATE when you meant selective DELETE. TRUNCATE has no WHERE and empties the whole table, fast and (usually) unrecoverably. Prevention: reserve TRUNCATE for tables you genuinely want fully emptied (staging, temp), and use DELETE ... WHERE whenever you mean "some rows."

NOT IN with NULL in a modification's subquery. DELETE FROM x WHERE id NOT IN (SELECT ... ) returns nothing — deletes nothing — if the subquery yields a NULL, the same trap from Chapters 3, 5, 6, and 9, now with destructive stakes. Prevention: NOT EXISTS for anti-join deletes.

The un-batched mega-modification. A correct UPDATE/DELETE over millions of rows that locks a production table for minutes. Prevention: batch it. Run any of these past the safe-modification workflow and the danger evaporates; skip the workflow and you're one keystroke from an incident. The workflow exists precisely because these mistakes are easy to make and expensive to make.

DML across the standard and a note on portability

The core DML statements are among the most portable in SQL: INSERT, UPDATE, and DELETE with WHERE work essentially identically across PostgreSQL, MySQL, SQL Server, Oracle, and SQLite. The basic shape you've learned transfers cleanly, which is reassuring — the everyday work of changing data looks the same nearly everywhere. The divergences cluster in the conveniences and extensions.

RETURNING is PostgreSQL's (and Oracle's) way to get modified rows back; SQL Server uses the OUTPUT clause; MySQL historically offered only LAST_INSERT_ID() for the narrow case of a single insert's generated id. Upsert is even more divided: PostgreSQL and SQLite use INSERT ... ON CONFLICT, MySQL uses INSERT ... ON DUPLICATE KEY UPDATE, and the SQL-standard MERGE is supported by SQL Server, Oracle, and (since 15) PostgreSQL. TRUNCATE exists widely but varies in details like identity reset and whether it can be rolled back. UPDATE ... FROM is PostgreSQL syntax; other databases spell multi-table updates differently. Appendix J maps these.

The practical takeaway is that the concepts — insert, update, delete, upsert, returning modified data, atomic transactions — are universal, while the spellings of the advanced features differ. Learn the concepts deeply on PostgreSQL, know which features are extensions versus standard, and adapting to another database is a matter of looking up syntax, not relearning ideas. And the most important thing in this chapter — the safe-modification workflow — is database-agnostic entirely. Preview, wrap, verify, commit works the same everywhere, because it's a discipline, not a feature. That discipline is the one thing from this chapter you should carry, unchanged, into every database you ever touch.

It's fitting that the chapter where you gained the power to change data is also the chapter most insistent about habits, because the two are inseparable. Reading data wrong wastes a moment; writing data wrong can cost a company its records and you your weekend. The syntax of INSERT, UPDATE, DELETE, and upsert is genuinely easy — you could learn it in an hour. What takes professional maturity is the surrounding discipline: previewing before you modify, transacting around anything risky, designing writes to be idempotent, batching what's large, and respecting the blast radius of each verb. Those habits don't show up in a syntax reference, but they are what separate someone who can be trusted with production data from someone who can't. Learn them here, on Mercado, until they're reflex.

Summary

DML changes data, so habits matter as much as syntax. INSERT adds rows (name your columns; multi-row and INSERT ... SELECT are efficient; omitted columns take defaults). RETURNING hands back modified rows (e.g., a generated id). UPDATE sets new values for matching rows (and can update FROM another table) — never without a WHERE. DELETE removes matching rows — same warning; it respects foreign keys. TRUNCATE empties a whole table fast (no WHERE, optional RESTART IDENTITY). INSERT ... ON CONFLICT (upsert) atomically inserts-or-updates. And transactions (BEGIN/COMMIT/ROLLBACK) are your undo button — wrap risky changes, verify, then commit. The professional reflex: test the WHERE with a SELECT, wrap in a transaction, verify, commit.

You can now: - INSERT single rows, many rows, and query results; capture output with RETURNING. - UPDATE rows (including from another table) safely, always with a verified WHERE. - DELETE selectively and understand DELETE vs TRUNCATE. - Perform atomic upserts with ON CONFLICT DO UPDATE/NOTHING. - Use BEGIN/COMMIT/ROLLBACK to make risky changes reversible.

What's next. Chapter 14 — Data Definition — CREATE, ALTER, DROP: the DDL that builds the tables you've been modifying. You'll learn PostgreSQL's data types and constraints in depth, sequences, and schemas — turning the relational-model concepts of Chapter 3 into real, well-constrained tables.

Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.