Part IV — Performance and Internals

"Your query returns the right answer. Congratulations. It also does a full table scan on 50 million rows because you forgot an index on the JOIN column. Here is how to find that in 30 seconds and fix it."

There is a moment in every database career when a query that worked perfectly on test data takes 45 seconds in production. Part IV is about that moment — and about making sure it never surprises you again. This is where the book's fifth recurring theme comes fully to life: performance is not premature optimization; it is basic competence. Every developer should be able to read an EXPLAIN plan, recognize a missing index, and understand why a query is slow. None of that is wizardry. It is a small, learnable set of skills, and this part teaches them.

It is also where we lift the hood. To know why an index helps, you need to know what a B-tree is and how PostgreSQL stores rows on disk. To know why two transactions can corrupt each other, you need to understand isolation levels and MVCC. The book's theme of understanding the WHY reaches its deepest point here: the internals are not trivia, they are the explanation for everything you do to make databases fast and correct.

What you will learn

Chapter 23 — Indexing. What an index is (a separate, sorted data structure), B-tree / Hash / GIN / GiST / partial / expression / multi-column indexes, when to create one, and — just as important — when not to. Home of the book's slow-query-diagnosis anchor: a 45-second query becomes 12 milliseconds.
Chapter 24 — Query Optimization. Reading EXPLAIN and EXPLAIN ANALYZE; sequential vs. index vs. bitmap scans; nested-loop vs. hash vs. merge joins; spotting the bottleneck and fixing it.
Chapter 25 — Table Partitioning. Scaling a single huge table by splitting it into partitions (by range, list, and hash), partition pruning, and the maintenance trade-offs — using Mercado's orders table partitioned by date.
Chapter 26 — Transaction Management. ACID in depth; BEGIN/COMMIT/ROLLBACK; savepoints; the four isolation levels and the anomalies (dirty/non-repeatable/phantom reads) each prevents; and PostgreSQL's MVCC.
Chapter 27 — Concurrency Control. Locks (row, table, advisory), deadlocks and how to avoid them, optimistic vs. pessimistic strategies, SELECT ... FOR UPDATE, the lost-update problem, and connection pooling.
Chapter 28 — Database Internals. How PostgreSQL really works: pages and the heap, TOAST, the buffer pool, the write-ahead log (WAL), VACUUM and autovacuum, checkpoints, and replication — the machinery that explains every tuning decision.

Why this part matters

This is the part that makes you valuable. Anyone can write a query that returns the right rows; far fewer can make it return them in time, or guarantee it stays correct when a thousand users hit it at once. The skills here — reading a plan, adding the right index, choosing an isolation level, understanding why VACUUM matters — are the ones that get people promoted and keep systems alive.

For every learning path

🏗️ DBAs should treat all six chapters as core — this is your home turf. 💻 Developers need Chapters 23, 24, and 26 cold; 25, 27, and 28 round out the picture as your systems grow. 🔬 CS students get the algorithmic heart of the field here: B-tree height analysis, join algorithms, MVCC, and the cost-based optimizer. 📊 Analysts will find Chapters 23–24 immediately practical — knowing why your reporting query is slow is a superpower in any organization.

You have learned to write SQL and to design schemas. Now you will learn to make them fast and safe — the difference between a database that works in a demo and one that works in production.

Part IV — Performance and Internals

What you will learn

Why this part matters

For every learning path

Chapters in This Part