Chapter 24: Query Optimization — Reading EXPLAIN Plans and Making Queries Fast

DataField.Dev

31 min read

> Where you are: Part IV, Chapter 24 of 40. Chapter 23 gave you indexes; this chapter gives you the diagnostic tool — EXPLAIN — that tells you exactly why a query is slow and whether an index (or a rewrite) will fix it. This is the skill that turns...

In This Chapter

From "it's slow" to "here's why"
EXPLAIN vs EXPLAIN ANALYZE
Reading a plan
Scan types — how a single table is read
Join algorithms — how two tables are combined
Reading a plan, deeply
Scan types and join algorithms, in depth
A complete worked diagnosis
The optimizer and statistics
Red flags to look for in a plan
Fixing slow queries
A worked diagnosis
When a query is inherently expensive
Common mistakes
Memory, sorts, and disk spills
Query rewriting for performance
A field guide to plan red flags
Progressive project: profile and fix
Finding slow queries at scale
The optimization mindset
Summary

Exercises Quiz Case Study 01 Case Study 02 Key Takeaways Further Reading

Chapter 24: Query Optimization — Reading EXPLAIN Plans and Making Queries Fast

Where you are: Part IV, Chapter 24 of 40. Chapter 23 gave you indexes; this chapter gives you the diagnostic tool — EXPLAIN — that tells you exactly why a query is slow and whether an index (or a rewrite) will fix it. This is the skill that turns "it's slow" into "here's the problem, here's the fix."

Learning paths: 💻 📊 🔬 🏗️ — everyone who writes queries on real data. Reading a plan is the difference between guessing and knowing.

From "it's slow" to "here's why"

Recall Chapter 1: SQL is declarative — you say what you want; the optimizer (planner) decides how to compute it, choosing among equivalent execution strategies (Chapter 4's algebra). Usually it chooses well. When it doesn't — or when it can't, because you haven't given it an index — queries get slow. EXPLAIN lets you see the plan the optimizer chose, so you can diagnose the problem instead of guessing at it. Mastering EXPLAIN is the core of performance work.

Load sql/generate_data.sql so the plans reflect real data.

EXPLAIN vs EXPLAIN ANALYZE

EXPLAIN <query> shows the optimizer's chosen plan and its cost estimates — without running the query. Fast, safe, but estimates only.
EXPLAIN ANALYZE <query> actually runs the query and reports the real row counts and timings alongside the estimates. The truth, but it executes the query (careful with UPDATE/DELETE — wrap in a transaction you ROLLBACK).

EXPLAIN ANALYZE
SELECT * FROM orders WHERE customer_id = 4267;

Always reach for EXPLAIN ANALYZE when diagnosing — the actual numbers reveal what the estimates miss. Add BUFFERS (EXPLAIN (ANALYZE, BUFFERS)) to also see how much data was read from cache vs. disk.

Reading a plan

A plan is a tree of nodes, each a step (a scan, a join, a sort, an aggregate). Read it inside-out / bottom-up: the most-indented nodes run first, feeding their parents. Each node shows:

   Node Type  (cost=<startup>..<total>  rows=<estimated>  width=<bytes>)
              (actual time=<startup>..<total>  rows=<actual>  loops=<n>)

cost — the optimizer's estimate of work (in arbitrary units; lower = cheaper). The number it minimizes when choosing a plan.
rows — estimated rows this node produces.
actual time / rows (from ANALYZE) — what really happened.

The single most useful diagnostic move: compare estimated rows to actual rows. A large mismatch means the optimizer is working from bad information (stale statistics) and likely chose a poor plan — fixable by ANALYZE (below).

Scan types — how a single table is read

Node	What it does	Good/bad
Seq Scan	read every row, filter	fine for small tables / returning most rows; a red flag on a big table returning few rows (missing index — Ch. 23)
Index Scan	walk the index to the matching rows, fetch each	great for selective lookups
Index Only Scan	answer entirely from the index (covering)	fastest — never touches the table
Bitmap Index Scan + Bitmap Heap Scan	build a bitmap of matching rows from the index, then fetch in physical order	great for medium selectivity (many matches, but not most of the table)

Seeing Seq Scan on orders ... Rows Removed by Filter: 51000000 while returning a handful of rows is the textbook missing-index signature (Chapter 23's anchor).

Join algorithms — how two tables are combined

The optimizer picks among three join strategies (Chapter 6's joins, executed):

Nested Loop — for each row of A, look up matches in B (ideally via an index on B). Great when A is small and B is indexed; catastrophic when both are large and unindexed (it becomes O(n×m)).
Hash Join — build a hash table of one input, probe it with the other. Excellent for joining large, unindexed inputs. Needs memory (work_mem).
Merge Join — sort both inputs by the join key, then merge in one pass. Great when inputs are already sorted (e.g., by an index) or for large joins where sorting is affordable.

A common slow-query cause: a Nested Loop over two large inputs because a join column isn't indexed — add the index and the planner switches to a Hash Join or an indexed Nested Loop, and it flies.

Reading a plan, deeply

A query plan intimidates people because it's dense with numbers, but it has a clear structure, and once you can read it methodically, it becomes the most informative diagnostic tool you have. A plan is a tree of nodes, where each node is one operation — a scan, a join, a sort, an aggregate — and data flows upward from the leaves (table scans) through intermediate operations to the root (the final result). The single most important reading skill is direction: read the plan bottom-up and inside-out. The most-indented nodes are the leaves; they run first and feed their parents. The root at the top is the last step, producing the final rows. Tracing data from the deep, indented nodes upward to the top is how you follow what the query actually does, in execution order.

Each node carries two sets of numbers, and understanding them is the heart of plan-reading. The estimates — cost=startup..total rows=N width=B — are the optimizer's predictions: cost is an abstract measure of work (lower is cheaper; the planner chooses the plan with the lowest total cost), rows is how many rows it expects this node to produce, and width is the estimated bytes per row. The actuals — shown only with EXPLAIN ANALYZE — are what really happened: actual time=startup..total rows=N loops=L. The actual time is in milliseconds and is cumulative (it includes the children's time), rows is the real row count, and loops is how many times this node executed (important for nested loops, where a node runs once per outer row — the per-loop time multiplied by loops is the real cost). Learning to map these numbers to "what work did this step do, and how long did it take" is what turns a wall of text into a diagnosis.

The cost numbers deserve a note because they confuse people. The cost is in arbitrary units, not milliseconds or any real-world measure — it's a relative estimate the planner uses to compare plans, calibrated so that reading one sequential page costs 1.0 by default. You don't interpret cost as time; you use it to understand what the planner thought — which plan it considered cheapest and why. The actual time, by contrast, is real milliseconds, and it's what you use to find the actual bottleneck. So the two number sets serve different purposes: the costs reveal the planner's reasoning (and whether it estimated well), the actuals reveal the reality (and where time truly went). Reading both, and comparing them, is the core diagnostic act.

The most powerful single move in all of plan-reading is comparing estimated rows to actual rows at each node. When they're close, the planner understood the data and likely chose a good plan. When they diverge wildly — the plan estimated 5 rows but 5 million actually flowed through — the planner was working from bad information (usually stale statistics), and it almost certainly chose a poor plan based on that wrong estimate (perhaps a nested loop that's catastrophic for 5 million rows but would've been fine for 5). This estimate-versus-actual gap is the master diagnostic: it instantly tells you whether the planner's information was the problem (fix with ANALYZE) or whether the plan is bad for some other reason. Scanning a plan for the node with the biggest estimate/actual mismatch, and the node with the highest actual time, points you straight at what to fix — which is exactly the methodical reading this chapter is teaching.

Scan types and join algorithms, in depth

The nodes you'll encounter most are scans (how a single table is read) and joins (how two are combined), and recognizing each — knowing when it's appropriate and when it's a red flag — is most of practical plan-reading. The sequential scan (Seq Scan) reads every row of a table and filters. It's the right choice for small tables (where reading everything is already fast) and for queries returning most of a table (where an index would be slower). But a Seq Scan on a large table returning few rows is the textbook signature of a missing index — the plan will show it scanned millions of rows and removed most with a filter, the smell from Chapter 23. The index scan (Index Scan) walks an index to the matching rows and fetches each from the table — excellent for selective queries. The index-only scan (Index Only Scan) answers entirely from the index without touching the table at all (when the index covers all needed columns) — the fastest case. The bitmap scan (Bitmap Index Scan feeding a Bitmap Heap Scan) is a hybrid for medium selectivity: it builds a bitmap of matching row locations from the index, then fetches them in physical table order (efficient for many matches that an index scan's random access would handle poorly).

The three join algorithms each suit different situations, and recognizing which the planner chose — and whether it's the right one — is key to diagnosing slow joins. The nested loop join takes each row from the outer input and looks up matches in the inner input; it's excellent when the outer input is small and the inner is indexed (each lookup is a fast index probe), making it the natural choice for "fetch a few rows, then look up their related rows." But a nested loop over two large inputs without an index on the inner side is catastrophic — it becomes O(n×m), looking up every outer row by scanning the inner table, and this is a common cause of a query that runs for minutes. The hash join builds a hash table from the smaller input and probes it with the larger — excellent for joining two large, unindexed inputs in a single pass, the planner's usual choice for big equi-joins without helpful indexes (it needs memory for the hash, the work_mem setting). The merge join sorts both inputs by the join key and merges them in one pass — great when the inputs are already sorted (by an index) or when sorting is affordable, common for large joins where both sides come pre-ordered.

Reading these in a plan tells a story. A nested loop with a large outer input and a Seq Scan on the inner side screams "missing index on the join column" — add it, and the planner switches to an indexed nested loop or a hash join, and the query flies. A hash join spilling to disk (visible in the plan) suggests insufficient work_mem. A merge join with explicit sorts beneath it might benefit from an index that supplies pre-sorted rows. The planner chooses among these based on its cost estimates, so the algorithm you see reflects what it thought was cheapest given its statistics — and when it chose badly, the estimate/actual gap usually reveals why. The skill is recognizing each node type on sight, knowing its appropriate use, and spotting when the chosen plan implies a missing index or a statistics problem. That recognition — "ah, a nested loop over two big tables, that's the slow part, it needs an index" — is what reading plans is for: it converts a slow query from a mystery into a specific, fixable diagnosis.

A complete worked diagnosis

Let's read a realistic plan end to end, because the integrated skill — taking a slow query, reading its full plan, finding the bottleneck, fixing it, and verifying — is what this chapter exists to build. The query: "for gold-tier customers, count their orders by last name," joining customers and orders, filtering on tier, grouping by name. It's slow, and you run EXPLAIN ANALYZE to find out why.

Suppose the plan shows, reading bottom-up: a Seq Scan on orders (actual time: 8 seconds, actual rows: 5,000,000 — it read the entire orders table) and a Seq Scan on customers with a filter on loyalty_tier, both feeding a Hash Join, which feeds a HashAggregate for the GROUP BY. The bottleneck is immediately visible: the Seq Scan on orders taking 8 seconds and reading all five million rows is where the time goes. Why is it scanning all of orders? Because the join o.customer_id = c.customer_id has no index on orders.customer_id to exploit, so the hash join must read every order to build or probe the hash. The plan didn't just tell you the query is slow — it told you exactly which operation is slow (the orders seq scan) and why (no index to avoid reading all orders for the join).

The fix follows directly from the diagnosis: index the join column. CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id), then ANALYZE orders to refresh the statistics so the planner knows about the new index and the data. Now you re-run the same EXPLAIN ANALYZE to verify. The new plan shows the planner using an Index Scan (or an indexed nested loop) to fetch only the orders belonging to gold customers, rather than scanning all five million — and the actual time drops from 8 seconds to milliseconds. The plan changed (seq scan became index scan) and the time dropped, which together confirm the fix worked. You didn't guess that an index would help; the plan showed you the seq scan, you added the index it implied, and you confirmed by re-reading the plan that the seq scan was gone and the time improved.

This is the complete diagnostic loop, and it's worth naming its three parts because they're the discipline: diagnose (read the plan, find the bottleneck node by actual time, understand why it's slow), fix (apply the indicated remedy — usually an index, sometimes an ANALYZE or a rewrite), and verify (re-read the plan to confirm the bottleneck is gone and the time dropped). Each part matters. Skipping diagnosis means guessing at the slow part (often wrong — the slow node is rarely the one you'd assume). Skipping verification means hoping the fix worked without confirming it (sometimes the index isn't used, or a different bottleneck emerges). The loop — diagnose, fix, verify, all through EXPLAIN ANALYZE — is how performance work is actually done by professionals, and it's entirely learnable. Notice how it embodies theme #5: you don't optimize by intuition or folklore ("joins are slow," "add an index everywhere"); you measure, you read what the database actually did, you fix the specific problem the measurement revealed, and you measure again to confirm. That evidence-driven loop, not any single trick, is the real skill of query optimization.

The optimizer and statistics

The optimizer chooses a plan by estimating the cost of alternatives — and those estimates depend on statistics about your data (how many rows, how many distinct values, value distributions), stored in pg_statistic and gathered by ANALYZE (run automatically by autovacuum, Chapter 28). When statistics are stale — after a big data load or mass update — the optimizer's estimates are wrong, and it can pick a terrible plan (e.g., a nested loop expecting 5 rows when there are 5 million).

ANALYZE orders;          -- refresh statistics for one table
ANALYZE;                 -- refresh all (generate_data.sql runs this for you)

Common mistake. Diagnosing a slow query whose plan looks insane (huge estimate/actual mismatch) and concluding "the optimizer is broken." Usually the statistics are stale. Run ANALYZE and re-check — the plan often fixes itself. This is the first thing to try when estimates and actuals diverge wildly.

Red flags to look for in a plan

When a query is slow, scan the plan for these:

Seq Scan on a large table returning few rows → missing index (Ch. 23).
Big estimate-vs-actual rows mismatch → stale statistics → ANALYZE.
Rows Removed by Filter large → the filter ran after reading too many rows → an index on the filter column would pre-narrow.
A Sort node spilling to disk (Sort Method: external merge Disk: ...) → not enough work_mem, or an index could supply pre-sorted rows.
Nested Loop with a large outer input → likely a missing index on the inner side, or a hash join would be better.
The most expensive node (highest actual time, often deepest) → that's where to focus; optimize the bottleneck, not the cheap nodes.

Fixing slow queries

Once the plan points at the problem, the fixes (in rough order of frequency):

Add the right index (Ch. 23) — turns Seq Scans into Index Scans, enables indexed nested loops. The most common fix.
Run ANALYZE — fixes plans built on stale statistics.
Rewrite the query — push filters earlier (Chapter 4's "selection pushdown"); replace a correlated subquery with a join or window function (Chapters 9, 12); avoid functions on indexed columns (Ch. 8) or add an expression index (Ch. 23).
Increase work_mem — lets sorts and hash joins stay in memory instead of spilling to disk (a session or query setting; raise it cautiously, it's per-operation).
Denormalize / materialize — as a last resort for genuinely expensive aggregations (Chapters 15, 20).

A worked diagnosis

The anchor, through the plan. A slow query:

EXPLAIN ANALYZE
SELECT c.last_name, COUNT(*) AS orders
FROM customers c JOIN orders o ON o.customer_id = c.customer_id
WHERE c.loyalty_tier = 'gold'
GROUP BY c.last_name;

Suppose the plan shows a Seq Scan on orders feeding a Hash Join, with orders fully scanned (millions of rows) even though only gold customers' orders are needed. Diagnosis: there's no index on orders.customer_id, so the join must scan all of orders. Fix:

CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);
ANALYZE orders;

Re-run EXPLAIN ANALYZE: the plan now uses an Index Scan (or indexed nested loop) to fetch only the relevant orders, and runtime drops dramatically. You didn't guess — the plan showed you the Seq Scan, you added the index it implied, and you confirmed the fix by re-reading the plan. That diagnose→fix→verify loop is the whole discipline.

When a query is inherently expensive

An honest part of optimization maturity is recognizing when a query can't be made dramatically faster — when it's slow not because of a missing index or a bad plan, but because it genuinely must process a lot of data. Not every slow query has a clever fix, and knowing the difference between "slow because of a fixable problem" and "slow because the work is genuinely large" saves you from chasing optimizations that don't exist.

Some queries are inherently expensive: aggregating over the entire dataset (a report summing every order ever placed), scanning most of a huge table (a query that genuinely needs 80% of the rows, where an index would be slower than the scan the planner correctly chose), or computing complex analytics across millions of rows. For these, EXPLAIN ANALYZE will show a plan that's already optimal — the right scan type, no missing index, fresh statistics, no disk spills — and yet the query takes seconds because it's doing seconds' worth of legitimate work. Adding an index won't help (the query needs most of the data); rewriting won't help (the work is irreducible). The plan is fine; the data volume is the cost.

When you reach this point — a well-optimized plan that's still slow because the work is large — the remedies shift from "make the query faster" to "avoid running it repeatedly" or "do the work elsewhere." This is exactly where the denormalization tools from Chapter 20 enter: if an expensive aggregation is run constantly over slowly-changing data, precompute it into a materialized view or summary table (Chapter 15) so the expensive query runs once per refresh, not once per request. Or move the heavy analytical work to a separate system optimized for it — a data warehouse (Chapter 34) — rather than burdening the transactional database. Or accept the cost if the query is rare (a monthly report taking 30 seconds is usually fine). The point is that the fix for an inherently-expensive query isn't query optimization at all — it's architecture: caching the result, scheduling the work, or moving it to the right system.

Recognizing this boundary is what keeps you from two errors: endlessly tuning a query that's already optimal (the plan is fine; stop), and accepting a genuinely-slow query that could have been fixed (the plan showed a seq scan; you should have indexed it). The diagnostic loop tells you which situation you're in: if the plan has a red flag (missing index, stale stats, disk spill), fix it; if the plan is clean and the query is still slow, the work is inherent, and the answer is architectural (materialize, schedule, or relocate it). That judgment — distinguishing a fixable plan problem from irreducible work — is the capstone of the optimization mindset, and it's what lets you spend your effort where it actually helps rather than polishing queries that are already as fast as their honest workload allows.

Common mistakes

Optimizing without EXPLAIN — guessing at the slow part instead of reading the plan. The plan tells you exactly where the time goes.
Reading only the top node — the cost is usually deep in the tree; find the node with the highest actual time.
Forgetting ANALYZE after big data changes — stale stats produce bad plans.
Trusting estimates over actuals — always use EXPLAIN ANALYZE to see what really happened.
Premature micro-optimization — fix the one query the plan shows is slow; don't tune queries that are already fast.

Memory, sorts, and disk spills

A category of slow query that indexes alone don't fix involves sorting and hashing large amounts of data, and understanding it rounds out your optimization toolkit. Operations like ORDER BY, GROUP BY (via hashing), hash joins, and DISTINCT need working memory to do their job — a sort needs space to hold and order the rows, a hash join needs space for its hash table. PostgreSQL allocates each such operation an amount of memory governed by the work_mem setting. When the data fits in that memory, the operation runs fast, entirely in RAM. When it doesn't fit, the operation spills to disk — writing intermediate data to temporary files and reading it back — which is dramatically slower than staying in memory.

You spot a disk spill in the plan. A sort node showing Sort Method: external merge Disk: 52000kB is telling you it couldn't fit in memory and spilled 52 megabytes to disk — a performance red flag. The same can happen with hash joins and hash aggregates (Batches: 8 with disk usage indicates the hash spilled). When you see this, the query is slow not because of a missing index but because an operation exceeded its memory budget and fell back to disk. The diagnosis is different from the missing-index case, and so is the fix.

There are several remedies. The most direct is increasing work_mem for the query or session — giving the sort or hash enough memory to stay in RAM. This is a per-operation setting (each sort/hash in a query can use up to work_mem, so a complex query with several can use a multiple of it), which is why you raise it cautiously — set it too high globally and a busy server with many concurrent queries can exhaust total memory. Often you raise it just for a specific heavy query or reporting session, not globally. Another remedy is an index that supplies pre-sorted rows: if a query sorts by a column that has a B-tree index, the planner can sometimes read the rows already in order from the index, skipping the sort entirely — turning an expensive disk-spilling sort into no sort at all. And sometimes the fix is reducing the data being sorted — filtering earlier, selecting fewer columns (less width to sort), or rethinking whether the sort is needed. Recognizing the disk-spill signature in a plan, and knowing that the fix is memory (or a sort-avoiding index) rather than the usual index-the-filter remedy, is what makes you able to optimize the sorting-and-hashing class of slow queries, not just the missing-index class. It's a distinct diagnostic pattern worth having in your repertoire.

Query rewriting for performance

Sometimes the fix isn't an index or more memory but rewriting the query itself into an equivalent form the optimizer handles better — and this connects directly back to Chapter 4's relational algebra and its equivalences. The optimizer does a great deal of rewriting automatically (pushing selections down, reordering joins), but it can't always transform a poorly-expressed query into an optimal one, so knowing the rewrites yourself is occasionally what's needed.

The most impactful rewrites address the patterns earlier chapters flagged. Replacing a correlated subquery with a join or window function (Chapters 9, 12): a correlated subquery that runs once per outer row can be O(n²), and rewriting it as a single join-and-aggregate or a window function can collapse it to one pass — the plan changes from a nested loop running the subquery repeatedly to a single hash aggregate. Avoiding functions on indexed columns in WHERE (Chapter 8): WHERE lower(email) = '...' can't use an index on email, so either rewrite to avoid the function or add an expression index (Chapter 23) — the plan switches from a seq scan to an index scan. Replacing NOT IN with NOT EXISTS (Chapter 9): not just for the NULL-safety, but because the optimizer often handles the anti-join of NOT EXISTS better. Filtering before aggregating or joining rather than after, so less data flows through the expensive operations — the selection-pushdown idea, which the optimizer usually does but which you can help by writing the filter where it belongs.

The deeper point is that equivalent queries can have very different plans, and when the optimizer's automatic rewriting doesn't produce a good plan, expressing the query differently can give it a better starting point. This is where Chapter 4's lesson — that there are many equivalent algebra expressions for one question, and the optimizer searches among them — becomes practically useful: you're giving the optimizer a different, better-shaped expression to work with. The diagnostic loop applies here too: when a query is slow and it's not a missing index or stale statistics, try an equivalent rewrite, run EXPLAIN ANALYZE, and compare the plans. Sometimes a different phrasing of the same question produces a dramatically better plan. This rewriting skill is more advanced than indexing — it requires understanding both the query's intent and how the optimizer thinks — but it's a real tool for the queries that indexing alone can't fix, and it's why understanding the relational algebra beneath SQL (Chapter 4) pays off concretely in performance work.

A field guide to plan red flags

To consolidate the diagnostic skill, here is the checklist of red flags to scan for when reading a plan of a slow query — the patterns that point directly at problems and their fixes. Run a slow query's plan past these and you'll usually find the culprit fast.

A Seq Scan on a large table returning few rows is the classic missing index — the database read everything to find a little, and an index on the filtered or joined column is the fix (Chapter 23). A large estimate-versus-actual rows mismatch signals stale statistics misleading the planner — run ANALYZE and re-check, as the plan often corrects itself. A high Rows Removed by Filter means the query read many rows and then discarded most — an index on the filter column would pre-narrow, avoiding the wasted reads. A Sort or hash node spilling to disk (external merge Disk: or hash Batches: with disk) means an operation exceeded work_mem — raise the memory or supply a pre-sorted index. A Nested Loop with a large outer input usually means a missing index on the inner side (each outer row triggers an expensive inner scan) — index the inner join column so the loop becomes fast probes, or let the planner switch to a hash join. The single most expensive node — the one with the highest actual time, found by scanning the tree — is always where to focus; optimizing a cheap node while ignoring the expensive one wastes effort.

The meta-skill these red flags support is focusing on the bottleneck. A complex plan has many nodes, but the query's slowness almost always comes from one or two of them — the most expensive by actual time. Identifying that node first, then matching its symptom to a red flag and its fix, is far more effective than trying to optimize the whole query at once. This is a specific instance of a general performance truth: optimization is about finding and fixing the bottleneck, not improving everything uniformly. The plan, read for these red flags with attention to the most expensive node, hands you the bottleneck directly. That's the gift of EXPLAIN: it removes the guesswork, showing you precisely where the time goes and, through the red-flag patterns, what to do about it. A practitioner who reads plans fluently doesn't wonder why a query is slow — they see it, in the seq scan or the disk spill or the giant nested loop, and they know the fix. That sight, built from recognizing these red flags, is the core competence of query optimization.

Progressive project: profile and fix

For your project:

Run EXPLAIN ANALYZE on your three most important / slowest queries.
Identify the bottleneck node (highest actual time) and the red flag (Seq Scan? bad estimate? disk sort?).
Apply the fix (index / ANALYZE / rewrite) and re-run EXPLAIN ANALYZE to confirm the plan changed and the time dropped.
Document each: the before-plan, the diagnosis, the fix, the after-plan. This is a portfolio-worthy performance writeup.

Finding slow queries at scale

So far we've assumed you know which query is slow — but in a real application running thousands of queries, finding which queries to optimize is itself a skill, and PostgreSQL provides the tool: the pg_stat_statements extension. Optimizing a query you happened to notice is good; systematically finding the queries that actually consume the most database time is far better, because it directs your effort where it pays off most.

pg_stat_statements tracks every query the database executes, aggregating statistics by query shape: how many times each has run, the total and mean time spent, the rows returned, and cache-hit ratios. Querying it (it's itself a view you SELECT from) lets you rank queries by total time consumed — mean_time × calls — which surfaces the queries that matter most to overall performance. This ranking often surprises people: the slowest individual query might run once a day and barely matter, while a query that's only moderately slow but runs ten thousand times an hour dominates the database's total load. Optimizing the latter — a query you might never have "noticed" as slow — can improve overall system performance far more than fixing the dramatic-but-rare one. The total-time ranking points you at the real opportunities.

This reframes optimization from reactive to systematic. Rather than waiting for a user complaint about a specific slow page, you periodically review pg_stat_statements, identify the top time-consumers, and apply the diagnose-fix-verify loop to each — EXPLAIN ANALYZE the offender, find the bottleneck, fix it, confirm. This is how database performance is managed proactively in production: measure aggregate query load, find the heavy hitters, optimize them, repeat. It's theme #5 at the system level — performance as ongoing, measured competence rather than firefighting. The DBA chapters of Part VII develop this monitoring practice further (along with the other key metrics — cache hit ratios, connection counts, replication lag), but the core idea belongs here with optimization: you can't optimize what you don't measure, and pg_stat_statements is how you measure which queries to optimize, completing the picture of EXPLAIN, which measures why a given query is slow. Together they make optimization a directed, evidence-based practice rather than a guessing game.

The optimization mindset

Stepping back from the mechanics, query optimization is as much a mindset as a set of techniques, and internalizing that mindset is what makes you genuinely good at it. The mindset has a few defining principles, all flowing from theme #5 — performance is basic competence, approached through measurement.

First, measure, never guess. The recurring temptation in performance work is to assume you know why something is slow and start "fixing" based on folklore ("joins are slow," "subqueries are slow," "add indexes everywhere"). This is almost always wasteful and often wrong — the actual bottleneck is rarely where intuition points. The disciplined practitioner reads the plan first, lets the measurement reveal the real problem, and fixes that. EXPLAIN ANALYZE exists precisely to replace guessing with seeing. Second, focus on the bottleneck. A query has one or two operations that dominate its time; optimizing the cheap nodes while ignoring the expensive one accomplishes nothing. Find the highest-actual-time node, fix it, and re-measure — the bottleneck may have moved, revealing the next thing to fix, or the query may now be fast enough to stop. Third, know when to stop. Optimization has diminishing returns; a query that's fast enough for its purpose doesn't need further tuning, and over-optimizing already-adequate queries is wasted effort (and the "premature optimization" theme #5 warns against). Optimize what measurement shows is slow and matters; leave the rest alone.

Fourth, and underlying all of it, understand the why. The reason this chapter spent so long on how the planner thinks, how scans and joins work, how statistics drive estimates, is that understanding the mechanism is what lets you read a plan and know what it means and what to do. A practitioner who's memorized "seq scan bad, index scan good" can fix the simplest cases; one who understands why the planner chose the seq scan (selectivity, statistics, cost), why the nested loop is slow (no inner index), why the sort spilled (work_mem) can diagnose anything, including the cases no checklist covers. This is theme #3 — understand the why, not just the how — and in performance work it's the difference between someone who can fix the textbook cases and someone who can diagnose a novel slow query they've never seen before. The mechanism understanding generalizes; the memorized fixes don't.

This mindset — measure first, focus on the bottleneck, know when to stop, understand the mechanism — is what transforms query optimization from intimidating black art into a methodical, learnable engineering practice. It's why theme #5 insists performance is basic competence: not because it's trivial, but because it's systematic — a repeatable loop of measurement and targeted fixes, grounded in understanding how the database works. Master EXPLAIN, internalize the diagnose-fix-verify loop, develop the mindset, and you can take any slow query and make it fast, or know definitively why it can't be — which is exactly the competence that makes databases serve applications well at scale. The slow-query anchor that ran through Part IV's opening chapters — 45 seconds to 12 milliseconds — is not a magic trick; it's this mindset, applied. And it's now yours.

Summary

EXPLAIN reveals the plan the optimizer chose; EXPLAIN ANALYZE adds the actual row counts and timings (use it for diagnosis). Read the plan bottom-up as a tree of nodes, each showing cost, estimated/actual rows, and time. Recognize scan types (Seq/Index/Index-Only/Bitmap) and join algorithms (Nested Loop / Hash / Merge) and when each is appropriate. The optimizer relies on statistics (ANALYZE) — stale stats cause bad plans, signaled by big estimate-vs-actual mismatches. The red flags — Seq Scan on a big table, large Rows Removed by Filter, disk-spilling sorts, nested loops over large inputs, estimate/actual gaps — point to the fix: an index, an ANALYZE, a rewrite, more work_mem, or materialization. The professional loop is diagnose (read the plan) → fix → verify (re-read the plan) — never guess.

You can now: - Use EXPLAIN and EXPLAIN ANALYZE and read a plan tree bottom-up. - Identify scan types and join algorithms, and when each is good or bad. - Spot the red flags (Seq Scan, estimate/actual mismatch, disk sorts, big nested loops). - Explain the role of statistics and fix bad plans with ANALYZE. - Diagnose a slow query and verify the fix by re-reading the plan.

What's next. Chapter 25 — Table Partitioning — when a single table grows so large that even good indexes strain, you split it into partitions (by range, list, or hash) so the database can prune to just the relevant chunk. Scaling a table beyond a single heap.

Practice in exercises.md, test yourself with the quiz, apply it in the case studies, review the key takeaways, and go deeper with further reading.