Case Study 1 — 45 Seconds to 12 Milliseconds

DataField.Dev

Case Study 1 — 45 Seconds to 12 Milliseconds

The book's signature performance anchor, as a full incident. A dashboard query that quietly degraded as data grew finally became unusable — and one well-chosen index fixed it. This is the most satisfying skill in databases, and the most learnable.

Background

An internal "customer 360" page showed a single customer's recent orders. Early on, with thousands of orders, it was instant. Nobody thought about it. Two years later, with the orders table past 50 million rows, the page took 45 seconds to load — when it didn't time out. Support reps refreshed, gave up, and worked blind. The query was simple:

SELECT order_id, order_date, status
FROM orders
WHERE customer_id = 4267
ORDER BY order_date DESC
LIMIT 20;

It returns at most 20 rows. Why would 20 rows take 45 seconds?

Diagnosis

The on-call engineer ran the one command that answers "why is this slow" — EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT order_id, order_date, status FROM orders
WHERE customer_id = 4267 ORDER BY order_date DESC LIMIT 20;

 Limit  (actual time=44210 ... )
   ->  Sort  (actual time=44210 ... )
         Sort Key: order_date DESC
         ->  Seq Scan on orders  (actual time=0.3..43800 rows=312 ...)
               Filter: (customer_id = 4267)
               Rows Removed by Filter: 51000000        ← the smoking gun
 Planning Time: 0.2 ms
 Execution Time: 44231 ms

The plan said it plainly: a Seq Scan on orders — reading all 51 million rows, throwing away 51,000,000 that didn't match customer_id = 4267, to find the 312 that did, then sorting them. The database had no way to find customer 4267's rows except by examining every row. The 45 seconds was the cost of scanning 50M+ rows on every page load. (Rows Removed by Filter: 51000000 is the classic missing-index signature — Chapter 24.)

The fix

One index on the filtered column:

CREATE INDEX CONCURRENTLY idx_orders_customer_id ON orders (customer_id);

(CONCURRENTLY so building it didn't lock the live table — Chapters 14, 22.) Re-running EXPLAIN ANALYZE:

 Limit  (actual time=0.05..0.12 rows=20 ...)
   ->  Index Scan using idx_orders_customer_id on orders  (actual time=0.04..0.10 rows=312 ...)
         Index Cond: (customer_id = 4267)
 Execution Time: 12 ms

The plan switched to an Index Scan: it jumped straight to customer 4267's rows via the B-tree (O(log n)) instead of scanning 50M rows (O(n)). 45,000 ms → 12 ms — about 3,500× faster. The page became instant.

Making it even better

The engineer noticed the query also sorts by order_date DESC and takes the top 20. A composite index matching the access pattern lets the database skip the sort too:

CREATE INDEX CONCURRENTLY idx_orders_cust_date ON orders (customer_id, order_date DESC);

Now the index supplies customer 4267's rows already in date order, so the LIMIT 20 reads just the first 20 index entries — no sort, no extra work. The plan's Sort node disappears. For a "most recent N per X" query, an index ordered (X, date DESC) is the ideal shape.

The analysis

This is the anchor, and it's basic competence (theme #5). A 3,500× speedup from one index isn't exotic — it's the everyday payoff of knowing that WHERE customer_id = ... on a big table needs an index. Every developer should be able to do this.
EXPLAIN ANALYZE turns "it's slow" into "here's why." The Seq Scan + Rows Removed by Filter: 51000000 told the whole story in one line. Don't guess at performance — read the plan (Chapter 24).
The smell is "scan a huge table, return few rows." Whenever a query fetches a small fraction of a large table and you see a sequential scan, suspect a missing index on the filter/join column.
Match the index to the whole access pattern. The single-column index fixed the filter; the composite (customer_id, order_date DESC) index also eliminated the sort — index design follows how the query filters and orders.
Build it CONCURRENTLY in production. A plain CREATE INDEX would have locked the live orders table while building; CONCURRENTLY avoided a second outage during the fix.
It degraded silently. The query was fine at small scale and got slowly worse — nobody "broke" it. Performance problems often grow with data; profiling top queries periodically (not just when they catch fire) catches them earlier.

Discussion questions

Why did a query returning 20 rows take 45 seconds before the index?
What two pieces of the EXPLAIN ANALYZE output pinpointed the problem?
Explain the before/after plans (Seq Scan vs Index Scan) in O(n) vs O(log n) terms.
Why does the composite (customer_id, order_date DESC) index help beyond the single-column one?
⭐ Why use CONCURRENTLY, and what's its trade-off? Why is "degraded silently" an argument for periodic profiling?