Chapter 23 — Key Takeaways

DataField.Dev

Chapter 23 — Key Takeaways

The big idea

An index turns finding rows from an O(n) scan into an O(log n) lookup — the single most impactful performance tool, and (theme #5) basic competence, not wizardry. The anchor: 45 s → 12 ms from one index. (Case Study 1.)

What an index is

A separate, sorted data structure (like a book's index) for finding rows by value without scanning the table.
The B-tree (default) stays shallow (logarithmic height), serving equality, ranges, sorting, and prefix LIKE.

When an index helps

Columns in WHERE (equality/range), JOIN conditions (index your foreign keys — #1 cause of slow joins), ORDER BY/GROUP BY, and UNIQUE constraints.

Diagnosing a missing index

EXPLAIN ANALYZE → a Seq Scan on a large table with Rows Removed by Filter huge while returning few rows = missing index. Add it → plan switches to Index Scan. (Chapter 24 covers plans fully.)

Index types

Type	For
B-tree (default)	equality, ranges, sorting, prefix
GIN	JSONB `@>`, arrays, full-text (Ch. 16)
GiST	ranges, spatial, nearest-neighbor
BRIN	huge naturally-ordered tables (time-series)
multi-column	leftmost-prefix (order matters!)
partial	a subset of rows (`WHERE ...`)
expression	computed values (`lower(email)` — fixes function-on-column)
covering/`INCLUDE`	index-only scans (answer from the index alone)

The cost — and when NOT to index

Indexes slow writes (every write maintains every index), use storage, and need maintenance. Don't index: small tables, low-cardinality columns (write-heavy), rarely-queried columns, or queries returning most of the table. Twenty indexes on a write-heavy table = twenty taxes per insert; drop unused ones (pg_stat_user_indexes, idx_scan = 0). (Case Study 2.)

The balance

Too few indexes → slow reads (Case Study 1). Too many → slow writes + bloat (Case Study 2).
Add an index in response to a measured slow query; review and prune periodically. Justify each index with a real query.
Build with CREATE INDEX CONCURRENTLY in production (no write lock).

You can now…

☐ Explain O(log n) vs O(n) and what an index is.
☐ Diagnose a missing index (Seq Scan smell) and add the right one.
☐ Choose among B-tree/GIN/GiST/BRIN/multi-column/partial/expression/covering.
☐ State index costs and when not to index; find and drop unused indexes.
☐ Build indexes safely (CONCURRENTLY).

Looking ahead

Chapter 24 — Query Optimization. Reading EXPLAIN plans in depth — scan types, join algorithms, cost estimates — to diagnose why any query is slow and fix it.

One sentence to carry forward: Index the columns you filter, join, and sort on — diagnosing the missing one is the 45s→12ms skill — but every index taxes writes, so add them for measured reads and prune the ones no query uses.