Chapter 28 — Key Takeaways
The big idea
The internals are the because behind all of Part IV. Once you know how PostgreSQL stores and moves data, the rules of thumb (index trade-offs, durability, why long transactions hurt) become consequences you can reason from. Theme #3 (understand the WHY) at its deepest.
Storage
- Data lives in 8 KB pages; a table's rows are in the heap (a set of pages). I/O and caching happen per page — cost ≈ how many pages an operation touches.
- TOAST stores oversized values (big text/jsonb) compressed/out-of-line.
Buffer pool
shared_bufferscaches recently-used pages in RAM. Cache hit ≫ disk read — why the second run of a query is fast, and why RAM for the working set is a top performance lever. (EXPLAIN (ANALYZE, BUFFERS)shows hits vs. reads.)
WAL = durability
- Write-ahead rule: write the change to the WAL (sequential, flushed to disk) before modifying the data page. On
COMMIT, the WAL is durable. - After a crash, replay the WAL from the last checkpoint → recover every committed change. The WAL also powers replication and point-in-time recovery.
- Never disable
fsync/full_page_writesin production (corruption risk).synchronous_commit = offis a conscious trade (lose last fraction-second of commits, no corruption) — only for data you can afford to lose. (Case Study 2.)
MVCC → dead tuples → VACUUM → bloat
UPDATE/DELETEleave dead tuples (old versions). VACUUM/autovacuum reclaim them (space reusable).- Long transactions block VACUUM (their old snapshot pins dead tuples) → bloat (table/indexes grow far beyond live data). (Case Study 1 — a leaked 9-day transaction → 48M dead tuples.)
VACUUMmakes space reusable;VACUUM FULL(heavy, locking, occasional) returns it to the OS.- Safeguards:
idle_in_transaction_session_timeout, short transactions, monitorn_dead_tupand oldest-transaction age, tune autovacuum.
Checkpoints & replication
- Checkpoints flush dirty pages and bound crash-recovery time.
- Replication (streaming/logical) replays the WAL on other servers (HA, read replicas, migrations).
How it explains Part IV
Indexes (fewer pages read / more pages+WAL written) · EXPLAIN costs (pages, cache) · durability (WAL) · isolation (MVCC snapshots) · long transactions hurt (block VACUUM) · partition DROP is cheap (frees whole files vs. marking tuples dead).
You can now…
- ☐ Explain pages/heap/TOAST and per-page I/O.
- ☐ Describe the buffer pool and cold vs. warm cache.
- ☐ Explain WAL-based durability and crash recovery (and why not to disable fsync).
- ☐ Trace MVCC → dead tuples → VACUUM → bloat, and why long transactions cause it.
Looking ahead — Part V
Performance & Internals complete. Chapter 29 — Connecting Applications opens Part V (Application Integration): connecting from Python (psycopg2), parameterized queries, transactions from code, and connection pooling — the seam between your code and the database.
One sentence to carry forward: Everything is pages cached in the buffer pool and protected by the WAL; MVCC leaves dead tuples that VACUUM reclaims — so keep transactions short, keep durability on, and tune autovacuum, and the rest of Part IV's rules follow.