Chapter 28 — Key Takeaways

The big idea

The internals are the because behind all of Part IV. Once you know how PostgreSQL stores and moves data, the rules of thumb (index trade-offs, durability, why long transactions hurt) become consequences you can reason from. Theme #3 (understand the WHY) at its deepest.

Storage

  • Data lives in 8 KB pages; a table's rows are in the heap (a set of pages). I/O and caching happen per page — cost ≈ how many pages an operation touches.
  • TOAST stores oversized values (big text/jsonb) compressed/out-of-line.

Buffer pool

  • shared_buffers caches recently-used pages in RAM. Cache hit ≫ disk read — why the second run of a query is fast, and why RAM for the working set is a top performance lever. (EXPLAIN (ANALYZE, BUFFERS) shows hits vs. reads.)

WAL = durability

  • Write-ahead rule: write the change to the WAL (sequential, flushed to disk) before modifying the data page. On COMMIT, the WAL is durable.
  • After a crash, replay the WAL from the last checkpoint → recover every committed change. The WAL also powers replication and point-in-time recovery.
  • Never disable fsync/full_page_writes in production (corruption risk). synchronous_commit = off is a conscious trade (lose last fraction-second of commits, no corruption) — only for data you can afford to lose. (Case Study 2.)

MVCC → dead tuples → VACUUM → bloat

  • UPDATE/DELETE leave dead tuples (old versions). VACUUM/autovacuum reclaim them (space reusable).
  • Long transactions block VACUUM (their old snapshot pins dead tuples) → bloat (table/indexes grow far beyond live data). (Case Study 1 — a leaked 9-day transaction → 48M dead tuples.)
  • VACUUM makes space reusable; VACUUM FULL (heavy, locking, occasional) returns it to the OS.
  • Safeguards: idle_in_transaction_session_timeout, short transactions, monitor n_dead_tup and oldest-transaction age, tune autovacuum.

Checkpoints & replication

  • Checkpoints flush dirty pages and bound crash-recovery time.
  • Replication (streaming/logical) replays the WAL on other servers (HA, read replicas, migrations).

How it explains Part IV

Indexes (fewer pages read / more pages+WAL written) · EXPLAIN costs (pages, cache) · durability (WAL) · isolation (MVCC snapshots) · long transactions hurt (block VACUUM) · partition DROP is cheap (frees whole files vs. marking tuples dead).

You can now…

  • ☐ Explain pages/heap/TOAST and per-page I/O.
  • ☐ Describe the buffer pool and cold vs. warm cache.
  • ☐ Explain WAL-based durability and crash recovery (and why not to disable fsync).
  • ☐ Trace MVCC → dead tuples → VACUUM → bloat, and why long transactions cause it.

Looking ahead — Part V

Performance & Internals complete. Chapter 29 — Connecting Applications opens Part V (Application Integration): connecting from Python (psycopg2), parameterized queries, transactions from code, and connection pooling — the seam between your code and the database.

One sentence to carry forward: Everything is pages cached in the buffer pool and protected by the WAL; MVCC leaves dead tuples that VACUUM reclaims — so keep transactions short, keep durability on, and tune autovacuum, and the rest of Part IV's rules follow.