Case Study 1 — The Table That Bloated to 10× Its Size
A table grew to ten times the size of its actual data, queries slowed, and disk filled — not because of more data, but because dead tuples couldn't be cleaned up. The culprit was a long-running transaction silently blocking VACUUM. This is the most common PostgreSQL operational surprise, and the internals explain it exactly.
Background
A sessions table tracked active user sessions — heavily updated (every request touched a session row to bump last_seen). It held about 500,000 live rows, which should be a few hundred megabytes. Yet over a couple of weeks it ballooned to several gigabytes, queries against it slowed, autovacuum seemed to run constantly without helping, and disk usage alarms fired. No one had added data — the live row count was steady. Where was the space going?
Diagnosis
The engineer checked the table's statistics:
SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables WHERE relname = 'sessions';
-- sessions | 500000 (live) | 48000000 (dead) ← 48 MILLION dead tuples!
Forty-eight million dead tuples against half a million live ones. Every UPDATE to a session (and there were millions per day) created a new row version and marked the old one dead (MVCC, Chapter 26). Normally autovacuum reclaims those — but here it couldn't. Checking for long-running transactions revealed why:
SELECT pid, state, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE state <> 'idle' ORDER BY xact_start;
-- a transaction had been open for 9 DAYS (a stuck analytics job / leaked connection)
A transaction had been left open for nine days — a buggy reporting job that began a transaction and never committed or rolled back (a leaked connection). Here's the internals chain (Chapter 28): a dead tuple can only be reclaimed once no transaction can still see it. The nine-day-old transaction held a snapshot from nine days ago, so VACUUM was forbidden from removing any tuple that became dead in those nine days — because that ancient transaction might (in principle) still need to see it. So dead tuples piled up by the tens of millions, the table and its indexes bloated to 10×, and autovacuum ran fruitlessly (it couldn't remove what the old snapshot pinned).
The fix
- Kill the offending transaction. Once the nine-day transaction was terminated (
pg_terminate_backend(pid)), VACUUM could finally reclaim the backlog:
SELECT pg_terminate_backend(12345); -- end the leaked transaction
VACUUM sessions; -- now it can reclaim the 48M dead tuples
The dead-tuple count dropped, and the space became reusable (the table stopped growing). To actually shrink the on-disk file back down (return space to the OS), they ran a one-time rewrite during a maintenance window:
VACUUM FULL sessions; -- rewrites the table compactly (exclusive lock — off-hours only)
- Prevent recurrence. The root cause was a leaked long transaction, so they:
- Set
idle_in_transaction_session_timeoutso a transaction left open and idle is automatically aborted after a few minutes — no more nine-day leaks. - Fixed the reporting job to commit/rollback promptly and not hold a transaction open while doing slow application work. - Added monitoring onn_dead_tupand on the age of the oldest transaction, alerting before bloat got out of hand. - Tuned autovacuum to be more aggressive on this high-churn table.
The analysis
-
Bloat is dead tuples that can't be reclaimed. MVCC makes every
UPDATE/DELETEleave a dead tuple; VACUUM reclaims them. When reclamation can't keep up — or is blocked — the table grows far beyond its live data. The live row count looked fine; the dead count was the story. -
Long transactions block VACUUM. This is the precise mechanism (Chapter 28, and why Chapters 26–27 warn against long transactions): a dead tuple can't be removed while any transaction's snapshot might still see it. One ancient open transaction freezes cleanup for the entire database, not just its own data.
-
A leaked/idle transaction is a time bomb. A reporting job, a debugging session left open, a connection-pool bug — any transaction that stays open holds back cleanup indefinitely.
idle_in_transaction_session_timeoutis the seatbelt. -
VACUUMreclaims for reuse;VACUUM FULLreturns space to the OS. PlainVACUUM(and autovacuum) makes dead-tuple space reusable (the table stops growing). To shrink the file back, you need a rewrite (VACUUM FULLorpg_repack) — heavy, locking, occasional. -
Monitor the internals.
n_dead_tupand the age of the oldest transaction are key health metrics (Chapter 38). Bloat is preventable with monitoring and short transactions; it's painful once it's gigabytes.
Discussion questions
- The live row count was steady. Why did the table grow 10×?
- Explain exactly how a nine-day-old transaction prevented VACUUM from reclaiming dead tuples.
- Why did killing the old transaction let VACUUM finally work?
- Difference between
VACUUMandVACUUM FULLhere — which stops growth, which shrinks the file? - ⭐ Name three safeguards that prevent this class of bloat, and what each protects against.