Case Study 1 — The Table That Bloated to 10× Its Size

DataField.Dev

Case Study 1 — The Table That Bloated to 10× Its Size

A table grew to ten times the size of its actual data, queries slowed, and disk filled — not because of more data, but because dead tuples couldn't be cleaned up. The culprit was a long-running transaction silently blocking VACUUM. This is the most common PostgreSQL operational surprise, and the internals explain it exactly.

Background

A sessions table tracked active user sessions — heavily updated (every request touched a session row to bump last_seen). It held about 500,000 live rows, which should be a few hundred megabytes. Yet over a couple of weeks it ballooned to several gigabytes, queries against it slowed, autovacuum seemed to run constantly without helping, and disk usage alarms fired. No one had added data — the live row count was steady. Where was the space going?

Diagnosis

The engineer checked the table's statistics:

SELECT relname, n_live_tup, n_dead_tup
FROM pg_stat_user_tables WHERE relname = 'sessions';
--   sessions | 500000 (live) | 48000000 (dead)   ← 48 MILLION dead tuples!

Forty-eight million dead tuples against half a million live ones. Every UPDATE to a session (and there were millions per day) created a new row version and marked the old one dead (MVCC, Chapter 26). Normally autovacuum reclaims those — but here it couldn't. Checking for long-running transactions revealed why:

SELECT pid, state, now() - xact_start AS duration, query
FROM pg_stat_activity
WHERE state <> 'idle' ORDER BY xact_start;
--   a transaction had been open for 9 DAYS (a stuck analytics job / leaked connection)

A transaction had been left open for nine days — a buggy reporting job that began a transaction and never committed or rolled back (a leaked connection). Here's the internals chain (Chapter 28): a dead tuple can only be reclaimed once no transaction can still see it. The nine-day-old transaction held a snapshot from nine days ago, so VACUUM was forbidden from removing any tuple that became dead in those nine days — because that ancient transaction might (in principle) still need to see it. So dead tuples piled up by the tens of millions, the table and its indexes bloated to 10×, and autovacuum ran fruitlessly (it couldn't remove what the old snapshot pinned).

The fix

Kill the offending transaction. Once the nine-day transaction was terminated (pg_terminate_backend(pid)), VACUUM could finally reclaim the backlog:

SELECT pg_terminate_backend(12345);   -- end the leaked transaction
VACUUM sessions;                       -- now it can reclaim the 48M dead tuples

The dead-tuple count dropped, and the space became reusable (the table stopped growing). To actually shrink the on-disk file back down (return space to the OS), they ran a one-time rewrite during a maintenance window:

VACUUM FULL sessions;   -- rewrites the table compactly (exclusive lock — off-hours only)

Prevent recurrence. The root cause was a leaked long transaction, so they: - Set idle_in_transaction_session_timeout so a transaction left open and idle is automatically aborted after a few minutes — no more nine-day leaks. - Fixed the reporting job to commit/rollback promptly and not hold a transaction open while doing slow application work. - Added monitoring on n_dead_tup and on the age of the oldest transaction, alerting before bloat got out of hand. - Tuned autovacuum to be more aggressive on this high-churn table.

The analysis

Bloat is dead tuples that can't be reclaimed. MVCC makes every UPDATE/DELETE leave a dead tuple; VACUUM reclaims them. When reclamation can't keep up — or is blocked — the table grows far beyond its live data. The live row count looked fine; the dead count was the story.
Long transactions block VACUUM. This is the precise mechanism (Chapter 28, and why Chapters 26–27 warn against long transactions): a dead tuple can't be removed while any transaction's snapshot might still see it. One ancient open transaction freezes cleanup for the entire database, not just its own data.
A leaked/idle transaction is a time bomb. A reporting job, a debugging session left open, a connection-pool bug — any transaction that stays open holds back cleanup indefinitely. idle_in_transaction_session_timeout is the seatbelt.
VACUUM reclaims for reuse; VACUUM FULL returns space to the OS. Plain VACUUM (and autovacuum) makes dead-tuple space reusable (the table stops growing). To shrink the file back, you need a rewrite (VACUUM FULL or pg_repack) — heavy, locking, occasional.
Monitor the internals. n_dead_tup and the age of the oldest transaction are key health metrics (Chapter 38). Bloat is preventable with monitoring and short transactions; it's painful once it's gigabytes.

Discussion questions

The live row count was steady. Why did the table grow 10×?
Explain exactly how a nine-day-old transaction prevented VACUUM from reclaiming dead tuples.
Why did killing the old transaction let VACUUM finally work?
Difference between VACUUM and VACUUM FULL here — which stops growth, which shrinks the file?
⭐ Name three safeguards that prevent this class of bloat, and what each protects against.