Case Study 2 — The App That Ran Out of Connections
A web service started returning "too many connections" errors under load and eventually fell over. The cause wasn't the database's capacity — it was the application opening (and leaking) a connection per request, with no pool. Connection management is part of database access, not an afterthought.
Background
A growing web app connected to PostgreSQL on each request: open a connection, run a query, return the response. Under light traffic it was fine. As traffic grew, errors appeared:
FATAL: sorry, too many clients already
and requests began timing out. The database CPU wasn't maxed; queries were fast. Yet the app was failing — because it had exhausted PostgreSQL's connection limit.
What went wrong: connection-per-request, plus leaks
Two compounding problems:
-
A connection per request. Each incoming HTTP request opened a new
psycopg2.connect(...). PostgreSQL allows a bounded number of connections (max_connections, often ~100), each costing a backend process and memory (Chapters 27–28). Under concurrency, hundreds of simultaneous requests tried to open hundreds of connections — past the limit — and the rest gottoo many clients already. -
Leaked connections. Worse, some code paths opened a connection and, on an error, never closed it (no
finally, no context manager). Those connections lingered, "leaking" — consuming slots without doing work. Over hours, leaked connections accumulated until even modest traffic hit the ceiling.
# ❌ a connection per request, leaked on error
def handle_request(req):
conn = psycopg2.connect(**params) # new connection every time
cur = conn.cursor()
cur.execute(...) # if this raises, conn is never closed → leak
result = cur.fetchall()
conn.close()
return result
The classic symptom: "we added more app servers to handle load, and the database got slower / started erroring." More app servers × per-request connections = even more connections slamming the database — scaling the app made the connection problem worse (Chapter 27).
The fix: pooling + guaranteed cleanup
1. Use a connection pool — a small, reused set of connections shared across requests:
from psycopg2 import pool
db_pool = pool.ThreadedConnectionPool(minconn=2, maxconn=10, **params)
def handle_request(req):
conn = db_pool.getconn() # borrow from the pool
try:
with conn.cursor() as cur:
cur.execute(...)
result = cur.fetchall()
conn.commit()
return result
finally:
db_pool.putconn(conn) # ALWAYS return it (even on error)
Now many requests share ~10 connections instead of opening hundreds. The database sees a small, steady connection count regardless of request volume.
2. Guarantee cleanup with finally / context managers, so a connection is always returned (or closed) even when the handler raises — no more leaks.
3. For larger scale, an external pooler (PgBouncer). In front of the database, PgBouncer multiplexes thousands of client connections onto a small pool of real database connections — the standard production answer (Chapters 27, 38). App servers connect to PgBouncer; PgBouncer manages the precious database connections.
After pooling and guaranteed cleanup, the too many clients errors vanished, and the app scaled smoothly — the database now saw a bounded, reused connection set.
The analysis
-
Connections are a bounded, expensive resource. PostgreSQL handles a limited number well; each is a backend process with memory. Treating connections as free (one per request) exhausts that limit under concurrency. Connection management is a first-class part of database access.
-
Pool, don't open-per-request. A pool maintains a small set of reusable connections shared across many requests — keeping the server's connection count sane while serving high volume. This is the fix for
too many clients. -
Always return/close connections — even on error. Leaks (un-closed connections on exception) silently consume slots until the app fails. Use
finallyor context managers so cleanup is guaranteed regardless of how the code path exits. -
Scaling app servers can worsen it. Without pooling (or with a per-app pool that doesn't account for total servers), adding app instances multiplies connections to the database. PgBouncer in front decouples client connection count from database connection count.
-
Symptom vs. cause. The database wasn't overloaded by work (CPU fine, queries fast) — it was overloaded by connections. "Too many clients" / mysterious timeouts with a healthy-looking database is the connection-exhaustion signature; the fix is pooling, not a bigger box.
Discussion questions
- Why does opening a connection per request fail under concurrency, even when queries are fast?
- How does a leaked connection (un-closed on error) make the problem worse over time?
- How does a connection pool fix
too many clients already? - Why can adding more app servers make connection exhaustion worse, and how does PgBouncer help?
- ⭐ The database's CPU was fine but it was erroring. What does that tell you about diagnosing "the database is slow/failing" complaints?