Case Study 2 — The App That Ran Out of Connections

DataField.Dev

Case Study 2 — The App That Ran Out of Connections

A web service started returning "too many connections" errors under load and eventually fell over. The cause wasn't the database's capacity — it was the application opening (and leaking) a connection per request, with no pool. Connection management is part of database access, not an afterthought.

Background

A growing web app connected to PostgreSQL on each request: open a connection, run a query, return the response. Under light traffic it was fine. As traffic grew, errors appeared:

FATAL: sorry, too many clients already

and requests began timing out. The database CPU wasn't maxed; queries were fast. Yet the app was failing — because it had exhausted PostgreSQL's connection limit.

What went wrong: connection-per-request, plus leaks

Two compounding problems:

A connection per request. Each incoming HTTP request opened a new psycopg2.connect(...). PostgreSQL allows a bounded number of connections (max_connections, often ~100), each costing a backend process and memory (Chapters 27–28). Under concurrency, hundreds of simultaneous requests tried to open hundreds of connections — past the limit — and the rest got too many clients already.
Leaked connections. Worse, some code paths opened a connection and, on an error, never closed it (no finally, no context manager). Those connections lingered, "leaking" — consuming slots without doing work. Over hours, leaked connections accumulated until even modest traffic hit the ceiling.

# ❌ a connection per request, leaked on error
def handle_request(req):
    conn = psycopg2.connect(**params)     # new connection every time
    cur = conn.cursor()
    cur.execute(...)                       # if this raises, conn is never closed → leak
    result = cur.fetchall()
    conn.close()
    return result

The classic symptom: "we added more app servers to handle load, and the database got slower / started erroring." More app servers × per-request connections = even more connections slamming the database — scaling the app made the connection problem worse (Chapter 27).

The fix: pooling + guaranteed cleanup

1. Use a connection pool — a small, reused set of connections shared across requests:

from psycopg2 import pool
db_pool = pool.ThreadedConnectionPool(minconn=2, maxconn=10, **params)

def handle_request(req):
    conn = db_pool.getconn()              # borrow from the pool
    try:
        with conn.cursor() as cur:
            cur.execute(...)
            result = cur.fetchall()
        conn.commit()
        return result
    finally:
        db_pool.putconn(conn)             # ALWAYS return it (even on error)

Now many requests share ~10 connections instead of opening hundreds. The database sees a small, steady connection count regardless of request volume.

2. Guarantee cleanup with finally / context managers, so a connection is always returned (or closed) even when the handler raises — no more leaks.

3. For larger scale, an external pooler (PgBouncer). In front of the database, PgBouncer multiplexes thousands of client connections onto a small pool of real database connections — the standard production answer (Chapters 27, 38). App servers connect to PgBouncer; PgBouncer manages the precious database connections.

After pooling and guaranteed cleanup, the too many clients errors vanished, and the app scaled smoothly — the database now saw a bounded, reused connection set.

The analysis

Connections are a bounded, expensive resource. PostgreSQL handles a limited number well; each is a backend process with memory. Treating connections as free (one per request) exhausts that limit under concurrency. Connection management is a first-class part of database access.
Pool, don't open-per-request. A pool maintains a small set of reusable connections shared across many requests — keeping the server's connection count sane while serving high volume. This is the fix for too many clients.
Always return/close connections — even on error. Leaks (un-closed connections on exception) silently consume slots until the app fails. Use finally or context managers so cleanup is guaranteed regardless of how the code path exits.
Scaling app servers can worsen it. Without pooling (or with a per-app pool that doesn't account for total servers), adding app instances multiplies connections to the database. PgBouncer in front decouples client connection count from database connection count.
Symptom vs. cause. The database wasn't overloaded by work (CPU fine, queries fast) — it was overloaded by connections. "Too many clients" / mysterious timeouts with a healthy-looking database is the connection-exhaustion signature; the fix is pooling, not a bigger box.

Discussion questions

Why does opening a connection per request fail under concurrency, even when queries are fast?
How does a leaked connection (un-closed on error) make the problem worse over time?
How does a connection pool fix too many clients already?
Why can adding more app servers make connection exhaustion worse, and how does PgBouncer help?
⭐ The database's CPU was fine but it was erroring. What does that tell you about diagnosing "the database is slow/failing" complaints?