Case Study 2 — The Dashboard That Hid a Third of the Customers
Fan-out (Case Study 1) makes numbers too big. The mirror-image bug — using
INNER JOINwhere you neededLEFT JOIN— makes them too small, by silently dropping the rows that don't match. It's just as common and just as invisible.
Background
A subscription company built an executive dashboard. One headline metric was "average revenue per customer." The query joined customers to their payments and averaged:
-- "Average revenue per customer"
SELECT AVG(total) AS avg_revenue_per_customer
FROM (
SELECT c.customer_id, SUM(p.amount) AS total
FROM customers c
JOIN payments p ON p.customer_id = c.customer_id -- ❌ INNER JOIN
GROUP BY c.customer_id
) per_customer;
The number looked healthy and trended nicely, so it anchored board discussions for two quarters. Then a new analyst, reconciling against the finance system, found the dashboard's customer count was about a third lower than the true number of customers. The "average revenue per customer" wasn't wrong arithmetic — it was averaging over the wrong set of customers.
What went wrong
The INNER JOIN between customers and payments keeps only customers who have at least one payment. Every customer who had signed up but not yet paid — free-trial users, brand-new accounts, churned users who never converted — was silently excluded from the calculation. The metric was really "average revenue per paying customer," a meaningfully different (and rosier) number than "average revenue per customer."
customers (all) payments
┌───────────────┐ ┌─────┐
│ paid ✔✔✔✔✔✔✔ │░░░░░░│ │ ← INNER JOIN keeps only these
│ never paid ✘✘✘ │ └─────┘ (the ✘ customers vanish)
└───────────────┘
Because the excluded customers had zero revenue, dropping them inflated the average: you removed all the zeros from the denominator and the numerator. The board believed each customer was worth more than they actually were — a dangerous input to spending and forecasting decisions.
The fix
Use a LEFT JOIN so every customer is kept, and treat a customer with no payments as zero revenue (not as "excluded"):
SELECT AVG(total) AS avg_revenue_per_customer
FROM (
SELECT c.customer_id, COALESCE(SUM(p.amount), 0) AS total
FROM customers c
LEFT JOIN payments p ON p.customer_id = c.customer_id -- keep ALL customers
GROUP BY c.customer_id
) per_customer;
Two changes matter:
LEFT JOINkeeps customers with no payments (they appear withNULLpayment data instead of vanishing).COALESCE(SUM(p.amount), 0)turns those customers'NULLrevenue into0, so they count as zero-revenue customers — which is exactly what they are.
The team also did something wise: they kept both metrics, clearly labeled — "average revenue per customer" (all customers, via LEFT JOIN) and "average revenue per paying customer" (via INNER JOIN). Both are legitimate; the bug was calling the second one by the first one's name.
The analysis
-
INNER JOINis a silent filter. Every inner join removes rows that don't match. Usually that's fine (an order always has a customer). But when the non-matching rows are meaningful — customers without payments, products without sales, days without events — an inner join quietly biases your results. Always ask: "do the no-match rows matter to this question?" If yes, you need an outer join. -
Averages and counts are especially sensitive. Dropping zero-valued rows inflates averages and understates counts, often dramatically. "Per X" metrics demand that every X be in the denominator — which usually means a
LEFT JOINfrom the X table. -
LEFT JOIN+COALESCEis the canonical "include the zeros" pattern. Keep all rows from the entity you're measuring, then convert the resultingNULLaggregates to0. Memorize this shape; you'll use it constantly for "per customer / per product / per day, including the empties." -
Name your metrics honestly. Half the damage here was linguistic: "average revenue per customer" silently meant "per paying customer." Precise metric definitions — written down, agreed on — prevent the wrong query from masquerading as the right one.
-
Reconcile against an independent source. As in Case Study 1, the bug surfaced only when someone cross-checked the customer count against finance. Independent reconciliation is how silent join bugs get caught.
Discussion questions
- Explain exactly why the
INNER JOINinflated "average revenue per customer." - Why does
LEFT JOIN+COALESCE(SUM(...), 0)produce the correct average? What would happen if you usedLEFT JOINbut forgot theCOALESCE? - Give three more "per X" metrics where an inner join would silently bias the result.
- The team kept both the inner-join and left-join versions. Why is that better than "fixing" one into the other?
- ⭐ Contrast this case with Case Study 1. One bug makes numbers too big, the other too small — state the general question you should ask about every join to avoid both.