Case Study 1 — The Average of Averages
Averaging things that are already averages is one of the most seductive mistakes in analytics. The query runs, the number looks reasonable, and it's wrong — sometimes by a lot. This is a real encounter with Simpson's paradox, in SQL.
Background
A marketplace tracked "average order value" (AOV) as a headline metric, broken out by region. An analyst built a company-wide AOV figure by first computing each region's AOV, then averaging those regional numbers:
-- "Company-wide AOV" = average of the regional AOVs ❌
SELECT AVG(region_aov) AS company_aov
FROM (
SELECT region, AVG(order_total) AS region_aov
FROM orders
GROUP BY region
) per_region;
The number it produced — call it $68 — was used in investor updates. But when finance computed AOV the straightforward way (total revenue ÷ number of orders), they got $52. A 30% discrepancy in a headline metric, and the analyst's version was the one in the deck.
What went wrong
Averaging per-region averages weights each region equally, no matter how many orders it has. Suppose:
Region Orders Avg order value
─────────────────────────────────
North 90,000 $50
South 100 $100 ← tiny region, big average
- Average of averages: ($50 + $100) / 2 = $75 — as if North and South mattered equally.
- True overall AOV: total revenue ÷ total orders = (90,000×$50 + 100×$100) / 90,100 = $4,510,000 / 90,100 ≈ **$50.06** — dominated, correctly, by the 90,000 North orders.
The tiny South region, with its high average, got the same weight as the enormous North region in the "average of averages." That's the trap: the outer AVG treats each group's summary as a single equal data point, discarding the fact that groups have wildly different sizes. This weighting error is the mechanism behind Simpson's paradox, where aggregates computed at different levels disagree — and even reverse — depending on how groups are weighted.
The fix
Compute the metric at the grain of the thing being averaged — here, orders — in a single aggregation:
-- Correct overall AOV: every ORDER weighted equally
SELECT SUM(order_total) / COUNT(*) AS company_aov
FROM orders;
-- (equivalently, AVG(order_total) over all orders)
AVG(order_total) over all orders, or SUM(total)/COUNT(*), weights each order equally — which is what "average order value" means. If you do want a per-region breakdown, present the regional AOVs alongside their order counts so no one mistakes a 100-order region for a 90,000-order one:
SELECT region,
COUNT(*) AS orders,
ROUND(AVG(order_total),2) AS region_aov
FROM orders
GROUP BY region
ORDER BY orders DESC;
And if a weighted combination of regional averages is genuinely needed (e.g., the regions come from separate systems), weight each by its order count:
company_aov = Σ(region_aov × region_orders) / Σ(region_orders)
— which, of course, simplifies right back to total revenue ÷ total orders.
The analysis
-
You cannot generally average averages. An average is a ratio (sum ÷ count). Averaging ratios discards the counts, so each group is silently weighted equally. Only when all groups are the same size does it coincide with the true overall average.
-
Compute metrics at their natural grain. "Average order value" is a property of orders, so aggregate over orders in one pass. Don't pre-aggregate to a coarser grain (region) and then aggregate again — re-aggregation loses the weights.
-
Simpson's paradox is the deep version of this. When you split data into groups, the per-group story can differ from — or reverse — the overall story, purely because of differing group sizes. Whenever you see surprising aggregates, check whether group sizes are doing the talking.
-
Always show the denominators. A regional AOV of $100 means something very different over 100 orders than over 90,000. Reporting averages without their counts invites exactly this misreading. Put
COUNT(*)next to everyAVG. -
Reconcile against the single-pass figure. The bug surfaced only because finance computed AOV directly and got a different number. Make "does this match the one-line overall calculation?" a standard check for any grouped metric.
Discussion questions
- Work the North/South numbers yourself. Why is "average of averages" $75 while the true AOV is ≈ $50?
- Under what exact condition does averaging group averages equal the true overall average?
- Rewrite "company-wide AOV" correctly in one query.
- Why does showing
COUNT(*)beside each regionalAVGhelp prevent this misreading? - ⭐ Give a non-AOV example from your own domain where averaging per-group rates could mislead (hint: think conversion rates across segments of very different sizes).