Case Study 1 — The Average of Averages

Averaging things that are already averages is one of the most seductive mistakes in analytics. The query runs, the number looks reasonable, and it's wrong — sometimes by a lot. This is a real encounter with Simpson's paradox, in SQL.

Background

A marketplace tracked "average order value" (AOV) as a headline metric, broken out by region. An analyst built a company-wide AOV figure by first computing each region's AOV, then averaging those regional numbers:

-- "Company-wide AOV" = average of the regional AOVs   ❌
SELECT AVG(region_aov) AS company_aov
FROM (
    SELECT region, AVG(order_total) AS region_aov
    FROM orders
    GROUP BY region
) per_region;

The number it produced — call it $68 — was used in investor updates. But when finance computed AOV the straightforward way (total revenue ÷ number of orders), they got $52. A 30% discrepancy in a headline metric, and the analyst's version was the one in the deck.

What went wrong

Averaging per-region averages weights each region equally, no matter how many orders it has. Suppose:

   Region    Orders   Avg order value
   ─────────────────────────────────
   North     90,000        $50
   South        100       $100      ← tiny region, big average
  • Average of averages: ($50 + $100) / 2 = $75 — as if North and South mattered equally.
  • True overall AOV: total revenue ÷ total orders = (90,000×$50 + 100×$100) / 90,100 = $4,510,000 / 90,100 ≈ **$50.06** — dominated, correctly, by the 90,000 North orders.

The tiny South region, with its high average, got the same weight as the enormous North region in the "average of averages." That's the trap: the outer AVG treats each group's summary as a single equal data point, discarding the fact that groups have wildly different sizes. This weighting error is the mechanism behind Simpson's paradox, where aggregates computed at different levels disagree — and even reverse — depending on how groups are weighted.

The fix

Compute the metric at the grain of the thing being averaged — here, orders — in a single aggregation:

-- Correct overall AOV: every ORDER weighted equally
SELECT SUM(order_total) / COUNT(*) AS company_aov
FROM orders;
-- (equivalently, AVG(order_total) over all orders)

AVG(order_total) over all orders, or SUM(total)/COUNT(*), weights each order equally — which is what "average order value" means. If you do want a per-region breakdown, present the regional AOVs alongside their order counts so no one mistakes a 100-order region for a 90,000-order one:

SELECT region,
       COUNT(*)                  AS orders,
       ROUND(AVG(order_total),2) AS region_aov
FROM orders
GROUP BY region
ORDER BY orders DESC;

And if a weighted combination of regional averages is genuinely needed (e.g., the regions come from separate systems), weight each by its order count:

   company_aov = Σ(region_aov × region_orders) / Σ(region_orders)

— which, of course, simplifies right back to total revenue ÷ total orders.

The analysis

  1. You cannot generally average averages. An average is a ratio (sum ÷ count). Averaging ratios discards the counts, so each group is silently weighted equally. Only when all groups are the same size does it coincide with the true overall average.

  2. Compute metrics at their natural grain. "Average order value" is a property of orders, so aggregate over orders in one pass. Don't pre-aggregate to a coarser grain (region) and then aggregate again — re-aggregation loses the weights.

  3. Simpson's paradox is the deep version of this. When you split data into groups, the per-group story can differ from — or reverse — the overall story, purely because of differing group sizes. Whenever you see surprising aggregates, check whether group sizes are doing the talking.

  4. Always show the denominators. A regional AOV of $100 means something very different over 100 orders than over 90,000. Reporting averages without their counts invites exactly this misreading. Put COUNT(*) next to every AVG.

  5. Reconcile against the single-pass figure. The bug surfaced only because finance computed AOV directly and got a different number. Make "does this match the one-line overall calculation?" a standard check for any grouped metric.

Discussion questions

  1. Work the North/South numbers yourself. Why is "average of averages" $75 while the true AOV is ≈ $50?
  2. Under what exact condition does averaging group averages equal the true overall average?
  3. Rewrite "company-wide AOV" correctly in one query.
  4. Why does showing COUNT(*) beside each regional AVG help prevent this misreading?
  5. ⭐ Give a non-AOV example from your own domain where averaging per-group rates could mislead (hint: think conversion rates across segments of very different sizes).