Case Study 2 — The KPI That Counted Line Items as Orders

Chapter 6 showed how a one-to-many join multiplies rows (fan-out). This case shows the aggregation-side consequence: a COUNT(*) over a fanned-out join that quietly counted line items as orders, and inflated a board-level KPI for months.

Background

An e-commerce team reported "total orders" and "orders per day" on an executive dashboard. The underlying query joined orders to order_items (to also compute revenue in the same query) and counted:

-- "Orders per day"   ❌
SELECT o.order_date::date AS day,
       COUNT(*)            AS orders,
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.order_date::date
ORDER BY day;

The revenue numbers were fine. The "orders" numbers were inflated — by the average number of line items per order (about 2.6×). The dashboard proudly showed roughly 2.6 times more orders than the company actually received. Growth looked spectacular; targets were set against the inflated baseline; and a quarter later, when the numbers were reconciled against the order-management system, the correction was awkward and public.

What went wrong

The join orders ⋈ order_items produces one row per line item, not per order. An order with three items contributes three rows. COUNT(*) then counts those rows — so it counts line items, which it labels "orders":

   order 42 has 3 items  →  3 joined rows  →  COUNT(*) adds 3, not 1

The revenue SUM was correct precisely because it operates at line-item grain (each line item genuinely contributes its own revenue). But COUNT(*) of orders does not belong at line-item grain — it needs to count each order once. Mixing an order-grain metric (COUNT of orders) and a line-item-grain metric (SUM of revenue) in one fanned-out query is the root error.

The fixes

1. Count distinct orders:

SELECT o.order_date::date AS day,
       COUNT(DISTINCT o.order_id) AS orders,     -- each order once
       SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.order_id
GROUP BY o.order_date::date
ORDER BY day;

COUNT(DISTINCT o.order_id) counts each order a single time regardless of how many line items it has. Quick, correct, and a one-word change.

2. Aggregate at separate grains (cleaner for more metrics). Compute order-grain and item-grain metrics in separate subqueries, then combine:

WITH daily_orders AS (
    SELECT order_date::date AS day, COUNT(*) AS orders
    FROM orders GROUP BY order_date::date
),
daily_revenue AS (
    SELECT o.order_date::date AS day, SUM(oi.quantity * oi.unit_price) AS revenue
    FROM orders o JOIN order_items oi ON oi.order_id = o.order_id
    GROUP BY o.order_date::date
)
SELECT d.day, d.orders, COALESCE(r.revenue, 0) AS revenue
FROM daily_orders d
LEFT JOIN daily_revenue r ON r.day = d.day
ORDER BY d.day;

Here orders is counted from the orders table at its natural grain (no fan-out at all), and revenue is summed separately. This scales cleanly when you add more metrics at different grains.

The analysis

  1. COUNT(*) counts rows at the current grain. After a one-to-many join, the grain is the "many" side. If your metric is about the "one" side (orders, customers, products), COUNT(*) over the joined result over-counts it. Use COUNT(DISTINCT key) or count from the un-joined table.

  2. Be suspicious of mixing grains in one query. Revenue (per line item) and order count (per order) live at different grains. Computing both over a single fanned-out join makes at least one of them wrong. Either use DISTINCT for the coarser metric or separate the aggregations.

  3. Correct-looking halves hide wrong halves. As in Chapter 6's revenue case, the right revenue number lent false credibility to the wrong order count. A partially-correct report is more dangerous than an obviously broken one.

  4. Validate against a join-free baseline. SELECT COUNT(*) FROM orders; gives the true order count in one line. Comparing the dashboard's total to that baseline would have caught the 2.6× inflation on day one. Always cross-check a grouped count against the simple count.

  5. Define KPIs with their grain. "Orders per day" should be specified as "distinct orders with order_date in the day." Writing the grain into the metric definition makes the correct query obvious and the wrong one easy to reject in review.

Discussion questions

  1. Explain precisely why COUNT(*) over orders ⋈ order_items counts line items, not orders.
  2. Why was the revenue SUM correct in the same query that got the order count wrong?
  3. Compare the two fixes. When is the separate-aggregations (CTE) approach worth the extra code?
  4. Write the one-line baseline query that would have caught the inflation immediately.
  5. ⭐ Generalize the rule: before adding any COUNT(*) or SUM(...) to a query with joins, what two questions should you ask? (Recall Chapter 6's "grain" and "no-match rows.")