Key Takeaways: Chapter 28 — Sales and Revenue Analytics


The Core Idea

Revenue is not a single number — it is a story told across dimensions: who is buying, what they are buying, where they are buying, and whether the relationship is growing or shrinking. Python gives you the tools to read that story in minutes rather than days.


Essential Concepts at a Glance

Concept What It Measures Key Formula / Method
Average Order Value Revenue per transaction (at order level, not line-item level) df.groupby("order_id")["revenue"].sum().mean()
Revenue by Dimension Which segment (region, product, rep) drives the most df.groupby(dim).agg(revenue=("revenue","sum")).sort_values("revenue",ascending=False)
MoM Growth Recent momentum df["revenue"].pct_change() * 100
YoY Growth Trend adjusted for seasonality df["revenue"].shift(12) on monthly sorted data
Pareto Analysis Which 20% drives 80% Sort descending + cumulative sum of percentage
Concentration Risk Dependence on top N customers Top N revenue / total revenue × 100
Product Mix Revenue vs. margin contribution by product margin_mix_pct = gross_margin / total_gross_margin × 100
Cohort Analysis Do customers come back after first purchase? Pivot on (cohort_month, periods_since_first_purchase)
RFM Scoring Segment customers by recency, frequency, spend Quartile-scored (1–4) on three dimensions
Sales Velocity Rate at which money is flowing in total_revenue / date_range_days

The Average Order Value Trap

This is the most common mistake in sales analytics:

# WRONG — gives average revenue per line item
avg_order = df["revenue"].mean()

# RIGHT — gives average revenue per order
avg_order = df.groupby("order_id")["revenue"].sum().mean()

An order with 5 products creates 5 rows. df["revenue"].mean() divides by 5× too many rows. Always group to the order level first.


Growth Rate Cheat Sheet

monthly = df.groupby("year_month")["revenue"].sum().reset_index().sort_values("year_month")

# Month-over-month: tracks momentum, sensitive to seasonality
monthly["mom_pct"] = monthly["revenue"].pct_change() * 100

# Year-over-year: removes seasonal effects
monthly["yoy_pct"] = (
    (monthly["revenue"] - monthly["revenue"].shift(12))
    / monthly["revenue"].shift(12)
    * 100
)

Use MoM to track recent momentum. Use YoY to determine if the business is actually improving or just experiencing seasonal variation.


Pareto Analysis Pattern

def pareto_analysis(df, dimension, value_col="revenue", top_n=20):
    grouped = (
        df.groupby(dimension)[value_col]
        .sum()
        .reset_index()
        .sort_values(value_col, ascending=False)
        .head(top_n)
    )
    total = grouped[value_col].sum()
    grouped["pct_of_total"] = (grouped[value_col] / total * 100).round(1)
    grouped["cumulative_pct"] = grouped["pct_of_total"].cumsum().round(1)
    return grouped.reset_index(drop=True)

# Find the 80% threshold
threshold = pareto_result[pareto_result["cumulative_pct"] >= 80].iloc[0].name + 1
print(f"{threshold} items account for 80% of {value_col}")

Revenue Concentration Risk Reference

Concentration Level Top 10 Customers % Risk Assessment
Low < 50% Revenue is diversified
Medium 50–70% Some dependence; monitor key accounts
High > 70% Meaningful risk; losing one account hurts

HHI above 2,500 also signals high concentration regardless of the top-10 percentage.


RFM Scoring Quick Reference

Score Recency Meaning Frequency Meaning Monetary Meaning
4 (Best) Purchased most recently Most frequent buyer Highest total spend
3 Recent Frequent High spend
2 Somewhat recent Occasional Moderate spend
1 (Worst) Purchased longest ago Rarely buys Lowest spend

Segment Labels (sum of R + F + M): - 10–12: Champions — reward and leverage - 8–9: Loyal Customers — nurture and deepen - 6–7: Potential Loyalists — engage and cross-sell - 4–5: At Risk — win-back campaigns - 3: Lost — minimal investment


Product Mix Analysis: Revenue vs. Margin

Products where margin_mix_pct > revenue_pct are margin champions — they contribute disproportionately to profitability. Consider promoting these products.

Products where revenue_pct >> margin_mix_pct are drag products — they generate revenue but thin the overall margin. Consider repricing or de-emphasizing.

product_df["revenue_pct"] = product_df["revenue"] / product_df["revenue"].sum() * 100
product_df["margin_mix_pct"] = product_df["gross_margin"] / product_df["gross_margin"].sum() * 100
product_df["margin_vs_revenue_delta"] = product_df["margin_mix_pct"] - product_df["revenue_pct"]
# Positive delta = margin champion; negative delta = margin drag

Dashboard Design Checklist

When building a sales dashboard:

  • [ ] Use horizontal bars when labels are long (customer names, product names)
  • [ ] Use pie charts only when there are 6 or fewer categories
  • [ ] Include a trend line on bar charts when you want to show direction
  • [ ] Format all monetary Y-axes with dollar signs and commas using FuncFormatter
  • [ ] Include a title on every panel, not just on the figure overall
  • [ ] Use consistent colors across all panels (same region = same color everywhere)
  • [ ] Save with dpi=150 and bbox_inches="tight" for clean output

The West Region Diagnostic Framework

When a region, product, or team is underperforming, test multiple hypotheses before concluding:

Hypothesis What to Check Data Signal
Deal quality problem Average order value vs. peers AOV significantly lower
Pricing problem Gross margin % vs. peers Margin % significantly lower
Product mix problem Top products vs. peers Different product ranking
Market saturation Total addressable accounts Fewer potential customers
Capacity problem Reps vs. accounts ratio Revenue per rep is normal; accounts per rep is low
Seasonal problem Month-by-month pattern Gap concentrated in specific months

Common Mistakes to Avoid

  1. Calculating AOV from line items. Always group to order level first.
  2. Using MoM growth for seasonal comparisons. YoY removes seasonal noise.
  3. Ignoring margin in product analysis. Revenue and margin rarely tell the same story.
  4. Confusing customer count with customer concentration. Many customers can still be highly concentrated.
  5. Building cohort analysis without sorting by acquisition date. The period_number calculation assumes chronological order.
  6. Using pie charts with more than 6 categories. Switch to a ranked horizontal bar chart.
  7. Forgetting to validate data before analysis. Check for duplicates, negatives, and nulls first.

Key Functions Summary

# In sales_analysis.py
from sales_analysis import (
    load_sales_data,          # Load CSV, derive revenue and margin columns
    calculate_revenue_metrics, # Total revenue, AOV, customer count summary
    revenue_by_dimension,     # Group by any column with % contribution
    monthly_revenue_trend,    # Monthly table with MoM and YoY growth
    revenue_growth_rate,      # Growth by monthly, quarterly, or yearly period
    pareto_analysis,          # 80/20 cumulative contribution table
    revenue_concentration_risk, # Top-N customer concentration + HHI
    product_mix_analysis,     # Revenue vs. margin mix by product
    conversion_rate,          # Sales funnel stage conversion rates
    build_customer_cohorts,   # Add cohort_month to DataFrame
    cohort_revenue_table,     # Pivot table of cohort × period revenue
    rfm_scoring,              # Recency/Frequency/Monetary customer scores
    sales_velocity,           # Daily rate and monthly run-rate metrics
    build_sales_dashboard,    # 4-panel matplotlib figure
)

One-Paragraph Summary

Chapter 28 gives you a complete Python toolkit for understanding your sales revenue. The core functions let you compute aggregate metrics (total revenue, average order value, gross margin), break revenue down by any dimension (region, product, salesperson, channel), track growth over time with both month-over-month and year-over-year comparisons, identify your most important customers and products with Pareto analysis, measure revenue concentration risk, analyze product profitability beyond just revenue, understand cohorts of customers and whether they retain, score customers with RFM methodology, calculate sales velocity, and build a four-panel visual dashboard. The chapter demonstrates all of these tools in the context of Acme Corp's West region underperformance — where the data conclusively showed a capacity problem rather than a market or performance problem — and in Maya Reyes's consulting practice, where the same tools revealed which project types were genuinely profitable and where she was leaving money on the table.