Chapter 28 Exercises: Sales and Revenue Analytics

These exercises progress from direct application (Tier 1) through advanced design challenges (Tier 5). Each tier builds on the previous one. Work through each tier in order before advancing.

Work with the acme_sales_2023.csv dataset described in the chapter for exercises that require data. If you do not yet have the actual file, the generate_acme_sales_data() function in acme_sales_dashboard.py produces a realistic synthetic equivalent.

Tier 1: Foundations (Direct Application)

These exercises ask you to apply functions exactly as shown in the chapter. Focus on getting the mechanics right.

Exercise 1.1 — Load and Inspect

Load acme_sales_2023.csv (or generate the synthetic version). Print: - The number of rows and columns - The date range (earliest and latest transaction dates) - The total unique count of order_id values - The total unique count of customer_id values

Verify that the revenue column has been correctly derived from quantity × unit_price.

Exercise 1.2 — Revenue Metrics

Using calculate_revenue_metrics(), compute the full-year 2023 revenue summary for Acme Corp. Then filter to Q4 only (October–December) and compute metrics for that period separately.

Answer these questions using only the function output: 1. How does Q4 average order value compare to the full-year average? 2. How many unique customers purchased in Q4? 3. What is the Q4 gross margin percentage?

Exercise 1.3 — Revenue by Dimension

Use revenue_by_dimension() to produce four separate breakdowns: 1. By region 2. By product 3. By salesperson 4. By channel

For each breakdown, identify the top performer and state what percentage of total revenue they represent.

Exercise 1.4 — Monthly Trend

Use monthly_revenue_trend() to produce the full 2023 monthly revenue table. Answer: 1. Which month had the highest revenue? 2. Which month had the largest positive MoM percentage change? 3. Which month (if any) showed a revenue decline from the prior month? 4. What was the average monthly revenue for 2023?

Exercise 1.5 — Basic Pareto

Run pareto_analysis() on the customer_name dimension with top_n=25. Identify: 1. How many customers account for the first 50% of revenue? 2. How many customers account for 80% of revenue? 3. How many customers account for 90% of revenue?

Display your results in a clean printed table with rank, customer name, revenue, and cumulative percentage columns.

Tier 2: Applied (Modified and Extended)

These exercises ask you to modify or extend the chapter's functions for specific business scenarios.

Exercise 2.1 — Quarterly Comparison

The chapter shows monthly revenue trends. Extend this to quarters: 1. Compute total revenue for each quarter of 2023 2. Compute quarter-over-quarter percentage change 3. Identify which quarter had the highest growth rate 4. Compare Q4 revenue to Q1 revenue and state the percentage difference

Write a function quarterly_revenue_summary(df) that returns this as a clean DataFrame.

Exercise 2.2 — Regional Drill-Down

Write a function region_monthly_breakdown(df, region_name) that: 1. Filters the data to the specified region 2. Returns a monthly revenue table for that region only 3. Shows that region's share of the company total in each month

Apply it to all four Acme regions and identify which region shows the most consistent growth pattern.

Exercise 2.3 — Salesperson Performance Report

Create a salesperson performance report that shows for each rep: - Total revenue - Number of orders - Number of unique customers - Average order value (at the order level, not line-item level) - Revenue as percentage of their region's total

Sort by total revenue descending. Which rep has the highest average order value?

Exercise 2.4 — Product Revenue vs. Margin Chart

Create a scatter plot where: - X-axis = product revenue as % of total revenue - Y-axis = product gross margin % - Each point represents one product, labeled with the product name - A horizontal reference line at the company's overall gross margin percentage - A vertical reference line at the average product revenue share

Products in the upper-right quadrant (high revenue, high margin) are your best performers. Which products fall in each quadrant?

Exercise 2.5 — Concentration Risk Trend

Calculate revenue concentration risk separately for each quarter: 1. Q1 2023 — what % of that quarter's revenue came from the top 10 customers? 2. Q2 2023 — same calculation 3. Q3 2023 — same calculation 4. Q4 2023 — same calculation

Is Acme's revenue concentration increasing or decreasing over the year? What would you conclude from that trend?

Tier 3: Integration (Combining Multiple Concepts)

These exercises require you to combine two or more techniques from the chapter to answer a business question.

Exercise 3.1 — The West Region Memo

Reproduce Priya's analysis from Section 28.13. Write Python code that generates all of the evidence used to conclude that West region had a capacity problem:

Regional revenue comparison table
Average order value comparison (West vs. rest)
Customer count comparison (West vs. rest)
Salesperson-to-customer ratio comparison
West monthly revenue as percentage of company average

Then write a three-sentence summary (as a Python print() statement or a comment block) stating the conclusion and its basis.

Exercise 3.2 — Q4 Deep Dive with Pareto and RFM

Analyze Q4 2023 (October–December) using three different lenses:

Pareto: Which 5 customers accounted for the most Q4 revenue? What percentage of Q4 revenue do they represent?
RFM (Q4 only): Run rfm_scoring() on Q4 data only (set analysis_date to '2023-12-31'). How many customers fall into each segment?
Cross-reference: Are the top 5 Pareto customers all "Champions" in the RFM analysis? If not, what segments are they in, and what does that tell you?

Exercise 3.3 — Cohort Revenue Heatmap

Build the full cohort revenue analysis for Acme's 2023 data:

Use build_customer_cohorts() and cohort_revenue_table() to build the pivot table
Create a seaborn heatmap of the result
Normalize each row by its period-0 value so that colors show retention percentage rather than absolute dollars
Identify which acquisition cohort shows the best revenue retention at period 3 (three months after first purchase)

The normalization step is: cohort_table.div(cohort_table[0], axis=0) * 100

Exercise 3.4 — Revenue Bridge

A "revenue bridge" explains the change in revenue from one period to the next by breaking it into components. Build a simplified revenue bridge comparing H1 2023 (January–June) to H2 2023 (July–December):

Calculate total H1 and H2 revenue
Identify customers who were active in H1 but not H2 ("lost customers") — calculate their H1 revenue
Identify customers active in H2 but not H1 ("new customers") — calculate their H2 revenue
Identify customers active in both halves ("retained customers") — calculate H1 and H2 revenue separately, and the change

Present results as:

H1 Revenue:          $X
+ New customer revenue: +$X
- Lost customer revenue: -$X
+/- Retained customer change: +/-$X
= H2 Revenue:        $X

Exercise 3.5 — Sales Velocity by Region

Calculate sales_velocity() for each region separately. Then:

Compare daily revenue rates across regions
Calculate what West region's monthly run rate would need to be to match the North region's per-day rate
Given West's current customer count of 15 and its average revenue per customer, how many additional customers would bring West to North-equivalent velocity?

This is the kind of analysis that turns a "West region is underperforming" observation into a "West region needs X more customers" recommendation.

Tier 4: Challenge (Open-Ended Analysis)

These exercises have no single correct answer. They require you to make analytical choices and defend them.

Exercise 4.1 — Executive Sales Dashboard

Build a complete executive dashboard that goes beyond the four-panel version in the chapter. Your dashboard must include at least six panels and should tell a coherent story about Acme's 2023 performance. Required panels: 1. Monthly revenue with a 3-month rolling average line 2. Revenue by region (your choice of chart type) 3. Top 10 customers 4. Product margin vs. revenue scatter (from Exercise 2.4) 5. MoM growth rate (green/red bar chart) 6. One additional panel of your own design that adds insight not covered in the other five

Include a title for the entire dashboard and appropriate labels on all axes.

Exercise 4.2 — Predictive Projection

Using only the first 9 months of 2023 data (January–September), build a projection for Q4:

Calculate the Q4 seasonal adjustment factor from historical patterns (if no prior year data is available, use the Q3→Q4 ratio from the available data)
Project Q4 revenue for each region
Compare your projection to the actual Q4 numbers
Calculate your projection error as a percentage

What was your biggest source of error? How would you improve the projection with one additional piece of information?

Exercise 4.3 — Customer Health Score

Design and implement a "Customer Health Score" that combines multiple signals into a single number for each customer. Your score must incorporate: - RFM score (from Chapter's rfm_scoring()) - Revenue growth trend (is this customer buying more or less over the year?) - Product breadth (how many different products has the customer purchased?)

Normalize each component to a 0–100 scale before combining. Weight the components however you see fit, but document your weighting rationale. Output a ranked list of customers from healthiest to least healthy.

Exercise 4.4 — West Region Business Case

Write a complete business case analysis (in Python, using print statements and data) supporting the recommendation to hire two additional West region salespeople. Your analysis must:

Quantify the current revenue gap (West actual vs. West target vs. per-region average)
Model the expected revenue uplift from 2 additional reps (use other regions' revenue per rep as a benchmark)
Estimate time to break even on hiring costs (assume $85,000 all-in annual cost per rep, 50% gross margin on new revenue)
Calculate what the West region's market share would be after reaching target
Identify the three largest West region customers and state what growth from those accounts alone would be worth

Present all numbers clearly labeled with units.

Tier 5: Stretch (Beyond the Chapter)

These exercises require research, creativity, or techniques not explicitly covered in the chapter.

Exercise 5.1 — Interactive Dashboard with Plotly

Recreate the four-panel dashboard from Section 28.12 using Plotly Express instead of matplotlib. The interactive version should allow: - Hover tooltips showing exact revenue values - Clickable legend items to toggle regions on/off - The monthly chart should have a range slider for date filtering

Install Plotly with: pip install plotly

Save as an HTML file using fig.write_html("acme_dashboard_interactive.html").

Exercise 5.2 — Automated Report Generation

Build a function generate_pdf_report(df, output_path) that creates a complete sales analytics PDF report using the reportlab library (install: pip install reportlab). The report should include: - An executive summary page with the key metrics table - Regional performance page with the comparison chart - Top 10 customers page with a Pareto chart - Product mix page - Formatted with Acme Corp branding (colors, consistent typography)

This is a significant undertaking. Break it into small steps and test each page independently.

Exercise 5.3 — Linear Regression Sales Forecast

Using scikit-learn's LinearRegression, build a simple sales forecasting model:

Create features from the monthly data: month number (1–12), quarter (1–4), whether it is Q4 (binary), and lagged revenue (prior month revenue)
Train on January–September 2023 data
Predict October–December 2023
Compare predictions to actuals
Calculate mean absolute percentage error (MAPE)

Install: pip install scikit-learn

What features were most important for the model? How does machine learning compare to the simple seasonal adjustment in Exercise 4.2?

Exercise 5.4 — Customer Lifetime Value (CLV) Model

Research the concept of Customer Lifetime Value (CLV) and implement a simplified version:

The basic formula is: CLV = (Average Purchase Value) × (Purchase Frequency) × (Customer Lifespan)

For each Acme customer: 1. Calculate average order value (at the order level) 2. Calculate annual purchase frequency (orders per year) 3. Estimate customer lifespan from cohort data (use median active months as a proxy) 4. Compute CLV for each customer 5. Segment customers by CLV quartile and compare to their RFM segment

Do CLV segments and RFM segments agree? Where do they diverge most?

Exercise 5.5 — Anomaly Detection

Build a system to flag unusual sales activity:

Calculate each customer's average monthly order count and average monthly revenue
For each customer-month combination, flag those that are more than 2 standard deviations above or below the customer's personal average
Produce a list of "anomalous" customer-months with their actual vs. expected values
Categorize anomalies as "positive surprises" (significantly above average) or "risk flags" (significantly below average)

Why might positive anomalies be just as important to investigate as negative ones?