Case Study 26-1: Priya Builds the Quarterly Sales Forecast
Characters: - Priya Okonkwo — Junior Analyst, Acme Corp - Sandra Chen — VP of Sales, Acme Corp - Marcus Webb — IT Manager, Acme Corp (brief appearance)
Business Question: What should Acme expect for Q3 and Q4 revenue by region, and how should those projections influence inventory purchasing and staffing decisions?
Deliverable: A two-page summary with one chart per region plus an executive summary, due in 48 hours.
The Brief
Sandra Chen's Friday afternoon conversation with Priya was direct: "I need defensible revenue projections for Q3 and Q4. We've been managing inventory by gut instinct for two years and I want to change that. I don't need perfection — I need a range I can take into a budget meeting."
Priya had access to acme_sales_2023.csv, which contained 18 months of daily sales transaction data across all four regions. Her first step was not writing code — it was planning the analysis.
She wrote three questions on a sticky note: 1. What is the trend direction and strength for each region? 2. Does Acme have identifiable seasonal patterns? 3. What uncertainty range is honest and defensible?
The Analysis
Step 1: Data Preparation
"""
Acme Q3/Q4 Forecast Analysis — Priya Okonkwo, Acme Corp
"""
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from datetime import datetime
# Load and aggregate to monthly by region
df = pd.read_csv("data/acme_sales_2023.csv", parse_dates=["sale_date"])
df["month"] = df["sale_date"].dt.to_period("M")
monthly_by_region = (
df.groupby(["month", "region"])["revenue"]
.sum()
.reset_index()
.sort_values(["region", "month"])
)
# Also get all-region monthly totals
monthly_total = (
df.groupby("month")["revenue"]
.sum()
.reset_index()
)
print("Data shape:", df.shape)
print("Date range:", df["sale_date"].min(), "to", df["sale_date"].max())
print("Regions:", df["region"].unique())
print("\nMonthly revenue sample:")
print(monthly_by_region.head(12).to_string(index=False))
Step 2: Seasonality Check
Before forecasting, Priya checked whether Acme had a seasonal pattern. She did not want to apply a linear trend to data with strong seasonality without acknowledging it.
def compute_seasonality_index(monthly_df: pd.DataFrame) -> pd.DataFrame:
"""
Compute monthly seasonality indices from multi-year monthly data.
Returns a DataFrame with month number, month name, avg_revenue,
and seasonality_index (100 = average month).
"""
monthly_df = monthly_df.copy()
monthly_df["month_num"] = monthly_df["month"].dt.month
monthly_df["month_name"] = monthly_df["month"].dt.strftime("%B")
monthly_avg = (
monthly_df.groupby(["month_num", "month_name"])["revenue"]
.mean()
.reset_index()
.rename(columns={"revenue": "avg_monthly_revenue"})
)
overall_avg = monthly_avg["avg_monthly_revenue"].mean()
monthly_avg["seasonality_index"] = (
monthly_avg["avg_monthly_revenue"] / overall_avg * 100
).round(1)
return monthly_avg.sort_values("month_num")
# Convert period index back to datetime for the function
monthly_total_dt = monthly_total.copy()
monthly_total_dt["month"] = monthly_total_dt["month"].dt.to_timestamp()
seasonality = compute_seasonality_index(monthly_total_dt)
print("\nAcme Corp Monthly Seasonality Profile:")
print(seasonality[["month_name", "seasonality_index"]].to_string(index=False))
What Priya found: September had a seasonality index of 118 (18% above average), consistent with back-to-office/back-to-school office supply demand. July had an index of 82 (18% below average). Q3 as a whole showed an index of 94 — slightly below average — driven by July's weakness, even though September was strong.
This meant a straight linear forecast would need a seasonal adjustment for Q3.
Step 3: Linear Trend and Forecast by Region
def regional_linear_forecast(
monthly_region_df: pd.DataFrame,
region: str,
quarters_ahead: int = 2,
) -> dict:
"""
Fit linear trend and generate quarterly forecast for one region.
Returns dict with trend stats and forecast quarters.
"""
region_df = monthly_region_df[
monthly_region_df["region"] == region
].copy().sort_values("month")
# Convert to quarterly
region_df["quarter"] = region_df["month"].dt.to_timestamp().dt.to_period("Q")
quarterly = (
region_df.groupby("quarter")["revenue"]
.sum()
.reset_index()
)
quarterly["q_ordinal"] = range(len(quarterly))
# Linear regression
slope, intercept, r_value, p_value, std_err = stats.linregress(
quarterly["q_ordinal"],
quarterly["revenue"],
)
# Residuals and confidence band parameters
fitted = slope * quarterly["q_ordinal"] + intercept
residuals = quarterly["revenue"].values - fitted.values
residual_std = np.std(residuals, ddof=1)
z_95 = 1.96
last_ordinal = quarterly["q_ordinal"].max()
last_quarter = quarterly["quarter"].max()
forecasts = []
for i in range(1, quarters_ahead + 1):
future_ordinal = last_ordinal + i
point = slope * future_ordinal + intercept
margin = z_95 * residual_std * np.sqrt(i)
future_q = last_quarter + i
forecasts.append({
"quarter": str(future_q),
"point_forecast": round(max(0, point), 2),
"lower_95": round(max(0, point - margin), 2),
"upper_95": round(point + margin, 2),
})
avg_quarterly = quarterly["revenue"].mean()
quarterly_growth_pct = (slope / avg_quarterly * 100) if avg_quarterly else 0
return {
"region": region,
"r_squared": round(r_value ** 2, 3),
"quarterly_growth_pct": round(quarterly_growth_pct, 2),
"p_value": round(p_value, 4),
"residual_std": round(residual_std, 2),
"forecasts": forecasts,
}
# Run for all four regions
regions = ["North", "South", "East", "West"]
all_forecasts = [
regional_linear_forecast(monthly_by_region, r, quarters_ahead=2)
for r in regions
]
# Print summary table
print("\nRegional Forecast Summary:")
print(f"{'Region':<10} {'R²':<8} {'Qtly Growth':<14} {'Q3 Forecast':<15} {'Q3 Range (95%)'}")
print("-" * 70)
for r in all_forecasts:
q3 = r["forecasts"][0]
print(
f"{r['region']:<10} {r['r_squared']:<8.3f} "
f"{r['quarterly_growth_pct']:+.1f}% "
f"${q3['point_forecast']:>12,.0f} "
f"${q3['lower_95']:,.0f} – ${q3['upper_95']:,.0f}"
)
Step 4: Seasonality-Adjusted Forecast
# Get Q3 seasonality index (average of July, August, September indices)
q3_months = seasonality[seasonality["month_num"].isin([7, 8, 9])]
q3_seasonality_index = q3_months["seasonality_index"].mean() / 100
print(f"\nQ3 Seasonality Index: {q3_seasonality_index:.3f}")
print(f"(Q3 is typically {(1 - q3_seasonality_index) * 100:.1f}% below annual average)")
print("\nSeasonality-Adjusted Q3 Forecasts:")
print(f"{'Region':<10} {'Raw Forecast':<16} {'Adj. Forecast':<16} {'Adj. Lower':<14} {'Adj. Upper'}")
print("-" * 70)
for r in all_forecasts:
q3 = r["forecasts"][0]
adj_point = q3["point_forecast"] * q3_seasonality_index
adj_lower = q3["lower_95"] * q3_seasonality_index
adj_upper = q3["upper_95"] * q3_seasonality_index
print(
f"{r['region']:<10} ${q3['point_forecast']:>12,.0f} "
f"${adj_point:>12,.0f} "
f"${adj_lower:>10,.0f} ${adj_upper:,.0f}"
)
Step 5: The Chart
Priya built one chart for each region using the plot_forecast() function from the chapter. For the executive summary, she assembled a 2×2 grid of the four regional charts.
fig, axes = plt.subplots(2, 2, figsize=(16, 10))
fig.suptitle(
"Acme Corp Sales Forecast by Region\nQ3 and Q4 Projections with 95% Confidence Bands",
fontsize=14,
fontweight="bold",
)
region_axes = dict(zip(regions, axes.flatten()))
for region, ax in region_axes.items():
# ... (uses the regional data and forecast dict to plot)
# Historical line, trend line, forecast points, confidence band
ax.set_title(f"{region} Region", fontsize=12)
plt.tight_layout()
fig.savefig("output/acme_regional_forecast.png", dpi=150, bbox_inches="tight")
print("Chart saved: output/acme_regional_forecast.png")
The Presentation
Priya prepared a two-page document for Sandra:
Page 1: Executive Summary - Q3 total company forecast: $X.XM to $X.XM (95% confidence range) - Q4 total company forecast: $X.XM to $X.XM (95% confidence range) - North region has the strongest trend (R² = 0.87, +4.2% quarterly growth) - South region has the widest uncertainty band (R² = 0.63, higher variability) - Q3 adjusted for seasonality: typically 6% below annual average
Page 2: Methodology and Limitations 1. Method: linear regression on quarterly revenue, 18 months of history 2. Seasonality: Q3 carries a seasonal adjustment based on historical monthly averages 3. R-squared values: North (0.87), East (0.79), West (0.71), South (0.63) 4. Limitations explicitly stated: - "This model assumes recent growth rates continue. A macro shift — new competitor, supply disruption, economic contraction — is not captured." - "18 months of data is a relatively short history. A 3–5 year dataset would produce more reliable seasonality estimates." - "The confidence bands reflect historical variability only. They do not account for model error or structural breaks."
Sandra's Response
Sandra read both pages. She asked two questions:
"What's driving the South region's higher uncertainty?"
Priya: "South has had the most month-to-month variability — some months were 30% above trend, some 20% below. North is steadier. That's why the confidence bands for South are wider: the model is honest about that variability."
"If I'm ordering inventory, should I plan for the low end or the midpoint?"
Priya: "For inventory, you generally want to plan for slightly above the midpoint to avoid stockouts — but that's a business judgment, not a statistical one. The model gives you the range. You decide where in that range to plan."
Sandra nodded. "This is what I needed. We've been guessing. Now we have a defensible starting point."
Marcus Webb, who had reviewed the analysis from a data quality perspective, added: "The data we gave her is clean. The methodology is sound. If the numbers are wrong next quarter, it won't be because of bad analysis."
What Made This Analysis Good
-
Seasonality was acknowledged, not ignored. Applying a raw linear trend to seasonal data without adjustment would have overstated Q3 projections.
-
Confidence intervals were honest. Priya did not narrow them to make the forecast look more precise than it is.
-
R-squared was explained in plain English. Sandra understood that 0.87 means "the trend is reliable" and 0.63 means "expect more variability."
-
Limitations were stated explicitly in the deliverable, not buried or omitted. This protects the analyst when reality diverges from the forecast.
-
The business question drove the analysis, not the other way around. The question was about inventory and staffing decisions, which requires a range, not a point estimate.
Discussion Questions
-
Priya used 18 months of data for her analysis. How would having only 6 months of data change the analysis, and specifically how would it affect the confidence intervals?
-
Sandra asks for a Q1 forecast (3 quarters ahead instead of 2). How should the confidence bands change relative to the Q3 and Q4 bands, and why?
-
The South region shows R² = 0.63. Should Priya present a different type of forecast for South than for North (R² = 0.87), or treat them the same way? What would you recommend?
-
After Priya presents, Marcus suggests adding transaction count and average order value forecasts alongside revenue. How would this multi-metric approach add value to Sandra's inventory planning?