Case Study 26-1: Priya Builds the Quarterly Sales Forecast

Characters: - Priya Okonkwo — Junior Analyst, Acme Corp - Sandra Chen — VP of Sales, Acme Corp - Marcus Webb — IT Manager, Acme Corp (brief appearance)

Business Question: What should Acme expect for Q3 and Q4 revenue by region, and how should those projections influence inventory purchasing and staffing decisions?

Deliverable: A two-page summary with one chart per region plus an executive summary, due in 48 hours.


The Brief

Sandra Chen's Friday afternoon conversation with Priya was direct: "I need defensible revenue projections for Q3 and Q4. We've been managing inventory by gut instinct for two years and I want to change that. I don't need perfection — I need a range I can take into a budget meeting."

Priya had access to acme_sales_2023.csv, which contained 18 months of daily sales transaction data across all four regions. Her first step was not writing code — it was planning the analysis.

She wrote three questions on a sticky note: 1. What is the trend direction and strength for each region? 2. Does Acme have identifiable seasonal patterns? 3. What uncertainty range is honest and defensible?


The Analysis

Step 1: Data Preparation

"""
Acme Q3/Q4 Forecast Analysis — Priya Okonkwo, Acme Corp
"""

import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from datetime import datetime

# Load and aggregate to monthly by region
df = pd.read_csv("data/acme_sales_2023.csv", parse_dates=["sale_date"])
df["month"] = df["sale_date"].dt.to_period("M")

monthly_by_region = (
    df.groupby(["month", "region"])["revenue"]
    .sum()
    .reset_index()
    .sort_values(["region", "month"])
)

# Also get all-region monthly totals
monthly_total = (
    df.groupby("month")["revenue"]
    .sum()
    .reset_index()
)

print("Data shape:", df.shape)
print("Date range:", df["sale_date"].min(), "to", df["sale_date"].max())
print("Regions:", df["region"].unique())
print("\nMonthly revenue sample:")
print(monthly_by_region.head(12).to_string(index=False))

Step 2: Seasonality Check

Before forecasting, Priya checked whether Acme had a seasonal pattern. She did not want to apply a linear trend to data with strong seasonality without acknowledging it.

def compute_seasonality_index(monthly_df: pd.DataFrame) -> pd.DataFrame:
    """
    Compute monthly seasonality indices from multi-year monthly data.
    Returns a DataFrame with month number, month name, avg_revenue,
    and seasonality_index (100 = average month).
    """
    monthly_df = monthly_df.copy()
    monthly_df["month_num"] = monthly_df["month"].dt.month
    monthly_df["month_name"] = monthly_df["month"].dt.strftime("%B")

    monthly_avg = (
        monthly_df.groupby(["month_num", "month_name"])["revenue"]
        .mean()
        .reset_index()
        .rename(columns={"revenue": "avg_monthly_revenue"})
    )

    overall_avg = monthly_avg["avg_monthly_revenue"].mean()
    monthly_avg["seasonality_index"] = (
        monthly_avg["avg_monthly_revenue"] / overall_avg * 100
    ).round(1)

    return monthly_avg.sort_values("month_num")


# Convert period index back to datetime for the function
monthly_total_dt = monthly_total.copy()
monthly_total_dt["month"] = monthly_total_dt["month"].dt.to_timestamp()

seasonality = compute_seasonality_index(monthly_total_dt)
print("\nAcme Corp Monthly Seasonality Profile:")
print(seasonality[["month_name", "seasonality_index"]].to_string(index=False))

What Priya found: September had a seasonality index of 118 (18% above average), consistent with back-to-office/back-to-school office supply demand. July had an index of 82 (18% below average). Q3 as a whole showed an index of 94 — slightly below average — driven by July's weakness, even though September was strong.

This meant a straight linear forecast would need a seasonal adjustment for Q3.

Step 3: Linear Trend and Forecast by Region

def regional_linear_forecast(
    monthly_region_df: pd.DataFrame,
    region: str,
    quarters_ahead: int = 2,
) -> dict:
    """
    Fit linear trend and generate quarterly forecast for one region.
    Returns dict with trend stats and forecast quarters.
    """
    region_df = monthly_region_df[
        monthly_region_df["region"] == region
    ].copy().sort_values("month")

    # Convert to quarterly
    region_df["quarter"] = region_df["month"].dt.to_timestamp().dt.to_period("Q")
    quarterly = (
        region_df.groupby("quarter")["revenue"]
        .sum()
        .reset_index()
    )
    quarterly["q_ordinal"] = range(len(quarterly))

    # Linear regression
    slope, intercept, r_value, p_value, std_err = stats.linregress(
        quarterly["q_ordinal"],
        quarterly["revenue"],
    )

    # Residuals and confidence band parameters
    fitted = slope * quarterly["q_ordinal"] + intercept
    residuals = quarterly["revenue"].values - fitted.values
    residual_std = np.std(residuals, ddof=1)
    z_95 = 1.96

    last_ordinal = quarterly["q_ordinal"].max()
    last_quarter = quarterly["quarter"].max()

    forecasts = []
    for i in range(1, quarters_ahead + 1):
        future_ordinal = last_ordinal + i
        point = slope * future_ordinal + intercept
        margin = z_95 * residual_std * np.sqrt(i)
        future_q = last_quarter + i

        forecasts.append({
            "quarter": str(future_q),
            "point_forecast": round(max(0, point), 2),
            "lower_95": round(max(0, point - margin), 2),
            "upper_95": round(point + margin, 2),
        })

    avg_quarterly = quarterly["revenue"].mean()
    quarterly_growth_pct = (slope / avg_quarterly * 100) if avg_quarterly else 0

    return {
        "region": region,
        "r_squared": round(r_value ** 2, 3),
        "quarterly_growth_pct": round(quarterly_growth_pct, 2),
        "p_value": round(p_value, 4),
        "residual_std": round(residual_std, 2),
        "forecasts": forecasts,
    }


# Run for all four regions
regions = ["North", "South", "East", "West"]
all_forecasts = [
    regional_linear_forecast(monthly_by_region, r, quarters_ahead=2)
    for r in regions
]

# Print summary table
print("\nRegional Forecast Summary:")
print(f"{'Region':<10} {'R²':<8} {'Qtly Growth':<14} {'Q3 Forecast':<15} {'Q3 Range (95%)'}")
print("-" * 70)
for r in all_forecasts:
    q3 = r["forecasts"][0]
    print(
        f"{r['region']:<10} {r['r_squared']:<8.3f} "
        f"{r['quarterly_growth_pct']:+.1f}%        "
        f"${q3['point_forecast']:>12,.0f}   "
        f"${q3['lower_95']:,.0f} – ${q3['upper_95']:,.0f}"
    )

Step 4: Seasonality-Adjusted Forecast

# Get Q3 seasonality index (average of July, August, September indices)
q3_months = seasonality[seasonality["month_num"].isin([7, 8, 9])]
q3_seasonality_index = q3_months["seasonality_index"].mean() / 100

print(f"\nQ3 Seasonality Index: {q3_seasonality_index:.3f}")
print(f"(Q3 is typically {(1 - q3_seasonality_index) * 100:.1f}% below annual average)")

print("\nSeasonality-Adjusted Q3 Forecasts:")
print(f"{'Region':<10} {'Raw Forecast':<16} {'Adj. Forecast':<16} {'Adj. Lower':<14} {'Adj. Upper'}")
print("-" * 70)

for r in all_forecasts:
    q3 = r["forecasts"][0]
    adj_point = q3["point_forecast"] * q3_seasonality_index
    adj_lower = q3["lower_95"] * q3_seasonality_index
    adj_upper = q3["upper_95"] * q3_seasonality_index

    print(
        f"{r['region']:<10} ${q3['point_forecast']:>12,.0f}   "
        f"${adj_point:>12,.0f}   "
        f"${adj_lower:>10,.0f}   ${adj_upper:,.0f}"
    )

Step 5: The Chart

Priya built one chart for each region using the plot_forecast() function from the chapter. For the executive summary, she assembled a 2×2 grid of the four regional charts.

fig, axes = plt.subplots(2, 2, figsize=(16, 10))
fig.suptitle(
    "Acme Corp Sales Forecast by Region\nQ3 and Q4 Projections with 95% Confidence Bands",
    fontsize=14,
    fontweight="bold",
)

region_axes = dict(zip(regions, axes.flatten()))

for region, ax in region_axes.items():
    # ... (uses the regional data and forecast dict to plot)
    # Historical line, trend line, forecast points, confidence band
    ax.set_title(f"{region} Region", fontsize=12)

plt.tight_layout()
fig.savefig("output/acme_regional_forecast.png", dpi=150, bbox_inches="tight")
print("Chart saved: output/acme_regional_forecast.png")

The Presentation

Priya prepared a two-page document for Sandra:

Page 1: Executive Summary - Q3 total company forecast: $X.XM to $X.XM (95% confidence range) - Q4 total company forecast: $X.XM to $X.XM (95% confidence range) - North region has the strongest trend (R² = 0.87, +4.2% quarterly growth) - South region has the widest uncertainty band (R² = 0.63, higher variability) - Q3 adjusted for seasonality: typically 6% below annual average

Page 2: Methodology and Limitations 1. Method: linear regression on quarterly revenue, 18 months of history 2. Seasonality: Q3 carries a seasonal adjustment based on historical monthly averages 3. R-squared values: North (0.87), East (0.79), West (0.71), South (0.63) 4. Limitations explicitly stated: - "This model assumes recent growth rates continue. A macro shift — new competitor, supply disruption, economic contraction — is not captured." - "18 months of data is a relatively short history. A 3–5 year dataset would produce more reliable seasonality estimates." - "The confidence bands reflect historical variability only. They do not account for model error or structural breaks."


Sandra's Response

Sandra read both pages. She asked two questions:

"What's driving the South region's higher uncertainty?"

Priya: "South has had the most month-to-month variability — some months were 30% above trend, some 20% below. North is steadier. That's why the confidence bands for South are wider: the model is honest about that variability."

"If I'm ordering inventory, should I plan for the low end or the midpoint?"

Priya: "For inventory, you generally want to plan for slightly above the midpoint to avoid stockouts — but that's a business judgment, not a statistical one. The model gives you the range. You decide where in that range to plan."

Sandra nodded. "This is what I needed. We've been guessing. Now we have a defensible starting point."

Marcus Webb, who had reviewed the analysis from a data quality perspective, added: "The data we gave her is clean. The methodology is sound. If the numbers are wrong next quarter, it won't be because of bad analysis."


What Made This Analysis Good

  1. Seasonality was acknowledged, not ignored. Applying a raw linear trend to seasonal data without adjustment would have overstated Q3 projections.

  2. Confidence intervals were honest. Priya did not narrow them to make the forecast look more precise than it is.

  3. R-squared was explained in plain English. Sandra understood that 0.87 means "the trend is reliable" and 0.63 means "expect more variability."

  4. Limitations were stated explicitly in the deliverable, not buried or omitted. This protects the analyst when reality diverges from the forecast.

  5. The business question drove the analysis, not the other way around. The question was about inventory and staffing decisions, which requires a range, not a point estimate.


Discussion Questions

  1. Priya used 18 months of data for her analysis. How would having only 6 months of data change the analysis, and specifically how would it affect the confidence intervals?

  2. Sandra asks for a Q1 forecast (3 quarters ahead instead of 2). How should the confidence bands change relative to the Q3 and Q4 bands, and why?

  3. The South region shows R² = 0.63. Should Priya present a different type of forecast for South than for North (R² = 0.87), or treat them the same way? What would you recommend?

  4. After Priya presents, Marcus suggests adding transaction count and average order value forecasts alongside revenue. How would this multi-metric approach add value to Sandra's inventory planning?