Chapter 26 Exercises: Business Forecasting and Trend Analysis

Work through these exercises sequentially within each tier. Each tier builds on the previous. You will need pandas, numpy, matplotlib, scipy, and statsmodels installed.

pip install pandas numpy matplotlib scipy statsmodels openpyxl

Tier 1: Foundations — Understanding Time Series and Moving Averages

These exercises build intuition for the core concepts before applying them to real data.

Exercise 1.1 — Identify Time Series Components

For each of the following descriptions, identify which component(s) — trend, seasonality, cyclicality, noise — are most prominent. Explain your reasoning in 2–3 sentences each.

Dataset A: Monthly sales of holiday greeting cards. Sales jump dramatically in November and December every year, then fall sharply in January.

Dataset B: Annual revenue of a mid-size manufacturing company over 20 years. Revenue grew steadily from 2000–2007, fell in 2008–2009, grew again through 2019, and fell in 2020.

Dataset C: Daily temperature readings from a weather station, showing random fluctuations around a gradual warming trend from winter to summer.

Dataset D: Monthly website traffic for a tech startup. Traffic grows by roughly 5% per month with some months higher or lower than that average.

Exercise 1.2 — Compute Moving Averages Manually

Given this quarterly revenue series:

Q1 2022: $285,000
Q2 2022: $310,000
Q3 2022: $278,000
Q4 2022: $342,000
Q1 2023: $295,000
Q2 2023: $330,000
Q3 2023: $312,000
Q4 2023: $365,000

By hand (no Python): 1. Compute the 4-quarter simple moving average for Q4 2022, Q1 2023, Q2 2023, Q3 2023, Q4 2023. 2. Explain why there is no 4-quarter SMA value for Q1, Q2, or Q3 2022. 3. Describe in plain English what the 4-quarter SMA tells Sandra about the trend.

In Python: Write the same computation using pandas.Series.rolling(window=4).mean() and verify your manual calculations match.

Exercise 1.3 — Compare Moving Average Windows

Write a script that: 1. Creates the quarterly revenue series from Exercise 1.2 as a pandas Series 2. Computes 2-period, 4-period, and 6-period SMAs 3. Plots all three alongside the raw data in a single chart 4. Writes three sentences explaining: which window produces the smoothest line, which reacts fastest to the Q4 2023 increase, and which would you choose to show a non-technical executive?

Exercise 1.4 — EMA Smoothing Factor Intuition

Using pandas.ewm(span=N).mean():

Apply EMA with span=2, span=6, and span=12 to the quarterly revenue series
Plot all three alongside the raw data
Answer: which span value makes the EMA most closely track the raw data? Which makes it smoothest? What business question would each answer best?

Exercise 1.5 — Percent Change Calculation

Given this monthly revenue data:

monthly_revenue = pd.Series([
    95000, 88000, 102000, 115000, 98000, 121000,
    94000, 91000, 108000, 119000, 103000, 128000
], name="revenue")

Compute month-over-month percent change for each month
Compute the 3-month rolling average of the month-over-month change
Identify the month with the largest positive change and the largest negative change
If this were January through December data, compute what the "seasonality-adjusted" growth rate is — remove seasonal effects by averaging same-month values across years (you only have one year here, so this is a conceptual exercise: describe what you would need)

Tier 2: Applied — Working with Real Business Data Structures

These exercises use realistic data structures and the full toolkit from the chapter.

Exercise 2.1 — Build a Monthly Seasonality Profile

Create a synthetic two-year monthly revenue dataset for a business that has: - An overall upward trend (about 2% per month growth baseline) - Seasonal pattern: strong January/September, weak July/August - Random noise of ±10%

import pandas as pd
import numpy as np

np.random.seed(42)
months = pd.date_range("2022-01-01", periods=24, freq="MS")
baseline = np.array([1.15, 0.90, 0.95, 1.00, 1.05, 0.95,
                     0.80, 0.85, 1.20, 1.05, 1.00, 1.10] * 2)
trend = np.array([100_000 * 1.02**i for i in range(24)])
noise = np.random.normal(1.0, 0.10, 24)
revenue = (trend * baseline * noise).round(2)

df = pd.DataFrame({"month": months, "revenue": revenue})

Using this dataset: 1. Compute the monthly seasonality profile (average by month, express as index) 2. Identify the two strongest and two weakest months 3. Plot the raw data alongside the monthly average pattern 4. Explain how Sandra could use this profile to adjust a linear forecast

Exercise 2.2 — Linear Trend Analysis with Statistics

Using the same synthetic dataset from Exercise 2.1:

Convert month dates to ordinal integers
Run scipy.stats.linregress and print the slope, intercept, R-squared, and p-value
Compute the "monthly growth implied by the slope" as a percentage of the average revenue
Interpret the R-squared: what does it mean that the trend explains X% of variation?
Interpret the p-value: what does it mean that the trend is statistically significant (p < 0.05)?

Exercise 2.3 — Year-Over-Year Comparison

Using the 24-month synthetic dataset: 1. Compute YoY growth for each month (compare to same month one year earlier) 2. Compute the average YoY growth rate 3. Plot a bar chart showing YoY growth rates for each month in 2023 4. Answer: does YoY growth look more stable than MoM growth? Why or why not?

Exercise 2.4 — Applying Simple Exponential Smoothing

Using statsmodels.tsa.holtwinters.SimpleExpSmoothing:

Fit SES to the first 20 months of your synthetic dataset with auto-optimized smoothing
Print the optimal alpha (smoothing level) that statsmodels found
Generate a 4-month forecast
Compare the SES forecast to a simple 3-month rolling average forecast for the same 4 periods
Explain which method produces more reasonable-looking forecasts for this data

Exercise 2.5 — Confidence Band Construction

Using the residuals from the linear trend fit you computed in Exercise 2.2:

Compute the standard deviation of the residuals
Build a 95% confidence band (±1.96 × std_error) around a 3-period-ahead linear extrapolation
Widen the band by sqrt(horizon) for each forecast period ahead
Print a formatted table: Period, Point Forecast, Lower Bound, Upper Bound, Width
Answer: why does the confidence band widen as you forecast further ahead?

Tier 3: Integration — Building Complete Forecasting Pipelines

Exercise 3.1 — Acme Corp Regional Forecast

Using acme_sales_2023.csv (or a realistic substitute with columns: sale_date, region, revenue):

Aggregate to monthly revenue by region
For each of the four regions (North, South, East, West): - Compute a 3-month SMA - Fit a linear trend (scipy linregress) - Generate a 3-month forecast with 90% confidence bands
Present results as a summary table with columns: Region, 3-Month Forecast, Lower (90%), Upper (90%), Monthly Growth Rate
Identify which region has the strongest trend (highest R-squared) and which has the most uncertainty (widest confidence band)

Exercise 3.2 — Seasonality-Adjusted Forecast

Using the Acme data:

Compute quarterly seasonality indices (average by Q1/Q2/Q3/Q4 across all years)
Apply the seasonal adjustment to the linear forecast: - If Q3 historically runs 12% below average, multiply the Q3 forecast by 0.88
Compare the raw linear forecast to the seasonality-adjusted forecast
Explain to Sandra in one paragraph: why might the seasonally adjusted number be more useful for inventory planning?

Exercise 3.3 — Maya's Pipeline Forecast

Using maya_projects.csv (or a substitute with: project_name, client_name, start_date, deadline, completion_percent, hourly_rate, estimated_hours):

For each active project, estimate monthly billable hours based on completion pace and time remaining
Multiply by hourly rate to get monthly revenue contribution per project
Aggregate to a 6-month forecast DataFrame
Identify which month looks like the revenue peak and which looks like the trough
Compute the standard deviation across months as a measure of revenue stability

Exercise 3.4 — Forecast Visualization Function

Write a plot_regional_forecast(region_name, history_df, forecast_df) -> matplotlib.figure.Figure function that produces a publication-quality chart:

Solid blue line for historical data (with actual values labeled at the last point)
Dashed orange trend line
Open circle markers for forecast points
Shaded confidence band
Vertical line separating historical from forecast
Title, axis labels, legend, and grid
A footnote: "Based on linear trend from [start_date] to [end_date]"

Test it for all four Acme regions and save each as a PNG file.

Exercise 3.5 — Forecast Report Generator

Write a generate_forecast_report(df, output_path) -> None function that: 1. Computes quarterly forecasts for all four Acme regions 2. Computes overall company-level forecast (sum of regions) 3. Builds a multi-tab Excel file: - Tab 1: Summary table with forecasts for all regions - Tab 2: Chart data (values for charting in Excel) - Tab 3: Methodology notes (as text cells) 4. Saves the file to output_path

Tier 4: Challenge — Advanced Analysis

Exercise 4.1 — Holt's Method vs. Linear Regression

For each of the four Acme regions:

Fit both Holt's Linear Trend method (statsmodels.tsa.holtwinters.Holt) and linear regression
For each method, compute the in-sample RMSE (root mean squared error) on the historical data
Generate a 3-quarter forecast from each method
Present a comparison table: Region, Regression RMSE, Holt RMSE, Regression Forecast, Holt Forecast
Write 3–4 sentences explaining: do the two methods agree? Which would you present to the board and why?

Exercise 4.2 — Detecting a Trend Change Point

Sometimes a business experiences a structural shift — a trend change that is not captured by fitting one line through all the data. Write a function detect_trend_change(series: pd.Series, min_segment_length: int = 4) -> int | None that:

Splits the series at every possible break point (respecting min_segment_length)
Fits a linear regression to each sub-series
Returns the index where splitting produces the greatest improvement in total R-squared
Returns None if no split improves the fit by more than 10%

Apply this to the Acme North region data and discuss whether a trend change is detectable.

Exercise 4.3 — Bootstrapped Confidence Intervals

Instead of assuming normally distributed residuals (as the standard confidence interval approach does), use bootstrapping to generate empirical confidence intervals:

Fit a linear trend to the historical data
Compute residuals
Generate 1,000 bootstrap samples by resampling residuals with replacement
For each bootstrap sample, add resampled residuals to the trend line and refit
Generate 1,000 forecasts for 3 periods ahead
Compute the 5th and 95th percentile of those 1,000 forecasts to form a 90% confidence interval

Compare the bootstrapped CI to the analytical CI from Exercise 2.5. Are they similar or different? What might explain differences?

Exercise 4.4 — Forecast Accuracy Backtest

To evaluate how good your forecasting methods actually are, implement a walk-forward validation:

Start with the first 12 months of data as training
Forecast 3 months ahead
Compare forecast to actual (which you have)
Move the window forward 3 months and repeat
Compute the Mean Absolute Percentage Error (MAPE) across all out-of-sample forecasts
Repeat for both SMA-based and regression-based forecasting
Present a comparison table and explain which method has lower MAPE on this dataset

Exercise 4.5 — Composite Revenue Score Dashboard

Build a "revenue health dashboard" function that, for each Acme region:

Computes a composite score from: - Trend direction (+1 positive, -1 negative, 0 flat) - Trend R-squared (scaled 0–1) - Last 3-period growth vs. overall average (above average = +1, below = -1) - Seasonality adjustment for current quarter
Returns a DataFrame with Region, Score, Trend, R-Squared, Confidence Band Width, Status
Colors the output: green (score ≥ 1), yellow (0 to 1), red (< 0) in a matplotlib table visualization

Tier 5: Stretch — Open-Ended Projects

Exercise 5.1 — Automated Quarterly Forecast Report

Build a complete generate_quarterly_forecast_package(sales_csv, output_dir) -> None function that:

Loads and validates the sales data
Computes forecasts for all regions using at least two methods
Detects and reports any seasonality patterns
Generates one forecast chart per region plus a summary chart
Saves all charts to output_dir/charts/
Saves a formatted Excel report to output_dir/forecast_report.xlsx
Logs a summary to output_dir/forecast_log.txt

The function should be callable from the command line and handle errors gracefully.

Exercise 5.2 — Maya's 12-Month Revenue Scenario Analysis

Build a scenario analysis tool for Maya that:

Computes three forecasts: - Pessimistic: trend continues but at 50% of historical rate; no new clients - Base case: trend continues at historical rate; 1–2 new clients per quarter - Optimistic: trend accelerates by 20%; 2–3 new clients per quarter
Plots all three scenarios on one chart with different colors and labels
Computes the probability of each scenario based on historical win rates (if available)
Answers: at what monthly revenue level would Maya need to be to justify a year-long office lease? Is this achievable in any scenario within 12 months?

Exercise 5.3 — Automated Anomaly Detection

Write a function detect_revenue_anomalies(df, date_col, value_col, sensitivity=2.0) -> pd.DataFrame that:

Fits an EMA to the data
Computes residuals (actual − EMA)
Identifies periods where |residual| > sensitivity × std(residuals) as anomalies
Returns a DataFrame of anomalous periods with their date, actual value, expected value, and deviation
Creates a chart highlighting anomalies in red on the revenue timeline

Test it on Acme's monthly data and describe the business events that might explain any anomalies found.

Exercise 5.4 — Multi-Metric Forecasting System

Extend the forecasting system to handle multiple business metrics simultaneously:

Build a MultiMetricForecaster class that accepts a list of (metric_name, data_series) tuples
Fits the best-performing model for each metric (choose between SES, Holt, and linear regression based on in-sample RMSE)
Generates synchronized forecasts — all for the same time horizon
Produces a single summary DataFrame: date, metric1_forecast, metric2_forecast, etc.
Add a to_excel() method that saves all forecasts to a formatted multi-tab Excel file

Test with Acme's revenue, units sold, and transaction count as the three metrics.

These exercises range from approachable (Tier 1) to genuinely challenging (Tier 5). If you find yourself stuck on a later tier, the best approach is to return to the chapter examples and rebuild from there. Forecasting is a skill that develops through iteration, not just through reading.