Chapter 26 Quiz: Business Forecasting and Trend Analysis

DataField.Dev

Instructions: Select the best answer for each multiple-choice question. For True/False, write T or F. For short-answer questions, write the number of sentences indicated beneath each question. The answer key is at the end.

Section A: Multiple Choice

1. Acme Corp's revenue is consistently higher in September than in any other month, and this pattern repeats every year. Which time series component best describes this behavior?

A) Trend
B) Cyclicality
C) Seasonality
D) Noise

2. You compute a 4-period simple moving average. For the very first data point in your dataset, the SMA value is:

A) Equal to the first data point itself
B) The average of all available data points so far
C) NaN, because there are not enough preceding periods
D) Zero, as the default for missing computations

3. Which moving average method gives the most weight to the most recent observation?

A) Simple Moving Average (SMA)
B) Equally-weighted centered moving average
C) Exponential Moving Average (EMA)
D) All moving averages weight observations equally

4. Sandra asks Priya: "How well does the linear trend explain the revenue data?" Priya answers with a single number. Which statistic is she most likely citing?

A) The slope of the trend line
B) The p-value of the regression
C) The R-squared value
D) The standard error of the residuals

5. You run scipy.stats.linregress on quarterly revenue data, square the returned rvalue, and get R-squared = 0.25. What does this mean?

A) The linear trend explains 25% of the variation in revenue
B) There is a 25% chance the trend is statistically significant
C) Revenue is growing at 25% per year
D) The model's predictions are off by 25% on average

6. Which pandas method computes year-over-year percent change for monthly data?

A) df["revenue"].pct_change(periods=1)
B) df["revenue"].pct_change(periods=12)
C) df["revenue"].rolling(12).mean()
D) df["revenue"].diff(12) / 12

7. Priya's quarterly revenue model produces residuals with a standard deviation of $45,000. She is forecasting 3 quarters ahead. Using the formula z × std × sqrt(horizon), what is the 95% confidence band half-width for the 3-quarter-ahead forecast? (Use z = 1.96)

A) approximately $88,000
B) approximately $153,000
C) approximately $45,000
D) approximately $265,000

8. Maya has revenue data that shows consistent growth over 3 years but no seasonal pattern. Which statsmodels method is most appropriate for her forecast?

A) SimpleExpSmoothing — because it handles all types of data
B) Holt's Linear Trend method — because it explicitly models level and trend
C) A simple rolling average — because it is the most interpretable
D) Linear regression only — because statsmodels is overkill for consultants

9. When building a business forecast for a quarterly board presentation, what should always accompany the point forecast number?

A) The Python source code used to generate it
B) A confidence interval or range, plus explicit model assumptions and limitations
C) A comparison to competitor forecasts
D) A guarantee that the actual value will fall within 10% of the forecast

10. Priya uses df.groupby(df["sale_date"].dt.month)["revenue"].mean() to detect seasonality. What does this computation produce?

A) The linear trend coefficient by month
B) The average revenue for each calendar month, pooled across all years in the data
C) The month-over-month growth rate for each month
D) A moving average aligned to the start of each calendar month

Section B: True or False

11. A 95% confidence interval means you are 95% certain the true value falls within that range.

True / False

12. The EMA (Exponential Moving Average) requires the same minimum number of prior data points as the SMA before it can produce values.

True / False

13. A high R-squared value (e.g., 0.95) guarantees that a linear forecast will be accurate for the next period.

True / False

14. Year-over-year comparison is generally more useful than month-over-month comparison for seasonal businesses because it removes the seasonal effect from the growth calculation.

True / False

15. statsmodels.tsa.holtwinters.Holt can automatically optimize its smoothing parameters using maximum likelihood estimation.

True / False

16. A forecast confidence band that widens as you project further into the future represents a flaw in the model — a better model would maintain constant band width.

True / False

17. In Python, pandas.Series.rolling(window=4).mean() for a 12-element series will return exactly 4 non-NaN values.

True / False

Section C: Short Answer

18. Sandra tells Priya: "Just give me a number. I don't want a range — it looks like we don't know what we're doing." Write the explanation Priya should give Sandra about why communicating a range is actually more professional and credible than presenting a single point estimate.

(Write 4–6 sentences.)

19. Explain the difference between seasonal variation and cyclical variation. Give one example of each from the context of a regional office supply distributor like Acme Corp. Why is cyclicality generally harder to model than seasonality?

(Write 4–5 sentences.)

20. You have 8 quarters of revenue data and use linear regression to forecast the next 2 quarters. You notice the first 4 quarters show flat growth and the last 4 show accelerating growth. Explain how fitting a single trend line through all 8 quarters might produce a misleading forecast, and describe one approach to handle this.

(Write 3–5 sentences.)

Answer Key

1. C — Seasonality is defined as a pattern that repeats at regular, calendar-based intervals. The fact that September is consistently the peak and the pattern repeats every year is the defining characteristic of seasonality. Trend (A) would be a long-term direction. Cyclicality (B) repeats on irregular, multi-year intervals. Noise (D) is random, not systematic.

2. C — With its default settings, rolling(window=4).mean() needs 4 values in the window (the current observation plus the 3 before it) before it can compute a result. For the first three observations the window holds fewer than 4 values, so the result is NaN. This is expected and correct behavior, not an error.

3. C — The Exponential Moving Average weights the most recent observation most heavily, with weights decaying exponentially for older observations. The SMA (A) weights all observations in its window equally. A centered moving average (B) also weights equally. The premise of (D) is incorrect — different moving averages have very different weighting schemes.

4. C — R-squared is the standard measure of "how well does the trend explain the data?" It ranges from 0 to 1, with 1 meaning the trend line perfectly explains all variation. The slope (A) tells you the rate of change. The p-value (B) tells you statistical significance. Standard error (D) tells you typical prediction error.

5. A — R-squared = 0.25 means the linear trend accounts for 25% of the total variation in revenue. The remaining 75% is unexplained by the trend (it is seasonality, noise, or other factors). R-squared does not represent probability (B), growth rate (C), or average error (D).
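As a minimal sketch of where this number comes from, the snippet below fits a linear trend with scipy.stats.linregress and squares the returned rvalue to obtain R-squared. The revenue figures are invented for illustration and are not Acme Corp data.

```python
import numpy as np
from scipy import stats

# Hypothetical quarterly revenue (in $ thousands); values are illustrative only.
revenue = np.array([410.0, 455.0, 430.0, 490.0, 470.0, 520.0, 505.0, 560.0])
quarters = np.arange(len(revenue))  # 0, 1, 2, ... as the time index

result = stats.linregress(quarters, revenue)

# linregress reports the correlation coefficient r; R-squared is its square.
r_squared = result.rvalue ** 2

print(f"slope per quarter: {result.slope:.1f}")
print(f"R-squared: {r_squared:.3f}")
```

The printed R-squared is the share of variation in revenue that the fitted line accounts for; the slope is a separate quantity and answers a different question (how fast revenue is changing).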

6. B — pct_change(periods=12) compares each month to the value 12 periods earlier, which for monthly data means the same calendar month one year ago. This removes the seasonal effect because both months being compared are in the same season. periods=1 (A) gives month-over-month. rolling(12).mean() (C) is a moving average, not percent change. diff(12) / 12 (D) divides the absolute 12-month change by 12, which is not a percent change.
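The contrast between month-over-month and year-over-year change can be sketched on synthetic data. The example below assumes a made-up seasonal shape with a September peak, repeated for two years with the second year exactly 10% higher; the column names mom_pct and yoy_pct are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical monthly revenue: 24 months, second year 10% above the first,
# with the same seasonal shape (September peak) in both years.
seasonal_shape = np.array(
    [80, 82, 90, 95, 100, 105, 100, 110, 140, 115, 105, 120], dtype=float
)
values = np.concatenate([seasonal_shape, seasonal_shape * 1.10])
index = pd.date_range("2022-01-01", periods=24, freq="MS")
df = pd.DataFrame({"revenue": values}, index=index)

df["mom_pct"] = df["revenue"].pct_change(periods=1)   # month over month
df["yoy_pct"] = df["revenue"].pct_change(periods=12)  # same month, prior year

# The first 12 YoY values are NaN: there is no prior-year month to compare to.
print(df.loc["2023", ["revenue", "mom_pct", "yoy_pct"]].round(3))
```

In this constructed case the year-over-year column is a steady 10% for every month of the second year, while the month-over-month column swings up and down with the seasonal shape even though underlying growth never changes.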

7. B — Calculation: 1.96 × $45,000 × √3 ≈ 1.96 × $45,000 × 1.732 ≈ $152,770, or approximately $153,000. The formula is z × std × sqrt(horizon), where std is the standard deviation of the residuals. For the 1-quarter-ahead forecast the half-width is 1.96 × $45,000 × 1 = $88,200, so (A) is the 1-period band, not the 3-period band.
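The arithmetic can be checked with a few lines of Python. This is a sketch of the quiz's simplified formula only; the helper name band_half_width is hypothetical, and the numbers come from the question.

```python
import math

residual_std = 45_000  # standard deviation of model residuals, from the question
z = 1.96               # z-score for a 95% band

def band_half_width(horizon: int) -> float:
    """Half-width of the band using z * std * sqrt(horizon)."""
    return z * residual_std * math.sqrt(horizon)

for h in (1, 2, 3):
    print(f"{h} quarter(s) ahead: +/- ${band_half_width(h):,.0f}")
```

Because of the sqrt(horizon) term, each additional quarter widens the band, which is the same behavior question 16 asks about.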

8. B — Holt's Linear Trend method is specifically designed for data with a trend but no seasonality. It models both the current level and the current trend direction, making it appropriate when consistent growth is present. SES (A) would underperform on trending data because it ignores the trend component. Rolling average (C) does not produce a proper forecast. Linear regression (D) is a valid alternative, but Holt's is the natural statsmodels choice for this pattern.

9. B — A professional forecast must include a confidence interval or range to communicate uncertainty, and explicit model assumptions and limitations so stakeholders understand what the model is and is not capturing. Source code (A) is irrelevant to executives. Competitor comparisons (C) are often unavailable. Guarantees of accuracy (D) are impossible for any honest forecast.

10. B — groupby(dt.month)["revenue"].mean() groups all revenue observations by calendar month number and computes the average for each month, pooling across all years in the dataset. This reveals the seasonal pattern: which months are typically high and which are typically low. It does not compute trend coefficients (A), growth rates (C), or moving averages (D).
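A small sketch makes the pooling visible. The DataFrame below is invented for illustration: it holds the same three calendar months observed in two different years, using the sale_date and revenue column names from the question.

```python
import pandas as pd

# Hypothetical sales: the same three calendar months observed in two years.
df = pd.DataFrame({
    "sale_date": pd.to_datetime([
        "2022-01-15", "2022-09-10", "2022-12-05",
        "2023-01-20", "2023-09-12", "2023-12-08",
    ]),
    "revenue": [100.0, 180.0, 130.0, 120.0, 200.0, 150.0],
})

# Group by calendar month number (1-12); observations from all years are pooled.
monthly_profile = df.groupby(df["sale_date"].dt.month)["revenue"].mean()
print(monthly_profile)
```

The result has one row per calendar month that appears in the data (here 1, 9, and 12), not one row per month per year, so the two Septembers are averaged into a single value.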

11. False — This is one of the most common misinterpretations of confidence intervals. A 95% confidence interval means: if we repeatedly drew samples and computed this interval, 95% of those intervals would contain the true value. For a single forecast, you cannot say you are "95% certain" the answer is inside the range in a simple probability sense. The correct framing is that the interval reflects the range of outcomes consistent with the model's uncertainty.

12. False — Unlike SMA, which requires exactly window values before it can compute, EMA produces a value for every period. Because it weights all past values (with exponential decay), it can compute from the very first observation — though early values are less reliable because the model has had less time to converge.
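The difference is easy to see side by side. The sketch below uses pandas with an invented six-value series; span=4 and adjust=False are illustrative choices, not settings prescribed by the chapter.

```python
import pandas as pd

# Hypothetical revenue values, for illustration only.
revenue = pd.Series([100.0, 110.0, 105.0, 120.0, 130.0, 125.0])

# SMA: NaN until the 4-value window is full.
sma = revenue.rolling(window=4).mean()

# EMA: span=4 gives alpha = 2 / (4 + 1) = 0.4; adjust=False applies the
# recursive form, so the first EMA value equals the first observation.
ema = revenue.ewm(span=4, adjust=False).mean()

print(pd.DataFrame({"revenue": revenue, "sma_4": sma, "ema_4": ema}))
```

The sma_4 column starts with three NaN values, while ema_4 has a value in every row, starting from the first observation itself.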

13. False — R-squared measures how well the trend explains historical variation, not how accurately it will forecast the future. A very high R-squared means the trend fits the past well, but the future can always diverge due to structural changes, unexpected events, or simply because the historical period was unusually consistent. R-squared is a measure of fit quality, not forecast accuracy.

14. True — Month-over-month comparisons for seasonal businesses are polluted by the seasonal effect: December to January always looks like a crash even if the business is growing, simply because January is always weaker than December. YoY comparisons (same month, one year apart) remove this effect because both months being compared are in the same seasonal position.

15. True — The fit() method of statsmodels.tsa.holtwinters.Holt takes an optimized argument (True by default) that uses maximum likelihood estimation to find the smoothing parameters (alpha for level, beta for trend) that best fit the historical data. You can also pass smoothing_level and smoothing_trend to fit() manually if you have a business reason to prefer certain values.

16. False — A confidence band that widens with forecast horizon is correct and honest behavior. Uncertainty genuinely accumulates as you forecast further into the future — small errors in the current period compound over time. A constant-width band would be misleading, implying that a 12-period-ahead forecast is as reliable as a 1-period-ahead forecast, which it is not.

17. False — For a 12-element series with rolling(window=4).mean(), the result will have NaN for the first 3 positions (indices 0, 1, 2) and non-NaN values for indices 3 through 11 — that is 9 non-NaN values, not 4. The first valid window covers indices 0–3, the second covers 1–4, and so on through indices 8–11.
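The count can be verified directly. The sketch below uses a 12-element series holding the values 1 through 12, chosen only so the first window average is easy to check by hand.

```python
import pandas as pd

series = pd.Series(range(1, 13), dtype=float)  # 12 elements: 1.0 .. 12.0
sma = series.rolling(window=4).mean()

nan_count = sma.isna().sum()     # leading positions without a full window
valid_count = sma.notna().sum()  # positions with a full 4-value window

print(nan_count, valid_count)
print(sma.iloc[3])  # first valid value: mean of 1, 2, 3, 4
```

In general, a series of length n with window w and default settings yields n - w + 1 non-NaN values, which is 12 - 4 + 1 = 9 here.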

18. Priya should explain: a point forecast ("revenue will be $1.21M") implies a false precision that the data simply does not support. Any specific number will almost certainly be slightly wrong, and presenting it without context makes it look like poor forecasting when the actual outcome differs. A range communicates that we have analyzed the historical variability and are being honest about the inherent uncertainty in projecting the future. When the actual result falls within the forecasted range, the board sees that the analysis was sound — which is a stronger credibility signal than a point estimate that happens to be close. Sophisticated stakeholders, including most board members, will trust a range with explicit uncertainty more than a single number that pretends uncertainty does not exist.

19. Seasonal variation is a pattern that repeats on a fixed, calendar-based cycle — for Acme Corp, office supply sales likely peak in August/September (back-to-school/office season) and around January (new fiscal year supply orders) every year. Cyclical variation repeats over longer, irregular periods tied to economic conditions — Acme might see generally lower revenue during economic contractions and growth during expansions, but these cycles do not follow a predictable annual schedule. Cyclicality is harder to model because the length and magnitude of economic cycles are not fixed, and detecting them requires many years of data to distinguish the cycle from the overall trend.

20. Fitting a single line through data where the growth rate changed partway through produces a trend that underestimates recent growth (because the slower early period drags the slope down) and therefore generates a forecast that is likely too conservative for the near-term future. The forecast essentially averages the old growth rate with the new one. One approach is to use only the more recent data (the last 4 quarters of accelerating growth) for the forecast, which captures the current regime better than the full 8-quarter history. Another approach is to explicitly test for a trend break point and fit separate regression lines to the two periods.
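The effect can be sketched with invented numbers: four flat quarters followed by four accelerating ones. The example compares a line fitted to all 8 quarters with a line fitted to only the last 4. Four points is a very small sample, so this illustrates the direction of the bias rather than a recommended model.

```python
import numpy as np

# Hypothetical revenue ($ thousands): four flat quarters, then four accelerating.
revenue = np.array([500.0, 502.0, 499.0, 501.0, 520.0, 545.0, 575.0, 610.0])
quarters = np.arange(len(revenue))

# Single line through all 8 quarters.
slope_all, intercept_all = np.polyfit(quarters, revenue, deg=1)

# Line through only the last 4 quarters (the current regime).
slope_recent, intercept_recent = np.polyfit(quarters[4:], revenue[4:], deg=1)

next_quarter = len(revenue)  # index 8 = first forecast quarter
forecast_all = intercept_all + slope_all * next_quarter
forecast_recent = intercept_recent + slope_recent * next_quarter

print(f"all 8 quarters:  slope={slope_all:.1f}, forecast={forecast_all:.0f}")
print(f"last 4 quarters: slope={slope_recent:.1f}, forecast={forecast_recent:.0f}")
```

With these numbers the full-history line has roughly half the slope of the recent-only line, and its next-quarter forecast falls below the most recent actual value, which is the "too conservative" outcome described above.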

End of Chapter 26 Quiz