Case Study 1: When Did Sales Peak? Marcus Discovers Seasonal Patterns

Contributors to Introduction to Data Science

Case Study 1: When Did Sales Peak? Marcus Discovers Seasonal Patterns

Tier 3 — Illustrative/Composite Example: This case study uses a fictional coffee shop and simulated sales data to illustrate time series analysis techniques. The business scenario, character, and all data are invented for pedagogical purposes. The seasonal patterns and analytical approaches described are representative of real small-business analytics challenges documented in business analytics literature.

The Setting

Marcus owns a coffee shop called The Daily Grind in a mid-sized college town. Business has been good — or at least, he thinks it has. The problem is that Marcus can't quite articulate how business has been. Some months feel slow. Others feel frantic. He has a vague sense that September (back to school) and January (New Year's resolutions, everyone needs caffeine to survive the cold) are busy, and that summer is slower. But "vague sense" isn't data.

Marcus's accountant has been sending him monthly revenue reports, but those only tell part of the story. Was that great September driven by one amazing week, or was the whole month strong? Do weekends matter more than weekdays? Has the overall trend been up or down?

He decides to analyze the raw daily sales data from his point-of-sale system. He exports a year's worth of transactions — 365 days of revenue, one number per day.

The Data

Marcus loads his sales data:

import pandas as pd
import numpy as np

df = pd.read_csv("daily_grind_sales.csv")
print(df.head())
print(df.dtypes)

         date  revenue
0  01/01/2023    742.50
1  01/02/2023    891.25
2  01/03/2023    923.00
3  01/04/2023    887.75
4  01/05/2023    956.50

date       object
revenue    float64
dtype: object

The date column is a string. Marcus's first task is to fix that:

df["date"] = pd.to_datetime(df["date"], format="%m/%d/%Y")
print(df.dtypes)

date       datetime64[ns]
revenue    float64
dtype: object

He sets the date as the index, which will make time-based operations easier:

df = df.set_index("date")
sales = df["revenue"]  # a Series with DatetimeIndex
print(f"Date range: {sales.index.min()} to {sales.index.max()}")
print(f"Total days: {len(sales)}")
print(f"Total revenue: ${sales.sum():,.2f}")

Date range: 2023-01-01 to 2023-12-31
Total days: 365
Total revenue: $312,847.50

Question 1: What Does the Daily Pattern Look Like?

Marcus starts with basic descriptive statistics:

print(sales.describe())

count      365.00
mean       857.12
std        142.38
min        412.50
25%        762.00
50%        851.75
75%        948.50
max       1287.00
Name: revenue, dtype: float64

Average daily revenue is $857, with a standard deviation of $142. The range is wide — from a $412 low to a $1,287 high. But looking at 365 individual numbers isn't useful. Marcus needs to aggregate.

Question 2: The Weekly Pattern

Is there a day-of-the-week effect?

daily_pattern = sales.groupby(sales.index.day_name()).mean()
# Reorder to start from Monday
day_order = ["Monday", "Tuesday", "Wednesday",
             "Thursday", "Friday", "Saturday", "Sunday"]
daily_pattern = daily_pattern.reindex(day_order)
print(daily_pattern.round(2))

Monday       814.23
Tuesday      798.45
Wednesday    832.67
Thursday     878.12
Friday       951.34
Saturday     912.88
Sunday       810.15
Name: revenue, dtype: float64

There it is. Friday is the best day ($951), likely from people treating themselves at the end of the work week. Saturday is strong too. Tuesday is the weakest. Marcus had always felt that Fridays were busy, and now the data confirms it.

But the difference matters in dollars:

weekly_range = daily_pattern.max() - daily_pattern.min()
print(f"Best day vs. worst day: ${weekly_range:.2f} difference")
print(f"That's ${weekly_range * 52:.2f} per year if the pattern holds")

Best day vs. worst day: $152.89 difference
That's $7,950.28 per year if the pattern holds

Almost $8,000 per year in revenue difference between the best and worst days of the week. Marcus makes a note: if he's going to run a promotion, Tuesdays might need the most help.

Question 3: Monthly Trends

monthly = sales.resample("ME").agg(["sum", "mean", "count"])
monthly.columns = ["total", "daily_avg", "days"]
print(monthly.round(2))

            total  daily_avg  days
2023-01-31  28942.50   933.63    31
2023-02-28  23661.00   845.04    28
2023-03-31  26234.75   846.28    31
2023-04-30  24891.00   829.70    30
2023-05-31  25234.50   814.02    31
2023-06-30  24123.75   804.13    30
2023-07-31  23987.25   773.78    31
2023-08-31  24567.50   792.50    31
2023-09-30  28734.00   957.80    30
2023-10-31  27456.75   885.70    31
2023-11-30  26834.50   894.48    30
2023-12-31  28180.00   908.71    31

Several things jump out:

September is the best month ($957.80 daily average) — students returning to campus.
January is the second best ($933.63) — post-holiday energy, cold weather drives coffee consumption.
July is the worst ($773.78) — summer break, many students are away.

The monthly totals are misleading without the daily average, because months have different numbers of days. February's total looks low, but its daily average ($845) is close to the annual mean. July's daily average is genuinely low.

Question 4: The Rolling Average Reveals the Trend

Daily data is too noisy to see trends. Marcus computes rolling averages:

sales_analysis = pd.DataFrame({
    "daily": sales,
    "rolling_7d": sales.rolling(7).mean(),
    "rolling_30d": sales.rolling(30).mean()
})

# Compare start vs. end of year
print("First 30-day average:",
      sales_analysis["rolling_30d"].dropna().iloc[0].round(2))
print("Last 30-day average:",
      sales_analysis["rolling_30d"].iloc[-1].round(2))

First 30-day average: 876.45
Last 30-day average: 908.23

The 30-day rolling average went from $876 to $908 over the course of the year — a modest upward trend. Business is growing, slowly but steadily.

The 7-day rolling average shows the weekly rhythm more clearly:

# Find the peak and trough of the 7-day rolling average
peak_date = sales_analysis["rolling_7d"].idxmax()
peak_value = sales_analysis["rolling_7d"].max()
trough_date = sales_analysis["rolling_7d"].idxmin()
trough_value = sales_analysis["rolling_7d"].min()

print(f"Peak: ${peak_value:.2f} on {peak_date.strftime('%B %d')}")
print(f"Trough: ${trough_value:.2f} on {trough_date.strftime('%B %d')}")

Peak: $1,042.50 on September 08
Trough: $623.14 on July 05

The peak aligns with the first week of fall semester, and the trough falls right on the Fourth of July holiday week. These aren't surprises, but having the exact numbers changes Marcus's thinking. The peak is 67% higher than the trough. That's not a mild seasonal fluctuation — it's a dramatic swing that should inform staffing, inventory, and promotion decisions.

Question 5: The Weekend Effect Over Time

Marcus noticed that Saturday is strong and Sunday is weaker. But is this pattern consistent throughout the year?

df_analysis = pd.DataFrame({"revenue": sales})
df_analysis["is_weekend"] = df_analysis.index.dayofweek >= 5
df_analysis["month"] = df_analysis.index.month

weekend_by_month = (df_analysis
    .groupby(["month", "is_weekend"])["revenue"]
    .mean()
    .unstack())
weekend_by_month.columns = ["Weekday", "Weekend"]
weekend_by_month["Gap %"] = (
    (weekend_by_month["Weekend"] - weekend_by_month["Weekday"])
    / weekend_by_month["Weekday"] * 100
).round(1)
print(weekend_by_month)

       Weekday  Weekend  Gap %
month
1        938.5    920.3   -1.9
2        849.2    832.1   -2.0
3        852.4    828.6   -2.8
4        836.7    808.2   -3.4
5        821.3    793.5   -3.4
6        812.8    780.1   -4.0
7        782.4    748.9   -4.3
8        801.2    769.3   -4.0
9        968.4    928.7   -4.1
10       893.4    864.2   -3.3
11       902.1    874.8   -3.0
12       916.3    888.5   -3.0

Interesting — weekends are actually slightly lower than weekdays in every month. The gap is smallest in winter (around 2%) and largest in summer (around 4%). Marcus had assumed weekends were busier because Saturday feels busy, but the data shows that Saturday is strong while Sunday is notably weak, and the average of both weekend days falls below weekdays.

The Takeaway

Marcus started with a vague sense that "some months are better than others." He now has precise answers:

The weekly cycle: Fridays are 19% better than Tuesdays. This justifies targeted promotions on slow days.
The seasonal cycle: September (back to school) is the peak, July (summer break) is the trough. The swing is 24% — large enough to affect staffing and inventory planning.
The annual trend: A modest upward trend of about 3.6% year-over-year, suggesting slow but real growth.
The weekend myth: Despite feeling busier on Saturdays, the overall weekend average is below weekdays. Sunday drags down the weekend numbers.

None of these insights required advanced statistics or machine learning. They required three things: properly parsed dates, resampling to the right time scale, and rolling averages to see through the noise. These are the foundational tools of time series analysis, and they're available to any business owner willing to export their sales data and write a few lines of pandas code.

Marcus saves his analysis notebook. He's going to bring it to his next meeting with his accountant — and this time, when they ask "how's business?", he'll have a real answer.