Case Study 2: Building a Sales Report Generator — Marcus's Weekly Automation

Contributors to Introduction to Data Science

Tier 3 — Illustrative/Composite Example: Marcus and Rise & Shine Bakery are fictional. This case study illustrates how a small business owner with no programming background might use Python to automate routine data analysis. The sales figures, business decisions, and seasonal patterns described are invented for pedagogical purposes, but they reflect real patterns common in small food-service businesses.

The Situation

Marcus opens Rise & Shine Bakery at 6 AM every day. By the time he locks up at 4 PM, he's baked 15 different products, served a hundred-plus customers, and managed three employees. The last thing he wants to do at the end of the week is spend two hours in a spreadsheet, manually computing totals, averages, and best-selling days.

But he needs to. Marcus's bakery runs on thin margins — typical for food service. Knowing whether this week was better or worse than last week, which days are consistently strong or weak, and whether the numbers are trending up or down isn't optional. It's the difference between making smart purchasing decisions and guessing.

For the past year, Marcus has been doing this analysis by hand every Sunday night. He opens a spreadsheet, types in the daily sales numbers he jotted in a notebook, adds formulas, looks at the numbers, and sighs. Then he does it again next week.

This week, Marcus decides to automate the whole thing. He took a few Python lessons (he's reading this textbook, after all), and he thinks he can build something that does in 5 seconds what takes him 2 hours.

He's right. And the key is functions.

The Data

Marcus records daily sales as simple dollar amounts. Here are two weeks of data:

# Daily sales (Monday through Sunday)
week_1 = [520, 480, 610, 430, 390, 710, 550]
week_2 = [485, 510, 575, 460, 420, 680, 590]

day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

His goals:

1. Compute the weekly total, daily average, best day, and worst day.

2. Classify each day as "strong" (more than $50 above average), "average" (within $50 of average), or "slow" (more than $50 below average).

3. Compare two weeks side by side.

4. Do all of this with functions so he can reuse it every week.

Building Block 1: Basic Statistics

Marcus starts with the fundamentals — a function to compute basic numbers:

def weekly_stats(daily_sales):
    """Compute total, average, best, and worst for a week of sales."""
    total = 0
    best = daily_sales[0]
    worst = daily_sales[0]
    best_day = 0
    worst_day = 0

    for i in range(len(daily_sales)):
        total = total + daily_sales[i]
        if daily_sales[i] > best:
            best = daily_sales[i]
            best_day = i
        if daily_sales[i] < worst:
            worst = daily_sales[i]
            worst_day = i

    average = total / len(daily_sales)
    return total, average, best, best_day, worst, worst_day

Testing it:

total, avg, best, best_i, worst, worst_i = weekly_stats(week_1)
print(f"Total: ${total}")
print(f"Average: ${avg:.2f}")
print(f"Best: ${best} ({day_names[best_i]})")
print(f"Worst: ${worst} ({day_names[worst_i]})")

Output:

Total: $3690
Average: $527.14
Best: $710 (Sat)
Worst: $390 (Fri)

Marcus nods. Saturday is always the best day — people come in for pastries and weekend brunch. Friday is the worst, which surprises him slightly. He files that away.
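The same statistics can be cross-checked with Python's built-in sum, max, and min. The following is a minimal sketch rather than a replacement for Marcus's loop; the name weekly_stats_builtin is hypothetical and is not part of the case study.

```python
def weekly_stats_builtin(daily_sales):
    """Return the same six values as weekly_stats, using built-ins."""
    total = sum(daily_sales)
    average = total / len(daily_sales)
    best = max(daily_sales)
    worst = min(daily_sales)
    # list.index returns the first position of a value, which matches
    # the loop version's behavior when two days tie.
    best_day = daily_sales.index(best)
    worst_day = daily_sales.index(worst)
    return total, average, best, best_day, worst, worst_day


week_1 = [520, 480, 610, 430, 390, 710, 550]
total, avg, best, best_i, worst, worst_i = weekly_stats_builtin(week_1)
print(total, f"{avg:.2f}", best, best_i, worst, worst_i)
```

If both versions agree on a few weeks of data, that is reasonable evidence the hand-written loop is correct.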

Building Block 2: Classifying Days

Marcus wants to classify each day's performance relative to the week's average. He starts with a simple classification function:

def classify_day(sales, average, margin=50):
    """Classify a day as strong, average, or slow."""
    if sales > average + margin:
        return "strong"
    elif sales < average - margin:
        return "slow"
    else:
        return "average"

The margin=50 is a default parameter — Marcus considers a day "average" if it's within $50 of the weekly mean. He can adjust this threshold without rewriting the function.
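To see the default parameter at work, the sketch below calls classify_day with and without an explicit margin. The function is copied from above; the sample values are chosen only for illustration.

```python
def classify_day(sales, average, margin=50):
    """Classify a day as strong, average, or slow."""
    if sales > average + margin:
        return "strong"
    elif sales < average - margin:
        return "slow"
    else:
        return "average"


average = 3690 / 7  # week 1 average, about 527.14

print(classify_day(480, average))             # average (default margin of 50)
print(classify_day(480, average, margin=25))  # slow (tighter margin)
print(classify_day(560, average, margin=25))  # strong (tighter margin)
```

A tighter margin moves more days out of the "average" band, so the same $480 Tuesday reads as "slow" once the margin shrinks to $25.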

Building Block 3: Formatting

Marcus wants his reports to look clean. He writes two formatting functions:

def format_dollar(amount):
    """Format a number as a dollar amount."""
    return f"${amount:,.2f}"

def format_change(current, previous):
    """Format the change between two values with an arrow."""
    diff = current - previous
    pct = (diff / previous) * 100 if previous != 0 else 0

    if diff > 0:
        arrow = "^"
    elif diff < 0:
        arrow = "v"
    else:
        arrow = "="

    return f"{arrow} {format_dollar(abs(diff))} ({pct:+.1f}%)"

Testing:

print(format_change(590, 550))  # Sunday improved
print(format_change(420, 480))  # a $60 drop, for illustration

Output:

^ $40.00 (+7.3%)
v $60.00 (-12.5%)

The format_change function uses several Chapter 3 concepts: f-string formatting, absolute value (abs()), the conditional to choose the arrow, and division for the percentage change. Notice the if previous != 0 guard — dividing by zero would crash the program, so Marcus handles it defensively.
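One consequence of the guard is that a change from zero is reported as 0%, which could be read as "no change". The sketch below shows one possible alternative that prints "n/a" instead. The name format_change_labeled is hypothetical, and this is an assumption about what Marcus might prefer, not part of his original code.

```python
def format_dollar(amount):
    """Format a number as a dollar amount."""
    return f"${amount:,.2f}"


def format_change_labeled(current, previous):
    """Like format_change, but shows 'n/a' when previous is zero."""
    diff = current - previous

    if diff > 0:
        arrow = "^"
    elif diff < 0:
        arrow = "v"
    else:
        arrow = "="

    # A percentage change from zero is undefined, so label it as such.
    if previous != 0:
        pct_text = f"{(diff / previous) * 100:+.1f}%"
    else:
        pct_text = "n/a"

    return f"{arrow} {format_dollar(abs(diff))} ({pct_text})"


print(format_change_labeled(590, 550))  # ^ $40.00 (+7.3%)
print(format_change_labeled(100, 0))    # ^ $100.00 (n/a)
```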

Building Block 4: The Day-by-Day Breakdown

Now Marcus combines his building blocks:

def daily_breakdown(daily_sales, day_names):
    """Print a day-by-day breakdown with classifications."""
    total, avg, best, best_i, worst, worst_i = weekly_stats(daily_sales)

    print(f"{'Day':<6} {'Sales':>8} {'Status':<10}")
    print("-" * 26)

    for i in range(len(daily_sales)):
        sales = daily_sales[i]
        status = classify_day(sales, avg)
        marker = ""
        if i == best_i:
            marker = " << BEST"
        elif i == worst_i:
            marker = " << WORST"
        print(f"{day_names[i]:<6} {format_dollar(sales):>8} "
              f"{status:<10}{marker}")

    print("-" * 26)
    print(f"{'Total':<6} {format_dollar(total):>8}")
    print(f"{'Avg':<6} {format_dollar(avg):>8}")

Output for week 1:

Day       Sales Status
--------------------------
Mon     $520.00 average
Tue     $480.00 average
Wed     $610.00 strong
Thu     $430.00 slow
Fri     $390.00 slow       << WORST
Sat     $710.00 strong     << BEST
Sun     $550.00 average
--------------------------
Total  $3,690.00
Avg     $527.14

Building Block 5: The Week-over-Week Comparison

Here's the function Marcus is most excited about — comparing this week to last week:

def weekly_comparison(current_sales, previous_sales, day_names):
    """Compare two weeks of sales, day by day and in total."""
    curr_total, curr_avg, _, _, _, _ = weekly_stats(current_sales)
    prev_total, prev_avg, _, _, _, _ = weekly_stats(previous_sales)

    print(f"\n{'':=<50}")
    print(f"  WEEKLY COMPARISON REPORT")
    print(f"{'':=<50}\n")

    print(f"{'Day':<6} {'Last Wk':>9} {'This Wk':>9} {'Change':>16}")
    print("-" * 42)

    for i in range(len(day_names)):
        prev = previous_sales[i]
        curr = current_sales[i]
        change = format_change(curr, prev)
        print(f"{day_names[i]:<6} {format_dollar(prev):>9} "
              f"{format_dollar(curr):>9} {change:>16}")

    print("-" * 42)
    print(f"{'Total':<6} {format_dollar(prev_total):>9} "
          f"{format_dollar(curr_total):>9} "
          f"{format_change(curr_total, prev_total):>16}")
    print(f"{'Avg':<6} {format_dollar(prev_avg):>9} "
          f"{format_dollar(curr_avg):>9} "
          f"{format_change(curr_avg, prev_avg):>16}")

Marcus runs it:

weekly_comparison(week_2, week_1, day_names)

Output:

==================================================
  WEEKLY COMPARISON REPORT
==================================================

Day      Last Wk   This Wk           Change
------------------------------------------
Mon      $520.00   $485.00 v $35.00 (-6.7%)
Tue      $480.00   $510.00 ^ $30.00 (+6.2%)
Wed      $610.00   $575.00 v $35.00 (-5.7%)
Thu      $430.00   $460.00 ^ $30.00 (+7.0%)
Fri      $390.00   $420.00 ^ $30.00 (+7.7%)
Sat      $710.00   $680.00 v $30.00 (-4.2%)
Sun      $550.00   $590.00 ^ $40.00 (+7.3%)
------------------------------------------
Total  $3,690.00 $3,720.00 ^ $30.00 (+0.8%)
Avg      $527.14   $531.43  ^ $4.29 (+0.8%)

Marcus stares at the output. Overall sales are up slightly. Tuesday, Thursday, Friday, and Sunday improved, so several of the weaker days got a bit stronger. Monday, Wednesday, and Saturday dipped slightly, including the two strongest days of the week. The week is more even, which is actually good for staffing. (Tuesday's change is exactly 6.25%, which Python displays as +6.2% because exact ties round to the nearest even digit.)

This is a small insight, but it's the kind of thing Marcus never noticed when he was adding up numbers in a spreadsheet at 10 PM on a Sunday. The structured report makes patterns visible.
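The pattern Marcus spots by eye can also be pulled out programmatically. The sketch below is a minimal illustration; split_by_direction is a hypothetical helper, not one of Marcus's seven functions.

```python
week_1 = [520, 480, 610, 430, 390, 710, 550]
week_2 = [485, 510, 575, 460, 420, 680, 590]
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def split_by_direction(current_sales, previous_sales, day_names):
    """Return two lists of day names: days that improved, days that declined."""
    improved, declined = [], []
    for day, prev, curr in zip(day_names, previous_sales, current_sales):
        if curr > prev:
            improved.append(day)
        elif curr < prev:
            declined.append(day)
    return improved, declined


improved, declined = split_by_direction(week_2, week_1, day_names)
print("Improved:", ", ".join(improved))  # Improved: Tue, Thu, Fri, Sun
print("Declined:", ", ".join(declined))  # Declined: Mon, Wed, Sat
```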

Building Block 6: The Master Report Function

Finally, Marcus creates one function that runs everything:

def full_weekly_report(current_sales, previous_sales, day_names,
                       week_label="This Week"):
    """Generate a complete weekly sales report."""
    print(f"\n{'#'*50}")
    print(f"  RISE & SHINE BAKERY — {week_label}")
    print(f"{'#'*50}\n")

    # Daily breakdown for current week
    print("--- DAILY BREAKDOWN ---")
    daily_breakdown(current_sales, day_names)

    # Week-over-week comparison
    if previous_sales is not None:
        weekly_comparison(current_sales, previous_sales, day_names)

    # Alerts
    total, avg, best, best_i, worst, worst_i = weekly_stats(current_sales)
    print(f"\n--- ALERTS ---")
    for i in range(len(current_sales)):
        if classify_day(current_sales[i], avg) == "slow":
            print(f"  Slow day: {day_names[i]} "
                  f"({format_dollar(current_sales[i])})")

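    if previous_sales is not None:
        prev_total, _, _, _, _, _ = weekly_stats(previous_sales)
        if total < prev_total:
            print("  WARNING: Total sales declined from last week")
        elif total > prev_total:
            print("  Total sales up from last week")
        else:
            # Equal totals are neither a decline nor an increase
            print("  Total sales unchanged from last week")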

    print(f"\n{'#'*50}\n")

Now Marcus's entire Sunday-night analysis is one line:

full_weekly_report(week_2, week_1, day_names, "Week of Jan 13")

The Architecture

Let's step back and look at what Marcus built. Here's the function hierarchy:

full_weekly_report()
├── daily_breakdown()
│   ├── weekly_stats()
│   ├── classify_day()
│   └── format_dollar()
├── weekly_comparison()
│   ├── weekly_stats()
│   ├── format_dollar()
│   └── format_change()
│       └── format_dollar()
├── weekly_stats()
├── classify_day()
└── format_dollar()

Seven functions, each doing one thing. The top-level function (full_weekly_report) reads almost like a to-do list: do the daily breakdown, do the comparison, check for alerts. The details are hidden inside the helper functions.

This is decomposition in action. And notice the reuse: weekly_stats is called in three different places. format_dollar is called in four. If Marcus decides to change how dollars are formatted (say, no cents for whole numbers), he edits one function and every report updates automatically.
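As an illustration of that one-function edit, here is a sketch of a dollar formatter that drops the cents for whole-dollar amounts. The name format_dollar_compact is hypothetical; in practice Marcus would change the body of format_dollar itself so that every caller picks up the new behavior.

```python
def format_dollar_compact(amount):
    """Format a dollar amount, omitting cents when the amount is whole."""
    if float(amount).is_integer():
        return f"${amount:,.0f}"
    return f"${amount:,.2f}"


print(format_dollar_compact(520))       # $520
print(format_dollar_compact(3690))      # $3,690
print(format_dollar_compact(3690 / 7))  # $527.14
```

One trade-off worth noting: mixed widths such as $520 and $527.14 no longer line up as neatly in a right-aligned column.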

What Marcus Learned

1. Start with small, testable pieces. Marcus didn't write full_weekly_report first. He built weekly_stats, tested it, built classify_day, tested it, and worked his way up. Each piece was verified before being combined.

2. Functions with default parameters are flexible. The margin=50 in classify_day and week_label="This Week" in full_weekly_report let Marcus customize behavior without changing function signatures. Most of the time, the defaults work fine.

3. Formatting is its own concern. Marcus separated formatting (format_dollar, format_change) from logic (weekly_stats, classify_day). This means he can reuse the same statistics in a text report, a CSV file, or (eventually) a chart, without recomputing anything.

4. Guard against edge cases. The if previous != 0 check in format_change prevents a crash when computing percentage change from zero. The if previous_sales is not None check in the master report allows it to work even when there's no previous week to compare against. Defensive coding isn't paranoia — it's professionalism.

5. Automation saves more than time. The report now takes 5 seconds instead of 2 hours. But the bigger win is consistency — the analysis is done the same way every week, with no risk of manual errors. Marcus can also look back at old reports and know they were computed identically.
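Lesson 3 can be made concrete with a short sketch that sends the same statistics to CSV text instead of the printed report. It uses Python's standard csv and io modules. The function stats_to_csv and the column names are hypothetical, and weekly_stats here is a condensed stand-in that returns the same six values as Marcus's version.

```python
import csv
import io


def weekly_stats(daily_sales):
    """Condensed stand-in returning the same tuple as Marcus's weekly_stats."""
    total = sum(daily_sales)
    best, worst = max(daily_sales), min(daily_sales)
    return (total, total / len(daily_sales),
            best, daily_sales.index(best),
            worst, daily_sales.index(worst))


def stats_to_csv(daily_sales, day_names):
    """Return one week's summary statistics as CSV text."""
    total, avg, best, best_i, worst, worst_i = weekly_stats(daily_sales)
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["total", "average", "best_day", "best",
                     "worst_day", "worst"])
    writer.writerow([total, f"{avg:.2f}", day_names[best_i], best,
                     day_names[worst_i], worst])
    return buffer.getvalue()


week_1 = [520, 480, 610, 430, 390, 710, 550]
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
print(stats_to_csv(week_1, day_names))
```

Because the statistics function returns values rather than printing them, the CSV writer needs no changes to the calculation code.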

The Payoff

The following week, Marcus adds a third week of data:

week_3 = [550, 525, 640, 480, 410, 730, 620]
full_weekly_report(week_3, week_2, day_names, "Week of Jan 20")

The same code, one new line. The report generates in under a second.

Over time, Marcus starts saving each week's data in a list of lists. He writes a function to compute the trend across multiple weeks — average sales per week over time. He can see that his total weekly revenue has been growing by about 1-2% per week since the holiday season ended. That's useful information for deciding whether to hire a fourth employee.
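A trend function of this kind might look like the sketch below. The names all_weeks, weekly_totals, and week_over_week_growth are hypothetical, and the code assumes every week has a nonzero total. With only the three sample weeks shown in this case study the growth figures will not match the longer-run 1-2% trend, which depends on data not listed here.

```python
week_1 = [520, 480, 610, 430, 390, 710, 550]
week_2 = [485, 510, 575, 460, 420, 680, 590]
week_3 = [550, 525, 640, 480, 410, 730, 620]

# A list of lists: one inner list per week, oldest first
all_weeks = [week_1, week_2, week_3]


def weekly_totals(weeks):
    """Return the total sales for each week."""
    return [sum(week) for week in weeks]


def week_over_week_growth(totals):
    """Return the percent change between each pair of consecutive totals."""
    growth = []
    for previous, current in zip(totals, totals[1:]):
        growth.append((current - previous) / previous * 100)
    return growth


totals = weekly_totals(all_weeks)
growth = week_over_week_growth(totals)

print(totals)  # [3690, 3720, 3955]
for pct in growth:
    print(f"{pct:+.1f}%")
```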

None of this required machine learning, statistical modeling, or a computer science degree. It required if, for, def, and the willingness to think about a problem in terms of small, reusable pieces.

Connecting to the Bigger Picture

Marcus's report generator is a miniature data pipeline. It mirrors the data science lifecycle from Chapter 1:

Ask: "How did this week compare to last week, and where are my strong and weak spots?"
Acquire: Marcus enters the daily sales numbers. (In Chapter 12, he'll learn to load them directly from his point-of-sale system's CSV exports.)
Clean: The weekly_stats function handles the data consistently. There's no room for a misplaced formula.
Explore: The day-by-day breakdown and week-over-week comparison reveal patterns.
Communicate: The formatted report presents findings clearly.

The only missing stage is "Model" — and that will come. In Part V, Marcus will learn to build a simple model that predicts next week's sales based on historical patterns. But the foundation — clean data, clear questions, reusable functions — is already in place.

Discussion Questions:

Marcus wrote seven functions for his report. Could he have written the whole thing as one long function? What would be the disadvantages?

The format_change function handles the edge case of previous == 0. What other edge cases should Marcus consider? (Hint: what if a week has fewer than 7 days? What if all days have the same sales amount?)

Marcus's code currently uses two separate lists for daily sales and day names. What could go wrong if these lists get out of sync (different lengths)? How might you prevent this? (Preview: Chapter 5 introduces dictionaries, which solve this problem elegantly.)

How would you modify the alert system to also flag days where sales increased by more than 20% compared to the same day last week? What function would you need to create or modify?