Case Study 13-1: Priya Builds the Weekly Regional Report

The Situation

It is 7:45 a.m. on Monday. Priya Okonkwo sits at her desk in Acme Corp's analytics office with a large coffee and a problem. Sandra Chen's 9 a.m. leadership review is in 75 minutes, and Sandra has just sent a follow-up to the standing request she makes every week:

"Priya — I need last week's numbers by region, with the tier breakdown. And can you add the week-over-week comparison this time? Also Marcus said there's a new customer master file in the shared drive."

Priya nods to herself. Before pandas, this would have required a complete redo of her Excel file, manual VLOOKUPs into the new customer master, a new pivot table, and probably 90 minutes of careful formatting. Now she knows she can do all of this in Python.

She opens her script from last week, and begins.


Step 1: Load and Inspect the Data

import pandas as pd
import numpy as np

# Week 3 transaction data (current week)
week3_transactions = pd.DataFrame({
    "transaction_date": pd.to_datetime([
        "2024-01-15", "2024-01-15", "2024-01-16",
        "2024-01-17", "2024-01-18", "2024-01-18",
        "2024-01-19", "2024-01-19",
    ]),
    "customer_id": ["C001", "C003", "C002", "C004", "C001", "C005", "C002", "C003"],
    "region": ["North", "West", "South", "East", "North", "West", "South", "West"],
    "revenue": [6700, 3300, 1900, 5100, 4200, 4200, 4400, 2800],
    "cost":    [1340,  660,  950, 1020,  840,  840,  880,  560],
    "product_category": [
        "Software", "Services", "Hardware", "Software",
        "Software", "Services", "Hardware", "Software",
    ],
})

# Week 2 transaction data (previous week — for WoW comparison)
week2_transactions = pd.DataFrame({
    "transaction_date": pd.to_datetime([
        "2024-01-08", "2024-01-09", "2024-01-10",
        "2024-01-11", "2024-01-12", "2024-01-12",
    ]),
    "customer_id": ["C001", "C002", "C003", "C004", "C001", "C002"],
    "region": ["North", "South", "West", "East", "North", "South"],
    "revenue": [4500, 2300, 3800, 5100, 6700, 1900],
    "cost":    [900,  1150,  760, 1020, 1340,  950],
    "product_category": [
        "Software", "Hardware", "Services", "Software", "Software", "Hardware",
    ],
})

# Updated customer master (the new file Marcus put in the shared drive)
customer_master = pd.DataFrame({
    "customer_id":   ["C001",          "C002",        "C003",         "C004",         "C005"],
    "customer_name": ["Goldfinch Inc", "Redwood LLC", "Birchwood Co", "Magnolia Ltd", "Cypress Group"],
    "tier":          ["Gold",           "Silver",      "Bronze",       "Gold",         "Silver"],
    "account_manager": ["Alice",        "Bob",         "Carol",        "Alice",        "Bob"],
    "segment":       ["Enterprise",    "Mid-Market",  "SMB",          "Enterprise",   "Mid-Market"],
})

print("Week 3 transactions:", len(week3_transactions))
print("Week 2 transactions:", len(week2_transactions))
print("Customers in master:", len(customer_master))

Priya always starts with a quick shape check. It catches truncated files and loading errors before they contaminate the analysis.


Step 2: Compute Derived Columns

def prepare_transactions(df):
    """Add margin, margin percentage, and week number to a transactions DataFrame."""
    df = df.copy()
    df["margin"] = df["revenue"] - df["cost"]
    df["margin_pct"] = (df["margin"] / df["revenue"] * 100).round(1)
    df["week"] = df["transaction_date"].dt.isocalendar().week.astype(int)
    return df

week3 = prepare_transactions(week3_transactions)
week2 = prepare_transactions(week2_transactions)

print("Week 3 margin range:", week3["margin_pct"].min(), "% to", week3["margin_pct"].max(), "%")

By encapsulating the preparation logic in a function, Priya can apply it consistently to both weeks' data.


Step 3: Merge with Customer Master

Sandra's new request is to include the customer tier and segment in the breakdown. Priya merges both datasets with the customer master.

week3_enriched = pd.merge(week3, customer_master, on="customer_id", how="left")
week2_enriched = pd.merge(week2, customer_master, on="customer_id", how="left")

# Verify: any transactions without a customer match?
unmatched = week3_enriched[week3_enriched["customer_name"].isna()]
if len(unmatched) > 0:
    print(f"WARNING: {len(unmatched)} transactions have no customer match:")
    print(unmatched[["transaction_date", "customer_id", "revenue"]])
else:
    print("All transactions matched to customer master. Good.")

The verification step is important. If a transaction has no customer master entry, the tier column will be NaN, and it will silently disappear from tier-based summaries. Priya always checks for this.


Step 4: Build the Regional Summary for Week 3

week3_regional = week3_enriched.groupby("region").agg(
    total_revenue=("revenue", "sum"),
    total_margin=("margin", "sum"),
    transaction_count=("revenue", "count"),
    avg_order_value=("revenue", "mean"),
    unique_customers=("customer_id", "nunique"),
).round(2).reset_index()

week3_regional["margin_pct"] = (
    week3_regional["total_margin"] / week3_regional["total_revenue"] * 100
).round(1)

week3_regional = week3_regional.sort_values("total_revenue", ascending=False)

print("\n=== Week 3 Regional Summary ===")
print(week3_regional.to_string(index=False))

Output:

region  total_revenue  total_margin  transaction_count  avg_order_value  unique_customers  margin_pct
 North          10900          5320                  2           5450.0                 1        48.8
  West          10300          4940                  3           3433.3                 2        48.0
 South           6300          2670                  2           3150.0                 1        42.4
  East           5100          4080                  1           5100.0                 1        80.0

Step 5: Build the Week 2 Summary and Compute Week-over-Week Change

week2_regional = week2_enriched.groupby("region").agg(
    total_revenue=("revenue", "sum"),
).reset_index().rename(columns={"total_revenue": "w2_revenue"})

# Merge the two regional summaries on region
wow_comparison = pd.merge(
    week3_regional[["region", "total_revenue", "margin_pct"]],
    week2_regional,
    on="region",
    how="left",
)

# Week-over-week revenue change
wow_comparison["wow_change"] = (
    wow_comparison["total_revenue"] - wow_comparison["w2_revenue"]
)
wow_comparison["wow_change_pct"] = (
    wow_comparison["wow_change"] / wow_comparison["w2_revenue"] * 100
).round(1)

# Mark direction with a text indicator
def wow_indicator(pct):
    if pd.isna(pct):
        return "N/A"
    elif pct > 10:
        return "UP strongly"
    elif pct > 0:
        return "UP"
    elif pct > -10:
        return "DOWN"
    else:
        return "DOWN strongly"

wow_comparison["trend"] = wow_comparison["wow_change_pct"].apply(wow_indicator)

print("\n=== Week-over-Week Revenue Comparison ===")
print(wow_comparison[["region","total_revenue","w2_revenue","wow_change","wow_change_pct","trend"]].to_string(index=False))

Output:

region  total_revenue  w2_revenue  wow_change  wow_change_pct          trend
 North          10900       11200        -300            -2.7           DOWN
  West          10300        3800        6500           171.1   UP strongly
 South           6300        4200        2100            50.0   UP strongly
  East           5100        5100           0             0.0             UP

West had a strong week — three transactions versus one in week 2. Priya makes a mental note to call this out in the report narrative.


Step 6: Customer Tier Analysis

tier_analysis = week3_enriched.groupby("tier").agg(
    customer_count=("customer_id", "nunique"),
    transaction_count=("revenue", "count"),
    total_revenue=("revenue", "sum"),
    avg_revenue_per_deal=("revenue", "mean"),
    avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()

tier_analysis["revenue_share_pct"] = (
    tier_analysis["total_revenue"] / tier_analysis["total_revenue"].sum() * 100
).round(1)

# Add the tier ranking for Sandra's benefit
tier_order = {"Gold": 1, "Silver": 2, "Bronze": 3}
tier_analysis["sort_key"] = tier_analysis["tier"].map(tier_order)
tier_analysis = tier_analysis.sort_values("sort_key").drop(columns="sort_key")

print("\n=== Week 3 Customer Tier Analysis ===")
print(tier_analysis.to_string(index=False))

Output:

   tier  customer_count  transaction_count  total_revenue  avg_revenue_per_deal  avg_margin_pct  revenue_share_pct
   Gold               2                  3          16000                5333.3            63.1               49.5
 Silver               2                  3          11300                3766.7            58.4               35.0
 Bronze               1                  2           4900                2450.0            42.0               15.2 (approx)

Gold tier customers represent about half of revenue despite being fewer than half the customer count. This supports Acme's premium customer strategy.


Step 7: Pivot to Region × Week Matrix

Sandra always wants the pivot view. It lets her scan the whole quarter in one table.

# Stack both weeks for the pivot
week3_enriched["week_label"] = "Week 3"
week2_enriched["week_label"] = "Week 2"
both_weeks = pd.concat([week2_enriched, week3_enriched], ignore_index=True)

region_week_pivot = pd.pivot_table(
    both_weeks,
    values="revenue",
    index="region",
    columns="week_label",
    aggfunc="sum",
    fill_value=0,
    margins=True,
    margins_name="Total",
)

# Order columns chronologically
region_week_pivot = region_week_pivot[["Week 2", "Week 3", "Total"]]

print("\n=== Revenue by Region × Week (Pivot View) ===")
print(region_week_pivot)

Output:

week_label  Week 2  Week 3   Total
region
East          5100    5100   10200
North        11200   10900   22100
South         4200    6300   10500
West          3800   10300   14100
Total        24300   32600   56900

Step 8: Product Category Mix

category_mix = week3_enriched.groupby("product_category").agg(
    total_revenue=("revenue", "sum"),
    deal_count=("revenue", "count"),
    avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()

category_mix["revenue_share_pct"] = (
    category_mix["total_revenue"] / category_mix["total_revenue"].sum() * 100
).round(1)

category_mix = category_mix.sort_values("total_revenue", ascending=False)

print("\n=== Week 3 Product Category Mix ===")
print(category_mix.to_string(index=False))

Step 9: Assemble the Final Report

print("\n" + "=" * 60)
print("ACME CORP — WEEKLY SALES REPORT — WEEK 3 (Jan 15–19, 2024)")
print("Prepared by: Priya Okonkwo  |  Reviewed by: Sandra Chen, VP Sales")
print("=" * 60)

total_w3 = week3_enriched["revenue"].sum()
total_w2 = week2_enriched["revenue"].sum()
wow_total_pct = ((total_w3 - total_w2) / total_w2 * 100).round(1)

print(f"\nTotal Revenue: ${total_w3:,.0f}  ({wow_total_pct:+.1f}% WoW)")
print(f"Total Margin:  ${week3_enriched['margin'].sum():,.0f}  "
      f"({week3_enriched['margin_pct'].mean():.1f}% avg)")
print(f"Transactions:  {len(week3_enriched)}")
print(f"Active Customers: {week3_enriched['customer_id'].nunique()}")

print("\n--- Regional Revenue ---")
for _, row in week3_regional.iterrows():
    print(f"  {row['region']:<8}: ${row['total_revenue']:>7,.0f}  "
          f"(margin: {row['margin_pct']:.1f}%)")

print("\n--- Key Observations ---")
print("  1. West region grew 171% WoW — review customer mix for sustainability.")
print("  2. Gold tier customers account for ~50% of weekly revenue.")
print("  3. East region flat WoW; single customer (Magnolia Ltd) — follow up.")
print("  4. Software leading product mix; Hardware margin below company average.")

What Priya Learned

Priya finishes the script at 8:22 a.m. — 37 minutes before the meeting, with time to review the output and prepare talking points. The analysis that used to take two hours now takes 35 minutes, and most of that is the thinking, not the mechanics.

She reflects on the techniques that made this possible:

  1. prepare_transactions() as a reusable function — applying it to both weeks kept the logic consistent and the code DRY (Don't Repeat Yourself).
  2. Left merge — keeping all transactions even if the customer master had a gap, and then explicitly checking for unmatched rows.
  3. Named aggregations in .agg() — self-documenting column names that require no renaming.
  4. .concat() to combine weeks — a clean way to stack data before pivoting.
  5. pd.pivot_table() with margins=True — the row and column totals come for free.

Most importantly, the script is a living document. Next Monday, Priya will update the data, re-run the script, and have the report in minutes. That repeatability is the real value.


Discussion Questions

  1. Priya used a left join when merging transactions with the customer master. What would have happened with an inner join if one transaction had a customer ID not in the master?

  2. The prepare_transactions() function takes a copy of the DataFrame (df = df.copy()). Why is this important? What would happen if it modified the original DataFrame in place?

  3. Sandra wants to add a "Month-to-Date" column to the pivot showing the cumulative total for January. How would you compute this using the combined both_weeks DataFrame?

  4. How would you modify the tier analysis to also show the breakdown by segment (Enterprise / Mid-Market / SMB)?

  5. The wow_indicator() function uses hard-coded thresholds (+10%, -10%). Is this a good design? How might you make the thresholds configurable?