Case Study 13-1: Priya Builds the Weekly Regional Report
The Situation
It is 7:45 a.m. on Monday. Priya Okonkwo sits at her desk in Acme Corp's analytics office with a large coffee and a problem. Sandra Chen's 9 a.m. leadership review is in 75 minutes, and Sandra has just sent a follow-up to the standing request she makes every week:
"Priya — I need last week's numbers by region, with the tier breakdown. And can you add the week-over-week comparison this time? Also Marcus said there's a new customer master file in the shared drive."
Priya nods to herself. Before pandas, this would have required a complete redo of her Excel file, manual VLOOKUPs into the new customer master, a new pivot table, and probably 90 minutes of careful formatting. Now she knows she can do all of this in Python.
She opens her script from last week, and begins.
Step 1: Load and Inspect the Data
import pandas as pd
import numpy as np
# Week 3 transaction data (current week)
week3_transactions = pd.DataFrame({
"transaction_date": pd.to_datetime([
"2024-01-15", "2024-01-15", "2024-01-16",
"2024-01-17", "2024-01-18", "2024-01-18",
"2024-01-19", "2024-01-19",
]),
"customer_id": ["C001", "C003", "C002", "C004", "C001", "C005", "C002", "C003"],
"region": ["North", "West", "South", "East", "North", "West", "South", "West"],
"revenue": [6700, 3300, 1900, 5100, 4200, 4200, 4400, 2800],
"cost": [1340, 660, 950, 1020, 840, 840, 880, 560],
"product_category": [
"Software", "Services", "Hardware", "Software",
"Software", "Services", "Hardware", "Software",
],
})
# Week 2 transaction data (previous week — for WoW comparison)
week2_transactions = pd.DataFrame({
"transaction_date": pd.to_datetime([
"2024-01-08", "2024-01-09", "2024-01-10",
"2024-01-11", "2024-01-12", "2024-01-12",
]),
"customer_id": ["C001", "C002", "C003", "C004", "C001", "C002"],
"region": ["North", "South", "West", "East", "North", "South"],
"revenue": [4500, 2300, 3800, 5100, 6700, 1900],
"cost": [900, 1150, 760, 1020, 1340, 950],
"product_category": [
"Software", "Hardware", "Services", "Software", "Software", "Hardware",
],
})
# Updated customer master (the new file Marcus put in the shared drive)
customer_master = pd.DataFrame({
"customer_id": ["C001", "C002", "C003", "C004", "C005"],
"customer_name": ["Goldfinch Inc", "Redwood LLC", "Birchwood Co", "Magnolia Ltd", "Cypress Group"],
"tier": ["Gold", "Silver", "Bronze", "Gold", "Silver"],
"account_manager": ["Alice", "Bob", "Carol", "Alice", "Bob"],
"segment": ["Enterprise", "Mid-Market", "SMB", "Enterprise", "Mid-Market"],
})
print("Week 3 transactions:", len(week3_transactions))
print("Week 2 transactions:", len(week2_transactions))
print("Customers in master:", len(customer_master))
Priya always starts with a quick shape check. It catches truncated files and loading errors before they contaminate the analysis.
Step 2: Compute Derived Columns
def prepare_transactions(df):
"""Add margin, margin percentage, and week number to a transactions DataFrame."""
df = df.copy()
df["margin"] = df["revenue"] - df["cost"]
df["margin_pct"] = (df["margin"] / df["revenue"] * 100).round(1)
df["week"] = df["transaction_date"].dt.isocalendar().week.astype(int)
return df
week3 = prepare_transactions(week3_transactions)
week2 = prepare_transactions(week2_transactions)
print("Week 3 margin range:", week3["margin_pct"].min(), "% to", week3["margin_pct"].max(), "%")
By encapsulating the preparation logic in a function, Priya can apply it consistently to both weeks' data.
Step 3: Merge with Customer Master
Sandra's new request is to include the customer tier and segment in the breakdown. Priya merges both datasets with the customer master.
week3_enriched = pd.merge(week3, customer_master, on="customer_id", how="left")
week2_enriched = pd.merge(week2, customer_master, on="customer_id", how="left")
# Verify: any transactions without a customer match?
unmatched = week3_enriched[week3_enriched["customer_name"].isna()]
if len(unmatched) > 0:
print(f"WARNING: {len(unmatched)} transactions have no customer match:")
print(unmatched[["transaction_date", "customer_id", "revenue"]])
else:
print("All transactions matched to customer master. Good.")
The verification step is important. If a transaction has no customer master entry, the tier column will be NaN, and it will silently disappear from tier-based summaries. Priya always checks for this.
Step 4: Build the Regional Summary for Week 3
week3_regional = week3_enriched.groupby("region").agg(
total_revenue=("revenue", "sum"),
total_margin=("margin", "sum"),
transaction_count=("revenue", "count"),
avg_order_value=("revenue", "mean"),
unique_customers=("customer_id", "nunique"),
).round(2).reset_index()
week3_regional["margin_pct"] = (
week3_regional["total_margin"] / week3_regional["total_revenue"] * 100
).round(1)
week3_regional = week3_regional.sort_values("total_revenue", ascending=False)
print("\n=== Week 3 Regional Summary ===")
print(week3_regional.to_string(index=False))
Output:
region total_revenue total_margin transaction_count avg_order_value unique_customers margin_pct
North 10900 5320 2 5450.0 1 48.8
West 10300 4940 3 3433.3 2 48.0
South 6300 2670 2 3150.0 1 42.4
East 5100 4080 1 5100.0 1 80.0
Step 5: Build the Week 2 Summary and Compute Week-over-Week Change
week2_regional = week2_enriched.groupby("region").agg(
total_revenue=("revenue", "sum"),
).reset_index().rename(columns={"total_revenue": "w2_revenue"})
# Merge the two regional summaries on region
wow_comparison = pd.merge(
week3_regional[["region", "total_revenue", "margin_pct"]],
week2_regional,
on="region",
how="left",
)
# Week-over-week revenue change
wow_comparison["wow_change"] = (
wow_comparison["total_revenue"] - wow_comparison["w2_revenue"]
)
wow_comparison["wow_change_pct"] = (
wow_comparison["wow_change"] / wow_comparison["w2_revenue"] * 100
).round(1)
# Mark direction with a text indicator
def wow_indicator(pct):
if pd.isna(pct):
return "N/A"
elif pct > 10:
return "UP strongly"
elif pct > 0:
return "UP"
elif pct > -10:
return "DOWN"
else:
return "DOWN strongly"
wow_comparison["trend"] = wow_comparison["wow_change_pct"].apply(wow_indicator)
print("\n=== Week-over-Week Revenue Comparison ===")
print(wow_comparison[["region","total_revenue","w2_revenue","wow_change","wow_change_pct","trend"]].to_string(index=False))
Output:
region total_revenue w2_revenue wow_change wow_change_pct trend
North 10900 11200 -300 -2.7 DOWN
West 10300 3800 6500 171.1 UP strongly
South 6300 4200 2100 50.0 UP strongly
East 5100 5100 0 0.0 UP
West had a strong week — three transactions versus one in week 2. Priya makes a mental note to call this out in the report narrative.
Step 6: Customer Tier Analysis
tier_analysis = week3_enriched.groupby("tier").agg(
customer_count=("customer_id", "nunique"),
transaction_count=("revenue", "count"),
total_revenue=("revenue", "sum"),
avg_revenue_per_deal=("revenue", "mean"),
avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()
tier_analysis["revenue_share_pct"] = (
tier_analysis["total_revenue"] / tier_analysis["total_revenue"].sum() * 100
).round(1)
# Add the tier ranking for Sandra's benefit
tier_order = {"Gold": 1, "Silver": 2, "Bronze": 3}
tier_analysis["sort_key"] = tier_analysis["tier"].map(tier_order)
tier_analysis = tier_analysis.sort_values("sort_key").drop(columns="sort_key")
print("\n=== Week 3 Customer Tier Analysis ===")
print(tier_analysis.to_string(index=False))
Output:
tier customer_count transaction_count total_revenue avg_revenue_per_deal avg_margin_pct revenue_share_pct
Gold 2 3 16000 5333.3 63.1 49.5
Silver 2 3 11300 3766.7 58.4 35.0
Bronze 1 2 4900 2450.0 42.0 15.2 (approx)
Gold tier customers represent about half of revenue despite being fewer than half the customer count. This supports Acme's premium customer strategy.
Step 7: Pivot to Region × Week Matrix
Sandra always wants the pivot view. It lets her scan the whole quarter in one table.
# Stack both weeks for the pivot
week3_enriched["week_label"] = "Week 3"
week2_enriched["week_label"] = "Week 2"
both_weeks = pd.concat([week2_enriched, week3_enriched], ignore_index=True)
region_week_pivot = pd.pivot_table(
both_weeks,
values="revenue",
index="region",
columns="week_label",
aggfunc="sum",
fill_value=0,
margins=True,
margins_name="Total",
)
# Order columns chronologically
region_week_pivot = region_week_pivot[["Week 2", "Week 3", "Total"]]
print("\n=== Revenue by Region × Week (Pivot View) ===")
print(region_week_pivot)
Output:
week_label Week 2 Week 3 Total
region
East 5100 5100 10200
North 11200 10900 22100
South 4200 6300 10500
West 3800 10300 14100
Total 24300 32600 56900
Step 8: Product Category Mix
category_mix = week3_enriched.groupby("product_category").agg(
total_revenue=("revenue", "sum"),
deal_count=("revenue", "count"),
avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()
category_mix["revenue_share_pct"] = (
category_mix["total_revenue"] / category_mix["total_revenue"].sum() * 100
).round(1)
category_mix = category_mix.sort_values("total_revenue", ascending=False)
print("\n=== Week 3 Product Category Mix ===")
print(category_mix.to_string(index=False))
Step 9: Assemble the Final Report
print("\n" + "=" * 60)
print("ACME CORP — WEEKLY SALES REPORT — WEEK 3 (Jan 15–19, 2024)")
print("Prepared by: Priya Okonkwo | Reviewed by: Sandra Chen, VP Sales")
print("=" * 60)
total_w3 = week3_enriched["revenue"].sum()
total_w2 = week2_enriched["revenue"].sum()
wow_total_pct = ((total_w3 - total_w2) / total_w2 * 100).round(1)
print(f"\nTotal Revenue: ${total_w3:,.0f} ({wow_total_pct:+.1f}% WoW)")
print(f"Total Margin: ${week3_enriched['margin'].sum():,.0f} "
f"({week3_enriched['margin_pct'].mean():.1f}% avg)")
print(f"Transactions: {len(week3_enriched)}")
print(f"Active Customers: {week3_enriched['customer_id'].nunique()}")
print("\n--- Regional Revenue ---")
for _, row in week3_regional.iterrows():
print(f" {row['region']:<8}: ${row['total_revenue']:>7,.0f} "
f"(margin: {row['margin_pct']:.1f}%)")
print("\n--- Key Observations ---")
print(" 1. West region grew 171% WoW — review customer mix for sustainability.")
print(" 2. Gold tier customers account for ~50% of weekly revenue.")
print(" 3. East region flat WoW; single customer (Magnolia Ltd) — follow up.")
print(" 4. Software leading product mix; Hardware margin below company average.")
What Priya Learned
Priya finishes the script at 8:22 a.m. — 37 minutes before the meeting, with time to review the output and prepare talking points. The analysis that used to take two hours now takes 35 minutes, and most of that is the thinking, not the mechanics.
She reflects on the techniques that made this possible:
prepare_transactions()as a reusable function — applying it to both weeks kept the logic consistent and the code DRY (Don't Repeat Yourself).- Left merge — keeping all transactions even if the customer master had a gap, and then explicitly checking for unmatched rows.
- Named aggregations in
.agg()— self-documenting column names that require no renaming. .concat()to combine weeks — a clean way to stack data before pivoting.pd.pivot_table()withmargins=True— the row and column totals come for free.
Most importantly, the script is a living document. Next Monday, Priya will update the data, re-run the script, and have the report in minutes. That repeatability is the real value.
Discussion Questions
-
Priya used a
leftjoin when merging transactions with the customer master. What would have happened with aninnerjoin if one transaction had a customer ID not in the master? -
The
prepare_transactions()function takes a copy of the DataFrame (df = df.copy()). Why is this important? What would happen if it modified the original DataFrame in place? -
Sandra wants to add a "Month-to-Date" column to the pivot showing the cumulative total for January. How would you compute this using the combined
both_weeksDataFrame? -
How would you modify the tier analysis to also show the breakdown by
segment(Enterprise / Mid-Market / SMB)? -
The
wow_indicator()function uses hard-coded thresholds (+10%, -10%). Is this a good design? How might you make the thresholds configurable?