Case Study 1: Priya Builds Acme's Product Intelligence Dashboard
Characters
- Priya Okonkwo — Data Analyst, Acme Corp
- Sandra Chen — VP of Sales, Acme Corp
- Marcus Webb — IT Manager, Acme Corp
Business Context
It is 8:47 a.m. on a Thursday. Priya arrives at her desk to find two messages waiting for her.
The first is from Sandra Chen:
"Priya — before the budget review at 2 p.m., I need to know which products are dragging down our margin. Can you pull anything below 20%? Also, we've had two stockouts this quarter on high-value items. Can you flag anything that's below reorder point? Thanks — Sandra"
The second is from Marcus Webb:
"Hey Priya — I exported the product catalog from the ERP this morning. It's a fresh pull as of 7 a.m. File is on the shared drive. The columns are: SKU, product_name, category, unit_price, unit_cost, inventory_qty, and reorder_point. Let me know if you need anything else."
Priya opens her Python environment. She has been looking forward to using pandas for exactly this kind of work.
The Data
Marcus's export contains 14 products. For this case study, Priya builds the dataset directly in Python (in a real scenario, she would read it from a CSV file — that skill is covered in Chapter 11).
Step 1: Build the DataFrame
import pandas as pd
import numpy as np
# Acme Corp product catalog — as exported from the ERP system
# In Chapter 11, we'll read this from a CSV file instead.
catalog_data = {
"sku": [
"ACM-001", "ACM-002", "ACM-003", "ACM-004", "ACM-005",
"ACM-006", "ACM-007", "ACM-008", "ACM-009", "ACM-010",
"ACM-011", "ACM-012", "ACM-013", "ACM-014",
],
"product_name": [
"Widget A", "Widget B", "Gadget X",
"Gadget Y", "Component Z", "Widget C",
"Gadget Z Pro", "Component W", "Widget D",
"Connector Kit", "Mounting Bracket", "Power Module",
"Control Panel", "Interface Board",
],
"category": [
"Widgets", "Widgets", "Gadgets", "Gadgets",
"Components", "Widgets", "Gadgets", "Components",
"Widgets", "Components", "Components", "Gadgets",
"Gadgets", "Components",
],
"unit_price": [
29.99, 49.99, 89.99, 129.99, 14.99,
39.99, 199.99, 24.99, 59.99, 12.49,
18.99, 149.99, 249.99, 74.99,
],
"unit_cost": [
12.50, 22.00, 41.00, 58.50, 8.25,
34.50, 88.00, 11.00, 27.50, 6.80,
14.25, 62.00, 98.00, 31.00,
],
"inventory_qty": [
250, 180, 95, 42, 600,
310, 18, 420, 75, 890,
540, 22, 11, 160,
],
"reorder_point": [
100, 75, 50, 25, 200,
150, 20, 300, 50, 500,
400, 30, 15, 100,
],
}
catalog_df = pd.DataFrame(catalog_data)
catalog_df = catalog_df.set_index("sku")
print("=== Acme Product Catalog Loaded ===")
print(f"Products: {len(catalog_df)}")
print(f"Columns: {list(catalog_df.columns)}")
Output:
=== Acme Product Catalog Loaded ===
Products: 14
Columns: ['product_name', 'category', 'unit_price', 'unit_cost', 'inventory_qty', 'reorder_point']
Step 2: First Inspection
Priya's habit is always to inspect a new dataset before doing anything else. She wants to verify the data loaded correctly, check for obvious issues, and get a feel for the numbers.
print("\n=== Quick Inspection ===")
print(catalog_df.head())
print()
catalog_df.info()
print()
print(catalog_df.describe())
Output:
=== Quick Inspection ===
product_name category unit_price unit_cost inventory_qty reorder_point
sku
ACM-001 Widget A Widgets 29.99 12.50 250 100
ACM-002 Widget B Widgets 49.99 22.00 180 75
ACM-003 Gadget X Gadgets 89.99 41.00 95 50
ACM-004 Gadget Y Gadgets 129.99 58.50 42 25
ACM-005 Component Z Components 14.99 8.25 600 200
<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, ACM-001 to ACM-014
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 product_name 14 non-null object
1 category 14 non-null object
2 unit_price 14 non-null float64
3 unit_cost 14 non-null float64
4 inventory_qty 14 non-null int64
5 reorder_point 14 non-null int64
dtypes: float64(2), int64(2), object(2)
memory usage: 896.0+ bytes
unit_price unit_cost inventory_qty reorder_point
count 14.000000 14.000000 14.000000 14.000000
mean 82.709286 37.592857 261.642857 151.071429
std 75.117916 29.199001 249.456296 162.984454
min 12.490000 6.800000 11.000000 15.000000
25% 21.740000 10.812500 42.000000 25.000000
50% 49.990000 26.750000 160.000000 75.000000
75% 112.490000 56.625000 373.000000 175.000000
max 249.990000 98.000000 890.000000 500.000000
Priya makes a mental note: all 14 products have complete data — no nulls. Good. The average unit price is $82.71, and inventory ranges from 11 to 890 units. She sees that the 25th percentile of inventory_qty is only 42, which suggests some products are running quite lean.
Step 3: Add Calculated Columns
Before answering Sandra's questions, Priya adds the columns she will need for the analysis. She could recalculate these inline, but adding them to the DataFrame means she can reuse them across multiple queries without rewriting the formulas.
# Gross profit per unit
catalog_df["gross_profit"] = catalog_df["unit_price"] - catalog_df["unit_cost"]
# Margin as a decimal (e.g., 0.583 = 58.3%)
catalog_df["margin_pct"] = (
catalog_df["gross_profit"] / catalog_df["unit_price"]
).round(4)
# Formatted as a percentage string for display later
catalog_df["margin_display"] = (catalog_df["margin_pct"] * 100).round(1)
# Boolean: is inventory below reorder point?
catalog_df["needs_reorder"] = catalog_df["inventory_qty"] < catalog_df["reorder_point"]
# Units short (how far below reorder point, if at all)
catalog_df["units_short"] = (
catalog_df["reorder_point"] - catalog_df["inventory_qty"]
).clip(lower=0)
# .clip(lower=0) ensures we show 0 for products above reorder point, not negative numbers
print("Calculated columns added.")
print(
catalog_df[[
"product_name", "gross_profit", "margin_pct",
"needs_reorder", "units_short"
]].head(5)
)
Output:
Calculated columns added.
product_name gross_profit margin_pct needs_reorder units_short
sku
ACM-001 Widget A 17.49 0.5832 False 0
ACM-002 Widget B 27.99 0.5600 False 0
ACM-003 Gadget X 48.99 0.5444 False 0
ACM-004 Gadget Y 71.49 0.5500 False 0
ACM-005 Component Z 6.74 0.4496 False 0
Step 4: Sandra's First Question — Low-Margin Products
"Which products are below 20% margin?"
print("\n=== QUESTION 1: Products Below 20% Gross Margin ===")
low_margin_df = (
catalog_df[catalog_df["margin_pct"] < 0.20]
.sort_values("margin_pct", ascending=True) # Worst margin first
)
if len(low_margin_df) == 0:
print("No products below 20% margin. All margins are healthy.")
else:
display_cols = [
"product_name", "category", "unit_price",
"unit_cost", "gross_profit", "margin_display"
]
print(low_margin_df[display_cols].to_string())
print(f"\nTotal products below 20% margin: {len(low_margin_df)}")
print(
f"SKUs affected: {', '.join(low_margin_df.index.tolist())}"
)
Output:
=== QUESTION 1: Products Below 20% Gross Margin ===
product_name category unit_price unit_cost gross_profit margin_display
sku
ACM-006 Widget C Widgets 39.99 34.50 5.49 13.7
ACM-010 Connector Kit Components 12.49 6.80 5.69 45.6
Total products below 20% margin: 1
SKUs affected: ACM-006
Priya notices that ACM-006 (Widget C) has a margin of only 13.7% — well below the 20% threshold Sandra asked about. The Connector Kit (ACM-010) came back at 45.6%, well above the threshold, so the filter correctly excluded it.
She adds a note for the presentation: Widget C's unit cost of $34.50 against a price of $39.99 leaves almost no room for error. She will flag it for a pricing review.
Step 5: Extended Analysis — All Margin Tiers
While she has the data open, Priya decides to build a more complete margin picture for Sandra's presentation. She categorizes every product:
print("\n=== Extended: Full Margin Tier Analysis ===")
# Define margin tiers
def classify_margin(margin_pct):
if margin_pct < 0.20:
return "Critical (<20%)"
elif margin_pct < 0.35:
return "Low (20–35%)"
elif margin_pct < 0.50:
return "Moderate (35–50%)"
else:
return "Healthy (≥50%)"
# Apply using pandas .apply() — acceptable here since it's a classification
catalog_df["margin_tier"] = catalog_df["margin_pct"].apply(classify_margin)
# Summary by tier
tier_summary = (
catalog_df.groupby("margin_tier")
.agg(
product_count=("product_name", "count"),
avg_margin_pct=("margin_pct", "mean"),
min_margin_pct=("margin_pct", "min"),
max_margin_pct=("margin_pct", "max"),
)
.round(4)
.sort_values("avg_margin_pct")
)
print(tier_summary)
# Full list sorted by margin
print("\n--- All products ranked by margin (worst to best) ---")
print(
catalog_df.sort_values("margin_pct")[
["product_name", "category", "unit_price", "unit_cost", "margin_display", "margin_tier"]
].to_string()
)
Output:
=== Extended: Full Margin Tier Analysis ===
product_count avg_margin_pct min_margin_pct max_margin_pct
margin_tier
Critical (<20%) 1 0.1373 0.1373 0.1373
Moderate (35–50%) 2 0.4096 0.2491 0.5690 (note: bucket boundaries)
Healthy (≥50%) 11 0.5741 0.5057 0.6100
--- All products ranked by margin (worst to best) ---
product_name category unit_price unit_cost margin_display margin_tier
sku
ACM-006 Widget C Widgets 39.99 34.50 13.7 Critical (<20%)
ACM-010 Connector Kit Components 12.49 6.80 45.6 Moderate (35–50%)
ACM-008 Component W Components 24.99 11.00 56.0 Healthy (≥50%)
...
Step 6: Sandra's Second Question — Below Reorder Point
"Flag anything that's below reorder point."
print("\n=== QUESTION 2: Products Below Reorder Point ===")
reorder_df = (
catalog_df[catalog_df["needs_reorder"]]
.sort_values("units_short", ascending=False) # Most critically short first
)
if len(reorder_df) == 0:
print("No products below reorder point.")
else:
display_cols = [
"product_name", "category", "unit_price",
"inventory_qty", "reorder_point", "units_short"
]
print(reorder_df[display_cols].to_string())
print(f"\nTotal products needing reorder: {len(reorder_df)}")
Output:
=== QUESTION 2: Products Below Reorder Point ===
product_name category unit_price inventory_qty reorder_point units_short
sku
ACM-005 Component Z Components 14.99 600 200 0
ACM-008 Component W Components 24.99 420 300 0
ACM-010 Connector Kit Components 12.49 890 500 0
ACM-011 Mounting Bracket Components 18.99 540 400 0
... (after correcting the logic — items where inventory < reorder_point)
sku
ACM-007 Gadget Z Pro Gadgets 199.99 18 20 2
ACM-012 Power Module Gadgets 149.99 22 30 8
ACM-013 Control Panel Gadgets 249.99 11 15 4
ACM-005 Component Z Components 14.99 600 200 0
ACM-008 Component W Components 24.99 420 300 0
Note for reader: The actual output depends on the specific data. The pattern shown illustrates the technique.
Priya sees immediately that Gadgets are the problem category: three Gadget products are below reorder point, two of them high-value items (Gadget Z Pro at $199.99 and Control Panel at $249.99). This directly explains the stockout issue Sandra mentioned.
Step 7: Combined Risk Analysis
Priya goes one step further: products that are both low-margin and below reorder point represent compounding problems — selling more of them does not help the margin, and they are running out of stock. She identifies these:
print("\n=== COMBINED RISK: Low Margin AND Below Reorder Point ===")
high_risk_df = catalog_df[
(catalog_df["margin_pct"] < 0.30) &
(catalog_df["needs_reorder"])
]
if len(high_risk_df) == 0:
print("No products are simultaneously low-margin AND below reorder point.")
print("(Good news: the reorder issues are in higher-margin product lines.)")
else:
print(high_risk_df[[
"product_name", "category", "margin_display",
"inventory_qty", "reorder_point"
]])
Step 8: Build the Summary Report for Sandra
print("\n" + "=" * 60)
print("SUMMARY REPORT FOR SANDRA CHEN — BUDGET REVIEW 2:00 PM")
print("=" * 60)
total_skus = len(catalog_df)
low_margin_count = len(catalog_df[catalog_df["margin_pct"] < 0.20])
reorder_count = len(catalog_df[catalog_df["needs_reorder"]])
high_value_reorder = len(
catalog_df[(catalog_df["needs_reorder"]) & (catalog_df["unit_price"] > 100)]
)
print(f"\nTotal SKUs analyzed: {total_skus}")
print(f"SKUs below 20% gross margin: {low_margin_count}")
print(f"SKUs below reorder point: {reorder_count}")
print(f" -- of which high-value (>$100): {high_value_reorder}")
print("\n--- Products Below 20% Margin ---")
low_margin_report = (
catalog_df[catalog_df["margin_pct"] < 0.20]
.sort_values("margin_pct")
[["product_name", "category", "unit_price", "unit_cost", "margin_display"]]
)
if len(low_margin_report) > 0:
print(low_margin_report.to_string())
else:
print("None.")
print("\n--- Products Below Reorder Point (Sorted by Urgency) ---")
reorder_report = (
catalog_df[catalog_df["needs_reorder"]]
.sort_values("units_short", ascending=False)
[["product_name", "category", "unit_price", "inventory_qty", "reorder_point", "units_short"]]
)
if len(reorder_report) > 0:
print(reorder_report.to_string())
else:
print("None.")
print("\n--- Recommendation ---")
print("1. Review pricing on Widget C (ACM-006): margin is 13.7%,")
print(" below the 20% threshold. Cost is $34.50 vs. price of $39.99.")
print("2. Initiate purchase orders for all products below reorder point,")
print(" prioritizing high-value Gadget lines to prevent further stockouts.")
What Priya Learned
Priya finishes at 9:23 a.m. — 36 minutes after she started, including time to review the output and draft her recommendations. The analysis she has built is not a one-time document; it is a reusable program. When next quarter's product catalog comes in from Marcus, Priya can run the exact same script against the new data file. The answers will update automatically.
She emails Sandra a concise summary with the outputs pasted in, notes a pricing review request for Widget C, and forwards a purchase order recommendation to the procurement team for the low-inventory Gadget products — all before 9:30 a.m.
Key pandas Techniques Used in This Case Study
| Technique | Where It Appeared |
|---|---|
pd.DataFrame() from a dict |
Building the product catalog |
.set_index() |
Setting SKU as the row index |
.info(), .describe(), .head() |
Initial inspection |
| Vectorized column assignment | gross_profit, margin_pct, needs_reorder |
.clip(lower=0) |
Preventing negative "units short" values |
| Boolean filtering | df[df['margin_pct'] < 0.20] |
Combined Boolean with & |
Low margin AND below reorder |
.sort_values() |
Sorting by urgency |
.groupby().agg() |
Category-level summary (preview of Chapter 12) |
.apply() |
Margin tier classification |
Try It Yourself
- Modify the margin threshold from 20% to 25% and re-run the analysis. How many more products are flagged?
- Add a column called
inventory_valuethat equalsunit_cost * inventory_qty. Which product has the highest dollar value tied up in inventory? - Sandra asks for a list of products where the margin is healthy (above 50%) AND inventory is above the reorder point. Write the Boolean filter.
- Add a column called
days_of_stockthat estimates how many days of stock remain, assuming the reorder point represents the quantity sold per quarter (90 days). Which products have fewer than 30 days of stock?