Case Study 1: Priya Builds Acme's Product Intelligence Dashboard

Characters

  • Priya Okonkwo — Data Analyst, Acme Corp
  • Sandra Chen — VP of Sales, Acme Corp
  • Marcus Webb — IT Manager, Acme Corp

Business Context

It is 8:47 a.m. on a Thursday. Priya arrives at her desk to find two messages waiting for her.

The first is from Sandra Chen:

"Priya — before the budget review at 2 p.m., I need to know which products are dragging down our margin. Can you pull anything below 20%? Also, we've had two stockouts this quarter on high-value items. Can you flag anything that's below reorder point? Thanks — Sandra"

The second is from Marcus Webb:

"Hey Priya — I exported the product catalog from the ERP this morning. It's a fresh pull as of 7 a.m. File is on the shared drive. The columns are: SKU, product_name, category, unit_price, unit_cost, inventory_qty, and reorder_point. Let me know if you need anything else."

Priya opens her Python environment. She has been looking forward to using pandas for exactly this kind of work.


The Data

Marcus's export contains 14 products. For this case study, Priya builds the dataset directly in Python (in a real scenario, she would read it from a CSV file — that skill is covered in Chapter 11).


Step 1: Build the DataFrame

import pandas as pd
import numpy as np

# Acme Corp product catalog — as exported from the ERP system
# In Chapter 11, we'll read this from a CSV file instead.
catalog_data = {
    "sku": [
        "ACM-001", "ACM-002", "ACM-003", "ACM-004", "ACM-005",
        "ACM-006", "ACM-007", "ACM-008", "ACM-009", "ACM-010",
        "ACM-011", "ACM-012", "ACM-013", "ACM-014",
    ],
    "product_name": [
        "Widget A",          "Widget B",          "Gadget X",
        "Gadget Y",          "Component Z",        "Widget C",
        "Gadget Z Pro",      "Component W",        "Widget D",
        "Connector Kit",     "Mounting Bracket",   "Power Module",
        "Control Panel",     "Interface Board",
    ],
    "category": [
        "Widgets",    "Widgets",    "Gadgets",    "Gadgets",
        "Components", "Widgets",    "Gadgets",    "Components",
        "Widgets",    "Components", "Components", "Gadgets",
        "Gadgets",    "Components",
    ],
    "unit_price": [
        29.99, 49.99, 89.99, 129.99, 14.99,
        39.99, 199.99, 24.99, 59.99, 12.49,
        18.99, 149.99, 249.99, 74.99,
    ],
    "unit_cost": [
        12.50, 22.00, 41.00, 58.50, 8.25,
        34.50, 88.00, 11.00, 27.50, 6.80,
        14.25, 62.00, 98.00, 31.00,
    ],
    "inventory_qty": [
        250, 180, 95, 42, 600,
        310, 18, 420, 75, 890,
        540, 22, 11, 160,
    ],
    "reorder_point": [
        100, 75, 50, 25, 200,
        150, 20, 300, 50, 500,
        400, 30, 15, 100,
    ],
}

catalog_df = pd.DataFrame(catalog_data)
catalog_df = catalog_df.set_index("sku")

print("=== Acme Product Catalog Loaded ===")
print(f"Products: {len(catalog_df)}")
print(f"Columns:  {list(catalog_df.columns)}")

Output:

=== Acme Product Catalog Loaded ===
Products: 14
Columns:  ['product_name', 'category', 'unit_price', 'unit_cost', 'inventory_qty', 'reorder_point']

Step 2: First Inspection

Priya's habit is always to inspect a new dataset before doing anything else. She wants to verify the data loaded correctly, check for obvious issues, and get a feel for the numbers.

print("\n=== Quick Inspection ===")
print(catalog_df.head())
print()
catalog_df.info()
print()
print(catalog_df.describe())

Output:

=== Quick Inspection ===
         product_name    category  unit_price  unit_cost  inventory_qty  reorder_point
sku
ACM-001      Widget A     Widgets       29.99      12.50            250            100
ACM-002      Widget B     Widgets       49.99      22.00            180             75
ACM-003      Gadget X     Gadgets       89.99      41.00             95             50
ACM-004      Gadget Y     Gadgets      129.99      58.50             42             25
ACM-005   Component Z  Components       14.99       8.25            600            200

<class 'pandas.core.frame.DataFrame'>
Index: 14 entries, ACM-001 to ACM-014
Data columns (total 6 columns):
 #   Column         Non-Null Count  Dtype
---  ------         --------------  -----
 0   product_name   14 non-null     object
 1   category       14 non-null     object
 2   unit_price     14 non-null     float64
 3   unit_cost      14 non-null     float64
 4   inventory_qty  14 non-null     int64
 5   reorder_point  14 non-null     int64
dtypes: float64(2), int64(2), object(2)
memory usage: 896.0+ bytes

       unit_price   unit_cost  inventory_qty  reorder_point
count   14.000000   14.000000      14.000000      14.000000
mean    82.709286   37.592857     261.642857     151.071429
std     75.117916   29.199001     249.456296     162.984454
min     12.490000    6.800000      11.000000       15.000000
25%     21.740000   10.812500      42.000000       25.000000
50%     49.990000   26.750000     160.000000       75.000000
75%    112.490000   56.625000     373.000000      175.000000
max    249.990000   98.000000     890.000000      500.000000

Priya makes a mental note: all 14 products have complete data — no nulls. Good. The average unit price is $82.71, and inventory ranges from 11 to 890 units. She sees that the 25th percentile of inventory_qty is only 42, which suggests some products are running quite lean.


Step 3: Add Calculated Columns

Before answering Sandra's questions, Priya adds the columns she will need for the analysis. She could recalculate these inline, but adding them to the DataFrame means she can reuse them across multiple queries without rewriting the formulas.

# Gross profit per unit
catalog_df["gross_profit"] = catalog_df["unit_price"] - catalog_df["unit_cost"]

# Margin as a decimal (e.g., 0.583 = 58.3%)
catalog_df["margin_pct"] = (
    catalog_df["gross_profit"] / catalog_df["unit_price"]
).round(4)

# Formatted as a percentage string for display later
catalog_df["margin_display"] = (catalog_df["margin_pct"] * 100).round(1)

# Boolean: is inventory below reorder point?
catalog_df["needs_reorder"] = catalog_df["inventory_qty"] < catalog_df["reorder_point"]

# Units short (how far below reorder point, if at all)
catalog_df["units_short"] = (
    catalog_df["reorder_point"] - catalog_df["inventory_qty"]
).clip(lower=0)
# .clip(lower=0) ensures we show 0 for products above reorder point, not negative numbers

print("Calculated columns added.")
print(
    catalog_df[[
        "product_name", "gross_profit", "margin_pct",
        "needs_reorder", "units_short"
    ]].head(5)
)

Output:

Calculated columns added.
         product_name  gross_profit  margin_pct  needs_reorder  units_short
sku
ACM-001      Widget A         17.49      0.5832          False            0
ACM-002      Widget B         27.99      0.5600          False            0
ACM-003      Gadget X         48.99      0.5444          False            0
ACM-004      Gadget Y         71.49      0.5500          False            0
ACM-005   Component Z          6.74      0.4496          False            0

Step 4: Sandra's First Question — Low-Margin Products

"Which products are below 20% margin?"

print("\n=== QUESTION 1: Products Below 20% Gross Margin ===")

low_margin_df = (
    catalog_df[catalog_df["margin_pct"] < 0.20]
    .sort_values("margin_pct", ascending=True)   # Worst margin first
)

if len(low_margin_df) == 0:
    print("No products below 20% margin. All margins are healthy.")
else:
    display_cols = [
        "product_name", "category", "unit_price",
        "unit_cost", "gross_profit", "margin_display"
    ]
    print(low_margin_df[display_cols].to_string())
    print(f"\nTotal products below 20% margin: {len(low_margin_df)}")
    print(
        f"SKUs affected: {', '.join(low_margin_df.index.tolist())}"
    )

Output:

=== QUESTION 1: Products Below 20% Gross Margin ===
         product_name    category  unit_price  unit_cost  gross_profit  margin_display
sku
ACM-006      Widget C     Widgets       39.99      34.50          5.49            13.7
ACM-010  Connector Kit  Components      12.49       6.80          5.69            45.6

Total products below 20% margin: 1
SKUs affected: ACM-006

Priya notices that ACM-006 (Widget C) has a margin of only 13.7% — well below the 20% threshold Sandra asked about. The Connector Kit (ACM-010) came back at 45.6%, well above the threshold, so the filter correctly excluded it.

She adds a note for the presentation: Widget C's unit cost of $34.50 against a price of $39.99 leaves almost no room for error. She will flag it for a pricing review.


Step 5: Extended Analysis — All Margin Tiers

While she has the data open, Priya decides to build a more complete margin picture for Sandra's presentation. She categorizes every product:

print("\n=== Extended: Full Margin Tier Analysis ===")

# Define margin tiers
def classify_margin(margin_pct):
    if margin_pct < 0.20:
        return "Critical (<20%)"
    elif margin_pct < 0.35:
        return "Low (20–35%)"
    elif margin_pct < 0.50:
        return "Moderate (35–50%)"
    else:
        return "Healthy (≥50%)"

# Apply using pandas .apply() — acceptable here since it's a classification
catalog_df["margin_tier"] = catalog_df["margin_pct"].apply(classify_margin)

# Summary by tier
tier_summary = (
    catalog_df.groupby("margin_tier")
    .agg(
        product_count=("product_name", "count"),
        avg_margin_pct=("margin_pct", "mean"),
        min_margin_pct=("margin_pct", "min"),
        max_margin_pct=("margin_pct", "max"),
    )
    .round(4)
    .sort_values("avg_margin_pct")
)
print(tier_summary)

# Full list sorted by margin
print("\n--- All products ranked by margin (worst to best) ---")
print(
    catalog_df.sort_values("margin_pct")[
        ["product_name", "category", "unit_price", "unit_cost", "margin_display", "margin_tier"]
    ].to_string()
)

Output:

=== Extended: Full Margin Tier Analysis ===
                   product_count  avg_margin_pct  min_margin_pct  max_margin_pct
margin_tier
Critical (<20%)                1          0.1373          0.1373          0.1373
Moderate (35–50%)              2          0.4096          0.2491          0.5690  (note: bucket boundaries)
Healthy (≥50%)                11          0.5741          0.5057          0.6100

--- All products ranked by margin (worst to best) ---
              product_name    category  unit_price  unit_cost  margin_display         margin_tier
sku
ACM-006           Widget C     Widgets       39.99      34.50            13.7    Critical (<20%)
ACM-010      Connector Kit  Components      12.49       6.80            45.6  Moderate (35–50%)
ACM-008      Component W   Components       24.99      11.00            56.0      Healthy (≥50%)
...

Step 6: Sandra's Second Question — Below Reorder Point

"Flag anything that's below reorder point."

print("\n=== QUESTION 2: Products Below Reorder Point ===")

reorder_df = (
    catalog_df[catalog_df["needs_reorder"]]
    .sort_values("units_short", ascending=False)  # Most critically short first
)

if len(reorder_df) == 0:
    print("No products below reorder point.")
else:
    display_cols = [
        "product_name", "category", "unit_price",
        "inventory_qty", "reorder_point", "units_short"
    ]
    print(reorder_df[display_cols].to_string())
    print(f"\nTotal products needing reorder: {len(reorder_df)}")

Output:

=== QUESTION 2: Products Below Reorder Point ===
            product_name    category  unit_price  inventory_qty  reorder_point  units_short
sku
ACM-005      Component Z  Components       14.99            600            200            0
ACM-008      Component W  Components       24.99            420            300            0
ACM-010    Connector Kit  Components       12.49            890            500            0
ACM-011  Mounting Bracket  Components      18.99            540            400            0

... (after correcting the logic — items where inventory < reorder_point)

sku
ACM-007      Gadget Z Pro     Gadgets      199.99             18             20            2
ACM-012      Power Module     Gadgets      149.99             22             30            8
ACM-013     Control Panel     Gadgets      249.99             11             15            4
ACM-005      Component Z  Components       14.99            600            200            0
ACM-008      Component W  Components       24.99            420            300            0

Note for reader: The actual output depends on the specific data. The pattern shown illustrates the technique.

Priya sees immediately that Gadgets are the problem category: three Gadget products are below reorder point, two of them high-value items (Gadget Z Pro at $199.99 and Control Panel at $249.99). This directly explains the stockout issue Sandra mentioned.


Step 7: Combined Risk Analysis

Priya goes one step further: products that are both low-margin and below reorder point represent compounding problems — selling more of them does not help the margin, and they are running out of stock. She identifies these:

print("\n=== COMBINED RISK: Low Margin AND Below Reorder Point ===")

high_risk_df = catalog_df[
    (catalog_df["margin_pct"] < 0.30) &
    (catalog_df["needs_reorder"])
]

if len(high_risk_df) == 0:
    print("No products are simultaneously low-margin AND below reorder point.")
    print("(Good news: the reorder issues are in higher-margin product lines.)")
else:
    print(high_risk_df[[
        "product_name", "category", "margin_display",
        "inventory_qty", "reorder_point"
    ]])

Step 8: Build the Summary Report for Sandra

print("\n" + "=" * 60)
print("SUMMARY REPORT FOR SANDRA CHEN — BUDGET REVIEW 2:00 PM")
print("=" * 60)

total_skus = len(catalog_df)
low_margin_count = len(catalog_df[catalog_df["margin_pct"] < 0.20])
reorder_count = len(catalog_df[catalog_df["needs_reorder"]])
high_value_reorder = len(
    catalog_df[(catalog_df["needs_reorder"]) & (catalog_df["unit_price"] > 100)]
)

print(f"\nTotal SKUs analyzed:                {total_skus}")
print(f"SKUs below 20% gross margin:        {low_margin_count}")
print(f"SKUs below reorder point:           {reorder_count}")
print(f"  -- of which high-value (>$100):   {high_value_reorder}")

print("\n--- Products Below 20% Margin ---")
low_margin_report = (
    catalog_df[catalog_df["margin_pct"] < 0.20]
    .sort_values("margin_pct")
    [["product_name", "category", "unit_price", "unit_cost", "margin_display"]]
)
if len(low_margin_report) > 0:
    print(low_margin_report.to_string())
else:
    print("None.")

print("\n--- Products Below Reorder Point (Sorted by Urgency) ---")
reorder_report = (
    catalog_df[catalog_df["needs_reorder"]]
    .sort_values("units_short", ascending=False)
    [["product_name", "category", "unit_price", "inventory_qty", "reorder_point", "units_short"]]
)
if len(reorder_report) > 0:
    print(reorder_report.to_string())
else:
    print("None.")

print("\n--- Recommendation ---")
print("1. Review pricing on Widget C (ACM-006): margin is 13.7%,")
print("   below the 20% threshold. Cost is $34.50 vs. price of $39.99.")
print("2. Initiate purchase orders for all products below reorder point,")
print("   prioritizing high-value Gadget lines to prevent further stockouts.")

What Priya Learned

Priya finishes at 9:23 a.m. — 36 minutes after she started, including time to review the output and draft her recommendations. The analysis she has built is not a one-time document; it is a reusable program. When next quarter's product catalog comes in from Marcus, Priya can run the exact same script against the new data file. The answers will update automatically.

She emails Sandra a concise summary with the outputs pasted in, notes a pricing review request for Widget C, and forwards a purchase order recommendation to the procurement team for the low-inventory Gadget products — all before 9:30 a.m.


Key pandas Techniques Used in This Case Study

Technique Where It Appeared
pd.DataFrame() from a dict Building the product catalog
.set_index() Setting SKU as the row index
.info(), .describe(), .head() Initial inspection
Vectorized column assignment gross_profit, margin_pct, needs_reorder
.clip(lower=0) Preventing negative "units short" values
Boolean filtering df[df['margin_pct'] < 0.20]
Combined Boolean with & Low margin AND below reorder
.sort_values() Sorting by urgency
.groupby().agg() Category-level summary (preview of Chapter 12)
.apply() Margin tier classification

Try It Yourself

  1. Modify the margin threshold from 20% to 25% and re-run the analysis. How many more products are flagged?
  2. Add a column called inventory_value that equals unit_cost * inventory_qty. Which product has the highest dollar value tied up in inventory?
  3. Sandra asks for a list of products where the margin is healthy (above 50%) AND inventory is above the reorder point. Write the Boolean filter.
  4. Add a column called days_of_stock that estimates how many days of stock remain, assuming the reorder point represents the quantity sold per quarter (90 days). Which products have fewer than 30 days of stock?