Case Study 31-1: The Email Campaign That Revealed a Hidden Opportunity
Characters: Priya Okonkwo (analyst), Sandra Chen (VP of Sales), Marcus Webb (IT)
Setting: Acme Corp headquarters, mid-November
The Setup
Sandra Chen dropped a sticky note on Priya's desk at 8:47 on a Tuesday morning. It read:
"Q3 email campaign: 2.1% click rate. Industry average is 2.8%. Why are we underperforming? Need answer before Thursday's marketing review."
Priya had been at Acme for seven months. Before Python, she would have opened the campaign report in Excel, filtered by date, looked at the overall numbers, shrugged, and written something like "Our open rates were slightly below benchmark." She would have presented this to Sandra and hoped the quarterly results would eventually explain the underperformance.
This time, she had better tools.
The Data
Marcus had exported three months of email campaign data into a CSV: acme_email_campaigns_q3.csv. It contained one row per recipient per campaign, with the following columns:
- campaign_id — identifier for each campaign send
- campaign_name — human-readable campaign name
- send_date — when the email was sent
- customer_id — recipient identifier
- customer_tier — Gold, Silver, or Standard
- email_delivered — boolean
- email_opened — boolean
- link_clicked — boolean
- order_placed — boolean (whether the recipient ordered within 7 days)
- order_value — dollar value of that order (0 if no order)
53,840 rows across 12 campaigns. Priya started by loading it and getting the lay of the land.
Step 1: Loading and Orienting
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
campaigns_df = pd.read_csv("acme_email_campaigns_q3.csv")
campaigns_df["send_date"] = pd.to_datetime(campaigns_df["send_date"])
print(campaigns_df.shape)
# (53840, 10)
print(campaigns_df["customer_tier"].value_counts())
# Standard 38204
# Silver 9628
# Gold 6008
Already an interesting fact: 71% of Acme's email list was Standard tier customers.
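On the real frame that share comes from campaigns_df["customer_tier"].value_counts(normalize=True); the same arithmetic from the printed counts (a quick stand-alone check, not the original analysis code):

```python
import pandas as pd

# Tier counts as printed above; a tiny stand-in for the real column
tier_counts = pd.Series({"Standard": 38204, "Silver": 9628, "Gold": 6008})
shares = (tier_counts / tier_counts.sum()).round(3)
print(shares)
# Standard 0.710, Silver 0.179, Gold 0.112
```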
# Top-line Q3 metrics
# (DataFrame.agg with named aggregations returns a sparse frame with one
#  column per source column, so we build a tidy Series directly instead)
overall_stats = pd.Series({
    "emails_sent": len(campaigns_df),
    "delivery_rate": campaigns_df["email_delivered"].mean(),
    "open_rate": campaigns_df["email_opened"].mean(),
    "click_rate": campaigns_df["link_clicked"].mean(),
    "order_rate": campaigns_df["order_placed"].mean(),
    "avg_order_value": campaigns_df["order_value"].mean(),
    "total_revenue": campaigns_df["order_value"].sum(),
})
print(overall_stats.round(3))
Output (approximate):
emails_sent 53840.0
delivery_rate 0.981
open_rate 0.238
click_rate 0.021 ← the "underperforming" 2.1% Sandra flagged
order_rate 0.008
avg_order_value 9.92 ← average across ALL recipients (including zero orders)
total_revenue 534,260.00
The 2.1% overall click rate confirmed Sandra's concern. But Priya had a hypothesis forming: what if the Standard tier customers were dragging down an otherwise healthy number?
Step 2: Breaking Down by Customer Tier
tier_metrics = campaigns_df.groupby("customer_tier").agg(
count=("customer_id", "count"),
delivery_rate=("email_delivered", "mean"),
open_rate=("email_opened", "mean"),
click_rate=("link_clicked", "mean"),
order_rate=("order_placed", "mean"),
revenue_per_email=("order_value", "mean"),
total_revenue=("order_value", "sum"),
).round(4)
tier_metrics["click_to_open_rate"] = (
tier_metrics["click_rate"] / tier_metrics["open_rate"]
).round(4)
print(tier_metrics)
The output stopped Priya mid-sip of her morning coffee:
count open_rate click_rate order_rate revenue_per_email click_to_open_rate
customer_tier
Gold 6008 0.412 0.064 0.031 43.18 0.155
Silver 9628 0.285 0.028 0.012 11.84 0.098
Standard 38204 0.201 0.011 0.004 4.21 0.055
Gold customers had a 6.4% click rate — nearly six times the Standard tier's 1.1%, and three times the overall average. Gold customers who opened the email clicked through at a 15.5% rate. Standard customers who opened it clicked at only 5.5%.
The "underperforming" campaign was only underperforming because 71% of the list was Standard tier customers who barely engaged with the content.
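The arithmetic behind that sentence is worth making explicit: the blended rate is a list-weighted average of the per-tier rates, so one large low-engagement segment drags it down. A quick check using the rounded numbers from the table (which is why the result lands near, not exactly on, 2.1%):

```python
# Per-tier recipient counts and click rates from the tier_metrics table
counts = {"Gold": 6008, "Silver": 9628, "Standard": 38204}
ctr = {"Gold": 0.064, "Silver": 0.028, "Standard": 0.011}

# Weighted average: exactly what the top-line "overall" click rate computes
blended = sum(counts[t] * ctr[t] for t in counts) / sum(counts.values())
print(f"blended CTR = {blended:.3%}")
# about 2.0%, within rounding of the reported 2.1%
```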
Step 3: Investigating What Was Different About Gold Customers
Were the Gold customers clicking because of who they were, or because of what they received? Priya dug into the campaign content for Gold vs. Standard recipients.
# Were Gold customers receiving different campaigns?
campaigns_by_tier = campaigns_df.groupby(
["campaign_name", "customer_tier"]
)["link_clicked"].mean().unstack().round(4)
print(campaigns_by_tier)
Same campaigns, different response rates across the board. The content was identical — Acme sent the same emails to all tiers.
This was actually the most important finding. Gold customers were not engaging because they received better content. They were engaging because Gold customers have fundamentally different relationships with Acme: higher purchase frequency, more recent orders, more established trust.
# Validate: engagement vs. purchase recency
# NOTE: customers_df is a separate customer-level table (one row per customer)
# with a days_since_last_order column, assumed loaded earlier in the session.
recency_df = campaigns_df.merge(
    customers_df[["customer_id", "days_since_last_order"]],
    on="customer_id",
    how="left",
)
recency_df["recency_bucket"] = pd.cut(
    recency_df["days_since_last_order"],
    bins=[0, 30, 60, 90, 180, 365, 9999],
    labels=["0-30d", "31-60d", "61-90d", "91-180d", "181-365d", "365d+"],
    include_lowest=True,  # keep customers who ordered today (day 0) in the first bucket
)
recency_click_rate = recency_df.groupby("recency_bucket", observed=True)["link_clicked"].mean()
print(recency_click_rate)
Output:
recency_bucket
0-30d 0.081
31-60d 0.054
61-90d 0.031
91-180d 0.018
181-365d 0.011
365d+ 0.006
Clear pattern: the more recently a customer had purchased, the more likely they were to click on the email. Gold tier customers had simply ordered more recently on average.
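One subtlety in binning like this: pd.cut builds right-closed intervals, so with bins starting at 0, a customer who ordered today (0 days ago) falls outside every bucket and gets NaN unless include_lowest=True is passed. A toy check of that edge case:

```python
import pandas as pd

days = pd.Series([0, 30, 31, 400])
buckets = pd.cut(
    days,
    bins=[0, 30, 60, 90, 180, 365, 9999],
    labels=["0-30d", "31-60d", "61-90d", "91-180d", "181-365d", "365d+"],
    include_lowest=True,  # without this, the 0 would become NaN
)
print(buckets.tolist())
# ['0-30d', '0-30d', '31-60d', '365d+']
```

Note that 30 lands in "0-30d" rather than "31-60d" because each interval is closed on the right, which is exactly what the bucket labels imply.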
Step 4: The Revenue Analysis
The click rate story was striking, but Sandra would want to know the revenue implications.
# Revenue per email sent, by tier
revenue_per_send = campaigns_df.groupby("customer_tier")["order_value"].mean().sort_values(ascending=False)
print("Revenue per email sent by tier:")
print(revenue_per_send)
# Gold: $43.18
# Silver: $11.84
# Standard: $4.21
# Total email list budget (assume $0.18 per email sent including design and platform cost)
cost_per_email = 0.18
tier_counts = campaigns_df.groupby("customer_tier")["customer_id"].count()
tier_costs = tier_counts * cost_per_email
tier_revenue = campaigns_df.groupby("customer_tier")["order_value"].sum()
tier_roi = ((tier_revenue - tier_costs) / tier_costs * 100).round(1)
print("\nEmail ROI by tier:")
print(tier_roi)
# Gold: 23,889%
# Silver: 6,478%
# Standard: 2,239%
Every tier was ROI-positive — even Standard customers returned more than $23 for every $1 in email cost. But Gold customers were still generating more than ten times the return of Standard customers.
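The ROI formula used above, (revenue − cost) / cost, is simple enough to package as a small helper for sanity-checking the per-tier figures. A sketch using the $0.18 cost-per-email assumption from the text:

```python
def email_roi_pct(revenue_per_email: float, cost_per_email: float = 0.18) -> float:
    """Percent return on email spend: (revenue - cost) / cost * 100."""
    return (revenue_per_email - cost_per_email) / cost_per_email * 100

# Revenue per email sent, by tier, from the Step 4 output
for tier, rev in [("Gold", 43.18), ("Silver", 11.84), ("Standard", 4.21)]:
    print(f"{tier}: {email_roi_pct(rev):,.0f}%")
# Gold: 23,889%
# Silver: 6,478%
# Standard: 2,239%
```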
Step 5: A/B Test Discovery — Gold Tier Personalization
Priya found something else buried in the data. In one of the Q3 campaigns, Marcus had quietly tested two subject lines: a standard product-focused subject line and a personalized one referencing the recipient's most recently purchased product category.
from scipy.stats import chi2_contingency
# Campaign C007 carried the A/B test
campaign_7 = campaigns_df[campaigns_df["campaign_id"] == "C007"].copy()
# Marcus tagged each send in an email_variant column: variant_a vs variant_b
a_group = campaign_7[campaign_7["email_variant"] == "variant_a"]
b_group = campaign_7[campaign_7["email_variant"] == "variant_b"]
print(f"Variant A (standard): {len(a_group):,} recipients, {a_group['link_clicked'].mean():.2%} CTR")
print(f"Variant B (personalized): {len(b_group):,} recipients, {b_group['link_clicked'].mean():.2%} CTR")
# Chi-square test
table = np.array([
[a_group["link_clicked"].sum(), len(a_group) - a_group["link_clicked"].sum()],
[b_group["link_clicked"].sum(), len(b_group) - b_group["link_clicked"].sum()],
])
chi2, p_val, _, _ = chi2_contingency(table)
print(f"\nChi-square p-value: {p_val:.4f}")
Output:
Variant A (standard): 2,847 recipients, 1.94% CTR
Variant B (personalized): 2,831 recipients, 2.68% CTR
Chi-square p-value: 0.0719
A 38% relative lift in CTR is substantial, but at roughly 2,800 recipients per arm it fell just short of conventional significance (p ≈ 0.07): suggestive, not conclusive. The more important finding was that the test had never been analyzed or acted upon at all. It was just sitting in the data, quietly waiting for someone to look.
Step 6: Visualizing the Findings
fig, axes = plt.subplots(1, 3, figsize=(15, 5))
fig.suptitle("Acme Corp Q3 Email Campaign Performance by Customer Tier", fontsize=13, fontweight="bold")
tiers = ["Gold", "Silver", "Standard"]
colors = ["#FFD700", "#C0C0C0", "#CD7F32"] # gold, silver, bronze
# Panel 1: Click Rate by Tier
click_rates = [0.064, 0.028, 0.011]
axes[0].bar(tiers, [r * 100 for r in click_rates], color=colors, edgecolor="white", linewidth=1.5)
axes[0].set_title("Click-Through Rate (%)", fontsize=11)
axes[0].set_ylabel("CTR (%)")
axes[0].axhline(y=2.1, color="red", linestyle="--", linewidth=1.5, label="Overall avg (2.1%)")
axes[0].axhline(y=2.8, color="gray", linestyle=":", linewidth=1.5, label="Industry benchmark (2.8%)")
axes[0].legend(fontsize=8)
for i, (tier, rate) in enumerate(zip(tiers, click_rates)):
axes[0].text(i, rate * 100 + 0.1, f"{rate:.1%}", ha="center", fontweight="bold")
# Panel 2: Revenue per Email Sent
rev_per_email = [43.18, 11.84, 4.21]
axes[1].bar(tiers, rev_per_email, color=colors, edgecolor="white", linewidth=1.5)
axes[1].set_title("Revenue per Email Sent ($)", fontsize=11)
axes[1].set_ylabel("Revenue ($)")
for i, (tier, rev) in enumerate(zip(tiers, rev_per_email)):
axes[1].text(i, rev + 0.5, f"${rev:.2f}", ha="center", fontweight="bold")
# Panel 3: Share of List vs. Share of Revenue
list_share = [6008 / 53840, 9628 / 53840, 38204 / 53840]
rev_share_vals = [
tier_revenue.get("Gold", 0) / tier_revenue.sum(),
tier_revenue.get("Silver", 0) / tier_revenue.sum(),
tier_revenue.get("Standard", 0) / tier_revenue.sum(),
]
x = np.arange(3)
width = 0.4
axes[2].bar(x - width / 2, [s * 100 for s in list_share], width, label="% of Email List", color="#4472C4", alpha=0.85)
axes[2].bar(x + width / 2, [s * 100 for s in rev_share_vals], width, label="% of Revenue", color="#70AD47", alpha=0.85)
axes[2].set_xticks(x)
axes[2].set_xticklabels(tiers)
axes[2].set_title("List Share vs. Revenue Share (%)", fontsize=11)
axes[2].legend(fontsize=9)
plt.tight_layout()
plt.savefig("acme_email_campaign_analysis.png", dpi=150, bbox_inches="tight")
plt.show()
The Presentation to Sandra
Priya walked into the Thursday marketing review with a single slide:
Finding 1: The 2.1% blended click rate is misleading. Gold tier customers click at 6.4% — above the 2.8% industry benchmark. Standard tier customers click at 1.1%. Our "underperformance" is a list composition issue, not a content issue.
Finding 2: Gold customers generate $43.18 per email sent, versus $4.21 for Standard. We are allocating equal budget per recipient to audiences with a 10-to-1 revenue gap.
Finding 3: Personalized subject lines (tested in Campaign 7) showed a 38% CTR lift, just short of statistical significance at the current sample size (p ≈ 0.07). The test was never analyzed or acted upon; a properly powered re-run is a cheap next step.
Recommendation: Create a Gold-tier email sequence with higher frequency, more personalized content, and priority timing. Reduce Standard-tier email frequency to protect deliverability and focus investment where LTV justifies it.
Sandra looked at the data for thirty seconds, then looked up. "If we had been sending Standard customers half as many emails for the past year, we'd have put that budget into Gold and Silver. How much revenue might that have been?"
Priya had already run that calculation. The answer was approximately $142,000 in additional attributed revenue — conservatively modeled.
Marcus, to his credit, looked more relieved than embarrassed. "I ran that A/B test and forgot to follow up on it," he said. "Good catch."
What Priya Learned
This case illustrates one of the most common patterns in marketing analytics: aggregation hiding signal. The overall campaign looked mediocre because it lumped together three very different audience segments. The moment Priya broke the data down by tier, the story changed entirely.
The right response to underperformance metrics is rarely "try harder." It is almost always "break it down further."
The tools that made this possible — groupby, comparison across segments, chi-square tests for significance — are the same tools you have learned throughout this book. The skill is not in knowing which function to call. It is in asking the right question about why a number looks the way it does.
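The "aggregation hiding signal" pattern is easy to reproduce on a synthetic frame (the data below is invented purely for illustration): the blended mean looks unremarkable while the group means tell completely different stories.

```python
import pandas as pd

# Synthetic example: 2 Gold recipients who both clicked, 8 Standard with 1 click
toy = pd.DataFrame({
    "tier":    ["Gold", "Gold"] + ["Standard"] * 8,
    "clicked": [1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
})
print(toy["clicked"].mean())                  # blended: 0.3
print(toy.groupby("tier")["clicked"].mean())  # Gold 1.0, Standard 0.125
```

The single groupby call is the entire trick: the blended 30% says almost nothing about either segment.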
Key Python Techniques Used
- groupby().agg() with multiple aggregation functions
- Boolean column averaging for rates (.mean() on 0/1 columns)
- pd.cut() for binning continuous variables
- scipy.stats.chi2_contingency for A/B test validation
- Multi-panel matplotlib figures for executive presentation
- Calculated fields built on grouped summaries