In This Chapter
- 31.1 The Marketing Metrics Every Business Professional Should Know
- 31.2 Campaign Attribution: The Hardest Problem in Marketing
- 31.3 A/B Testing: The Scientific Method for Marketers
- 31.4 Funnel Analysis: From Awareness to Purchase
- 31.5 Customer Acquisition Analysis: CAC by Channel
- 31.6 Digital Marketing Metrics: Working with UTM Data
- 31.7 Marketing Mix Analysis: Which Channels Drive the Most Value
- 31.8 Cohort-Based Campaign Analysis
- 31.9 Marketing Budget Allocation Modeling
- 31.10 Putting It All Together: A Marketing Analytics Workflow
- Summary
- Chapter 31 Quick Reference
Chapter 31: Marketing Analytics and Campaign Analysis
"Half the money I spend on advertising is wasted; the trouble is I don't know which half." — John Wanamaker, department store magnate, circa 1900
John Wanamaker said that over a hundred years ago, and for most of the intervening century, marketers mostly shrugged and moved on. You ran your campaigns, hoped for the best, and counted sales at the end of the quarter.
That era is over.
Today, every digital touchpoint generates data. Every email open, ad click, and checkout abandonment leaves a trace. The businesses that turn those traces into decisions — real decisions, not gut feelings dressed up in data vocabulary — are the ones eating their competitors' lunch.
In this chapter, you will learn to be that kind of business.
31.1 The Marketing Metrics Every Business Professional Should Know
Before you write a single line of code, you need to speak the language of marketing analytics. These are not optional acronyms. They are the vocabulary of strategic conversations, budget justifications, and performance reviews. When a CMO says "our ROAS on the Q3 display campaign was 2.1," you need to know instantly whether that's a victory or a disaster (spoiler: it depends on context, but 2.1 is often a warning sign).
Customer Acquisition Cost (CAC)
Customer Acquisition Cost (CAC) is the total cost to acquire one new paying customer. It is the single most important efficiency metric in marketing.
CAC = Total Marketing and Sales Spend / Number of New Customers Acquired
If Acme Corp spent $45,000 on marketing and sales last month and acquired 150 new customers, their CAC is $300.
The number only makes sense in context. A $300 CAC is catastrophic if a customer only ever buys a $200 box of printer cartridges. It's outstanding if a customer signs a three-year contract worth $18,000.
Which brings us to:
Customer Lifetime Value (LTV or CLV)
Customer Lifetime Value (LTV) is the total revenue a business can expect from a single customer account over the entire duration of the relationship.
The simplest version:
LTV = Average Order Value × Purchase Frequency × Average Customer Lifespan
More sophisticated versions account for churn rates, gross margins, and discount rates, but this formula is enough to make better decisions than most organizations currently do.
The LTV:CAC ratio is arguably the most important strategic metric in business. Industry benchmarks:
- LTV:CAC < 1:1 — You are paying more to acquire customers than they are worth. This is existential.
- LTV:CAC 1:1 to 3:1 — You are acquiring customers profitably but likely not efficiently.
- LTV:CAC 3:1 to 5:1 — Generally considered healthy for a scaling business.
- LTV:CAC > 5:1 — You are probably underinvesting in growth. There's more market to capture.
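These definitions translate directly into code. A minimal sketch using the Acme Corp spend figures from above — the LTV inputs ($75 average order, four orders a year, a three-year lifespan) are hypothetical, chosen for illustration:

```python
def cac(total_spend: float, new_customers: int) -> float:
    """Customer Acquisition Cost: total spend per new customer."""
    return total_spend / new_customers

def ltv(avg_order_value: float, purchases_per_year: float, lifespan_years: float) -> float:
    """Simple LTV: AOV x purchase frequency x customer lifespan."""
    return avg_order_value * purchases_per_year * lifespan_years

acme_cac = cac(45_000, 150)   # $300, as in the Acme example above
acme_ltv = ltv(75, 4, 3)      # hypothetical inputs -> $900
ratio = acme_ltv / acme_cac   # 3.0 -> just inside the "healthy" band
```

At a 3.0 ratio, Acme is at the bottom edge of the healthy range — profitable, but with little room for CAC to creep upward.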
Return on Ad Spend (ROAS)
Return on Ad Spend (ROAS) measures revenue generated for every dollar spent on advertising.
ROAS = Revenue from Ads / Ad Spend
A ROAS of 4.0 means you generated $4 in revenue for every $1 spent on ads. Note that ROAS measures revenue, not profit — a campaign can have a strong ROAS and still lose money if your gross margins are thin. For profitability analysis, you need to know your target ROAS, which works backward from your margin:
Target ROAS = 1 / Gross Margin %
At a 40% gross margin, you need at least a 2.5× ROAS just to break even on ad spend.
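The break-even arithmetic is worth making concrete. A short sketch, using the 2.1 ROAS campaign mentioned earlier as a hypothetical:

```python
def breakeven_roas(gross_margin: float) -> float:
    """Minimum ROAS at which ad-driven revenue covers its own cost."""
    return 1 / gross_margin

target = breakeven_roas(0.40)        # 2.5x at a 40% gross margin
campaign_roas = 2.1                  # hypothetical campaign from the text
profitable = campaign_roas > target  # False: revenue-positive, margin-negative
```

This is why a 2.1 ROAS is often a warning sign: at typical margins, the campaign is generating revenue while losing money.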
Click-Through Rate (CTR)
Click-Through Rate (CTR) measures what percentage of people who saw your content clicked on it.
CTR = Clicks / Impressions × 100
CTR benchmarks vary enormously by channel. A 2% CTR on a search ad is mediocre. A 2% CTR on a display banner ad is excellent. A 2% CTR on an email campaign to cold prospects is a home run. Always compare CTR against channel-specific benchmarks, never across channels.
Cost Per Click (CPC)
Cost Per Click (CPC) is exactly what it sounds like: the average cost you pay each time someone clicks your ad.
CPC = Ad Spend / Total Clicks
CPC matters because it sets the cost at the very top of your acquisition economics. If your CPC is $12 and only 3% of clickers convert to customers, your cost per conversion is $400. Whether that's acceptable depends entirely on your LTV.
Conversion Rate
Conversion rate measures the percentage of visitors (or leads, or trial users) who take a desired action.
Conversion Rate = Conversions / Total Visitors × 100
"Conversion" means different things at different stages. A conversion on a landing page might be filling out a form. A conversion for a sales rep might be closing a deal. Always specify which stage you mean when discussing conversion rates.
Putting the Metrics Together: The Unit Economics View
These metrics only become powerful when you connect them into a chain:
```
Impressions ──CTR──▶ Clicks ──Conversion Rate──▶ Customers
                       ↑                             ↑
                      CPC                           CAC
                                                     ↓
                                               LTV:CAC Ratio
```
When you can model this chain, you can answer questions like: "If we improve our landing page conversion rate from 2.8% to 3.5%, what does that do to our CAC?" That is the kind of question that gets budgets approved.
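That question can be answered with a few lines of the chain model. The $3 CPC here is a hypothetical input, not a figure from the text — only the two conversion rates come from the question above:

```python
def cac_from_chain(cpc: float, conversion_rate: float) -> float:
    """CAC implied by the chain: each customer costs 1/conversion_rate clicks."""
    return cpc / conversion_rate

before = cac_from_chain(3.00, 0.028)  # ~$107 CAC at a 2.8% conversion rate
after = cac_from_chain(3.00, 0.035)   # ~$86 CAC at 3.5%
reduction = 1 - after / before        # ~20% lower CAC, same ad spend
```

Note the general pattern: a relative improvement in conversion rate produces the same relative reduction in CAC, because spend per click is unchanged.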
31.2 Campaign Attribution: The Hardest Problem in Marketing
You run a search ad. The same customer sees a Facebook ad three days later. They click a LinkedIn post the following week. Then they Google your brand name, find your website, and buy.
Which channel gets credit?
This is the attribution problem, and it has launched a thousand arguments in marketing departments. There is no universally correct answer, but there are principled approaches.
First-Touch Attribution
First-touch attribution gives 100% of the credit for a conversion to the first channel the customer ever interacted with.
- Advantage: Simple. Credits the channel that started the relationship.
- Disadvantage: Ignores everything that happened after that first interaction. A channel that is great at discovery but terrible at closing gets full credit even when ten other channels did the actual work.
- Best for: Understanding how customers first discover you.
Last-Touch Attribution
Last-touch attribution gives 100% of the credit to the last channel the customer interacted with before converting.
- Advantage: Simple. Credits the channel that "closed the deal."
- Disadvantage: Often credits branded search (people typing your company name into Google) — a channel that rarely deserves full credit for a conversion.
- Best for: Understanding what channels close deals, as a complement to first-touch.
Multi-Touch Attribution
Multi-touch attribution distributes credit across all touchpoints in the customer journey. Common variants:
- Linear: Every touchpoint gets equal credit.
- Time-decay: Touchpoints closer to conversion get more credit.
- Position-based (U-shaped): First and last touch each get 40%, remaining touchpoints split the other 20%.
- Data-driven: Machine learning models determine weights based on actual conversion patterns (requires significant data volume).
Multi-touch is more accurate but also more complex and requires good tracking infrastructure (UTM parameters, a CRM, and ideally a customer data platform).
For this chapter, we will focus on what you can do with the data most businesses actually have: session-level data with UTM parameters, channel data, and conversion events.
31.3 A/B Testing: The Scientific Method for Marketers
If marketing attribution is about understanding the past, A/B testing is about shaping the future.
What A/B Testing Is and Why It Matters
An A/B test (also called a split test) is a controlled experiment where you show two versions of something — a webpage, an email subject line, an ad — to randomly divided audiences, then measure which version produces better outcomes.
The key word is controlled. In the absence of a controlled experiment, you cannot know whether a change in performance is due to your change or to external factors (seasonality, a competitor's sale, algorithm changes).
Acme Corp ran a flash sale promotion last July and saw a 40% spike in orders. Was it the "FREE SHIPPING" banner they added? The time-of-year? The email they sent? Without an A/B test, they will never know, and they will probably credit whichever explanation makes the most narrative sense rather than the one that is true.
Designing an A/B Test
A good A/B test has five elements:
- A clear hypothesis. "Changing the CTA button from 'Submit' to 'Get My Free Quote' will increase click-through rate."
- A single variable changed. Change one thing at a time. If you change the headline, the image, and the CTA simultaneously, you cannot know which change drove the result.
- A primary metric. What is the one number that determines success or failure? (Not five metrics — one.)
- A minimum sample size calculated in advance. More on this below.
- A predetermined end date. You commit to running the test until you reach sample size, then you stop and read the result.
Statistical Significance: The Intuitive Explanation
Here is what statistical significance actually means: if the two versions performed exactly the same in reality, what is the probability that random chance would produce a difference at least as large as the one you observed?
That probability is called the p-value. When the p-value is below your threshold (conventionally 0.05, meaning 5%), you declare the result statistically significant.
But let us ground this in business terms. Imagine you flipped a coin ten times and got seven heads. Is the coin biased? Maybe. But random chance produces that result fairly often — it is not strong evidence. Now imagine you flipped it 1,000 times and got 700 heads. Now you are confident something is off.
Statistical significance is the same idea applied to conversion rates. A conversion rate of 5.2% versus 4.9% from 100 visitors each is noise. The same rates from 5,000 visitors each is a real finding.
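The coin-flip intuition is easy to verify with SciPy's exact binomial test (`scipy.stats.binomtest`, available in SciPy 1.7+):

```python
from scipy.stats import binomtest

# 7 heads in 10 flips of a fair coin: entirely consistent with chance
p_small = binomtest(7, n=10, p=0.5).pvalue   # ~0.34 — not significant

# 700 heads in 1,000 flips: essentially impossible for a fair coin
p_large = binomtest(700, n=1000, p=0.5).pvalue

print(p_small, p_large)
```

The same 70% heads rate goes from "shrug" to "overwhelming evidence" purely because the sample grew.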
Calculating Sample Size
Before you run an A/B test, you need to know how many visitors each variant needs to produce a reliable result. This depends on:
- Baseline conversion rate: What is version A's current conversion rate?
- Minimum detectable effect (MDE): What is the smallest improvement you actually care about? (If a 0.1% improvement is meaningless to your business, you should not be designing a test that can only detect 0.1%.)
- Statistical power: Typically 0.80, meaning you want an 80% chance of detecting a real effect.
- Significance level: Typically 0.05.
```python
import numpy as np
from scipy import stats

def calculate_sample_size(
    baseline_rate: float,
    minimum_detectable_effect: float,
    alpha: float = 0.05,
    power: float = 0.80,
) -> int:
    """
    Calculate required sample size per variant for an A/B test.

    Parameters
    ----------
    baseline_rate : float
        Current conversion rate (e.g., 0.05 for 5%)
    minimum_detectable_effect : float
        Minimum relative improvement worth detecting (e.g., 0.10 for 10%)
    alpha : float
        Significance level (default 0.05)
    power : float
        Statistical power (default 0.80)

    Returns
    -------
    int
        Required sample size per variant
    """
    treatment_rate = baseline_rate * (1 + minimum_detectable_effect)

    # Z-scores for alpha (two-tailed) and power
    z_alpha = stats.norm.ppf(1 - alpha / 2)
    z_beta = stats.norm.ppf(power)

    # Pooled proportion
    p_pooled = (baseline_rate + treatment_rate) / 2

    # Sample size formula for a two-proportion z-test
    numerator = (z_alpha * np.sqrt(2 * p_pooled * (1 - p_pooled)) +
                 z_beta * np.sqrt(baseline_rate * (1 - baseline_rate) +
                                  treatment_rate * (1 - treatment_rate))) ** 2
    denominator = (treatment_rate - baseline_rate) ** 2
    return int(np.ceil(numerator / denominator))
```
At a 3% baseline conversion rate, if you want to detect a 15% relative improvement (from 3.0% to 3.45%), you need roughly 24,000 visitors per variant — nearly 50,000 total — before you can trust the result. Many businesses run underpowered tests without knowing it.
Running the Test: Chi-Square for Conversion Rates
Once you have collected the data, the workhorse statistical test for comparing two conversion rates is the chi-square test of independence (or its equivalent, the two-proportion z-test). In Python:
```python
from scipy.stats import chi2_contingency
import numpy as np

def analyze_ab_test(
    control_visitors: int,
    control_conversions: int,
    treatment_visitors: int,
    treatment_conversions: int,
    alpha: float = 0.05,
) -> dict:
    """
    Analyze an A/B test result for conversion rates.

    Returns a dictionary with rates, lift, p-value, and decision.
    """
    control_rate = control_conversions / control_visitors
    treatment_rate = treatment_conversions / treatment_visitors
    lift = (treatment_rate - control_rate) / control_rate

    # Contingency table: [converted, did not convert] for each variant
    contingency_table = np.array([
        [control_conversions, control_visitors - control_conversions],
        [treatment_conversions, treatment_visitors - treatment_conversions],
    ])
    chi2, p_value, dof, expected = chi2_contingency(contingency_table)

    return {
        "control_rate": control_rate,
        "treatment_rate": treatment_rate,
        "lift": lift,
        "p_value": p_value,
        "significant": p_value < alpha,
        "recommendation": (
            "Launch treatment" if p_value < alpha and lift > 0
            else "Keep control" if p_value < alpha and lift < 0
            else "Inconclusive — collect more data"
        ),
    }
```
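As a sanity check, here is the same chi-square test applied directly to a made-up result: 5,000 visitors per variant, converting at 4.0% versus 5.2%. One detail worth knowing: `chi2_contingency` applies Yates' continuity correction to 2×2 tables by default, which makes the test slightly conservative.

```python
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([
    [200, 4800],   # control: 200 of 5,000 converted (4.0%)
    [260, 4740],   # treatment: 260 of 5,000 converted (5.2%)
])
chi2, p_value, dof, expected = chi2_contingency(table)
print(f"p = {p_value:.4f}")   # comfortably below 0.05 -> significant
```

A 30% relative lift on 10,000 total visitors clears the significance bar; the same rates on a tenth of the traffic would not.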
Common A/B Testing Mistakes
Peeking. You launch the test, and two days in, the treatment is winning. You stop the test early and celebrate. This is one of the most common errors in business analytics. The problem: early in a test, random fluctuations are large. If you only stop when things look good, you are exploiting those fluctuations, not measuring real effects. Your false positive rate skyrockets. Commit to your sample size before you start.
Underpowering. Related to peeking: running a test until you get a result, rather than until you have collected enough data. A test with 200 visitors per variant that shows a non-significant result has told you almost nothing.
Multiple testing. If you test 20 hypotheses and use p < 0.05 as your threshold, you expect one false positive by chance alone. When teams test dozens of variants simultaneously, apply the Bonferroni correction (divide your alpha by the number of tests) or use a false discovery rate approach.
Ignoring novelty effects. A new button color might outperform simply because it is new and catches attention. Run tests long enough to outlast the novelty window.
Testing the wrong thing. Statistical significance is not the same as business significance. A test that shows a statistically significant 0.1% improvement in click rate, which translates to $200/year in revenue, is not worth the engineering time it took to implement.
31.4 Funnel Analysis: From Awareness to Purchase
A marketing funnel describes the stages a prospect moves through from first awareness to completed purchase. Funnel analysis identifies where the most prospects drop off, so you know where to focus optimization effort.
A typical e-commerce funnel might look like:
Homepage Visit (10,000) → Product View (4,200) → Add to Cart (1,800) → Checkout Started (820) → Purchase (410)
Each step has a conversion rate (what percentage proceed to the next step) and a drop-off rate (what percentage leave).
```python
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as mtick

def create_funnel_analysis(stages: list[str], counts: list[int]) -> pd.DataFrame:
    """
    Build a funnel analysis DataFrame from stage names and visitor counts.

    Parameters
    ----------
    stages : list of str
        Stage names in order (top to bottom)
    counts : list of int
        Visitor/user count at each stage

    Returns
    -------
    pd.DataFrame
        Funnel table with conversion rates and drop-off analysis
    """
    funnel_df = pd.DataFrame({"stage": stages, "count": counts})

    # Conversion rate from previous step
    funnel_df["step_conversion"] = (
        funnel_df["count"] / funnel_df["count"].shift(1) * 100
    )

    # Overall conversion from first step
    funnel_df["overall_conversion"] = (
        funnel_df["count"] / funnel_df["count"].iloc[0] * 100
    )

    # Drop-off count and rate
    funnel_df["drop_off_count"] = funnel_df["count"].shift(1) - funnel_df["count"]
    funnel_df["drop_off_rate"] = 100 - funnel_df["step_conversion"]
    return funnel_df

def plot_funnel(funnel_df: pd.DataFrame, title: str = "Marketing Funnel") -> None:
    """Visualize a marketing funnel as a horizontal bar chart."""
    fig, ax = plt.subplots(figsize=(10, 6))
    colors = plt.cm.Blues(
        [0.9, 0.75, 0.60, 0.45, 0.30][: len(funnel_df)]
    )
    bars = ax.barh(
        funnel_df["stage"][::-1],
        funnel_df["count"][::-1],
        color=colors[::-1],
        edgecolor="white",
        linewidth=1.5,
    )

    # Add count and conversion labels
    for bar, (_, row) in zip(bars, funnel_df[::-1].iterrows()):
        ax.text(
            bar.get_width() + max(funnel_df["count"]) * 0.01,
            bar.get_y() + bar.get_height() / 2,
            f"{row['count']:,} ({row['overall_conversion']:.1f}%)",
            va="center",
            fontsize=10,
        )

    ax.set_title(title, fontsize=14, fontweight="bold", pad=15)
    ax.set_xlabel("Visitors / Users", fontsize=11)
    ax.xaxis.set_major_formatter(mtick.FuncFormatter(lambda x, _: f"{x:,.0f}"))
    ax.spines["top"].set_visible(False)
    ax.spines["right"].set_visible(False)
    plt.tight_layout()
    plt.show()
```
Reading a Funnel: Where Is the Biggest Opportunity?
When analyzing a funnel, the instinct is to focus on the smallest step — the last step before purchase. But the greatest opportunity is usually at the step with the largest absolute drop-off.
In the example above:
- Homepage → Product View: lost 5,800 visitors (58% drop-off rate)
- Product View → Add to Cart: lost 2,400 visitors (57% drop-off rate)
- Add to Cart → Checkout: lost 980 visitors (54% drop-off rate)
The biggest absolute opportunity is improving the first step — getting more homepage visitors to view a product. Even a 10% improvement there adds 580 more visitors entering the rest of the funnel.
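The arithmetic behind that recommendation takes only a few lines, using the funnel counts from the example:

```python
stages = ["Homepage Visit", "Product View", "Add to Cart", "Checkout Started", "Purchase"]
counts = [10_000, 4_200, 1_800, 820, 410]

# Absolute visitors lost at each transition
drop_offs = [prev - cur for prev, cur in zip(counts, counts[1:])]
biggest = stages[drop_offs.index(max(drop_offs))]
print(drop_offs, "-> focus on:", biggest)
```

The drop-off rates are all in the 50–58% band, but the absolute losses differ by an order of magnitude — which is why absolute drop-off, not rate, should drive prioritization.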
Funnel Segmentation
Raw funnels hide important patterns. Always break your funnel down by:
- Device type (mobile users often have higher add-to-cart drop-off)
- Traffic source (email traffic typically converts better than social traffic)
- Customer tier (first-time visitors vs. returning customers)
- Product category (high-consideration items have longer paths to purchase)
```python
def segment_funnel(
    data: pd.DataFrame,
    stage_column: str,
    segment_column: str,
    user_id_column: str,
) -> pd.DataFrame:
    """
    Calculate funnel metrics segmented by a categorical variable.

    Returns a pivot table of conversion rates by segment and stage.
    """
    grouped = (
        data.groupby([segment_column, stage_column])[user_id_column]
        .nunique()
        .reset_index(name="users")
    )

    # Get top-of-funnel count per segment for normalization
    top_stage = data[stage_column].unique()[0]  # assumes ordered categorical
    top_counts = (
        grouped[grouped[stage_column] == top_stage]
        .set_index(segment_column)["users"]
    )
    grouped["pct_of_top"] = grouped.apply(
        lambda row: row["users"] / top_counts.get(row[segment_column], 1) * 100,
        axis=1,
    )
    return grouped.pivot(index=stage_column, columns=segment_column, values="pct_of_top")
```
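The same segmentation idea in miniature, on a hypothetical six-user event log (all column names and values invented for the example):

```python
import pandas as pd

# Hypothetical event log: which funnel stages each user reached, by device
events = pd.DataFrame({
    "user_id": [1, 1, 2, 2, 2, 3, 4, 4, 5, 6],
    "stage":   ["visit", "cart", "visit", "cart", "purchase",
                "visit", "visit", "cart", "visit", "visit"],
    "device":  ["mobile", "mobile", "desktop", "desktop", "desktop",
                "mobile", "desktop", "desktop", "mobile", "desktop"],
})

# Unique users reaching each stage, split by device
reached = (
    events.groupby(["device", "stage"])["user_id"]
    .nunique()
    .unstack(fill_value=0)
)
print(reached)
```

Even in this toy example, the aggregate funnel would hide the fact that all the purchases came from desktop — exactly the pattern segmentation exists to surface.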
31.5 Customer Acquisition Analysis: CAC by Channel
Not all acquisition channels are created equal. The most important CAC analysis you can do is break CAC down by channel — and then compare it against the LTV of customers acquired through each channel.
Setting Up Channel-Level CAC Analysis
```python
import pandas as pd
import numpy as np

def calculate_channel_cac(spend_df: pd.DataFrame, customers_df: pd.DataFrame) -> pd.DataFrame:
    """
    Calculate CAC by acquisition channel.

    Parameters
    ----------
    spend_df : pd.DataFrame
        Columns: channel, period, spend
    customers_df : pd.DataFrame
        Columns: customer_id, acquisition_channel, acquisition_date, ltv

    Returns
    -------
    pd.DataFrame
        Channel-level CAC, new customer count, total spend, and LTV:CAC ratio
    """
    channel_spend = spend_df.groupby("channel")["spend"].sum().reset_index()
    channel_customers = (
        customers_df.groupby("acquisition_channel")
        .agg(
            new_customers=("customer_id", "count"),
            avg_ltv=("ltv", "mean"),
        )
        .reset_index()
        .rename(columns={"acquisition_channel": "channel"})
    )
    channel_summary = channel_spend.merge(channel_customers, on="channel", how="outer")
    channel_summary["cac"] = channel_summary["spend"] / channel_summary["new_customers"]
    channel_summary["ltv_cac_ratio"] = channel_summary["avg_ltv"] / channel_summary["cac"]

    # Assumes LTV accrues evenly over a 24-month average customer lifespan
    channel_summary["payback_period_months"] = (
        channel_summary["cac"] / (channel_summary["avg_ltv"] / 24)
    )
    return channel_summary.sort_values("ltv_cac_ratio", ascending=False)
```
The CAC Payback Period
CAC payback period is the number of months it takes to recoup the cost of acquiring a customer. It is a cash-flow metric — even if a customer has a high LTV, a 36-month payback period means you are funding customer acquisition out of pocket for three years before you see a return.
CAC Payback Period = CAC / (Average Monthly Revenue per Customer × Gross Margin %)
As a rule of thumb:
- SaaS businesses target < 12 months
- E-commerce businesses target < 6 months
- B2B enterprise businesses may accept 18–24 months given high LTV
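Worked example, with all inputs hypothetical: a customer costing $300 to acquire, paying $50/month at a 70% gross margin:

```python
def cac_payback_months(cac: float, monthly_revenue: float, gross_margin: float) -> float:
    """Months of gross profit needed to recoup the acquisition cost."""
    return cac / (monthly_revenue * gross_margin)

months = cac_payback_months(cac=300, monthly_revenue=50, gross_margin=0.70)
# ~8.6 months: inside the 12-month SaaS comfort zone
```

Note that the denominator is gross profit, not revenue — using revenue would understate the payback period by the inverse of the margin.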
31.6 Digital Marketing Metrics: Working with UTM Data
UTM parameters (short for Urchin Tracking Module, a legacy name from Google's 2005 acquisition of Urchin Software) are tags appended to URLs that tell your analytics system where traffic came from.
A UTM-tagged URL looks like this:
https://acmecorp.com/products?utm_source=email&utm_medium=newsletter&utm_campaign=q3-promo&utm_content=header-cta
The five standard UTM parameters:
- utm_source — Where the traffic comes from (email, google, facebook, linkedin)
- utm_medium — The marketing medium (cpc, organic, email, social)
- utm_campaign — The specific campaign name
- utm_content — The specific piece of content (for A/B testing ads)
- utm_term — The keyword (paid search)
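You can pull these out of the example URL above with nothing but the standard library:

```python
from urllib.parse import urlparse, parse_qs

url = ("https://acmecorp.com/products?utm_source=email&utm_medium=newsletter"
       "&utm_campaign=q3-promo&utm_content=header-cta")

params = parse_qs(urlparse(url).query)  # parse_qs returns each value as a list
utm = {k: v[0] for k, v in params.items() if k.startswith("utm_")}
# {'utm_source': 'email', 'utm_medium': 'newsletter',
#  'utm_campaign': 'q3-promo', 'utm_content': 'header-cta'}
```

Note that `utm_term` is simply absent here — the example URL doesn't carry it, which is typical for non-search traffic.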
Parsing and Analyzing UTM Data
```python
import pandas as pd
from urllib.parse import urlparse, parse_qs

def parse_utm_from_url(url: str) -> dict:
    """Extract UTM parameters from a URL string."""
    if pd.isna(url):
        return {}
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    return {
        "utm_source": params.get("utm_source", [None])[0],
        "utm_medium": params.get("utm_medium", [None])[0],
        "utm_campaign": params.get("utm_campaign", [None])[0],
        "utm_content": params.get("utm_content", [None])[0],
        "utm_term": params.get("utm_term", [None])[0],
    }

def enrich_sessions_with_utm(sessions_df: pd.DataFrame) -> pd.DataFrame:
    """
    Parse UTM parameters from landing page URLs and merge into sessions DataFrame.

    Expects sessions_df to have a 'landing_page_url' column.
    """
    utm_data = sessions_df["landing_page_url"].apply(parse_utm_from_url)
    # Reuse the original index so concat aligns row-for-row
    utm_df = pd.DataFrame(utm_data.tolist(), index=sessions_df.index)
    return pd.concat([sessions_df, utm_df], axis=1)

def channel_performance_summary(
    sessions_df: pd.DataFrame,
    conversions_df: pd.DataFrame,
    session_id_col: str = "session_id",
) -> pd.DataFrame:
    """
    Summarize performance by utm_source/utm_medium combination.

    Parameters
    ----------
    sessions_df : pd.DataFrame
        Session-level data with UTM columns
    conversions_df : pd.DataFrame
        Conversion events with session_id and revenue columns
    """
    enriched = enrich_sessions_with_utm(sessions_df)
    enriched["channel"] = (
        enriched["utm_medium"].fillna("direct")
        + " / "
        + enriched["utm_source"].fillna("(none)")
    )
    merged = enriched.merge(
        conversions_df[[session_id_col, "converted", "revenue"]],
        on=session_id_col,
        how="left",
    )
    merged["converted"] = merged["converted"].fillna(0)
    merged["revenue"] = merged["revenue"].fillna(0)

    summary = (
        merged.groupby("channel")
        .agg(
            sessions=(session_id_col, "count"),
            conversions=("converted", "sum"),
            revenue=("revenue", "sum"),
        )
        .reset_index()
    )
    summary["conversion_rate"] = summary["conversions"] / summary["sessions"] * 100
    summary["revenue_per_session"] = summary["revenue"] / summary["sessions"]
    return summary.sort_values("revenue", ascending=False)
```
31.7 Marketing Mix Analysis: Which Channels Drive the Most Value
A marketing mix analysis (sometimes called media mix modeling or MMM) asks the question: across all the money we spend on marketing, which allocations produce the best returns?
For most small-to-mid-sized businesses, a simplified approach using correlation analysis and contribution metrics is sufficient and far more actionable than the full econometric models that large consumer brands use.
Building a Channel Attribution Summary
```python
import pandas as pd
import matplotlib.pyplot as plt

def marketing_mix_summary(
    channel_data: pd.DataFrame,
    spend_col: str = "spend",
    revenue_col: str = "attributed_revenue",
    customers_col: str = "new_customers",
) -> pd.DataFrame:
    """
    Summarize marketing mix efficiency across channels.

    Returns a DataFrame with ROAS, CAC, revenue share, and spend share
    to identify over- and under-funded channels.
    """
    df = channel_data.copy()
    df["roas"] = df[revenue_col] / df[spend_col]
    df["cac"] = df[spend_col] / df[customers_col]
    df["spend_share"] = df[spend_col] / df[spend_col].sum() * 100
    df["revenue_share"] = df[revenue_col] / df[revenue_col].sum() * 100

    # efficiency_ratio > 1 means the channel punches above its spend weight
    df["efficiency_ratio"] = df["revenue_share"] / df["spend_share"]
    return df.sort_values("efficiency_ratio", ascending=False)

def plot_spend_vs_revenue_share(mix_df: pd.DataFrame, channel_col: str = "channel") -> None:
    """
    Bubble chart comparing spend share vs revenue share by channel.

    Channels above the diagonal are efficient; below are inefficient.
    """
    fig, ax = plt.subplots(figsize=(10, 7))
    scatter = ax.scatter(
        mix_df["spend_share"],
        mix_df["revenue_share"],
        s=mix_df["spend"] / mix_df["spend"].max() * 800,
        c=mix_df["efficiency_ratio"],
        cmap="RdYlGn",
        alpha=0.8,
        edgecolors="gray",
        linewidth=0.5,
    )

    # 1:1 line — channels on this line are perfectly efficient
    max_val = max(mix_df["spend_share"].max(), mix_df["revenue_share"].max()) * 1.1
    ax.plot([0, max_val], [0, max_val], "k--", alpha=0.4, linewidth=1, label="Break-even line")

    # Label each bubble
    for _, row in mix_df.iterrows():
        ax.annotate(
            row[channel_col],
            (row["spend_share"], row["revenue_share"]),
            textcoords="offset points",
            xytext=(8, 4),
            fontsize=9,
        )

    plt.colorbar(scatter, label="Efficiency Ratio (Revenue Share / Spend Share)")
    ax.set_xlabel("Spend Share (%)", fontsize=11)
    ax.set_ylabel("Revenue Share (%)", fontsize=11)
    ax.set_title("Marketing Mix: Spend Share vs. Revenue Share", fontsize=13, fontweight="bold")
    ax.legend()
    plt.tight_layout()
    plt.show()
```
31.8 Cohort-Based Campaign Analysis
Standard campaign metrics aggregate all customers together, which hides important patterns. A campaign that looks mediocre in aggregate might look excellent for new customers and poor for existing ones — or vice versa. Cohort analysis groups customers by when they were acquired (or when they first interacted with a campaign) and tracks their behavior over time.
```python
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def build_campaign_cohorts(
    orders_df: pd.DataFrame,
    date_col: str = "order_date",
    customer_col: str = "customer_id",
    revenue_col: str = "revenue",
    acquisition_campaign_col: str = "acquisition_campaign",
) -> pd.DataFrame:
    """
    Build a cohort retention and revenue table by acquisition campaign.

    Each cohort is defined by the campaign that acquired the customer.
    Periods are months since acquisition.
    """
    orders_df = orders_df.copy()
    orders_df[date_col] = pd.to_datetime(orders_df[date_col])

    # Identify each customer's first order date and acquisition campaign
    first_order = (
        orders_df.sort_values(date_col)
        .groupby(customer_col)
        .agg(
            acquisition_date=(date_col, "first"),
            campaign=(acquisition_campaign_col, "first"),
        )
        .reset_index()
    )
    orders_with_cohort = orders_df.merge(first_order, on=customer_col)
    orders_with_cohort["months_since_acquisition"] = (
        (orders_with_cohort[date_col].dt.year - orders_with_cohort["acquisition_date"].dt.year) * 12
        + (orders_with_cohort[date_col].dt.month - orders_with_cohort["acquisition_date"].dt.month)
    )
    cohort_revenue = (
        orders_with_cohort.groupby(["campaign", "months_since_acquisition"])[revenue_col]
        .sum()
        .unstack(fill_value=0)
    )
    return cohort_revenue

def plot_cohort_heatmap(
    cohort_df: pd.DataFrame,
    title: str = "Revenue by Acquisition Campaign Cohort ($)",
) -> None:
    """Heatmap visualization of cohort revenue over time."""
    fig, ax = plt.subplots(figsize=(14, max(4, len(cohort_df) * 0.6)))
    sns.heatmap(
        cohort_df,
        annot=True,
        fmt=",.0f",
        cmap="YlOrRd",
        linewidths=0.5,
        ax=ax,
        cbar_kws={"label": "Revenue ($)"},
    )
    ax.set_title(title, fontsize=13, fontweight="bold", pad=15)
    ax.set_xlabel("Months Since Acquisition", fontsize=11)
    ax.set_ylabel("Acquisition Campaign", fontsize=11)
    plt.tight_layout()
    plt.show()
```
31.9 Marketing Budget Allocation Modeling
Once you have channel-level ROAS and CAC data, you can build simple optimization models for budget allocation. The key insight is diminishing returns: as you spend more on any single channel, each additional dollar produces less return (more expensive keywords, larger but less targeted audiences).
A practical approach:
```python
import numpy as np
import pandas as pd
from scipy.optimize import minimize

def diminishing_returns_curve(spend: float, alpha: float, beta: float) -> float:
    """
    Model revenue as a function of spend with diminishing returns.

    Uses a power function: revenue = alpha * spend^beta,
    where 0 < beta < 1 captures diminishing returns.
    """
    return alpha * (spend ** beta)

def optimize_budget_allocation(
    channel_params: dict[str, tuple[float, float]],
    total_budget: float,
    min_spend_per_channel: float = 500.0,
) -> pd.DataFrame:
    """
    Optimize budget allocation across channels to maximize total revenue.

    Parameters
    ----------
    channel_params : dict
        {channel_name: (alpha, beta)} curve parameters per channel
    total_budget : float
        Total available marketing budget
    min_spend_per_channel : float
        Minimum spend to maintain brand presence in each channel

    Returns
    -------
    pd.DataFrame
        Optimal spend and projected revenue per channel
    """
    channels = list(channel_params.keys())
    n = len(channels)

    def negative_total_revenue(spends: np.ndarray) -> float:
        # Negated because scipy.optimize.minimize minimizes
        return -sum(
            diminishing_returns_curve(spends[i], *channel_params[channels[i]])
            for i in range(n)
        )

    # Spend everything: allocations must sum to the total budget
    constraints = [{"type": "eq", "fun": lambda x: x.sum() - total_budget}]
    bounds = [(min_spend_per_channel, total_budget) for _ in range(n)]
    x0 = np.array([total_budget / n] * n)  # start from an even split

    result = minimize(
        negative_total_revenue, x0, method="SLSQP", bounds=bounds, constraints=constraints
    )

    optimal_spends = result.x
    projected_revenues = [
        diminishing_returns_curve(optimal_spends[i], *channel_params[channels[i]])
        for i in range(n)
    ]
    return pd.DataFrame({
        "channel": channels,
        "optimal_spend": optimal_spends,
        "projected_revenue": projected_revenues,
        "projected_roas": [r / s for r, s in zip(projected_revenues, optimal_spends)],
    }).sort_values("optimal_spend", ascending=False)
```
This is a simplified model. Real media mix models account for carryover effects (the "adstock" of past advertising), external variables (macro conditions, seasonality), and interaction effects between channels. But for most businesses, this level of analysis represents a significant improvement over allocating budget based on feel and inertia.
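To build intuition for the power curve itself: with beta = 0.5, doubling spend raises revenue by only about 41%, not 100%. The alpha value and spend levels below are arbitrary illustrations:

```python
def revenue(spend: float, alpha: float = 10.0, beta: float = 0.5) -> float:
    """Power-curve response: revenue = alpha * spend ** beta."""
    return alpha * spend ** beta

r1 = revenue(10_000)   # baseline spend
r2 = revenue(20_000)   # doubled spend
growth = r2 / r1 - 1   # ~0.414 = sqrt(2) - 1, well short of a full doubling
```

This gap between spend growth and revenue growth is exactly what pushes the optimizer to spread budget across channels rather than pour everything into the single best one.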
31.10 Putting It All Together: A Marketing Analytics Workflow
Here is a practical end-to-end workflow for analyzing a marketing period:
Step 1: Pull your data. Gather channel-level spend data, session-level data with UTM parameters, conversion events, and customer records.
Step 2: Calculate top-line metrics. Total spend, total revenue, blended ROAS, blended CAC, new customers acquired.
Step 3: Break down by channel. CAC, ROAS, conversion rate, and LTV:CAC ratio for each channel.
Step 4: Run funnel analysis. Identify the highest drop-off stage for your top channels.
Step 5: Analyze any active A/B tests. Check whether tests have reached statistical significance before reading results.
Step 6: Cohort analysis. Are customers acquired this period retaining at the same rate as previous cohorts?
Step 7: Budget reallocation recommendation. Based on efficiency ratios, which channels should get more or less budget next period?
The marketing analytics Python scripts in this chapter's code directory implement this workflow end to end. ab_test_analyzer.py handles step 5, and funnel_analyzer.py handles step 4.
Summary
Marketing analytics is not magic. It is disciplined measurement combined with the curiosity to ask why the numbers look the way they do.
The tools you have learned in this chapter — A/B testing with statistical rigor, funnel analysis, channel-level CAC, UTM attribution, and budget optimization — represent the core of what a modern marketing analytics function does.
The businesses that do these things consistently are not smarter than their competitors. They are simply more deliberate about turning data into decisions. And now you have the tools to help them do that.
In Chapter 32, we shift from acquiring customers to keeping them happy by exploring the supply chain systems that actually deliver on marketing's promises.
Chapter 31 Quick Reference
| Metric | Formula | Benchmark |
|---|---|---|
| CAC | Total Spend / New Customers | Varies by industry |
| LTV | AOV × Frequency × Lifespan | 3–5× CAC target |
| ROAS | Revenue / Ad Spend | > 1 / Gross Margin % |
| CTR | Clicks / Impressions | Channel-dependent |
| CPC | Spend / Clicks | Set by auction |
| Conversion Rate | Conversions / Visitors | 1–5% typical |
| LTV:CAC Ratio | LTV / CAC | > 3:1 healthy |
| CAC Payback | CAC / Monthly Margin | < 12 mo (SaaS) |