# Chapter 31 Exercises: Marketing Analytics and Campaign Analysis

These exercises progress from foundational metric calculation to multi-channel optimization modeling. Work through them in order — each tier builds on techniques from the one before.
## Tier 1: Foundational Metrics (Exercises 1–3)

These exercises test your ability to calculate and interpret core marketing metrics.
### Exercise 1: Unit Economics Calculator
Scenario: You have the following data for a SaaS company's Q2 marketing results:
- Total marketing spend: $84,000
- Total sales team cost: $31,500
- New customers acquired: 180
- Average monthly subscription value: $340
- Average customer lifespan: 18 months
- Gross margin: 72%
Tasks:
a. Calculate the combined CAC (marketing + sales).
b. Calculate the LTV using the formula: Average Monthly Revenue × Gross Margin % × Avg Lifespan (months).
c. Calculate the LTV:CAC ratio. Based on the benchmarks in this chapter, how would you characterize the health of this business's acquisition economics?
d. Calculate the CAC payback period in months: CAC / (Monthly Revenue × Gross Margin %).
e. Write a function calculate_unit_economics(spend, new_customers, monthly_revenue, gross_margin, avg_lifespan_months) that accepts all these inputs and returns a formatted dictionary with all four metrics.
Expected output (approximate):

```python
{
    'cac': 641.67,
    'ltv': 4406.40,
    'ltv_cac_ratio': 6.87,
    'cac_payback_months': 2.62
}
```
### Exercise 2: CTR and Conversion Rate Analysis
Scenario: A retail company ran three ad campaigns in September. Here is the raw data:
| Campaign | Impressions | Clicks | Conversions | Ad Spend |
|---|---|---|---|---|
| Search | 45,000 | 1,890 | 142 | $2,835 |
| Display | 280,000 | 2,240 | 47 | $1,120 |
| Email | 18,500 | 1,110 | 198 | $370 |
Tasks:
a. Create a pandas DataFrame from this data.
b. Calculate CTR (Clicks / Impressions) for each campaign.
c. Calculate CPC (Spend / Clicks) for each campaign.
d. Calculate conversion rate (Conversions / Clicks) for each campaign.
e. Calculate cost per conversion (Spend / Conversions) for each campaign.
f. If the average order value is $68.00, calculate ROAS for each campaign.
g. Which campaign has the best ROAS? Which has the best conversion rate? Are they the same campaign? Why might they differ?
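A sketch of tasks a–f in pandas. The first two campaign names come from the table; the third row's label is missing in the source, so it is assumed to be an email campaign here:

```python
import pandas as pd

df = pd.DataFrame({
    "campaign": ["Search", "Display", "Email"],
    "impressions": [45_000, 280_000, 18_500],
    "clicks": [1_890, 2_240, 1_110],
    "conversions": [142, 47, 198],
    "spend": [2_835, 1_120, 370],
})
aov = 68.00  # average order value, given in task f

df["ctr"] = df["clicks"] / df["impressions"]            # task b
df["cpc"] = df["spend"] / df["clicks"]                  # task c
df["conv_rate"] = df["conversions"] / df["clicks"]      # task d
df["cost_per_conv"] = df["spend"] / df["conversions"]   # task e
df["roas"] = df["conversions"] * aov / df["spend"]      # task f
```

Sorting by `roas` and `conv_rate` separately answers task g; a campaign can lead on one metric but not the other because ROAS also depends on spend, not just click quality.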
### Exercise 3: Funnel Drop-Off Analysis
Scenario: An online course platform has the following funnel data for the past month:
- Homepage visit: 22,400
- Course catalog: 8,960
- Course detail: 5,376
- Started checkout: 1,344
- Enrolled: 537
Tasks:
a. Create a FunnelAnalyzer object using the class from funnel_analyzer.py.
b. Print the summary (call .summary()).
c. Which stage has the largest absolute drop-off?
d. Which transition has the worst step conversion rate?
e. If the course price is $249 and you could improve the "Started checkout → Enrolled" step from its current rate to 50%, how many additional enrollments would that produce per month? What is the additional monthly revenue?
f. Create the funnel visualization using .plot_funnel().
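The `FunnelAnalyzer` class from `funnel_analyzer.py` wraps this logic; the underlying arithmetic for tasks c–e can be sketched directly, which is also a useful way to sanity-check the class's output:

```python
stages = ["Homepage visit", "Course catalog", "Course detail",
          "Started checkout", "Enrolled"]
counts = [22_400, 8_960, 5_376, 1_344, 537]

# Step conversion rate and absolute drop-off between consecutive stages
for prev, cur, a, b in zip(stages, stages[1:], counts, counts[1:]):
    print(f"{prev} -> {cur}: {b / a:.1%} step rate, drop-off {a - b:,}")

# Task e: lift the final step from its current rate to 50%
current_rate = 537 / 1_344               # roughly 40%
extra_enrollments = int(1_344 * 0.50) - 537
extra_revenue = extra_enrollments * 249  # course price from the task
```

The loop prints each transition, making the largest absolute drop-off (task c) and the worst step rate (task d) easy to read off.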
## Tier 2: A/B Testing (Exercises 4–7)

### Exercise 4: Sample Size Planning
Your team wants to run an A/B test on a landing page. The current conversion rate is 4.2%.
Tasks:
a. Using calculate_required_sample_size() from ab_test_analyzer.py, calculate the required sample size per variant for each of these minimum detectable effects: 10%, 15%, 20%, 25%, 30%.
b. Put your results in a DataFrame and add a column called days_to_complete assuming 500 visitors/day per variant.
c. Which MDE would you choose if you needed results in under 3 weeks? What is the tradeoff you are accepting?
d. What would happen to your sample size requirements if you increased power from 0.80 to 0.90? Recalculate for an MDE of 20% at both power levels and explain the difference in plain business language.
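`calculate_required_sample_size()` comes from the chapter's `ab_test_analyzer.py`; as a cross-check, here is the standard two-proportion normal-approximation formula it is likely based on (a sketch, not the chapter's exact implementation):

```python
from scipy.stats import norm

def sample_size_per_variant(baseline, rel_mde, alpha=0.05, power=0.80):
    """Approximate sample size per variant for a two-sided two-proportion test."""
    p1 = baseline
    p2 = baseline * (1 + rel_mde)       # rate implied by the relative MDE
    z_alpha = norm.ppf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = norm.ppf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return int((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2) + 1

n_mde_20 = sample_size_per_variant(0.042, 0.20)  # task a, 20% MDE
```

Dividing each result by 500 visitors/day gives the `days_to_complete` column for task b.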
### Exercise 5: Analyzing A/B Test Results
The test from Exercise 4 ran for 30 days. Here are the results:
| Variant | Visitors | Conversions |
|---|---|---|
| Control | 14,800 | 622 |
| Treatment | 14,650 | 721 |
Tasks:
a. Calculate the conversion rate for each variant.
b. Calculate the absolute and relative lift.
c. Use analyze_conversion_ab_test() from ab_test_analyzer.py to run the full analysis. Use daily traffic of 493 visitors per variant and a revenue per conversion of $85.
d. Print the stakeholder report using print_stakeholder_report().
e. Generate and save the visualization using plot_ab_test_results().
f. Based on the p-value and confidence interval, what is your recommendation? How would you explain the result to a non-technical marketing manager in two sentences?
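Tasks a–b, plus a manual significance check you can compare against `analyze_conversion_ab_test()`. This is a pooled two-proportion z-test sketch, not the chapter's function:

```python
import numpy as np
from scipy.stats import norm

n_c, x_c = 14_800, 622   # control visitors, conversions
n_t, x_t = 14_650, 721   # treatment visitors, conversions
p_c, p_t = x_c / n_c, x_t / n_t

abs_lift = p_t - p_c
rel_lift = abs_lift / p_c

# Pooled two-proportion z-test under H0: p_c == p_t
p_pool = (x_c + x_t) / (n_c + n_t)
se = np.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_t))
z = abs_lift / se
p_value = 2 * (1 - norm.cdf(abs(z)))
```

The relative lift works out to roughly 17%, and the p-value is well under 0.05, which should agree with the full analyzer's verdict.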
### Exercise 6: The Peeking Problem (Simulation)
This exercise demonstrates why peeking at A/B test results too early inflates false positive rates.
Tasks:
a. Simulate an A/B test where both variants have the exact same 5% conversion rate (the null hypothesis is true — no real difference exists). Use numpy to generate Bernoulli random samples:
```python
import numpy as np
from scipy.stats import chi2_contingency

np.random.seed(42)
true_rate = 0.05
total_n = 10_000  # final target sample size
control_outcomes = np.random.binomial(1, true_rate, total_n)
treatment_outcomes = np.random.binomial(1, true_rate, total_n)
```
b. Write a loop that checks significance at every 100 visitors (from 100 to 10,000). Record whether the test appears significant (p < 0.05) at each check.
c. At how many checkpoints does the test appear to show a significant result, even though no real difference exists? What percentage of checkpoints produced a false positive?
d. Compare this to the expected false positive rate if you had only read the result once at the end (5%).
e. Write a two-paragraph explanation of this result that you could put in a training document for your marketing team.
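The checking loop for task b might look like the following sketch (note that `chi2_contingency` applies the Yates continuity correction by default, so early checkpoints with few conversions are handled conservatively):

```python
import numpy as np
from scipy.stats import chi2_contingency

np.random.seed(42)
true_rate, total_n = 0.05, 10_000
control = np.random.binomial(1, true_rate, total_n)
treatment = np.random.binomial(1, true_rate, total_n)

false_positives = 0
checkpoints = range(100, total_n + 1, 100)
for n in checkpoints:
    # 2x2 table of (conversions, non-conversions) for each variant so far
    table = [[control[:n].sum(), n - control[:n].sum()],
             [treatment[:n].sum(), n - treatment[:n].sum()]]
    _, p, _, _ = chi2_contingency(table)
    if p < 0.05:
        false_positives += 1

false_positive_rate = false_positives / len(checkpoints)
```

Because both variants share the same true rate, every "significant" checkpoint is a false positive; counting them answers task c.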
### Exercise 7: Multiple Testing Correction
Your marketing team ran the following 10 A/B tests simultaneously last quarter. The p-values are:
```python
test_results = [
    {"test": "Homepage headline", "p_value": 0.031},
    {"test": "CTA button color", "p_value": 0.048},
    {"test": "Email subject line A", "p_value": 0.012},
    {"test": "Email subject line B", "p_value": 0.089},
    {"test": "Checkout flow step 2", "p_value": 0.003},
    {"test": "Product image carousel", "p_value": 0.227},
    {"test": "Pricing page layout", "p_value": 0.041},
    {"test": "Blog CTA placement", "p_value": 0.156},
    {"test": "Popup timing", "p_value": 0.038},
    {"test": "Footer navigation", "p_value": 0.299},
]
```
Tasks:
a. Without any correction, how many tests appear significant at p < 0.05?
b. Apply the Bonferroni correction using bonferroni_correction() from ab_test_analyzer.py. How many tests remain significant?
c. Calculate the expected number of false positives without correction (hint: n_tests × alpha).
d. Which tests would you confidently launch? Which require more data? Write a brief recommendation for each.
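`bonferroni_correction()` in `ab_test_analyzer.py` packages this logic; the math for tasks a–c is simple enough to inline:

```python
p_values = [0.031, 0.048, 0.012, 0.089, 0.003,
            0.227, 0.041, 0.156, 0.038, 0.299]
alpha = 0.05

uncorrected = sum(p < alpha for p in p_values)             # task a: 6 tests
bonferroni_alpha = alpha / len(p_values)                   # 0.005
corrected = sum(p < bonferroni_alpha for p in p_values)    # task b: 1 test
expected_false_positives = len(p_values) * alpha           # task c: 0.5
```

Only "Checkout flow step 2" (p = 0.003) survives the correction, which is the starting point for the task d recommendations.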
## Tier 3: Funnel and Channel Analysis (Exercises 8–11)

### Exercise 8: Segmented Funnel Analysis
You have funnel data for a B2B software company, broken down by traffic source:
```python
source_funnels = {
    "Organic Search": {
        "stages": ["Visit", "Feature Page", "Pricing View", "Trial Start", "Paid Convert"],
        "counts": [12000, 4800, 2160, 648, 194],
    },
    "Paid Search": {
        "stages": ["Visit", "Feature Page", "Pricing View", "Trial Start", "Paid Convert"],
        "counts": [6400, 3200, 1920, 768, 307],
    },
    "LinkedIn": {
        "stages": ["Visit", "Feature Page", "Pricing View", "Trial Start", "Paid Convert"],
        "counts": [3200, 1024, 409, 98, 22],
    },
    "Referral": {
        "stages": ["Visit", "Feature Page", "Pricing View", "Trial Start", "Paid Convert"],
        "counts": [1800, 1116, 760, 380, 190],
    },
}
```
Tasks:
a. Create a FunnelAnalyzer for each traffic source and call .summary() on each.
b. Which source has the highest overall conversion rate (Visit → Paid Convert)?
c. Which source has the worst "Trial Start → Paid Convert" step? What might explain this?
d. Use compare_segment_funnels() to create a heatmap comparing step conversion rates across all sources.
e. If paid search costs $0.95 per visit and organic costs effectively $0.08 per visit (from content investment), which source has a lower cost per conversion? Calculate cost per paid conversion for both.
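A sketch of the arithmetic behind tasks b and e, independent of the chapter's `FunnelAnalyzer` and `compare_segment_funnels()` helpers:

```python
source_counts = {
    "Organic Search": [12_000, 4_800, 2_160, 648, 194],
    "Paid Search": [6_400, 3_200, 1_920, 768, 307],
    "LinkedIn": [3_200, 1_024, 409, 98, 22],
    "Referral": [1_800, 1_116, 760, 380, 190],
}

# Task b: overall Visit -> Paid Convert rate per source
overall = {s: c[-1] / c[0] for s, c in source_counts.items()}

# Task e: cost per paid conversion at the stated per-visit costs
cost_per_conv_paid = 6_400 * 0.95 / 307      # paid search
cost_per_conv_organic = 12_000 * 0.08 / 194  # organic (content investment)
```

Referral converts best end-to-end, and organic's cost per conversion is roughly a quarter of paid search's, despite paid search's stronger step rates.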
### Exercise 9: CAC by Channel Analysis
Build a channel-level CAC analysis from scratch using this data:
```python
import pandas as pd

spend_data = pd.DataFrame({
    "channel": ["paid_search", "paid_social", "email", "content", "events"],
    "q3_spend": [28400, 19600, 4200, 8800, 12000],
})
customers_data = pd.DataFrame({
    "channel": ["paid_search", "paid_social", "email", "content", "events",
                "paid_search", "paid_social", "email", "content", "events"],
    "new_customers": [82, 44, 31, 48, 19, 91, 38, 29, 52, 22],
    "avg_ltv": [3200, 2800, 5100, 4400, 7200, 3100, 2600, 5300, 4600, 7400],
    "quarter": ["Q2", "Q2", "Q2", "Q2", "Q2", "Q3", "Q3", "Q3", "Q3", "Q3"],
})
```
Tasks:
a. Calculate total Q3 CAC by channel.
b. Calculate LTV:CAC ratio for each channel using the Q3 averages.
c. Build an efficiency ratio: revenue_share / spend_share. Which channels are punching above their weight?
d. Using the Q2 and Q3 data together, which channels showed improving or worsening LTV:CAC trends quarter over quarter?
e. Write a budget reallocation recommendation: if you had to reallocate 20% of the total spend from the least efficient channel(s) to the most efficient one(s), what would you do?
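Tasks a–b reduce to a filter and a merge; a sketch using the Q3 slice of the data above:

```python
import pandas as pd

spend_data = pd.DataFrame({
    "channel": ["paid_search", "paid_social", "email", "content", "events"],
    "q3_spend": [28_400, 19_600, 4_200, 8_800, 12_000],
})
q3 = pd.DataFrame({  # Q3 rows of customers_data
    "channel": ["paid_search", "paid_social", "email", "content", "events"],
    "new_customers": [91, 38, 29, 52, 22],
    "avg_ltv": [3_100, 2_600, 5_300, 4_600, 7_400],
})

merged = spend_data.merge(q3, on="channel")
merged["cac"] = merged["q3_spend"] / merged["new_customers"]   # task a
merged["ltv_cac"] = merged["avg_ltv"] / merged["cac"]          # task b
```

Email's tiny spend gives it by far the best LTV:CAC ratio, which is the kind of signal task e's reallocation recommendation should weigh against email's limited scalability.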
### Exercise 10: UTM Parameter Analysis
You have session-level data from a website with UTM parameters embedded in landing page URLs. Create a synthetic dataset and analyze it:
```python
import pandas as pd
import numpy as np

np.random.seed(123)
n_sessions = 5000
sources = np.random.choice(
    ["google", "facebook", "linkedin", "email", "direct", "referral"],
    n_sessions, p=[0.32, 0.18, 0.12, 0.22, 0.10, 0.06]
)
mediums = {
    "google": "cpc", "facebook": "paid_social", "linkedin": "cpc",
    "email": "email", "direct": None, "referral": "referral"
}
campaigns = {
    "google": ["brand_search", "product_keywords", "competitor"],
    "facebook": ["retargeting", "lookalike", "interest"],
    "linkedin": ["lead_gen_form", "sponsored_content"],
    "email": ["weekly_newsletter", "promo_blast", "onboarding"],
    "direct": [None], "referral": [None],
}
sessions = []
for i, source in enumerate(sources):
    medium = mediums[source]
    campaign = np.random.choice(campaigns[source])
    base_conv = {"google": 0.052, "facebook": 0.028, "linkedin": 0.041,
                 "email": 0.071, "direct": 0.038, "referral": 0.085}[source]
    converted = np.random.binomial(1, base_conv)
    revenue = np.random.normal(85, 25) if converted else 0.0
    sessions.append({
        "session_id": f"S{i:05d}",
        "utm_source": source,
        "utm_medium": medium,
        "utm_campaign": campaign,
        "converted": converted,
        "revenue": max(0.0, revenue),
    })
sessions_df = pd.DataFrame(sessions)
```
Tasks:
a. Aggregate performance by utm_source: sessions, conversions, revenue, conversion rate, revenue per session.
b. Find the top 3 campaigns by revenue across all sources.
c. For email campaigns, which individual campaign has the highest conversion rate?
d. Create a bar chart comparing conversion rates across sources.
e. Calculate the share of revenue vs. share of sessions for each source to identify efficiency gaps.
### Exercise 11: Cohort Campaign Analysis
Use the build_campaign_cohorts() function from the chapter to analyze customer retention by acquisition campaign.
Create synthetic data:
```python
import pandas as pd
import numpy as np
from datetime import date, timedelta

np.random.seed(42)
n_customers = 800
campaigns = np.random.choice(
    ["spring_sale", "summer_promo", "back_to_school", "holiday_preview"],
    n_customers, p=[0.25, 0.25, 0.25, 0.25]
)
acq_dates = {
    "spring_sale": date(2024, 3, 1),
    "summer_promo": date(2024, 6, 1),
    "back_to_school": date(2024, 8, 15),
    "holiday_preview": date(2024, 10, 1),
}
# Retention probability varies by campaign quality
retention_rates = {
    "spring_sale": 0.45, "summer_promo": 0.35,
    "back_to_school": 0.55, "holiday_preview": 0.25,
}
orders = []
for cid, campaign in zip(range(n_customers), campaigns):
    acq = acq_dates[campaign]
    for month_offset in range(6):
        order_date = acq + timedelta(days=30 * month_offset)
        if month_offset == 0 or np.random.random() < retention_rates[campaign]:
            orders.append({
                "customer_id": f"C{cid:04d}",
                "order_date": order_date,
                "revenue": np.random.normal(92, 28),
                "acquisition_campaign": campaign,
            })
orders_df = pd.DataFrame(orders)
```
Tasks:
a. Use build_campaign_cohorts() to create a 6-month cohort revenue table.
b. Plot the cohort heatmap using plot_cohort_heatmap().
c. Which campaign produced the highest Month 1 revenue per customer? Which produced the highest Month 5 revenue per customer? Are they the same?
d. Calculate the 6-month cumulative revenue per customer by campaign. Which campaign produced the best long-term value?
e. Based on this analysis, if you had to run just one of these campaign types in Q1 next year, which would you choose and why?
## Tier 4: Advanced Analysis (Exercises 12–15)

### Exercise 12: Budget Optimization Model
Using the optimize_budget_allocation() function from the chapter:
Setup:
```python
# Estimated diminishing-returns parameters (alpha, beta) per channel
# Revenue = alpha * spend^beta
channel_params = {
    "paid_search": (850, 0.62),
    "paid_social": (520, 0.58),
    "email": (2100, 0.41),
    "content_seo": (680, 0.75),
    "events": (1200, 0.52),
}
total_budget = 50_000
```
Tasks:
a. Run the optimizer and display the recommended budget allocation per channel.
b. What percentage of budget does the optimizer recommend for each channel?
c. Compare the optimized allocation against a naive equal split. What is the projected revenue difference?
d. Run the optimizer again with total_budget = 80_000. Does the allocation change proportionally, or do some channels gain/lose share? Why?
e. What does the beta parameter represent in the model? Why is email likely to have a low beta while content_seo has a higher one?
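`optimize_budget_allocation()` is the chapter's function; under the stated revenue model it can be sketched as a constrained optimization with SciPy (the small lower bound keeps the power-law gradient defined at zero spend):

```python
import numpy as np
from scipy.optimize import minimize

channel_params = {
    "paid_search": (850, 0.62), "paid_social": (520, 0.58),
    "email": (2_100, 0.41), "content_seo": (680, 0.75), "events": (1_200, 0.52),
}
total_budget = 50_000
names = list(channel_params)
alphas = np.array([channel_params[c][0] for c in names], dtype=float)
betas = np.array([channel_params[c][1] for c in names], dtype=float)

def neg_revenue(spend):
    # Minimize negative revenue = maximize sum of alpha * spend^beta
    return -np.sum(alphas * np.power(spend, betas))

x0 = np.full(len(names), total_budget / len(names))  # start from an equal split
result = minimize(
    neg_revenue, x0, method="SLSQP",
    bounds=[(1.0, total_budget)] * len(names),
    constraints=[{"type": "eq", "fun": lambda s: s.sum() - total_budget}],
)
allocation = dict(zip(names, result.x))
```

Comparing `-neg_revenue(result.x)` against `-neg_revenue(x0)` gives the task c revenue difference versus the naive equal split.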
### Exercise 13: Marketing Attribution Comparison
Scenario: You have click-path data showing all touchpoints before a conversion. Build a comparison of first-touch, last-touch, and linear attribution.
```python
import pandas as pd

# Each row is a touchpoint in a conversion path
touchpoints = pd.DataFrame([
    {"conversion_id": "CV001", "position": 1, "channel": "paid_search", "revenue": 0},
    {"conversion_id": "CV001", "position": 2, "channel": "email", "revenue": 0},
    {"conversion_id": "CV001", "position": 3, "channel": "direct", "revenue": 185.00},
    {"conversion_id": "CV002", "position": 1, "channel": "organic_search", "revenue": 0},
    {"conversion_id": "CV002", "position": 2, "channel": "direct", "revenue": 92.50},
    {"conversion_id": "CV003", "position": 1, "channel": "linkedin", "revenue": 0},
    {"conversion_id": "CV003", "position": 2, "channel": "email", "revenue": 0},
    {"conversion_id": "CV003", "position": 3, "channel": "email", "revenue": 0},
    {"conversion_id": "CV003", "position": 4, "channel": "direct", "revenue": 340.00},
    # Add at least 10 more rows with varied paths
])
```
Extend the dataset to at least 20 conversion paths, then:
a. Implement first-touch attribution: assign full revenue credit to position 1 per conversion.
b. Implement last-touch attribution: assign full revenue credit to the last position per conversion.
c. Implement linear attribution: distribute revenue equally across all touchpoints per conversion.
d. Build a comparison table showing attributed revenue by channel for all three models.
e. Which channels benefit most from first-touch vs. last-touch? What does this tell you about their role in the customer journey?
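A sketch of tasks a–d on just the first two starter paths (extend `touchpoints` with your own rows before drawing conclusions). Revenue is recorded on the final touch, so each conversion's total is the group sum:

```python
import pandas as pd

touchpoints = pd.DataFrame([
    {"conversion_id": "CV001", "position": 1, "channel": "paid_search", "revenue": 0},
    {"conversion_id": "CV001", "position": 2, "channel": "email", "revenue": 0},
    {"conversion_id": "CV001", "position": 3, "channel": "direct", "revenue": 185.00},
    {"conversion_id": "CV002", "position": 1, "channel": "organic_search", "revenue": 0},
    {"conversion_id": "CV002", "position": 2, "channel": "direct", "revenue": 92.50},
])
totals = touchpoints.groupby("conversion_id")["revenue"].sum()

def attribute(df, model):
    out = df.copy()
    out["credit"] = 0.0
    for cid, grp in out.groupby("conversion_id"):
        rev = totals[cid]
        if model == "first":    # full credit to position 1
            out.loc[grp.index[grp["position"].values.argmin()], "credit"] = rev
        elif model == "last":   # full credit to the final position
            out.loc[grp.index[grp["position"].values.argmax()], "credit"] = rev
        else:                   # linear: equal split across all touchpoints
            out.loc[grp.index, "credit"] = rev / len(grp)
    return out.groupby("channel")["credit"].sum()

comparison = pd.DataFrame(
    {m: attribute(touchpoints, m) for m in ["first", "last", "linear"]}
)
```

In this tiny sample, `direct` captures all revenue under last-touch while the prospecting channels only show up under first-touch and linear, which previews the task e discussion.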
### Exercise 14: ROAS-Based Budget Allocation
You manage digital advertising for a company with a 45% gross margin. Campaigns that cannot meet the break-even ROAS should have budget cut.
```python
import pandas as pd

campaign_data = pd.DataFrame({
    "campaign": ["Brand Search", "Non-Brand Search", "Retargeting", "Prospecting Display",
                 "Social Retargeting", "Social Prospecting", "YouTube"],
    "spend_last_30d": [4200, 12800, 3100, 8400, 5600, 14200, 6800],
    "revenue_last_30d": [38400, 47600, 19800, 22100, 28900, 31400, 18200],
    "impression_share": [0.88, 0.42, 0.91, 0.34, 0.67, 0.28, 0.51],
})
```
Tasks:
a. Calculate ROAS for each campaign.
b. Calculate the break-even ROAS given a 45% gross margin.
c. Identify campaigns above and below break-even ROAS.
d. For campaigns below break-even ROAS, what ROAS would they need to achieve to justify their current spend? What does that imply about required improvements?
e. Campaigns with low impression share have room to scale (spend more). Campaigns with high impression share are near capacity. Which campaigns are both above break-even AND have room to scale? This is where you should increase budget.
f. Write a brief allocation recommendation memo (3-5 sentences) that a marketing manager without Python knowledge could read and act on.
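Tasks a–c are a couple of vectorized operations. Break-even ROAS is the reciprocal of gross margin: below 1 / 0.45 ≈ 2.22, ad spend exceeds the gross profit it generates:

```python
import pandas as pd

campaign_data = pd.DataFrame({
    "campaign": ["Brand Search", "Non-Brand Search", "Retargeting",
                 "Prospecting Display", "Social Retargeting",
                 "Social Prospecting", "YouTube"],
    "spend_last_30d": [4_200, 12_800, 3_100, 8_400, 5_600, 14_200, 6_800],
    "revenue_last_30d": [38_400, 47_600, 19_800, 22_100, 28_900, 31_400, 18_200],
})

gross_margin = 0.45
campaign_data["roas"] = (
    campaign_data["revenue_last_30d"] / campaign_data["spend_last_30d"]
)
break_even_roas = 1 / gross_margin
below = campaign_data.loc[campaign_data["roas"] < break_even_roas, "campaign"]
```

Only Social Prospecting (ROAS ≈ 2.21) sits below break-even; combining `roas` with `impression_share` then answers task e.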
### Exercise 15: End-to-End Campaign Performance Dashboard
Build a complete marketing performance summary from the following data sources:
```python
# Monthly channel spend and results
monthly_data = {
    "month": ["2024-07", "2024-08", "2024-09", "2024-10"],
    "paid_search_spend": [8200, 9100, 8800, 10200],
    "paid_search_revenue": [41000, 43900, 44800, 52100],
    "email_spend": [800, 800, 850, 900],
    "email_revenue": [18400, 19200, 17800, 22600],
    "social_spend": [5600, 6200, 5900, 7400],
    "social_revenue": [16800, 17400, 15700, 20200],
    "new_customers": [142, 158, 151, 184],
}
```
Tasks:
a. Calculate blended ROAS for each month across all channels combined.
b. Calculate ROAS per channel per month. Which channel showed the most consistent improvement over the 4-month period?
c. Calculate blended CAC for each month.
d. Create a 4-panel dashboard:
   - Panel 1: Blended ROAS trend (line chart)
   - Panel 2: ROAS by channel (grouped bar chart)
   - Panel 3: Monthly spend breakdown (stacked bar chart)
   - Panel 4: CAC trend (line chart with goal line at $250)
e. Write a 150-word executive summary of the Q3–Q4 period that Sandra Chen could read in 2 minutes before a board meeting.
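The blended metrics for tasks a and c can be sketched by summing the `_spend` and `_revenue` columns; these derived columns then feed the dashboard panels in task d:

```python
import pandas as pd

monthly = pd.DataFrame({
    "month": ["2024-07", "2024-08", "2024-09", "2024-10"],
    "paid_search_spend": [8_200, 9_100, 8_800, 10_200],
    "paid_search_revenue": [41_000, 43_900, 44_800, 52_100],
    "email_spend": [800, 800, 850, 900],
    "email_revenue": [18_400, 19_200, 17_800, 22_600],
    "social_spend": [5_600, 6_200, 5_900, 7_400],
    "social_revenue": [16_800, 17_400, 15_700, 20_200],
    "new_customers": [142, 158, 151, 184],
})

spend_cols = [c for c in monthly.columns if c.endswith("_spend")]
revenue_cols = [c for c in monthly.columns if c.endswith("_revenue")]
monthly["total_spend"] = monthly[spend_cols].sum(axis=1)
monthly["total_revenue"] = monthly[revenue_cols].sum(axis=1)
monthly["blended_roas"] = monthly["total_revenue"] / monthly["total_spend"]  # task a
monthly["blended_cac"] = monthly["total_spend"] / monthly["new_customers"]   # task c
```

Per-channel ROAS for task b is each `*_revenue` column divided by its matching `*_spend` column.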
## Tier 5: Challenge Problems (Exercises 16–18)

### Exercise 16: Statistical Power Simulation
Build a Monte Carlo simulation that demonstrates the relationship between sample size, effect size, and statistical power.
a. For effect sizes of 5%, 10%, 15%, 20%, and 25% relative lift at a 4% baseline conversion rate, simulate 1,000 A/B tests at each of these sample sizes per variant: 500, 1,000, 2,500, 5,000, 10,000.
b. For each combination, calculate the simulated power (fraction of simulations that correctly detected the effect at p < 0.05).
c. Plot a heatmap of simulated power vs. effect size vs. sample size.
d. Compare your simulated power values against the analytical formula from calculate_required_sample_size(). How close are they?
### Exercise 17: Customer LTV Segmentation for Marketing
Using pandas and matplotlib, build a customer segmentation system based on LTV and engagement:
a. Create a synthetic customer dataset with 500 customers, each having: total_revenue (Pareto distributed), purchase_frequency, months_active, email_engagement_score, and acquisition_channel.
b. Calculate LTV for each customer.
c. Segment customers into quartiles based on LTV. Label them: Champions (Q4), Loyalists (Q3), Potentials (Q2), At-Risk (Q1).
d. Calculate average email engagement by LTV quartile. Is there a correlation?
e. For each LTV quartile, calculate what a 10% improvement in retention rate would be worth in annualized revenue. Show which segments have the highest dollar-value improvement potential.
### Exercise 18: Multi-Metric A/B Test Dashboard
Build a complete A/B test monitoring dashboard that tracks multiple metrics simultaneously:
a. Simulate a 4-week A/B test where the treatment improves conversion rate by 12% but has no effect on average order value. Generate daily data for each variant.
b. Calculate and plot a "sequential p-value" chart: the p-value for the conversion rate test calculated at each day of the test (1 through 28). Add a horizontal line at p = 0.05.
c. Add a confidence interval around the daily lift estimate. Show how the confidence interval narrows as sample size grows.
d. Add a "required sample size remaining" chart showing how far the test is from reaching its predetermined sample size.
e. Discuss: on which day did the test first cross p < 0.05? If you had stopped then (peeked), would your lift estimate have been accurate? Compare the "early read" lift to the final 28-day lift.