Chapter 27 Exercises: Customer Analytics and Segmentation

These exercises progress from recall of chapter concepts (Tier 1) through open-ended analysis and design challenges (Tier 5). Work through each tier before moving to the next.

Tier 1: Recall and Fundamentals

These exercises confirm that you understand the core concepts. They require minimal coding.

Exercise 1.1 — RFM Definitions

Without looking at the chapter, write a one-sentence definition of each of the three RFM dimensions. Then explain why each one, on its own, gives an incomplete picture of customer value.

Exercise 1.2 — Segment Identification

For each customer profile below, identify the most likely RFM segment and explain your reasoning:

Customer	Last Purchase	Orders (12 mo)	Annual Spend
A	5 days ago	24	$85,000
B	210 days ago	18	$72,000
C	8 days ago	1	$340
D	190 days ago	2	$1,200
E	45 days ago	8	$14,500

Exercise 1.3 — Cohort Analysis Interpretation

The cohort retention table below shows retention rates for three acquisition cohorts:

Cohort	M+0	M+1	M+2	M+3	M+6	M+12
Jan-2022	100%	42%	31%	28%	19%	14%
Jul-2022	100%	45%	33%	30%	21%	15%
Jan-2023	100%	53%	41%	37%	—	—

Answer these questions:
1. Is month-1 retention improving or declining over time?
2. What does the blank (shown as a dash) in the Jan-2023 M+12 column mean?
3. Based on the trend, what would you estimate M+6 retention to be for the Jan-2023 cohort?

Exercise 1.4 — CLV Calculation

A subscription software company has the following metrics:
- Average monthly revenue per customer: $299
- Average customer lifespan: 26 months
- Gross margin: 72%

Calculate: (a) simple CLV based on revenue, (b) margin-adjusted CLV.

If their customer acquisition cost (CAC) is $1,800, are their unit economics healthy? Explain.
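The two CLV variants and the comparison against CAC can be sketched as below. This is a minimal illustration, not the chapter's code: the function names are hypothetical, the figures are deliberately different from the exercise's numbers, and the formulas assume the simple "monthly revenue times lifespan" model with no discounting.

```python
def simple_clv(monthly_revenue: float, lifespan_months: float) -> float:
    """Revenue-based CLV: average monthly revenue times expected lifespan."""
    return monthly_revenue * lifespan_months


def margin_adjusted_clv(monthly_revenue: float, lifespan_months: float,
                        gross_margin: float) -> float:
    """Revenue CLV scaled by gross margin (a fraction such as 0.5)."""
    return simple_clv(monthly_revenue, lifespan_months) * gross_margin


# Illustrative figures only (not the exercise's numbers).
revenue_clv = simple_clv(100.0, 20)                # 100 * 20
margin_clv = margin_adjusted_clv(100.0, 20, 0.5)   # half of the revenue CLV
clv_to_cac = margin_clv / 400.0                    # assumed CAC of $400

print(f"Revenue CLV: ${revenue_clv:,.0f}")
print(f"Margin-adjusted CLV: ${margin_clv:,.0f}")
print(f"CLV:CAC ratio: {clv_to_cac:.1f}")
```

A commonly cited rule of thumb treats a CLV:CAC ratio of about 3:1 or higher as healthy, though the right threshold depends on the business and on whether CLV is measured on revenue or margin.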

Exercise 1.5 — K-Means Concepts

Answer these conceptual questions about K-means clustering:
1. Why must you scale features before running K-means?
2. What does "inertia" measure in the context of the elbow method?
3. K-means requires you to specify K in advance. Name two approaches for deciding what K to use.
4. You run K-means on customer data and get four clusters. Cluster 3 has very high monetary value and very low recency. What business label would you assign to this cluster, and what action would you take?

Tier 2: Direct Application

Apply the chapter's code patterns to new data scenarios.

Exercise 2.1 — RFM Scoring from Scratch

Create a synthetic transaction dataset with these properties:
- 300 unique customers
- 2,500 total transactions
- Transaction dates spanning 2022-01-01 to 2023-12-31
- Order values drawn from a log-normal distribution (mean $500, wide spread)

Then:
1. Compute raw R, F, M metrics for each customer
2. Apply quintile scoring (1–5)
3. Print the distribution of R, F, and M scores
4. Verify that each score has approximately equal numbers of customers (quintiles split the data into five roughly equal groups, although heavy ties, for example in frequency counts, can make the groups uneven)
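Quintile scoring itself can be sketched with pd.qcut. The block below is one possible approach, not the chapter's implementation: the raw metrics are random stand-ins for the output of step 1, and quintile_score is a hypothetical helper. It ranks values first (method="first") so that ties, which are common in frequency counts, do not produce duplicate bin edges.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n_customers = 300

# Hypothetical per-customer raw metrics (stand-ins for the output of step 1).
rfm = pd.DataFrame({
    "customer_id": np.arange(n_customers),
    "recency_days": rng.integers(1, 730, size=n_customers),
    "frequency": rng.integers(1, 30, size=n_customers),
    "monetary": rng.lognormal(mean=6.0, sigma=1.0, size=n_customers).round(2),
})


def quintile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Score 1-5 by quintile; ranking first breaks ties so bin edges are unique."""
    ranked = values.rank(method="first", ascending=higher_is_better)
    return pd.qcut(ranked, q=5, labels=[1, 2, 3, 4, 5]).astype(int)


# Fewer days since the last purchase is better, so recency is scored in reverse.
rfm["r_score"] = quintile_score(rfm["recency_days"], higher_is_better=False)
rfm["f_score"] = quintile_score(rfm["frequency"])
rfm["m_score"] = quintile_score(rfm["monetary"])

for col in ["r_score", "f_score", "m_score"]:
    print(rfm[col].value_counts().sort_index())
```

Calling pd.qcut directly on raw counts with many ties can raise a "Bin edges must be unique" error or yield uneven groups; ranking first forces equal-sized groups at the cost of splitting tied customers somewhat arbitrarily.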

Exercise 2.2 — At-Risk Customer Report

Using the data from Exercise 2.1, or the data generated by rfm_analysis.py:

Filter to only "At Risk" and "Cannot Lose Them" segments
Sort by monetary value descending
Add a column called days_until_one_year_inactive that calculates how many days until the customer will have been inactive for 365 days
Export this as urgent_outreach_list.csv
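The days_until_one_year_inactive column comes down to a date subtraction against the analysis date. A minimal sketch, assuming a hypothetical snapshot date of 2024-01-01 and made-up customers; the column names last_purchase and monetary are assumptions and may differ from those produced by rfm_analysis.py.

```python
import pandas as pd

analysis_date = pd.Timestamp("2024-01-01")  # assumed snapshot date

# Made-up rows standing in for the filtered "At Risk" / "Cannot Lose Them" customers.
at_risk = pd.DataFrame({
    "customer_id": ["C001", "C002", "C003"],
    "segment": ["At Risk", "Cannot Lose Them", "At Risk"],
    "last_purchase": pd.to_datetime(["2023-06-15", "2023-03-01", "2022-11-20"]),
    "monetary": [12000.0, 45000.0, 8000.0],
})

days_inactive = (analysis_date - at_risk["last_purchase"]).dt.days

# Customers already past 365 days get 0 rather than a negative number.
at_risk["days_until_one_year_inactive"] = (365 - days_inactive).clip(lower=0)

at_risk = at_risk.sort_values("monetary", ascending=False)
print(at_risk[["customer_id", "monetary", "days_until_one_year_inactive"]])
```

The export step would then be at_risk.to_csv("urgent_outreach_list.csv", index=False).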

Exercise 2.3 — Cohort Table Construction

Given a DataFrame transactions with columns customer_id, purchase_date, and amount, write a function called build_cohort_table(transactions) that returns a retention rate matrix (as in Section 27.4.1) without using any code from the chapter directly. Write it from memory, then compare to the chapter code.

Exercise 2.4 — Health Score Customization

The health score formula in Section 27.7 weights recency, frequency, monetary, and trend equally (25 points each).

Redesign the health score for a subscription business where:
- Recency is less important (customers pay monthly automatically)
- Trend (whether they are expanding or contracting their subscription) is most important
- Product breadth (number of features actively used) should be a factor

Write the revised calculate_health_score() function with your new weightings. Justify each weighting in a comment.

Exercise 2.5 — Elbow Method Practice

Generate a dataset of 500 customers with three features: r_score, f_score, and m_score (all 1–5 integers). Run the K-means elbow method for K=2 through K=10. Plot the elbow curve. Based on the plot, what K would you choose? Does the elbow method give a clear answer for this particular dataset? Why or why not?
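The inertia values behind an elbow curve can be collected with a short loop over K. This sketch uses scikit-learn's KMeans and StandardScaler; drawing the scores independently and uniformly from 1 to 5 is an assumption about how the dataset is generated, and the plot itself is left out.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Assumption: r_score, f_score, m_score drawn independently and uniformly from 1-5.
scores = rng.integers(1, 6, size=(500, 3))

# Scale so that no single feature dominates the distance calculation.
X = StandardScaler().fit_transform(scores)

k_values = list(range(2, 11))
inertias = []
for k in k_values:
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    # inertia_ is the within-cluster sum of squared distances to the centroids.
    inertias.append(model.inertia_)

for k, inertia in zip(k_values, inertias):
    print(f"K={k}: inertia={inertia:.1f}")
```

Plotting k_values against inertias (for example with matplotlib's plt.plot(k_values, inertias, marker="o")) gives the elbow curve to interpret.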

Tier 3: Analysis and Interpretation

These exercises require you to think like a business analyst, not just run code.

Exercise 3.1 — Segment Action Planning

You have run RFM analysis on a retail company and found the following segment distribution:

Segment	Customers	% of Revenue
Champions	89	38%
Loyal Customers	213	27%
At Risk	156	18%
Potential Loyalists	298	9%
New Customers	412	5%
Lost	534	3%

Write a one-paragraph recommendation for where to focus marketing and sales effort, with a specific rationale for each segment you prioritize or deprioritize.
If you only had budget for two outreach campaigns, which two segments would you target and what would each campaign look like?
The "Lost" segment has 534 customers (the most of any segment) but only 3% of revenue. Should you invest in trying to win them back? Under what conditions would this be worth it?

Exercise 3.2 — Cohort Heatmap Analysis

Run cohort_analysis.py and examine the resulting heatmap. Then write a 200-word business memo (as if you were Priya, writing to Sandra) that:
1. Explains what a cohort retention chart shows (in plain English, without jargon)
2. States the key findings from the chart
3. Makes one specific recommendation based on those findings

Exercise 3.3 — Churn Signal Investigation

Using the churn signal framework from Section 27.6, write a query (using pandas) against a transaction dataset that identifies customers who meet ALL of the following criteria:
1. Were in the top 25% of spenders in the prior 12 months
2. Have not ordered in the last 45 days
3. Showed declining order frequency (fewer orders in the last 6 months than in the 6 months before that)

For each customer found, calculate a "days until one-year inactive" field. Export the results sorted by historical spend descending.
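One way to combine the three criteria is to build a per-customer summary and then apply a single boolean mask. The sketch below is an illustration under stated assumptions: the transactions are made up, the snapshot date of 2024-01-01 is hypothetical, "top 25%" is read as spend at or above the 75th percentile, and the column names are not taken from the chapter.

```python
import pandas as pd

analysis_date = pd.Timestamp("2024-01-01")  # assumed snapshot date

# Made-up transactions for four customers.
tx = pd.DataFrame({
    "customer_id": ["A", "A", "A", "A", "B", "B", "C", "C", "D"],
    "purchase_date": pd.to_datetime([
        "2023-02-10", "2023-04-05", "2023-06-20", "2023-09-15",
        "2023-03-01", "2023-12-20",
        "2023-05-01", "2023-08-01",
        "2023-11-30",
    ]),
    "amount": [5000.0, 4000.0, 3000.0, 2000.0, 1000.0, 2000.0, 500.0, 500.0, 200.0],
})

year_start = analysis_date - pd.DateOffset(months=12)
half_start = analysis_date - pd.DateOffset(months=6)

last_year = tx[(tx["purchase_date"] >= year_start) & (tx["purchase_date"] < analysis_date)]
is_recent_half = last_year["purchase_date"] >= half_start

summary = last_year.groupby("customer_id").agg(
    spend_12m=("amount", "sum"),
    last_purchase=("purchase_date", "max"),
)
# Order counts per half-year; customers with no orders in a half get 0.
summary["orders_recent_6m"] = last_year[is_recent_half].groupby("customer_id").size()
summary["orders_prior_6m"] = last_year[~is_recent_half].groupby("customer_id").size()
summary = summary.fillna({"orders_recent_6m": 0, "orders_prior_6m": 0})
summary["days_since_last"] = (analysis_date - summary["last_purchase"]).dt.days

signals = (
    (summary["spend_12m"] >= summary["spend_12m"].quantile(0.75))   # criterion 1
    & (summary["days_since_last"] > 45)                             # criterion 2
    & (summary["orders_recent_6m"] < summary["orders_prior_6m"])    # criterion 3
)

flagged = summary[signals].copy()
flagged["days_until_one_year_inactive"] = (365 - flagged["days_since_last"]).clip(lower=0)
flagged = flagged.sort_values("spend_12m", ascending=False)
print(flagged)
```

Customers with no orders at all in the prior 12 months never enter the summary here, which is consistent with the first criterion but worth stating explicitly when you report the results.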

Exercise 3.4 — Comparing RFM and K-Means Segments

Run both the rule-based RFM segmentation (from Section 27.3) and K-means clustering (K=4) on the same dataset. Create a cross-tabulation (use pd.crosstab()) showing how the K-means clusters align with (or differ from) the RFM segments.

Write a paragraph explaining: where do the two methods agree? Where do they disagree? What does that tell you about your customer base?
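The cross-tabulation step can be sketched on a small table of labels. The segment and cluster assignments below are invented for illustration, and the column names rfm_segment and kmeans_cluster are assumptions; in the exercise they would come from your own two segmentation runs.

```python
import pandas as pd

# Invented labels for eight customers (not output from the chapter's code).
labels = pd.DataFrame({
    "rfm_segment": ["Champions", "Champions", "Champions", "At Risk",
                    "At Risk", "Lost", "Lost", "Lost"],
    "kmeans_cluster": [0, 0, 1, 2, 2, 3, 3, 2],
})

# Raw counts: rows are RFM segments, columns are K-means clusters.
counts = pd.crosstab(labels["rfm_segment"], labels["kmeans_cluster"])

# Row shares: the fraction of each RFM segment that lands in each cluster.
row_share = pd.crosstab(labels["rfm_segment"], labels["kmeans_cluster"],
                        normalize="index")

print(counts)
print(row_share.round(2))
```

Rows concentrated in a single column indicate agreement between the two methods; rows spread across several columns are where the methods disagree.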

Exercise 3.5 — CLV Distribution Analysis

Using per-customer CLV calculations from Section 27.2.2:

Plot a histogram of projected CLV values
Calculate what percentage of customers account for 80% of total projected CLV
Define a "CLV tier" column: Top 10%, Top 11–30%, Bottom 70%
For each tier, calculate average recency, frequency, and monetary values
What does this tell you about the relationship between current behavioral signals and projected lifetime value?
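The concentration and tiering steps can be sketched on a handful of values. The projected_clv figures below are invented for illustration, and assigning tiers by rank position is one possible reading of "Top 10%, Top 11–30%, Bottom 70%".

```python
import numpy as np
import pandas as pd

# Invented projected CLV values for ten customers.
clv = pd.DataFrame({
    "customer_id": range(1, 11),
    "projected_clv": [5000, 2000, 1500, 600, 400, 200, 150, 80, 40, 30],
})

ranked = clv.sort_values("projected_clv", ascending=False).reset_index(drop=True)

# Cumulative share of total CLV, walking down from the most valuable customer.
cum_share = ranked["projected_clv"].cumsum() / ranked["projected_clv"].sum()

# Smallest number of top customers whose combined CLV reaches 80% of the total.
n_needed = int((cum_share < 0.80).sum()) + 1
pct_customers_for_80 = 100 * n_needed / len(ranked)

# Tier by rank position: first 10% of rows, next 20%, remaining 70%.
pct_rank = (ranked.index + 1) / len(ranked)
ranked["clv_tier"] = np.select(
    [pct_rank <= 0.10, pct_rank <= 0.30],
    ["Top 10%", "Top 11-30%"],
    default="Bottom 70%",
)

print(f"{pct_customers_for_80:.0f}% of customers account for 80% of projected CLV")
print(ranked.groupby("clv_tier")["projected_clv"].mean())
```

With recency, frequency, and monetary columns present, the same groupby("clv_tier") call extends to those columns for the per-tier averages.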

Tier 4: Extended Projects

These exercises require building something new, not just adapting chapter code.

Exercise 4.1 — Automated Monthly RFM Report

Build a Python script called monthly_rfm_report.py that:
1. Accepts a CSV filename as a command-line argument (python monthly_rfm_report.py transactions.csv)
2. Runs the complete RFM pipeline
3. Saves a PNG heatmap showing the RFM score distribution (R vs M, colored by segment)
4. Saves a CSV with the at-risk list, filtered to customers with monetary score >= 3
5. Prints a one-page executive summary to the console

The script should be fully self-contained and runnable without modification after the initial setup.

Exercise 4.2 — RFM Score Stability Analysis

Run RFM analysis on your transaction dataset, then simulate running it again "one month later" by shifting the analysis date forward 30 days.

Identify which customers changed segments between the two runs
Build a "migration matrix" showing how many customers moved from each segment to each other segment
Calculate the net movement: are more customers moving up (improving) or down (deteriorating)?
Visualize the migration with a Sankey-style flow chart (use matplotlib or the plotly library's Sankey diagram)
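A migration matrix is a cross-tabulation of each customer's segment in the first run against the segment in the second. The sketch below uses invented assignments, and the segment_rank ordering (worst to best) is a hypothetical choice that you would need to adapt to the chapter's full list of segment names.

```python
import pandas as pd

# Hypothetical ordering from worst to best, used only to judge direction of movement.
segment_rank = {"Lost": 0, "At Risk": 1, "Loyal Customers": 2, "Champions": 3}

# Invented segment assignments from two analysis runs 30 days apart.
runs = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 5, 6],
    "segment_before": ["Champions", "Champions", "Loyal Customers",
                       "At Risk", "At Risk", "Lost"],
    "segment_after": ["Champions", "Loyal Customers", "Champions",
                      "Lost", "At Risk", "Lost"],
})

# Rows: segment in the first run. Columns: segment one month later.
migration = pd.crosstab(runs["segment_before"], runs["segment_after"])

# Positive delta means the customer moved to a better-ranked segment.
delta = runs["segment_after"].map(segment_rank) - runs["segment_before"].map(segment_rank)
moved_up = int((delta > 0).sum())
moved_down = int((delta < 0).sum())
changed = runs[delta != 0]

print(migration)
print(f"Moved up: {moved_up}, moved down: {moved_down}, net: {moved_up - moved_down}")
```

The off-diagonal cells of the migration matrix are the flows that a Sankey diagram would draw.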

Exercise 4.3 — Industry-Specific RFM Customization

The standard RFM framework was designed for retail transaction data. Adapt it for one of the following contexts, justifying your choices:

Option A: B2B Software Subscriptions
- Replace frequency with "feature adoption score" (number of distinct product features used per month)
- Replace monetary with "contract value" (annual contract value)
- Add a fourth dimension: "expansion" (whether the account has grown, stayed flat, or shrunk)

Option B: Professional Services / Consulting
- Replace raw transaction counts with "project count" and "average project duration"
- Add a "referral score" (has this client referred other clients?)
- Define segment names that make sense in a services context

Write the scoring functions, segment assignment logic, and segment action recommendations for your chosen option.

Exercise 4.4 — Cohort Retention with Revenue

Standard cohort analysis counts customers. Build a parallel analysis that tracks average revenue per user (ARPU) by cohort, not just customer counts.

For each cohort and period:
1. Calculate total revenue from active customers
2. Divide by the original cohort size (not just active customers) to get revenue per original cohort member, then express each period as a percentage of the cohort's M+0 value to get "revenue retention"
3. Compare customer retention to revenue retention. In healthy businesses, revenue retention often exceeds customer retention because surviving customers tend to spend more over time
4. Plot both metrics on the same chart for comparison
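The parallel customer and revenue calculations can be sketched on a single made-up cohort. Everything here is illustrative: the transactions are invented, and expressing each period's revenue as a fraction of the cohort's M+0 revenue is an assumption about how to put revenue on the same percentage scale as customer retention.

```python
import pandas as pd

# Invented transactions for one January 2023 cohort of three customers.
tx = pd.DataFrame({
    "customer_id": ["a", "b", "c", "a", "b", "a"],
    "purchase_date": pd.to_datetime([
        "2023-01-05", "2023-01-20", "2023-01-25",
        "2023-02-10", "2023-02-11", "2023-03-15",
    ]),
    "amount": [100.0, 100.0, 100.0, 150.0, 60.0, 240.0],
})

tx["order_month"] = tx["purchase_date"].dt.to_period("M")
# Cohort = month of each customer's first purchase.
tx["cohort"] = tx.groupby("customer_id")["purchase_date"].transform("min").dt.to_period("M")
# Period = whole months elapsed since the cohort month (0 for M+0).
tx["period"] = ((tx["order_month"].dt.year - tx["cohort"].dt.year) * 12
                + (tx["order_month"].dt.month - tx["cohort"].dt.month))

grouped = tx.groupby(["cohort", "period"])
active = grouped["customer_id"].nunique().unstack("period")
revenue = grouped["amount"].sum().unstack("period")

cohort_size = active[0]  # every cohort member is active in M+0
customer_retention = active.div(cohort_size, axis=0)
revenue_per_original_customer = revenue.div(cohort_size, axis=0)
revenue_retention = revenue.div(revenue[0], axis=0)

print(customer_retention.round(2))
print(revenue_retention.round(2))
```

In this toy cohort the one surviving customer spends more each month, so revenue retention stays well above customer retention by M+2.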

Tier 5: Advanced and Open-Ended

These exercises have no single correct answer. They require judgment, creativity, and business thinking.

Exercise 5.1 — Designing a Customer Analytics Program

You have been asked by a regional bank to design a customer analytics program for their retail banking customers. The bank has:
- 85,000 retail checking/savings customers
- Transaction data going back 5 years
- Product holding data (which products each customer has)
- Customer service interaction records
- No existing customer segmentation

Design (in writing and pseudocode, not necessarily working code) a full customer analytics program that includes:
1. Data inventory and quality assessment plan
2. Segmentation approach (justify your choice of RFM, K-means, or a hybrid)
3. Key metrics to track and their refresh cadence
4. Recommended actions for each segment
5. How you would measure the ROI of the analytics program itself

Exercise 5.2 — Retention Curve Fitting

Cohort retention curves often follow roughly exponential decay. Using scipy.optimize.curve_fit or a manual least-squares approach:

Fit an exponential decay model to the retention data from a cohort: retention(t) = a * e^(-b*t)
Extract the parameters a and b for each cohort
Use the fitted parameters to predict what month-12 retention will be for your most recent cohort (which may not yet have 12 months of data)
Discuss the limitations of this prediction approach
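The manual least-squares route can be sketched by taking logarithms, which turns the decay model into a straight line that np.polyfit can fit. This is a log-linear fit rather than scipy.optimize.curve_fit, and the retention values are synthetic (generated from a = 0.5, b = 0.1), not chapter data.

```python
import numpy as np

months = np.array([1, 2, 3, 4, 5, 6], dtype=float)

# Synthetic retention generated from a = 0.5, b = 0.1 (illustration only).
retention = 0.5 * np.exp(-0.1 * months)

# retention = a * exp(-b * t)  becomes  log(retention) = log(a) - b * t,
# a straight line in t that ordinary least squares can fit.
slope, intercept = np.polyfit(months, np.log(retention), deg=1)
a_hat = np.exp(intercept)
b_hat = -slope

# Extrapolate the fitted curve to month 12.
predicted_m12 = a_hat * np.exp(-b_hat * 12)

print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
print(f"Predicted M+12 retention: {predicted_m12:.1%}")
```

Fitting in log space minimizes relative rather than absolute error, so on noisy real data it can give different parameters than curve_fit applied to the untransformed model.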

Exercise 5.3 — Customer Value Segmentation vs. Behavioral Segmentation

There is a philosophical debate in customer analytics between:
- Value-based segmentation: Group customers by how much they are worth (CLV tiers, revenue tiers)
- Behavioral segmentation: Group customers by what they do (RFM, K-means on behaviors)

Write a 400-word essay arguing for one approach over the other for a specific business context of your choice. Address: What decisions does each approach support? What does each approach miss? When would you use both?

Exercise 5.4 — Build a Customer Analytics Dashboard

Using matplotlib subplots (or plotly if you prefer interactive charts), build a single-page Customer Analytics Dashboard that fits on a standard widescreen display. The dashboard should show:

Segment distribution (pie or donut chart)
Revenue by segment (horizontal bar chart)
Cohort retention heatmap (or simplified version showing only M+1 and M+3)
Health score distribution (histogram)
Top 10 at-risk customers by revenue (simple table or annotated chart)

The dashboard should be generated from a single function call and take no more than 5 seconds to render on a standard laptop.

Exercise 5.5 — Critique and Improve the Chapter's Health Score

Review the calculate_health_score() function from Section 27.7. Identify at least three specific limitations or weaknesses:
1. A case where the formula would give a misleadingly high score to a customer who is actually unhealthy
2. A case where it would give a misleadingly low score to a customer who is actually thriving
3. A business scenario where the formula would be entirely inappropriate

For each limitation, propose a specific code change that would address it. Implement your improved version and compare the score distribution before and after your changes on the same dataset.