Chapter 27 Quiz: Customer Analytics and Segmentation

Instructions: Choose the best answer for each multiple-choice question. Short-answer questions require 2–4 sentences. Consult the answer key at the bottom only after attempting all questions.

Multiple Choice

Question 1

Customer Lifetime Value (CLV) is best described as:

A) The revenue a customer generated in the most recent fiscal year
B) The total revenue or profit expected from a customer over the entire duration of the relationship
C) A customer's average order value multiplied by the number of orders placed to date
D) The acquisition cost amortized over the customer's active period

Question 2

In RFM analysis, the Recency dimension measures:

A) The number of times a customer has contacted customer support
B) How long the customer has been doing business with you
C) How recently the customer made their last purchase
D) The recency of the customer's registration date

Question 3

When applying quintile scoring to the Recency dimension, which assignment is correct?

A) Customers who purchased most recently receive a score of 1
B) Customers who purchased most recently receive a score of 5
C) All customers in the same quartile receive the same score
D) Recency is not scored numerically in standard RFM analysis

Question 4

A customer with R=1, F=5, M=5 should be placed in which segment?

A) Champions
B) Loyal Customers
C) Cannot Lose Them
D) New Customers

Question 5

Which pandas function is most appropriate for converting a continuous variable (like recency days) into five equal-sized groups?

A) pd.cut()
B) pd.qcut()
C) pd.groupby()
D) pd.pivot_table()

Question 6

In cohort analysis, an "acquisition cohort" is defined as:

A) All customers who made a purchase in a given calendar month
B) All customers who were acquired through the same marketing channel
C) A group of customers who made their first purchase in the same time period
D) A group of customers segmented by their current purchase frequency

Question 7

The diagonal from top-left to bottom-right on a cohort retention heatmap represents:

A) The average retention rate across all cohorts at each period
B) The most recently acquired cohort's retention curve
C) Each cohort's retention rate at the same calendar month
D) How each cohort retains in a given number of months after acquisition

Question 8

Why must you scale features before running K-means clustering?

A) Scikit-learn's KMeans class requires standardized inputs to run
B) Features with larger numeric ranges would otherwise dominate the distance calculations
C) Scaling makes the elbow curve cleaner and easier to interpret
D) K-means cannot handle negative values, which scaling eliminates

Question 9

In the K-means elbow method, "inertia" refers to:

A) The number of iterations the algorithm took to converge
B) The sum of squared distances from each data point to its assigned cluster centroid
C) The ratio of between-cluster variance to total variance
D) The computational time required to fit the model

Question 10

A customer who previously placed 12 orders per year and spent $60,000 annually has not placed an order in 95 days. Which of the following actions is most appropriate?

A) Archive this customer in the CRM and focus on new acquisition
B) Send a generic marketing email with a 10% discount coupon
C) Escalate to a senior account manager for personal outreach to understand what changed
D) Reduce the customer's credit limit to minimize financial exposure

Question 11

Which of the following is an advantage of rule-based RFM segmentation over K-means clustering?

A) Rule-based RFM always produces more accurate segments
B) Rule-based RFM can handle more features than K-means
C) Rule-based RFM produces transparent, explainable segments that are easy to communicate to non-technical stakeholders
D) Rule-based RFM requires no data preprocessing

Question 12

When computing churn risk signals, why do we compare a customer's activity in the last 6 months versus the prior 6 months, rather than simply looking at whether they have ordered recently?

A) Because the last 6 months are more statistically reliable than the prior 6 months
B) Because this relative comparison controls for customers who naturally order infrequently, showing genuine decline regardless of baseline
C) Because pandas cannot compute time periods longer than 6 months
D) Because regulatory requirements mandate a 6-month comparison window

Question 13

What does a cohort retention heatmap cell showing a value of 0.28 at row "Jan-2022" and column "M+6" mean?

A) 28 customers from the January 2022 cohort made a purchase in month 6
B) 28% of customers acquired in January 2022 were still making purchases 6 months after acquisition
C) The average order value for the January 2022 cohort declined by 28% in month 6
D) 28% of all customers made their first purchase in January 2022

Question 14

The customer health score in Section 27.7 is designed to:

A) Replace the RFM score as the primary segmentation metric
B) Provide a single 0–100 composite metric that sales teams can use at a glance to identify accounts needing attention
C) Predict the exact probability that a customer will churn in the next 30 days
D) Replace the need for cohort analysis

Question 15

A retail company's cohort heatmap shows that month-1 retention has been declining steadily for the last eight cohorts. The most likely business interpretation is:

A) The company's recent acquisition efforts have been ineffective
B) There is a problem with the customer experience between the first and second purchase: new customers are not being converted to repeat buyers
C) The company's pricing has become uncompetitive
D) Seasonality is affecting the month-1 retention calculation

Short Answer

Question 16

Explain the difference between pd.cut() and pd.qcut() and describe a business scenario where you would choose each one.

Question 17

You run K-means with K=4 and get four clusters. When you calculate the mean values for each cluster, you find that two of the four clusters have nearly identical profiles. What does this suggest, and what would you do?

Question 18

Describe two behavioral signals — beyond recency, frequency, and monetary value — that might indicate a B2B customer is at risk of churning. Explain the business logic behind each signal.

Question 19

A marketing manager looks at your RFM segment report and says: "The 'Lost' segment has 400 customers in it. We should send them all a re-engagement email." How would you advise them? What factors should influence the decision of whether and how to re-engage this segment?

Question 20

Explain why a declining cohort retention curve is a more concerning signal than a declining monthly revenue number. In other words, why does it matter which cohort customers belong to?

Answer Key

Multiple Choice:

1. B: CLV is the total expected revenue (or profit) over the entire customer relationship. Option A is just one year's revenue. Option C is historical spend to date, not a forward-looking value. Option D describes payback-period math, not CLV.
2. C: Recency measures how recently the customer last purchased. Relationship tenure (B) is a different metric. Support contacts (A) are not part of standard RFM.
3. B: In standard RFM, a score of 5 is the "best" on every dimension. For recency, "best" means most recent, so the customer who purchased most recently gets a 5. Because pd.qcut() places the smallest values in the first bin, the labels must be supplied in reverse order (5 down to 1) when recency is measured in days since the last purchase.
4. C: R=1 (has not purchased recently), F=5 and M=5 (was buying frequently and spending heavily). This is the "Cannot Lose Them" pattern: a high-value customer who has gone quiet and needs urgent attention.
5. B: pd.qcut() creates equal-frequency bins (each bin has approximately the same number of data points). pd.cut() with a bin count creates equal-width bins, which would not divide a skewed distribution (like recency days) into balanced groups.
6. C: A cohort is defined by first purchase timing, not channel or current behavior. Option A describes all active customers in a month, which would include customers from many different cohorts.
7. D: The diagonal shows how each cohort retains in a given number of months after their acquisition (period 1, period 2, etc.). Comparing across the diagonal shows whether early retention is improving or worsening over time.
8. B: K-means uses Euclidean distance to assign points to centroids. A feature with a range of 0–100,000 will dominate a feature with a range of 1–5. Scaling to zero mean and unit variance puts all features on equal footing.
9. B: Inertia (also called within-cluster sum of squares) measures how compact the clusters are. The elbow occurs where adding another cluster stops significantly reducing inertia.
10. C: This customer profile (Cannot Lose Them or At Risk with very high historical value) warrants personal, senior-level outreach to understand the root cause. A generic discount email (B) is insufficient for a $60K/year account. Archiving (A) wastes the relationship. Credit reduction (D) is punitive and irrelevant.
11. C: Transparency is the key advantage of rule-based RFM. You can explain to a sales manager in plain English why a customer is labeled "At Risk." K-means produces mathematically derived clusters that are harder to explain and may be less stable between runs.
12. B: The relative comparison (recent vs. prior period) accounts for heterogeneous customer behavior. A customer who orders twice a year is not in decline simply because they have not ordered in three months. But if they ordered four times in the prior six months and zero times in the last six months, that is a meaningful change regardless of their baseline frequency.
13. B: Cohort retention values are rates (proportions), not counts. A value of 0.28 means 28% of the original cohort was still purchasing 6 months after acquisition.
14. B: The health score is an operational tool for account management. It is not a replacement for RFM (A) or cohort analysis (D), and it is not a probability model (C). It synthesizes multiple signals into one number so sales teams can scan large account lists quickly.
15. B: Declining month-1 retention specifically means fewer new customers are making a second purchase. This points to a failure in the early customer experience: onboarding, product quality, value delivery, or first-contact service. Acquisition effectiveness (A) would affect cohort sizes, not retention rates.
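
The scoring convention behind answers 3 to 5 can be sketched in a few lines of pandas. This is a minimal illustration, not the chapter's own code: the DataFrame, its column names, and the ten sample customers are all hypothetical. It shows the reversed labels for recency, and compares the bin sizes that pd.qcut() and pd.cut() produce on the same skewed column.

```python
import pandas as pd

# Hypothetical customer summary; column names and values are assumptions.
rfm = pd.DataFrame({
    "customer_id": range(1, 11),
    "recency_days": [3, 10, 25, 40, 60, 95, 120, 200, 300, 400],
    "frequency": [12, 9, 8, 6, 5, 4, 3, 2, 1, 11],
    "monetary": [60000, 42000, 30000, 21000, 15000, 9000, 6000, 3000, 900, 58000],
})

# Recency: fewer days is better, so the labels run from 5 down to 1.
rfm["R"] = pd.qcut(rfm["recency_days"], q=5, labels=[5, 4, 3, 2, 1]).astype(int)

# Frequency and monetary: larger is better. Ranking first breaks ties so
# that qcut can always find five distinct bin edges.
for source_col, score_col in [("frequency", "F"), ("monetary", "M")]:
    ranked = rfm[source_col].rank(method="first")
    rfm[score_col] = pd.qcut(ranked, q=5, labels=[1, 2, 3, 4, 5]).astype(int)

# Equal-frequency (qcut) versus equal-width (cut) bin sizes for recency.
equal_freq_counts = rfm["R"].value_counts().sort_index()
equal_width_counts = pd.cut(rfm["recency_days"], bins=5).value_counts().sort_index()

print(rfm[["customer_id", "R", "F", "M"]])
print(equal_freq_counts.tolist(), equal_width_counts.tolist())
```

In this sample, customer 10 scores R=1, F=5, M=5, the "Cannot Lose Them" pattern from Question 4, while the equal-width bins crowd half of the customers into the first bin.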

Short Answer:

16. pd.cut() bins by value: given a bin count it creates bins of equal width, and given a list of edges it uses exactly those boundaries. pd.qcut() bins by rank: it creates bins of equal frequency (roughly the same number of observations per bin). Use pd.cut() when the bin boundaries have a natural business meaning, for example categorizing order values as "<$500," "$500–$2,000," "$2,000–$10,000," ">$10,000" because these tiers have pricing or service implications. Use pd.qcut() for RFM scoring, where you want equal representation in each score bucket (each score 1–5 should represent roughly 20% of customers).
17. Two clusters with nearly identical profiles suggest that K=4 may be one cluster too many for this dataset: the algorithm has "split" a natural cluster into two indistinguishable groups. You should re-examine the elbow curve and consider whether K=3 would be a better fit. You could also supplement the elbow method with silhouette scores, which measure how well separated the clusters are, to check whether K=4 is genuinely adding information.
18. Two valid examples: (1) Shrinking product breadth: A customer who used to order from five product categories but now orders only from one. This suggests they may be consolidating purchasing with another vendor and using you only for a remaining niche. (2) Increased support escalations: A sudden rise in complaints or support escalations, particularly unresolved ones, is a warning sign of churn intent in B2B relationships. A customer who complains is usually still invested in making the relationship work, so an escalation that goes unanswered is often the last chance to intervene; customers who are dissatisfied but silent have often already decided to leave.
19. The advice: proceed carefully and with a clear test. The "Lost" segment typically has the lowest conversion rate of any segment in a re-engagement effort, so bulk email campaigns to this segment often have very poor ROI. A better approach is to segment within "Lost" by monetary value: former high-value customers who are lost warrant a personalized approach (direct phone call, specific offer), while low-value lost customers can receive a low-cost email test. Set a clear decision criterion: if the re-engagement email yields less than X% response in Y days, archive the segment and stop spending on it. The marketing manager's instinct to reach out is right; the execution needs to be targeted, not blanket.
20. A declining monthly revenue number could have many causes: fewer new customers, a bad month, seasonality, a price change. It does not tell you whether your ability to retain customers is changing. A declining cohort retention curve is more structural: it means that regardless of how many customers you acquire, a smaller percentage of each new wave is sticking. This is a business model issue, not a bad month. The longer you continue without addressing the retention problem, the worse your customer base quality will become, because an increasingly large proportion of your customers will leave within a few months. High revenue today can coexist with deteriorating retention, and the retention signal will predict revenue problems before they appear in the revenue numbers.
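
To make the retention rates discussed in answers 13, 15, and 20 concrete, the sketch below builds a small cohort retention table with pandas. It is one possible approach under stated assumptions, not the chapter's implementation: the order log, the column names, and the month-level cohort definition are all hypothetical.

```python
import pandas as pd

# Hypothetical order log; column names and values are illustrative only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 3, 4, 4, 5],
    "order_date": pd.to_datetime([
        "2022-01-05", "2022-02-10", "2022-03-15",  # customer 1
        "2022-01-20", "2022-03-02",                # customer 2
        "2022-01-25",                              # customer 3
        "2022-02-03", "2022-03-07",                # customer 4
        "2022-02-14",                              # customer 5
    ]),
})

# Acquisition cohort = calendar month of each customer's first purchase.
first_order = orders.groupby("customer_id")["order_date"].transform("min")
orders["cohort"] = first_order.dt.strftime("%Y-%m")

# Period = whole calendar months between the first purchase and this order.
orders["period"] = (
    (orders["order_date"].dt.year - first_order.dt.year) * 12
    + (orders["order_date"].dt.month - first_order.dt.month)
)

# Distinct active customers per cohort and period, one row per cohort.
# Cells with no recorded purchases are left as NaN.
active = orders.groupby(["cohort", "period"])["customer_id"].nunique().unstack()

# Divide each row by its period-0 size to turn counts into rates.
retention = active.div(active[0], axis=0)
print(retention.round(2))
```

Each cell is a proportion of the original cohort, not a count. In this table, reading down the period-1 column compares how well successive cohorts convert a first purchase into a second one.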