Case Study 27-1: Priya's RFM Wake-Up Call

The Setup

It is a Tuesday morning in mid-January. Priya Okonkwo has been working at Acme Corp for fourteen months. In that time she has gotten comfortable with pandas, built a handful of automated reports, and started to feel — for the first time — like she knows what she is doing.

Then Sandra Chen stops by her desk.

"I want to know which of our customers are going cold," Sandra says, without preamble. "We just lost Hartfield Facilities Management — one of our top ten accounts. I want to know if there are others like them that we don't know about yet."

Priya has heard of RFM analysis. She covered it in the Python analytics course she took last fall. This is the moment to actually use it.

"Give me until end of week," she tells Sandra.


The Data

Priya pulls two years of Acme's transaction history from the data warehouse: every order, with customer ID, order date, and order value. The resulting DataFrame has 47,312 rows and covers 2,156 unique customers.

import pandas as pd
import numpy as np

# Load transaction data
transactions = pd.read_csv(
    "acme_transactions_2022_2023.csv",
    parse_dates=["order_date"],
)

print(f"Transactions: {len(transactions):,}")
print(f"Customers:    {transactions['customer_id'].nunique():,}")
print(f"Date range:   {transactions['order_date'].min().date()} "
      f"to {transactions['order_date'].max().date()}")
print(f"Total revenue: ${transactions['order_value'].sum():,.2f}")

Output:

Transactions: 47,312
Customers:    2,156
Date range:   2022-01-03 to 2023-12-29
Total revenue: $14,287,440.18

Step 1: Compute Raw RFM Metrics

Priya sets the analysis date to January 15, 2024 — the day she is running the analysis.

analysis_date = pd.Timestamp("2024-01-15")

rfm = (
    transactions
    .groupby("customer_id")
    .agg(
        last_purchase_date=("order_date", "max"),
        frequency=("order_value", "count"),
        monetary=("order_value", "sum"),
    )
    .reset_index()
)

rfm["recency_days"] = (analysis_date - rfm["last_purchase_date"]).dt.days
rfm["avg_order_value"] = rfm["monetary"] / rfm["frequency"]

print(rfm[["recency_days", "frequency", "monetary"]].describe().round(2))

Even at this stage, Priya can see the distribution is heavily skewed. The median customer has bought 8 times and spent around $2,800. But the top 10% of customers have spent over $40,000. This is the 80/20 dynamic she had heard about in business school, made visible.
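The concentration Priya noticed can be quantified directly from the `monetary` column. A minimal sketch of that check, using simulated lognormal spend in place of Acme's real data (the distribution parameters here are illustrative, not Acme's):

```python
import numpy as np
import pandas as pd

# Simulated skewed spend standing in for the real rfm["monetary"] column
rng = np.random.default_rng(0)
rfm = pd.DataFrame({"monetary": rng.lognormal(mean=7.5, sigma=1.2, size=2000)})

# What share of total revenue comes from the top 10% of customers?
cutoff = rfm["monetary"].quantile(0.90)
top_share = rfm.loc[rfm["monetary"] >= cutoff, "monetary"].sum() / rfm["monetary"].sum()
print(f"Top 10% of customers account for {top_share:.0%} of revenue")
```

On real transaction data the same three lines after the simulation answer the 80/20 question exactly, rather than by eyeballing `describe()`.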


Step 2: Score and Segment

def score_rfm_column(series, n_bins=5, invert=False):
    """Score a Series into quantile bins labeled 1 to n_bins."""
    labels = list(range(1, n_bins + 1))
    if invert:
        labels = labels[::-1]
    # Rank with method="first" so heavily tied values (common for
    # frequency counts) don't collapse quantile edges -- qcut raises
    # an error when duplicate edges are dropped but an explicit
    # labels list is still supplied.
    ranked = series.rank(method="first")
    return pd.qcut(ranked, q=n_bins, labels=labels).astype(int)


rfm["r_score"] = score_rfm_column(rfm["recency_days"], invert=True)
rfm["f_score"] = score_rfm_column(rfm["frequency"])
rfm["m_score"] = score_rfm_column(rfm["monetary"])
rfm["rfm_score"] = (rfm["r_score"] + rfm["f_score"] + rfm["m_score"]) / 3.0


def assign_segment(row):
    r, f, m = row["r_score"], row["f_score"], row["m_score"]
    if r >= 4 and f >= 4 and m >= 4:
        return "Champions"
    elif r >= 3 and f >= 3 and m >= 3:
        return "Loyal Customers"
    elif r <= 2 and f >= 4 and m >= 4:
        return "Cannot Lose Them"
    elif r <= 2 and f >= 3 and m >= 3:
        return "At Risk"
    elif r >= 4 and f <= 2:
        return "Recent Customers"
    elif r >= 3 and f >= 2 and m >= 2:
        return "Potential Loyalists"
    elif r == 1 and f <= 2:
        return "Lost"
    elif r == 2 and f <= 2:
        return "Hibernating"
    else:
        return "Needs Attention"


rfm["segment"] = rfm.apply(assign_segment, axis=1)

segment_counts = rfm["segment"].value_counts()
print(segment_counts)

Output:

Loyal Customers      487
Needs Attention      391
Potential Loyalists  318
Hibernating          267
At Risk              203
Champions            189
Recent Customers     142
Lost                 108
Cannot Lose Them      51

The Finding That Changes Everything

Priya looks at that "At Risk" number: 203 customers. But then she looks at the "Cannot Lose Them" segment: 51 customers who used to buy frequently and at high value but have gone quiet in the last several months.

She investigates further.

# Deep-dive: Cannot Lose Them segment
cannot_lose = rfm[rfm["segment"] == "Cannot Lose Them"].copy()
cannot_lose = cannot_lose.merge(
    transactions.groupby("customer_id").agg(
        first_purchase_date=("order_date", "min"),
    ).reset_index(),
    on="customer_id",
)

cannot_lose["years_as_customer"] = (
    (analysis_date - cannot_lose["first_purchase_date"]).dt.days / 365.25
)

print(f"Cannot Lose Them: {len(cannot_lose)} customers")
print(f"  Total historical revenue:  ${cannot_lose['monetary'].sum():,.2f}")
print(f"  Average recency:           {cannot_lose['recency_days'].mean():.0f} days")
print(f"  Average tenure:            {cannot_lose['years_as_customer'].mean():.1f} years")
print(f"  Average lifetime spend:    ${cannot_lose['monetary'].mean():,.2f}")

Output:

Cannot Lose Them: 51 customers
  Total historical revenue:  $2,847,392.11
  Average recency:           148 days
  Average tenure:            2.8 years
  Average lifetime spend:    $55,831.22

Fifty-one customers. Average tenure of 2.8 years. Average lifetime spend of $55,831. And they have not ordered in an average of 148 days — nearly five months.

"These are the ones I'm worried about," Priya tells herself.

She refines the analysis to focus on the At Risk and Cannot Lose Them segments combined — the accounts that were loyal or valuable and have gone quiet:

at_risk_combined = rfm[rfm["segment"].isin(["At Risk", "Cannot Lose Them"])].copy()

# Sort by monetary value descending (most valuable first)
at_risk_combined = at_risk_combined.sort_values("monetary", ascending=False)

# Add a flag for the highest-priority accounts
at_risk_combined["priority"] = np.where(
    at_risk_combined["segment"] == "Cannot Lose Them",
    "CRITICAL",
    np.where(at_risk_combined["monetary"] >= 20000, "HIGH", "MEDIUM"),
)

print("\nAt-Risk Customer Summary:")
print(f"  Total customers: {len(at_risk_combined)}")
print(f"  Critical priority: {(at_risk_combined['priority'] == 'CRITICAL').sum()}")
print(f"  High priority: {(at_risk_combined['priority'] == 'HIGH').sum()}")
print(f"  Total revenue at risk: ${at_risk_combined['monetary'].sum():,.2f}")

# Export for Sandra
at_risk_combined[
    ["customer_id", "recency_days", "frequency", "monetary",
     "segment", "priority"]
].to_csv("at_risk_customers_jan2024.csv", index=False)
print("\nExported: at_risk_customers_jan2024.csv")

Output:

At-Risk Customer Summary:
  Total customers: 254
  Critical priority: 51
  High priority: 87
  Total revenue at risk: $8,431,902.44

The Conversation with Sandra

On Friday afternoon, Priya walks into Sandra's office with a two-page printed summary and a laptop showing the segment chart.

"I found them," she says. "254 customers who show the same behavioral pattern as Hartfield. They were active and valuable — some of them have been with Acme for three-plus years — but they've gone quiet. The 51 I've marked Critical haven't ordered in over four months and represent an average of $55K in historical spend each."

Sandra leans forward. "Can you get me names?"

"That's the CSV I just emailed you. Customer IDs, how long since they last ordered, how much they've spent with us historically, and a priority flag. Your team can match the customer IDs to account names in the CRM."

Sandra scans the list. She recognizes three of the names near the top — accounts she thought were stable. One of them, she realizes, stopped ordering right around the time Acme had its inventory fulfillment issues last summer. Nobody followed up.

"Priya," Sandra says, "this is exactly what I needed. How did you build this?"

"It took about four hours, including the time to clean the data. The analysis itself is maybe 60 lines of code. I can run it every month."

Sandra picks up her phone. "I'm calling a team meeting Monday morning. We're going to work through this list."


The Outcome

The following Monday, Sandra's sales team divides the 254 at-risk accounts among the five regional sales managers. The script becomes a monthly routine: Priya runs it on the first Monday of every month, exports the at-risk list, and emails it to Sandra with a brief summary of any accounts that moved from the previous month.

Within six weeks, the team has re-engaged 31 of the 254 accounts. Three of the "Critical" accounts turn out to have switched to a competitor due to a pricing misunderstanding that the sales team was able to resolve. Two others had simply been waiting for a new contact after their original Acme rep changed territories — nobody had introduced themselves.

The Hartfield situation doesn't repeat. Not because Priya made Acme perfect, but because now someone is watching.


What Priya Learned

The 80/20 rule is real, but it cuts both ways. A small number of customers generate most of the revenue. A small number of at-risk customers represent most of the churn risk. The analysis makes both visible.

Recency is the earliest warning signal. By the time a customer's spend declines, the relationship is often already damaged. But a sudden stop in ordering — even if the customer hasn't complained — shows up in recency immediately.
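One refinement this lesson suggests (an extension, not part of Priya's script): judge silence relative to each customer's own ordering cadence instead of one fixed recency cutoff, since 60 quiet days means nothing for an annual buyer but is alarming for a weekly one. A sketch with toy data:

```python
import pandas as pd

analysis_date = pd.Timestamp("2024-01-15")

# Toy transactions: customer 1 ordered monthly then went silent;
# customer 2 orders infrequently but is still on pace
transactions = pd.DataFrame({
    "customer_id": [1, 1, 1, 2, 2, 2],
    "order_date": pd.to_datetime([
        "2023-01-01", "2023-02-01", "2023-03-01",
        "2023-01-01", "2023-06-01", "2023-12-01",
    ]),
})

gaps = (
    transactions.sort_values("order_date")
    .groupby("customer_id")["order_date"]
    .agg(
        last_order="max",
        median_gap_days=lambda s: s.diff().dt.days.median(),
    )
    .reset_index()
)
gaps["silence_days"] = (analysis_date - gaps["last_order"]).dt.days
# "Gone quiet" = silent for more than 3x the customer's typical gap
gaps["gone_quiet"] = gaps["silence_days"] > 3 * gaps["median_gap_days"]
print(gaps[["customer_id", "silence_days", "median_gap_days", "gone_quiet"]])
```

Here the monthly buyer is flagged despite a shorter absolute silence than some slow-cadence customers would show, which is exactly the pattern a fixed cutoff misses.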

The segment name matters for communication. When Priya first drafted her email to Sandra, she used the phrase "customers with R-scores of 1 or 2 and F/M scores above 3." Sandra replied: "I don't know what that means." When Priya re-sent it calling them "At Risk" and "Cannot Lose Them," Sandra understood immediately. Good analysis fails in translation. Choose your language for your audience.

Automation multiplies the value. Running RFM once is useful. Running it every month and tracking which customers move between segments is transformative. The month-over-month view shows you velocity — not just where customers are, but whether they are improving or deteriorating.
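The month-over-month movement tracking can be sketched as a comparison of two saved runs. The idea of diffing consecutive monthly outputs follows the text above; the exact helper and the toy data are illustrative, not Priya's actual code:

```python
import pandas as pd

def segment_moves(prev: pd.DataFrame, curr: pd.DataFrame) -> pd.DataFrame:
    """Return customers whose segment changed between two monthly runs."""
    merged = prev[["customer_id", "segment"]].merge(
        curr[["customer_id", "segment"]],
        on="customer_id",
        suffixes=("_prev", "_curr"),
    )
    return merged[merged["segment_prev"] != merged["segment_curr"]]

# Toy example: customer 1 slid from Loyal Customers to At Risk
prev = pd.DataFrame({"customer_id": [1, 2, 3],
                     "segment": ["Loyal Customers", "At Risk", "Champions"]})
curr = pd.DataFrame({"customer_id": [1, 2, 3],
                     "segment": ["At Risk", "At Risk", "Champions"]})
moves = segment_moves(prev, curr)
print(moves)
```

In practice `prev` and `curr` would be last month's and this month's scored `rfm` frames; the downward moves are the velocity signal, and customers entering "At Risk" or "Cannot Lose Them" go to the top of Sandra's list.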