Case Study 25-1: Finding the Signal in the Sales Team

Character: Priya, Data Analyst at Acme Corp Stakeholder: Sandra Chen, CFO Business Question: Where should management focus to maximize revenue growth?


The Request

It was a Tuesday when Marcus Webb, Acme's VP of Sales, stopped by Priya's desk with a printout. The printout was a table of all 58 sales reps and their annual revenue numbers — a wall of digits that Marcus had been staring at for three days trying to make sense of.

"I know something is off," Marcus said. "I know we have some reps absolutely killing it and some who are basically spinning their wheels. But I can't see the pattern. Sandra wants a budget proposal for sales team investment next quarter and I need something more concrete than 'top performers should get more resources.'"

Priya took the printout, nodded, and said she would have something for him by end of day.

She opened her laptop and got to work.


The Data

The dataset contained one row per sales representative with the following fields:

rep_id, rep_name, region, product_line, annual_revenue,
deals_closed, avg_deal_size, close_rate_pct, calls_made,
tenure_years, satisfaction_score
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Load the data
df = pd.read_csv("acme_sales_reps_annual.csv")

print(f"Total reps: {len(df)}")
print(f"Total revenue: ${df['annual_revenue'].sum():,.0f}")
print()
print(df.describe())

Output (abbreviated):

Total reps: 58
Total revenue: $8,423,150

       annual_revenue  deals_closed  close_rate_pct
count       58.000000     58.000000       58.000000
mean    145,226.72       19.86           41.3
std      98,432.11        7.12           11.8
min      28,450.00        4.00           11.2
25%      74,820.00       15.00           33.4
50%     118,350.00       20.00           41.1
75%     187,920.00       25.00           49.2
max     521,380.00       36.00           78.3

The first thing Priya noticed: the mean ($145K) was substantially higher than the median ($118K). That gap — almost 23% — told a story even before any deep analysis. Some high earners were pulling the average up. She made a mental note to use median as the "typical" performance figure in her report.


Analysis Step 1: The Revenue Distribution

import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 2, figsize=(13, 5))

# Histogram
ax1 = axes[0]
ax1.hist(df["annual_revenue"] / 1000, bins=20,
         color="#2196F3", edgecolor="white", alpha=0.85)
ax1.axvline(df["annual_revenue"].mean() / 1000, color="#F44336",
            linewidth=2, linestyle="--",
            label=f"Mean: ${df['annual_revenue'].mean()/1000:.0f}K")
ax1.axvline(df["annual_revenue"].median() / 1000, color="#4CAF50",
            linewidth=2,
            label=f"Median: ${df['annual_revenue'].median()/1000:.0f}K")
ax1.set_xlabel("Annual Revenue ($K)")
ax1.set_ylabel("Number of Reps")
ax1.set_title("Revenue Distribution — All 58 Reps")
ax1.legend()

# Box plot by product line
ax2 = axes[1]
product_lines = sorted(df["product_line"].unique())
data = [df[df["product_line"] == pl]["annual_revenue"] / 1000
        for pl in product_lines]
bp = ax2.boxplot(data, labels=product_lines, patch_artist=True)
colors = ["#BBDEFB", "#C8E6C9", "#FFECB3"]
for patch, color in zip(bp["boxes"], colors):
    patch.set_facecolor(color)
ax2.set_ylabel("Annual Revenue ($K)")
ax2.set_title("Revenue by Product Line")

plt.tight_layout()
plt.savefig("priya_rep_distribution.png", dpi=150)
plt.show()

The histogram confirmed her suspicion: the distribution had a long right tail. Most reps clustered in the $75K–$175K range, with a handful producing $300K–$520K. The histogram shape alone told her this was not a team of equals — it was a team with a high-performance tier that was carrying disproportionate weight.


Analysis Step 2: The 80/20 Breakdown

Priya had heard of the Pareto Principle — the idea that roughly 80% of outcomes come from 20% of inputs. She wanted to see if it held for Acme's sales team.

# Sort reps by revenue, descending
df_sorted = df.sort_values("annual_revenue", ascending=False).reset_index(drop=True)

# Cumulative revenue and share
df_sorted["cumulative_revenue"] = df_sorted["annual_revenue"].cumsum()
total_revenue = df_sorted["annual_revenue"].sum()
df_sorted["cumulative_share"] = df_sorted["cumulative_revenue"] / total_revenue * 100
df_sorted["rep_pct"] = (df_sorted.index + 1) / len(df_sorted) * 100

# Find the top 20% threshold
top_20_count = int(np.ceil(len(df_sorted) * 0.20))
top_20_reps = df_sorted.head(top_20_count)
top_20_revenue = top_20_reps["annual_revenue"].sum()
top_20_share = top_20_revenue / total_revenue * 100

print(f"Top 20% of reps = {top_20_count} reps")
print(f"Top 20% revenue = ${top_20_revenue:,.0f}")
print(f"Top 20% share   = {top_20_share:.1f}% of total revenue")
print()
print("Top reps:")
print(top_20_reps[["rep_name", "region", "product_line",
                    "annual_revenue"]].to_string(index=False))

Output:

Top 20% of reps = 12 reps
Top 20% revenue = $5,527,490
Top 20% share   = 65.6% of total revenue

Top reps:
  rep_name  region  product_line  annual_revenue
  J. Torres    East    Enterprise      521,380
  A. Patel     West    Enterprise      498,720
  ...

Priya leaned back. Twelve reps. Sixty-six percent of the revenue. The Pareto Principle understated it — this was more like a 20/65 distribution.

She immediately thought about risk: if Acme lost even two or three of those twelve reps to competitors, the revenue impact would be devastating. And she thought about opportunity: what were those twelve doing that the other 46 were not?


Analysis Step 3: The Bottom Quartile

# Bottom quartile threshold
q1_threshold = df["annual_revenue"].quantile(0.25)
bottom_quartile = df[df["annual_revenue"] <= q1_threshold].copy()

print(f"Q1 threshold: ${q1_threshold:,.0f}")
print(f"Bottom quartile reps: {len(bottom_quartile)}")
print()
print(bottom_quartile[["rep_name", "region", "product_line",
                        "annual_revenue", "tenure_years",
                        "close_rate_pct"]].sort_values("annual_revenue")
      .to_string(index=False))

# Characteristics of bottom quartile
print(f"\nBottom quartile averages:")
print(f"  Avg tenure : {bottom_quartile['tenure_years'].mean():.1f} years")
print(f"  Avg close rate: {bottom_quartile['close_rate_pct'].mean():.1f}%")
print(f"  Avg calls made: {bottom_quartile['calls_made'].mean():.0f}")

Output:

Q1 threshold: $74,820
Bottom quartile reps: 15

rep_name  region  product_line  annual_revenue  tenure_years  close_rate_pct
 B. Kim     South  Mid-Market        28,450         0.8           14.2
 ...

Bottom quartile averages:
  Avg tenure : 1.3 years
  Avg close rate: 24.8%
  Avg calls made: 142

A clear pattern emerged. The bottom quartile was predominantly newer reps (average 1.3 years tenure) with low close rates (24.8% versus the team median of 41.1%). They were making fewer calls than the middle tier — not laziness, Priya suspected, but lack of confidence and pipeline.

This told Priya that the bottom quartile issue was likely a training and onboarding problem, not a talent problem. These were mostly new people who had not yet learned the patterns of the top performers.


Analysis Step 4: What Distinguishes Top vs. Bottom?

# Create performance tier column
conditions = [
    df["annual_revenue"] >= df["annual_revenue"].quantile(0.80),
    df["annual_revenue"] <= df["annual_revenue"].quantile(0.25),
]
choices = ["Top 20%", "Bottom 25%"]
df["tier"] = np.select(conditions, choices, default="Middle 55%")

# Compare metrics across tiers
tier_summary = df.groupby("tier").agg(
    count=("annual_revenue", "count"),
    median_revenue=("annual_revenue", "median"),
    median_deal_size=("avg_deal_size", "median"),
    median_close_rate=("close_rate_pct", "median"),
    median_calls=("calls_made", "median"),
    median_tenure=("tenure_years", "median"),
).round(1)

print("\nPerformance Tier Comparison:")
print(tier_summary.to_string())

Output:

Performance Tier Comparison:
             count  median_revenue  median_deal_size  median_close_rate  median_calls  median_tenure
Bottom 25%      15       58,340           8,920              24.8           142           1.3
Middle 55%      31      131,450          14,870              41.5           198           3.8
Top 20%         12      312,780          42,340              58.3           241           6.2

The pattern was unmistakable. Top performers had: - 3.7× higher median deal sizes ($42K vs. $11K in the bottom quartile) - 2.3× higher close rates (58% vs. 25%) - 70% more calls made per year - 4.8× more tenure (6.2 years vs. 1.3 years)

Correlation analysis would come next, but even this simple comparison pointed to deal size and close rate as the key differentiators.


Analysis Step 5: Correlations with Revenue

numeric_cols = ["deals_closed", "avg_deal_size", "close_rate_pct",
                "calls_made", "tenure_years"]
rev_correlations = df[numeric_cols + ["annual_revenue"]].corr()["annual_revenue"]
rev_correlations = rev_correlations.drop("annual_revenue").sort_values(
    key=abs, ascending=False
)

print("Correlation with Annual Revenue:")
for col, r in rev_correlations.items():
    direction = "↑" if r > 0 else "↓"
    print(f"  {col:<22}: r = {r:+.3f} {direction}")

Output:

Correlation with Annual Revenue:
  avg_deal_size        : r = +0.782 ↑
  close_rate_pct       : r = +0.634 ↑
  tenure_years         : r = +0.521 ↑
  calls_made           : r = +0.318 ↑
  deals_closed         : r = +0.143 ↑

Priya stared at the last row. Deal volume (deals_closed) had almost no correlation with revenue. The number of deals a rep closed barely mattered — the size of those deals was everything. Close rate was important but secondary. Tenure (experience) mattered substantially.

This was a finding she had not expected, and it directly challenged how Acme measured activity. The sales team had weekly KPIs centered on "deals closed per week" — but closing lots of small deals was not the path to revenue success. Closing fewer, larger deals was.


The Presentation

Priya put together a five-slide summary for Marcus and Sandra. The key findings:

Slide 1 — The Distribution "58 reps, $8.4M in annual revenue. But the median rep produces $118K — the mean ($145K) is inflated by a high-performance tier."

Slide 2 — The 20/65 Rule "Our top 12 reps (20%) generate 65.6% of total revenue. This is not unusual for B2B sales — but it means these 12 people are a critical retention priority. Losing 2-3 would be a material revenue event."

Slide 3 — The Bottom Quartile "15 reps produce below $75K annually. The primary characteristic: average 1.3 years tenure. This looks like an onboarding and ramp problem, not a talent problem. These reps have the activity (calls) but not the skill (deal size, close rate) of established performers."

Slide 4 — The Biggest Lever "Average deal size correlates with revenue at r = 0.78. It is by far the strongest predictor. We measure deals closed, but we should be measuring and coaching deal size. Our KPIs are measuring the wrong thing."

Slide 5 — Recommended Actions 1. Implement a structured retention program for top-20% reps (compensation benchmarking, dedicated support, recognition) 2. Build an onboarding playbook focused on deal qualification and sizing — not just activity volume 3. Revise weekly KPIs to include average deal size alongside deal count 4. Run a 90-day coaching pilot with the bottom quartile, targeting close rate improvement


Sandra's Response

Sandra read the report that evening before the budget meeting the next morning. She called Marcus before he could even arrive at the office.

"The deal size finding is the one I want to talk about," she said. "We've been reporting average deals closed per rep in every board deck. If deal count doesn't actually correlate with revenue, we've been managing to the wrong metric for two years."

Marcus was quiet for a moment. "Yeah. Priya's right. We celebrate reps who close twenty small deals. But Jordan Torres — our top rep — closes eight deals. Eight deals and a half million dollars. The whole philosophy is backwards."

By the end of the budget meeting, Sandra had authorized funding for both the retention program and the onboarding overhaul. The statistics had not told her what to decide. But they had made the decision unmistakably clear.


Key Techniques Used

Technique Business Purpose
df.describe() First-look summary to find mean/median gap and distribution shape
df.quantile() Define performance tiers (top 20%, bottom 25%)
cumsum() + percentage Pareto-style revenue concentration analysis
df.groupby().agg() Compare metrics across performance tiers
df.corr() Identify which activities actually drive revenue
Histogram + box plot Visualize distribution shape and cross-segment comparison

The "So What" Chain

  • Finding: Mean ($145K) >> Median ($118K) → right-skewed distribution
  • Interpretation: A handful of top reps are pulling the average up artificially
  • Implication: Quota models based on the mean will feel unreachable to most reps
  • Decision: Anchor quotas to median performance; treat top-tier production as a bonus tier, not a base expectation

  • Finding: Deal size correlates with revenue at r = 0.78; deal count at r = 0.14

  • Interpretation: Success comes from closing bigger deals, not more deals
  • Implication: Activity-based KPIs (calls made, deals closed per week) measure the wrong thing
  • Decision: Shift KPI emphasis to average deal size and deal qualification discipline