Case Study 25-2: When Your Average Is a Lie
Character: Maya Reyes, Independent Consultant Business Question: Is her consulting practice as profitable as she thinks? Key Insight: Mean project margin is masking a less flattering median reality
The Setup
Maya had been consulting independently for three years, and by her own assessment, things were going well. Her standard answer when anyone asked how business was going: "My average project margin is around 58%." She had calculated that number herself — total profit across all projects divided by total revenue. Fifty-eight percent. She was proud of it.
But her accountant, during their quarterly review, had asked a question that stuck with her: "Are your margins consistent, or are a few great projects making the average look better than it is?"
Maya had brushed it off at the time. But the question nagged at her. She had her project data in a spreadsheet — time tracking, revenue, costs, everything. On a slow Friday afternoon, she decided to actually look.
The Data
Maya exported her project log to a CSV file. Each row represented a completed project over the past 30 months.
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Load Maya's project data
df = pd.read_csv("maya_project_log.csv")
print(f"Total projects: {len(df)}")
print(f"Total revenue: ${df['revenue'].sum():,.0f}")
print(f"Total profit: ${df['profit'].sum():,.0f}")
print()
print(df.describe())
The dataset had these columns:
project_id, client_name, client_type, project_type,
revenue, cost, profit, margin_pct, duration_weeks,
start_date
client_type was one of: "startup", "mid-size_company", "enterprise", "nonprofit"
project_type was one of: "strategy", "implementation", "training", "audit"
The First Surprise: Mean vs. Median
margin = df["margin_pct"]
mean_margin = margin.mean()
median_margin = margin.median()
gap = mean_margin - median_margin
print("Project Margin Statistics:")
print(f" Mean margin : {mean_margin:.1f}%")
print(f" Median margin : {median_margin:.1f}%")
print(f" Gap : {gap:+.1f} percentage points")
print()
print("Distribution details:")
print(f" Std deviation : {margin.std():.1f}%")
print(f" Min margin : {margin.min():.1f}%")
print(f" Q1 (25th pct) : {margin.quantile(0.25):.1f}%")
print(f" Q3 (75th pct) : {margin.quantile(0.75):.1f}%")
print(f" Max margin : {margin.max():.1f}%")
Output:
Project Margin Statistics:
Mean margin : 57.8%
Median margin : 41.3%
Gap : +16.5 percentage points
Distribution details:
Std deviation : 24.6%
Min margin : 8.2%
Q1 (25th pct) : 29.4%
Q3 (75th pct) : 62.1%
Max margin : 94.7%
Maya set down her coffee.
A 16.5 percentage point gap between mean and median. Her "58% average margin" figure — the one she had told her accountant, her friends, herself — was telling a fundamentally misleading story. The typical project (median) was earning her 41% margins. The mean was inflated by some very high-margin outliers.
She pulled up the histogram to see the shape.
Visualizing the Distribution
fig, axes = plt.subplots(1, 2, figsize=(13, 5))
# Histogram with mean and median marked
ax1 = axes[0]
ax1.hist(df["margin_pct"], bins=18, color="#7C4DFF", edgecolor="white",
alpha=0.85)
ax1.axvline(mean_margin, color="#F44336", linewidth=2.5, linestyle="--",
label=f"Mean: {mean_margin:.1f}%")
ax1.axvline(median_margin, color="#00C853", linewidth=2.5,
label=f"Median: {median_margin:.1f}%")
ax1.set_xlabel("Project Margin (%)")
ax1.set_ylabel("Number of Projects")
ax1.set_title("Project Margin Distribution")
ax1.legend(fontsize=11)
ax1.set_xlim(0, 100)
# Box plot by client type
ax2 = axes[1]
client_types = ["startup", "mid-size_company", "enterprise", "nonprofit"]
labels = ["Startup", "Mid-Size", "Enterprise", "Nonprofit"]
data = [df[df["client_type"] == ct]["margin_pct"] for ct in client_types]
bp = ax2.boxplot(data, labels=labels, patch_artist=True)
colors = ["#E8F5E9", "#E3F2FD", "#FFF8E1", "#FCE4EC"]
for patch, color in zip(bp["boxes"], colors):
patch.set_facecolor(color)
for med in bp["medians"]:
med.set_color("#D32F2F")
med.set_linewidth(2)
ax2.set_ylabel("Margin (%)")
ax2.set_title("Margin by Client Type")
ax2.axhline(median_margin, color="#757575", linewidth=1, linestyle="--",
label=f"Overall median: {median_margin:.1f}%")
ax2.legend(fontsize=9)
plt.suptitle("Maya Reyes Consulting — Project Profitability Analysis",
fontsize=13, fontweight="bold")
plt.tight_layout()
plt.savefig("maya_margin_analysis.png", dpi=150)
plt.show()
The histogram told the story visually. The distribution had a long right tail — the bulk of projects clustered in the 25%–55% range, with a distinct group of high-margin projects ($85%+) pulling the mean rightward.
The Second Surprise: Which Projects Had the High Margins?
# Sort projects by margin descending
df_sorted = df.sort_values("margin_pct", ascending=False)
print("Top 10 highest-margin projects:")
print(df_sorted[["client_name", "client_type", "project_type",
"revenue", "margin_pct"]].head(10).to_string(index=False))
print("\nBottom 10 lowest-margin projects:")
print(df_sorted[["client_name", "client_type", "project_type",
"revenue", "margin_pct"]].tail(10).to_string(index=False))
Output (top 10, simplified):
client_name client_type project_type revenue margin_pct
TechVenture A enterprise training 42,000 94.7
TechVenture B enterprise training 38,500 91.2
CloudCo enterprise training 35,000 88.9
MegaCorp enterprise audit 51,000 86.3
...
Bottom 10:
client_name client_type project_type revenue margin_pct
GreenNPO nonprofit implementation 18,000 8.2
LocalNPO nonprofit implementation 14,000 11.4
TechStartup_14 startup implementation 22,000 14.8
TechStartup_7 startup implementation 19,500 16.1
...
A clear pattern: the highest-margin projects were enterprise training engagements. The lowest-margin projects were implementation work for nonprofits and startups.
Segmenting by Client Type and Project Type
# Margin by client type
print("Median Margin by Client Type:")
by_client = df.groupby("client_type")["margin_pct"].agg(
count="count",
median="median",
mean="mean",
std="std"
).round(1)
print(by_client.sort_values("median", ascending=False).to_string())
print()
# Margin by project type
print("Median Margin by Project Type:")
by_project = df.groupby("project_type")["margin_pct"].agg(
count="count",
median="median",
mean="mean",
std="std"
).round(1)
print(by_project.sort_values("median", ascending=False).to_string())
Output:
Median Margin by Client Type:
count median mean std
enterprise 9 68.4 71.2 14.3
mid-size_company 16 48.7 52.1 18.9
startup 22 34.2 38.6 22.4
nonprofit 8 17.8 22.3 11.2
Median Margin by Project Type:
count median mean std
training 11 79.4 81.3 10.1
audit 10 63.2 65.8 13.4
strategy 21 41.8 46.2 19.7
implementation 13 18.4 22.6 12.8
Maya looked at these tables for a long time.
The enterprise/training combination was generating median margins of nearly 80%. The nonprofit/implementation combination was generating median margins of around 14%. She had been treating all her projects as roughly equivalent. She had been wrong by a factor of five.
The Outlier Analysis
# IQR-based outlier detection
Q1 = df["margin_pct"].quantile(0.25)
Q3 = df["margin_pct"].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
high_outliers = df[df["margin_pct"] > upper_bound]
low_outliers = df[df["margin_pct"] < lower_bound]
print(f"IQR: {IQR:.1f} percentage points")
print(f"Outlier bounds: {lower_bound:.1f}% to {upper_bound:.1f}%")
print()
print(f"High-margin outliers ({len(high_outliers)} projects):")
print(high_outliers[["client_type", "project_type", "revenue",
"margin_pct"]].to_string(index=False))
print(f"\nLow-margin outliers ({len(low_outliers)} projects):")
print(low_outliers[["client_type", "project_type", "revenue",
"margin_pct"]].to_string(index=False))
# What share of total profit came from high-margin outliers?
outlier_profit = high_outliers["profit"].sum()
total_profit = df["profit"].sum()
print(f"\nHigh-margin outlier projects represent:")
print(f" {len(high_outliers)/len(df)*100:.1f}% of project count")
print(f" {outlier_profit/total_profit*100:.1f}% of total profit")
Output:
IQR: 32.7 percentage points
Outlier bounds: -19.7% to 111.2%
(No statistical outliers by IQR method — the distribution is wide but no extreme values)
Wait — checking top quartile instead...
Projects above 75th percentile (>{Q3:.1f}%):
14 projects above 62.1% margin
These 14 projects (25% of total) represent:
52.4% of total profit
The high-margin projects — even without being statistical outliers by the IQR method — were doing enormous heavy lifting. A quarter of her projects were generating more than half her total profit.
The Profit Concentration Analysis
# Sort by profit contribution
df_profit = df.sort_values("profit", ascending=False).reset_index(drop=True)
df_profit["cumulative_profit"] = df_profit["profit"].cumsum()
df_profit["profit_share"] = df_profit["cumulative_profit"] / df_profit["profit"].sum() * 100
# How many projects to reach 50% of profit?
projects_for_50_pct = (df_profit["profit_share"] < 50).sum() + 1
print(f"Projects needed to reach 50% of total profit: {projects_for_50_pct}")
print(f"Out of {len(df_profit)} total projects ({projects_for_50_pct/len(df_profit)*100:.1f}%)")
# Cumulative profit curve
fig, ax = plt.subplots(figsize=(9, 5))
ax.plot(range(1, len(df_profit) + 1),
df_profit["profit_share"], color="#7C4DFF", linewidth=2.5)
ax.axhline(50, color="#9E9E9E", linestyle=":", linewidth=1.5)
ax.axvline(projects_for_50_pct, color="#F44336", linestyle="--",
linewidth=1.5, label=f"{projects_for_50_pct} projects = 50% of profit")
ax.set_xlabel("Number of Projects (ranked by profit, high to low)")
ax.set_ylabel("Cumulative % of Total Profit")
ax.set_title("Profit Concentration — Maya Reyes Consulting")
ax.legend()
ax.set_xlim(1, len(df_profit))
ax.set_ylim(0, 105)
ax.yaxis.set_major_formatter(plt.FuncFormatter(lambda x, _: f"{x:.0f}%"))
plt.tight_layout()
plt.savefig("maya_profit_concentration.png", dpi=150)
plt.show()
Output:
Projects needed to reach 50% of total profit: 9
Out of 55 total projects (16.4%)
Nine projects. Sixteen percent of her project count was generating half her profit. The bottom third of her projects — implementation work, primarily for startups and nonprofits — was profitable in absolute terms, but at margins that were costing her time she could have spent on higher-value work.
What This Changes
Maya sat with these findings for a while. Here is what the statistics had shown her:
-
Her "58% average margin" was inflated by a small number of high-margin enterprise training engagements. The typical project (median) generated 41% margins — not 58%.
-
Enterprise + training was her most profitable combination by a significant margin (79% median vs. 41% overall median).
-
Implementation work, especially for nonprofits and early-stage startups, had margins below 20%. She was not losing money, but she was spending significant time for modest returns.
-
A minority of her projects (the top 25%) generated the majority (52%) of her profit. The effort she was putting into low-margin projects was essentially subsidizing the rest of her business.
She thought back to how she had been selecting clients. She had said yes to a lot of implementation work because she liked the tangible, build-something quality of it. She had taken nonprofit clients at discounted rates because she believed in the causes. Both were legitimate choices — but she had made them without understanding the profitability cost.
The Strategic Decision
# Simulate: what if Maya rebalanced her project mix?
# Current mix
current_by_type = df.groupby("project_type").agg(
count=("revenue", "count"),
total_revenue=("revenue", "sum"),
total_profit=("profit", "sum"),
).assign(effective_margin=lambda x: x["total_profit"] / x["total_revenue"] * 100)
print("Current Project Mix:")
print(current_by_type.round(1).to_string())
# Hypothetical: shift 5 implementation projects to training/strategy
impl_avg_revenue = df[df["project_type"] == "implementation"]["revenue"].median()
impl_avg_margin = df[df["project_type"] == "implementation"]["margin_pct"].median() / 100
training_avg_revenue = df[df["project_type"] == "training"]["revenue"].median()
training_avg_margin = df[df["project_type"] == "training"]["margin_pct"].median() / 100
lost_profit = 5 * impl_avg_revenue * impl_avg_margin
gained_profit = 5 * training_avg_revenue * training_avg_margin
net_change = gained_profit - lost_profit
print(f"\nHypothetical: Replace 5 implementation projects with 5 training projects")
print(f" Lost profit from implementation : ${lost_profit:,.0f}")
print(f" Gained profit from training : ${gained_profit:,.0f}")
print(f" Net profit improvement : ${net_change:+,.0f}")
Output:
Hypothetical: Replace 5 implementation projects with 5 training projects
Lost profit from implementation : $15,400
Gained profit from training : $31,200
Net profit improvement : +$15,800
Without working more hours, without raising her rates, just by changing the type of work she pursued — Maya could increase her annual profit by approximately $15,800.
The New Strategy
Maya updated her client intake criteria:
Prioritize: - Enterprise clients seeking training or strategy work - Mid-size company strategy engagements (strong margins, interesting work) - Any client where the project can be templated for training delivery
Evaluate carefully: - Implementation projects (only at higher rates, or for strategic relationships) - Nonprofit engagements (selective — limit to 1-2 per year as pro bono/relationship work, not core revenue)
New success metric: She replaced "average project margin" in her mental scorecard with "median project margin" — the number that actually represented a typical engagement, not the inflated mean. Her goal: get the median above 50% over the next 18 months.
What Maya Learned About Statistics
Looking back at this analysis, Maya identified three statistical lessons:
Lesson 1: Always compare mean and median. If they diverge significantly, your mean is being distorted by outliers. In business, this almost always means a few exceptional outcomes are masking a more modest typical reality.
Lesson 2: Segment before you conclude. "Average margin" across all projects was meaningless for decision-making. Margin by client type and project type was actionable.
Lesson 3: Profit concentration matters. Knowing that 16% of her projects generated 50% of her profit was not just interesting — it told her exactly where to point her business development energy.
Her accountant had asked the right question. The data had answered it honestly. Maya just had to be willing to look.
Technical Summary
| Code Pattern | What It Revealed |
|---|---|
margin.mean() vs margin.median() |
16.5 point gap — mean inflated by high-margin outliers |
df.groupby("client_type")["margin_pct"].median() |
Enterprise clients: 68% median margin |
df.groupby("project_type")["margin_pct"].median() |
Training: 79% vs. Implementation: 18% |
| IQR outlier detection | No extreme outliers — but top quartile disproportionate |
| Cumulative profit analysis | 16% of projects = 50% of total profit |
| Histogram with mean/median lines | Right skew made visually immediate |