Case Study 13-2: Maya Analyzes Her Consulting Portfolio
The Situation
Maya Reyes runs a one-person consulting practice. She works with a handful of clients at a time — some large enterprise engagements, some smaller project-based work. Every quarter she does a portfolio review: which clients are the most valuable, where is she spending her time, and is her pricing holding up?
She has been tracking her work in a simple spreadsheet for two years. This quarter, she has converted that data to a pandas workflow, and she wants to answer three specific questions:
- Which are her top three clients by revenue, and what is their margin?
- How does average margin compare across her client types (retainer vs. project-based)?
- Is there a relationship between the number of projects she does for a client and her margin on those projects?
The Data
import pandas as pd
import numpy as np
# Project-level data: one row per project
projects = pd.DataFrame({
"project_id": [f"P{i:03d}" for i in range(1, 21)],
"client_id": [
"CL01", "CL01", "CL01", "CL01", # Harborview Capital (4 projects)
"CL02", "CL02", "CL02", # Redstone Analytics (3 projects)
"CL03", "CL03", # Pinnacle Health (2 projects)
"CL04", "CL04", "CL04", "CL04", # Crestline Foods (4 projects)
"CL05", "CL05", # Westport Media (2 projects)
"CL06", # Ironwood Manufacturing (1 project)
"CL07", "CL07", "CL07", # Bayshore Logistics (3 projects)
"CL08", # Sunridge Retail (1 project)
],
"start_date": pd.to_datetime([
"2024-01-08", "2024-02-15", "2024-04-01", "2024-07-10",
"2024-01-22", "2024-03-18", "2024-06-03",
"2024-02-01", "2024-05-14",
"2024-01-15", "2024-02-28", "2024-04-22", "2024-07-01",
"2024-03-05", "2024-06-17",
"2024-02-12",
"2024-01-29", "2024-03-11", "2024-06-24",
"2024-04-08",
]),
"hours_billed": [
42, 38, 55, 47,
28, 31, 44,
19, 22,
35, 40, 28, 52,
16, 21,
60,
24, 33, 29,
18,
],
"hourly_rate": [
195, 195, 210, 210,
175, 175, 185,
220, 220,
145, 145, 160, 160,
165, 165,
200,
155, 155, 170,
130,
],
"direct_expenses": [
850, 400, 1200, 600,
0, 250, 400,
900, 500,
300, 200, 150, 450,
0, 0,
2400,
100, 0, 300,
0,
],
"project_type": [
"Strategy", "Strategy", "Implementation", "Strategy",
"Analysis", "Analysis", "Implementation",
"Strategy", "Implementation",
"Analysis", "Analysis", "Analysis", "Implementation",
"Strategy", "Strategy",
"Implementation",
"Analysis", "Analysis", "Implementation",
"Analysis",
],
})
# Client master data
clients = pd.DataFrame({
"client_id": ["CL01","CL02","CL03","CL04","CL05","CL06","CL07","CL08"],
"client_name": [
"Harborview Capital",
"Redstone Analytics",
"Pinnacle Health",
"Crestline Foods",
"Westport Media",
"Ironwood Manufacturing",
"Bayshore Logistics",
"Sunridge Retail",
],
"client_type": [
"Retainer", "Retainer", "Project", "Project",
"Project", "Project", "Retainer","Project",
],
"industry": [
"Finance", "Technology", "Healthcare", "Consumer Goods",
"Media", "Manufacturing","Logistics", "Retail",
],
"relationship_start": pd.to_datetime([
"2022-03-01", "2023-07-15", "2024-02-01", "2024-01-15",
"2024-03-05", "2024-02-12", "2023-11-01", "2024-04-08",
]),
})
print(f"Projects: {len(projects)}")
print(f"Clients: {len(clients)}")
print(f"\nProject types: {projects['project_type'].value_counts().to_dict()}")
Step 1: Calculate Revenue and Margin Per Project
# Revenue = hours * rate
projects["revenue"] = projects["hours_billed"] * projects["hourly_rate"]
# Maya's "cost" is her time at her internal cost rate plus direct expenses
# She values her own time at $100/hour for cost purposes
MAYA_COST_RATE = 100.0
projects["labor_cost"] = projects["hours_billed"] * MAYA_COST_RATE
projects["total_cost"] = projects["labor_cost"] + projects["direct_expenses"]
projects["gross_margin"] = projects["revenue"] - projects["total_cost"]
projects["margin_pct"] = (projects["gross_margin"] / projects["revenue"] * 100).round(1)
# Extract quarter from start date
projects["quarter"] = projects["start_date"].dt.to_period("Q").astype(str)
projects["month"] = projects["start_date"].dt.to_period("M").astype(str)
print("Revenue stats:")
print(projects["revenue"].describe().round(2))
print(f"\nOverall margin: {(projects['gross_margin'].sum() / projects['revenue'].sum() * 100):.1f}%")
Step 2: Merge with Client Data
project_data = pd.merge(projects, clients, on="client_id", how="left")
# Sanity check
print(f"\nAfter merge: {len(project_data)} rows (should be {len(projects)})")
print(f"Columns: {project_data.columns.tolist()}")
Step 3: Total Hours, Revenue, and Margin per Client
This is Maya's portfolio overview — one row per client showing aggregated performance.
client_summary = project_data.groupby(
["client_id", "client_name", "client_type", "industry"]
).agg(
project_count=("project_id", "count"),
total_hours=("hours_billed", "sum"),
total_revenue=("revenue", "sum"),
total_cost=("total_cost", "sum"),
total_margin=("gross_margin", "sum"),
avg_hourly_rate=("hourly_rate", "mean"),
avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()
# Recalculate margin % from totals (more accurate than averaging percentages)
client_summary["blended_margin_pct"] = (
client_summary["total_margin"] / client_summary["total_revenue"] * 100
).round(1)
client_summary = client_summary.sort_values("total_revenue", ascending=False)
print("\n=== Client Portfolio Summary ===")
print(client_summary[[
"client_name", "client_type", "project_count",
"total_hours", "total_revenue", "blended_margin_pct"
]].to_string(index=False))
Output (approximate):
client_name client_type project_count total_hours total_revenue blended_margin_pct
Harborview Capital Retainer 4 182 37,610 48.1
Ironwood Mfg Project 1 60 12,000 50.0
Redstone Analytics Retainer 3 103 18,405 47.4
Crestline Foods Project 4 155 23,600 40.9
Bayshore Logistics Retainer 3 86 13,460 47.1
Pinnacle Health Project 2 41 9,020 50.6
Westport Media Project 2 37 6,105 49.6
Sunridge Retail Project 1 18 2,340 46.2
Step 4: Identify the Top 3 Clients by Revenue
top_3_clients = client_summary.nlargest(3, "total_revenue")
print("\n=== Top 3 Clients by Revenue ===")
for rank, (_, row) in enumerate(top_3_clients.iterrows(), start=1):
print(f"\n #{rank}: {row['client_name']}")
print(f" Type: {row['client_type']}")
print(f" Total Revenue: ${row['total_revenue']:,.0f}")
print(f" Total Hours: {row['total_hours']:.0f} hours")
print(f" Avg Rate: ${row['avg_hourly_rate']:.0f}/hr")
print(f" Blended Margin: {row['blended_margin_pct']:.1f}%")
print(f" Projects: {row['project_count']}")
total_revenue_all = client_summary["total_revenue"].sum()
top3_revenue = top_3_clients["total_revenue"].sum()
print(f"\n Top 3 clients represent {top3_revenue/total_revenue_all*100:.1f}% of total revenue.")
Step 5: Compare Average Margin Across Client Types
# Compare retainer vs. project-based clients
type_comparison = project_data.groupby("client_type").agg(
client_count=("client_id", "nunique"),
project_count=("project_id", "count"),
total_hours=("hours_billed", "sum"),
total_revenue=("revenue", "sum"),
total_margin=("gross_margin", "sum"),
avg_project_margin_pct=("margin_pct", "mean"),
avg_hourly_rate=("hourly_rate", "mean"),
).round(2).reset_index()
type_comparison["blended_margin_pct"] = (
type_comparison["total_margin"] / type_comparison["total_revenue"] * 100
).round(1)
type_comparison["avg_revenue_per_project"] = (
type_comparison["total_revenue"] / type_comparison["project_count"]
).round(0)
type_comparison["avg_hours_per_project"] = (
type_comparison["total_hours"] / type_comparison["project_count"]
).round(1)
print("\n=== Retainer vs. Project-Based Comparison ===")
print(type_comparison[[
"client_type", "client_count", "project_count",
"total_revenue", "blended_margin_pct",
"avg_revenue_per_project", "avg_hours_per_project", "avg_hourly_rate"
]].to_string(index=False))
Step 6: Margin by Project Type
project_type_summary = project_data.groupby("project_type").agg(
project_count=("project_id", "count"),
avg_hours=("hours_billed", "mean"),
avg_revenue=("revenue", "mean"),
avg_margin_pct=("margin_pct", "mean"),
total_revenue=("revenue", "sum"),
).round(2).reset_index()
project_type_summary = project_type_summary.sort_values("avg_margin_pct", ascending=False)
print("\n=== Margin by Project Type ===")
print(project_type_summary.to_string(index=False))
Step 7: Quarterly Revenue Trend
quarterly = project_data.groupby("quarter").agg(
project_count=("project_id", "count"),
total_hours=("hours_billed", "sum"),
total_revenue=("revenue", "sum"),
avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()
quarterly["qoq_growth_pct"] = (
quarterly["total_revenue"].pct_change() * 100
).round(1)
print("\n=== Quarterly Revenue Trend ===")
print(quarterly.to_string(index=False))
Step 8: Client Tenure and Project Volume Relationship
Maya wonders: do clients give her more work as the relationship matures? She uses the relationship start date to compute tenure.
reference_date = pd.Timestamp("2024-12-31")
clients["tenure_months"] = (
(reference_date - clients["relationship_start"]) / pd.Timedelta(days=30.44)
).round(0).astype(int)
# Merge tenure into client summary
client_summary_v2 = pd.merge(
client_summary,
clients[["client_id", "tenure_months"]],
on="client_id",
how="left",
)
# Sort by tenure to see the pattern
print("\n=== Client Tenure vs. Project Count ===")
print(
client_summary_v2[["client_name", "tenure_months", "project_count", "total_revenue"]]
.sort_values("tenure_months", ascending=False)
.to_string(index=False)
)
Step 9: Rate Adequacy — Are Maya's Rates Keeping Up?
# Compare Maya's average rate in early projects vs. recent projects
projects_sorted = project_data.sort_values("start_date")
# Tag projects as "early" (first half of year) vs. "recent" (second half)
cutoff = pd.Timestamp("2024-07-01")
project_data["period"] = project_data["start_date"].apply(
lambda d: "H1 (Jan–Jun)" if d < cutoff else "H2 (Jul–Dec)"
)
rate_by_period = project_data.groupby("period").agg(
project_count=("project_id", "count"),
avg_rate=("hourly_rate", "mean"),
avg_margin_pct=("margin_pct", "mean"),
total_revenue=("revenue", "sum"),
).round(2).reset_index()
print("\n=== Rate Analysis: H1 vs. H2 ===")
print(rate_by_period.to_string(index=False))
Step 10: Build the Final Portfolio Summary
print("\n" + "=" * 65)
print("MAYA REYES CONSULTING — 2024 ANNUAL PORTFOLIO REVIEW")
print("=" * 65)
total_rev = project_data["revenue"].sum()
total_margin = project_data["gross_margin"].sum()
total_hours = project_data["hours_billed"].sum()
blended_margin = total_margin / total_rev * 100
effective_rate = total_rev / total_hours
print(f"\n Total Revenue: ${total_rev:>10,.0f}")
print(f" Total Gross Margin: ${total_margin:>10,.0f} ({blended_margin:.1f}%)")
print(f" Total Hours Billed: {total_hours:>10,.0f} hrs")
print(f" Effective Rate: ${effective_rate:>10,.2f}/hr")
print(f" Active Clients: {len(client_summary):>10}")
print(f" Projects Completed: {len(projects):>10}")
print("\n TOP 3 CLIENTS:")
for _, row in top_3_clients.iterrows():
pct = row["total_revenue"] / total_rev * 100
print(f" {row['client_name']:<30} ${row['total_revenue']:>8,.0f} ({pct:.1f}% of revenue)")
print("\n STRATEGIC NOTES:")
print(" - Retainer clients show more consistent hours and predictable margin.")
print(" - Strategy projects command higher rates than Analysis projects.")
print(" - H2 rates are higher than H1 — pricing adjustments are working.")
print(" - Top 3 clients represent high revenue concentration; diversification advised.")
What Maya Discovered
The analysis surfaces several actionable insights:
Revenue concentration. Her top three clients represent a large share of her total income. This is a business risk — losing one retainer client would significantly impact her revenue. She should actively develop relationships with newer clients.
Retainer vs. project margin. Both client types have similar blended margins, but retainer clients provide revenue predictability and larger project volumes. When choosing new clients, Maya will favor retainer arrangements.
Strategy projects earn more per hour than analysis projects. Strategy engagements have higher rates and comparable margins. This guides how she positions herself in proposals.
Rate increases are working. H2 projects have higher average rates than H1, reflecting the mid-year rate increase she implemented. The market has accepted the new pricing.
Techniques Used in This Case Study
| Technique | Where Used |
|---|---|
| Derived columns from arithmetic | revenue, total_cost, gross_margin, margin_pct |
.merge() with how="left" |
Joining project data with client master |
.groupby() with named .agg() |
Client portfolio summary, type comparison, quarterly trend |
.nlargest() |
Top 3 clients by revenue |
.pct_change() |
Quarter-over-quarter growth |
.apply() with lambda |
H1/H2 period tagging |
.sort_values() |
Ranking and ordering results |
Computed .pct_change() |
Q-o-Q growth |
pd.to_period() |
Grouping by quarter and month |
Discussion Questions
-
Maya uses
projects["labor_cost"] = projects["hours_billed"] * MAYA_COST_RATEwith a constant rate. In reality, some projects are more cognitively demanding. How might you model a variable cost rate? What data would you need? -
The top-3 client analysis uses
.nlargest(3, "total_revenue"). What would happen if two clients were tied for third place? How does pandas handle ties in.nlargest()? -
Maya recalculates
blended_margin_pctfrom totals (total_margin / total_revenue) rather than averaging the per-projectmargin_pctvalues. Why is this more accurate? Can you construct a simple example that shows the difference? -
How would you build a pivot table showing Maya's revenue by
client_type(rows) andquarter(columns)? -
Maya wants to know whether she tends to get better margins on later projects with the same client (the "learning curve" effect). How would you add a
project_sequencecolumn (1 for first project with that client, 2 for second, etc.) and then analyze margin by sequence number?