Case Study 13-2: Maya Analyzes Her Consulting Portfolio

The Situation

Maya Reyes runs a one-person consulting practice. She works with a handful of clients at a time — some large enterprise engagements, some smaller project-based work. Every quarter she does a portfolio review: which clients are the most valuable, where is she spending her time, and is her pricing holding up?

She has been tracking her work in a simple spreadsheet for two years. This quarter, she has converted that data to a pandas workflow, and she wants to answer three specific questions:

  1. Which are her top three clients by revenue, and what is their margin?
  2. How does average margin compare across her client types (retainer vs. project-based)?
  3. Is there a relationship between the number of projects she does for a client and her margin on those projects?

The Data

import pandas as pd
import numpy as np

# Project-level data: one row per project
projects = pd.DataFrame({
    "project_id": [f"P{i:03d}" for i in range(1, 21)],
    "client_id": [
        "CL01", "CL01", "CL01", "CL01",   # Harborview Capital (4 projects)
        "CL02", "CL02", "CL02",             # Redstone Analytics (3 projects)
        "CL03", "CL03",                     # Pinnacle Health (2 projects)
        "CL04", "CL04", "CL04", "CL04",    # Crestline Foods (4 projects)
        "CL05", "CL05",                     # Westport Media (2 projects)
        "CL06",                             # Ironwood Manufacturing (1 project)
        "CL07", "CL07", "CL07",            # Bayshore Logistics (3 projects)
        "CL08",                             # Sunridge Retail (1 project)
    ],
    "start_date": pd.to_datetime([
        "2024-01-08", "2024-02-15", "2024-04-01", "2024-07-10",
        "2024-01-22", "2024-03-18", "2024-06-03",
        "2024-02-01", "2024-05-14",
        "2024-01-15", "2024-02-28", "2024-04-22", "2024-07-01",
        "2024-03-05", "2024-06-17",
        "2024-02-12",
        "2024-01-29", "2024-03-11", "2024-06-24",
        "2024-04-08",
    ]),
    "hours_billed": [
        42, 38, 55, 47,
        28, 31, 44,
        19, 22,
        35, 40, 28, 52,
        16, 21,
        60,
        24, 33, 29,
        18,
    ],
    "hourly_rate": [
        195, 195, 210, 210,
        175, 175, 185,
        220, 220,
        145, 145, 160, 160,
        165, 165,
        200,
        155, 155, 170,
        130,
    ],
    "direct_expenses": [
        850,  400,  1200, 600,
        0,    250,  400,
        900,  500,
        300,  200,  150,  450,
        0,    0,
        2400,
        100,  0,    300,
        0,
    ],
    "project_type": [
        "Strategy",    "Strategy",    "Implementation", "Strategy",
        "Analysis",    "Analysis",    "Implementation",
        "Strategy",    "Implementation",
        "Analysis",    "Analysis",    "Analysis",       "Implementation",
        "Strategy",    "Strategy",
        "Implementation",
        "Analysis",    "Analysis",    "Implementation",
        "Analysis",
    ],
})

# Client master data
clients = pd.DataFrame({
    "client_id": ["CL01","CL02","CL03","CL04","CL05","CL06","CL07","CL08"],
    "client_name": [
        "Harborview Capital",
        "Redstone Analytics",
        "Pinnacle Health",
        "Crestline Foods",
        "Westport Media",
        "Ironwood Manufacturing",
        "Bayshore Logistics",
        "Sunridge Retail",
    ],
    "client_type": [
        "Retainer", "Retainer", "Project", "Project",
        "Project",  "Project",  "Retainer","Project",
    ],
    "industry": [
        "Finance",       "Technology",  "Healthcare",  "Consumer Goods",
        "Media",         "Manufacturing","Logistics",   "Retail",
    ],
    "relationship_start": pd.to_datetime([
        "2022-03-01", "2023-07-15", "2024-02-01", "2024-01-15",
        "2024-03-05", "2024-02-12", "2023-11-01", "2024-04-08",
    ]),
})

print(f"Projects: {len(projects)}")
print(f"Clients: {len(clients)}")
print(f"\nProject types: {projects['project_type'].value_counts().to_dict()}")

Step 1: Calculate Revenue and Margin Per Project

# Revenue = hours * rate
projects["revenue"] = projects["hours_billed"] * projects["hourly_rate"]

# Maya's "cost" is her time at her internal cost rate plus direct expenses
# She values her own time at $100/hour for cost purposes
MAYA_COST_RATE = 100.0

projects["labor_cost"] = projects["hours_billed"] * MAYA_COST_RATE
projects["total_cost"] = projects["labor_cost"] + projects["direct_expenses"]
projects["gross_margin"] = projects["revenue"] - projects["total_cost"]
projects["margin_pct"] = (projects["gross_margin"] / projects["revenue"] * 100).round(1)

# Extract quarter from start date
projects["quarter"] = projects["start_date"].dt.to_period("Q").astype(str)
projects["month"] = projects["start_date"].dt.to_period("M").astype(str)

print("Revenue stats:")
print(projects["revenue"].describe().round(2))
print(f"\nOverall margin: {(projects['gross_margin'].sum() / projects['revenue'].sum() * 100):.1f}%")

Step 2: Merge with Client Data

project_data = pd.merge(projects, clients, on="client_id", how="left")

# Sanity check
print(f"\nAfter merge: {len(project_data)} rows (should be {len(projects)})")
print(f"Columns: {project_data.columns.tolist()}")

Step 3: Total Hours, Revenue, and Margin per Client

This is Maya's portfolio overview — one row per client showing aggregated performance.

client_summary = project_data.groupby(
    ["client_id", "client_name", "client_type", "industry"]
).agg(
    project_count=("project_id", "count"),
    total_hours=("hours_billed", "sum"),
    total_revenue=("revenue", "sum"),
    total_cost=("total_cost", "sum"),
    total_margin=("gross_margin", "sum"),
    avg_hourly_rate=("hourly_rate", "mean"),
    avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()

# Recalculate margin % from totals (more accurate than averaging percentages)
client_summary["blended_margin_pct"] = (
    client_summary["total_margin"] / client_summary["total_revenue"] * 100
).round(1)

client_summary = client_summary.sort_values("total_revenue", ascending=False)

print("\n=== Client Portfolio Summary ===")
print(client_summary[[
    "client_name", "client_type", "project_count",
    "total_hours", "total_revenue", "blended_margin_pct"
]].to_string(index=False))

Output (approximate):

         client_name client_type  project_count  total_hours  total_revenue  blended_margin_pct
  Harborview Capital     Retainer              4          182        37,610                48.1
   Ironwood Mfg           Project              1           60        12,000                50.0
     Redstone Analytics   Retainer              3          103        18,405                47.4
   Crestline Foods        Project              4          155        23,600                40.9
  Bayshore Logistics     Retainer              3           86        13,460                47.1
   Pinnacle Health        Project              2           41         9,020                50.6
  Westport Media          Project              2           37         6,105                49.6
  Sunridge Retail         Project              1           18         2,340                46.2

Step 4: Identify the Top 3 Clients by Revenue

top_3_clients = client_summary.nlargest(3, "total_revenue")

print("\n=== Top 3 Clients by Revenue ===")
for rank, (_, row) in enumerate(top_3_clients.iterrows(), start=1):
    print(f"\n  #{rank}: {row['client_name']}")
    print(f"       Type: {row['client_type']}")
    print(f"       Total Revenue:  ${row['total_revenue']:,.0f}")
    print(f"       Total Hours:    {row['total_hours']:.0f} hours")
    print(f"       Avg Rate:       ${row['avg_hourly_rate']:.0f}/hr")
    print(f"       Blended Margin: {row['blended_margin_pct']:.1f}%")
    print(f"       Projects:       {row['project_count']}")

total_revenue_all = client_summary["total_revenue"].sum()
top3_revenue = top_3_clients["total_revenue"].sum()
print(f"\n  Top 3 clients represent {top3_revenue/total_revenue_all*100:.1f}% of total revenue.")

Step 5: Compare Average Margin Across Client Types

# Compare retainer vs. project-based clients
type_comparison = project_data.groupby("client_type").agg(
    client_count=("client_id", "nunique"),
    project_count=("project_id", "count"),
    total_hours=("hours_billed", "sum"),
    total_revenue=("revenue", "sum"),
    total_margin=("gross_margin", "sum"),
    avg_project_margin_pct=("margin_pct", "mean"),
    avg_hourly_rate=("hourly_rate", "mean"),
).round(2).reset_index()

type_comparison["blended_margin_pct"] = (
    type_comparison["total_margin"] / type_comparison["total_revenue"] * 100
).round(1)

type_comparison["avg_revenue_per_project"] = (
    type_comparison["total_revenue"] / type_comparison["project_count"]
).round(0)

type_comparison["avg_hours_per_project"] = (
    type_comparison["total_hours"] / type_comparison["project_count"]
).round(1)

print("\n=== Retainer vs. Project-Based Comparison ===")
print(type_comparison[[
    "client_type", "client_count", "project_count",
    "total_revenue", "blended_margin_pct",
    "avg_revenue_per_project", "avg_hours_per_project", "avg_hourly_rate"
]].to_string(index=False))

Step 6: Margin by Project Type

project_type_summary = project_data.groupby("project_type").agg(
    project_count=("project_id", "count"),
    avg_hours=("hours_billed", "mean"),
    avg_revenue=("revenue", "mean"),
    avg_margin_pct=("margin_pct", "mean"),
    total_revenue=("revenue", "sum"),
).round(2).reset_index()

project_type_summary = project_type_summary.sort_values("avg_margin_pct", ascending=False)

print("\n=== Margin by Project Type ===")
print(project_type_summary.to_string(index=False))

Step 7: Quarterly Revenue Trend

quarterly = project_data.groupby("quarter").agg(
    project_count=("project_id", "count"),
    total_hours=("hours_billed", "sum"),
    total_revenue=("revenue", "sum"),
    avg_margin_pct=("margin_pct", "mean"),
).round(2).reset_index()

quarterly["qoq_growth_pct"] = (
    quarterly["total_revenue"].pct_change() * 100
).round(1)

print("\n=== Quarterly Revenue Trend ===")
print(quarterly.to_string(index=False))

Step 8: Client Tenure and Project Volume Relationship

Maya wonders: do clients give her more work as the relationship matures? She uses the relationship start date to compute tenure.

reference_date = pd.Timestamp("2024-12-31")
clients["tenure_months"] = (
    (reference_date - clients["relationship_start"]) / pd.Timedelta(days=30.44)
).round(0).astype(int)

# Merge tenure into client summary
client_summary_v2 = pd.merge(
    client_summary,
    clients[["client_id", "tenure_months"]],
    on="client_id",
    how="left",
)

# Sort by tenure to see the pattern
print("\n=== Client Tenure vs. Project Count ===")
print(
    client_summary_v2[["client_name", "tenure_months", "project_count", "total_revenue"]]
    .sort_values("tenure_months", ascending=False)
    .to_string(index=False)
)

Step 9: Rate Adequacy — Are Maya's Rates Keeping Up?

# Compare Maya's average rate in early projects vs. recent projects
projects_sorted = project_data.sort_values("start_date")

# Tag projects as "early" (first half of year) vs. "recent" (second half)
cutoff = pd.Timestamp("2024-07-01")
project_data["period"] = project_data["start_date"].apply(
    lambda d: "H1 (Jan–Jun)" if d < cutoff else "H2 (Jul–Dec)"
)

rate_by_period = project_data.groupby("period").agg(
    project_count=("project_id", "count"),
    avg_rate=("hourly_rate", "mean"),
    avg_margin_pct=("margin_pct", "mean"),
    total_revenue=("revenue", "sum"),
).round(2).reset_index()

print("\n=== Rate Analysis: H1 vs. H2 ===")
print(rate_by_period.to_string(index=False))

Step 10: Build the Final Portfolio Summary

print("\n" + "=" * 65)
print("MAYA REYES CONSULTING — 2024 ANNUAL PORTFOLIO REVIEW")
print("=" * 65)

total_rev = project_data["revenue"].sum()
total_margin = project_data["gross_margin"].sum()
total_hours = project_data["hours_billed"].sum()
blended_margin = total_margin / total_rev * 100
effective_rate = total_rev / total_hours

print(f"\n  Total Revenue:       ${total_rev:>10,.0f}")
print(f"  Total Gross Margin:  ${total_margin:>10,.0f}  ({blended_margin:.1f}%)")
print(f"  Total Hours Billed:  {total_hours:>10,.0f} hrs")
print(f"  Effective Rate:      ${effective_rate:>10,.2f}/hr")
print(f"  Active Clients:      {len(client_summary):>10}")
print(f"  Projects Completed:  {len(projects):>10}")

print("\n  TOP 3 CLIENTS:")
for _, row in top_3_clients.iterrows():
    pct = row["total_revenue"] / total_rev * 100
    print(f"    {row['client_name']:<30} ${row['total_revenue']:>8,.0f}  ({pct:.1f}% of revenue)")

print("\n  STRATEGIC NOTES:")
print("  - Retainer clients show more consistent hours and predictable margin.")
print("  - Strategy projects command higher rates than Analysis projects.")
print("  - H2 rates are higher than H1 — pricing adjustments are working.")
print("  - Top 3 clients represent high revenue concentration; diversification advised.")

What Maya Discovered

The analysis surfaces several actionable insights:

Revenue concentration. Her top three clients represent a large share of her total income. This is a business risk — losing one retainer client would significantly impact her revenue. She should actively develop relationships with newer clients.

Retainer vs. project margin. Both client types have similar blended margins, but retainer clients provide revenue predictability and larger project volumes. When choosing new clients, Maya will favor retainer arrangements.

Strategy projects earn more per hour than analysis projects. Strategy engagements have higher rates and comparable margins. This guides how she positions herself in proposals.

Rate increases are working. H2 projects have higher average rates than H1, reflecting the mid-year rate increase she implemented. The market has accepted the new pricing.


Techniques Used in This Case Study

Technique Where Used
Derived columns from arithmetic revenue, total_cost, gross_margin, margin_pct
.merge() with how="left" Joining project data with client master
.groupby() with named .agg() Client portfolio summary, type comparison, quarterly trend
.nlargest() Top 3 clients by revenue
.pct_change() Quarter-over-quarter growth
.apply() with lambda H1/H2 period tagging
.sort_values() Ranking and ordering results
Computed .pct_change() Q-o-Q growth
pd.to_period() Grouping by quarter and month

Discussion Questions

  1. Maya uses projects["labor_cost"] = projects["hours_billed"] * MAYA_COST_RATE with a constant rate. In reality, some projects are more cognitively demanding. How might you model a variable cost rate? What data would you need?

  2. The top-3 client analysis uses .nlargest(3, "total_revenue"). What would happen if two clients were tied for third place? How does pandas handle ties in .nlargest()?

  3. Maya recalculates blended_margin_pct from totals (total_margin / total_revenue) rather than averaging the per-project margin_pct values. Why is this more accurate? Can you construct a simple example that shows the difference?

  4. How would you build a pivot table showing Maya's revenue by client_type (rows) and quarter (columns)?

  5. Maya wants to know whether she tends to get better margins on later projects with the same client (the "learning curve" effect). How would you add a project_sequence column (1 for first project with that client, 2 for second, etc.) and then analyze margin by sequence number?