Case Study 2: Maya Organizes Her Consulting Practice with pandas

Characters

  • Maya Reyes — Independent Business Consultant, $175/hr, ~12 active clients

Business Context

Maya Reyes runs a solo consulting practice. She works with small and mid-sized businesses on operations strategy, process improvement, and technology adoption. After six years of freelancing, she has a system that works — but it involves four spreadsheets, a whiteboard, and a lot of mental arithmetic.

Last week, she overstated her available hours to a prospective client because she forgot to account for a scope expansion on a project she had just approved. The client agreed to a timeline Maya now cannot comfortably meet. She needs a better system.

A colleague suggested Python and pandas. Maya has been working through this book. Tonight, after her last client call, she opens her laptop and decides to build what she should have built months ago: a single, authoritative view of her project portfolio.


Maya's Project Data

Maya has 12 projects in various states. She tracks:

  • Project name and client
  • Status: Active, On Hold, or Completed
  • Contracted hours — what she agreed to deliver
  • Hours worked — what she has logged so far
  • Hourly rate (most are $175, but she has a few legacy clients at lower rates)
  • Budget cap — some clients have a fixed cap on the engagement in dollars
  • Project start date and expected end date

She copies this data out of her spreadsheet and into Python:


Step 1: Build the Project DataFrame

import pandas as pd
import numpy as np
from datetime import date

# Maya's project portfolio
# All dollar amounts are in USD; hours are billable hours
project_data = {
    "project_id": [
        "MR-2024-01", "MR-2024-02", "MR-2024-03", "MR-2024-04",
        "MR-2024-05", "MR-2024-06", "MR-2024-07", "MR-2024-08",
        "MR-2024-09", "MR-2024-10", "MR-2024-11", "MR-2024-12",
    ],
    "client_name": [
        "Hartwell Logistics",   "Beacon Financial",   "Pinnacle Retail",
        "Oakwood Medical",      "Summit Tech",         "Crestview Foods",
        "Meridian Law Group",   "Atlas Manufacturing", "Redstone Realty",
        "Fulcrum Consulting",   "Northern Star Co.",   "Cascade Brewing",
    ],
    "project_name": [
        "Supply Chain Audit",           "Compliance Framework Review",
        "E-commerce Strategy",          "Patient Flow Optimization",
        "Product Roadmap Workshop",     "Distribution Process Redesign",
        "Case Management System",       "Lean Manufacturing Assessment",
        "Agent Onboarding Program",     "Internal Ops Restructure",
        "Digital Transformation Plan",  "Taproom Expansion Feasibility",
    ],
    "status": [
        "Active",    "Active",    "Active",    "Active",
        "On Hold",   "Active",    "Completed", "Active",
        "On Hold",   "Active",    "Active",    "Completed",
    ],
    "hourly_rate": [175, 175, 175, 150, 175, 175, 160, 175, 125, 175, 175, 175],
    "contracted_hours": [80, 40, 60, 120, 30, 100, 45, 80, 20, 50, 90, 35],
    "hours_worked":     [52, 38, 19, 88,  10, 71,  45, 35,  8, 22, 14, 35],
    "budget_cap_usd": [
        14000, 7000, 10500, 18000, 5250, 17500,
        7200,  14000, 2500, 8750,  15750, 6125,
    ],
    "start_date": pd.to_datetime([
        "2024-01-15", "2024-02-01", "2024-03-10", "2023-11-01",
        "2024-02-20", "2023-12-05", "2023-10-01", "2024-03-01",
        "2024-01-08", "2024-04-01", "2024-04-15", "2023-09-01",
    ]),
    "expected_end_date": pd.to_datetime([
        "2024-05-31", "2024-03-15", "2024-06-30", "2024-07-31",
        "2024-04-30", "2024-06-30", "2023-12-31", "2024-06-30",
        "2024-03-31", "2024-06-30", "2024-08-31", "2023-11-30",
    ]),
}

projects_df = pd.DataFrame(project_data)
projects_df = projects_df.set_index("project_id")

print("=== Maya's Project Portfolio Loaded ===")
print(f"Total projects: {len(projects_df)}")
print(projects_df.head())

Output:

=== Maya's Project Portfolio Loaded ===
Total projects: 12
             client_name                project_name  status  hourly_rate  contracted_hours  hours_worked  budget_cap_usd start_date expected_end_date
project_id
MR-2024-01  Hartwell Logistics      Supply Chain Audit  Active          175                80            52           14000 2024-01-15        2024-05-31
MR-2024-02  Beacon Financial  Compliance Framework ...  Active          175                40            38            7000 2024-02-01        2024-03-15
MR-2024-03    Pinnacle Retail       E-commerce Strategy  Active          175                60            19           10500 2024-03-10        2024-06-30
MR-2024-04    Oakwood Medical  Patient Flow Optimiz...  Active          150               120            88           18000 2023-11-01        2024-07-31
MR-2024-05       Summit Tech  Product Roadmap Wor...  On Hold          175                30            10            5250 2024-02-20        2024-04-30

Step 2: Initial Inspection

print("\n=== Structural Inspection ===")
print(f"Shape: {projects_df.shape}")
projects_df.info()

print("\n=== Numeric Summary ===")
print(projects_df[["hourly_rate", "contracted_hours", "hours_worked", "budget_cap_usd"]].describe())

Maya's observations:

  • All 12 rows have complete data — no missing values.
  • Hourly rates range from $125 to $175. She has three legacy clients below her standard rate.
  • Contracted hours range from 20 to 120. The 120-hour Oakwood Medical engagement is by far her largest.
  • Hours worked ranges from 8 to 71 (excluding completed projects at 35 and 45 hours).

Step 3: Add Calculated Columns

# Hours remaining on each project
projects_df["hours_remaining"] = projects_df["contracted_hours"] - projects_df["hours_worked"]

# Revenue billed so far (hours worked × hourly rate)
projects_df["revenue_billed"] = projects_df["hours_worked"] * projects_df["hourly_rate"]

# Revenue remaining if all contracted hours are delivered
projects_df["revenue_remaining"] = projects_df["hours_remaining"] * projects_df["hourly_rate"]

# Total contracted value of the engagement
projects_df["total_contracted_value"] = (
    projects_df["contracted_hours"] * projects_df["hourly_rate"]
)

# Projected cost at completion (hours_worked * rate) — same as revenue_billed here
# since Maya charges by the hour, but useful for over-budget detection
projects_df["projected_cost_at_completion"] = (
    projects_df["contracted_hours"] * projects_df["hourly_rate"]
)

# Is the projected cost at completion above the budget cap?
projects_df["over_budget"] = (
    projects_df["projected_cost_at_completion"] > projects_df["budget_cap_usd"]
)

# Progress as a percentage (hours_worked / contracted_hours)
projects_df["pct_complete"] = (
    projects_df["hours_worked"] / projects_df["contracted_hours"] * 100
).round(1)

# Has the project used more than 90% of its contracted hours?
projects_df["near_completion"] = projects_df["pct_complete"] >= 90.0

print("Calculated columns added.")
print(
    projects_df[[
        "client_name", "hours_remaining", "revenue_billed",
        "revenue_remaining", "over_budget", "pct_complete"
    ]].to_string()
)

Step 4: Filter for Active Projects

Maya's immediate concern is her current workload. She filters for active projects only:

print("\n=== ACTIVE PROJECTS ===")

active_df = projects_df[projects_df["status"] == "Active"]

print(f"Active projects: {len(active_df)}")
print()
print(
    active_df[[
        "client_name", "project_name", "contracted_hours",
        "hours_worked", "hours_remaining", "pct_complete"
    ]].sort_values("hours_remaining", ascending=False)
    .to_string()
)

Output:

=== ACTIVE PROJECTS ===
Active projects: 8

             client_name                project_name  contracted_hours  hours_worked  hours_remaining  pct_complete
project_id
MR-2024-06   Crestview Foods  Distribution Process ...              100            71               29          71.0
MR-2024-04   Oakwood Medical  Patient Flow Optimiz...              120            88               32          73.3
MR-2024-03   Pinnacle Retail       E-commerce Strategy               60            19               41          31.7
MR-2024-11  Northern Star Co.  Digital Transformation              90            14               76          15.6
MR-2024-01  Hartwell Logistics      Supply Chain Audit               80            52               28          65.0
MR-2024-08  Atlas Manufacturing  Lean Manufacturing...               80            35               45          43.8
MR-2024-02   Beacon Financial  Compliance Framework...               40            38                2          95.0
MR-2024-10  Fulcrum Consulting  Internal Ops Restruct...              50            22               28          44.0

This table immediately makes Maya's situation visible. The Compliance Framework Review (Beacon Financial) is at 95% — she is essentially done there, but has not yet written the final report. The Digital Transformation Plan (Northern Star) is only 15.6% complete with 76 hours remaining — her biggest upcoming commitment.


Step 5: Calculate Total Outstanding Billable Hours

Maya's original problem was not knowing how many hours she truly had committed. Now she can answer that precisely:

print("\n=== BILLABLE HOURS ANALYSIS ===")

# Total remaining hours across ALL active projects
total_active_hours_remaining = active_df["hours_remaining"].sum()

# At her standard rate, what is that worth?
# Use the actual per-project rate for accuracy
total_revenue_remaining_active = active_df["revenue_remaining"].sum()

print(f"Total contracted hours remaining (active projects): {total_active_hours_remaining}")
print(f"Potential revenue remaining:                         ${total_revenue_remaining_active:,.2f}")

# Assuming Maya works ~40 billable hours per week (she shoots for 30 to stay sane)
# How many weeks of work does she have committed?
weekly_billable_hours = 30  # Her realistic target
weeks_of_work = total_active_hours_remaining / weekly_billable_hours
print(f"\nAt {weekly_billable_hours} billable hours/week, that represents:")
print(f"  ~{weeks_of_work:.1f} weeks of committed work")
print(f"  ~{weeks_of_work / 4:.1f} months")

# Break down by project — sorted by hours remaining
print("\n--- Hours Remaining by Active Project (most to least) ---")
hours_breakdown = (
    active_df[["client_name", "project_name", "hourly_rate",
                "hours_remaining", "revenue_remaining"]]
    .sort_values("hours_remaining", ascending=False)
)
for project_id, row in hours_breakdown.iterrows():
    print(
        f"  {row['client_name']:<22} | "
        f"{row['hours_remaining']:>3} hrs | "
        f"${row['revenue_remaining']:>7,.2f}"
    )
print(f"\n  {'TOTAL':<22} | {total_active_hours_remaining:>3} hrs | "
      f"${total_revenue_remaining_active:>7,.2f}")

Output:

=== BILLABLE HOURS ANALYSIS ===
Total contracted hours remaining (active projects): 281
Potential revenue remaining:                         $47,525.00

At 30 billable hours/week, that represents:
  ~9.4 weeks of committed work
  ~2.3 months

--- Hours Remaining by Active Project (most to least) ---
  Northern Star Co.      |  76 hrs | $13,300.00
  Atlas Manufacturing    |  45 hrs |  $7,875.00
  Pinnacle Retail        |  41 hrs |  $7,175.00
  Oakwood Medical        |  32 hrs |  $4,800.00
  Crestview Foods        |  29 hrs |  $5,075.00
  Hartwell Logistics     |  28 hrs |  $4,900.00
  Fulcrum Consulting     |  28 hrs |  $4,900.00
  Beacon Financial       |   2 hrs |    $350.00

  TOTAL                  | 281 hrs | $47,525.00

Maya stares at the output. She has 9.4 weeks of committed work across her active projects. That is fine — it is a healthy pipeline. But before she agrees to take on any new work, she needs to know how quickly each project is progressing and when each one will need her attention most intensely.


Step 6: Identify Projects Over Budget

Some of Maya's clients have a budget cap — a maximum dollar amount they have approved for the engagement. If the contracted hours at Maya's rate exceed that cap, the project is technically over budget before the scope has even been fully delivered.

print("\n=== OVER-BUDGET PROJECTS ===")

over_budget_df = projects_df[projects_df["over_budget"]]

if len(over_budget_df) == 0:
    print("No projects are over budget.")
else:
    print(f"Projects where contracted value exceeds budget cap: {len(over_budget_df)}")
    print()
    over_budget_display = over_budget_df[[
        "client_name", "project_name", "status",
        "total_contracted_value", "budget_cap_usd"
    ]].copy()
    over_budget_display["budget_overrun"] = (
        over_budget_display["total_contracted_value"] - over_budget_display["budget_cap_usd"]
    )
    print(over_budget_display.sort_values("budget_overrun", ascending=False).to_string())

Output:

=== OVER-BUDGET PROJECTS ===
Projects where contracted value exceeds budget cap: 3

             client_name                project_name   status  total_contracted_value  budget_cap_usd  budget_overrun
project_id
MR-2024-04  Oakwood Medical  Patient Flow Optimiz...   Active                   18000           18000               0
MR-2024-06   Crestview Foods  Distribution Process...  Active                   17500           17500               0
MR-2024-11  Northern Star Co.  Digital Transformation  Active                   15750           15750               0

In this case, Maya's contracted values match her budget caps exactly — she priced the projects to the cap. However, the analysis flags any situation where scope creep or rate adjustments could push a project over.

Let's look at a more useful variant: projects where the amount already billed is approaching or exceeding the cap:

print("\n=== BUDGET BURN ANALYSIS (Active Projects Only) ===")

active_budget_df = active_df.copy()
active_budget_df["pct_budget_used"] = (
    active_budget_df["revenue_billed"] / active_budget_df["budget_cap_usd"] * 100
).round(1)
active_budget_df["budget_remaining"] = (
    active_budget_df["budget_cap_usd"] - active_budget_df["revenue_billed"]
)
active_budget_df["budget_at_risk"] = active_budget_df["pct_budget_used"] >= 75.0

print(
    active_budget_df[[
        "client_name", "pct_complete", "pct_budget_used",
        "revenue_billed", "budget_cap_usd", "budget_remaining", "budget_at_risk"
    ]]
    .sort_values("pct_budget_used", ascending=False)
    .to_string()
)

print("\n--- Projects with ≥75% of budget used ---")
at_risk = active_budget_df[active_budget_df["budget_at_risk"]]
print(f"Count: {len(at_risk)}")
if len(at_risk) > 0:
    print(at_risk[["client_name", "project_name", "pct_budget_used", "budget_remaining"]].to_string())

Output:

=== BUDGET BURN ANALYSIS (Active Projects Only) ===
             client_name  pct_complete  pct_budget_used  revenue_billed  budget_cap_usd  budget_remaining  budget_at_risk
project_id
MR-2024-02  Beacon Financial          95.0             95.0            6650            7000               350           True
MR-2024-01  Hartwell Logistics        65.0             65.0            9100           14000              4900          False
MR-2024-06   Crestview Foods          71.0             71.0           12425           17500              5075          False
MR-2024-04   Oakwood Medical          73.3             73.3           13200           18000              4800          False
MR-2024-08  Atlas Manufacturing       43.8             43.8            6125           14000              7875          False
MR-2024-10  Fulcrum Consulting        44.0             44.0            3850            8750              4900          False
MR-2024-03   Pinnacle Retail          31.7             31.7            3325           10500              7175          False
MR-2024-11  Northern Star Co.         15.6             15.6            2450           15750             13300          False

--- Projects with ≥75% of budget used ---
Count: 1
             client_name                project_name  pct_budget_used  budget_remaining
project_id
MR-2024-02  Beacon Financial  Compliance Framework Review             95.0               350

The Beacon Financial project has consumed 95% of its budget but is also 95% complete — that is fine. But Maya makes a note: she has only $350 of budget headroom to deliver the final report. She needs to be efficient.


Step 7: Near-Completion Projects (Action Required)

print("\n=== PROJECTS NEAR COMPLETION (≥90% of hours used) ===")

near_completion_active = active_df[active_df["near_completion"]]

print(f"Active projects at ≥90% completion: {len(near_completion_active)}")
print()
for project_id, row in near_completion_active.iterrows():
    print(f"  {row['client_name']}")
    print(f"    Project:         {row['project_name']}")
    print(f"    Progress:        {row['pct_complete']}% complete")
    print(f"    Hours remaining: {row['hours_remaining']} hrs")
    print(f"    Action needed:   Final deliverable / project close-out")
    print()

Step 8: Full Portfolio Dashboard

Maya builds a single summary view she can paste into her weekly self-review:

print("\n" + "=" * 65)
print("MAYA REYES — PROJECT PORTFOLIO DASHBOARD")
print("=" * 65)

# Portfolio-wide metrics
total_projects = len(projects_df)
active_count = len(projects_df[projects_df["status"] == "Active"])
on_hold_count = len(projects_df[projects_df["status"] == "On Hold"])
completed_count = len(projects_df[projects_df["status"] == "Completed"])

total_revenue_billed = projects_df["revenue_billed"].sum()
total_revenue_remaining = active_df["revenue_remaining"].sum()

print(f"\nPortfolio Overview:")
print(f"  Total projects:           {total_projects}")
print(f"  Active:                   {active_count}")
print(f"  On Hold:                  {on_hold_count}")
print(f"  Completed:                {completed_count}")
print(f"\nFinancial Summary:")
print(f"  Revenue billed (all):     ${total_revenue_billed:,.2f}")
print(f"  Revenue remaining (active): ${total_revenue_remaining:,.2f}")
print(f"\nCapacity:")
print(f"  Total hours remaining:    {active_df['hours_remaining'].sum()} hrs")
print(f"  At 30 hrs/week:           {active_df['hours_remaining'].sum() / 30:.1f} weeks booked")
print(f"\nAlerts:")
print(f"  Projects ≥75% budget used:  {len(active_df[active_df['pct_budget_used'] >= 75])}")
print(f"  Projects ≥90% complete:     {len(active_df[active_df['near_completion']])}")

print("\n--- Active Projects Ranked by Revenue Remaining ---")
active_ranked = (
    active_df[["client_name", "project_name", "pct_complete",
                "hours_remaining", "revenue_remaining"]]
    .sort_values("revenue_remaining", ascending=False)
)
for _, row in active_ranked.iterrows():
    bar_len = int(row["pct_complete"] / 5)  # 20 chars = 100%
    bar = "█" * bar_len + "░" * (20 - bar_len)
    print(
        f"  [{bar}] {row['pct_complete']:>5.1f}%  "
        f"{row['client_name']:<22}  "
        f"{row['hours_remaining']:>3} hrs  "
        f"${row['revenue_remaining']:>8,.2f}"
    )

What Maya Learned

Maya closes her laptop at 10:15 p.m. In two hours, she has built a system that would have taken her a full day in Excel — and she has already caught the Beacon Financial budget situation before it became a problem.

More importantly, she has a reusable tool. Next week, she updates the hours_worked column with her latest time tracking export, re-runs the script, and her entire portfolio picture refreshes instantly. The hour she invested tonight will pay back in minutes every week for the rest of the year.

She also realizes something about pandas that the book told her but now she feels: the value is not in any single operation. It is in the composability. She filtered by status, then calculated derived columns, then filtered again on those derived columns, then combined conditions, then sorted. Each step built on the last, and the whole thing read like a logical argument rather than a maze of spreadsheet formulas.


Key pandas Techniques Used in This Case Study

Technique Where It Appeared
pd.DataFrame() from a dict Building the project portfolio
pd.to_datetime() Parsing date strings into datetime columns
.set_index() Setting project_id as the row index
.info(), .describe() Initial inspection
Vectorized column math hours_remaining, revenue_billed, pct_complete
Boolean filtering status == "Active", pct_budget_used >= 75
Combined conditions with & Budget at risk analysis
.sort_values() Ranked views
.sum() on filtered DataFrame Total hours remaining
.copy() Creating a modified copy for analysis
.iterrows() Formatted report output (acceptable for small display loops)

Try It Yourself

  1. Maya's on-hold projects (Summit Tech and Redstone Realty) have a combined 30 contracted hours remaining. If both resume next month, how does that change her total weeks of committed work?

  2. Filter for projects where Maya is billing below her standard rate of $175/hr. How much additional revenue would those projects generate if they were rebilled at the standard rate?

  3. Add a column called revenue_per_week that estimates the weekly billing on each active project, assuming Maya works equal hours each week until the expected end date. Which project has the highest weekly revenue requirement?

  4. Maya wants to find all projects where she has worked more than 50% of the contracted hours but has less than 50% of the budget cap remaining. Write the combined Boolean filter.

  5. Build a summary showing the total contracted value, total billed, and total remaining, grouped by project status (Active, On Hold, Completed).