Case Study 15-2: Maya's Revenue Heatmap and Hours-vs-Revenue Scatter

Background

Maya Reyes runs her consulting practice with a mix of retainer clients (fixed monthly fee), hourly clients (billed at $175/hr), and occasional fixed-fee project clients (charged a flat amount regardless of hours). After three years in business, she has 12 active or recently-completed clients across five service categories: Strategy, Operations, Financial Modeling, Marketing, and Training.

Every month, Maya spends about two hours extracting numbers from her project tracker into a rough Excel pivot table, staring at it, and trying to make sense of her portfolio. Which clients are generating the most revenue? Which months are slow? Are her hourly clients actually more profitable than her retainer clients when she accounts for actual hours?

In Chapter 9 she started tracking projects in maya_projects.csv. By Chapter 13 she had clean aggregation functions. This chapter: she visualizes the data.

She opens her terminal and installs seaborn, which she hasn't used yet:

pip install seaborn

Then she starts with the question she asks herself most often: "Which clients gave me money in which months?"

Part 1: The Client × Month Revenue Heatmap

Building the Data

Maya's maya_projects.csv has one row per project. For the heatmap, she needs a pivot: rows = clients, columns = months, values = revenue invoiced that month.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# Load project data
projects_df = pd.read_csv("maya_projects.csv", parse_dates=["invoice_date"])

# Keep only invoiced or completed projects with a known invoice date
billed_df = projects_df[
    projects_df["status"].isin(["Invoiced", "Completed"])
    & projects_df["invoice_date"].notna()
].copy()

# Extract month label for pivot
billed_df["month"] = billed_df["invoice_date"].dt.strftime("%b %Y")

# Build pivot: clients × months
revenue_pivot = billed_df.pivot_table(
    index="client_name",
    columns="month",
    values="billed_amount",
    aggfunc="sum",
    fill_value=0,
)

# Sort columns chronologically
all_months = pd.date_range("2023-01-01", "2023-12-01", freq="MS")
month_labels = [m.strftime("%b %Y") for m in all_months]
revenue_pivot = revenue_pivot.reindex(columns=month_labels, fill_value=0)

# Sort rows by total annual revenue (largest clients at top)
revenue_pivot["_total"] = revenue_pivot.sum(axis=1)
revenue_pivot = revenue_pivot.sort_values("_total", ascending=False).drop(
    columns="_total"
)

print(revenue_pivot.shape)   # Should be (12 clients, 12 months)

The fill_value=0 is important. Many client-month combinations have no revenue — the client exists but wasn't billed that month. Without the fill, those cells would be NaN, which seaborn's heatmap would render as a gray "missing data" patch rather than a meaningful zero.

Drawing the Heatmap

sns.set_theme(style="white", font_scale=0.95)

fig, ax = plt.subplots(figsize=(16, 7))

# Build annotation matrix in thousands
annot_matrix = revenue_pivot.map(
    lambda v: f"${v / 1000:.1f}K" if v > 0 else "—"
)

sns.heatmap(
    data=revenue_pivot,
    annot=annot_matrix,
    fmt="",
    cmap="YlOrRd",        # Yellow (low) → orange → red (high)
    linewidths=0.8,
    linecolor="white",
    cbar_kws={
        "label": "Monthly Revenue ($)",
        "shrink": 0.75,
        "format": "${x:,.0f}",
    },
    ax=ax,
)

ax.set_title(
    "Revenue by Client × Month — 2023",
    fontsize=15,
    fontweight="bold",
    pad=14,
)
ax.set_ylabel("Client", labelpad=10)
ax.set_xlabel("Month", labelpad=10)
ax.tick_params(axis="x", rotation=30)
ax.tick_params(axis="y", rotation=0)

# Add total revenue column annotation on the right
for i, (client, row) in enumerate(revenue_pivot.iterrows()):
    total = row.sum()
    ax.text(
        x=12.3,
        y=i + 0.5,
        s=f"${total / 1000:.0f}K",
        va="center",
        ha="left",
        fontsize=9,
        color="#333333",
    )

ax.text(12.3, -0.35, "Total", ha="left", va="top", fontsize=9, fontweight="bold")

plt.tight_layout()
plt.savefig("maya_client_revenue_heatmap.png", dpi=150, bbox_inches="tight")
plt.show()

What Maya Saw

The heatmap revealed three things immediately.

First: one client, Harrington & Associates, had zero revenue in April, May, and June. Maya had mentally categorized them as a "steady" client. Looking at the heatmap, she remembered — there had been a contract renewal delay. Those three gray dashes cost her nearly $16,000 in delayed revenue. She hadn't registered that as a pattern before; she had registered it as three separate months of "things were slow."

Second: her two largest clients by annual total were concentrated in Q4. That explained why November and December always felt flush and February felt dire. The heatmap made the seasonal pattern visible in a way her month-by-month Excel review never had.

Third: four smaller clients had sporadic revenue — appearing in only two or three months of the year. Maya had been treating these as "active clients." Looking at the red-and-yellow heat map, they barely registered. They were project-based relationships masquerading as ongoing ones. Time to re-classify them — and think about whether to actively pursue repeat work or let them stay dormant.

She saved the chart as maya_client_revenue_heatmap.png and uploaded it to her business review folder.

Part 2: Hours vs Revenue Scatter by Project Type

The second question Maya had was subtler: are my fixed-fee projects actually more profitable than my hourly projects on a per-hour basis?

Her hypothesis: fixed-fee projects often run over the estimated hours because scope creep is common. If she charges $12,000 fixed and spends 90 hours instead of 60, her effective rate is $133/hr — significantly below her $175/hr standard. The heatmap didn't address this. A scatter plot would.

Building the Data

# Use project-level data — each row is one project
project_scatter_df = projects_df[
    projects_df["status"].isin(["Invoiced", "Completed"])
    & projects_df["actual_hours"].notna()
    & projects_df["billed_amount"].notna()
].copy()

# Compute effective hourly rate
project_scatter_df["effective_rate"] = (
    project_scatter_df["billed_amount"] / project_scatter_df["actual_hours"]
)

# Label billing type
project_scatter_df["billing_type"] = project_scatter_df["billing_type"].map({
    "hourly": "Hourly ($175/hr)",
    "retainer": "Monthly Retainer",
    "fixed": "Fixed Fee",
})

print(project_scatter_df[
    ["project_name", "actual_hours", "billed_amount", "effective_rate", "billing_type"]
].head(10))

Drawing the Interactive Scatter with plotly

Maya wanted to share this chart with her business coach, David, who was going to review her Q4 strategy with her. A static seaborn scatter plot would have worked for internal analysis, but David had asked for something he could look at on his tablet during their call. She switched to plotly for this chart.

import plotly.express as px

fig = px.scatter(
    project_scatter_df,
    x="actual_hours",
    y="billed_amount",
    color="billing_type",
    hover_data={
        "project_name": True,
        "client_name": True,
        "effective_rate": ":$.2f",
        "actual_hours": ":.1f",
        "billed_amount": ":$,.0f",
        "billing_type": False,   # Already encoded by color
    },
    title="Hours vs Revenue by Project Type — Maya Reyes Consulting (2023)",
    labels={
        "actual_hours": "Actual Hours Spent",
        "billed_amount": "Revenue Billed ($)",
        "billing_type": "Billing Type",
    },
    color_discrete_map={
        "Hourly ($175/hr)": "#1565C0",
        "Monthly Retainer": "#2E7D32",
        "Fixed Fee": "#B71C1C",
    },
    size="effective_rate",       # Dot size = effective hourly rate
    size_max=18,
    opacity=0.75,
)

# Add a reference line at $175/hr (standard rate)
# If a dot lies on this line, the project hit exactly the standard rate
import numpy as np
max_hours = project_scatter_df["actual_hours"].max() * 1.05
x_ref = np.array([0, max_hours])
y_ref = 175 * x_ref

fig.add_scatter(
    x=x_ref,
    y=y_ref,
    mode="lines",
    name="$175/hr Reference",
    line={"color": "#9E9E9E", "dash": "dot", "width": 1.5},
    hoverinfo="skip",
)

fig.add_annotation(
    x=max_hours * 0.85,
    y=175 * max_hours * 0.85 + 1500,
    text="$175/hr rate",
    showarrow=False,
    font={"size": 10, "color": "#9E9E9E"},
)

fig.update_layout(
    xaxis_title="Actual Hours Spent",
    yaxis_title="Revenue Billed ($)",
    yaxis_tickformat="$,.0f",
    hovermode="closest",
    plot_bgcolor="white",
    paper_bgcolor="#FAFAFA",
    legend={
        "title": "Billing Type",
        "orientation": "v",
        "x": 1.01,
        "y": 0.5,
    },
    font={"family": "Arial", "size": 12},
    height=580,
    width=900,
)

fig.write_html(
    "maya_hours_vs_revenue.html",
    include_plotlyjs=True,
    config={"responsive": True},
)
fig.show()

The $175/hr Reference Line

The reference line is the key design decision in this chart. Without it, the scatter plot shows clusters and trends. With it, every dot above the line represents a project where Maya earned more than $175/hr effective; every dot below is one where she earned less.

Adding a reference line to a plotly figure uses fig.add_scatter() with mode="lines". This adds a new trace to the figure that happens to be a line rather than points. The line is computed manually from two points — (0, 0) and (max_hours, 175 × max_hours) — which is just the equation rate = 175.

The hoverinfo="skip" prevents the reference line from generating hover tooltips. It's a visual reference, not a data series.

What the Scatter Revealed

Maya looked at the final chart for a long time before calling David.

The hourly projects clustered tightly around the reference line, as expected — hourly billing is straightforward. Most fell between $160 and $195 per effective hour, reflecting small rounding and scope adjustments.

The retainer projects were scattered above the line. A good retainer client is one where the work scope is predictable and the client doesn't over-consume. Three of her retainer clients consistently generated projects above the line. Two were below — those clients called frequently, requested revisions, and generally consumed more time than the retainer fee justified.

The fixed-fee projects were all over the chart. Two were substantially above the line (well-scoped, delivered quickly). Four were meaningfully below it, including one — a training curriculum she'd spent 68 hours on for a $7,200 fixed fee — that had barely cleared $106/hr. That project had seemed well-priced at the time. The scatter plot said otherwise.

"David," Maya said when he called, "I think I have a fixed-fee pricing problem."

She shared her screen and walked him through the chart. By the end of the call she had decided to raise her fixed-fee minimum and build a more rigorous scoping process. The insight from the scatter plot directly influenced how she priced her next three proposals.

Seaborn vs Plotly: Maya's Workflow

By the end of this analysis, Maya had settled into a natural rhythm with the two libraries:

Task	Tool	Why
Quick exploratory look at distributions	seaborn	Faster to write, good defaults
Chart for internal documentation	seaborn → PNG	Clean, publication-quality static image
Chart to share with client or advisor	plotly → HTML	Interactive hover; they can explore it themselves
Chart for a printed report or PDF	seaborn or matplotlib	Static formats for static output

The heatmap was internal — she was the only audience. seaborn was right for that. The scatter plot was going to David, who would explore it on his own. plotly was right for that.

Knowing which tool fits which situation is more valuable than being expert in either one alone.

This case study demonstrates: pivot_table with fill_value, sns.heatmap with custom annotation text, the YlOrRd color map, px.scatter with hover_data, size encoding, fig.add_scatter() for reference lines, hoverinfo="skip", and the workflow decision between seaborn and plotly.