Case Study 26-2: Maya Forecasts Her Revenue to Make a Major Business Decision

Characters: Maya Reyes (Freelance Business Consultant, 3 years in business)

Business Question: What is Maya's revenue likely to be over the next 6 months?

Decision at Stake: Whether to sign a 12-month lease on a coworking office space at $1,200/month — a $14,400 annual commitment.

The Situation

Maya Reyes had been running her consulting practice from home for three years. Her revenue had grown, her client roster was stable at 9 active clients, and she was starting to feel the friction of working from her spare bedroom. A coworking space two miles from her house had opened a new location and was offering a discounted 12-month membership.

The cost: $1,200/month for a dedicated desk.

Maya had the money to sign today. The question was whether her revenue trajectory over the next 12 months justified the fixed cost — particularly over the next 6 months, which would determine whether she had the flexibility to grow her team or needed to stay lean.

She opened maya_projects.csv and her billing records and started the analysis.

Maya's Data

Maya's data included two sources:

maya_projects.csv: Active and recently completed projects with completion percentages, hourly rates, and deadlines
Historical billing data: 30 months of monthly revenue she had tracked in a spreadsheet

Her billing history showed:

- A clear upward trend over 30 months
- Some seasonality: slower in December–January (holiday client slowdowns), stronger in March–April (Q1 planning work)
- High month-to-month variability: consulting revenue is lumpy by nature
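One way to check a seasonality claim like this is to remove a straight-line trend and average what is left by calendar month. The sketch below is illustrative only: the `history` series is made up (it is not Maya's billing data), and `fit_line` and `seasonal_profile` are hypothetical helper names.

```python
from collections import defaultdict
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def seasonal_profile(history):
    """
    Average detrended revenue by calendar month.

    history: list of ("YYYY-MM", revenue) tuples in chronological order.
    Returns {month_number: mean residual}; negative means below trend.
    """
    xs = list(range(len(history)))
    ys = [revenue for _, revenue in history]
    slope, intercept = fit_line(xs, ys)

    by_month = defaultdict(list)
    for x, (label, revenue) in zip(xs, history):
        month_number = int(label.split("-")[1])
        by_month[month_number].append(revenue - (slope * x + intercept))
    return {m: round(mean(r), 2) for m, r in sorted(by_month.items())}


# Made-up 24-month series: steady growth, a dip every December and
# January, and a bump every March and April.
history = []
for i in range(24):
    year, month = 2022 + i // 12, i % 12 + 1
    bump = {12: -2000, 1: -2000, 3: 1500, 4: 1500}.get(month, 0)
    history.append((f"{year}-{month:02d}", 10000 + 100 * i + bump))

profile = seasonal_profile(history)
for month_number, residual in profile.items():
    print(f"Month {month_number:2d}: {residual:+,.0f} vs. trend")
```

With only two or three observations per calendar month, a profile like this is suggestive rather than conclusive, which is one reason to treat the seasonal pattern as context and not build it into the trend model.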

The Two-Track Forecast

Maya recognized that she had two independent signals she could combine:

Signal 1: Historical trend extrapolation — what does her 30-month revenue history say about the next 6 months?

Signal 2: Pipeline-based forecast — what does her current project list plus historical new business win rate say about the next 6 months?

"""
maya_revenue_forecast.py
Two-track forecast: historical trend + pipeline-based projection
"""

import pandas as pd
import numpy as np
from scipy import stats
from datetime import date, datetime
from dateutil.relativedelta import relativedelta
import matplotlib.pyplot as plt


# ---------------------------------------------------------------------------
# Track 1: Historical Trend Extrapolation
# ---------------------------------------------------------------------------

def load_billing_history(csv_path: str) -> pd.DataFrame:
    """
    Load Maya's monthly billing history.

    Expected columns: month (YYYY-MM), revenue.

    Args:
        csv_path: Path to historical billing CSV.

    Returns:
        pd.DataFrame: Sorted billing history with datetime month column.
    """
    df = pd.read_csv(csv_path)
    df["month"] = pd.to_datetime(df["month"])
    df = df.sort_values("month").reset_index(drop=True)
    return df


def historical_trend_forecast(
    billing_df: pd.DataFrame,
    months_ahead: int = 6,
    confidence_level: float = 0.90,
) -> dict:
    """
    Forecast revenue based on linear trend in historical billing data.

    Uses scipy.stats.linregress for the trend and computes confidence
    bands based on historical residuals.

    Args:
        billing_df: DataFrame with 'month' and 'revenue' columns.
        months_ahead: Number of future months to forecast.
        confidence_level: Confidence level for bands (default 0.90 = 90%).

    Returns:
        dict with 'trend_stats' (r_squared, monthly_growth_pct, p_value,
        residual_std) and 'forecast_months' (one dict per forecast month).
    """
    from scipy.stats import norm

    ordinals = billing_df["month"].map(pd.Timestamp.toordinal).values
    revenues = billing_df["revenue"].values

    slope, intercept, r_value, p_value, std_err = stats.linregress(ordinals, revenues)

    fitted = slope * ordinals + intercept
    residuals = revenues - fitted
    residual_std = float(np.std(residuals, ddof=1))

    z = norm.ppf((1 + confidence_level) / 2)
    avg_revenue = float(revenues.mean())
    monthly_growth = (slope * 30.44) / avg_revenue * 100  # ~30.44 days per month

    last_month = billing_df["month"].max()
    forecast_months = []

    for i in range(1, months_ahead + 1):
        future_month = last_month + relativedelta(months=i)
        future_ordinal = future_month.toordinal()
        point = slope * future_ordinal + intercept
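        # Heuristic: widen the band with sqrt(i), as for a random walk. This is
        # not the exact prediction interval of the fitted regression.
        margin = z * residual_std * np.sqrt(i)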

        forecast_months.append({
            "month": future_month.strftime("%Y-%m"),
            "point_forecast": round(max(0, point), 2),
            "lower_bound": round(max(0, point - margin), 2),
            "upper_bound": round(point + margin, 2),
            "track": "historical_trend",
        })

    return {
        "trend_stats": {
            "r_squared": round(r_value ** 2, 3),
            "monthly_growth_pct": round(monthly_growth, 2),
            "p_value": round(p_value, 4),
            "residual_std": round(residual_std, 2),
        },
        "forecast_months": forecast_months,
    }


# ---------------------------------------------------------------------------
# Track 2: Pipeline-Based Forecast
# ---------------------------------------------------------------------------

def pipeline_revenue_forecast(
    projects_df: pd.DataFrame,
    months_ahead: int = 6,
    new_client_monthly_probability: float = 0.4,
    new_client_avg_monthly_revenue: float = 3500.0,
) -> list[dict]:
    """
    Estimate future revenue from the current project pipeline plus expected
    new business wins.

    Args:
        projects_df: DataFrame from maya_projects.csv.
        months_ahead: Number of months to forecast.
        new_client_monthly_probability: Probability of winning at least one
                                         new client per month (default 0.40).
        new_client_avg_monthly_revenue: Expected monthly revenue from a new
                                         client engagement.

    Returns:
        List of dicts with month, committed_revenue, expected_new_revenue,
        total_estimated_revenue.
    """
    projects_df = projects_df.copy()
    projects_df["start_date"] = pd.to_datetime(projects_df["start_date"])
    projects_df["deadline"] = pd.to_datetime(projects_df["deadline"])

    today = date.today()
    results = []

    for month_offset in range(months_ahead):
        month_start = date(today.year, today.month, 1) + relativedelta(months=month_offset)
        month_end = month_start + relativedelta(months=1) - relativedelta(days=1)
        month_label = month_start.strftime("%Y-%m")

        # Projects active during this month
        active_projects = projects_df[
            (projects_df["start_date"].dt.date <= month_end)
            & (projects_df["deadline"].dt.date >= month_start)
            & (projects_df["completion_percent"] < 100)
            & (~projects_df.get("on_hold", pd.Series(False, index=projects_df.index)).astype(bool))
        ]

        # Estimate monthly revenue from active projects
        # Approach: for each project, estimate remaining hours and spread over remaining months
        committed_revenue = 0.0
        for _, project in active_projects.iterrows():
            remaining_pct = (100 - project["completion_percent"]) / 100
            total_estimated_hours = project.get("estimated_hours", 40)
            remaining_hours = remaining_pct * total_estimated_hours
            rate = project.get("hourly_rate", 175.0)

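            # Spread the remaining hours evenly from the forecast start to the
            # deadline. Measuring from the forecast start (not the current loop
            # month) keeps monthly hours constant, so the same work is not
            # re-counted at a higher rate as the deadline approaches.
            forecast_start = date(today.year, today.month, 1)
            months_remaining = max(
                1,
                (project["deadline"].date() - forecast_start).days / 30.44
            )
            monthly_hours = remaining_hours / months_remaining
            committed_revenue += monthly_hours * rate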

        # Expected new business
        expected_new = (
            new_client_monthly_probability * new_client_avg_monthly_revenue
        )

        results.append({
            "month": month_label,
            "active_projects": len(active_projects),
            "committed_revenue": round(committed_revenue, 2),
            "expected_new_revenue": round(expected_new, 2),
            "total_estimated_revenue": round(committed_revenue + expected_new, 2),
            "track": "pipeline",
        })

    return results


# ---------------------------------------------------------------------------
# Combining the Two Tracks
# ---------------------------------------------------------------------------

def build_combined_forecast(
    billing_history_path: str,
    projects_path: str,
    months_ahead: int = 6,
) -> pd.DataFrame:
    """
    Build Maya's combined 6-month forecast from both data sources.

    Args:
        billing_history_path: Path to historical billing CSV.
        projects_path: Path to maya_projects.csv.
        months_ahead: Number of months to forecast.

    Returns:
        pd.DataFrame: Month-by-month forecast combining both tracks.
    """
    billing_df = load_billing_history(billing_history_path)
    projects_df = pd.read_csv(projects_path, parse_dates=["start_date", "deadline"])

    # Track 1: Historical trend
    trend_result = historical_trend_forecast(billing_df, months_ahead=months_ahead)
    trend_months = pd.DataFrame(trend_result["forecast_months"])

    # Track 2: Pipeline
    pipeline_months = pd.DataFrame(
        pipeline_revenue_forecast(projects_df, months_ahead=months_ahead)
    )

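    # Merge by month label. The trend track starts the month after the last
    # billing record and the pipeline track starts at the current calendar
    # month, so the labels align only if billing history runs through last month.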
    combined = trend_months.merge(
        pipeline_months[["month", "committed_revenue", "expected_new_revenue",
                         "total_estimated_revenue", "active_projects"]],
        on="month",
        how="left",
    )

    # Consensus forecast: average of trend midpoint and pipeline total
    combined["consensus_forecast"] = (
        (combined["point_forecast"] + combined["total_estimated_revenue"]) / 2
    ).round(2)

    # Agreed lower/upper: use trend confidence bands as the range
    # (pipeline gives a point estimate; trend gives the uncertainty)
    return combined[
        ["month", "point_forecast", "lower_bound", "upper_bound",
         "total_estimated_revenue", "committed_revenue",
         "consensus_forecast", "active_projects"]
    ]
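
The `new_client_monthly_probability` default of 0.40 in the pipeline track is an input, not something the script estimates. A minimal sketch of how it could be grounded in data, assuming Maya keeps a count of new clients signed in each month: take the share of months with at least one win and put a Wilson score interval around it. The `new_clients_by_month` counts below are invented for illustration, and `win_rate_with_interval` is a hypothetical helper.

```python
from math import sqrt
from statistics import NormalDist


def win_rate_with_interval(monthly_wins, confidence_level=0.90):
    """
    Estimate P(at least one new client in a month) with a Wilson score interval.

    monthly_wins: list of new-client counts, one entry per month.
    Returns (point_estimate, lower, upper).
    """
    n = len(monthly_wins)
    if n == 0:
        raise ValueError("need at least one month of history")
    successes = sum(1 for wins in monthly_wins if wins >= 1)
    p_hat = successes / n

    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    denom = 1 + z**2 / n
    centre = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return p_hat, centre - half_width, centre + half_width


# Invented example: 30 months of history, 12 of them with a new client.
new_clients_by_month = [1, 0, 0, 1, 0, 2, 0, 0, 1, 0] * 3
p_hat, lower, upper = win_rate_with_interval(new_clients_by_month)
print(f"Estimated monthly win probability: {p_hat:.2f} "
      f"(90% interval {lower:.2f} to {upper:.2f})")
```

With only 30 months of data the interval is wide, so it may be worth running the pipeline forecast at both ends of the range rather than at 0.40 alone.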

Maya's Numbers

When Maya ran the analysis, she found:

Historical trend stats:

- R-squared: 0.74 (the trend explains 74% of the variance in monthly revenue, a moderate fit)
- Monthly growth rate: approximately +2.8% per month
- Residual standard deviation: $4,200 (her revenue bounces around a lot)

6-Month Forecast Summary:

Month	Trend Forecast	Pipeline Estimate	Consensus	Lower Bound	Upper Bound
Month 1	$18,400	$17,200	$17,800	$13,100	$23,700
Month 2	$18,900	$16,800	$17,850	$11,400	$26,400
Month 3	$19,400	$18,500	$18,950	$10,100	$28,700
Month 4	$19,900	$19,200	$19,550	$8,900	$30,900
Month 5	$20,400	$18,000	$19,200	$7,800	$33,000
Month 6	$20,900	$19,800	$20,350	$6,800	$35,000

The 6-month consensus total: approximately $113,700.
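The consensus column and its total can be checked by hand. The short sketch below takes each month's trend forecast and pipeline estimate from the summary table and averages them, mirroring the `consensus_forecast` calculation in `build_combined_forecast`; the variable names are illustrative.

```python
# (trend forecast, pipeline estimate) per month, copied from the summary table.
forecast_rows = [
    (18400, 17200),
    (18900, 16800),
    (19400, 18500),
    (19900, 19200),
    (20400, 18000),
    (20900, 19800),
]

# Consensus = simple average of the two tracks.
consensus = [(trend + pipeline) / 2 for trend, pipeline in forecast_rows]
six_month_total = sum(consensus)
average_month = six_month_total / len(consensus)

for number, value in enumerate(consensus, start=1):
    print(f"Month {number}: ${value:,.0f}")
print(f"6-month consensus total: ${six_month_total:,.0f}")
print(f"Average month: ${average_month:,.0f}")
```

The average month works out to $18,950, which is where the "~$19,000" base case comes from.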

The Decision Framework

Maya built a simple decision analysis on top of the forecast:

def lease_decision_analysis(
    forecast_df: pd.DataFrame,
    monthly_lease_cost: float = 1200.0,
    lease_months: int = 12,
) -> dict:
    """
    Analyze whether the lease is financially justified given the revenue forecast.

    Args:
        forecast_df: Combined forecast DataFrame.
        monthly_lease_cost: Monthly cost of the office lease.
        lease_months: Length of the lease commitment.

    Returns:
        dict with break-even analysis and scenario outcomes.
    """
    total_lease_cost = monthly_lease_cost * lease_months

    # 6-month forecast scenarios
    optimistic_6m = forecast_df["upper_bound"].sum()
    base_6m = forecast_df["consensus_forecast"].sum()
    pessimistic_6m = forecast_df["lower_bound"].sum()

    # As percentage of revenue
    lease_as_pct_base = (monthly_lease_cost / forecast_df["consensus_forecast"].mean()) * 100

    return {
        "total_lease_cost": total_lease_cost,
        "monthly_lease_cost": monthly_lease_cost,
        "lease_as_pct_revenue_base": round(lease_as_pct_base, 1),
        "6month_scenarios": {
            "optimistic": round(optimistic_6m, 2),
            "base_case": round(base_6m, 2),
            "pessimistic": round(pessimistic_6m, 2),
        },
        # "Comfortable" means the lease stays under 7% of monthly revenue.
        "monthly_revenue_needed_to_cover_lease_comfortably": monthly_lease_cost / 0.07,
    }


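# Build the combined forecast first. "maya_billing_history.csv" is a
# placeholder name for the billing-history export; substitute the real path.
combined_forecast = build_combined_forecast(
    "maya_billing_history.csv", "maya_projects.csv"
)
decision = lease_decision_analysis(combined_forecast)
print("\nLease Analysis:")
print(f"Monthly lease cost: ${decision['monthly_lease_cost']:,.0f}")
print(f"As % of base-case monthly revenue: {decision['lease_as_pct_revenue_base']:.1f}%")
print("\n6-Month Forecast Scenarios:")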
print(f"  Optimistic: ${decision['6month_scenarios']['optimistic']:,.0f}")
print(f"  Base case:  ${decision['6month_scenarios']['base_case']:,.0f}")
print(f"  Pessimistic: ${decision['6month_scenarios']['pessimistic']:,.0f}")
print(f"\nFor the lease to be <7% of revenue, you need: "
      f"${decision['monthly_revenue_needed_to_cover_lease_comfortably']:,.0f}/month")

What the Numbers Said

At a base-case monthly revenue of ~$19,000, the $1,200/month lease represents about 6.3% of projected revenue, inside the 7% "comfortable" threshold Maya set.

Even at the pessimistic lower bound of ~$16,000/month, the lease is 7.5% of revenue — not ideal but manageable.

The 6-month pessimistic total was about $75,000, or roughly $150,000 if annualized. The 12-month lease would cost $14,400, which is 9.6% of that annual figure: tight but survivable.
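The percentages in this section are simple ratios and can be reproduced directly. The sketch below uses the rounded revenue figures quoted in the text; doubling the 6-month pessimistic total to get an annual figure is an assumption (it treats the second half of the year as a repeat of the first), and `lease_share` is a hypothetical helper name.

```python
MONTHLY_LEASE = 1200.0
LEASE_MONTHS = 12
COMFORT_THRESHOLD = 0.07  # lease should stay under 7% of monthly revenue


def lease_share(monthly_revenue: float) -> float:
    """Lease cost as a percentage of monthly revenue."""
    return MONTHLY_LEASE / monthly_revenue * 100


base_case_share = lease_share(19000)    # base-case month
lower_bound_share = lease_share(16000)  # pessimistic month

# Assumption: annualize the 6-month pessimistic total by doubling it.
pessimistic_annual = 75000 * 2
annual_share = MONTHLY_LEASE * LEASE_MONTHS / pessimistic_annual * 100

# Monthly revenue at which the lease is exactly 7% of revenue.
breakeven_revenue = MONTHLY_LEASE / COMFORT_THRESHOLD

print(f"Base case:        {base_case_share:.1f}% of monthly revenue")
print(f"Pessimistic:      {lower_bound_share:.1f}% of monthly revenue")
print(f"Pessimistic year: {annual_share:.1f}% of annual revenue")
print(f"Revenue needed for <7%: ${breakeven_revenue:,.0f}/month")
```

The last figure, about $17,143 per month, is the number to watch: any month below it pushes the lease past the 7% comfort line.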

Maya's honest assessment of the analysis:

"The trend-based forecast assumes my current growth trajectory continues. If I lose two clients in the next six months, the pipeline estimate would drop significantly and the consensus forecast would fall. The confidence bands are wide — my monthly revenue genuinely fluctuates by $4,000–5,000 around the trend. This is a real risk.

But here's what the analysis also shows: if my trajectory continues and the pipeline is reasonable, I can afford this. And the question isn't just whether I can afford it — it's whether the office space helps me grow. That's a business judgment, not a statistical one."

The Decision

Maya signed the lease.

Her reasoning: the base-case and optimistic scenarios both showed comfortable affordability. The pessimistic scenario was manageable. And she had a three-month cash reserve that covered the lease even if revenue fell significantly.

The forecast did not make the decision for her. It gave her the financial context to make the decision with confidence rather than anxiety.

Six months later, her revenue had come in at $19,400 per month on average — right near the base-case center of her forecast. The lease was 6.2% of her monthly revenue. She had referred two other consultants to the coworking space and gained two new clients through introductions she made there.

What Made This Analysis Good

Two independent signals were combined. Neither the historical trend nor the pipeline forecast alone would have been as informative as both together. The agreement between them (consensus ~$19,000/month) was itself a data point.
Uncertainty was quantified honestly. The wide confidence bands (±$5,000–10,000) reflected the real variability in consulting revenue. Maya did not artificially narrow them.
The analysis answered the business question directly. The question was "can I afford the lease?" — so the output was lease cost as a percentage of revenue, not just raw forecast numbers.
The limitations were acknowledged. Maya explicitly noted that the trend-based model would not capture a sudden client loss.
The forecast informed a decision; it did not make one. The final call required judgment about growth strategy, personal risk tolerance, and the qualitative value of the office space — things no model can capture.

Discussion Questions

Maya used a 90% confidence interval rather than 95%. What is the trade-off between these two choices for a personal financial decision like this one?
Maya's residual standard deviation was $4,200 — quite high relative to her monthly revenue. What does this high variability suggest about the reliability of her forecast, and how should it change how she communicates the analysis to a banker or investor?
The pipeline-based forecast depends on an assumption: new_client_monthly_probability = 0.40. How would you validate or estimate this probability more rigorously from Maya's historical client acquisition data?
If Maya's revenue comes in at $14,000/month for three consecutive months — well below the pessimistic scenario — what should she do with her forecast model? Should she update it, extend the confidence band, or abandon trend-based forecasting altogether?
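As a starting point for the first question, the sketch below compares band half-widths at 90% and 95% confidence. It applies the same formula as `historical_trend_forecast` (z times residual standard deviation times the square root of months ahead) to the reported $4,200 residual standard deviation. The function name `band_half_width` is illustrative, and the printed figures are meant for comparing the two levels, not for reproducing the summary table.

```python
from math import sqrt
from statistics import NormalDist

RESIDUAL_STD = 4200.0  # residual standard deviation reported in the case study


def band_half_width(confidence_level: float, months_out: int) -> float:
    """Half-width of the forecast band, widening with sqrt(months_out)."""
    z = NormalDist().inv_cdf((1 + confidence_level) / 2)
    return z * RESIDUAL_STD * sqrt(months_out)


for level in (0.90, 0.95):
    widths = [band_half_width(level, i) for i in (1, 3, 6)]
    formatted = ", ".join(f"+/-${w:,.0f}" for w in widths)
    print(f"{level:.0%} band at months 1, 3, 6: {formatted}")
```

Moving from 90% to 95% widens every band by about 19% (the ratio of the two z-values, roughly 1.96 to 1.645), so the pessimistic scenario used in the lease decision becomes correspondingly more severe.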