Case Study 2: Maya Builds Her Project Log System

Character: Maya Reyes (Freelance Business Consultant, $175/hr, ~12 clients) Concepts: csv.DictReader, csv.DictWriter, append mode, json.dump, pathlib, data enrichment, filtered reports

The Situation

Maya Reyes has been freelancing for four years. She is good at what she does — her clients are happy, her referral rate is strong, and she has more work than she can comfortably handle. Her business problem is not finding clients. It is tracking them.

For the last two years she has used a spreadsheet to log her projects. The spreadsheet has 94 rows, four different tab versions because she kept "improving" the layout, and a formula in column M that gives a different answer depending on whether you have macros enabled. She lost a $2,600 billing dispute last year because she could not clearly show a client that she had exceeded the original scope — the records were there, but the spreadsheet's version history was not.

Maya's goal for this weekend: build a Python-based project log that she owns completely. No spreadsheet, no vendor dependency, no format mystery. Just a CSV she controls and a script that reads, updates, and reports on it.

She has three specific requirements:

The log must persist between script runs — she updates it multiple times a week
Any project over 110% of its estimated hours should be clearly flagged
She wants two automated outputs: an "active projects" report she reviews each morning, and an "over-budget" report she reviews before client calls

Designing the Data Model

Maya thinks carefully about what fields she actually needs. The spreadsheet had 22 columns, most of which she never used. She cuts it down to nine:

Field	Type	Notes
`project_id`	string	`PROJ-2024-001` — unique, never changes
`project_name`	string	Human-readable description
`client`	string	Client company name
`start_date`	string (ISO)	`YYYY-MM-DD` format
`status`	string	`active`, `completed`, `on_hold`, `cancelled`
`estimated_hours`	float	Original scope
`actual_hours`	float	Hours worked to date
`hourly_rate`	float	May differ by client
`notes`	string	Free-text, optional

She deliberately keeps the schema flat. No nested data, no calculated columns stored in the CSV. Any derived values (earnings, variance percentages) will be computed at read time. Computed values stored in CSVs go stale and create confusion.

The Script

"""
maya_project_log.py
Maya Reyes — Freelance Consulting Project Log
Reads, updates, and reports on project status and earnings.
"""

import csv
import json
from datetime import date, datetime
from pathlib import Path


# ---------------------------------------------------------------------------
# Configuration
# ---------------------------------------------------------------------------

MAYA_DEFAULT_RATE    = 175.00
OVER_BUDGET_THRESHOLD = 1.10    # flag at 110% of estimated hours

LOG_DIR = Path("data") / "maya_logs"
LOG_DIR.mkdir(parents=True, exist_ok=True)

PROJECT_CSV       = LOG_DIR / "project_log.csv"
ACTIVE_REPORT     = LOG_DIR / "active_projects_report.csv"
OVER_BUDGET_CSV   = LOG_DIR / "over_budget_projects.csv"
EARNINGS_JSON     = LOG_DIR / "earnings_summary.json"

FIELDNAMES = [
    "project_id", "project_name", "client", "start_date",
    "status", "estimated_hours", "actual_hours", "hourly_rate", "notes",
]


# ---------------------------------------------------------------------------
# Core CRUD: Load and Save
# ---------------------------------------------------------------------------

def load_projects() -> list[dict]:
    """
    Read all projects from the CSV log with type conversion.
    Returns an empty list if the log does not yet exist.
    """
    if not PROJECT_CSV.exists():
        print(f"[load] No project log found at {PROJECT_CSV}. Starting fresh.")
        return []

    projects = []
    with open(PROJECT_CSV, mode="r", newline="", encoding="utf-8") as csv_file:
        reader = csv.DictReader(csv_file)
        for row in reader:
            projects.append({
                "project_id":       row["project_id"],
                "project_name":     row["project_name"],
                "client":           row["client"],
                "start_date":       row["start_date"],
                "status":           row["status"],
                "estimated_hours":  float(row["estimated_hours"]),
                "actual_hours":     float(row["actual_hours"]),
                "hourly_rate":      float(row["hourly_rate"]),
                "notes":            row["notes"],
            })

    print(f"[load] {len(projects)} projects loaded from {PROJECT_CSV.name}")
    return projects


def save_projects(projects: list[dict]) -> None:
    """
    Overwrite the project CSV with the current in-memory list.
    This is Maya's primary save mechanism — load, modify, save.
    """
    with open(PROJECT_CSV, mode="w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(csv_file, fieldnames=FIELDNAMES)
        writer.writeheader()
        writer.writerows(projects)

    print(f"[save] {len(projects)} projects saved to {PROJECT_CSV.name}")


# ---------------------------------------------------------------------------
# Adding and Updating Projects
# ---------------------------------------------------------------------------

def add_project(
    projects: list[dict],
    project_id: str,
    project_name: str,
    client: str,
    estimated_hours: float,
    status: str = "active",
    start_date: str = "",
    hourly_rate: float = 0.0,
    notes: str = "",
) -> list[dict]:
    """
    Add a new project to the in-memory list.
    Validates that project_id is unique.
    """
    existing_ids = {p["project_id"] for p in projects}
    if project_id in existing_ids:
        raise ValueError(
            f"Project ID '{project_id}' already exists. "
            "Use update_hours() or change_status() to modify it."
        )

    new_project = {
        "project_id":       project_id,
        "project_name":     project_name,
        "client":           client,
        "start_date":       start_date or date.today().isoformat(),
        "status":           status,
        "estimated_hours":  estimated_hours,
        "actual_hours":     0.0,
        "hourly_rate":      hourly_rate if hourly_rate > 0 else MAYA_DEFAULT_RATE,
        "notes":            notes,
    }

    projects.append(new_project)
    print(f"[add] {project_id}: '{project_name}' for {client}")
    return projects


def log_hours(
    projects: list[dict],
    project_id: str,
    additional_hours: float,
) -> list[dict]:
    """
    Add hours to a project's actual_hours total.
    Prints a warning if the project crosses the over-budget threshold.
    """
    for project in projects:
        if project["project_id"] == project_id:
            before = project["actual_hours"]
            project["actual_hours"] = round(before + additional_hours, 2)
            ratio = project["actual_hours"] / project["estimated_hours"]

            print(
                f"[hours] {project_id}: "
                f"{before:.1f}h → {project['actual_hours']:.1f}h "
                f"({ratio * 100:.0f}% of {project['estimated_hours']:.1f}h estimate)"
            )

            if ratio > OVER_BUDGET_THRESHOLD:
                excess = project["actual_hours"] - project["estimated_hours"]
                extra_billing = excess * project["hourly_rate"]
                print(
                    f"  *** OVER BUDGET: {project['project_name']} "
                    f"({project['client']}) — "
                    f"{excess:.1f}h over estimate, "
                    f"~${extra_billing:,.0f} additional billing ***"
                )
            return projects

    raise ValueError(f"Project '{project_id}' not found.")


def change_status(
    projects: list[dict],
    project_id: str,
    new_status: str,
) -> list[dict]:
    """Update the status of a project."""
    valid_statuses = {"active", "completed", "on_hold", "cancelled"}
    if new_status not in valid_statuses:
        raise ValueError(f"Invalid status '{new_status}'. Must be one of: {valid_statuses}")

    for project in projects:
        if project["project_id"] == project_id:
            old_status = project["status"]
            project["status"] = new_status
            print(f"[status] {project_id}: {old_status} → {new_status}")
            return projects

    raise ValueError(f"Project '{project_id}' not found.")


# ---------------------------------------------------------------------------
# Flagging and Analysis
# ---------------------------------------------------------------------------

def is_over_budget(project: dict) -> bool:
    """Return True if actual hours exceed the threshold * estimated hours."""
    if project["estimated_hours"] == 0:
        return False
    ratio = project["actual_hours"] / project["estimated_hours"]
    return ratio > OVER_BUDGET_THRESHOLD


def get_over_budget_projects(projects: list[dict]) -> list[dict]:
    """
    Return active and on_hold projects that exceed the over-budget threshold.
    Completed and cancelled projects are excluded — the conversation has passed.
    """
    actionable_statuses = {"active", "on_hold"}
    return [
        p for p in projects
        if p["status"] in actionable_statuses and is_over_budget(p)
    ]


def enrich_with_earnings(project: dict) -> dict:
    """
    Compute earnings and variance for a project.
    Returns a new dict with all original fields plus computed ones.
    """
    projected = project["estimated_hours"] * project["hourly_rate"]
    actual    = project["actual_hours"]    * project["hourly_rate"]
    variance  = actual - projected

    return {
        **project,                          # all original fields
        "projected_earnings":  round(projected, 2),
        "actual_earnings":     round(actual, 2),
        "earnings_variance":   round(variance, 2),
        "hours_variance":      round(project["actual_hours"] - project["estimated_hours"], 2),
        "pct_of_estimate":     round(
            (project["actual_hours"] / project["estimated_hours"] * 100)
            if project["estimated_hours"] > 0 else 0.0,
            1,
        ),
        "is_over_budget":      is_over_budget(project),
    }


# ---------------------------------------------------------------------------
# Report Writing
# ---------------------------------------------------------------------------

def write_active_report(projects: list[dict]) -> None:
    """
    Write a CSV of active projects, enriched with earnings data.
    Over-budget projects appear first (most urgent); otherwise sorted by client.
    """
    active = [
        enrich_with_earnings(p)
        for p in projects
        if p["status"] == "active"
    ]

    # Sort: over-budget first, then alphabetically by client
    active.sort(key=lambda p: (not p["is_over_budget"], p["client"]))

    report_fieldnames = [
        "project_id", "project_name", "client", "start_date",
        "estimated_hours", "actual_hours", "hours_variance", "pct_of_estimate",
        "hourly_rate", "projected_earnings", "actual_earnings",
        "earnings_variance", "is_over_budget", "notes",
    ]

    with open(ACTIVE_REPORT, mode="w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(
            csv_file,
            fieldnames=report_fieldnames,
            extrasaction="ignore",
        )
        writer.writeheader()
        writer.writerows(active)

    print(
        f"[report] Active projects: {len(active)} rows → {ACTIVE_REPORT.name}"
    )


def write_over_budget_report(projects: list[dict]) -> None:
    """
    Write a focused over-budget CSV with just the fields Maya needs
    for client billing conversations.
    """
    over_budget = get_over_budget_projects(projects)
    enriched = [enrich_with_earnings(p) for p in over_budget]

    # Sort by earnings_variance descending: highest dollar over-run first
    enriched.sort(key=lambda p: p["earnings_variance"], reverse=True)

    report_fieldnames = [
        "project_id", "project_name", "client",
        "estimated_hours", "actual_hours", "hours_variance",
        "hourly_rate", "projected_earnings", "actual_earnings",
        "earnings_variance", "notes",
    ]

    with open(OVER_BUDGET_CSV, mode="w", newline="", encoding="utf-8") as csv_file:
        writer = csv.DictWriter(
            csv_file,
            fieldnames=report_fieldnames,
            extrasaction="ignore",
        )
        writer.writeheader()
        writer.writerows(enriched)

    print(
        f"[report] Over-budget: {len(enriched)} rows → {OVER_BUDGET_CSV.name}"
    )


def write_earnings_json(projects: list[dict]) -> None:
    """
    Write a structured earnings summary to JSON.
    This is the file Maya opens when she wants a quick financial picture.
    """
    enriched = [enrich_with_earnings(p) for p in projects]

    # Aggregate by status
    by_status: dict[str, dict] = {}
    for p in enriched:
        s = p["status"]
        if s not in by_status:
            by_status[s] = {
                "count": 0,
                "total_actual_hours": 0.0,
                "total_earnings": 0.0,
            }
        by_status[s]["count"]              += 1
        by_status[s]["total_actual_hours"] += p["actual_hours"]
        by_status[s]["total_earnings"]     += p["actual_earnings"]

    for group in by_status.values():
        group["total_actual_hours"] = round(group["total_actual_hours"], 2)
        group["total_earnings"]     = round(group["total_earnings"], 2)

    all_earnings = sum(p["actual_earnings"] for p in enriched)
    all_hours    = sum(p["actual_hours"] for p in enriched)
    over_budget_projects = [p["project_name"] for p in enriched if p["is_over_budget"]]

    summary = {
        "summary_date":          date.today().isoformat(),
        "generated_at":          datetime.now().isoformat(timespec="seconds"),
        "total_projects":        len(projects),
        "total_logged_hours":    round(all_hours, 2),
        "total_earnings_to_date": round(all_earnings, 2),
        "effective_hourly_rate": round(all_earnings / all_hours, 2) if all_hours > 0 else 0.0,
        "default_rate":          MAYA_DEFAULT_RATE,
        "rate_efficiency_pct":   round(
            (all_earnings / all_hours / MAYA_DEFAULT_RATE * 100) if all_hours > 0 else 0.0, 1
        ),
        "by_status":             by_status,
        "over_budget_projects":  over_budget_projects,
    }

    with open(EARNINGS_JSON, mode="w", encoding="utf-8") as json_file:
        json.dump(summary, json_file, indent=2)

    print(f"[json]   Earnings summary → {EARNINGS_JSON.name}")


# ---------------------------------------------------------------------------
# Display
# ---------------------------------------------------------------------------

def print_project_table(projects: list[dict], title: str = "Project Log") -> None:
    """Print a formatted console table of all projects."""
    print(f"\n{title}")
    print("=" * 90)
    header = (
        f"{'ID':<16} {'Project':<32} {'Client':<20} "
        f"{'Status':<10} {'Est':>5} {'Act':>5} {'Flag':<6}"
    )
    print(header)
    print("-" * 90)
    for p in sorted(projects, key=lambda x: (x["status"], x["client"])):
        flag = "OVER" if is_over_budget(p) else ""
        print(
            f"{p['project_id']:<16} {p['project_name'][:31]:<32} {p['client'][:19]:<20} "
            f"{p['status']:<10} {p['estimated_hours']:>5.1f} {p['actual_hours']:>5.1f} "
            f"{flag:<6}"
        )
    print()


def print_earnings_dashboard(projects: list[dict]) -> None:
    """Print a quick earnings dashboard."""
    enriched = [enrich_with_earnings(p) for p in projects]
    active_only = [p for p in enriched if p["status"] == "active"]

    total_earnings   = sum(p["actual_earnings"]    for p in enriched)
    active_projected = sum(p["projected_earnings"] for p in active_only)
    active_actual    = sum(p["actual_earnings"]    for p in active_only)
    all_hours        = sum(p["actual_hours"]        for p in enriched)

    print("\n" + "=" * 55)
    print("  MAYA REYES — EARNINGS DASHBOARD")
    print("=" * 55)
    print(f"  Total Logged Hours          : {all_hours:>7.1f} hrs")
    print(f"  Total Earnings (all time)   : ${total_earnings:>10,.2f}")
    print(f"  Effective Rate              : "
          f"${total_earnings/all_hours:>7.2f}/hr" if all_hours > 0 else "  N/A")
    print(f"\n  Active Pipeline:")
    print(f"    Projected earnings        : ${active_projected:>10,.2f}")
    print(f"    Earned to date            : ${active_actual:>10,.2f}")
    remaining_potential = active_projected - active_actual
    print(f"    Remaining potential       : ${remaining_potential:>10,.2f}")
    over_budget_count = sum(1 for p in active_only if p["is_over_budget"])
    if over_budget_count > 0:
        print(f"\n  *** {over_budget_count} active project(s) are over budget ***")
    print()

Running the System

Maya runs this script in three modes, all from the same file:

Mode 1: Daily check-in (runs automatically each morning via a scheduled task)

# At the bottom of the script — morning review
projects = load_projects()
print_project_table(projects, "Morning Review — All Projects")
print_earnings_dashboard(projects)
write_active_report(projects)
write_over_budget_report(projects)
write_earnings_json(projects)

Mode 2: Adding a new project (runs when she signs a new contract)

projects = load_projects()
projects = add_project(
    projects,
    project_id       = "PROJ-2024-013",
    project_name     = "Market Entry Analysis",
    client           = "Solaris Ventures",
    estimated_hours  = 60.0,
    hourly_rate      = 175.00,
    notes            = "Phase 1: market sizing. SOW signed 2024-04-01.",
)
save_projects(projects)

Mode 3: Logging hours (runs after a working session)

projects = load_projects()
projects = log_hours(projects, "PROJ-2024-006", 3.5)   # 3.5 hours on Nexwave
projects = log_hours(projects, "PROJ-2024-008", 2.0)   # 2 hours on Sundial
save_projects(projects)

The Active Projects Report

active_projects_report.csv opens cleanly in Excel or Google Sheets. Over-budget projects appear at the top, sorted by highest dollar variance. Here is what the output looks like:

project_id,project_name,client,...,estimated_hours,actual_hours,hours_variance,is_over_budget
PROJ-2024-006,Customer Churn Segmentation,Nexwave Telecom,...,45.0,55.0,10.0,True
PROJ-2024-009,Annual Report Automation,Blue Mesa Logistics,...,28.0,31.5,3.5,True
PROJ-2024-004,HR Metrics Reporting,Pinnacle Staffing,...,24.0,26.5,2.5,True
PROJ-2024-003,Sales Pipeline Analysis,Crescendo Marketing,...,32.0,33.5,-1.5,False
...

The Earnings JSON

earnings_summary.json gives Maya a snapshot she can read in five seconds:

{
  "summary_date": "2024-04-01",
  "total_projects": 12,
  "total_logged_hours": 288.5,
  "total_earnings_to_date": 51362.50,
  "effective_hourly_rate": 178.04,
  "default_rate": 175.0,
  "rate_efficiency_pct": 101.7,
  "by_status": {
    "active": {
      "count": 7,
      "total_actual_hours": 158.5,
      "total_earnings": 28525.00
    },
    "completed": {
      "count": 2,
      "total_actual_hours": 109.5,
      "total_earnings": 19162.50
    }
  },
  "over_budget_projects": [
    "Customer Churn Segmentation",
    "Annual Report Automation",
    "HR Metrics Reporting"
  ]
}

Her effective rate of $178.04 is slightly above her target of $175. Three projects are over budget — she knows before she opens her laptop which client calls require a scope conversation.

What Maya Learned

The load → modify → save pattern is the right design for a simple persistent store. Load everything into memory, make your changes to the Python objects, save everything back out. For files with fewer than a few thousand rows, this is fast and easy to reason about. For larger data, you would use a database — but for twelve projects, a CSV is perfect.

Never store computed values in the source CSV. Maya's CSV stores only what she directly knows: hours estimated, hours logged, rate per hour. Everything else — earnings, variance, over-budget status — is computed at read time. This means there is only one source of truth, and it is always in sync.

Sorting before writing produces stable, auditable output. Because the active report always sorts the same way (over-budget first, then alphabetically), Maya notices immediately when something changes position between Monday and Tuesday.

JSON is the right format for summary data other tools might consume. The earnings JSON could feed a dashboard, a weekly email, or a chatbot. CSV is for tabular row data; JSON is for structured, nested summaries.

The billing dispute Maya lost last year? It cannot happen again. Her project log shows exactly when hours were logged, and the over-budget flag would have surfaced the issue before the invoice, not after.