Case Study 39-A: Priya Sets Up a Professional Repository

Background

It is a Tuesday in March. Priya Okonkwo has been building Python tools for Acme Corp for nine months. She has scripts for the weekly margin report, the regional dashboard, the customer tier analysis, and three smaller automation tools that handle data cleanup tasks Marcus used to do by hand.

All of these scripts live in a folder on her laptop called analysis_scripts. There is no version control. There are files named margin_report_v2.py, margin_report_v3_FINAL.py, and margin_report_v3_FINAL_ACTUALLY_FINAL.py. She is not sure which one Marcus ran last week.

Sandra Chen asked Priya on Monday whether the regional dashboard could be shared with Marcus so he could run it himself when Priya is on vacation. Priya said yes. She has been putting off figuring out exactly how for three days.

Today is the day she figures it out.


Step 1: Initializing the Repository

Priya opens her terminal, navigates to a fresh directory, and runs:

mkdir acme-analytics
cd acme-analytics
git init

Git reports: Initialized empty Git repository in /Users/priya/projects/acme-analytics/.git/

She then creates the directory structure she learned from the chapter:

acme-analytics/
├── .gitignore
├── README.md
├── requirements.txt
├── src/
│   ├── __init__.py
│   ├── business_math.py
│   ├── data_loader.py
│   └── report_builder.py
├── tests/
│   ├── __init__.py
│   └── test_business_math.py
└── data/
    └── sample/
        └── acme_sales_sample.csv

She copies her functions from the scattered scripts into src/business_math.py, taking the time to clean them up as she goes — consistent docstrings, type hints, the PEP 8 formatting she's been meaning to apply for months.
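The cleaned-up function itself isn't shown in the case study. A minimal sketch of what calculate_gross_margin in src/business_math.py might look like after this pass, with the zero-revenue behavior inferred from the tests Priya writes in Step 5:

```python
def calculate_gross_margin(revenue: float, cogs: float) -> float:
    """Return gross margin as a fraction of revenue.

    Gross margin = (revenue - COGS) / revenue. Returns 0.0 when
    revenue is zero, since margin is undefined without revenue.
    """
    if revenue == 0:
        return 0.0
    return (revenue - cogs) / revenue
```

The exact implementation is an assumption; what matters for the rest of the case study is the signature and the explicit zero-revenue rule, both of which the test suite pins down.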


Step 2: The .gitignore

Priya creates her .gitignore:

# Virtual environment
venv/
.venv/

# Python cache
__pycache__/
*.pyc

# Environment variables
.env

# The real data files — too large and potentially sensitive
data/raw/
data/processed/
reports/

# Keep sample data for testing
!data/sample/

# Editor
.vscode/settings.json

This is the moment she realizes why that data/raw/ line matters: the actual acme_sales_2023.csv file — which lives in data/raw/ — contains customer names, addresses, and order values. It should not be in a public or even team-shared repository without review. She double-checks that the entry is in place, relieved she caught this before pushing.


Step 3: The First Real Commit

Priya stages her files and writes her first commit message:

git add .gitignore README.md requirements.txt
git add src/ tests/
git add data/sample/
git commit -m "Initialize Acme analytics project structure

- Add business_math module with typed functions
- Add initial test suite for business calculations
- Include sample data for local testing
- Set up .gitignore to exclude real data and credentials"

She pauses on the commit message. Her instinct is to write first commit. But she remembers what the chapter said: a commit message is a note to your future self. When she runs git log in four months, "Initialize Acme analytics project structure" will tell her something. "first commit" will tell her nothing.

She commits.


Step 4: Pushing to GitHub

Priya goes to GitHub, creates a new repository called acme-analytics, and follows the instructions to connect her local repository:

git remote add origin https://github.com/priya-okonkwo/acme-analytics.git
git branch -M main
git push -u origin main

The repository is live. She navigates to it in her browser and sees her files, her README, her commit message. Something about seeing her own code in a professional repository — properly structured, with a README, with tests — feels different from a folder called analysis_scripts on her laptop. It looks like something someone built, rather than something someone wrote.


Step 5: Writing the First Tests

Priya opens tests/test_business_math.py. She has never written a formal test before. She starts with the function she trusts most — calculate_gross_margin — and works through what she knows about it.

Her first pass:

import pytest
from src.business_math import calculate_gross_margin


def test_gross_margin_standard():
    assert calculate_gross_margin(100_000, 65_000) == 0.35


def test_gross_margin_zero_revenue():
    assert calculate_gross_margin(0, 50_000) == 0.0

She runs pytest:

pytest tests/ -v
============================= test session starts ==============================
collected 2 items

tests/test_business_math.py::test_gross_margin_standard PASSED         [ 50%]
tests/test_business_math.py::test_gross_margin_zero_revenue PASSED     [100%]

============================== 2 passed in 0.08s ===============================

Both pass. She adds more:

def test_gross_margin_100_percent():
    # Zero COGS — pure revenue
    assert calculate_gross_margin(100_000, 0) == 1.0


def test_gross_margin_zero_percent():
    # Breaking even
    assert calculate_gross_margin(100_000, 100_000) == 0.0


def test_gross_margin_precision():
    # The Acme Q3 actuals — use approx for floating point
    result = calculate_gross_margin(874_400, 550_000)
    assert result == pytest.approx(0.371, rel=1e-3)

She runs pytest again. All five pass.

Then she thinks about the incident from October — the one where the column headers changed and the script returned zeros silently. She writes a test that captures exactly what went wrong:

def test_gross_margin_does_not_silently_return_zero_for_valid_inputs():
    """
    The October Incident: ensure that valid revenue inputs never produce 0.0.
    A zero result is only valid when revenue itself is zero.
    """
    result = calculate_gross_margin(874_400, 550_000)
    assert result != 0.0
    assert result > 0.0

She commits this batch of tests with the message: Add initial gross margin tests — including guard against October-style silent zeros
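The October incident itself originated upstream of the math: renamed column headers fed bad values into otherwise correct calculations. Tests on calculate_gross_margin guard the output, but the loader can also fail loudly at the point of entry. A sketch of what such a check in src/data_loader.py could look like (the function name and column names are illustrative assumptions, not from the case study):

```python
import csv

# Hypothetical schema; the real column names aren't given in the case study
REQUIRED_COLUMNS = {"revenue", "cogs", "region"}


def load_sales_rows(path: str) -> list[dict]:
    """Load sales rows from a CSV, failing loudly if columns are missing.

    Raising here blocks the October failure mode, where renamed headers
    silently turned into zeros downstream.
    """
    with open(path, newline="") as f:
        reader = csv.DictReader(f)
        missing = REQUIRED_COLUMNS - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"Missing expected columns: {sorted(missing)}")
        return list(reader)
```

The design choice mirrors Marcus's later review comment: a schema mismatch is a data problem, so an exception beats a quietly wrong number.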


Step 6: Marcus Reviews the Pull Request

Priya has been assigned to add a new function: calculate_customer_concentration_risk, which measures what percentage of total revenue comes from the top 5 customers. Sandra wants to add this to the dashboard after a concern raised in a board meeting about revenue concentration.

Rather than pushing directly to main, Priya creates a feature branch:

git checkout -b feature/customer-concentration-risk

She writes the function, writes tests for it, and pushes the branch to GitHub:

git push origin feature/customer-concentration-risk

On GitHub, she opens a pull request with the title: Add customer concentration risk metric to analytics module

In the PR description, she writes:

What this does: Adds calculate_customer_concentration_risk(revenue_by_customer) which returns the percentage of total revenue attributable to the top N customers. Default N=5.

Why: Sandra's request following the Q4 board discussion on revenue concentration risk.

Test coverage: 6 tests covering standard case, single customer, empty input, N > number of customers, and ties in revenue ranking.

To review: Please check the edge case handling in lines 47-51. I wasn't sure whether to return 1.0 or raise ValueError when there's only one customer — I chose 1.0 since it's mathematically correct (one customer IS 100% concentration risk).

Marcus receives the notification and reviews the PR.


Marcus's Review Comments

Comment on line 32 (non-blocking):

The variable name top_n_revenue is clear, but top_customer_revenue might be even more explicit since we're specifically talking about customer revenue here, not just any top-N calculation. Up to you — both work.

Comment on line 48 (non-blocking):

Agree with your choice to return 1.0 for single customer. The docstring should probably say this explicitly though — someone reading this three months from now might wonder if the function can return exactly 1.0.

Comment on line 73 (blocking):

The test test_concentration_risk_empty_input expects a ValueError to be raised, but the implementation returns 0.0 on line 47. These are inconsistent. Either the test should use assert result == 0.0 or the implementation should raise ValueError. I'd lean toward ValueError since empty input suggests a data problem, not a valid zero-concentration scenario.

General comment (approving):

This is solid work. The type hints are clean, the docstring covers all the inputs and edge cases I'd want documented, and the test for "N greater than number of customers" is exactly the kind of edge case that causes silent bugs in production. One blocking comment but otherwise ready to merge once addressed.


Priya's Response

Priya reads the comments carefully. She appreciates that Marcus distinguished between blocking and non-blocking — it means she doesn't have to guess which comments are requirements versus preferences.

She agrees with all three points. She updates the implementation to raise ValueError for empty input (agreeing with Marcus's reasoning), updates the test to match, updates the docstring on the single-customer case, and renames the variable.
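After these revisions, the merged function might look like the sketch below. The signature comes from the PR description and the variable name from Marcus's suggestion; the docstring wording and the tie handling via a plain descending sort are assumptions:

```python
import pytest


def calculate_customer_concentration_risk(
    revenue_by_customer: dict[str, float], top_n: int = 5
) -> float:
    """Return the fraction of total revenue from the top N customers.

    Returns 1.0 when there are top_n or fewer customers: a single
    customer is, by definition, 100% concentration. Raises ValueError
    on empty input, since that signals a data problem rather than a
    valid zero-concentration scenario.
    """
    if not revenue_by_customer:
        raise ValueError("revenue_by_customer is empty")
    revenues = sorted(revenue_by_customer.values(), reverse=True)
    top_customer_revenue = sum(revenues[:top_n])
    return top_customer_revenue / sum(revenues)


def test_concentration_risk_empty_input():
    # Updated to match the ValueError behavior Marcus's review settled on
    with pytest.raises(ValueError):
        calculate_customer_concentration_risk({})
```

Note how the updated test and implementation now agree: both treat empty input as an error, resolving the inconsistency Marcus flagged as blocking.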

She pushes the changes to her branch. The PR updates automatically.

She replies to Marcus's comments:

Fixed the test/implementation inconsistency — went with ValueError as you suggested. Also updated the docstring on the 1.0 case. Renamed the variable but left a note in the commit message about why.

Marcus approves the PR. Priya merges it into main.


What Changed

Before this workflow, Priya's code lived on her laptop, had no tests, had no version history, and could only be shared by emailing a .py file. If she were wrong about a calculation, she might not find out until a report was already in Sandra's inbox.

After: her code is versioned, tested, reviewable, and reproducible on any machine with Python. When Marcus ran the dashboard himself last week while Priya was at a training, he cloned the repository, ran pip install -r requirements.txt, and ran the script in under five minutes. He didn't need to ask Priya anything.

That is what professional Python looks like.


Key Practices Demonstrated

Practice                What Priya Did
----------------------  --------------------------------------------------------------
Version control         Initialized git, created meaningful commits, pushed to GitHub
.gitignore              Excluded real data, credentials, and cache files
Feature branches        Created feature/ branch for new functionality
Pull requests           Opened PR with clear description of what, why, and areas of concern
Code review             Marcus distinguished blocking vs. non-blocking feedback
Responding to review    Priya addressed all comments, explained her choices
Test coverage           Tests for standard cases, edge cases, and the specific failure mode from the October incident