# Case Study 2: Context-Optimized Development

*How a Senior Developer Plans Context Budgets for Every Coding Session*
## Background
Priya is a senior backend engineer at a mid-size fintech startup. She has been using AI coding assistants for over a year and has developed a disciplined, systematic approach to context management. While her junior colleagues often have long, meandering conversations with AI that produce inconsistent results, Priya consistently gets high-quality code in fewer turns.
Her secret is not better prompting skills (though those help)---it is that she plans every coding session before typing her first message. She treats context as a scarce resource and budgets it deliberately.
This case study follows Priya through a real development task: adding a new "transaction dispute" feature to her company's payment processing API. We will see her planning process, her session execution, and the results.
## The Task
The payment processing API needs a new feature: customers should be able to dispute transactions. The feature requires:
- A new `Dispute` data model
- API endpoints for creating, viewing, updating, and resolving disputes
- Integration with the existing `Transaction` and `User` models
- An automated notification when disputes are created or resolved
- Admin endpoints for reviewing and adjudicating disputes
- Comprehensive test coverage
The codebase is a FastAPI application with 45 files, roughly 8,000 lines of code total. It uses SQLAlchemy 2.0 with PostgreSQL, Pydantic v2 for schemas, and pytest for testing.
### Phase 1: Pre-Session Planning (15 minutes)
Before opening her AI assistant, Priya spends 15 minutes planning. She opens a text file and writes:
```
## Session Plan: Transaction Dispute Feature

### Goal
Add dispute lifecycle (create, view, update, resolve) to payment API.

### Estimated Scope
- 1 new model file (~80 lines)
- 1 new schema file (~60 lines)
- 1 new repository file (~120 lines)
- 1 new router file (~150 lines)
- 1 new service file (~100 lines)
- Modifications to notification service (~30 lines)
- 1 new test file (~200 lines)
- Total new/modified: ~740 lines

### Context Budget (200K token window)

Fixed costs:
  System prompt + CLAUDE.md:              6,000 tokens
  Response reserve:                       4,000 tokens
  Subtotal:                              10,000 tokens

Session 1 - Models & Schemas (est. 8 turns):
  Existing files needed:
    models/transaction.py       full      1,800 tokens
    models/user.py              interface   400 tokens
    models/base.py              full        300 tokens
    schemas/transaction.py      full        800 tokens
  Conversation budget:                   12,000 tokens
  Subtotal:                              15,300 tokens
  Running total:                         25,300 tokens [OK]

Session 2 - Repository & Service (est. 10 turns):
  Existing files needed:
    New dispute model           full      1,200 tokens
    New dispute schemas         full        800 tokens
    repositories/transaction.py interface   500 tokens
    services/notification.py    interface   400 tokens
  Conversation budget:                   18,000 tokens
  Subtotal:                              20,900 tokens
  Running total:                         30,900 tokens [OK]

Session 3 - API Endpoints (est. 10 turns):
  Existing files needed:
    New dispute model           interface   300 tokens
    New dispute schemas         full        800 tokens
    New dispute service         interface   400 tokens
    routers/transactions.py     full      2,200 tokens (pattern reference)
    api/deps.py                 full        600 tokens
  Conversation budget:                   18,000 tokens
  Subtotal:                              22,300 tokens
  Running total:                         32,300 tokens [OK]

Session 4 - Tests (est. 12 turns):
  Existing files needed:
    New dispute router          full      2,500 tokens
    New dispute schemas         full        800 tokens
    tests/conftest.py           full      1,200 tokens
    tests/test_transactions.py  snippet   1,500 tokens (pattern reference)
  Conversation budget:                   22,000 tokens
  Subtotal:                              28,000 tokens
  Running total:                         38,000 tokens [OK]

### Checkpoint Strategy
- After each session: generate summary, collect final code
- Between sessions: review code, run existing tests, verify integration
- If any session exceeds 12 turns: summarize and evaluate fresh start

### Priming Template
[See CLAUDE.md for base; add dispute-specific context]
```
Priya notes that even the most expensive session (Session 4, tests) uses only about 38,000 tokens out of her 200,000 budget. She has significant headroom, which means she can include additional context if needed or extend conversations without worrying about running out of space.
The planning took 15 minutes. She estimates it will save her at least an hour of wasted effort during the actual coding sessions.
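The arithmetic behind a plan like this needs nothing more than a characters-per-token heuristic. Here is a minimal sketch; the function names and the common ~4-characters-per-token ratio are illustrative assumptions, not part of Priya's actual tooling:

```python
# Rough session-budget calculator mirroring the plan above.
# Assumes ~4 characters per token (a common rule of thumb); all
# names and numbers here are illustrative.

def estimate_tokens(path: str) -> int:
    """Estimate a file's token cost from its size on disk."""
    with open(path, encoding="utf-8") as f:
        return len(f.read()) // 4

def session_budget(file_tokens: dict[str, int],
                   conversation: int,
                   fixed: int = 10_000) -> int:
    """Fixed costs + file context + conversation reserve."""
    return fixed + sum(file_tokens.values()) + conversation

# Session 1 from the plan:
session_1 = session_budget(
    {
        "models/transaction.py (full)": 1_800,
        "models/user.py (interface)": 400,
        "models/base.py (full)": 300,
        "schemas/transaction.py (full)": 800,
    },
    conversation=12_000,
)
print(session_1)  # 25300 -- comfortably inside a 200K window
```

In practice `estimate_tokens` would feed the per-file numbers; the plan above used estimates of the same kind.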
### Phase 2: Session 1 --- Models and Schemas (35 minutes)
Priya opens Claude Code and begins. Her CLAUDE.md file already contains the project conventions:
```markdown
# CLAUDE.md
## Project: PayFlow API
- Python 3.12, FastAPI 0.110+, SQLAlchemy 2.0, PostgreSQL 16
- Pydantic v2 for all schemas
- Async everywhere (AsyncSession, async def)
- Repository pattern for data access
- Google-style docstrings, type hints on all functions
- UUID primary keys, UTC timestamps
- RFC 7807 problem detail for error responses
- pytest + pytest-asyncio for testing
```
**Turn 1 (Priming + Task):**

```markdown
I'm adding a transaction dispute feature to our payment API.

Here's the existing Transaction model that disputes will reference:
[pastes models/transaction.py - full file]

Here's the base model we inherit from:
[pastes models/base.py - full file]

Here's the schema pattern we follow:
[pastes schemas/transaction.py - full file]

Here's the User model interface (disputes reference users):

    class User(Base):
        __tablename__ = "users"
        id: Mapped[UUID] (PK)
        email: Mapped[str] (unique)
        full_name: Mapped[str]
        role: Mapped[UserRole]  # enum: customer, admin, support

Create the Dispute model with these requirements:
- Fields: id (UUID PK), transaction_id (FK), user_id (FK who filed it),
  reason (enum: unauthorized, duplicate, wrong_amount, not_received, other),
  status (enum: open, under_review, resolved_in_favor, resolved_against, closed),
  description (text), evidence_urls (JSON array of strings),
  resolution_notes (text, nullable), created_at, updated_at, resolved_at (nullable)
- Add appropriate indexes and constraints
- Follow our existing model patterns exactly
```
The AI produces the Dispute model in a single response that perfectly matches the codebase conventions. No clarifying questions were needed because Priya front-loaded all necessary context.
**Turns 2-3:** Priya asks for the Pydantic schemas (DisputeCreate, DisputeUpdate, DisputeResponse, DisputeListResponse) and gets them in two turns, with one refinement to add a `with_transaction` field to the response model.
**Turns 4-5:** She asks the AI to add a `disputes` relationship to the Transaction model and verify there are no circular import issues. The AI provides both the model update and an explanation of the import order.
**Turn 6:** She asks the AI to summarize the session.
**Session 1 Result:** 6 turns, approximately 9,000 tokens total. Two new files (model and schemas) plus a small update to the Transaction model, all clean and consistent. Under budget by 6,300 tokens.
Priya copies the final code into her project, runs the existing tests to verify nothing broke, and creates the database migration.
---
### Phase 3: Session 2 --- Repository and Service (40 minutes)
**Turn 1 (Priming + Task):**

```markdown
Continuing the dispute feature for PayFlow API.

New files from the previous session:
[pastes dispute model - full]
[pastes dispute schemas - full]

Existing interfaces I'll need:

    # repositories/transaction.py (interface only)
    class TransactionRepository:
        async def get_by_id(self, txn_id: UUID) -> Transaction | None
        async def get_by_user(self, user_id: UUID, ...) -> list[Transaction]

    # services/notification.py (interface only)
    class NotificationService:
        async def send_email(self, to: str, template: str, context: dict) -> None
        async def send_in_app(self, user_id: UUID, message: str, ...) -> None

Create the DisputeRepository following our repository pattern. Methods
needed: create, get_by_id, get_by_transaction, get_by_user, update_status,
list_all (admin, with pagination and status filter).
```
Notice how Priya uses interface-only context for files the new code will interact with but does not need to modify. This saves approximately 2,000 tokens compared to including the full repository and notification service files.
**Turns 2-5:** The AI produces the repository, Priya refines the pagination approach (switching from offset-based to cursor-based after reviewing the initial implementation), and they finalize the repository.
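The cursor-based approach Priya switched to can be sketched with plain lists. The `(created_at, id)` cursor key and the field names below are assumptions for illustration, not her actual repository code:

```python
# Minimal sketch of cursor-based pagination. The cursor is simply the
# sort key of the last row returned; with an index on (created_at, id)
# the equivalent SQL uses a WHERE clause on the key instead of OFFSET,
# so pages stay stable even as rows are inserted between requests.
from datetime import datetime, timezone

def list_disputes(rows, cursor=None, limit=2):
    """Return the page after `cursor`, plus the cursor for the next page."""
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]))
    if cursor is not None:
        ordered = [r for r in ordered if (r["created_at"], r["id"]) > cursor]
    page = ordered[:limit]
    next_cursor = (page[-1]["created_at"], page[-1]["id"]) if page else None
    return page, next_cursor

t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
rows = [{"id": i, "created_at": t0} for i in range(5)]
page1, cur = list_disputes(rows)
page2, _ = list_disputes(rows, cursor=cur)
print([r["id"] for r in page1], [r["id"] for r in page2])  # [0, 1] [2, 3]
```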
**Turns 6-9:** They build the DisputeService, which orchestrates the business logic: validation (checking that the transaction exists and belongs to the user), state machine transitions (open -> under_review -> resolved), and notifications (email and in-app when disputes are created or resolved).
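A state machine like this reduces to a small transition table. The exact set of allowed transitions below is inferred from the lifecycle described, not taken from Priya's actual service:

```python
# Sketch of the dispute state machine: each state maps to the set of
# states it may legally move to. Transitions not in the table raise.
ALLOWED_TRANSITIONS = {
    "open": {"under_review", "closed"},
    "under_review": {"resolved_in_favor", "resolved_against"},
    "resolved_in_favor": {"closed"},
    "resolved_against": {"closed"},
    "closed": set(),
}

def validate_transition(current: str, target: str) -> None:
    """Raise ValueError for any transition not in the state machine."""
    if target not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"cannot move dispute from {current!r} to {target!r}")

validate_transition("open", "under_review")   # fine
# validate_transition("closed", "open")       # would raise ValueError
```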
**Turn 10:** Summary and artifact collection.

**Session 2 Result:** 10 turns, approximately 16,000 tokens total. Two new files with proper separation of concerns. Under budget by 4,900 tokens.
### Phase 4: Sessions 3 and 4 --- Endpoints and Tests (80 minutes)
Priya follows the same disciplined pattern for the remaining sessions. For Session 3, she includes the dispute schemas and service interface alongside a full copy of the existing transaction router as a pattern reference. The AI produces consistent endpoints on the first attempt for the basic CRUD operations, with two rounds of refinement for the admin adjudication endpoint.
For Session 4 (tests), she includes the full router code (so the AI knows exactly what to test), the test configuration file, and a snippet from an existing test file as a pattern reference. The AI generates comprehensive tests including edge cases she had not specified---invalid transition states, authorization checks, and pagination boundary conditions.
## Results and Analysis
### Time Breakdown
| Activity | Time |
|---|---|
| Pre-session planning | 15 minutes |
| Session 1: Models & Schemas | 35 minutes |
| Session 2: Repository & Service | 40 minutes |
| Between-session review & integration | 20 minutes |
| Session 3: API Endpoints | 35 minutes |
| Session 4: Tests | 45 minutes |
| Final integration & manual testing | 20 minutes |
| Total | 3 hours 30 minutes |
### Token Usage
| Session | Planned Budget | Actual Usage | Under/Over |
|---|---|---|---|
| Session 1 | 15,300 | 9,000 | -6,300 (under) |
| Session 2 | 20,900 | 16,000 | -4,900 (under) |
| Session 3 | 22,300 | 19,500 | -2,800 (under) |
| Session 4 | 28,000 | 25,000 | -3,000 (under) |
| Total | 86,500 | 69,500 | -17,000 (under) |
Priya came in under budget on every session. This is typical of well-planned sessions---the budget provides headroom for unexpected complexity, but the planning itself reduces the need for that headroom.
### Code Quality Metrics
- Consistency with existing codebase: 100%. Every file follows the established patterns.
- Test coverage: 94% line coverage on the new code.
- Bugs found in integration testing: 1 (a minor issue with the notification template name).
- AI rework needed: 2 instances across all four sessions (the pagination approach in Session 2 and a response model detail in Session 1).
- Total files created/modified: 8 files, approximately 780 lines.
## Priya's Context Management Principles
Over a year of practice, Priya has developed several principles that guide her approach:
### Principle 1: Plan Before You Prompt
"I never start a session without knowing what files I need, how many turns I expect, and what my context budget looks like. Planning takes 10-15 minutes and saves at least an hour of wasted effort."
### Principle 2: Sessions Are Cheap, Context Is Expensive
"Starting a new conversation is free. Wasting tokens on irrelevant context or degraded conversations is expensive in time and frustration. I would rather do five focused sessions than one marathon session."
### Principle 3: Interface-Only Is Usually Enough
"For files the AI needs to interact with but not modify, I almost always use interface-only context. The AI does not need to see the SQL inside a repository method to write code that calls that method---it just needs the method signature and return type."
### Principle 4: The First Message Is an Investment
"I spend real time crafting my first message for each session. I include full file context for the files being worked on, interfaces for dependencies, explicit constraints, and a clear task description. This front-loading pays dividends in every subsequent turn."
### Principle 5: Between-Session Review Is Non-Negotiable
"Between sessions, I always review the generated code, run tests, and verify integration. I never start a new session using AI-generated code I have not personally reviewed. This catches issues early and ensures the context I carry forward to the next session is accurate."
### Principle 6: Budget for Headroom
"I plan to use about 70% of the available context. The remaining 30% is headroom for unexpected complexity---additional file context I did not anticipate, extra turns to refine a tricky piece, or debugging if something goes wrong."
## Comparing Approaches: Planned vs. Unplanned
To illustrate the impact of context planning, consider how a developer without Priya's discipline might approach the same task:
| Aspect | Unplanned Approach | Priya's Planned Approach |
|---|---|---|
| Sessions | 1 (marathon) | 4 (focused) |
| Total turns | 40-60 | 36 |
| Context degradation | Severe by turn 25 | None (fresh context each session) |
| Rework needed | Frequent (8-12 instances) | Minimal (2 instances) |
| Code consistency | Mixed (style drifts over time) | High (conventions reinforced each session) |
| Total time | 4-5 hours | 3.5 hours |
| Developer frustration | High | Low |
| Code usable as-is | Partially | Fully |
The planned approach is faster, produces better code, and is less frustrating---even though it includes 15 minutes of "overhead" for planning and 20 minutes for between-session review.
## Adapting the Approach
Priya does not plan every vibe coding session this thoroughly. Her approach scales with the complexity of the task:
**Quick fix (5-10 minutes):** No formal planning. Open a conversation, describe the bug and paste the relevant code, get the fix.

**Small feature (30-60 minutes):** Mental planning only. One session, front-load the context, 5-10 turns.

**Medium feature (1-3 hours):** Brief written plan. Two to three sessions with summaries between them. Priya's dispute feature falls into this category.

**Large feature or refactoring (half-day or more):** Full written plan with context budgets, session breakdown, and explicit checkpoint criteria. Shared with team members who might also be working on AI-generated code for the same feature.
The key insight is that the cost of planning scales with the benefit. For a 5-minute task, spending 15 minutes planning is wasteful. For a 4-hour task, spending 15 minutes planning is an excellent investment.
## Tools and Automation
Priya has built a few lightweight tools to support her workflow:
- **A priming template library.** She has text files with priming templates for different types of sessions (backend feature, frontend component, debugging, code review, test writing). She pastes the appropriate one at the start of each session.
- **An interface extractor script.** A Python script that reads a Python file and outputs only the class definitions, method signatures, and docstrings---no implementation bodies. This makes it easy to generate interface-only context for any file in the codebase.
- **A session summary template.** A standard format for the summary she generates at the end of each session, ensuring she captures all the information needed to prime the next session.
These tools took about two hours to build (using AI assistance, naturally) and save her time on every session.
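An interface extractor of this kind can be sketched with the stdlib `ast` module alone. This is an illustration of the technique, not Priya's actual script:

```python
# Sketch of an interface extractor: parse Python source, strip every
# function body, keep signatures and docstrings. ast.unparse turns the
# modified tree back into source text.
import ast

def extract_interface(source: str) -> str:
    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            doc = ast.get_docstring(node)
            # Replace the body with just the docstring (if any) and `...`
            node.body = []
            if doc:
                node.body.append(ast.Expr(ast.Constant(doc)))
            node.body.append(ast.Expr(ast.Constant(...)))
    return ast.unparse(tree)

sample = '''
class DisputeRepository:
    """Data access for disputes."""
    async def get_by_id(self, dispute_id):
        """Fetch one dispute or None."""
        row = expensive_query(dispute_id)
        return row
'''
print(extract_interface(sample))  # signatures and docstrings, no bodies
```

Running the script over `repositories/transaction.py` would yield exactly the kind of interface-only context shown in the Session 2 priming message.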
## Reflection Questions
1. Priya's planning took 15 minutes for a 3.5-hour task. At what task size does planning stop being worthwhile? How would you determine this for your own workflow?
2. Priya uses interface-only context extensively. Can you identify a scenario where this approach would lead to problems---where the AI needs to see the full implementation, not just the interface?
3. Priya's "between-session review" adds 20 minutes to her workflow. What risks would she face if she skipped this step and piped AI output directly into the next session?
4. How would Priya's approach change if she were working on a greenfield project with no existing codebase to provide as context?
5. Priya developed her approach through a year of practice. If you were training a new team member in context management, which of her principles would you teach first, and why?