Case Study 1: Navigating a 50,000-Line Codebase

Using AI to Understand and Modify a Large Existing Project

Background

Priya Sharma is a software developer who just joined a fintech startup called PayStream. The company has a payment processing platform built over three years by a team of eight developers, several of whom have since left the company. The codebase is approximately 50,000 lines of Python, spread across 180 files organized into a monorepo with three main services and two shared libraries.

Priya's first assignment: add a new "recurring payments" feature to the billing service. The feature requires changes to the billing service's models, services, and API layers, plus modifications to the shared payment processing library. She has two weeks to deliver.

The challenge: Priya has never seen this codebase before, the documentation is sparse, and the two developers who originally built the billing service are among those who have left. She decides to use AI-assisted development to accelerate her understanding of the code and implement the feature.

Phase 1: Codebase Reconnaissance (Day 1)

Priya starts by generating a directory tree of the entire monorepo:

paystream/
├── packages/
│   ├── billing-service/
│   │   ├── src/billing/
│   │   │   ├── models/        (12 files, ~2,400 lines)
│   │   │   ├── services/      (8 files, ~3,200 lines)
│   │   │   ├── api/           (6 files, ~1,800 lines)
│   │   │   ├── tasks/         (4 files, ~800 lines)
│   │   │   └── utils/         (5 files, ~600 lines)
│   │   └── tests/             (22 files, ~4,000 lines)
│   ├── user-service/
│   │   ├── src/users/         (28 files, ~6,000 lines)
│   │   └── tests/             (18 files, ~3,500 lines)
│   ├── notification-service/
│   │   ├── src/notifications/ (15 files, ~3,000 lines)
│   │   └── tests/             (10 files, ~2,000 lines)
│   ├── shared-models/
│   │   └── src/shared/        (20 files, ~4,500 lines)
│   └── payment-lib/
│       └── src/payment_lib/   (18 files, ~5,000 lines)
├── tools/                     (8 files, ~1,200 lines)
└── infrastructure/            (12 files, ~2,000 lines)
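
A tree like this, annotated with file and line counts, can be produced with a short script. The sketch below is a hypothetical helper, not PayStream tooling; it assumes the sources are Python and counts only `.py` files.

```python
# Sketch: print a directory tree annotated with .py file and line counts.
# Hypothetical helper, not PayStream tooling.
from pathlib import Path

def count_py(directory: Path) -> tuple[int, int]:
    """Return (file_count, line_count) for .py files under directory."""
    files = list(directory.rglob("*.py"))
    lines = sum(len(f.read_text(errors="ignore").splitlines()) for f in files)
    return len(files), lines

def print_tree(root: Path, depth: int = 0) -> None:
    """Recursively print subdirectories with their .py file/line totals."""
    for child in sorted(p for p in root.iterdir() if p.is_dir()):
        n_files, n_lines = count_py(child)
        print(f"{'    ' * depth}{child.name}/  ({n_files} files, ~{n_lines:,} lines)")
        print_tree(child, depth + 1)
```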

She shares this tree with her AI assistant along with the first prompt:

I'm new to this codebase and need to add a "recurring payments" feature to
the billing service. Here is the monorepo structure:

[directory tree]

Please help me understand:
1. What is the overall architecture?
2. Which packages are most relevant to my feature?
3. What should I look at first?

The AI identifies the three-service architecture, notes the shared libraries, and recommends she focus on billing-service, payment-lib, and shared-models. It suggests starting with the models to understand the data structures.

Phase 2: Building the Repository Map (Days 1-2)

Priya uses a repository map generator (similar to example-01-repo-mapper.py from this chapter's code) to create detailed maps of the three relevant packages. For the billing service, the map reveals:

billing/models/invoice.py (85 lines)
  Classes: Invoice, InvoiceStatus(Enum), InvoiceLineItem
  Dependencies: shared.models.customer, shared.models.currency

billing/models/payment.py (120 lines)
  Classes: Payment, PaymentStatus(Enum), PaymentMethod(Enum)
  Dependencies: shared.models.customer, payment_lib.processor

billing/models/subscription.py (65 lines)
  Classes: Subscription, SubscriptionPlan, BillingCycle(Enum)
  Dependencies: shared.models.customer

billing/services/invoice_service.py (280 lines)
  Classes: InvoiceService
  Dependencies: billing.models.invoice, billing.models.payment,
                payment_lib.processor, billing.utils.tax_calculator

billing/services/payment_service.py (320 lines)
  Classes: PaymentService
  Dependencies: billing.models.payment, payment_lib.processor,
                shared.models.customer, billing.tasks.webhooks

This map immediately reveals something important: there is already a subscription.py model with a BillingCycle enum. The recurring payments feature is not starting from zero; it builds on existing subscription infrastructure.
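
Map entries like those above can be derived mechanically from the source with Python's standard `ast` module. This sketch extracts the top-level class names and imported modules from a single file; a full mapper would run it over every file in the package.

```python
# Sketch of a per-file map extractor, assuming Python sources.
import ast

def map_file(source: str, filename: str = "<unknown>") -> dict:
    """Extract top-level classes and imported modules from Python source."""
    tree = ast.parse(source, filename=filename)
    classes = [node.name for node in tree.body if isinstance(node, ast.ClassDef)]
    deps = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            deps.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module:
            deps.add(node.module)
    return {"classes": classes, "dependencies": sorted(deps)}
```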

Phase 3: Understanding Existing Code (Days 2-3)

Priya provides the AI with the full contents of subscription.py and payment_service.py and asks it to trace the existing payment flow:

Here is the existing Subscription model and PaymentService. Please trace
the flow of a one-time payment from API request to payment processor
call. Identify where recurring payment logic would need to be inserted.

[paste subscription.py - 65 lines]
[paste payment_service.py - 320 lines]

The AI identifies that the current flow is:

  1. API endpoint receives payment request
  2. PaymentService.process_payment() validates the request
  3. It calls payment_lib.processor.charge() for the actual payment
  4. It creates a Payment record with the result
  5. It triggers a webhook notification via billing.tasks.webhooks
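
The traced flow can be read end to end as code. The sketch below mirrors the five steps; the processor and webhook objects are illustrative stand-ins, not the real payment_lib or billing.tasks implementations.

```python
# Sketch of the traced one-time payment flow; processor and webhooks
# are stand-in collaborators so the five steps can be read in order.
class PaymentService:
    def __init__(self, processor, webhooks):
        self.processor = processor
        self.webhooks = webhooks
        self.payments = []                        # stand-in for the Payment table

    def process_payment(self, request: dict) -> dict:
        # 1. The API endpoint hands the request to the service.
        if request.get("amount", 0) <= 0:         # 2. validate the request
            raise ValueError("invalid payment request")
        result = self.processor.charge(request["amount"])  # 3. charge
        payment = {"request": request, "result": result}   # 4. create record
        self.payments.append(payment)
        self.webhooks.notify(payment)             # 5. trigger webhook
        return payment
```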

For recurring payments, the AI suggests the changes needed:

  • A new RecurringPayment model to track schedules
  • A new recurring_payment_service.py to manage scheduling logic
  • A new Celery task to execute scheduled payments
  • Modifications to the existing PaymentService to support recurring charges
  • API endpoints for creating, viewing, and canceling recurring payments

Phase 4: Understanding Conventions (Day 3)

Before writing any code, Priya asks the AI to analyze the coding conventions used in the billing service:

Here are three existing service files from the billing service. Please
identify the coding conventions, patterns, and style used:

[paste invoice_service.py]
[paste payment_service.py]
[paste billing/utils/tax_calculator.py]

The AI identifies the following conventions:

  • All services use dependency injection via __init__
  • Database operations use SQLAlchemy sessions passed as parameters
  • Error handling uses custom exceptions from billing.exceptions
  • Logging uses structured logging via structlog
  • All public methods have Google-style docstrings
  • Type hints are used throughout
  • Methods follow a validate-process-persist pattern
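
Taken together, a method in this style might look like the following sketch. `ExampleService`, `PaymentError`, and the session shape are illustrative stand-ins, not code from the real billing service.

```python
# Illustrative sketch of the validate-process-persist pattern with
# dependency injection; all names here are stand-ins, not PayStream code.
class PaymentError(Exception):
    """Stand-in for the custom exceptions in billing.exceptions."""

class ExampleService:
    def __init__(self, processor):
        # Dependencies are injected via __init__, per the convention.
        self.processor = processor

    def charge(self, session, amount: int) -> dict:
        """Charge an amount and persist the result.

        Args:
            session: Database session, passed as a parameter.
            amount: Charge amount in minor currency units.

        Returns:
            The payment record returned by the processor.

        Raises:
            PaymentError: If the amount is not positive.
        """
        if amount <= 0:                          # 1. validate
            raise PaymentError("amount must be positive")
        record = self.processor.charge(amount)   # 2. process
        session.add(record)                      # 3. persist
        return record
```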

Priya saves this analysis as a "conventions document" to include in her generation prompts.

Phase 5: Generating the Implementation (Days 4-7)

With a solid understanding of the codebase, Priya begins generating code. She uses the hybrid approach:

Holistic generation for models:

Using the following conventions and existing models as reference, please
generate the RecurringPayment model and any necessary enums. It should
follow the exact same patterns as the existing models.

Conventions:
[paste conventions document]

Reference model (subscription.py):
[paste subscription.py]

The RecurringPayment should track: id, subscription_id, amount, currency,
interval (daily/weekly/monthly/yearly), next_payment_date, status
(active/paused/cancelled/failed), retry_count, max_retries, created_at,
updated_at.
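
The shape being requested can be sketched as a plain dataclass. The field names below mirror the prompt; everything else (enum names, defaults, the use of dataclasses rather than the SQLAlchemy model the real codebase would require) is an assumption for illustration.

```python
# Plain-Python sketch of the requested RecurringPayment shape; the real
# model would be SQLAlchemy, following subscription.py's patterns.
from dataclasses import dataclass, field
from datetime import date, datetime, timezone
from enum import Enum

class PaymentInterval(Enum):
    DAILY = "daily"
    WEEKLY = "weekly"
    MONTHLY = "monthly"
    YEARLY = "yearly"

class RecurringPaymentStatus(Enum):
    ACTIVE = "active"
    PAUSED = "paused"
    CANCELLED = "cancelled"
    FAILED = "failed"

@dataclass
class RecurringPayment:
    id: str
    subscription_id: str
    amount: int                      # minor currency units
    currency: str
    interval: PaymentInterval
    next_payment_date: date
    status: RecurringPaymentStatus = RecurringPaymentStatus.ACTIVE
    retry_count: int = 0
    max_retries: int = 3             # assumed default
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    updated_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
```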

File-by-file generation for services:

For the more complex RecurringPaymentService, she uses file-by-file generation with extensive context:

Please create recurring_payment_service.py following these conventions:

[paste conventions document]

Consistency reference (payment_service.py):
[paste payment_service.py]

Available imports:
[paste import map]

The service should support: create_recurring_payment, pause_recurring_payment,
cancel_recurring_payment, process_due_payments (called by scheduler),
handle_payment_failure (with retry logic).
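
The scheduler-facing behavior being requested can be sketched with plain dicts. The field names follow the model prompt above; the `charge` callable stands in for the real payment_lib processor, and the retry policy shown is an assumption.

```python
# Sketch of the scheduling and retry logic; payments are plain dicts
# here, and `charge` stands in for the real payment processor call.
from datetime import date

def process_due_payments(payments: list, today: date, charge) -> list:
    """Charge every active payment whose next_payment_date has arrived."""
    charged = []
    for p in payments:
        if p["status"] == "active" and p["next_payment_date"] <= today:
            charge(p)
            charged.append(p["id"])
    return charged

def handle_payment_failure(p: dict) -> None:
    """Count a failed attempt; mark the schedule failed once retries run out."""
    p["retry_count"] += 1
    if p["retry_count"] >= p["max_retries"]:
        p["status"] = "failed"
```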

Holistic generation for tests:

Please generate tests for the RecurringPaymentService. Here is the service
implementation and an existing test file as a reference for test patterns:

[paste recurring_payment_service.py]
[paste test_payment_service.py as reference]

Phase 6: Cross-File Verification (Day 8)

After generating all the new files, Priya uses the AI to verify cross-file consistency:

I have generated the following new files for the recurring payments feature.
Please review them for:
1. Consistent naming conventions
2. Correct import paths
3. No circular dependencies
4. Consistent error handling patterns
5. Consistent docstring format

[paste all new files]

The AI identifies two issues: one method used camelCase instead of snake_case (convention drift from a long session), and one import path referenced a non-existent utility function. Priya fixes both.

Phase 7: Integration Testing (Days 9-10)

Priya asks the AI to trace the complete recurring payment flow through all the files she created and modified, verifying that the data flows correctly from the API layer through the service layer to the payment processor and back.

The AI identifies a subtle issue: the retry logic in the service uses a different status enum value than what the Celery task checks for. This would have caused failed payments to never be retried. Priya fixes it before the bug reaches production.
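
This class of bug is worth illustrating: when the writer and the checker each compare raw strings, drift between them is silent, whereas a single shared enum makes the contract explicit. All names below are hypothetical, for illustration only.

```python
# Illustration of the contract bug the review caught: the service (writer)
# and the scheduled task (checker) must agree on one status value.
from enum import Enum

class RecurringStatus(Enum):
    ACTIVE = "active"
    RETRY_PENDING = "retry_pending"   # the one value both sides must share
    FAILED = "failed"

def mark_for_retry(payment: dict) -> None:
    """Service side: flag a failed charge for another attempt."""
    payment["status"] = RecurringStatus.RETRY_PENDING

def is_retryable(payment: dict) -> bool:
    """Task side: the scheduler picks up exactly this status."""
    return payment["status"] is RecurringStatus.RETRY_PENDING
```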

Results and Lessons Learned

Priya delivered the feature in 9 working days instead of the estimated 14. More importantly, her code received positive reviews from senior developers who noted that it followed the existing conventions perfectly.

Key lessons from this case study:

  1. Invest time in understanding before generating. Priya spent three full days understanding the codebase before writing a single line of new code. This investment paid off in higher-quality generation and fewer iterations.

  2. The repository map was essential. It revealed the existing subscription infrastructure that Priya might have missed, preventing her from reinventing what was already partially built.

  3. Conventions analysis prevented style clashes. By asking the AI to extract conventions from existing code, Priya ensured her new code was indistinguishable from code written by the original developers.

  4. Cross-file verification caught real bugs. The status enum mismatch between the service and the task would have been a subtle production bug. AI-assisted review caught it because the AI could reason about the contract between the two files simultaneously.

  5. Scoping to relevant packages made the monorepo manageable. Priya never needed to load the entire 50,000-line codebase into context. By scoping to the billing service and its direct dependencies, she kept her context focused and effective.

  6. The phased approach provided natural checkpoints. Each phase had a clear deliverable (understanding, conventions, models, services, tests, verification), making progress visible and reviewable.

Priya's Prompt Library

After completing the feature, Priya documented her most effective prompts as reusable templates for the team:

  • Codebase orientation prompt: Provides directory tree, asks for architecture overview and entry points
  • Convention extraction prompt: Provides 2-3 representative files, asks for style analysis
  • Consistency reference generation prompt: Provides one file as reference, asks for a new file following the same patterns
  • Cross-file verification prompt: Provides multiple related files, asks for consistency review
  • Data flow tracing prompt: Provides files along a call chain, asks for end-to-end trace

These templates became part of PayStream's engineering playbook, helping other developers onboard faster and use AI more effectively.