Case Study 2: Paying Down AI Debt

A Team's 3-Month Plan to Reduce Technical Debt in Their AI-Generated Codebase

Background

CloudTrack is a project management SaaS application built by a five-person engineering team over eight months. The team adopted AI coding assistants early in development and used them heavily throughout. The application includes task management, time tracking, team collaboration features, reporting dashboards, and integrations with Slack, GitHub, and Google Calendar.

At the eight-month mark, CloudTrack had 62,000 lines of Python backend code and 45,000 lines of React/TypeScript frontend code. The team's velocity had declined from an average of 18 story points per sprint (in months 1-4) to 9 story points per sprint (in months 7-8). Bug density had tripled. The engineering manager, Ravi, initiated a technical debt assessment.

The assessment identified 83 debt items with an estimated total remediation effort of 68 developer-days. Rather than halt feature development entirely, Ravi designed a three-month remediation plan that balanced debt reduction with continued feature delivery.

The Starting State

Before describing the plan, here is where CloudTrack stood at Month 8:

Quantitative Metrics:

Metric                           Value       Target
Average cyclomatic complexity    16.4        < 10
Code duplication                 14.7%       < 5%
Pattern consistency (backend)    48%         > 85%
Pattern consistency (frontend)   62%         > 85%
Test coverage (backend)          52%         > 80%
Test coverage (frontend)         34%         > 70%
Build time                       18 minutes  < 5 minutes
Dependencies with known CVEs     4           0
Unused dependencies              11          0

Qualitative Issues:

  • Five different error handling patterns in the backend.
  • The "task" module alone had 4,200 lines in a single file.
  • Three different state management approaches on the frontend (Redux, Context API, and local state for similar use cases).
  • No coding conventions document existed. Developers gave the AI assistant whatever context they thought of in the moment.
  • The CI pipeline had no quality gates — all checks were advisory, and failing checks did not block merges.

Team Morale:

In anonymous surveys, four of five developers reported frustration with the codebase. Common themes included: "I am afraid to touch certain modules," "I spend more time understanding existing code than writing new code," and "Every bug fix creates a new bug." One developer was actively considering leaving the team.

The Three-Month Plan

Ravi worked with the team to design a plan organized into three phases, each lasting one month. The plan allocated 30% of team capacity to debt reduction in Month 1, 25% in Month 2, and 20% in Month 3, with the remaining capacity dedicated to feature development.

Month 1: Foundations and Quick Wins

Goals: Establish conventions, fix security issues, remove the easiest debt, and set up automated prevention.

Week 1-2: Conventions and Tooling

The team's first action was creating a comprehensive coding conventions document. They spent two full days as a team (not individually) debating and documenting their preferred patterns for:

  • Database access (SQLAlchemy ORM with the repository pattern)
  • Error handling (custom exception hierarchy with FastAPI exception handlers)
  • API response format (standardized envelope with data, errors, and metadata fields)
  • Naming conventions (snake_case everywhere, "Service" suffix for business logic, "Repository" suffix for data access)
  • Frontend state management (Redux Toolkit for global state, React hooks for component state)
  • Testing patterns (pytest with fixtures, factory_boy for test data)

This conventions document became part of their AI assistant context. Every team member was required to include it at the start of every AI session. They stored it as CONVENTIONS.md in the project root and added a pre-session checklist to their team wiki.

They also configured their CI pipeline with hard quality gates:

  • Complexity threshold: No new function with cyclomatic complexity above 15 (they could not immediately enforce 10 due to existing violations, so they set a ceiling to prevent further degradation)
  • Duplication: No new duplicated blocks above 20 lines
  • Test coverage: No PR that decreases overall coverage
  • Security: Zero known CVEs in dependencies (blocking)
  • Lint: Zero new pylint errors (blocking)
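The complexity gate is the least standard of these checks, so it is worth seeing how one works. The sketch below is a minimal, stdlib-only version using Python's ast module; a real pipeline would more likely use an off-the-shelf tool such as radon or xenon, and the BoolOp counting here is an approximation of true McCabe complexity (it counts each boolean expression once rather than per extra operand).

```python
import ast
import sys

# AST nodes that add a decision point to a function's cyclomatic complexity.
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                ast.BoolOp, ast.IfExp)

def cyclomatic_complexity(func: ast.AST) -> int:
    """Approximate McCabe complexity: 1 + number of decision points."""
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(func))

def check_file(source: str, ceiling: int = 15) -> list[str]:
    """Return one violation message per function above the ceiling."""
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            cc = cyclomatic_complexity(node)
            if cc > ceiling:
                violations.append(
                    f"{node.name} (line {node.lineno}): complexity {cc} > {ceiling}"
                )
    return violations

if __name__ == "__main__":
    # Exit non-zero so the CI job fails, making the gate blocking rather
    # than advisory.
    failed = False
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for msg in check_file(f.read()):
                print(f"{path}: {msg}")
                failed = True
    sys.exit(1 if failed else 0)
```

The key design point is the non-zero exit code: CloudTrack's original pipeline ran similar checks but ignored their results, which is exactly what made every check advisory.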

Week 2-3: Security and Dependency Cleanup

The team addressed all four dependencies with known CVEs. Three were simple version bumps. One required a minor code change because the API had changed between versions. They used their AI assistant to help identify every usage of the changed API and update it consistently.

They removed all 11 unused dependencies and consolidated the three date-handling libraries (datetime, arrow, pendulum) down to one (datetime with helper utilities in a core/datetime_utils.py module). The AI assistant was particularly helpful here: they provided it with all files that imported the removed libraries and asked it to rewrite each one using the standard library equivalents.

Week 3-4: Top Five Quick Wins

Using the effort-impact matrix, the team identified and completed five Quick Win items:

  1. Shared email validation (4 hours): Consolidated six email validation implementations into one, with proper unit tests. The AI generated the consolidated function and the migration plan for each calling module.

  2. Shared pagination utility (6 hours): Replaced eight separate pagination implementations with a shared utility. The team used the AI to generate the utility based on their conventions, then manually reviewed and adjusted it.

  3. Shared API response formatting (8 hours): Created a standard response builder and updated all endpoints. This was done module by module over four days.

  4. Auth token extraction (2 hours): Trivial consolidation of duplicated middleware logic.

  5. Logging standardization (6 hours): Created a centralized logging configuration and updated all module-level logger initializations. The AI was asked to scan each file and replace custom logging setups with the standardized import.
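To make the quick wins concrete, here is what a shared utility like item 2 might look like: one pagination function replacing eight ad-hoc versions. The envelope fields mirror the team's data/metadata conventions, but the exact names and signature are illustrative.

```python
# core/pagination.py -- one pagination helper for all list endpoints.
from dataclasses import dataclass
from typing import Generic, Sequence, TypeVar

T = TypeVar("T")

@dataclass
class Page(Generic[T]):
    items: Sequence[T]
    page: int
    page_size: int
    total_items: int
    total_pages: int

def paginate(items: Sequence[T], page: int = 1, page_size: int = 20) -> Page[T]:
    """Slice a sequence into one page, clamping out-of-range page numbers."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    total_items = len(items)
    total_pages = max(1, -(-total_items // page_size))  # ceiling division
    page = min(max(page, 1), total_pages)               # clamp to valid range
    start = (page - 1) * page_size
    return Page(
        items=items[start:start + page_size],
        page=page,
        page_size=page_size,
        total_items=total_items,
        total_pages=total_pages,
    )
```

The value of consolidation is less the code itself than the edge-case decisions (clamping versus erroring on an out-of-range page, minimum page size) being made once instead of eight slightly different ways.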

Month 1 Results:

Metric                    Before    After Month 1    Change
Avg complexity            16.4      15.8             -3.7%
Duplication               14.7%     11.2%            -23.8%
Pattern consistency       48%       59%              +22.9%
Test coverage (BE)        52%       56%              +7.7%
Build time                18 min    14 min           -22.2%
CVEs                      4         0                -100%
Unused deps               11        0                -100%
Velocity (story points)   9         11               +22.2%

The velocity improvement was immediate and notable — even while dedicating 30% of capacity to debt work, the team completed more feature work than in the previous sprint because the quick wins had removed daily friction.

Month 2: Structural Improvements

Goals: Standardize the major architectural patterns, break up monolithic modules, and increase test coverage for high-risk areas.

Week 5-6: Error Handling Standardization

The team tackled their five different error handling patterns, converging on a custom exception hierarchy:

# core/exceptions.py
class AppError(Exception):
    """Base exception for all application errors."""
    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)

class NotFoundError(AppError):
    def __init__(self, resource: str, identifier: str):
        super().__init__(
            message=f"{resource} '{identifier}' not found",
            code="NOT_FOUND",
            status_code=404
        )

class ValidationError(AppError):
    def __init__(self, message: str, field: str | None = None):
        self.field = field  # record which field failed, if any
        super().__init__(
            message=message,
            code="VALIDATION_ERROR",
            status_code=422
        )

They migrated module by module. For each module, they asked the AI to convert all error handling to use the new exception classes, provided the exception hierarchy as context, and reviewed every change. The migration took eight developer-days spread across two weeks.
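The other half of the pattern is the handler that turns these exceptions into HTTP responses. In the real application this lives in a FastAPI @app.exception_handler function; the framework-neutral sketch below keeps the same logic self-contained, with a trimmed copy of AppError and the team's data/errors/metadata envelope (field layout assumed from their conventions).

```python
# Framework-neutral sketch of the AppError -> HTTP response translation.
class AppError(Exception):
    """Trimmed copy of core/exceptions.py's base class."""
    def __init__(self, message: str, code: str, status_code: int = 500):
        self.message = message
        self.code = code
        self.status_code = status_code
        super().__init__(message)

def to_response(exc: Exception) -> tuple[int, dict]:
    """Translate any exception into a (status, envelope) pair."""
    if isinstance(exc, AppError):
        status, code, message = exc.status_code, exc.code, exc.message
    else:
        # Unknown exceptions must never leak internals to the client.
        status, code, message = 500, "INTERNAL_ERROR", "Internal server error"
    return status, {
        "data": None,
        "errors": [{"code": code, "message": message}],
        "metadata": {},
    }
```

With one translation point, "five different error handling patterns" collapses into a single question per call site: which AppError subclass should this raise?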

Week 6-7: Breaking Up the Task Monolith

The 4,200-line task.py file was the team's biggest pain point. They used the AI assistant to help decompose it:

  1. First, they asked the AI to analyze the file and suggest a decomposition into logical units. The AI identified seven responsibility groups: task CRUD operations, task assignment logic, task status transitions, task search and filtering, task notifications, task reporting, and task import/export.

  2. They created seven new files matching these groups, plus a facade class that preserved the existing public API so callers did not need to change.

  3. They migrated functionality one group at a time, writing tests for each group before and after the migration.

  4. After all groups were migrated, they updated callers to use the new modules directly and removed the facade.

The entire process took six developer-days. The AI was invaluable for the mechanical work of moving functions and updating imports, but human judgment was essential for deciding where to draw the boundaries.
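The transitional facade from step 2 is worth sketching, since it is what made the migration safe to do incrementally. In the sketch below, the stand-in for one of the new modules is defined inline so the example runs on its own; in the real code it would be an import (e.g. from tasks import crud), and all names here are illustrative.

```python
# Transitional facade: preserves the old task.py public surface while
# callers migrate to the new, focused modules one at a time.

class _Crud:
    """Inline stand-in for the new tasks/crud.py module."""
    def __init__(self):
        self.tasks = {}
        self._next_id = 1
    def create(self, title: str) -> dict:
        task = {"id": self._next_id, "title": title, "status": "open"}
        self.tasks[self._next_id] = task
        self._next_id += 1
        return task

crud = _Crud()

# Stand-in for tasks/transitions.py: the allowed status-transition graph.
ALLOWED = {("open", "in_progress"), ("in_progress", "done")}

def _transition(task: dict, status: str) -> dict:
    if (task["status"], status) not in ALLOWED:
        raise ValueError(f"illegal transition {task['status']} -> {status}")
    task["status"] = status
    return task

class TaskFacade:
    """Same public surface as the old 4,200-line task.py, delegating
    everything to the new modules. Deleted once all callers migrated."""
    def create_task(self, title: str) -> dict:
        return crud.create(title)
    def set_status(self, task_id: int, status: str) -> dict:
        return _transition(crud.tasks[task_id], status)
```

Because the facade kept every caller compiling throughout, each of the seven groups could be moved, tested, and merged independently rather than in one risky big-bang change.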

Week 7-8: Database Access Standardization

The team created a base repository class and migrated all five data access patterns to use it. The AI helped generate the repository implementations from the conventions document and existing model definitions.
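The shape of that base repository can be sketched as follows. The real class wraps a SQLAlchemy session per the team's conventions; this version uses an in-memory dict so it is self-contained, and the concrete subclass and its query method are illustrative.

```python
# core/repository.py -- shape of the shared base repository.
from typing import Generic, Optional, TypeVar

T = TypeVar("T")

class BaseRepository(Generic[T]):
    """Common CRUD surface; the real version delegates to a SQLAlchemy session."""
    def __init__(self):
        self._store: dict[int, T] = {}
        self._next_id = 1

    def add(self, entity: T) -> int:
        entity_id = self._next_id
        self._store[entity_id] = entity
        self._next_id += 1
        return entity_id

    def get(self, entity_id: int) -> Optional[T]:
        return self._store.get(entity_id)

    def list(self) -> list[T]:
        return list(self._store.values())

    def delete(self, entity_id: int) -> bool:
        return self._store.pop(entity_id, None) is not None

class TaskRepository(BaseRepository[dict]):
    """Concrete repository following the 'Repository' suffix convention."""
    def find_by_status(self, status: str) -> list[dict]:
        return [t for t in self.list() if t.get("status") == status]
```

A shared base class like this is also what makes "add test coverage for all repository methods" tractable: the CRUD surface is tested once, and subclasses only need tests for their own query methods.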

They also added test coverage for all repository methods, bringing backend coverage from 56% to 64%.

Month 2 Results:

Metric                    After Month 1    After Month 2    Change
Avg complexity            15.8             12.1             -23.4%
Duplication               11.2%            7.8%             -30.4%
Pattern consistency       59%              74%              +25.4%
Test coverage (BE)        56%              67%              +19.6%
Build time                14 min           9 min            -35.7%
Velocity (story points)   11               14               +27.3%

Month 3: Refinement and Sustainability

Goals: Continue pattern standardization, establish ongoing processes, and reach sustainable metrics.

Week 9-10: Frontend Consolidation

Tomasz (the team's strongest frontend developer) led the effort to consolidate the three state management approaches. They kept Redux Toolkit for application-level state and converted all Context API and inappropriate local state usage to follow the established pattern. The AI assistant generated the migration code for each component, while Tomasz reviewed for React-specific correctness.

Week 10-11: Remaining High-Priority Items

The team addressed the next tier of debt items:

  • Replaced the custom middleware chain with FastAPI's native middleware system.
  • Simplified the over-engineered configuration loader from 340 lines to 45 lines.
  • Created shared API client utilities for external integrations.
  • Added input validation using Pydantic models at all API endpoints.

Week 11-12: Process Establishment

The team established ongoing debt management processes:

  1. Monthly debt review: A 90-minute meeting on the first Monday of each month to review the debt catalog, add new items, close resolved items, and select items for the next month.

  2. 15% debt budget: Going forward, 15% of each sprint would be allocated to debt work. Items were selected from the prioritized catalog during sprint planning.

  3. Convention enforcement: The AI assistant context was made a required part of the PR description template. Reviewers were expected to check new code against conventions.

  4. Quarterly metrics review: A broader review of all health metrics, shared with the product team in business terms.

  5. New module template: A cookiecutter template was created for new Python modules, ensuring every new module started with the correct structure, imports, and patterns.

Final Results

After three months, CloudTrack's metrics had transformed:

Metric                       Month 0    Month 3    Target        Status
Avg complexity               16.4       9.8        < 10          Met
Duplication                  14.7%      4.2%       < 5%          Met
Pattern consistency (BE)     48%        82%        > 85%         Close
Pattern consistency (FE)     62%        79%        > 85%         Improving
Test coverage (BE)           52%        74%        > 80%         Improving
Test coverage (FE)           34%        51%        > 70%         Improving
Build time                   18 min     4.5 min    < 5 min       Met
CVEs                         4          0          0             Met
Velocity                     9 pts      16 pts     18 pts        78% recovery
Debt items                   83         29         Decreasing    On track

Team morale improved dramatically. The anonymous survey at Month 3 showed all five developers reporting improved satisfaction. The developer who had been considering leaving chose to stay, citing the improved codebase health and the team's commitment to maintaining it.

Cost-Benefit Analysis

Investment:

  • Month 1: 30% of team capacity = ~30 developer-days
  • Month 2: 25% of team capacity = ~25 developer-days
  • Month 3: 20% of team capacity = ~20 developer-days
  • Total: ~75 developer-days

Returns (measured at Month 3):

  • Velocity recovered from 9 to 16 story points per sprint (a 78% increase)
  • Bug density decreased by 60%
  • Developer onboarding time estimated to decrease from 4 weeks to 2 weeks
  • Projected annual savings: approximately 200 developer-days in reduced friction and debugging

The investment paid for itself within the three-month period when accounting for the velocity recovery during the remediation itself.

Key Lessons

  1. The conventions document was the single highest-ROI investment. It took two days to create and prevented an estimated 40% of new debt from entering the codebase during the remediation period.

  2. AI was both the cause and a powerful tool for the cure. The same AI assistants that generated the inconsistent code were invaluable for systematic remediation — performing consistent refactoring across dozens of files with minimal errors.

  3. Velocity recovery began immediately. The team did not have to wait until the end of the three months to see benefits. Quick wins in the first two weeks produced measurable velocity improvements.

  4. Process is more important than heroics. The ongoing processes established in Month 3 — the monthly debt review, the 15% budget, the convention enforcement — are more valuable than the initial remediation because they prevent the debt from returning.

  5. Communicate in business terms from day one. Ravi's initial pitch to the product team framed the remediation as a velocity investment: "We are slow now because of accumulated inefficiencies. Investing 25% of our time for three months will recover most of our lost velocity permanently." This framing secured buy-in that technical jargon never would have achieved.

Applying This to Your Projects

CloudTrack's three-month plan is not a one-size-fits-all template. A smaller project might need only two weeks. A larger project might need six months. But the structure is adaptable:

  • Phase 1: Establish conventions, fix security issues, and capture quick wins. This phase should produce visible results fast to build momentum and stakeholder confidence.
  • Phase 2: Address structural issues that require more effort. This is where the big complexity and consistency improvements happen.
  • Phase 3: Refine, extend, and — critically — establish the ongoing processes that prevent a return to the starting state.

The worst outcome is completing a successful remediation and then abandoning the processes that maintain the improvement. Without ongoing discipline, debt will accumulate again to its previous levels within six to twelve months.