Case Study 1: The Debt Audit
A Comprehensive Technical Debt Audit of an AI-Built Application Revealing 47 Debt Items
Background
MealPlanr is a meal planning and grocery list application built over four months by a two-person team — Priya, a backend developer, and Tomasz, a frontend developer. Both used AI coding assistants extensively throughout the project. The application includes user account management, recipe storage, weekly meal plan generation, grocery list creation, and integration with two grocery delivery APIs.
By the time MealPlanr reached 28,000 lines of Python (backend) and 19,000 lines of TypeScript (frontend), the team noticed troubling symptoms. Adding a simple "favorite recipes" feature, which Priya estimated at two days, took eleven days. Three bugs were introduced during the process. Tomasz reported that changing a single UI component often broke unrelated parts of the application. Both developers avoided modifying certain modules that they described as "fragile" and "incomprehensible."
Their CTO, Alana, hired an external consultant named Damien to perform a comprehensive technical debt audit. What follows is a summary of Damien's findings.
Audit Methodology
Damien used a five-step methodology:
- Automated scanning: Ran pylint, flake8, radon (complexity), and jscpd (duplication) across the entire codebase.
- Dependency analysis: Used pipdeptree to map package dependencies, and custom scripts to map module-level imports and identify circular references.
- Pattern analysis: Manually reviewed representative files from each module to identify inconsistent patterns.
- Developer interviews: Spoke with Priya and Tomasz individually about pain points, areas of confusion, and their AI usage patterns.
- Historical analysis: Reviewed git history to identify high-churn files and correlate them with bug reports.
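The historical analysis step can be approximated with a short script. The sketch below is an assumption about how a churn count might be done, not Damien's actual tooling. It counts how often each path appears in the text produced by `git log --name-only --pretty=format:`. The function name `count_churn` and the sample file paths are hypothetical.

```python
from collections import Counter


def count_churn(git_log_output: str) -> Counter:
    """Count how many commits touched each file.

    Expects the text produced by `git log --name-only --pretty=format:`,
    which lists one changed path per line with blank lines between commits.
    """
    paths = (line.strip() for line in git_log_output.splitlines())
    return Counter(path for path in paths if path)


# Hypothetical sample output; the file names are illustrative only.
sample_log = """
app/meal_plan/service.py
app/recipes/repository.py

app/meal_plan/service.py

app/meal_plan/service.py
app/users/routes.py
"""

churn = count_churn(sample_log)
top_files = churn.most_common(2)  # highest-churn files first
```

Files at the top of this ranking are the ones worth cross-checking against bug reports.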
The Findings: 47 Debt Items
Damien organized the 47 debt items into the seven AI-specific debt patterns described in this chapter's framework.
Style Drift (11 items)
The most prevalent category. Priya used AI sessions of varying lengths, sometimes providing the project's conventions document and sometimes not. Tomasz switched between two different AI tools mid-project.
DEBT-001: Three database access patterns. The recipe module used SQLAlchemy ORM with the repository pattern. The user module used SQLAlchemy ORM with direct model queries in route handlers. The meal plan module used raw SQL via psycopg2. All three patterns worked. None were documented as the canonical approach.
DEBT-002: Inconsistent error handling. The user module returned HTTP error responses directly from service classes. The recipe module raised custom exceptions that were caught by middleware. The grocery integration module returned tuple pairs of (result, error_message). Priya confirmed she had never consciously chosen between these approaches — each emerged from a different AI session.
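One way to converge on the recipe module's approach (custom exceptions translated in a single place) is sketched below. It is framework-agnostic and illustrative only: the class names `AppError`, `NotFoundError`, and `InvalidInputError` and the function `to_http_response` are hypothetical, not taken from the MealPlanr codebase.

```python
class AppError(Exception):
    """Base class for errors the application raises deliberately."""

    status_code = 500

    def __init__(self, message: str) -> None:
        super().__init__(message)
        self.message = message


class NotFoundError(AppError):
    status_code = 404


class InvalidInputError(AppError):
    status_code = 400


def to_http_response(error: Exception) -> tuple[dict, int]:
    """Translate any exception into a (body, status) pair in one place."""
    if isinstance(error, AppError):
        return {"error": error.message}, error.status_code
    # Unexpected errors get a generic message so internals are not leaked.
    return {"error": "Internal server error"}, 500
```

With this shape, service classes only raise exceptions, and the middleware (or the framework's error hook) is the only code that knows about HTTP status codes.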
DEBT-003 through DEBT-007: Naming inconsistencies. Five separate naming problems were documented: mixed snake_case and camelCase in the meal plan module, inconsistent use of "Service" vs. "Manager" vs. "Handler" suffixes, variable names using different conventions for the same concept (userId, user_id, uid), inconsistent file naming (some modules used singular nouns, others plural), and test files using both test_ prefix and _test suffix.
DEBT-008 through DEBT-011: Formatting and structural inconsistencies. Different import ordering conventions, inconsistent use of type hints (present in 60% of functions, absent in 40%), varying docstring formats (Google style, NumPy style, and no docstrings), and mixed use of dataclasses and plain dictionaries for data transfer.
Copy-Paste Duplication (9 items)
DEBT-012: Email validation in five locations. Five separate email validation implementations existed, each with slightly different regex patterns. One accepted emails with plus signs, three did not. One required a two-character minimum TLD, others did not check.
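A consolidated validator could look like the sketch below. The policy it encodes (plus signs allowed, TLD of at least two letters) is an assumption chosen for illustration; the audit does not say which of the five behaviors the team treated as correct. The name `is_valid_email` is hypothetical.

```python
import re

# Deliberately simple pattern: a local part that may contain plus signs,
# one "@", a dotted domain, and a TLD of at least two letters.
_EMAIL_RE = re.compile(
    r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}"
)


def is_valid_email(address: str) -> bool:
    """Return True if the address matches the single shared policy."""
    return _EMAIL_RE.fullmatch(address.strip()) is not None
```

The point is less the exact pattern than having one definition that every module imports, so a policy change happens in one place.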
DEBT-013: Date formatting in seven locations. Seven functions across the codebase formatted dates for display, using three different output formats.
DEBT-014: Authentication token extraction in four locations. Four separate functions parsed the Authorization header from incoming requests, each implementing the "Bearer " prefix removal differently.
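A single shared helper for this could be as small as the following sketch. The function name `extract_bearer_token` is hypothetical, and treating the scheme name as case-insensitive is a design choice made here, not something the audit specifies.

```python
from typing import Optional


def extract_bearer_token(authorization_header: Optional[str]) -> Optional[str]:
    """Return the token from an 'Authorization: Bearer <token>' header value.

    Returns None when the header is missing, uses another scheme, or does
    not consist of exactly a scheme and a token.
    """
    if not authorization_header:
        return None
    parts = authorization_header.strip().split()
    if len(parts) != 2 or parts[0].lower() != "bearer":
        return None
    return parts[1]
```

Returning `None` for every malformed case gives callers one condition to check before rejecting the request.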
DEBT-015: Pagination logic in six locations. Every API endpoint that returned lists implemented its own pagination, with different parameter names (page/offset, limit/per_page/page_size) and different default values.
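A shared pagination utility might look like the sketch below. The parameter names (`page`, `page_size`), the default of 20, and the cap of 100 are assumptions for illustration; the audit does not record which names or defaults the team standardized on.

```python
from dataclasses import dataclass
from typing import Mapping

DEFAULT_PAGE_SIZE = 20  # assumed default
MAX_PAGE_SIZE = 100     # assumed upper bound


@dataclass(frozen=True)
class Page:
    page: int
    page_size: int

    @property
    def offset(self) -> int:
        """Number of rows to skip for this page (pages are 1-based)."""
        return (self.page - 1) * self.page_size


def parse_pagination(params: Mapping[str, str]) -> Page:
    """Read 'page' and 'page_size' from query parameters with safe defaults."""

    def to_int(name: str, default: int) -> int:
        try:
            return int(params.get(name, default))
        except (TypeError, ValueError):
            return default

    page = max(1, to_int("page", 1))
    page_size = min(MAX_PAGE_SIZE, max(1, to_int("page_size", DEFAULT_PAGE_SIZE)))
    return Page(page, page_size)
```

Every list endpoint would then call `parse_pagination` on its query parameters and pass `offset` and `page_size` to the data layer.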
DEBT-016 through DEBT-020: Additional duplications. Duplicated input sanitization (3 variants), duplicated price formatting (4 variants), duplicated API response envelope construction (3 variants), duplicated logging setup (each module initialized its own logger differently), and duplicated database connection management.
Over-Engineering (6 items)
DEBT-021: Configuration system. The configuration loader used the Strategy pattern with four concrete strategies (JSON, YAML, TOML, environment variables), a configuration registry with dependency injection, and a caching layer with TTL support. The application used a single .env file.
DEBT-022: Event system. An elaborate publish-subscribe event system was built for "future extensibility." It was used in exactly one place: sending a welcome email after user registration. A simple function call would have sufficed.
DEBT-023: Recipe search. The recipe search feature used an abstract query builder with support for boolean logic, faceted search, and relevance scoring. The actual search requirements were: find recipes by name containing a substring.
DEBT-024 through DEBT-026: Additional over-engineering. An abstract factory for creating API response objects (used for a single response format), a plugin system for grocery delivery integrations (only two integrations existed, with no plans for more), and a custom ORM-like layer on top of SQLAlchemy.
Hallucinated Patterns (5 items)
DEBT-027: Custom middleware chain. The backend used a custom "middleware chain" implementation that did not match Flask's, FastAPI's, or Django's middleware conventions. It worked correctly but was undocumented, untestable with standard tools, and incomprehensible to anyone who had not read the implementation.
DEBT-028: State machine for order processing. The AI generated a custom state machine implementation rather than using a standard library like transitions. The custom implementation had no validation, no transition hooks, and no visualization support.
DEBT-029: Custom dependency injection. Instead of using a standard DI library or Python's own module system, the AI created a custom service locator that used module-level dictionaries and string-based lookups.
DEBT-030 and DEBT-031: Additional hallucinated patterns. A custom "reactive" data binding system on the frontend that duplicated React's own state management, and a custom "query language" for filtering API results that could have been standard query parameters.
Missing Abstractions (7 items)
DEBT-032: No base repository class. Despite having five repository-like classes, there was no base class or interface defining the common operations (find_by_id, find_all, create, update, delete). Each repository implemented these methods with different signatures and behaviors.
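The missing interface could be expressed as an abstract base class, as in this sketch. The method names come from the audit; the signatures, the `BaseRepository` and `InMemoryRepository` names, and the dictionary-backed implementation are assumptions for illustration (the real repositories sit on SQLAlchemy).

```python
from abc import ABC, abstractmethod
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


class BaseRepository(ABC, Generic[T]):
    """Common contract that every repository is expected to honor."""

    @abstractmethod
    def find_by_id(self, entity_id: int) -> Optional[T]: ...

    @abstractmethod
    def find_all(self) -> list[T]: ...

    @abstractmethod
    def create(self, entity: T) -> T: ...

    @abstractmethod
    def update(self, entity_id: int, entity: T) -> T: ...

    @abstractmethod
    def delete(self, entity_id: int) -> bool: ...


class InMemoryRepository(BaseRepository[dict]):
    """Toy implementation used only to show the contract in action."""

    def __init__(self) -> None:
        self._rows: dict[int, dict] = {}
        self._next_id = 1

    def find_by_id(self, entity_id: int) -> Optional[dict]:
        return self._rows.get(entity_id)

    def find_all(self) -> list[dict]:
        return list(self._rows.values())

    def create(self, entity: dict) -> dict:
        stored = {**entity, "id": self._next_id}
        self._rows[self._next_id] = stored
        self._next_id += 1
        return stored

    def update(self, entity_id: int, entity: dict) -> dict:
        if entity_id not in self._rows:
            raise KeyError(entity_id)
        self._rows[entity_id] = {**entity, "id": entity_id}
        return self._rows[entity_id]

    def delete(self, entity_id: int) -> bool:
        return self._rows.pop(entity_id, None) is not None
```

Because the base class is abstract, a repository that omits one of the five operations fails at instantiation rather than at the first call site that needs it.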
DEBT-033: No shared API client. The two grocery delivery API integrations each implemented their own HTTP client setup, retry logic, timeout handling, and error mapping. These should have shared a base API client class.
DEBT-034: No validation framework. Input validation was scattered throughout the codebase with no unifying abstraction. Some endpoints validated in route handlers, some in service classes, and some not at all.
DEBT-035 through DEBT-038: Additional missing abstractions. No common response formatting, no shared error taxonomy, no unified caching interface, and no abstraction for external service health checks.
Shallow Understanding (5 items)
DEBT-039: Meal plan generation ignores dietary restrictions. The AI-generated meal plan algorithm optimized for variety and nutritional balance but did not enforce dietary restrictions (allergies, vegetarian, kosher) as hard constraints. Restrictions were treated as preferences that could be overridden.
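Treating restrictions as hard constraints usually means filtering candidates before any optimization runs, so that a forbidden recipe can never be selected. The sketch below illustrates that idea under assumed data shapes: the `Recipe` fields and the `eligible_recipes` function are hypothetical and are not the MealPlanr data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Recipe:
    name: str
    tags: frozenset       # e.g. frozenset({"vegetarian", "kosher"})
    allergens: frozenset  # e.g. frozenset({"peanuts"})


def eligible_recipes(recipes, required_tags, forbidden_allergens):
    """Keep only recipes that satisfy every restriction.

    A recipe is eligible when it carries all required tags and contains
    none of the forbidden allergens. The variety and nutrition optimizer
    should only ever see the list this function returns.
    """
    required = frozenset(required_tags)
    forbidden = frozenset(forbidden_allergens)
    return [
        recipe
        for recipe in recipes
        if required <= recipe.tags and not (forbidden & recipe.allergens)
    ]
```

If the filtered list is too small to fill a week, the generator should report that to the user instead of relaxing a restriction.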
DEBT-040: Grocery quantity aggregation rounding errors. When combining quantities across recipes (e.g., 0.1 cups of oil from one recipe and 0.2 cups from another), the aggregation used binary floating-point arithmetic and displayed the raw result without rounding, leading to display values like "0.30000000000000004 cups."
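One possible fix is to aggregate with exact rational arithmetic and round only when formatting for display. The sketch below uses the standard library `fractions.Fraction`; the helper names `total_quantity` and `format_quantity` and the two-decimal display precision are assumptions for illustration.

```python
from fractions import Fraction


def total_quantity(quantities):
    """Sum quantities exactly.

    Quantities are passed as strings such as "0.1" or "1/3" so that no
    binary floating-point error is introduced before the addition.
    """
    return sum((Fraction(q) for q in quantities), Fraction(0))


def format_quantity(quantity: Fraction, unit: str) -> str:
    """Round to two decimal places for display only."""
    return f"{float(round(quantity, 2)):g} {unit}"


# Plain floats would give 0.1 + 0.2 == 0.30000000000000004.
oil = total_quantity(["0.1", "0.2"])
label = format_quantity(oil, "cups")
```

Keeping the exact value internally and rounding at the display boundary also avoids compounding rounding errors when many recipes are combined.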
DEBT-041: Concurrent meal plan editing not handled. Two users in the same household could simultaneously edit the same meal plan. The last save won, silently discarding the other user's changes.
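A common remedy for lost updates is optimistic locking: each save must state which version it was based on, and a save based on a stale version is rejected. The in-memory sketch below only illustrates the check; the names `MealPlanStore` and `StaleWriteError` are hypothetical, and a real implementation would perform the compare-and-update atomically in the database.

```python
class StaleWriteError(Exception):
    """Raised when a save is based on an outdated version of the plan."""


class MealPlanStore:
    def __init__(self) -> None:
        # plan_id -> (version, data); a missing plan is treated as version 0
        self._plans: dict = {}

    def load(self, plan_id: int) -> tuple:
        """Return (version, data) so the caller can echo the version back."""
        version, data = self._plans.get(plan_id, (0, {}))
        return version, dict(data)

    def save(self, plan_id: int, data: dict, expected_version: int) -> int:
        """Store data only if nobody else saved since expected_version."""
        current_version, _ = self._plans.get(plan_id, (0, {}))
        if expected_version != current_version:
            raise StaleWriteError(
                f"plan {plan_id} is at version {current_version}, "
                f"but the edit was based on version {expected_version}"
            )
        new_version = current_version + 1
        self._plans[plan_id] = (new_version, dict(data))
        return new_version
```

When the second household member's save is rejected, the application can reload the plan and ask them to reapply their change instead of silently discarding someone's work.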
DEBT-042 and DEBT-043: Additional shallow understanding issues. Recipe ingredient parsing assumed US measurement units with no support for metric, and the grocery delivery API integration did not handle rate limiting or implement exponential backoff.
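For the rate-limiting gap, a retry wrapper with exponential backoff and jitter is a typical starting point. The sketch below is generic: `RateLimitedError` stands in for whatever the delivery API client raises on an HTTP 429 response, and the function name, attempt count, and delay values are assumptions.

```python
import random
import time


class RateLimitedError(Exception):
    """Hypothetical error raised by an API client when it is rate limited."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.5,
                      max_delay=30.0, sleep=time.sleep):
    """Call operation(), retrying on RateLimitedError with growing delays.

    The delay ceiling doubles after each failed attempt (0.5s, 1s, 2s, ...)
    up to max_delay, and the actual wait is drawn uniformly between 0 and
    that ceiling ("full jitter") so that clients do not retry in lockstep.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except RateLimitedError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller handle it
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

The `sleep` parameter exists so tests can substitute a fake and run without waiting.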
Dependency Bloat (4 items)
DEBT-044: Unused dependencies. Five Python packages in requirements.txt were not imported anywhere in the codebase. The AI had suggested them during development; the code that used them was later removed, but the entries in requirements.txt remained.
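A rough check for unused dependencies can be scripted with the standard library `ast` module, as in the sketch below. It is an approximation, not the audit's actual tooling: the function names are hypothetical, and it assumes each declared name matches its import name, which is not true for every package (for example, PyYAML is imported as `yaml`).

```python
import ast


def imported_top_level_modules(source: str) -> set:
    """Return the top-level module names imported by one source file."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # level == 0 skips relative imports such as "from . import x"
            modules.add(node.module.split(".")[0])
    return modules


def unused_requirements(declared, sources):
    """Return declared names that no source file imports, sorted."""
    used = set()
    for source in sources:
        used |= imported_top_level_modules(source)
    return sorted(set(declared) - used)
```

Any name this reports should still be checked by hand before removal, since some packages are loaded as plugins or command-line tools and are never imported directly.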
DEBT-045: Redundant dependencies. Both arrow and pendulum were installed for date handling, alongside the standard library datetime. Different modules used different libraries for the same operations.
DEBT-046: Outdated dependencies. Three packages were more than two major versions behind their current releases, including one with a known moderate-severity CVE.
DEBT-047: Unnecessary heavy dependencies. The pandas library was imported in one file to calculate the average of a list of numbers. The standard library statistics.mean() function would have sufficed.
Impact Assessment
Damien calculated the following impact metrics:
| Metric | Value | Benchmark |
|---|---|---|
| Average cyclomatic complexity | 13.8 | Target: < 10 |
| Code duplication | 11.2% | Target: < 5% |
| Pattern consistency score | 54% | Target: > 85% |
| Test coverage | 41% | Target: > 80% |
| Technical Debt Ratio | 18% | Concern threshold: 10% |
| Estimated remediation time | 34 developer-days | -- |
Prioritized Remediation Plan
Damien delivered a prioritized list based on the effort-impact matrix:
Immediate (Quick Wins, Weeks 1-2):
- DEBT-046: Update outdated dependency with known CVE (security)
- DEBT-044: Remove unused dependencies (5 minutes)
- DEBT-047: Replace pandas with statistics.mean() (30 minutes)
- DEBT-012: Consolidate email validation into shared utility (4 hours)
Short-term (Weeks 3-6):
- DEBT-001: Standardize on one database access pattern
- DEBT-002: Standardize error handling with custom exceptions and middleware
- DEBT-014: Create shared auth token extraction utility
- DEBT-015: Create shared pagination utility
- DEBT-034: Implement a validation framework
Medium-term (Weeks 7-12):
- DEBT-032: Create base repository class and migrate existing repositories
- DEBT-033: Create shared API client for external integrations
- DEBT-039: Fix meal plan generation to enforce dietary restrictions as hard constraints
- DEBT-021: Simplify configuration system
Long-term (Ongoing):
- Address remaining style drift items incrementally using the Boy Scout Rule
- Replace hallucinated patterns with standard library alternatives as modules are modified
- Improve test coverage to 80% target
Lessons Learned
Damien's final report included three key lessons for the MealPlanr team:
- Context is king. Most of the style drift could have been prevented by consistently providing the AI with a conventions document. The team had created one after the second month but did not enforce its use.
- Review across modules, not just within them. Their code review process checked that each file was correct and well-written. It did not check that files were consistent with each other. Cross-module consistency reviews should happen monthly.
- The speed trap. The team shipped features quickly for the first two months, then spent the next two months increasingly slowed by the accumulated debt. A small investment in consistency from the start would have maintained their velocity throughout.
Applying This to Your Projects
This case study illustrates that debt audits do not have to be elaborate or expensive. Damien's five-step methodology — automated scanning, dependency analysis, pattern analysis, developer interviews, and historical analysis — can be adapted by any team. The important thing is doing it regularly, before symptoms become severe. A lightweight audit every quarter is far more effective than a comprehensive audit performed only when the codebase is already in crisis.