In This Chapter
- Learning Objectives
- Introduction
- 26.1 What Makes Code "Legacy"?
- 26.2 Understanding Unfamiliar Codebases with AI
- 26.3 Characterization Testing: Capturing Existing Behavior
- 26.4 The Strangler Fig Pattern
- 26.5 Extract Method and Extract Class Refactoring
- 26.6 Modernizing Dependency Structures
- 26.7 Migrating Between Frameworks
- 26.8 Incremental Modernization Strategies
- 26.9 Risk Management During Refactoring
- 26.10 Case Study: Refactoring a Legacy Application
- Summary
Chapter 26: Refactoring Legacy Code with AI
"The only way to go fast is to go well." — Robert C. Martin
Learning Objectives
By the end of this chapter, you will be able to:
- Evaluate code characteristics that classify a codebase as "legacy" and assess the risks and costs associated with refactoring it (Bloom's: Evaluate)
- Analyze unfamiliar codebases using AI-assisted exploration techniques, including data flow tracing, pattern identification, and dependency mapping (Bloom's: Analyze)
- Create characterization tests that capture existing behavior before modifying legacy code (Bloom's: Create)
- Apply the strangler fig pattern to incrementally replace legacy components with modern implementations (Bloom's: Apply)
- Apply extract method and extract class refactoring techniques with AI assistance to improve code structure (Bloom's: Apply)
- Design a modernization strategy for dependency structures, frameworks, and architectural patterns (Bloom's: Create)
- Evaluate risk management strategies including feature flags, canary releases, and rollback plans during large-scale refactoring efforts (Bloom's: Evaluate)
Introduction
Every developer eventually encounters it: a sprawling codebase that nobody fully understands, written in a style that predates current best practices, with minimal tests and documentation that was last updated three framework versions ago. This is legacy code, and working with it is one of the most common challenges in professional software development.
The good news is that AI coding assistants have fundamentally changed how we approach legacy code. Tasks that once required weeks of painstaking archaeology — tracing data flows, understanding implicit contracts, mapping dependency graphs — can now be accomplished in hours. AI assistants can read thousands of lines of unfamiliar code and explain what it does, suggest refactoring strategies, generate characterization tests, and guide incremental modernization.
This chapter provides a systematic approach to refactoring legacy code with AI assistance. We will cover everything from initial assessment through final deployment, with practical techniques you can apply to real-world codebases. We build on the testing strategies from Chapter 21 and the design patterns from Chapter 25, applying them specifically to the challenge of improving existing code.
Note
Refactoring legacy code is inherently risky. The techniques in this chapter are designed to minimize that risk, but they require discipline and patience. Resist the temptation to rewrite everything from scratch — incremental improvement almost always produces better outcomes.
26.1 What Makes Code "Legacy"?
The word "legacy" often carries negative connotations, but it is worth understanding precisely what makes code qualify as legacy — and why simply being old is not the defining factor.
Defining Legacy Code
Michael Feathers, in his influential book Working Effectively with Legacy Code, provides the most widely accepted definition: legacy code is code without tests. This definition is powerful because it captures the essential problem: without tests, you cannot change the code with confidence. Every modification carries the risk of breaking something that works.
However, the practical definition is broader. Code becomes "legacy" when it exhibits several of these characteristics:
Lack of automated tests. Without a test suite, every change is a gamble. You cannot verify that existing behavior is preserved, and regression bugs become inevitable.
Missing or outdated documentation. When the original developers have moved on and the documentation has not kept pace with changes, understanding the system requires archaeological effort.
Tightly coupled components. When everything depends on everything else, changing one part risks breaking others in unpredictable ways. This often manifests as "God classes" or modules with hundreds of imports.
Obsolete dependencies. Libraries that are no longer maintained, framework versions with known security vulnerabilities, or language features that have been deprecated all contribute to technical debt.
Implicit knowledge. When critical business logic is embedded in code that only one person understood — and that person has left the team — the code becomes a black box.
Inconsistent coding standards. Code written by many developers over many years often exhibits wildly different styles, naming conventions, and architectural approaches within the same project.
Key Insight: A two-year-old codebase with no tests, no documentation, and tangled dependencies is more "legacy" than a ten-year-old system with comprehensive test coverage and clear architecture. Age is a correlating factor, not a defining one.
The Economics of Legacy Code
Before diving into refactoring techniques, it is essential to understand the economics. Refactoring legacy code is expensive, and not all legacy code needs to be refactored.
Consider the following decision matrix:
| Scenario | Recommendation |
|---|---|
| Code works, rarely changes, low risk | Leave it alone |
| Code works but needs frequent modifications | Refactor incrementally |
| Code has known bugs affecting users | Fix bugs, add tests around fixes |
| Code blocks new feature development | Refactor the blocking components |
| Code has security vulnerabilities | Prioritize security-related refactoring |
| Entire system is unmaintainable | Consider strangler fig replacement |
The key principle is: refactor code that you need to change. Do not refactor stable, working code simply because it does not meet modern standards. Your effort should be proportional to the pain the code is causing.
AI's Role in Legacy Code Assessment
AI coding assistants can dramatically accelerate the initial assessment of a legacy codebase. Here are specific tasks where AI excels:
Prompt: "Analyze this Python module and identify the following:
1. What are the main responsibilities of this module?
2. What external dependencies does it use?
3. What are the public vs private interfaces?
4. What design patterns (or anti-patterns) are present?
5. What would be the riskiest parts to modify?"
This kind of analysis, which might take a developer hours of reading, can be performed by an AI assistant in seconds. The AI may not catch every nuance, but it provides an excellent starting point for deeper investigation.
Cross-Reference: For more on how AI assistants process and understand code, see Chapter 2. For strategies on providing effective context to AI, see Chapter 9.
26.2 Understanding Unfamiliar Codebases with AI
The first step in any refactoring effort is understanding what the code does. With legacy systems, this is often the hardest part. AI assistants provide powerful tools for this exploration.
Systematic Codebase Exploration
Rather than reading code linearly, use AI to guide a systematic exploration:
Step 1: Entry Point Analysis. Start by identifying how the application is launched and how requests flow through it.
Prompt: "Look at the main entry points of this application
(manage.py, wsgi.py, and urls.py). Trace the request lifecycle
from an incoming HTTP request to a response. What middleware,
views, and models are involved?"
Step 2: Dependency Mapping. Understand what depends on what.
Prompt: "Analyze the import statements across all Python files
in the src/ directory. Create a dependency graph showing which
modules import from which other modules. Identify any circular
dependencies."
Step 3: Data Flow Tracing. Follow data through the system to understand transformations.
Prompt: "Trace how user registration data flows through this
application. Start from the registration form submission and
follow the data through validation, processing, storage, and
any subsequent notifications. Identify every function that
touches the user data."
Step 4: Pattern Identification. Recognize recurring structures and conventions.
Prompt: "Examine these five view functions and identify the
common patterns they share. What conventions does this codebase
use for error handling, authentication checks, and response
formatting?"
Building a Mental Model
AI can help you construct a high-level mental model of the system quickly. Ask for architectural summaries:
Prompt: "Based on the directory structure and the key files
I've shown you, describe the overall architecture of this
application. What architectural pattern does it follow?
What are the main layers and how do they communicate?"
The AI's response gives you a hypothesis about the system's architecture that you can then verify through targeted reading. This is much more efficient than trying to build an understanding purely from bottom-up code reading.
Identifying Hidden Coupling
Legacy code often has non-obvious dependencies. AI can help find these:
Prompt: "This function modifies a global dictionary called
'_registry'. Find every place in the codebase that reads from
or writes to this dictionary. What would break if I changed
its structure?"
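You can also verify this kind of answer mechanically. A minimal sketch that uses `ast` to classify accesses to a module-level name (the `_registry` name matches the prompt above; everything else is illustrative):

```python
import ast

def find_global_uses(source: str, name: str = "_registry") -> dict[str, list[int]]:
    """Classify each use of a module-level name as a read or a write."""
    tree = ast.parse(source)
    writes: set[int] = set()
    reads: set[int] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and node.id == name:
            if isinstance(node.ctx, ast.Store):
                writes.add(node.lineno)   # rebinding: _registry = {...}
            else:
                reads.add(node.lineno)
        elif (
            isinstance(node, ast.Subscript)
            and isinstance(node.ctx, ast.Store)
            and isinstance(node.value, ast.Name)
            and node.value.id == name
        ):
            writes.add(node.lineno)       # item mutation: _registry[key] = ...
    reads -= writes  # a line that mutates the dict counts as a write
    return {"reads": sorted(reads), "writes": sorted(writes)}
```

This misses aliases (`reg = _registry`) and `getattr`-style access, so treat it the same way you treat the AI's answer: a strong lead, not a proof.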
Warning
AI analysis of codebases is not infallible. Always verify critical findings by reading the actual code. AI assistants may miss runtime dependencies, dynamic imports, monkey-patching, or metaprogramming tricks that are common in legacy Python code. Treat AI analysis as a starting point, not the final word.
Documenting As You Go
As you explore a legacy codebase with AI assistance, create documentation for those who follow:
Prompt: "Based on our analysis of the payment processing
module, write a concise architectural decision record (ADR)
documenting how the payment flow works, what assumptions
it makes, and what constraints it operates under."
This approach transforms the exploration process from pure consumption into productive documentation that benefits the entire team.
26.3 Characterization Testing: Capturing Existing Behavior
Before changing legacy code, you need a safety net. Characterization tests (also called "golden master" tests or "approval tests") capture the current behavior of the system — bugs and all — so that you can detect unintended changes during refactoring.
The Philosophy of Characterization Tests
Traditional tests are written to verify that code does what it should do. Characterization tests are different: they verify that code continues to do what it currently does, regardless of whether that behavior is correct. The distinction matters because in legacy code, the existing behavior often encodes implicit business rules that nobody documented.
Consider a legacy function that calculates shipping costs. It might have a bug where orders over $100 get free shipping even when they weigh more than 50 pounds. But if customers have relied on this behavior for years, "fixing" it could cause customer complaints. The characterization test captures the bug, so you know about it before deciding whether to preserve or change the behavior.
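To make the shipping example concrete, here is a toy stand-in for such a legacy function (invented for illustration, bug included) and the characterization test that pins the bug down:

```python
# Toy legacy calculator: orders over $100 ship free even when they
# weigh more than 50 pounds. That is the bug described above.
def legacy_shipping_cost(order_total: float, weight_lbs: float) -> float:
    if order_total > 100:
        return 0.0              # bug: the weight check below never runs
    if weight_lbs > 50:
        return 25.0
    return 8.0

def test_heavy_expensive_order_ships_free():
    """Characterization test: documents the bug, does not endorse it."""
    # Surprising but long-standing behavior — customers may rely on it.
    assert legacy_shipping_cost(order_total=150.0, weight_lbs=80.0) == 0.0
```

If the business later decides to fix the bug, this test is deliberately changed in the same commit — making the behavior change visible and intentional rather than accidental.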
Generating Characterization Tests with AI
AI assistants are remarkably effective at generating characterization tests. Here is the workflow:
Step 1: Show the AI the function you want to characterize.
Prompt: "Here is a legacy function that calculates order
totals. Generate characterization tests that capture its
current behavior for a wide range of inputs, including edge
cases. Include tests for:
- Normal orders
- Empty orders
- Orders with discounts
- Orders with negative quantities (even if that seems wrong)
- Very large orders
- Orders with special characters in item names"
Step 2: Run the generated tests against the current code and fix any that fail.
Step 3: Verify coverage by checking which branches and conditions the tests exercise.
Prompt: "Looking at the control flow of this function, are
there any branches or conditions that our characterization
tests don't exercise? Generate additional tests to cover
those paths."
AI-Assisted Test Generation Patterns
When working with AI to generate characterization tests, several patterns are especially useful:
Input-Output Recording. For pure functions, have AI generate tests that record the actual output for various inputs:
```python
def test_calculate_shipping_captures_current_behavior():
    """Characterization test: captures current shipping calculation."""
    assert calculate_shipping(weight=5, distance=100) == 12.50
    assert calculate_shipping(weight=0, distance=100) == 5.00
    assert calculate_shipping(weight=50, distance=0) == 0.00
    assert calculate_shipping(weight=100, distance=500) == 75.00
```
State Transition Recording. For stateful code, capture what state changes occur:
```python
def test_process_order_state_transitions():
    """Characterization test: captures order state changes."""
    order = create_test_order()
    process_order(order)
    assert order.status == "processed"
    assert order.processed_at is not None
    assert len(order.line_items) == 3  # Even though input had 4
```
Side Effect Recording. For code with side effects, verify those effects occur:
```python
def test_send_notification_side_effects(mock_email):
    """Characterization test: captures notification behavior."""
    send_order_notification(order_id=123)
    assert mock_email.call_count == 2  # Sends to user and admin
    assert "Order #123" in mock_email.calls[0].subject
```
Cross-Reference: For a deeper treatment of testing strategies and AI-assisted test generation, see Chapter 21. The characterization testing approach described here builds on those foundations.
Practical Tips for Characterization Testing
- Start with the code you plan to change. You do not need to characterize the entire system — focus on the components you will modify.
- Capture weird behavior explicitly. If a function returns `None` when you expect it to raise an exception, write a test that asserts `result is None`. Add a comment noting the surprising behavior.
- Use snapshot testing for complex outputs. When functions return large data structures, JSON, or HTML, snapshot testing frameworks can capture the full output without requiring you to hand-write every assertion.
- Run tests frequently. After every small refactoring step, run your characterization tests. If something breaks, you know exactly which change caused it.
- Accept imperfect coverage. Some legacy code is nearly impossible to test (for instance, code with deeply embedded database calls or file system operations). Get the best coverage you can and document the gaps.
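Dedicated frameworks exist for snapshot testing, but the core idea fits in a few lines. A minimal hand-rolled sketch using only the standard library (the helper name and file layout are our own):

```python
import json
from pathlib import Path

def assert_matches_snapshot(value, snapshot_path: str) -> None:
    """Compare a JSON-serializable value against a stored snapshot.

    On the first run the snapshot file is written; later runs fail if
    the output drifts — exactly what characterization testing needs.
    """
    path = Path(snapshot_path)
    current = json.dumps(value, indent=2, sort_keys=True)
    if not path.exists():
        path.write_text(current)   # first run: record the golden master
        return
    assert path.read_text() == current, f"output no longer matches {path}"
```

The recorded snapshot files should be committed to version control and reviewed like any other test change, so that intentional behavior changes stay visible.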
26.4 The Strangler Fig Pattern
The strangler fig pattern is one of the most important strategies for legacy system modernization. Named after the strangler fig tree, which grows around an existing tree and eventually replaces it, this pattern allows you to incrementally replace legacy components with modern ones.
How the Pattern Works
Instead of rewriting a legacy system from scratch (the infamous "Big Bang" rewrite), you:
- Identify a component of the legacy system to replace
- Build the replacement alongside the existing code
- Route traffic to the new component using a facade or router
- Verify the new component produces correct results
- Remove the old component once the new one is proven
- Repeat for the next component
This approach eliminates the all-or-nothing risk of a complete rewrite. At every step, you have a working system that you can roll back to.
Implementing the Strangler Fig with AI
AI assistants can help at every stage of the strangler fig pattern:
Identifying components to replace:
Prompt: "Analyze this legacy order processing system and
identify components that could be independently replaced.
Rank them by: (1) how self-contained they are, (2) how much
pain they cause the development team, and (3) how risky
replacement would be."
Building the replacement:
Prompt: "Here is the legacy shipping calculator module.
Write a modern replacement that:
- Produces identical output for the same inputs
- Uses type hints and dataclasses
- Has comprehensive unit tests
- Follows the repository pattern for data access
- Is compatible with the existing interface"
Creating the routing layer:
```python
class ShippingCalculatorFacade:
    """Routes shipping calculations to legacy or modern implementation.

    Uses feature flags to control which implementation handles
    each request, enabling gradual migration.
    """

    def __init__(self, legacy_calculator, modern_calculator, feature_flags):
        self.legacy = legacy_calculator
        self.modern = modern_calculator
        self.flags = feature_flags

    def calculate(self, order):
        if self.flags.is_enabled("use_modern_shipping", order.customer_id):
            result = self.modern.calculate(order)
            # Shadow mode: compare results but return legacy
            if self.flags.is_enabled("shipping_shadow_mode"):
                legacy_result = self.legacy.calculate(order)
                if result != legacy_result:
                    log_discrepancy(order, legacy_result, result)
                return legacy_result
            return result
        return self.legacy.calculate(order)
```
Shadow Mode: Verify Before You Switch
The example above demonstrates "shadow mode" — running both old and new implementations simultaneously and comparing results. This is invaluable for catching discrepancies before they affect users.
Prompt: "Generate a shadow mode wrapper that runs both the
legacy and modern payment processing implementations, compares
their results, logs any discrepancies with full context, and
returns the legacy result. Include metrics collection for
monitoring the discrepancy rate."
Shadow mode lets you build confidence in the new implementation gradually. When the discrepancy rate drops to zero (or to an acceptable level for known improvements), you can switch to the new implementation.
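A generic shadow wrapper like the one the prompt asks for might look like this — a minimal sketch (the class name and metrics are our own, not a library API):

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("shadow")

class ShadowRunner:
    """Run legacy and modern implementations side by side.

    Always returns the legacy (trusted) result, and records how often
    the two implementations disagree.
    """

    def __init__(self, legacy: Callable[..., Any], modern: Callable[..., Any]):
        self.legacy = legacy
        self.modern = modern
        self.calls = 0
        self.discrepancies = 0

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        legacy_result = self.legacy(*args, **kwargs)
        self.calls += 1
        try:
            modern_result = self.modern(*args, **kwargs)
        except Exception:
            # A crash in the new path must never affect users.
            self.discrepancies += 1
            logger.exception("modern implementation raised")
            return legacy_result
        if modern_result != legacy_result:
            self.discrepancies += 1
            logger.warning(
                "discrepancy: legacy=%r modern=%r", legacy_result, modern_result
            )
        return legacy_result

    @property
    def discrepancy_rate(self) -> float:
        return self.discrepancies / self.calls if self.calls else 0.0
```

In production you would feed `discrepancy_rate` into your metrics system and watch it trend toward zero before flipping the flag.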
Key Insight: The strangler fig pattern works because it turns a terrifying "all-at-once" migration into a series of small, reversible changes. Each change is low-risk on its own, and the overall migration can be paused or adjusted at any point.
26.5 Extract Method and Extract Class Refactoring
Extract method and extract class are the workhorse refactoring techniques. They transform tangled, monolithic code into well-structured components with clear responsibilities. AI assistants make these refactorings faster and safer.
Extract Method with AI
Long methods are the most common code smell in legacy systems. A single function might handle validation, business logic, data access, and formatting — all in one 200-line block. Extract method breaks this into focused, testable pieces.
Identifying extraction candidates:
Prompt: "This function is 180 lines long. Identify logical
sections that could be extracted into separate methods. For
each section, suggest a descriptive method name and describe
what parameters it would need and what it would return."
The AI will typically identify natural boundaries in the code: comment blocks that describe sections, variable assignments that mark transitions between phases, and try/except blocks that wrap distinct operations.
Performing the extraction:
Prompt: "Extract the validation logic (lines 15-45) from the
process_order function into a separate validate_order method.
Ensure:
1. The extracted method has a clear signature with type hints
2. The original function calls the new method
3. All variables needed by the extracted code are passed as
parameters
4. The behavior is identical to the original"
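Compressed to a few lines, the extraction looks like this (the order fields and validation rules are invented for illustration):

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    customer_id: int
    items: list[tuple[str, int]] = field(default_factory=list)  # (name, qty)

# Before: validation buried inside the processing function.
def process_order_before(order: Order) -> str:
    if order.customer_id <= 0:
        raise ValueError("invalid customer")
    if not order.items:
        raise ValueError("empty order")
    for name, qty in order.items:
        if qty <= 0:
            raise ValueError(f"bad quantity for {name}")
    return f"processed {len(order.items)} item(s)"

# After: the extracted method has one job and can be tested alone.
def validate_order(order: Order) -> None:
    """Raise ValueError if the order violates any invariant."""
    if order.customer_id <= 0:
        raise ValueError("invalid customer")
    if not order.items:
        raise ValueError("empty order")
    for name, qty in order.items:
        if qty <= 0:
            raise ValueError(f"bad quantity for {name}")

def process_order(order: Order) -> str:
    validate_order(order)
    return f"processed {len(order.items)} item(s)"
```

Your characterization tests should pass identically against both versions; that equivalence is the whole point of the exercise.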
Extract Class with AI
When a class has too many responsibilities, extracting a subset of its functionality into a new class improves cohesion and testability.
Identifying class extraction opportunities:
Prompt: "This UserManager class has 25 methods. Group the
methods by responsibility and suggest how to split this into
multiple focused classes. For each proposed class:
1. List the methods it would contain
2. Describe its single responsibility
3. Identify what data it needs access to
4. Describe how it would interact with the other classes"
Common extractions include:
- Data access logic into repository classes
- Validation logic into validator classes
- Formatting logic into presenter or serializer classes
- Notification logic into notifier classes
- Configuration logic into configuration classes
AI-Assisted Refactoring Workflow
Here is a practical workflow for using AI in extract method/class refactoring:
- Show the AI the code and ask it to identify extraction opportunities
- Discuss the proposed extractions — ask about trade-offs and alternatives
- Write characterization tests for the original code (see Section 26.3)
- Have AI perform the extraction one method/class at a time
- Run characterization tests after each extraction to verify behavior preservation
- Review the refactored code to ensure it makes sense and is well-named
Prompt: "I've extracted the validation logic into a separate
OrderValidator class. Review the refactored code and check:
1. Is the interface between OrderProcessor and OrderValidator
clean and minimal?
2. Are there any subtle behavioral changes from the extraction?
3. Are there additional methods that should move to
OrderValidator?
4. Does the naming follow Python conventions?"
Practical Tip: When extracting methods from legacy code, preserve the original method as a thin wrapper that delegates to the extracted methods. This ensures that all existing callers continue to work without modification. You can deprecate and remove the wrapper later.
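That delegation pattern can be as simple as the following sketch (class and method names are hypothetical):

```python
import warnings

class OrderValidator:
    """New home for the extracted validation logic."""

    def validate(self, order: dict) -> bool:
        return bool(order.get("items"))

class OrderProcessor:
    def __init__(self) -> None:
        self.validator = OrderValidator()

    def validate_order(self, order: dict) -> bool:
        """Thin wrapper kept so existing callers keep working.

        Deprecated: call OrderValidator.validate() directly.
        """
        warnings.warn(
            "validate_order() moved to OrderValidator.validate()",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.validator.validate(order)
```

The `DeprecationWarning` gives you a low-cost way to find remaining callers in logs and test output before deleting the wrapper.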
26.6 Modernizing Dependency Structures
Legacy code often suffers from tangled dependency structures — circular imports, God objects that everything depends on, tight coupling to specific libraries, and global state scattered throughout the codebase. Modernizing these structures is essential for long-term maintainability.
Identifying Dependency Problems
Ask AI to analyze the dependency structure of your codebase:
Prompt: "Analyze the import statements in all Python files
under src/. Identify:
1. Circular dependencies (A imports B, B imports A)
2. Modules with more than 10 imports (potential God modules)
3. Direct imports of third-party libraries that should be
abstracted
4. Inconsistent import patterns (some files use absolute
imports, others use relative)"
Breaking Circular Dependencies
Circular dependencies are a common problem in legacy Python code. AI can help identify and resolve them:
Prompt: "The modules models.py and services.py have a circular
import. models.py imports ServiceRegistry from services.py, and
services.py imports User from models.py. Suggest three
different approaches to break this circular dependency, with
pros and cons of each approach."
Common solutions include:
- Extract shared code into a third module that both import from
- Use dependency injection to pass dependencies at runtime rather than import time
- Use protocols or abstract base classes to define interfaces without concrete dependencies
- Defer imports by moving them inside functions (a temporary fix, not a long-term solution)
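The protocol-based approach from the list above can be sketched in a few lines. Here the three modules from the prompt are compressed into one file for readability; the protocol definitions would live in a module that both `models.py` and `services.py` import:

```python
from dataclasses import dataclass
from typing import Protocol

# interfaces.py — neither models nor services imports the other;
# both depend on these structural types instead.
class UserLike(Protocol):
    id: int
    email: str

# services.py — depends only on the protocol, not on models.User.
def send_welcome(user: UserLike) -> str:
    return f"welcome {user.email}"

# models.py — satisfies the protocol without importing services.
@dataclass
class User:
    id: int
    email: str
```

Because `Protocol` matching is structural, `models.py` never needs to import or subclass anything from the interfaces module either — the cycle is simply gone.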
Introducing Dependency Injection
Legacy code often creates its dependencies directly, making it hard to test and hard to swap implementations. AI can help introduce dependency injection:
Prompt: "This OrderService class directly creates a
DatabaseConnection, an EmailClient, and a PaymentGateway in
its __init__ method. Refactor it to accept these as constructor
parameters with sensible defaults, so they can be replaced with
mocks in tests and swapped in production."
Before:
```python
class OrderService:
    def __init__(self):
        self.db = DatabaseConnection("production_db")
        self.email = EmailClient("smtp.company.com")
        self.payment = PaymentGateway(api_key="sk_live_xxx")
```
After:
```python
import os

class OrderService:
    def __init__(
        self,
        db: DatabaseConnection | None = None,
        email: EmailClient | None = None,
        payment: PaymentGateway | None = None,
    ):
        self.db = db or DatabaseConnection("production_db")
        self.email = email or EmailClient("smtp.company.com")
        self.payment = payment or PaymentGateway(
            api_key=os.environ["PAYMENT_API_KEY"]
        )
```
Abstracting Third-Party Dependencies
Legacy code often uses third-party libraries directly throughout the codebase. When those libraries need to be upgraded or replaced, every usage site must change. Wrapping third-party dependencies behind your own interfaces protects against this:
Prompt: "This codebase uses the 'requests' library directly
in 47 different files. Design a thin wrapper module that
provides the HTTP client functionality we actually use (GET,
POST, JSON parsing, basic auth) behind our own interface. Show
me the wrapper and one example of converting a call site."
Cross-Reference: The dependency inversion principle and other SOLID principles discussed in Chapter 25 provide the theoretical foundation for the dependency modernization techniques described here.
26.7 Migrating Between Frameworks
Framework migrations are among the most challenging refactoring tasks. Whether you are moving from Flask to FastAPI, from Django REST Framework to a GraphQL API, or from SQLAlchemy to a different ORM, AI assistants can dramatically accelerate the process.
Planning a Framework Migration
Before writing any code, use AI to create a comprehensive migration plan:
Prompt: "We need to migrate our Flask application to FastAPI.
The application has 45 endpoints, uses Flask-SQLAlchemy for
database access, Flask-Login for authentication, and Celery
for background tasks. Create a detailed migration plan that:
1. Lists what changes for each component
2. Identifies what can be migrated independently
3. Suggests an order of operations
4. Highlights potential breaking changes
5. Estimates relative effort for each phase"
Example: Flask to FastAPI Migration
This is one of the most common Python framework migrations. Here is how AI can assist at each step:
Step 1: Route conversion
Prompt: "Convert this Flask route to FastAPI. Preserve the
exact same behavior, including error handling and response
format. Use Pydantic models for request/response validation."
Flask:
```python
@app.route('/api/users/<int:user_id>', methods=['GET'])
@login_required
def get_user(user_id):
    user = User.query.get(user_id)
    if not user:
        return jsonify({"error": "User not found"}), 404
    return jsonify(user.to_dict())
```
FastAPI:
```python
@router.get("/api/users/{user_id}", response_model=UserResponse)
async def get_user(
    user_id: int,
    current_user: User = Depends(get_current_user),
    db: Session = Depends(get_db),
):
    user = db.query(User).get(user_id)
    if not user:
        raise HTTPException(status_code=404, detail="User not found")
    return UserResponse.from_orm(user)
```
Step 2: Middleware migration
Prompt: "Our Flask app uses these middleware components:
[CORS, rate limiting, request logging, error handling].
Show me how to implement equivalent middleware in FastAPI."
Step 3: Authentication migration
Prompt: "Convert our Flask-Login based authentication to
FastAPI's dependency injection system. Here is our current
login_required decorator and user loader. Preserve the same
session-based authentication behavior."
Running Both Frameworks During Migration
A powerful technique for framework migration is running both the old and new frameworks simultaneously behind a reverse proxy:
```nginx
# nginx configuration for gradual migration
upstream legacy_flask {
    server 127.0.0.1:5000;
}

upstream modern_fastapi {
    server 127.0.0.1:8000;
}

server {
    # Migrated endpoints go to FastAPI
    location /api/v2/users {
        proxy_pass http://modern_fastapi;
    }

    # Everything else still goes to Flask
    location / {
        proxy_pass http://legacy_flask;
    }
}
```
This is the strangler fig pattern applied to framework migration. As you convert each endpoint, you update the routing to send traffic to the new framework.
Warning
Framework migrations often uncover implicit behavior that the old framework handled automatically. For example, Flask's automatic JSON serialization of datetime objects may differ from FastAPI's Pydantic serialization. Always test thoroughly at integration boundaries.
26.8 Incremental Modernization Strategies
Large-scale refactoring cannot happen overnight. You need strategies for making steady progress while maintaining a working system. This section covers practical approaches to incremental modernization.
The Boy Scout Rule at Scale
The Boy Scout Rule — "leave the code better than you found it" — is a powerful strategy when applied consistently across a team. Every time someone touches legacy code, they make one small improvement:
- Add type hints to function signatures they modify
- Write a test for the function they are debugging
- Extract a method from a long function they are extending
- Replace a deprecated API call they encounter
- Add a docstring to a function they had to spend time understanding
Over time, these small improvements compound. AI assistants make each improvement faster:
Prompt: "I'm modifying the calculate_discount function to add
a new discount type. While I'm here, add type hints to all
parameters and the return value, and add a docstring describing
the function's behavior."
Modernization Layers
Structure your modernization effort in layers, tackling foundational issues before cosmetic ones:
Layer 1: Safety (Weeks 1-4)
- Add characterization tests to critical paths
- Set up continuous integration if not already present
- Establish code formatting standards (use tools like black and isort)
Layer 2: Structure (Weeks 5-12)
- Break circular dependencies
- Extract methods from oversized functions
- Introduce dependency injection for testability
Layer 3: Architecture (Weeks 13-24)
- Apply design patterns (repository, service layer, etc.)
- Migrate to modern frameworks where needed
- Introduce proper configuration management
Layer 4: Polish (Ongoing)
- Add type hints throughout
- Improve naming and documentation
- Optimize performance bottlenecks
Feature Flags for Safe Rollout
Feature flags are essential for safe incremental modernization. They let you deploy new code without activating it, then gradually roll it out to users:
```python
from typing import Any

class FeatureFlags:
    """Simple feature flag system for controlling rollout."""

    def __init__(self, config: dict[str, Any]):
        self.config = config

    def is_enabled(self, flag_name: str, user_id: int | None = None) -> bool:
        """Check if a feature flag is enabled.

        Supports global flags and percentage-based rollouts.
        """
        flag = self.config.get(flag_name)
        if flag is None:
            return False
        if isinstance(flag, bool):
            return flag
        if isinstance(flag, dict):
            percentage = flag.get("percentage", 0)
            if user_id is not None:
                return (hash(f"{flag_name}:{user_id}") % 100) < percentage
            return False
        return False
```
Use feature flags to control which implementation handles each request:
```python
def process_payment(order, feature_flags, user_id):
    if feature_flags.is_enabled("modern_payment_flow", user_id):
        return modern_payment_processor.process(order)
    return legacy_payment_processor.process(order)
```
Managing Technical Debt Backlog
Track refactoring work as explicitly as feature work:
Prompt: "Based on our analysis of the codebase, create a
prioritized technical debt backlog. For each item, include:
1. Description of the debt
2. Impact on development velocity
3. Estimated effort to address
4. Risk level of the refactoring
5. Dependencies on other debt items"
Practical Tip: Allocate a consistent percentage of each sprint to technical debt reduction — typically 15-20%. This ensures steady progress without halting feature development. Trying to do all refactoring in a single "cleanup sprint" rarely works because it creates a large batch of changes with high risk.
26.9 Risk Management During Refactoring
Refactoring legacy code is inherently risky. This section covers strategies for managing that risk effectively.
Risk Assessment Framework
Before beginning any refactoring effort, assess the risks:
Impact Analysis. What happens if the refactoring introduces a bug?
Prompt: "If we refactor the payment processing module and
introduce a bug, what is the blast radius? Which user-facing
features depend on this module? What is the financial impact
of a payment processing failure?"
Reversibility Assessment. How easily can you undo the change?
- Database schema changes are hard to reverse
- API contract changes affect external consumers
- Internal refactoring is usually easy to revert with version control
- Data migrations may be irreversible
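One way to make the reversibility assessment concrete is to require every migration to declare its inverse up front and treat a missing inverse as a red flag in review. A minimal sketch, with illustrative names not tied to any migration library:

```python
from dataclasses import dataclass
from typing import Callable, Optional

@dataclass
class Migration:
    name: str
    up: Callable[[], None]
    down: Optional[Callable[[], None]] = None  # None means irreversible

    @property
    def reversible(self) -> bool:
        return self.down is not None

migrations = [
    Migration("add_orders_status_column", up=lambda: None, down=lambda: None),
    # Deleting data cannot be undone without a backup, so no `down` step:
    Migration("delete_legacy_session_rows", up=lambda: None),
]

# Irreversible migrations get extra review plus a backup plan before deploy.
irreversible = [m.name for m in migrations if not m.reversible]
```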
Dependency Assessment. What else might break?
Prompt: "List every module that imports from or depends on
the payment processing module. For each dependency, classify
it as direct (imports functions), indirect (uses data
structures defined there), or implicit (relies on side effects
like database writes)."
Canary Releases
Canary releases deploy the refactored code to a small subset of users before rolling it out to everyone:
Deployment Strategy:
1. Deploy refactored code to canary servers (5% of traffic)
2. Monitor error rates, latency, and business metrics
3. If metrics are normal after 24 hours, expand to 25%
4. If metrics remain normal after another 24 hours, expand to 100%
5. If any metric degrades, immediately roll back canary servers
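The expand-or-rollback decision in steps 2 through 5 can be sketched as a comparison of canary metrics against the baseline fleet. The thresholds here are illustrative assumptions, not universal values:

```python
def canary_decision(baseline: dict[str, float], canary: dict[str, float],
                    max_error_increase: float = 0.001,
                    max_latency_ratio: float = 1.10) -> str:
    """Return 'rollback' or 'expand' for a canary deployment.

    Illustrative thresholds: roll back if the canary's error rate exceeds
    the baseline by more than 0.1 percentage points, or if its p95 latency
    is more than 10% worse than the baseline's.
    """
    if canary["error_rate"] > baseline["error_rate"] + max_error_increase:
        return "rollback"
    if canary["p95_latency_ms"] > baseline["p95_latency_ms"] * max_latency_ratio:
        return "rollback"
    return "expand"

decision = canary_decision(
    {"error_rate": 0.002, "p95_latency_ms": 180.0},
    {"error_rate": 0.002, "p95_latency_ms": 185.0},
)
# 185 ms is within 10% of 180 ms and the error rates match, so expand.
```

A real deployment pipeline would also compare business metrics, as the prompt below suggests, and require the metrics to hold steady for the full soak period before expanding.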
AI can help you design monitoring for canary releases:
Prompt: "We're deploying a refactored order processing system
as a canary release. What metrics should we monitor to detect
regressions? Include both technical metrics (error rates,
latency) and business metrics (order completion rate, average
order value)."
Rollback Plans
Every refactoring deployment should have a rollback plan:
Prompt: "Create a rollback plan for our payment processing
refactoring. Include:
1. Criteria that trigger a rollback
2. Step-by-step rollback procedure
3. How to handle transactions that occurred during the
refactored period
4. Communication plan for stakeholders
5. Post-rollback investigation process"
The Testing Pyramid for Refactoring
During refactoring, your testing strategy should follow a specific pattern:
Before refactoring:
- Write characterization tests (integration level)
- Write smoke tests for critical paths (end-to-end level)

During refactoring:
- Write unit tests for new code
- Run characterization tests after every change
- Run integration tests frequently

After refactoring:
- Replace characterization tests with proper specification tests
- Add performance tests if applicable
- Run full regression suite
Common Refactoring Risks and Mitigations
| Risk | Mitigation |
|---|---|
| Behavior change | Characterization tests + shadow mode |
| Performance regression | Performance benchmarks before and after |
| Data corruption | Database backups + reversible migrations |
| API contract break | API versioning + consumer-driven contracts |
| Team confusion | Documentation + communication + pair programming |
| Scope creep | Clear boundaries + time-boxing |
| Incomplete migration | Feature flags + strangler fig + tracking |
Key Insight: The biggest risk in refactoring is not introducing a bug — it is introducing a bug that you do not detect until it has caused real damage. Your risk management strategy should focus on detection speed: monitoring, alerting, and rapid rollback capabilities.
26.10 Case Study: Refactoring a Legacy Application
Let us walk through a realistic refactoring scenario from start to finish. This case study illustrates how the techniques from this chapter work together in practice.
The Starting Point
We have inherited a Flask-based e-commerce application that has been in production for six years. It has the following characteristics:
- 45,000 lines of Python across 120 files
- Flask 1.0 (three major versions behind)
- No test suite
- 15 database models with raw SQL queries mixed in with ORM usage
- A monolithic views.py with 3,200 lines and 67 route handlers
- Global configuration via module-level variables
- Authentication implemented via custom decorators with copy-pasted logic
- No type hints anywhere
- Three developers originally worked on it; none remain on the team
Phase 1: Assessment (Week 1)
We begin by using AI to understand the system:
Prompt: "I'm going to share the directory structure and key
files from a legacy Flask e-commerce application. Help me
understand:
1. The overall architecture and request flow
2. The data model and relationships
3. The most critical business logic
4. The riskiest areas (most complex, least understood)
5. Quick wins for improvement"
The AI identifies several critical findings:
- The payment processing code is duplicated in three places with slight variations
- There are SQL injection vulnerabilities in the search functionality
- The session management has a race condition under high load
- Several database queries use SELECT * and fetch far more data than needed
We prioritize the security issues for immediate fixing and plan the broader refactoring.
Phase 2: Safety Net (Weeks 2-4)
We add characterization tests to the most critical paths:
Prompt: "Here is the order processing function (145 lines).
Generate comprehensive characterization tests that capture its
current behavior. Cover the happy path, common error cases,
and the edge cases I've noticed (orders with zero items,
orders with negative prices, orders from suspended accounts)."
We set up pytest with coverage reporting, add a CI pipeline, and establish a baseline: 12% code coverage over the critical payment and order processing paths.
Phase 3: Extract and Organize (Weeks 5-10)
We tackle the monolithic views.py first:
Prompt: "This 3,200-line views.py contains 67 route handlers.
Group them by domain (user management, product catalog,
order processing, payment, reporting) and suggest how to split
this into separate Blueprint modules."
The AI suggests five Blueprint modules. We extract them one at a time, running characterization tests after each extraction. We also introduce a service layer, extracting business logic from the route handlers:
Prompt: "This route handler for order creation mixes HTTP
request parsing, validation, business logic, database access,
and email notification into a single 95-line function. Separate
it into:
1. A thin route handler that deals with HTTP
2. A service function that contains business logic
3. A repository function for data access
Show me the refactored code and verify it preserves behavior."
Phase 4: Modernize Dependencies (Weeks 11-16)
With the code better organized, we modernize the dependency structure:
- Replace raw SQL with SQLAlchemy ORM queries
- Introduce dependency injection for database sessions
- Wrap the email library behind our own interface
- Replace the custom authentication with Flask-Login
Each change is guarded by feature flags so we can roll back if issues arise.
Phase 5: Framework Migration (Weeks 17-24)
With the codebase well-structured and tested, we begin migrating from Flask 1.0 to FastAPI:
- Convert one Blueprint at a time, starting with the least critical (reporting)
- Run both Flask and FastAPI behind nginx, routing by endpoint
- Use shadow mode to compare responses between old and new implementations
- Gradually shift traffic as confidence builds
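Shadow mode, mentioned above, runs both implementations against live traffic but serves only the legacy result, logging any divergence for later analysis. A minimal sketch, with illustrative function names:

```python
import logging

logger = logging.getLogger("shadow")

def handle_order(request, legacy_handler, modern_handler):
    """Serve the legacy response; run the modern path in shadow and compare."""
    legacy_result = legacy_handler(request)
    try:
        modern_result = modern_handler(request)
        if modern_result != legacy_result:
            logger.warning("shadow divergence for %r: legacy=%r modern=%r",
                           request, legacy_result, modern_result)
    except Exception:
        # A crash in the shadow path must never affect the user.
        logger.exception("shadow path raised for %r", request)
    return legacy_result
```

Once the divergence log stays empty under production load, you can flip the flag and serve the modern result with real confidence rather than hope.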
Results
After 24 weeks of incremental work:
- Test coverage: 12% to 78%
- Code structure: 1 monolithic file to 45 focused modules
- Framework: Flask 1.0 to FastAPI with async support
- Security: All known vulnerabilities addressed
- Performance: Average response time decreased 40% due to async I/O
- Developer productivity: New features now take approximately half the time
Key Insight: Notice that this was not a "stop everything and rewrite" project. The team continued delivering features throughout. The refactoring happened incrementally, in parallel with normal development, guided by the principle of refactoring what you need to change.
Summary
Refactoring legacy code is challenging but essential work, and AI coding assistants have made it dramatically more accessible. The key principles to remember are:
- Understand before you change. Use AI to build a comprehensive mental model of the legacy codebase before modifying anything.
- Test before you refactor. Characterization tests are your safety net. Without them, you are changing code blind.
- Proceed incrementally. The strangler fig pattern, feature flags, and incremental modernization strategies all serve the same purpose: making large changes through a series of small, reversible steps.
- Manage risk explicitly. Every refactoring change should have a rollback plan. Use canary releases, shadow mode, and monitoring to catch problems early.
- Refactor what you need to change. Do not refactor stable code for its own sake. Focus your effort where it will have the most impact on development velocity and system reliability.
AI assistants accelerate every stage of this process: understanding unfamiliar code, generating characterization tests, suggesting refactoring strategies, implementing changes, and reviewing results. But the strategic decisions — what to refactor, in what order, and how aggressively — remain yours.
The techniques in this chapter, combined with the testing strategies from Chapter 21 and the design patterns from Chapter 25, give you a comprehensive toolkit for transforming legacy codebases into modern, maintainable systems. The key is patience, discipline, and a commitment to incremental improvement.
Final Thought: Every codebase you write today will eventually become someone else's legacy code. The best way to make future refactoring easier is to write clean, well-tested, well-documented code now. But when you inherit code that did not follow these principles, the techniques in this chapter will serve you well.