Chapter 26: Exercises — Refactoring Legacy Code with AI
Tier 1: Recall and Understanding (Exercises 1-6)
Exercise 1: Defining Legacy Code
Objective: Recall the characteristics that make code "legacy."
List at least six characteristics that can classify a codebase as "legacy." For each characteristic, write one sentence explaining why it makes the code harder to work with. Refer to Michael Feathers' definition and explain why "code without tests" is considered the core definition.
Exercise 2: Characterization Test Vocabulary
Objective: Distinguish characterization tests from other test types.
Compare and contrast the following test types in a table format:
- Unit tests
- Integration tests
- Characterization tests
- Regression tests
For each, specify: (a) when it is written relative to the code, (b) what it verifies, (c) whether it captures intended or actual behavior, and (d) when it is most useful.
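As a reference point for column (c), the sketch below contrasts a test of intended behavior with a test of actual behavior. `legacy_discount` is a hypothetical function invented for this illustration, and the "intended" rule in its docstring is an assumption, not something taken from the chapter.

```python
def legacy_discount(total):
    """Hypothetical legacy function: a flat 10 off orders over 100."""
    if total > 100:
        return total - 10
    return total


def test_unit_discount_applies_over_threshold():
    # Unit test: asserts the behavior the (assumed) specification intends.
    assert legacy_discount(200) == 190


def test_characterization_exactly_100_gets_no_discount():
    # Characterization test: pins what the code does today, whether or
    # not anyone intended it. The comparison is strict (>), so an order
    # of exactly 100 is not discounted.
    assert legacy_discount(100) == 100
```

The second test makes no claim that the boundary behavior is right; it only records it so that a refactoring cannot change it silently.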
Exercise 3: The Strangler Fig Pattern
Objective: Explain the strangler fig pattern in your own words.
Write a 200-word explanation of the strangler fig pattern suitable for a junior developer who has never heard of it. Include: (a) why it is named after a tree, (b) the basic steps, (c) why it is preferred over a "Big Bang" rewrite, and (d) one real-world analogy that is not related to software.
Exercise 4: Risk Assessment Matching
Objective: Match refactoring risks with appropriate mitigations.
Match each risk on the left with the most appropriate mitigation on the right:
| Risk | Mitigation Options |
|---|---|
| 1. Behavior change during refactoring | A. Database backups + reversible migrations |
| 2. Performance regression | B. API versioning |
| 3. Data corruption | C. Characterization tests + shadow mode |
| 4. API contract break | D. Feature flags + tracking dashboard |
| 5. Incomplete migration | E. Performance benchmarks before/after |
Exercise 5: Dependency Problems Identification
Objective: Recognize common dependency problems in legacy code.
Given the following import structure, identify all problems:
```python
# file: models/user.py
from services.auth import AuthService
from services.email import send_welcome_email

# file: services/auth.py
from models.user import User
from services.email import send_login_alert

# file: services/email.py
from models.user import User
from services.auth import get_current_user
```
List each circular dependency you find and explain why it is problematic.
Exercise 6: Framework Migration Concepts
Objective: Recall the key considerations for framework migration.
List five implicit behaviors that a web framework typically handles automatically (for example, JSON serialization of datetime objects). Explain why each of these can cause subtle bugs during a framework migration if not accounted for.
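The datetime example mentioned above can be reproduced with the standard library alone. This is a minimal sketch: the two `default=` conventions shown are plausible choices, not a statement of what any particular framework emits, so check the documentation of the frameworks you are migrating between.

```python
import json
from datetime import datetime

payload = {"created_at": datetime(2024, 1, 15, 10, 23, 45)}

# The standard library refuses to serialize datetime objects on its own.
try:
    json.dumps(payload)
    raised = False
except TypeError:
    raised = True

# Two plausible conventions a framework might apply implicitly.
as_iso = json.dumps(payload, default=lambda d: d.isoformat())
as_str = json.dumps(payload, default=str)

print(raised)   # True
print(as_iso)   # {"created_at": "2024-01-15T10:23:45"}
print(as_str)   # {"created_at": "2024-01-15 10:23:45"}
```

The two outputs differ by a single character (`T` versus a space), which is exactly the kind of difference that passes casual inspection and breaks a strict client parser after a migration.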
Tier 2: Application (Exercises 7-12)
Exercise 7: Writing Your First Characterization Test
Objective: Write characterization tests for a given function.
Given the following legacy function, write at least eight characterization tests that capture its current behavior, including edge cases:
```python
def calculate_price(base_price, quantity, customer_type, coupon_code=None):
    if quantity < 0:
        quantity = 0
    price = base_price * quantity
    if customer_type == "wholesale":
        price = price * 0.8
    elif customer_type == "employee":
        price = price * 0.5
    if coupon_code == "SAVE10":
        price = price - 10
    elif coupon_code == "HALF":
        price = price / 2
    if price < 0:
        price = 0
    return round(price, 2)
```
Run your tests mentally and verify that each assertion matches the function's actual behavior.
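One possible starting point is sketched below, covering two of the eight or more tests. The function is repeated so the block runs on its own; the test names are illustrative, and the customer type `"retail"` is an assumption standing in for any value that matches neither discount branch.

```python
def calculate_price(base_price, quantity, customer_type, coupon_code=None):
    # Copied from the exercise so this block is self-contained.
    if quantity < 0:
        quantity = 0
    price = base_price * quantity
    if customer_type == "wholesale":
        price = price * 0.8
    elif customer_type == "employee":
        price = price * 0.5
    if coupon_code == "SAVE10":
        price = price - 10
    elif coupon_code == "HALF":
        price = price / 2
    if price < 0:
        price = 0
    return round(price, 2)


def test_retail_without_coupon_is_base_times_quantity():
    assert calculate_price(10, 3, "retail") == 30


def test_negative_quantity_with_save10_returns_zero():
    # Quantity is clamped to 0, SAVE10 drives the price to -10,
    # and the final guard clamps it back to 0.
    assert calculate_price(10, -5, "retail", "SAVE10") == 0
```

Each assertion records what the function returns today. Whether a coupon should apply to an empty order is a separate question that characterization tests deliberately leave open.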
Exercise 8: Extract Method Practice
Objective: Apply extract method refactoring to a long function.
Refactor the following function by extracting at least three separate methods. Each extracted method should have a clear name, type hints, and a docstring:
```python
def process_user_registration(form_data):
    # Validate email
    email = form_data.get("email", "")
    if "@" not in email or "." not in email:
        return {"error": "Invalid email"}
    if len(email) > 254:
        return {"error": "Email too long"}
    existing = db.query("SELECT id FROM users WHERE email = %s", email)
    if existing:
        return {"error": "Email already registered"}

    # Validate password
    password = form_data.get("password", "")
    if len(password) < 8:
        return {"error": "Password too short"}
    if not any(c.isupper() for c in password):
        return {"error": "Password needs uppercase letter"}
    if not any(c.isdigit() for c in password):
        return {"error": "Password needs a digit"}

    # Create user
    hashed = hashlib.sha256(password.encode()).hexdigest()
    user_id = db.execute(
        "INSERT INTO users (email, password_hash) VALUES (%s, %s)",
        email, hashed
    )

    # Send welcome email
    subject = f"Welcome to our platform!"
    body = f"Hi there! Your account has been created with email {email}."
    email_client.send(to=email, subject=subject, body=body)

    # Log the registration
    logger.info(f"New user registered: {user_id} ({email})")
    audit_log.record("user_registration", user_id=user_id)

    return {"success": True, "user_id": user_id}
```
Exercise 9: Strangler Fig Implementation
Objective: Implement a strangler fig facade for a simple scenario.
You have a legacy tax calculator and a modern replacement. Write a TaxCalculatorFacade class that:
1. Accepts both implementations and a feature flag configuration
2. Routes calculations to the appropriate implementation based on a feature flag
3. Supports a shadow mode that runs both, logs discrepancies, and returns the legacy result
4. Includes proper type hints and docstrings
# Legacy implementation
def calculate_tax_legacy(subtotal, state):
rates = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625}
rate = rates.get(state, 0.05)
return round(subtotal * rate, 2)
# Modern implementation
def calculate_tax_modern(subtotal, state):
# Uses updated rates and handles more states
rates = {"CA": 0.0725, "NY": 0.08, "TX": 0.0625, "WA": 0.065, "OR": 0.0}
rate = rates.get(state, 0.05)
return round(subtotal * rate, 2)
Exercise 10: Dependency Injection Refactoring
Objective: Refactor tightly coupled code to use dependency injection.
Refactor the following class to use dependency injection. The refactored class should be testable with mock dependencies:
```python
import requests
import smtplib

class OrderNotifier:
    def notify_customer(self, order_id):
        response = requests.get(f"https://api.internal/orders/{order_id}")
        order = response.json()

        server = smtplib.SMTP("mail.company.com", 587)
        server.login("orders@company.com", "secretpassword")
        server.sendmail(
            "orders@company.com",
            order["customer_email"],
            f"Your order #{order_id} has been shipped!"
        )
        server.quit()

        requests.post(
            "https://api.internal/audit",
            json={"event": "notification_sent", "order_id": order_id}
        )
```
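For readers who want to see the technique in isolation first, here is a minimal sketch of constructor injection on an unrelated example. `Greeter` and its `now` parameter are hypothetical names invented for this illustration; the dependency being injected is the system clock rather than an HTTP or SMTP client.

```python
from datetime import datetime
from typing import Callable


class Greeter:
    """Hypothetical class: greets according to the hour of day."""

    def __init__(self, now: Callable[[], datetime] = datetime.now) -> None:
        # The clock is injected rather than called directly inside greet(),
        # so tests can substitute a fixed time.
        self._now = now

    def greet(self, name: str) -> str:
        period = "morning" if self._now().hour < 12 else "afternoon"
        return f"Good {period}, {name}"


# In a test, inject a fake clock instead of patching datetime.
fixed_clock = lambda: datetime(2024, 1, 15, 9, 0)
greeting = Greeter(now=fixed_clock).greet("Ada")
print(greeting)  # Good morning, Ada
```

The default argument keeps production call sites unchanged, while the test controls the dependency completely. The same shape applies when the injected collaborators are network clients.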
Exercise 11: Feature Flag System
Objective: Implement a feature flag system with percentage-based rollout.
Build a FeatureFlag class that supports:
1. Boolean on/off flags
2. Percentage-based rollout (e.g., enable for 25% of users)
3. User allowlist (always enabled for specific users)
4. User blocklist (always disabled for specific users)
5. Default value when flag is not configured
Write tests demonstrating each capability.
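Percentage-based rollout usually needs each user to land in the same bucket on every request. One common way to get that is to hash the flag name together with the user ID, sketched below. The function names and the `flag:user` key format are assumptions made for this illustration, and the block covers only the bucketing step, not the full `FeatureFlag` class.

```python
import hashlib


def rollout_bucket(flag_name: str, user_id: str) -> int:
    """Map a (flag, user) pair to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100


def in_rollout(flag_name: str, user_id: str, percentage: int) -> bool:
    """Return True if the user's bucket falls below the rollout percentage."""
    return rollout_bucket(flag_name, user_id) < percentage
```

A cryptographic hash is used here instead of Python's built-in `hash()` because string hashing is randomized per process by default, which would move users between buckets on every restart. Including the flag name in the key keeps the same users from always being the first 25% for every flag.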
Exercise 12: Migration Checklist Generator
Objective: Use AI prompting skills to generate a migration checklist.
Write three detailed prompts that you would give to an AI assistant to help migrate a Flask application to FastAPI. Each prompt should:
1. Provide specific context about the existing code
2. Request a specific output format
3. Include constraints the AI should respect
4. Ask for both the migration code and an explanation of changes
Tier 3: Analysis (Exercises 13-18)
Exercise 13: Legacy Code Smell Analysis
Objective: Analyze code to identify and categorize code smells.
Examine the following class and identify at least seven distinct code smells. For each smell, (a) name it, (b) explain why it is problematic, and (c) suggest a specific refactoring to address it:
```python
class Manager:
    _instance = None
    data = {}

    def __new__(cls):
        if cls._instance is None:
            cls._instance = super().__new__(cls)
        return cls._instance

    def do_everything(self, action, **kwargs):
        if action == "create_user":
            name = kwargs.get("name")
            email = kwargs.get("email")
            if not name or not email:
                print("Error: missing fields")
                return None
            import sqlite3
            conn = sqlite3.connect("app.db")
            conn.execute("INSERT INTO users VALUES (?, ?)", (name, email))
            conn.commit()
            conn.close()
            self.data[email] = {"name": name, "type": "user"}
            import smtplib
            s = smtplib.SMTP("localhost")
            s.sendmail("admin@co.com", email, f"Welcome {name}")
            s.quit()
            return True
        elif action == "delete_user":
            email = kwargs.get("email")
            import sqlite3
            conn = sqlite3.connect("app.db")
            conn.execute("DELETE FROM users WHERE email=?", (email,))
            conn.commit()
            conn.close()
            if email in self.data:
                del self.data[email]
            return True
        elif action == "send_report":
            import sqlite3
            conn = sqlite3.connect("app.db")
            users = conn.execute("SELECT * FROM users").fetchall()
            conn.close()
            report = ""
            for u in users:
                report += f"{u[0]}: {u[1]}\n"
            import smtplib
            s = smtplib.SMTP("localhost")
            s.sendmail("admin@co.com", "boss@co.com", report)
            s.quit()
            return report
        else:
            print(f"Unknown action: {action}")
            return None
```
Exercise 14: Dependency Graph Analysis
Objective: Analyze and improve a dependency graph.
Given this module dependency list, draw (or describe) the dependency graph and identify problems:
```
app.py          -> views.py, models.py, config.py
views.py        -> models.py, services.py, utils.py, config.py
models.py       -> utils.py, config.py
services.py     -> models.py, utils.py, external_api.py, config.py
utils.py        -> config.py, models.py
external_api.py -> config.py, models.py
config.py       -> (no dependencies)
```
- Identify all circular dependencies
- Identify the module with the highest coupling
- Propose a revised dependency structure that eliminates circular dependencies
- Explain which module should be refactored first and why
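If you want to check a hand-drawn graph mechanically, a depth-first search over an adjacency mapping is one way to do it. The sketch below is an assumption-laden helper, not a prescribed tool: `find_cycles` is a hypothetical name, it reports one cycle per back edge rather than every elementary cycle, and the example graph is deliberately different from the one in the exercise.

```python
from typing import Dict, List


def find_cycles(graph: Dict[str, List[str]]) -> List[List[str]]:
    """Return one cycle per back edge found by depth-first search."""
    cycles: List[List[str]] = []
    visiting: List[str] = []   # nodes on the current DFS path
    done = set()               # nodes whose dependencies are fully explored

    def visit(node: str) -> None:
        if node in done:
            return
        if node in visiting:
            # Back edge: slice the path from the first visit of `node`.
            cycles.append(visiting[visiting.index(node):] + [node])
            return
        visiting.append(node)
        for dep in graph.get(node, []):
            visit(dep)
        visiting.pop()
        done.add(node)

    for start in graph:
        visit(start)
    return cycles


# Hypothetical graph for illustration only.
example = {
    "a.py": ["b.py"],
    "b.py": ["c.py"],
    "c.py": ["a.py"],
    "d.py": ["a.py"],
}
cycles = find_cycles(example)
print(cycles)  # [['a.py', 'b.py', 'c.py', 'a.py']]
```

Note that `d.py` depends on a module inside a cycle but is not itself part of one, so it does not appear in the output.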
Exercise 15: Refactoring Strategy Comparison
Objective: Compare different refactoring strategies for the same problem.
You have a 500-line function that processes CSV files, validates data, transforms it, stores it in a database, and sends summary emails. Compare these three refactoring strategies:
- Extract Method: Break into smaller functions within the same module
- Extract Class: Create separate classes for each responsibility
- Pipeline Pattern: Create a data processing pipeline with discrete stages
For each strategy, analyze: (a) effort required, (b) testability improvement, (c) reusability, (d) readability, and (e) when it would be the best choice.
Exercise 16: Shadow Mode Log Analysis
Objective: Analyze shadow mode discrepancy logs to identify issues.
Given these shadow mode logs comparing legacy and modern implementations, identify the root causes and suggest fixes:
```
[2024-01-15 10:23:45] DISCREPANCY order_id=1001
    legacy_total=99.99 modern_total=100.00
[2024-01-15 10:23:46] DISCREPANCY order_id=1002
    legacy_total=0.00 modern_total=0.01
[2024-01-15 10:24:01] DISCREPANCY order_id=1003
    legacy_shipping="free" modern_shipping=0.00
[2024-01-15 10:24:15] DISCREPANCY order_id=1004
    legacy_tax=8.25 modern_tax=8.2500001
[2024-01-15 10:25:00] MATCH order_id=1005 total=250.00
[2024-01-15 10:25:30] DISCREPANCY order_id=1006
    legacy_discount=10% modern_discount=0.10
[2024-01-15 10:26:00] DISCREPANCY order_id=1007
    legacy_result=None modern_result={"error": "invalid_state"}
```
For each discrepancy, classify it as: (a) a genuine bug in the modern implementation, (b) a representation/formatting difference, (c) a floating-point precision issue, or (d) an improvement in the modern implementation that should be preserved.
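When thinking about category (c), it can help to see how a shadow-mode comparator might normalize amounts before comparing them. The sketch below is one possible approach under the assumption that all monetary values should agree to the cent; `money` and `amounts_match` are hypothetical helpers, and the sample values are not taken from the logs above.

```python
from decimal import Decimal, ROUND_HALF_UP


def money(value) -> Decimal:
    """Normalize a numeric amount to two decimal places."""
    # Going through str() avoids importing binary float noise into Decimal.
    return Decimal(str(value)).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


def amounts_match(legacy, modern) -> bool:
    """Compare two amounts after rounding both to cents."""
    return money(legacy) == money(modern)


print(amounts_match(12.5, 12.5000003))  # True: differs only below one cent
print(amounts_match(19.99, 20.00))      # False: a real one-cent difference
```

A comparator like this suppresses noise but cannot explain a difference of a full cent, so any discrepancy that survives normalization still needs a root-cause analysis.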
Exercise 17: Test Coverage Gap Analysis
Objective: Identify gaps in characterization test coverage.
Given this function and the following test suite, identify at least five behaviors that are not covered by the tests:
```python
def apply_coupon(order, coupon_code):
    if order.status != "pending":
        return order
    coupon = db.get_coupon(coupon_code)
    if not coupon:
        order.add_error("Invalid coupon")
        return order
    if coupon.expires_at < datetime.now():
        order.add_error("Coupon expired")
        return order
    if coupon.min_order_value and order.subtotal < coupon.min_order_value:
        order.add_error(f"Minimum order: ${coupon.min_order_value}")
        return order
    if coupon.usage_count >= coupon.max_uses:
        order.add_error("Coupon fully redeemed")
        return order
    if coupon.type == "percent":
        discount = order.subtotal * (coupon.value / 100)
    elif coupon.type == "fixed":
        discount = coupon.value
    else:
        discount = 0
    order.discount = min(discount, order.subtotal)
    order.coupon_code = coupon_code
    coupon.usage_count += 1
    return order

# Existing tests:
def test_valid_percent_coupon():
    order = make_order(subtotal=100, status="pending")
    result = apply_coupon(order, "SAVE20")
    assert result.discount == 20.0

def test_invalid_coupon():
    order = make_order(subtotal=100, status="pending")
    result = apply_coupon(order, "INVALID")
    assert "Invalid coupon" in result.errors
```
Exercise 18: Incremental vs. Big Bang Analysis
Objective: Analyze the trade-offs between incremental and complete rewrite approaches.
Your team is debating whether to incrementally refactor or completely rewrite a legacy system with 80,000 lines of code, 200 API endpoints, and 15 developers. The system has 5% test coverage and is written in Python 2.7.
Write a structured analysis covering:
1. Three arguments for incremental refactoring
2. Three arguments for a complete rewrite
3. A hybrid approach that might work better than either extreme
4. What additional information you would need to make the final decision
Tier 4: Synthesis (Exercises 19-24)
Exercise 19: Complete Refactoring Plan
Objective: Create a comprehensive refactoring plan for a legacy module.
Design a detailed 8-week refactoring plan for the following legacy module. Your plan should include weekly milestones, specific tasks, risk mitigations, and success criteria:
```python
# legacy_report_generator.py - 800 lines, no tests, global state
import os, sys, csv, json, sqlite3, smtplib, datetime
from email.mime.text import MIMEText

DB_PATH = "/var/data/reports.db"
SMTP_HOST = "mail.company.com"
ADMIN_EMAILS = ["boss@company.com", "vp@company.com"]
_cache = {}

def generate_report(report_type, start_date, end_date, format="pdf",
                    recipients=None, filters=None, include_charts=True,
                    compare_previous=False, department=None):
    # ... 200 lines of interleaved logic ...
    pass

def generate_and_send(report_type, **kwargs):
    # ... 150 lines that duplicate logic from generate_report ...
    pass

# ... 20 more functions with similar issues ...
```
Exercise 20: Characterization Test Suite Design
Objective: Design a complete characterization test strategy for a legacy system.
You are tasked with adding characterization tests to a legacy e-commerce checkout process that spans five modules (cart, pricing, inventory, payment, fulfillment). Design:
- A prioritization framework for deciding which modules to test first
- A template for characterization tests that your team can follow
- A strategy for handling untestable code (database calls, external APIs)
- A coverage target and rationale
- An estimated timeline for achieving adequate coverage
Exercise 21: Framework Migration Implementation
Objective: Implement a partial framework migration with dual-stack support.
Write the complete code for running a Flask and FastAPI application simultaneously behind a routing layer. Include:
- A shared endpoint `/api/health` implemented in both frameworks
- A Flask endpoint `/api/legacy/users` that remains in Flask
- A FastAPI endpoint `/api/v2/users` that is the modern replacement
- A routing configuration that directs traffic appropriately
- Shared authentication middleware that works with both frameworks
Exercise 22: Automated Refactoring Tool
Objective: Build an AI-assisted refactoring tool.
Create a Python script that:
1. Reads a Python source file
2. Uses AST parsing to identify functions longer than a configurable threshold
3. Identifies code blocks that could be extracted (based on comment markers or blank line separation)
4. Generates a report suggesting specific refactoring actions
5. Optionally applies simple refactorings (like extracting a commented section into a new function)
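As a possible starting point for the AST step only, the sketch below measures function length with the standard `ast` module. `find_long_functions` and the `max_lines` parameter are hypothetical names chosen for this illustration, and the length is counted from the `def` line to the last line of the body, which is one of several reasonable definitions.

```python
import ast
from typing import List, Tuple


def find_long_functions(source: str, max_lines: int = 20) -> List[Tuple[str, int]]:
    """Return (name, length) for each function longer than max_lines."""
    tree = ast.parse(source)
    results = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            # end_lineno is available on Python 3.8 and later.
            length = node.end_lineno - node.lineno + 1
            if length > max_lines:
                results.append((node.name, length))
    return results


# Tiny synthetic input: one 2-line function and one 6-line function.
sample = "def short():\n    return 1\n\ndef long():\n" + "    x = 1\n" * 5
report = find_long_functions(sample, max_lines=3)
print(report)  # [('long', 6)]
```

Because `ast.walk` visits nested nodes too, methods and inner functions are reported alongside top-level ones. Whether that is desirable for the tool is a design decision left to the exercise.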
Exercise 23: Technical Debt Tracker
Objective: Build a technical debt tracking and prioritization system.
Create a system that:
1. Parses Python files for TODO, FIXME, HACK, and XXX comments
2. Categorizes each finding by severity (based on keywords and context)
3. Estimates effort based on the size of the surrounding function
4. Generates a prioritized backlog in Markdown format
5. Tracks progress over time by comparing reports
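For the parsing step, one option is to read comments through the standard `tokenize` module instead of running a regular expression over raw lines, so that markers inside string literals are not counted. The sketch below covers only that step; `find_debt_comments` and the `MARKER` pattern are assumptions made for this illustration.

```python
import io
import re
import tokenize
from typing import List, Tuple

# Matches a marker word, an optional colon or spaces, then the rest of the comment.
MARKER = re.compile(r"\b(TODO|FIXME|HACK|XXX)\b[:\s]*(.*)")


def find_debt_comments(source: str) -> List[Tuple[int, str, str]]:
    """Return (line, marker, text) for each debt marker found in a comment."""
    findings = []
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    for tok in tokens:
        if tok.type == tokenize.COMMENT:
            match = MARKER.search(tok.string)
            if match:
                findings.append((tok.start[0], match.group(1), match.group(2).strip()))
    return findings


sample = 'x = "TODO: not a comment"\n# FIXME: handle empty input\ny = 1  # HACK\n'
findings = find_debt_comments(sample)
print(findings)  # [(2, 'FIXME', 'handle empty input'), (3, 'HACK', '')]
```

The string literal on the first line is skipped because the tokenizer classifies it as a string token, not a comment.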
Exercise 24: Strangler Fig with Database Migration
Objective: Implement the strangler fig pattern including database schema evolution.
Design and implement a strangler fig migration that includes changing the database schema. The legacy system stores addresses as a single text field; the modern system uses structured fields (street, city, state, zip). Your implementation must:
- Support reading from both old and new schemas simultaneously
- Migrate data incrementally (not all at once)
- Handle writes correctly during the migration period
- Include rollback capability
- Include a completion check that verifies all data has been migrated
Tier 5: Critical Evaluation (Exercises 25-30)
Exercise 25: Refactoring Decision Review
Objective: Evaluate a team's refactoring decisions and suggest improvements.
A team made the following refactoring decisions. Evaluate each one — was it a good decision? What would you have done differently?
- Rewrote the entire authentication system in one sprint (2 weeks) without maintaining backward compatibility
- Added type hints to every file in the project before addressing any structural issues
- Replaced all raw SQL queries with ORM calls, including complex reporting queries
- Used feature flags for migrating the payment system but not for the notification system
- Decided not to write characterization tests because "we're rewriting the code anyway"
Exercise 26: AI-Assisted Refactoring Pitfalls
Objective: Identify and evaluate common pitfalls when using AI for refactoring.
For each of the following scenarios, explain what could go wrong and how to prevent it:
- Asking AI to refactor a 500-line function in a single prompt
- Using AI-generated characterization tests without reviewing them
- Trusting AI's analysis of circular dependencies without verification
- Having AI generate a migration plan without providing it the full codebase context
- Using AI to rename variables across a codebase without understanding dynamic attribute access
Exercise 27: Modernization Strategy Evaluation
Objective: Evaluate competing modernization strategies.
Three teams are modernizing similar legacy systems. Evaluate their approaches:
Team A: Dedicated refactoring sprint every 6 weeks. No feature work during refactoring sprints. Focus on one area at a time until complete.
Team B: 20% of each sprint allocated to technical debt. Developers choose what to refactor based on what they are working on. No dedicated tracking.
Team C: Quarterly "modernization milestones" with specific goals. Feature work continues but must follow new standards. Old code refactored only when modified.
For each team, analyze: strengths, weaknesses, likely outcomes after 12 months, and what type of organization each approach suits best.
Exercise 28: Test Quality Assessment
Objective: Evaluate the quality and usefulness of characterization tests.
Review the following characterization tests and evaluate whether they provide adequate safety for refactoring:
```python
def test_process_order_1():
    result = process_order({"items": [{"id": 1, "qty": 2}], "user": 5})
    assert result is not None

def test_process_order_2():
    result = process_order({"items": [], "user": 5})
    assert result["status"] == "error"

def test_process_order_3():
    result = process_order({"items": [{"id": 1, "qty": 2}], "user": 5})
    assert "total" in result
    assert isinstance(result["total"], (int, float))
```
For each test, explain: (a) what it does and does not protect against, (b) how it could be improved, and (c) what specific refactoring mistakes it would fail to catch.
Exercise 29: Risk-Benefit Analysis
Objective: Perform a risk-benefit analysis for a specific refactoring decision.
Your legacy application has a 2,000-line utils.py file that every other module imports from. It contains 85 functions covering string manipulation, date formatting, database helpers, file I/O wrappers, and logging utilities. Perform a complete risk-benefit analysis for splitting this into focused modules:
- Quantify the benefits (maintainability, testability, clarity)
- Quantify the risks (breaking imports, merge conflicts, deployment risk)
- Estimate the effort (hours, number of files affected)
- Propose a migration strategy that minimizes risk
- Define success criteria and rollback triggers
Exercise 30: Legacy Code Ethics
Objective: Evaluate the ethical and professional responsibilities around legacy code.
Consider these scenarios and discuss the professional and ethical implications:
- A legacy system has a known security vulnerability, but refactoring it would take 3 months. Management wants to defer it to next quarter. What is your responsibility?
- You discover that a legacy function has a bug that has been overcharging customers by 0.1% for two years. Fixing it will change revenue. How do you handle this?
- An AI assistant suggests a refactoring that would improve code quality but would also make it harder for a junior team member to understand the code. How do you balance these concerns?
- The original developer of a legacy system insists their approach was correct and resists refactoring. How do you navigate this interpersonal dynamic while maintaining code quality?
- Your team uses AI to generate characterization tests, but you suspect the AI missed some edge cases. Do you ship the refactoring with incomplete test coverage or delay the project?