Chapter 14 Exercises: When AI Gets It Wrong

These exercises are organized into five tiers of increasing difficulty, following Bloom's taxonomy. Work through them sequentially or jump to the tier that matches your current skill level.


Tier 1: Recall (Exercises 1-7)

These exercises test your ability to remember and identify key concepts from the chapter.

Exercise 1: Failure Category Matching

Match each code snippet to its failure category (hallucinated API, logic error, security vulnerability, performance anti-pattern, outdated pattern):

a) from pandas.ml import QuickClassifier
b) query = f"SELECT * FROM users WHERE id = '{user_id}'"
c) for i in range(len(items) + 1): process(items[i])
d) from typing import List, Dict, Tuple
e) for order in orders: db.query(Customer).get(order.cid)

Exercise 2: VERIFY Framework

List the six steps of the VERIFY debugging framework from memory and write one sentence explaining each step.

Exercise 3: Confidence Problem Signals

List at least four signals that indicate AI-generated code might be wrong despite sounding confident.

Exercise 4: Trust Spectrum

Describe the three positions on the trust spectrum (blind trust, healthy skepticism, excessive distrust) and explain why each is or is not effective.

Exercise 5: Security Vulnerability Identification

Name six categories of security vulnerabilities that AI commonly introduces in generated code. For each, write one sentence describing the vulnerability.

Exercise 6: Recovery Strategy Names

Name four recovery strategies for when an AI conversation goes off track. Write one sentence describing when each is most appropriate.

Exercise 7: Hallucination Definition

In your own words, define what "API hallucination" means in the context of AI-generated code. Explain why it happens and give one example not found in the chapter.


Tier 2: Apply (Exercises 8-14)

These exercises ask you to apply chapter concepts to straightforward scenarios.

Exercise 8: Bug Identification

The following AI-generated function calculates the average of a list. Identify all bugs:

def calculate_average(numbers: list[float]) -> float:
    """Calculate the arithmetic mean of a list of numbers."""
    total = 0
    for i in range(1, len(numbers)):
        total += numbers[i]
    return total / len(numbers)

List each bug, explain why it is wrong, and provide the corrected code.

Exercise 9: Security Audit

Audit the following AI-generated Flask route for security vulnerabilities. List every vulnerability you find and provide the secure alternative:

@app.route('/profile/<username>')
def profile(username):
    query = f"SELECT * FROM users WHERE username = '{username}'"
    user = db.execute(query).fetchone()
    bio = user['bio']
    return f"""
    <html>
    <body>
        <h1>{username}'s Profile</h1>
        <p>{bio}</p>
        <img src="/avatars/{username}.jpg">
    </body>
    </html>
    """

Exercise 10: Import Verification

The following code was generated by an AI. Without running it, identify which imports are likely hallucinated and which are real. Then verify your guesses by checking the actual Python documentation or PyPI.

from collections import OrderedDict, FrozenDict
from itertools import batch, chain, islice
from functools import cache, partial, compose
from pathlib import Path, SafePath
from dataclasses import dataclass, field, validate

Exercise 11: Performance Analysis

Analyze the time complexity of this AI-generated function. Identify the performance anti-pattern and rewrite it to be efficient:

def find_common_elements(list_a: list[int], list_b: list[int]) -> list[int]:
    """Find elements that appear in both lists."""
    common = []
    for item in list_a:
        if item in list_b and item not in common:
            common.append(item)
    return common

Exercise 12: Edge Case Testing

Write a comprehensive test suite for this AI-generated function. Include at least 8 test cases covering normal operation and edge cases:

def chunk_list(items: list, chunk_size: int) -> list[list]:
    """Split a list into chunks of specified size."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]
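To seed your suite, two starter cases in plain-assert style are sketched below (the function is reproduced so the snippet runs standalone). Finding the remaining edge cases, including invalid inputs, is the point of the exercise:

```python
# chunk_list reproduced from the exercise so this snippet is self-contained.
def chunk_list(items: list, chunk_size: int) -> list[list]:
    """Split a list into chunks of specified size."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]

# Two seed cases; the exercise asks for at least eight.
def test_even_split():
    assert chunk_list([1, 2, 3, 4], 2) == [[1, 2], [3, 4]]

def test_chunk_larger_than_list():
    assert chunk_list([1], 5) == [[1]]

test_even_split()
test_chunk_larger_than_list()
```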

Exercise 13: Deprecation Update

Rewrite the following AI-generated code to use modern Python 3.10+ patterns. Identify every outdated pattern:

from typing import Optional, List, Dict, Union, Tuple

def process_data(
    items: List[Dict[str, Union[str, int]]],
    filter_key: Optional[str] = None
) -> Tuple[List[Dict], int]:
    """Process and filter data items."""
    result = []
    for item in items:
        if filter_key is None or filter_key in item:
            result.append(item)
    count = len(result)
    return (result, count)

Exercise 14: Targeted Reprompt

You asked an AI to generate a function that reads a CSV file and returns the sum of a specific column. The AI generated code that:

- Uses pandas (you wanted to avoid heavy dependencies)
- Has a hardcoded file path
- Does not handle missing values
- Does not handle the case where the column does not exist

Write a targeted reprompt that addresses all four issues. Your reprompt should be specific enough that the AI cannot repeat any of these mistakes.


Tier 3: Analyze (Exercises 15-22)

These exercises require you to break down complex situations and reason about AI failure modes.

Exercise 15: Race Condition Analysis

The following AI-generated code implements a simple cache. Analyze it for thread safety issues. Identify every race condition and explain the scenario that triggers it:

import time

class SimpleCache:
    def __init__(self, ttl: int = 300):
        self.cache = {}
        self.ttl = ttl

    def get(self, key: str):
        if key in self.cache:
            value, timestamp = self.cache[key]
            if time.time() - timestamp < self.ttl:
                return value
            else:
                del self.cache[key]
        return None

    def set(self, key: str, value):
        self.cache[key] = (value, time.time())

    def cleanup(self):
        now = time.time()
        expired = [k for k, (v, t) in self.cache.items() if now - t >= self.ttl]
        for key in expired:
            del self.cache[key]
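One way to make the hazard in get() concrete without real threads is to force an unlucky interleaving deterministically. The RacyDict below is a hypothetical test double that expires the entry between get()'s membership check and its read, exactly as a concurrent cleanup() could:

```python
import time

# Trimmed copy of the cache under test (get/set only).
class SimpleCache:
    def __init__(self, ttl: int = 300):
        self.cache = {}
        self.ttl = ttl

    def get(self, key: str):
        if key in self.cache:
            value, timestamp = self.cache[key]
            if time.time() - timestamp < self.ttl:
                return value
            del self.cache[key]
        return None

    def set(self, key: str, value):
        self.cache[key] = (value, time.time())

class RacyDict(dict):
    """Deletes the key right after a successful membership check,
    simulating a cleanup() running on another thread."""
    def __contains__(self, key):
        present = super().__contains__(key)
        if present:
            del self[key]  # "another thread" expires the entry here
        return present

cache = SimpleCache()
cache.set("k", 42)
cache.cache = RacyDict(cache.cache)
try:
    cache.get("k")
    outcome = "survived"
except KeyError:
    outcome = "KeyError"  # the check-then-act window was hit
```

This demonstrates only one window; identifying the others (and where a lock would need to be held) is left to the exercise.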

Exercise 16: Cascading Failure Analysis

An AI generated a three-function pipeline: fetch_data() -> transform_data() -> save_data(). Each function has a subtle bug. Analyze how the bugs interact:

import json
import csv

def fetch_data(url: str) -> list[dict]:
    """Fetch JSON data from URL."""
    import requests
    response = requests.get(url, timeout=30)
    return response.json()  # Bug: no status code check

def transform_data(records: list[dict]) -> list[dict]:
    """Transform records by normalizing values."""
    for record in records:
        record['name'] = record['name'].strip().title()
        record['score'] = round(record['score'] / 100, 2)  # Bug: mutates input
    return records

def save_data(records: list[dict], filename: str) -> None:
    """Save records to CSV."""
    with open(filename, 'w') as f:
        writer = csv.DictWriter(f, fieldnames=records[0].keys())  # Bug: crashes on empty list
        writer.writeheader()
        writer.writerows(records)

For each bug: describe the failure scenario, explain the downstream consequences, and provide the fix.

Exercise 17: Hallucination Probability Assessment

For each of the following AI-suggested imports, assess the probability of hallucination (high, medium, low) and explain your reasoning:

a) from fastapi import FastAPI, Depends, HTTPException
b) from sqlalchemy.ext.automap import automap_base
c) from django.contrib.ai import ModelGenerator
d) from cryptography.hazmat.primitives.ciphers import Cipher, algorithms
e) from flask_restful_swagger import Api as SwaggerApi
f) from numpy.advanced import TensorDecomposition

Exercise 18: Pattern Detection

The following complete module was AI-generated. Find at least 6 issues across different failure categories (logic, security, performance, outdated patterns, etc.):

from typing import List, Dict, Optional
import hashlib
import sqlite3
import os

DB_PATH = "app.db"
SECRET = "my-app-secret-key-2024"

def get_db():
    return sqlite3.connect(DB_PATH)

def create_user(username: str, password: str) -> bool:
    """Create a new user account."""
    hashed = hashlib.sha256(password.encode()).hexdigest()
    db = get_db()
    query = f"INSERT INTO users (username, password) VALUES ('{username}', '{hashed}')"
    try:
        db.execute(query)
        db.commit()
        return True
    except:
        return False

def get_users(status: Optional[str] = None) -> List[Dict]:
    """Get all users, optionally filtered by status."""
    db = get_db()
    if status:
        query = f"SELECT * FROM users WHERE status = '{status}'"
    else:
        query = "SELECT * FROM users"
    results = db.execute(query).fetchall()
    users = []
    for row in results:
        users.append({"id": row[0], "username": row[1], "status": row[3]})
    return users

def authenticate(username: str, password: str) -> bool:
    """Check if username and password match."""
    hashed = hashlib.sha256(password.encode()).hexdigest()
    db = get_db()
    query = f"SELECT * FROM users WHERE username = '{username}' AND password = '{hashed}'"
    result = db.execute(query).fetchone()
    return result is not None

Exercise 19: Confidence Calibration

For each of the following AI responses, rate whether the code is likely correct (1 = almost certainly wrong, 5 = almost certainly correct) and explain your reasoning:

a) AI confidently generates a complex recursive algorithm for a well-known problem (Fibonacci with memoization)
b) AI provides a detailed configuration for a niche library's undocumented feature
c) AI generates a REST API CRUD endpoint using Flask
d) AI implements a custom cryptographic protocol
e) AI writes a SQL query joining four tables with specific business logic

Exercise 20: Conversation Diagnosis

Read the following conversation summary and diagnose what went wrong. Recommend a recovery strategy.

Turn 1: "Write a function to parse dates in multiple formats."
Turn 2: AI generates function using dateutil.parser.parse. Works but too permissive.
Turn 3: "Make it stricter, only accept ISO 8601 and US format."
Turn 4: AI rewrites using regex. Gets ISO 8601 right but US format wrong (swaps month/day).
Turn 5: "The US format is wrong, month and day are swapped."
Turn 6: AI fixes US format but now ISO 8601 parsing is broken.
Turn 7: "Now ISO 8601 is broken."
Turn 8: AI attempts to fix both but introduces a new bug with timezone handling.

Exercise 21: Automated Verification Design

Design an automated verification pipeline for AI-generated Python code. Specify:

- What tools to use at each stage
- What each tool catches
- The order of execution
- How to handle failures at each stage
- What still requires manual review after all automated checks pass
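As a starting point, one possible stage ordering can be sketched as data plus a fail-fast runner. The tool choices here (mypy, bandit, pytest) are suggestions, not the only valid ones:

```python
import subprocess
import sys

# Each stage: (name, command prefix). The target file path is appended.
STAGES = [
    ("syntax", [sys.executable, "-m", "py_compile"]),  # parse errors
    ("types", ["mypy"]),                                # type mismatches
    ("security", ["bandit", "-q"]),                     # known insecure patterns
    ("tests", ["pytest", "-q"]),                        # behavioral bugs
]

def run_pipeline(path: str) -> list[tuple[str, bool]]:
    """Run each stage in order, stopping at the first failure (fail fast)."""
    results = []
    for name, cmd in STAGES:
        proc = subprocess.run(cmd + [path], capture_output=True)
        ok = proc.returncode == 0
        results.append((name, ok))
        if not ok:
            break
    return results
```

Even when every stage passes, items like business-logic correctness and API design still require human review; your answer should spell out where that line falls.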

Exercise 22: Failure Mode Prediction

You are about to ask an AI to generate code for each of the following tasks. For each, predict the most likely failure mode and describe what specific verification you would perform:

a) A web scraper for a specific website
b) A function to encrypt sensitive data
c) A database migration script
d) A real-time chat server using WebSockets
e) A CSV parser that handles quoted fields with commas


Tier 4: Create (Exercises 23-30)

These exercises require you to build something new using chapter concepts.

Exercise 23: Bug Catalog Extension

Extend the example-01-bug-catalog.py code file with three new bug categories not covered in the chapter. For each category:

- Write a buggy version (as AI would generate it)
- Write the corrected version
- Write a test that catches the bug
- Add a docstring explaining why AI makes this mistake

Exercise 24: Custom Verification Tool

Write a Python script that analyzes a Python source file and reports:

- All imports and whether they can be resolved
- Any string formatting used in SQL-like contexts (potential injection)
- Any use of pickle.loads, eval, or exec (dangerous functions)
- Any hardcoded strings that look like secrets (API keys, passwords)
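For the first bullet, a minimal sketch using only the standard library is shown below; importlib.util.find_spec can usually report whether a module is resolvable without executing it:

```python
import ast
import importlib.util

def check_imports(source: str) -> dict[str, bool]:
    """Map each imported module path to whether it resolves on this system."""
    resolvable = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            names = [node.module]
        else:
            continue
        for name in names:
            try:
                resolvable[name] = importlib.util.find_spec(name) is not None
            except (ImportError, ValueError):
                # find_spec raises if a parent package is itself missing.
                resolvable[name] = False
    return resolvable
```

Running it on the Exercise 10 snippet is a good first test of your tool: hallucinated submodules should come back False.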

Exercise 25: AI Code Review Simulator

Create a Python program that takes AI-generated code as input and simulates a code review. The program should:

- Check for common security anti-patterns
- Flag potential performance issues
- Identify deprecated patterns
- Generate a report with severity levels (critical, warning, info)

Exercise 26: Edge Case Generator

Write a tool that, given a function signature with type hints, automatically generates edge case test inputs. For example, given def process(items: list[int], threshold: int) -> list[int], it should generate test cases including empty lists, single-element lists, all elements above/below threshold, negative numbers, zero, and very large values.
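A minimal sketch of the idea uses typing.get_type_hints to inspect the signature; the seed values and the edge_inputs helper are illustrative assumptions, not a complete generator:

```python
import typing

# Illustrative seed values per scalar type; extend for your own domain.
SEEDS = {
    int: [0, 1, -1, 2**31],
    str: ["", "a", "  "],
}

def edge_inputs(func) -> dict[str, list]:
    """Suggest edge-case values for each annotated parameter of func."""
    hints = typing.get_type_hints(func)
    hints.pop("return", None)
    suggestions = {}
    for param, hint in hints.items():
        if typing.get_origin(hint) is list:
            (inner,) = typing.get_args(hint)
            inner_seeds = SEEDS.get(inner, [])
            # Empty list, single element, and the full seed set.
            suggestions[param] = [[], inner_seeds[:1], inner_seeds]
        else:
            suggestions[param] = SEEDS.get(hint, [])
    return suggestions

# The exercise's example signature.
def process(items: list[int], threshold: int) -> list[int]:
    return [i for i in items if i > threshold]
```

A fuller tool would combine the per-parameter suggestions into a cross product of call arguments and handle nested generics.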

Exercise 27: Conversation Recovery Templates

Create a library of 10 targeted reprompt templates for common AI failure scenarios. Each template should:

- Describe the scenario it addresses
- Include the template text with placeholders
- Explain why this formulation is effective
- Include an example of filling in the template

Exercise 28: Security Scanner

Build a simple security scanner that analyzes Python code for the six vulnerability categories covered in Section 14.4. The scanner should:

- Parse Python source code using the ast module
- Detect SQL injection patterns
- Detect hardcoded secrets
- Detect use of weak cryptographic functions
- Produce a report with line numbers and severity ratings
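As a starting point for the SQL injection check, one sketch walks the AST looking for f-strings whose literal parts begin with a SQL keyword and that interpolate at least one expression. It catches only the f-string pattern; the other detectors are left to the exercise:

```python
import ast

SQL_PREFIXES = ("select ", "insert ", "update ", "delete ")

def find_sql_fstrings(source: str) -> list[int]:
    """Return line numbers of f-strings that interpolate values into SQL text."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.JoinedStr):  # JoinedStr = an f-string
            continue
        # Collect the constant (non-interpolated) text of the f-string.
        literal = "".join(
            part.value for part in node.values
            if isinstance(part, ast.Constant) and isinstance(part.value, str)
        )
        interpolates = any(isinstance(p, ast.FormattedValue) for p in node.values)
        if interpolates and literal.lstrip().lower().startswith(SQL_PREFIXES):
            hits.append(node.lineno)
    return hits
```

Note this misses queries built with str.format, %-formatting, or concatenation; your full scanner should cover those too.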

Exercise 29: Debugging Journal

Create a structured template for a "debugging journal" specifically designed for AI-generated code. Then use it to document the debugging process for three bugs of your choice from this chapter. Include:

- The original buggy code
- How you detected the bug
- The root cause analysis
- The fix
- The verification test
- A lesson learned for future AI interactions

Exercise 30: Trust-but-Verify Dashboard

Design (and optionally implement) a dashboard that tracks AI code verification metrics for a development team:

- Number of AI-generated code blocks reviewed vs. unreviewed
- Bug categories found during review
- Bugs that escaped to production
- Time saved by AI vs. time spent verifying
- Trend lines showing improvement over time


Tier 5: Challenge (Exercises 31-35)

These exercises integrate concepts from multiple chapters and present stretch problems.

Exercise 31: Comprehensive Code Audit

Take any open-source project that was partially generated by AI (or generate a small project yourself using an AI assistant) and perform a comprehensive audit using all techniques from this chapter. Write a detailed report including:

- Total issues found by category
- Most critical issues and their potential impact
- Recommendations for automated prevention
- Estimated time to fix all issues
- Lessons learned about this AI's specific failure patterns

Cross-reference with Chapter 7 (code review skills) and Chapter 10 (specification-driven prompting).

Exercise 32: AI Failure Mode Research

Research and document three AI coding failure modes not covered in this chapter. For each:

- Describe the failure mode with a concrete code example
- Explain why AI makes this mistake (based on how LLMs work, from Chapter 2)
- Propose a detection strategy
- Propose a prevention strategy (better prompting, from Chapters 8-13)
- Estimate the impact severity and likelihood

Exercise 33: Organizational Playbook

Create a complete organizational playbook (2,000+ words) for adopting AI coding assistants at a mid-size company. Address:

- Training developers on AI failure modes
- Updating code review processes
- Setting up automated verification pipelines
- Tracking and learning from AI-generated bugs
- Balancing productivity gains with quality assurance
- Handling the transition period

Integrate concepts from this chapter with Chapter 7 (code review) and Chapter 10 (specification-driven prompting).

Exercise 34: Multi-Model Comparison

If you have access to multiple AI coding assistants, generate the same 10 functions with each and compare:

- Which failure modes are common across all models?
- Which failure modes are specific to particular models?
- Do models fail in correlated or independent ways?
- Could using multiple models as cross-checks improve reliability?
- Design a multi-model verification workflow

Exercise 35: The Ultimate Verification Pipeline

Design and implement a complete, production-ready verification pipeline for AI-generated Python code. The pipeline should:

1. Parse the code using the ast module
2. Check all imports against installed packages
3. Run type checking with mypy
4. Run security scanning with bandit patterns
5. Check for common performance anti-patterns
6. Generate targeted edge-case tests
7. Run the generated tests
8. Produce a comprehensive report

The pipeline itself should be well-tested, documented, and designed to be integrated into a CI/CD workflow. This exercise integrates concepts from this chapter with Chapter 21 (AI-assisted testing) and Chapter 29 (DevOps and deployment).