Chapter 14 Quiz: When AI Gets It Wrong
Test your understanding of AI coding failure modes, detection techniques, and recovery strategies. Try to answer each question before revealing the answer.
Question 1
What is the most distinctive failure mode specific to AI coding assistants (as opposed to human programmers)?
Show Answer
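One mechanical first-line check against this failure mode is to verify that a generated import target actually resolves before trusting code that uses it. A minimal sketch (the helper name `api_exists` is ours, purely illustrative):

```python
import importlib

def api_exists(module_name: str, attr: str) -> bool:
    """Return True if module_name imports cleanly and exposes attr."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # the module itself may be hallucinated
        return False
    return hasattr(module, attr)
```

This catches hallucinated modules and attributes, though not hallucinated parameters or semantics; those still need tests.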
**API hallucination** — the generation of references to functions, classes, methods, or entire libraries that do not exist. This is unique to AI because human programmers typically only write code using APIs they have actually used or looked up. AI models generate plausible-sounding but non-existent APIs based on statistical patterns in training data.
Question 2
Why is the +1 in this line a bug?
end = start + page_size + 1
(Context: Python list slicing for pagination)
Show Answer
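To make the off-by-one concrete, here are the buggy and corrected versions side by side (hypothetical 10-item list, page size 3):

```python
def page_buggy(items, page, page_size):
    start = page * page_size
    end = start + page_size + 1   # bug: slice end is already exclusive
    return items[start:end]

def page_fixed(items, page, page_size):
    start = page * page_size
    end = start + page_size       # items[start:end] excludes index end
    return items[start:end]

items = list(range(10))
assert page_buggy(items, 0, 3) == [0, 1, 2, 3]   # element 3 leaks into page 0...
assert page_buggy(items, 1, 3) == [3, 4, 5, 6]   # ...and appears again in page 1
assert page_fixed(items, 0, 3) == [0, 1, 2]      # clean partition
assert page_fixed(items, 1, 3) == [3, 4, 5]
```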
Python's slice notation `items[start:end]` already excludes the element at index `end`. Adding `+1` causes each page to include one extra element, creating overlap between consecutive pages. The correct expression is `end = start + page_size`.
Question 3
What does VERIFY stand for in the VERIFY debugging framework?
Show Answer
- **V** — Validate imports (check all modules and names exist)
- **E** — Examine edge cases (empty inputs, boundaries, single elements)
- **R** — Review security (SQL injection, XSS, secrets, etc.)
- **I** — Inspect logic (trace with concrete values, check loop bounds)
- **F** — Find performance issues (nested loops, N+1 queries, blocking I/O)
- **Y** — Yell about types (check type consistency throughout data flow)
Question 4
What is "dependency confusion" and how does it relate to AI hallucinations?
Show Answer
Dependency confusion (a close cousin of typosquatting) occurs when an attacker publishes a malicious package on a public registry (like PyPI) using a name that developers might accidentally install. When AI hallucinates a package name that does not exist, an attacker could register that exact name as a malicious package. If a developer then runs `pip install` with the hallucinated name, they unknowingly install the attacker's code.
Question 5
What is wrong with this AI-generated thread-safe counter?
class Counter:
    def __init__(self):
        self.value = 0

    def increment(self):
        self.value += 1
Show Answer
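A lock-based fix, with a quick multi-thread stress test (`SafeCounter` is our name for the corrected class):

```python
import threading

class SafeCounter:
    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        with self._lock:      # read-modify-write now happens as one unit
            self.value += 1

counter = SafeCounter()

def worker():
    for _ in range(10_000):
        counter.increment()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
# all 40,000 increments survive; without the lock, some can be lost
```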
The `+=` operation is not atomic. It involves three steps: read the current value, add one, and write back the result. In a multi-threaded environment, two threads can read the same value simultaneously, both increment it, and both write back the same result — losing one increment. The fix is to use a `threading.Lock` to serialize access to the shared state.
Question 6
Why is hashlib.md5(password.encode()).hexdigest() inappropriate for password hashing?
Show Answer
Three reasons: (1) MD5 is cryptographically broken — collisions can be generated quickly. (2) MD5 is too fast for password hashing — attackers can try billions of candidates per second. (3) It lacks salting — identical passwords produce identical hashes. Purpose-built password hashing functions like `bcrypt`, `scrypt`, or `argon2` are designed to be slow, include automatic salting, and are resistant to known attacks.
Question 7
What is the N+1 query problem?
Show Answer
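The pattern is easy to reproduce with an in-memory SQLite database (the customers/orders schema here is hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1), (11, 2), (12, 1);
""")

def orders_n_plus_one(conn):
    # 1 query for the orders, then 1 query per order: N+1 round trips
    result = []
    for order_id, customer_id in conn.execute(
            "SELECT id, customer_id FROM orders"):
        name = conn.execute("SELECT name FROM customers WHERE id = ?",
                            (customer_id,)).fetchone()[0]
        result.append((order_id, name))
    return result

def orders_joined(conn):
    # a single query: the database does the matching
    return list(conn.execute(
        "SELECT o.id, c.name FROM orders o "
        "JOIN customers c ON c.id = o.customer_id"))
```

Both functions return the same rows; only the number of round trips differs, which dominates latency once N is large.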
The N+1 query problem occurs when code executes one query to fetch a list of N records, then executes N additional queries (one per record) to fetch related data. For example, querying all orders (1 query) and then querying the customer for each order individually (N queries) results in N+1 total queries. The fix is to use a JOIN query or eager loading to fetch all data in one or two queries.
Question 8
An AI generates a function to find duplicates using items.count(item) inside a loop. What is the performance issue?
Show Answer
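Both versions below return the same duplicates; only the complexity differs (sample data is hypothetical):

```python
from collections import Counter

def find_duplicates_slow(items):
    # O(n^2): items.count() is O(n) and runs once per element
    return sorted({item for item in items if items.count(item) > 1})

def find_duplicates_fast(items):
    # O(n): one counting pass, then one pass over the counts
    counts = Counter(items)
    return sorted(item for item, n in counts.items() if n > 1)

data = [1, 2, 3, 2, 4, 1, 5]
assert find_duplicates_slow(data) == find_duplicates_fast(data) == [1, 2]
```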
`list.count()` is an O(n) operation, and it is called once per element in the loop (also O(n)), making the overall complexity O(n^2). For a list of 100,000 elements, this means ~10 billion operations. The fix is to use a `collections.Counter` or a dictionary to count occurrences in a single O(n) pass.
Question 9
What makes a "targeted reprompt" effective? List at least three key elements.
Show Answer
An effective targeted reprompt includes: (1) A clear stop signal ("Stop" or "Let's start over on this part") to break the current approach. (2) Identification of the fundamental problem, not just the symptom. (3) An explicit constraint stating what the new approach must do differently. (4) Enough specificity to prevent the AI from drifting back to the old pattern. (5) Optionally, a working code snippet to use as a starting point.
Question 10
What is the "expertise illusion" in AI-generated code?
Show Answer
The expertise illusion occurs when AI-generated code has high superficial quality — clean variable names, thorough docstrings, consistent style, proper type hints — that makes it appear to be written by an expert. This professional presentation can mask substantive errors (logic bugs, missing edge case handling, security vulnerabilities) because reviewers are psychologically inclined to trust well-formatted, well-documented code.
Question 11
Why might asyncio.get_event_loop() followed by loop.run_until_complete(main()) be flagged as outdated?
Show Answer
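A minimal before/after sketch (`fetch_value` is a stand-in coroutine):

```python
import asyncio

async def fetch_value() -> int:
    await asyncio.sleep(0)   # stand-in for real async work
    return 42

# Outdated pattern (deprecation warning on Python 3.10+):
#   loop = asyncio.get_event_loop()
#   result = loop.run_until_complete(fetch_value())

# Modern: asyncio.run creates, runs, and closes the loop for you
result = asyncio.run(fetch_value())
```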
Starting in Python 3.10, `asyncio.get_event_loop()` emits a deprecation warning when there is no running event loop. The modern and recommended approach is to simply use `asyncio.run(main())`, which creates, manages, and closes the event loop automatically. This pattern is simpler, less error-prone, and the officially recommended way to run async code.
Question 12
What is path traversal and how does it appear in AI-generated code?
Show Answer
Path traversal is a vulnerability where an attacker includes `../` sequences in a file path to access files outside the intended directory. AI-generated file-serving code often fails to validate that the resolved path stays within the allowed directory. For example, if the code serves files from `/uploads/` and the user requests `../../etc/passwd`, they could access the system password file. The fix is to resolve the full path and verify it starts with the allowed directory prefix.
Question 13
What are the signs that an AI conversation has "derailed"?
Show Answer
Signs of a derailed conversation include: (1) Circular fixes — fixing one bug introduces another, back and forth. (2) Contradictory approaches — the AI switches between fundamentally different architectures. (3) Increasing complexity — each fix makes code more complex rather than simpler. (4) Losing context — the AI forgets constraints specified earlier. (5) Cargo cult fixes — changes that look relevant but do not address the actual problem.
Question 14
What is the difference between requests.get() and aiohttp.ClientSession().get() in the context of async code?
Show Answer
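Since `aiohttp` is a third-party package, the effect is easiest to demonstrate with the analogous pair `time.sleep` (blocking, like `requests.get`) versus `await asyncio.sleep` (yielding, like an awaited `aiohttp` request):

```python
import asyncio
import time

async def blocking_style():
    time.sleep(0.2)            # blocks the loop, like requests.get in async code

async def yielding_style():
    await asyncio.sleep(0.2)   # yields to the loop while "waiting"

async def run_three(factory):
    start = time.perf_counter()
    await asyncio.gather(factory(), factory(), factory())
    return time.perf_counter() - start

blocking_total = asyncio.run(run_three(blocking_style))    # roughly 0.6s: serialized
concurrent_total = asyncio.run(run_three(yielding_style))  # roughly 0.2s: overlapped
```

Three "concurrent" blocking calls take the sum of their durations; three yielding calls take roughly the longest single duration.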
`requests.get()` is a synchronous (blocking) call. When used inside an `async` function, it blocks the entire event loop, preventing any other coroutines from running. This defeats the purpose of async programming. `aiohttp.ClientSession().get()` is a true asynchronous call that yields control back to the event loop while waiting for the network response, allowing other coroutines to run concurrently. AI often incorrectly uses `requests` in async code because it is the more common library in training data.
Question 15
When should you use the "nuclear option" (starting a completely new conversation)?
Show Answer
Start a fresh conversation when: (1) The context window is heavily polluted with failed attempts. (2) The AI has adopted a fundamentally wrong architecture that incremental fixes cannot repair. (3) More than three rounds of fixes have failed to resolve the issue. (4) The AI keeps reverting to the same incorrect approach despite corrections.Question 16
Question 16
What Python tool converts deprecation warnings into errors?
Show Answer
import warnings
warnings.filterwarnings('error', category=DeprecationWarning)
This causes any `DeprecationWarning` to raise an exception instead of printing a warning. This is useful during testing to ensure AI-generated code does not use deprecated APIs.
Question 17
Why should you be especially suspicious of AI-generated regular expressions?
Show Answer
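A typical example, using a simplified email pattern of the kind AI often reproduces (the pattern itself is our illustration, not from the chapter):

```python
import re

# A popular-but-incomplete pattern
NAIVE_EMAIL = re.compile(r"^[\w.]+@[\w.]+\.\w+$")

def naive_is_email(s: str) -> bool:
    return NAIVE_EMAIL.fullmatch(s) is not None

# Handles the common case...
assert naive_is_email("alice@example.com")
# ...but rejects valid addresses (the local part may contain +)
assert not naive_is_email("alice+filter@example.com")
# ...and accepts invalid ones (consecutive dots in the domain)
assert naive_is_email("alice@example..com")
```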
AI-generated regex patterns typically handle common cases correctly while failing on edge cases. This is because the model has seen many "close enough" regex patterns in training data (especially from Stack Overflow) that are widely shared despite being incomplete. Regular expressions for email validation, URL parsing, and date matching are notoriously difficult to get right, and AI tends to reproduce the popular-but-wrong versions. Always test regex with comprehensive test cases including boundary and adversarial inputs.
Question 18
What is the difference between pickle.loads() and json.loads() from a security perspective?
Show Answer
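A harmless demonstration of the difference: `pickle` will invoke an arbitrary callable during `loads()` (here the benign `len` stands in for something like `os.system`), while `json` can only ever produce plain data:

```python
import json
import pickle

class Evil:
    # pickle honors __reduce__: the returned callable runs during loads()
    def __reduce__(self):
        return (len, ("attacker-controlled",))

payload = pickle.dumps(Evil())
result = pickle.loads(payload)   # calls len(...) at deserialization time

# json, by contrast, only builds basic types
data = json.loads('{"user": "alice", "admin": false}')
```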
`pickle.loads()` can execute arbitrary Python code during deserialization. If an attacker controls the input bytes, they can craft a pickle payload that executes malicious code (e.g., deleting files, exfiltrating data, establishing a reverse shell). `json.loads()` can only parse JSON data into basic Python types (dicts, lists, strings, numbers, booleans, None) and cannot execute code. For untrusted input, always use `json.loads()` or another safe deserialization format.
Question 19
An AI generates this binary search. What causes it to infinite-loop?
while low <= high:
    mid = (low + high) // 2
    if arr[mid] == target:
        return mid
    elif arr[mid] < target:
        low = mid
    else:
        high = mid
Show Answer
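For reference, a corrected version in the conventional inclusive-bounds form, where the interval shrinks on every iteration:

```python
def binary_search(arr, target):
    low, high = 0, len(arr) - 1
    while low <= high:
        mid = (low + high) // 2
        if arr[mid] == target:
            return mid
        elif arr[mid] < target:
            low = mid + 1    # not `low = mid`: mid is already ruled out
        else:
            high = mid - 1   # likewise on the high side
    return -1
```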
When `low` and `high` are adjacent (e.g., `low=3, high=4`), `mid = (3 + 4) // 2 = 3`, which equals `low`. If `arr[3] < target`, setting `low = mid` leaves `low = 3` — unchanged. The next iteration computes the same `mid`, and the loop never terminates. The fix is `low = mid + 1` (and similarly `high = mid - 1` for the else branch), which ensures the search space shrinks with every iteration.
Question 20
What automated tools should be part of an AI code verification pipeline? List at least five.
Show Answer
(1) **Type checker** (`mypy` or `pyright`) — catches type inconsistencies.
(2) **Linter** (`ruff`, `flake8`, or `pylint`) — catches style issues and common bugs.
(3) **Security scanner** (`bandit`) — catches common security vulnerabilities.
(4) **Formatter** (`ruff format` or `black`) — ensures consistent code style.
(5) **Test runner** (`pytest`) — runs automated tests including edge cases.
Additional useful tools: (6) **Import checker** — verifies all imports resolve. (7) **Complexity checker** — flags overly complex functions. (8) **Dependency auditor** (`pip-audit`) — checks for known vulnerabilities in dependencies.
Question 21
Why does AI-generated code sometimes use from typing import List instead of the built-in list type hint?
Show Answer
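A short example of the modern form (Python 3.9+; the `tally` function is made up for illustration):

```python
# Python 3.9+: built-in types are subscriptable in annotations,
# so `from typing import List, Dict` is no longer needed here
def tally(words: list[str]) -> dict[str, int]:
    counts: dict[str, int] = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts
```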
Before Python 3.9, built-in types like `list`, `dict`, and `tuple` could not be used directly as generic types in annotations (e.g., `list[int]` was a syntax error). Instead, developers had to import `List`, `Dict`, `Tuple` from the `typing` module. Since AI models are trained on code spanning many Python versions, they frequently reproduce this older pattern. In Python 3.9+, the built-in types support subscript notation directly, making the `typing` imports unnecessary for these basic types.
Question 22
What is a "constraint fence" in the context of AI conversation repair?
Show Answer
A constraint fence is a prompting technique where you provide the AI with an explicit list of constraints that the generated code must satisfy, and frame any violation as meaning the code is wrong. For example: "NO nested loops, NO string concatenation for SQL, ALL functions must handle empty input." This technique is effective when the AI keeps making the same type of mistake, because it gives the model a clear checklist to verify against before finalizing its response.
Question 23
What is the recommended "three strikes" rule for AI conversation management?
Show Answer
The three strikes rule suggests that if three rounds of fixes do not resolve an issue in an AI conversation, you should start a fresh conversation rather than continuing to iterate. This is because: (1) the context window becomes polluted with failed attempts, (2) the AI may have committed to a fundamentally flawed approach, and (3) diminishing returns make further iteration less productive than a clean start with a better-specified prompt.
Question 24
What is the Criticality Matrix for AI code verification, and what does it suggest about prioritization?
Show Answer
The Criticality Matrix maps AI coding failures along two axes: likelihood and impact. It creates four quadrants: (1) High impact, high likelihood — logic errors in business rules (verify first). (2) High impact, low likelihood — security vulnerabilities and race conditions (always verify). (3) Low impact, high likelihood — verbose or non-idiomatic code (fix when convenient). (4) Low impact, low likelihood — stylistic inconsistencies (lowest priority). The key insight is to always verify high-impact areas first, regardless of likelihood, because a single security vulnerability matters more than dozens of style issues.
Question 25
Explain why "hard problems usually require hard solutions" is a useful heuristic when reviewing AI-generated code.