Chapter 9 Quiz: Context Management and Conversation Design

Test your understanding of context management concepts with these 25 questions. Try to answer each question before revealing the answer.


Question 1

What is a context window in practical terms for a vibe coding session?

Show Answer A context window is the total amount of text the AI can consider at once when generating a response. It includes the system prompt, the full conversation history, any attached files or code, the current message, and the AI's response as it generates it. It is measured in tokens, and when it fills up, the AI can no longer access older information.

Question 2

Approximately how many tokens does one line of Python code consume?

Show Answer One line of Python code consumes approximately 10-15 tokens, depending on the complexity of the line (variable names, string literals, nested structures, etc.).
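
For rough planning, you can estimate token counts without a tokenizer. The sketch below uses the common ~4-characters-per-token heuristic; real tokenizers (such as `tiktoken`) give exact counts, so treat this only as a budgeting aid:

```python
# Rough token estimate for a line of code, assuming the widely cited
# ~4 characters-per-token heuristic. This is a planning aid, not an
# exact count; actual tokenization varies by model and content.
def estimate_tokens(text: str) -> int:
    return max(1, round(len(text) / 4))

line = "    result = [transform(item) for item in items if item.is_valid()]"
print(estimate_tokens(line))
```

A typical 40-60 character line of Python lands in the 10-15 token range under this heuristic, consistent with the figure above.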

Question 3

What is the "Lost in the Middle" effect, and how does it impact vibe coding conversations?

Show Answer The "Lost in the Middle" effect refers to the finding that AI models pay the most attention to information at the beginning and end of the context window, while information in the middle receives less attention. In vibe coding, this means constraints or decisions established in the middle of a long conversation may be effectively forgotten, even though they are technically still within the context window. This is why periodic restatement of key constraints and strategic information placement are important.

Question 4

Why is an AI conversation described as an "append-only" data structure?

Show Answer In most AI coding tools, you cannot go back and edit earlier messages in the conversation. Each new message (from the user or the AI) is appended to the end of the message list. This means that mistakes, misunderstandings, and outdated information accumulate in the conversation history unless you actively correct them with new messages.

Question 5

What are the three phases of the conversation lifecycle?

Show Answer
1. **Ramp-up (turns 1-3):** Context is being established; the AI does not yet have a full picture, and responses improve as more information is provided.
2. **Peak performance (turns 3-15):** The AI has enough context to produce excellent results without yet having accumulated too much history.
3. **Degradation (turns 15+):** The conversation history grows long; the AI may forget earlier constraints, contradict itself, repeat mistakes, or produce lower-quality code.

Question 6

What is "front-loading" and why is it effective?

Show Answer Front-loading is the practice of providing all critical context (project details, constraints, coding standards, architecture decisions) in the first message of a conversation. It is effective because of the primacy effect---the AI pays strong attention to information at the beginning of the context. Front-loading also saves tokens by preventing multiple rounds of clarifying questions.

Question 7

Describe the "sandwich pattern" for prompt construction.

Show Answer The sandwich pattern places critical information at both the beginning and end of a message, with detailed content (code, examples, specifications) in the middle:
1. Top: Context and constraints
2. Middle: Details, code, examples
3. Bottom: Restatement of key requirements

This pattern exploits both the primacy and recency effects, ensuring important instructions receive high attention.
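
The pattern can be sketched as a simple prompt builder. The section contents below are invented placeholders, not a real project's constraints:

```python
# Sketch of a sandwich-pattern prompt builder. The three sections exploit
# primacy (top) and recency (bottom); the middle holds bulky detail.
def sandwich_prompt(constraints: str, details: str, key_requirements: str) -> str:
    return (
        f"{constraints.strip()}\n\n"    # top: context and constraints
        f"{details.strip()}\n\n"        # middle: code, examples, specifications
        f"Key requirements (restated): {key_requirements.strip()}"  # bottom
    )

prompt = sandwich_prompt(
    constraints="Python 3.11, standard library only, follow PEP 8.",
    details="def load_users(path): ...  # existing loader to extend",
    key_requirements="standard library only; keep the public API unchanged.",
)
print(prompt)
```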

Question 8

In the Progressive Disclosure pattern, what is the main risk?

Show Answer The main risk is that later requirements may be interdependent with earlier ones, meaning that if all requirements had been known upfront, the design would have been different. When requirements are revealed progressively and a later requirement conflicts with the established design, the AI must refactor earlier work, which wastes tokens and can introduce inconsistencies.

Question 9

What is the difference between the Scaffold-Then-Fill pattern and the Progressive Disclosure pattern?

Show Answer **Scaffold-Then-Fill** asks the AI to create the complete structure first (class skeletons, method signatures, file organization) with empty implementations, then fills in the implementations one at a time. The focus is on getting architecture right before implementation. **Progressive Disclosure** builds features incrementally, adding one layer of functionality at a time to an evolving, fully-implemented codebase. The focus is on building working code layer by layer. The key difference is that scaffold-then-fill separates design from implementation, while progressive disclosure combines them in incremental steps.

Question 10

Name three context priming techniques and explain when each is most useful.

Show Answer
1. **Role and Expertise Priming:** Tell the AI what role to play. Most useful when you need domain-specific knowledge (for example, "you are a security engineer" for security reviews).
2. **Codebase Context Priming:** Show examples of your existing code patterns. Most useful when you need generated code to be consistent with your existing codebase.
3. **Anti-Pattern Priming:** Tell the AI what NOT to do. Most useful when the AI tends to generate code with patterns that violate your project's standards (for example, "do not use print() for logging").

Question 11

What is a "CLAUDE.md" file and why is it a powerful context management tool?

Show Answer A `CLAUDE.md` file is a project-level configuration file used by Claude Code. Any instructions in this file are automatically included as context in every conversation about the project. It is powerful because it sits in a high-attention position (the beginning of the context), it persists across conversations (you do not have to restate instructions), and it ensures consistency across multiple coding sessions. It effectively serves as a permanent system prompt for your project.
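
As a sketch, a minimal `CLAUDE.md` might look like the following (the project name, stack, and rules are invented placeholders; adapt them to your own codebase):

```markdown
# Project: inventory-service

- Python 3.11, FastAPI, SQLAlchemy 2.0
- All new code must have type hints and docstrings
- Use the `logging` module; never `print()` for diagnostics
- Tests live in `tests/`, written with pytest
- Do not add third-party dependencies without asking first
```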

Question 12

When should you use "interface-only" context instead of including a full file?

Show Answer You should use interface-only context when a file is relevant (the AI needs to know what methods/classes exist and their signatures) but the implementation details are not needed for the current task. For example, if you are writing code that calls methods on a repository class, the AI needs to know the method signatures and return types, but not the SQL implementation inside each method. This saves tokens while providing the AI with enough information to generate compatible code.
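
An interface-only version of a hypothetical repository class might keep only signatures and docstrings, with every implementation elided:

```python
# Interface-only context for a hypothetical UserRepository: signatures and
# docstrings are preserved; the SQL and connection handling are elided.
class UserRepository:
    def get_by_id(self, user_id: int) -> "User | None":
        """Return the user with the given id, or None if not found."""
        ...

    def save(self, user: "User") -> None:
        """Insert or update the user record."""
        ...

    def delete(self, user_id: int) -> bool:
        """Delete the user; return True if a row was removed."""
        ...
```

This might cost a few dozen tokens instead of several hundred, while still letting the AI generate compatible calling code.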

Question 13

What are the five signals that indicate you should start a new conversation?

Show Answer
1. You are starting work on a different feature or component.
2. The conversation has exceeded 20-25 turns with heavy code exchanges.
3. The AI is showing signs of context degradation (forgetting constraints, contradicting itself, declining quality).
4. You have changed your mind about a fundamental design decision.
5. You are switching from one development phase to another (e.g., from implementation to testing).

A bonus sixth signal: the AI has made the same mistake more than twice.

Question 14

What is the "Fresh Start Protocol"?

Show Answer The Fresh Start Protocol is a three-step process for transitioning to a new conversation while preserving context:
1. **Ask the AI to summarize** the current progress, including what was built, key decisions, current code state, and remaining work.
2. **Gather artifacts:** copy the final versions of any code generated in the conversation.
3. **Start a new conversation with a priming message** that combines your standard project priming, the summary from step 1, and the relevant code from step 2.

This gives you a fresh context window with high-quality, distilled context from the previous session.

Question 15

Why is the statement "just paste your entire codebase" a bad context management strategy?

Show Answer Pasting the entire codebase is problematic for several reasons:
1. It wastes tokens on files irrelevant to the current task.
2. The irrelevant information creates noise, making it harder for the AI to identify what matters.
3. It can push the context past the window limit, causing older important context to be truncated.
4. It can trigger the "lost in the middle" effect, where the AI loses track of important files buried among irrelevant ones.

A curated subset of relevant files, interfaces, and snippets is far more effective.

Question 16

What is the difference between a "token" and a "word" in the context of AI models?

Show Answer Tokens are the fundamental units of text that AI models process. They do not map one-to-one with words. Common short words may be a single token, while longer or uncommon words may be split into multiple tokens. Code syntax (brackets, operators, indentation) also consumes tokens. As a rough rule of thumb, one word averages about 1.3 tokens, but this varies by language and content type. Code tends to use more tokens per word than natural language because of special characters and structured syntax.
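
The 1.3 tokens-per-word rule of thumb can be turned into a quick estimator. This is a back-of-the-envelope sketch; actual counts vary by model and content type:

```python
# Back-of-the-envelope token estimate from word count, assuming the ~1.3
# tokens-per-word average mentioned above. Real tokenizers vary, and code
# typically runs higher than this because of special characters.
def estimate_tokens_from_words(text: str, tokens_per_word: float = 1.3) -> int:
    return round(len(text.split()) * tokens_per_word)

sentence = "Context management matters more as conversations grow longer"
print(estimate_tokens_from_words(sentence))
```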

Question 17

How does the Tree-and-Summary strategy save tokens while still providing useful context?

Show Answer The Tree-and-Summary strategy provides the directory structure of the codebase along with a one-line description of each file's purpose, without including any actual code. This gives the AI a "map" of the entire project at minimal token cost (typically a few hundred tokens for a moderate project). The AI can understand the architecture, file organization, and where different functionality lives, which helps it generate code that fits into the project correctly. You can then include full code only for the specific files relevant to the current task.
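
A generator for this kind of context can be sketched in a few lines. The one-line summaries come from you (or a docstring scan); the `summaries` dict and example paths below are invented:

```python
import os

# Sketch of a Tree-and-Summary generator: relative file paths plus one-line
# notes, no code. Summaries are supplied by the developer.
def tree_with_summaries(root: str, summaries: dict) -> str:
    lines = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # stable, readable ordering
        for name in sorted(filenames):
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            note = summaries.get(rel)
            lines.append(f"{rel}  # {note}" if note else rel)
    return "\n".join(lines)

# Example: tree_with_summaries("src", {"app/models.py": "SQLAlchemy models"})
```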

Question 18

In what scenario would diff-based context be more effective than providing the full file?

Show Answer Diff-based context is most effective when debugging or reviewing a recent change. If a change introduced a bug, showing the diff immediately focuses the AI on what changed, rather than making it read through the entire file to find the relevant sections. It is also useful during code review, when refining a specific section of code, or when the file is very large but the changes are small. The diff format clearly shows both what was removed and what was added, making the problem or change obvious.
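
Python's standard library can produce this format directly. The two function versions below are invented to illustrate how a diff isolates the change:

```python
import difflib

# Two invented versions of a function; the unified diff surfaces only the
# changed lines plus a little surrounding context.
before = [
    "def total(items):",
    "    return sum(i.price for i in items)",
]
after = [
    "def total(items):",
    "    return sum(i.price * i.qty for i in items)",
]
diff = "\n".join(difflib.unified_diff(
    before, after, fromfile="cart.py (before)", tofile="cart.py (after)",
    lineterm=""))
print(diff)
```

Pasting this diff costs a handful of tokens and points the AI straight at the suspect change, instead of the whole file.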

Question 19

Why might a 1 million token context window still require careful context management?

Show Answer Even with very large context windows, careful context management is still necessary because:
1. The "Lost in the Middle" effect means the AI does not attend equally to all parts of the context, regardless of size.
2. More irrelevant context creates more noise, reducing the signal-to-noise ratio.
3. Attention degradation with distance means earlier information still receives less focus.
4. Processing larger contexts is slower and more expensive.
5. The quality of the context matters more than the quantity: well-curated, relevant context outperforms a bulk dump even when the window can hold everything.

Question 20

What is "anchor messaging" and when should you use it?

Show Answer Anchor messaging is the practice of periodically restating the most important context, constraints, and current state within a long conversation. It serves as a "reminder" to the AI about critical information that may have drifted out of the high-attention zones due to intervening messages. You should use anchor messages when:
- The conversation has gone on for more than 10-15 turns
- You are about to start a new sub-task within the same conversation
- The AI has shown signs of forgetting a specific constraint
- You have been working on a tangent and are returning to the main task

Question 21

Design a system prompt for a code review persona. What elements should it include?

Show Answer An effective code review persona system prompt should include:
1. **Role definition:** "You are a senior code reviewer with expertise in [languages/frameworks]."
2. **Review focus areas:** What to check for (bugs, security, performance, style, architecture).
3. **Severity rating system:** How to classify findings (Critical, High, Medium, Low, Info).
4. **Output format:** How to structure the review (file-by-file, by severity, as a checklist).
5. **Behavioral rules:** "Do not rewrite the code; only identify issues and suggest specific fixes."
6. **Scope boundaries:** "Focus on logic and correctness; ignore formatting issues that a linter would catch."
7. **Positive feedback guidance:** "Also note well-written sections, not just problems."
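
A condensed example that combines these elements might read as follows (the stack and severity labels are placeholders to adapt):

```text
You are a senior code reviewer with expertise in Python and FastAPI.
Review for bugs, security issues, and performance problems.
Rate each finding: Critical / High / Medium / Low / Info.
Report findings file by file, ordered by severity.
Do not rewrite the code; identify issues and suggest specific fixes.
Ignore formatting issues a linter would catch.
Also note well-written sections, not just problems.
```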

Question 22

You are planning a session to implement a search feature. Your context window is 128K tokens. The system prompt uses 5K tokens. You estimate 20 turns at 2,500 tokens each. You need to include 5 files totaling 8,000 tokens. Will this fit? Show your calculation.

Show Answer Total context budget: 128,000 tokens

Fixed costs:
- System prompt: 5,000 tokens
- Reserved for AI response (final turn): 4,000 tokens

Variable costs:
- File context: 8,000 tokens
- Conversation history (20 turns x 2,500 tokens): 50,000 tokens

Total needed: 5,000 + 4,000 + 8,000 + 50,000 = 67,000 tokens

67,000 < 128,000, so yes, this fits within the budget with approximately 61,000 tokens to spare. However, the 50,000 tokens of conversation history is an estimate that could grow if the AI produces verbose responses, so monitoring usage is still advisable.
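
The same arithmetic as a quick script, handy for re-running the check with your own numbers:

```python
# Token budget check for the session described above. The 4,000-token
# response reserve is the planning assumption from the answer.
window = 128_000
system_prompt = 5_000
response_reserve = 4_000       # headroom for the final AI response
files = 8_000
history = 20 * 2_500           # 20 turns at ~2,500 tokens each

needed = system_prompt + response_reserve + files + history
print(f"needed={needed}, spare={window - needed}, fits={needed <= window}")
```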

Question 23

What is the key difference between context priming and the first prompt of a conversation?

Show Answer Context priming and the first prompt are related but distinct.

**Context priming** focuses on setting up the AI's "mental state": defining its role, expertise, behavioral rules, coding conventions, and output format. It is about configuring *how* the AI will respond.

**The first prompt** includes the priming but also contains the first *task*: what you actually want the AI to do.

In practice, some developers send priming as a separate first message (with no task), then send the first task in the second message. Others combine priming and the first task into a single message. The separate approach gives cleaner separation of concerns; the combined approach saves a turn.

Question 24

Explain why asking the AI to "be concise" is a form of context management.

Show Answer Asking the AI to be concise is a context management strategy because the AI's responses consume tokens in the context window just as your messages do. A verbose AI response that includes lengthy explanations, multiple alternatives, and step-by-step commentary can easily use 2,000-4,000 tokens when a code-only response would use 500-1,000 tokens. Over a 20-turn conversation, this difference can add up to 30,000-60,000 tokens of wasted context. By instructing the AI to show only code and skip unnecessary explanations, you preserve context budget for more productive use---additional conversation turns, more file context, or simply staying within the window longer.

Question 25

A developer has a conversation that is 25 turns long. They notice the AI is generating code that contradicts a design decision from turn 4. What are three possible remediation strategies, in order from quickest to most thorough?

Show Answer Three remediation strategies in order of effort:

1. **Quick fix: Anchor message.** Restate the design decision explicitly in the current message: "Remember, we decided in turn 4 to use X approach. Please regenerate this code following that decision." This is fast but may not hold if the conversation continues much longer.
2. **Moderate fix: Summarize and refocus.** Ask the AI to summarize the current state and key decisions, then explicitly confirm or correct the summary. Use this corrected summary as an anchor for the remaining work. This resets the AI's "understanding" without starting over.
3. **Thorough fix: Fresh start.** Extract the summary and final code artifacts, start a new conversation with proper priming that includes the design decision prominently in the first message, and continue from there. This is the most reliable approach when the conversation has degraded significantly.

The right choice depends on how much work remains. If you are nearly done, the quick fix may suffice. If you have significant work ahead, the fresh start will save time in the long run.