Chapter 2: Quiz

Test your understanding of how AI coding assistants work. Try to answer each question before revealing the answer.


Question 1

What is the fundamental operation that a large language model performs?

Show Answer A large language model predicts the next token in a sequence. Given all the tokens that come before a position, it produces a probability distribution over all possible next tokens and selects one. This simple operation, repeated hundreds or thousands of times, produces complete code snippets, explanations, and other outputs.
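The generate-one-token-at-a-time loop can be sketched with a toy stand-in for the model. The `next_token_probs` function and its tiny vocabulary are invented purely for illustration; a real model computes this distribution with a neural network over a vocabulary of tens of thousands of tokens.

```python
import random

# Toy "model": given the tokens so far, return a probability
# distribution over a tiny vocabulary. This stand-in is invented
# for illustration only.
def next_token_probs(tokens):
    if not tokens or tokens[-1] == "print":
        return {"(": 0.9, "=": 0.1}
    if tokens[-1] == "(":
        return {"'hi'": 0.7, "x": 0.3}
    return {")": 1.0}

def generate(prompt_tokens, max_new_tokens=3, seed=0):
    rng = random.Random(seed)
    tokens = list(prompt_tokens)
    for _ in range(max_new_tokens):
        probs = next_token_probs(tokens)
        # Sample one token according to the distribution, then append
        # it so it becomes context for the next prediction.
        choices, weights = zip(*probs.items())
        tokens.append(rng.choices(choices, weights=weights)[0])
    return tokens

out = generate(["print"])
print(out)
```

The essential point is the loop: each prediction sees everything generated so far, and repeating this simple step produces complete outputs.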

Question 2

Why can a "language" model write code, even though programming languages are not natural languages?

Show Answer Programming languages are languages -- they have grammar (syntax), meaning (semantics), common phrases (idioms), and context-dependent interpretation. Models trained on both natural language and source code learn the patterns of both. In fact, code is in some ways easier for language models because it has stricter syntax rules, more predictable patterns, and less ambiguity than natural language.

Question 3

What is the attention mechanism, and why was it a breakthrough for AI?

Show Answer The attention mechanism allows the model to look at all parts of the input simultaneously and decide which parts are most relevant to the current prediction. Before attention, models processed text sequentially and tended to "forget" earlier parts of long inputs. Attention solved this by enabling the model to focus on relevant parts of the input regardless of distance, like a spotlight operator illuminating the most important parts of a stage.

Question 4

What does "multi-head attention" mean?

Show Answer Multi-head attention means the transformer runs multiple attention mechanisms in parallel. Each "head" can focus on different aspects of the input -- for example, one head might track syntactic relationships (matching brackets), another might track variable definitions and usage, another might focus on semantic meaning, and another might track data types. This allows the model to understand code at multiple levels simultaneously.

Question 5

Approximately how many tokens does the English word "indentation" correspond to?

Show Answer The word "indentation" is typically split into approximately 3 tokens, such as ["ind", "ent", "ation"]. Common words are often single tokens, while longer or less common words are split into subword pieces by the tokenizer. The exact tokenization depends on the specific model's tokenizer.

Question 6

What are the three main stages of training for an AI coding assistant?

Show Answer The three main stages are:

1. **Pre-training**: The model learns to predict the next token from a massive corpus of text and code, absorbing patterns of language and programming.
2. **Supervised fine-tuning (SFT)**: The model is trained on curated examples of helpful assistant behavior, learning the format and quality standards of a good coding assistant.
3. **Reinforcement Learning from Human Feedback (RLHF)**: Human evaluators rank model responses, and the model is optimized to produce responses that align with human preferences for code quality, helpfulness, and safety.

Question 7

What is a context window, and why does its size matter for code generation?

Show Answer The context window is the total amount of text (in tokens) that the model can process at once, including both the input prompt and the generated response. Its size matters for code generation because: (1) code has dependencies spread across files, (2) the AI needs to see relevant code to maintain consistency, (3) specifications and requirements can be lengthy, and (4) everything the model needs to know must fit within this window. A larger context window allows the model to consider more relevant code and produce more contextually appropriate output.

Question 8

What happens if critical information about your project falls outside the context window?

Show Answer If information falls outside the context window, the model simply cannot access it -- it is as if that information does not exist. The model has no way to retrieve or reference information that is not within its context window. This is why context management is one of the most important skills for effective vibe coding: you must ensure that relevant code, type definitions, conventions, and requirements are included in the prompt.

Question 9

What is temperature in the context of AI text generation?

Show Answer Temperature is a parameter that adjusts the probability distribution before a token is selected. A temperature of 0 (or very low) makes the model nearly deterministic, almost always choosing the highest-probability token. A temperature of 1 samples from the natural probability distribution. A temperature greater than 1 flattens the distribution, making less likely tokens more probable. For code generation, lower temperatures (0-0.3) produce more predictable, consistent output, while higher temperatures introduce more variety and creativity.
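Temperature's effect on the distribution can be sketched directly. The logit values below are made up for illustration; real models produce one score per vocabulary entry.

```python
import math

def apply_temperature(logits, temperature):
    """Rescale logits by temperature, then softmax into probabilities.

    Temperature near 0 approaches greedy (argmax) selection;
    temperature = 1 leaves the distribution unchanged;
    temperature > 1 flattens it, boosting less likely tokens.
    """
    scaled = [l / temperature for l in logits]
    # Subtract the max for numerical stability before exponentiating.
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [4.0, 2.0, 1.0]  # invented raw scores for three candidate tokens

cold = apply_temperature(logits, 0.2)  # near-deterministic
warm = apply_temperature(logits, 1.0)  # natural distribution
hot = apply_temperature(logits, 2.0)   # flattened

print(cold[0], warm[0], hot[0])
```

Note that lowering the temperature sharpens the distribution (the top token's probability rises toward 1), while raising it spreads probability across alternatives.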

Question 10

Why might an AI coding assistant produce different code when you ask the same question twice?

Show Answer This happens because of the sampling process. Unless the temperature is set to exactly 0, the model samples from a probability distribution rather than always choosing the most likely token. Even at low temperatures, there is some randomness in the selection process. Additionally, top-k and top-p sampling strategies introduce controlled randomness. This means the model may choose different but equally plausible tokens at each step, leading to different (but usually similar) outputs.

Question 11

What is Byte Pair Encoding (BPE), and why is it used for tokenization?

Show Answer Byte Pair Encoding is a tokenization algorithm that builds a vocabulary by iteratively merging the most frequently occurring pairs of adjacent tokens. Starting from individual characters, it discovers useful subword units: common words become single tokens while rare words are split into pieces. BPE is used because it balances vocabulary size (keeping it manageable at 50,000-100,000 tokens) with sequence length (keeping token sequences reasonably short), and it can handle any text input including novel words, code syntax, and special characters.
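A minimal sketch of the BPE merge loop, using a three-word toy corpus. Real tokenizers train on far larger corpora and apply tens of thousands of merges, but the mechanism is the same.

```python
from collections import Counter

def most_frequent_pair(corpus_tokens):
    """Count adjacent token pairs across the corpus and return the most frequent."""
    pairs = Counter()
    for word in corpus_tokens:
        for a, b in zip(word, word[1:]):
            pairs[(a, b)] += 1
    return pairs.most_common(1)[0][0]

def merge_pair(corpus_tokens, pair):
    """Replace every occurrence of `pair` with a single merged token."""
    merged = []
    for word in corpus_tokens:
        out, i = [], 0
        while i < len(word):
            if i + 1 < len(word) and (word[i], word[i + 1]) == pair:
                out.append(word[i] + word[i + 1])
                i += 2
            else:
                out.append(word[i])
                i += 1
        merged.append(out)
    return merged

# Start from individual characters, as BPE does.
corpus = [list("lower"), list("lowest"), list("low")]
for _ in range(3):  # three merge rounds for illustration
    pair = most_frequent_pair(corpus)
    corpus = merge_pair(corpus, pair)

print(corpus)
```

After a few rounds, the shared stem "low" has fused into a single token while the rarer suffixes remain split, which is exactly the subword behavior described above.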

Question 12

What is the relationship between token count and cost when using AI coding APIs?

Show Answer Most AI API services charge per token for both input (your prompt) and output (the model's response). This means verbose prompts with unnecessary context cost more, and requesting longer outputs also costs more. Understanding tokenization helps you write efficient prompts that include necessary context without waste, optimizing both quality and cost. Note that code tends to be more token-dense per character than English prose, because operators, brackets, and punctuation often become separate tokens.
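A rough cost estimate follows directly from the token counts. The per-1,000-token prices below are placeholders, not any provider's actual rates; check your provider's current price sheet.

```python
def estimate_cost(input_tokens, output_tokens,
                  input_price_per_1k=0.003, output_price_per_1k=0.015):
    """Estimate an API call's cost from token counts.

    The default prices are illustrative placeholders. Output tokens
    are often priced higher than input tokens, as assumed here.
    """
    return (input_tokens / 1000) * input_price_per_1k \
        + (output_tokens / 1000) * output_price_per_1k

# A 5,500-token prompt with a 2,000-token response:
cost = estimate_cost(5500, 2000)
print(f"${cost:.4f}")
```

Even with modest per-token prices, costs scale linearly with both prompt and response length, which is why trimming unnecessary context pays off across many calls.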

Question 13

Why does the order and structure of information in your prompt matter?

Show Answer The transformer's attention mechanism works better when relevant information is clearly structured. While attention can theoretically focus on any part of the input, in practice, important details can be "diluted" in very long, unstructured prompts. Placing the most important information at the beginning and end of your prompt, using clear headings and bullet points, and using code blocks for code all help the attention mechanisms identify and focus on the most relevant parts. This directly improves the quality of the generated output.

Question 14

What is the "knowledge cutoff" problem, and how does it affect AI-generated code?

Show Answer AI models are trained on data up to a specific date (the knowledge cutoff) and do not have information about events, library updates, API changes, or new technologies after that date. This means the model might generate code using deprecated APIs, miss new features in libraries, or be unaware of recently discovered security vulnerabilities. For recently updated libraries or new frameworks, you should always verify AI-generated code against current documentation and be prepared to provide the model with updated information in your prompt.

Question 15

What is the difference between pre-training and supervised fine-tuning?

Show Answer Pre-training teaches the model to predict the next token from a massive, uncurated corpus of text and code. The model learns language patterns, code syntax, and factual knowledge but does not learn to be a helpful assistant. Supervised fine-tuning then trains the model on carefully curated examples of helpful assistant behavior -- specific prompt-response pairs that demonstrate the desired format, quality standards, and helpfulness. Pre-training gives the model knowledge; fine-tuning teaches it how to use that knowledge helpfully.

Question 16

Can an AI coding assistant "go back and fix" earlier parts of its response if it realizes it made a mistake?

Show Answer No. The model generates text token by token in a forward-only manner. Once a token is generated, it becomes part of the context for all subsequent tokens, but the model cannot revise previously generated tokens. If the model makes a wrong decision early (like choosing an incorrect algorithm), the rest of the output builds on that wrong foundation. This is why it is effective to break complex tasks into stages: first ask the AI to outline its approach, review it, and then implement specific parts.

Question 17

What role does RLHF play in making AI code generation produce readable, well-documented code?

Show Answer RLHF trains the model to produce responses that align with human preferences. During the RLHF process, human experts evaluate code quality on dimensions including readability, documentation, error handling, and structure. The model learns that humans prefer code with descriptive variable names, comprehensive docstrings, proper error handling, and clean structure. Without RLHF, a model might produce code that is technically correct but poorly organized, uncommented, or using cryptic variable names -- because all of these patterns exist in the training data.

Question 18

What is top-p (nucleus) sampling, and how does it differ from top-k sampling?

Show Answer Top-k sampling limits the model to choosing from only the k most likely tokens (e.g., the top 10). Top-p (nucleus) sampling instead selects the smallest set of tokens whose cumulative probability exceeds a threshold p (e.g., 0.9 or 90%). The key difference is that top-p adapts to the shape of the distribution: when the model is very confident (one token has 95% probability), top-p might select just 1-2 tokens; when the model is uncertain (many tokens have similar probabilities), top-p might select dozens. Top-k always selects the same number regardless of confidence level.
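The adaptive behavior of top-p versus the fixed behavior of top-k can be seen in a small sketch. The two token distributions are invented for illustration.

```python
def top_k_filter(probs, k):
    """Keep only the k most likely tokens."""
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:k])

def top_p_filter(probs, p):
    """Keep the smallest set of tokens whose cumulative probability reaches p."""
    kept, cumulative = {}, 0.0
    for token, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[token] = prob
        cumulative += prob
        if cumulative >= p:
            break
    return kept

# A confident distribution (one dominant token) and an uncertain one.
confident = {"return": 0.95, "yield": 0.03, "raise": 0.01, "pass": 0.01}
uncertain = {"a": 0.30, "b": 0.28, "c": 0.22, "d": 0.20}

# Top-p adapts: 1 candidate when confident, all 4 when uncertain.
print(len(top_p_filter(confident, 0.9)), len(top_p_filter(uncertain, 0.9)))
# Top-k keeps the same count either way.
print(len(top_k_filter(confident, 3)), len(top_k_filter(uncertain, 3)))
```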

Question 19

Why does code tend to be easier for language models than natural language in some respects?

Show Answer Code has several properties that make it more amenable to language model prediction: (1) strict syntax rules mean fewer valid continuations at each point, (2) common patterns and idioms are more standardized than natural language expressions, (3) variable names and function names are often descriptive and predictable, (4) code has less ambiguity than natural language -- a syntax error is always wrong, whereas natural language often has multiple valid interpretations, and (5) programming patterns are repeated across millions of codebases, providing abundant training examples.

Question 20

Which of the following is NOT a reason AI coding assistants struggle with complex algorithms?

A) They generate code token by token without the ability to revise
B) They have not been trained on any algorithmic code
C) Novel algorithms may not have close patterns in the training data
D) They cannot simulate program execution to verify correctness

Show Answer **B) They have not been trained on any algorithmic code.** This is false -- AI models have been trained on vast amounts of code including many algorithm implementations. The other three options are genuine reasons for difficulty: (A) forward-only generation means early mistakes cascade, (C) truly novel algorithms lack training examples to match, and (D) the model predicts patterns rather than simulating execution, so it cannot verify logical correctness during generation.

Question 21

A developer provides the following prompt: "Write some code." What concept from this chapter best explains why this prompt will likely produce unsatisfying results?

Show Answer The concept of **probability distributions** best explains this. An ambiguous prompt like "Write some code" has an enormous number of plausible continuations -- the model could write code in any language, for any purpose, in any style. The probability distribution is spread thinly across many possibilities, and the model must arbitrarily choose among them. A specific prompt constrains the distribution toward the desired output. This is also related to **attention**: with no specific context to attend to, the model has no basis for focusing on any particular pattern.

Question 22

How many tokens are available for the AI's response if the context window is 8,192 tokens and the prompt (including system instructions and user context) consumes 5,500 tokens?

Show Answer 8,192 - 5,500 = **2,692 tokens** available for the response. Using the approximation that 1 token is roughly 3 characters of code, this would yield approximately 8,076 characters, or roughly 200 lines of code (at 40 characters per line). This illustrates the importance of context window budgeting -- providing too much context can leave insufficient room for the response. In practice, you should aim to keep your prompt well under the context window limit to leave ample room for a complete response.
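The budgeting arithmetic can be wrapped in a small helper. The characters-per-token and characters-per-line figures are the chapter's rules of thumb, not exact values.

```python
def response_budget(context_window, prompt_tokens,
                    chars_per_token=3, chars_per_line=40):
    """Estimate the room left for a response, in tokens and lines of code.

    Uses rough rules of thumb for code (~3 chars/token, ~40 chars/line);
    actual counts depend on the tokenizer and the code's style.
    """
    tokens_left = context_window - prompt_tokens
    approx_chars = tokens_left * chars_per_token
    approx_lines = approx_chars // chars_per_line
    return tokens_left, approx_lines

tokens_left, lines = response_budget(8192, 5500)
print(tokens_left, lines)  # 2692 tokens, roughly 201 lines
```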

Question 23

What is Constitutional AI, and how does it relate to RLHF?

Show Answer Constitutional AI (CAI) is an alignment technique where the model is trained to evaluate its own outputs against a set of principles or "constitution." It complements RLHF by providing additional guidance on values and behaviors. While RLHF teaches the model what humans prefer through direct feedback, CAI provides principled guidelines that the model can apply consistently. This helps the model refuse harmful requests, acknowledge uncertainty, follow ethical guidelines, and be transparent about limitations. CAI is used by models like Claude as an additional alignment layer beyond RLHF.

Question 24

Which temperature range would you recommend for generating production database migration scripts, and why?

Show Answer **Low temperature (0-0.3)** would be most appropriate for database migration scripts. Database migrations are critical operations where correctness is paramount -- a mistake could cause data loss or corruption. Low temperature produces the most predictable, consistent output, heavily favoring the highest-probability (most common, well-established) patterns. You want the model to generate the most standard, well-tested migration patterns rather than creative alternatives. Creativity is not desirable when writing DDL statements that will modify production data.

Question 25

Explain the analogy: "The AI is like a talented but new team member."

Show Answer This analogy captures several key characteristics of AI coding assistants: (1) Like a talented new team member, the AI has strong general skills and knowledge from its training but lacks specific knowledge about your project, codebase, and team conventions. (2) Its code is usually good and sometimes excellent, but it needs review -- just as you would review a new colleague's pull requests. (3) It may follow common patterns that do not match your team's specific approach. (4) It benefits enormously from context and guidance -- the more you tell it about your project, the better it performs. (5) It can be confidently wrong, especially in areas outside its experience, so verification is essential.

Question 26

A model generates a Python function that uses the requests library version 1.x syntax. The current version is 2.x. What concept from this chapter best explains this error?

Show Answer This is best explained by the **knowledge cutoff** combined with **training data distribution**. The model's training data contains code from various time periods. If the training data includes substantial amounts of older code using the 1.x API, the model may generate code following those older patterns, especially if the prompt does not explicitly specify a version. Additionally, if the model's training cutoff predates the widespread adoption of 2.x patterns, it may default to the patterns it has seen most frequently. The solution is to explicitly specify the library version in your prompt or include an example of the correct API usage in the context.

Question 27

Why is it a good practice to ask an AI to outline its approach before implementing complex code?

Show Answer This practice is effective because of the **forward-only generation** constraint. Since the model cannot go back and revise earlier output, a wrong decision early in code generation (choosing the wrong algorithm, data structure, or approach) causes the entire output to build on a flawed foundation. By asking for an outline first, you can: (1) catch wrong directions before the model commits to them in code, (2) provide corrections that become part of the context for the implementation phase, (3) ensure the approach aligns with your architecture and requirements, and (4) use the approved outline as additional context that guides the implementation.

Question 28

What is the approximate rule of thumb for estimating tokens from English text versus code?

Show Answer For **English text**, approximately 1 token per 4 characters (or about 3/4 of a word per token, meaning roughly 1.33 tokens per word). For **code**, approximately 1 token per 3 characters, because code has a higher density of special characters, operators, brackets, and shorter "words" (keywords, variable names). This means: a 1,000-word English document is approximately 1,333 tokens, while a 100-line Python file (averaging 40 characters per line) is also approximately 1,333 tokens despite containing far fewer "words."
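These rules of thumb translate directly into a rough estimator. The character-per-token ratios are heuristics; only the model's actual tokenizer gives exact counts.

```python
def estimate_tokens(text, is_code=False):
    """Rough token estimate: ~4 chars/token for prose, ~3 for code.

    These ratios are heuristics from the rule of thumb above; the
    real count depends on the specific model's tokenizer.
    """
    chars_per_token = 3 if is_code else 4
    return len(text) / chars_per_token

prose = "word " * 1000           # ~1,000 short words of padded text
code_lines = ["x" * 40] * 100    # 100 lines of 40 characters each

print(round(estimate_tokens(prose)))
print(round(estimate_tokens("\n".join(code_lines), is_code=True)))
```

The two inputs land in the same rough range despite the prose containing many more "words", illustrating why code is more token-dense per character.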