Chapter 10 Quiz: Advanced Prompting Techniques

Test your understanding of CoT, few-shot, self-critique, and structured decomposition.


Question 1

What is the core mechanism by which chain-of-thought prompting improves model accuracy on reasoning tasks?

A) It accesses a larger training dataset
B) It forces the model to generate intermediate reasoning steps that constrain subsequent conclusions
C) It reduces the length of the prompt, which reduces confusion
D) It switches the model from a fast generation mode to a slow reasoning mode

**Answer: B** — CoT works by making intermediate reasoning steps explicit in the generated text. Each step becomes context that constrains the next step, preventing the model from jumping to a plausible-but-wrong conclusion. The model doesn't switch modes or access different data; the improvement comes entirely from the structure of the generation process.

Question 2

The original chain-of-thought research (Wei et al., 2022) showed approximately what improvement on math word problem accuracy when CoT prompting was used?

A) 10-15% improvement (from ~50% to ~60%)
B) 3× improvement (from ~18% to ~57%)
C) 2× improvement (from ~40% to ~80%)
D) Minimal improvement (less than 5%)

**Answer: B** — The Wei et al. (2022) paper showed approximately a 3× improvement: from roughly 18% accuracy without CoT to roughly 57% with CoT prompting, on grade-school math word problems. This dramatic improvement is one of the most cited findings in prompt engineering research.

Question 3

You need to classify customer emails into 5 categories for routing. You have 10 clear examples ready. Which prompting technique should be your primary approach?

A) Chain-of-thought, to reason through each email's category
B) Few-shot, to show examples of each category
C) Self-critique, to have the AI review its own classifications
D) Structured decomposition, to break the task into sub-steps

**Answer: B** — Few-shot prompting is the primary technique for classification tasks. Examples define the category boundaries more precisely than abstract definitions, establish the output format, and handle the "gray area" cases where your examples can demonstrate how to resolve ambiguity. You might add CoT for difficult individual cases, but the core technique for a classification system is few-shot.
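A classification prompt like the one in this question can be assembled mechanically. The sketch below is illustrative only: the category names, example emails, and `build_fewshot_prompt` helper are all invented for this answer, not part of any library.

```python
# Hypothetical sketch: assembling a few-shot classification prompt for
# email routing. Categories and examples are invented for illustration.

CATEGORIES = ["billing", "shipping", "returns", "technical", "other"]

def build_fewshot_prompt(examples, email):
    """examples: list of (email_text, category) pairs covering the range
    of categories, each demonstrating the desired output format."""
    lines = [
        "Classify the customer email into exactly one category: "
        + ", ".join(CATEGORIES) + "."
    ]
    for text, category in examples:
        lines.append(f"Email: {text}\nCategory: {category}")
    # The final block ends at "Category:" so the model completes the label.
    lines.append(f"Email: {email}\nCategory:")
    return "\n\n".join(lines)

examples = [
    ("My invoice shows a double charge.", "billing"),
    ("Where is my package? It's been two weeks.", "shipping"),
    ("The app crashes when I log in.", "technical"),
]
prompt = build_fewshot_prompt(examples, "I'd like to send this item back.")
```

Note the design choice: ending the prompt mid-pattern ("Category:") invites the model to continue the established format rather than produce free-form prose.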

Question 4

What is "zero-shot CoT" and what is its primary advantage?

A) CoT with no task description — useful for creative tasks
B) Adding "Let's think step by step" without providing examples — produces significant reasoning improvement with minimal prompt engineering effort
C) A technique where the model generates zero reasoning steps, for faster output
D) CoT applied to tasks where the model has zero prior knowledge

**Answer: B** — Zero-shot CoT is the finding (from Kojima et al., 2022) that simply adding "Let's think step by step" (or similar phrasing) to a prompt produces substantial improvements in reasoning quality, without needing to provide any worked examples. Its main advantage is simplicity: one short phrase produces roughly 60% of the benefit of full few-shot CoT examples.
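Because the technique is just an appended phrase, the entire "implementation" fits in a one-line helper. A minimal sketch (the `zero_shot_cot` function name is invented; the trigger phrase is the one from Kojima et al., 2022):

```python
# Minimal sketch of zero-shot CoT: append the trigger phrase to any
# task prompt. No examples, no other prompt engineering required.

COT_TRIGGER = "Let's think step by step."

def zero_shot_cot(task: str) -> str:
    """Wrap a plain task prompt with the zero-shot CoT trigger."""
    return f"{task}\n\n{COT_TRIGGER}"

prompt = zero_shot_cot(
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 "
    "more than the ball. How much does the ball cost?"
)
```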

Question 5

You are building a few-shot prompt for product descriptions. You have examples available. Which set of examples would be the best choice?

A) Your 3 most elaborate, detailed examples — to show the model your best work
B) Your 3 most recent examples — to ensure the model has the latest style
C) 3 examples that represent the range of product types you write about, each demonstrating the voice and format you want
D) 1 excellent example and 2 average examples — to represent typical quality

**Answer: C** — The best few-shot examples represent the range of the task (so the model sees how the pattern applies to different inputs) while consistently demonstrating the qualities that matter most (voice, format, length). Recency and quality of individual examples matter less than representativeness and consistency. Three examples of the same type teach less than three examples of different types.

Question 6

Research on few-shot prompting suggests what is the optimal number of examples for most tasks with large language models?

A) 1 example (maximum efficiency)
B) 10-15 examples (maximum information)
C) 2-6 examples (balanced performance and context efficiency)
D) 0 examples — description alone is sufficient for large models

**Answer: C** — Research consistently shows that 2-6 examples produces the best trade-off between performance improvement and context window efficiency. Beyond 8 examples, the marginal gain diminishes significantly, and the examples consume context window space that could be used for other information. For large frontier models, 3 high-quality examples often achieves near-maximum performance.

Question 7

What is the primary limitation of self-critique prompting that you should be aware of?

A) It always makes outputs longer, which is sometimes undesirable
B) It cannot be used in the same prompt as the original generation — requires a separate exchange
C) Models can produce sycophantic critiques that identify trivial issues while missing real ones, especially without explicit criteria
D) It only works for written text, not code or data

**Answer: C** — The most important limitation of self-critique is the sycophancy risk: without explicit evaluation criteria, models tend to validate their own output rather than genuinely critique it, identifying minor stylistic issues while missing substantive problems. The fix is to provide specific, measurable criteria and to instruct the model to find weaknesses even if the output seems good.

Question 8

In the "plan then execute" approach to structured decomposition, why do you create and review the plan before executing any content?

A) It reduces total token cost
B) It allows you to approve the structure before the model makes large structural decisions that are expensive to undo
C) It prevents the model from hallucinating content in the outline
D) It is required by most AI platforms to enable long-form generation

**Answer: B** — The plan-then-execute approach lets you review and revise the skeleton before content is written. If a 10-page document is structured wrong from the outset, fixing it after generation requires major rework. Getting approval on the structure first means you only execute content within a framework you've already validated. This is especially important for documents with complex, interdependent sections.
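The two-stage flow can be sketched as a pair of prompts with a human checkpoint between them. Everything here is hypothetical scaffolding: `ask_model` is a stub standing in for whatever LLM call you use, and the prompt wording is illustrative.

```python
# Hypothetical sketch of a plan-then-execute flow. `ask_model` is a
# stub so the structure runs; replace it with a real API call.

def ask_model(prompt: str) -> str:
    return "(model response)"  # stand-in for an LLM call

def plan_then_execute(brief: str, approve) -> str:
    # Stage 1: ask only for the skeleton, explicitly deferring content.
    plan_prompt = (
        "Create a section-by-section outline for this document. "
        f"Do not write any content yet.\n\nBrief: {brief}"
    )
    plan = ask_model(plan_prompt)
    # Human checkpoint: review/edit the structure before any content exists.
    plan = approve(plan)
    # Stage 2: execute only within the validated framework.
    exec_prompt = (
        "Write the full document following this approved outline "
        f"exactly:\n\n{plan}\n\nBrief: {brief}"
    )
    return ask_model(exec_prompt)

draft = plan_then_execute("Q3 marketing report", approve=lambda p: p)
```

The `approve` callback is the whole point: the expensive structural decision passes through a human before any of the 10 pages get written.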

Question 9

What is the key difference between few-shot examples that include reasoning traces and few-shot examples that include only input/output pairs?

A) Reasoning trace examples are only useful for math problems
B) Input/output-only examples teach the model the desired output format; reasoning trace examples teach both the output format AND the desired reasoning process
C) Reasoning trace examples require more examples to be effective
D) There is no meaningful difference for modern large language models

**Answer: B** — Including reasoning steps in few-shot examples is the combination of few-shot and chain-of-thought techniques. Input/output pairs teach format, style, and what the answer should look like. Reasoning trace examples additionally teach the model how to think about the problem — which steps to take, what to consider first, how to structure the analysis. This combination is especially powerful for tasks that require both consistent format and multi-step reasoning.

Question 10

Alex wants to write product copy consistently in her brand's voice. She has tried describing the voice ("warm, energetic, lifestyle-focused") but results are generic. What should she do?

A) Add more adjectives to her voice description — try 10-15 descriptors instead of 3
B) Use few-shot prompting with 3-5 examples of existing brand copy she's proud of
C) Use chain-of-thought to reason through what the voice should sound like
D) Use self-critique to have the AI evaluate its own copy against the voice description

**Answer: B** — Few-shot examples are the right solution when abstract description fails to convey style. Three to five examples of real brand copy communicate the specific vocabulary register, sentence rhythm, energy level, and structural patterns far more precisely than any number of adjectives. The key insight from the chapter: demonstrating is more effective than describing for style transfer.

Question 11

Tree-of-Thought prompting is most appropriate for which situation?

A) Any task involving more than 3 steps
B) Tasks where there are genuinely multiple viable approaches and the early choice of path significantly affects the outcome
C) Creative writing tasks only
D) Tasks where you need to generate more than 1,000 words

**Answer: B** — Tree-of-Thought is specifically valuable when the problem has multiple legitimate solution paths and committing to the wrong path early leads to poor outcomes. By exploring multiple paths before committing, ToT avoids the trap of standard CoT (which commits to one approach immediately). For most business tasks, CoT alone is sufficient; ToT adds value when the problem is genuinely multi-path.

Question 12

Which of the following is the best description of the "constitutional self-correction" approach?

A) Using AI to write a code of conduct document
B) Evaluating AI output against a specific list of principles or standards, then revising to meet all standards
C) A training technique used by AI companies — not applicable to prompting
D) Asking the AI to critique its output from the perspective of a hypothetical "ideal user"

**Answer: B** — Constitutional self-correction (adapted from Anthropic's Constitutional AI approach for prompting use) involves specifying a set of explicit principles or standards the output must meet, asking the model to evaluate its output against each one, and then revising to address any failures. The "constitution" is simply a numbered list of quality criteria specific to your use case. This is more rigorous than open-ended critique because it forces evaluation against specific, pre-defined standards.
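Since the "constitution" is just a numbered list of criteria, the critique prompt can be generated from it directly. A minimal sketch, with invented standards and an invented `critique_prompt` helper:

```python
# Hypothetical sketch: the "constitution" is a plain list of quality
# standards specific to your use case; the prompt forces the model to
# evaluate its draft against each one, then revise.

CONSTITUTION = [
    "Every factual claim is supported or flagged as unverified.",
    "Tone matches the brand voice guide.",
    "No section exceeds 150 words.",
]

def critique_prompt(draft: str, principles=CONSTITUTION) -> str:
    numbered = "\n".join(f"{i}. {p}" for i, p in enumerate(principles, 1))
    return (
        "Evaluate the draft below against each numbered standard. "
        "For each standard, state PASS or FAIL with a one-line reason, "
        "then rewrite the draft to fix every FAIL.\n\n"
        f"Standards:\n{numbered}\n\nDraft:\n{draft}"
    )

p = critique_prompt("Our widget is the best in the world.")
```

Requiring an explicit PASS/FAIL verdict per standard is what prevents the sycophantic "looks good overall" critique discussed in Question 7.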

Question 13

Raj is debugging code using CoT prompting. The model produces a reasoning trace that says "Step 1: The function receives input. Step 2: It processes the data. Step 3: It returns the result." This is an example of what problem?

A) Correct CoT usage — the steps are logical and clear
B) "Fake reasoning" — the steps are vague and do not build on each other with specific conclusions
C) Over-decomposition — too many steps for a simple function
D) Format inconsistency — the steps should be numbered differently

**Answer: B** — This is the fake reasoning problem described in the chapter. The three "steps" are generic descriptions that would apply to any function — they don't actually trace through the specific code logic, don't commit to specific intermediate conclusions, and don't build on each other. Real CoT reasoning would say things like "Step 1: The function takes an integer x and a list L. Step 2: It iterates through L checking x > each element — but notice this uses > not >=, which means equal values are excluded..." The fix is to require the model to commit to specific, code-level observations at each step.
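To make the `>` vs `>=` observation concrete, here is an invented toy function with exactly that bug pattern. Vague reasoning ("it processes the data") would never catch it; a specific trace that reads the comparison operator would.

```python
# Invented toy example of the bug pattern the answer describes:
# a strict > where >= was intended silently excludes ties.

def count_greater(x, values):
    """Intended: count how many entries x is greater than OR equal to.
    Bug: the strict > excludes values equal to x."""
    return sum(1 for v in values if x > v)   # should be: x >= v

# A specific CoT trace would note: "3 > 3 is False, so the element
# equal to x is skipped" -- the kind of code-level observation that
# generic steps like "it processes the data" never produce.
result = count_greater(3, [1, 3, 5])
```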

Question 14

When combining the "role" technique with "self-critique," what is the benefit of the role assignment?

A) It allows the model to access domain-specific training data
B) It establishes a perspective with built-in quality standards — a specific role naturally embodies certain evaluative criteria
C) It makes the output longer, which is helpful for critique
D) It prevents the model from being too brief in its critique

**Answer: B** — Combining role + self-critique works because the role assignment establishes an evaluative perspective with implicit standards. A "senior editor" has internalized what makes writing good or bad. A "security researcher" knows what makes code vulnerable. The role provides the standards framework; the self-critique instruction activates the evaluative function. This is more effective than self-critique without a role, because the role's standards are richer and more contextually appropriate than general quality criteria.

Question 15

Elena's five-step self-critique protocol includes a "factual audit" step where the model rates each claim as (A) demonstrably correct, (B) plausible but unverified, or (C) potentially incorrect. What is the primary professional value of this step?

A) It eliminates all factual errors from the output
B) It creates a checklist of what Elena needs to verify with her own expertise, allowing her to focus human review where it matters most
C) It allows the AI to access the internet to check facts in real time
D) It prevents the model from hallucinating entirely

**Answer: B** — The factual audit does not eliminate errors — AI cannot reliably self-verify factual claims. What it does is create a prioritized verification checklist. Elena doesn't need to read every sentence with equal skepticism; she can focus her expert review on the Category B and C claims that the AI itself has flagged as uncertain or potentially wrong. This dramatically reduces the time required for human review while concentrating it where it adds the most value.