Chapter 17 Quiz: Generative AI — Large Language Models


Multiple Choice

Question 1. What fundamental problem did the transformer architecture solve that previous sequence models (RNNs, LSTMs) struggled with?

  • (a) Transformers can process images in addition to text.
  • (b) Transformers solved the long-range dependency problem by allowing every word to attend to every other word simultaneously.
  • (c) Transformers eliminated the need for training data.
  • (d) Transformers reduced the number of parameters needed for language modeling by 90%.

Question 2. In the self-attention mechanism, what are the three learned transformations applied to each input token?

  • (a) Encoder, decoder, and classifier
  • (b) Input, hidden, and output
  • (c) Query, key, and value
  • (d) Embedding, position, and context
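For reference, the mechanism behind Question 2 can be illustrated in a few lines. This is a toy sketch of scaled dot-product attention over plain Python lists (no learned projections, tiny 2-d vectors), not code from the chapter: each token's query is scored against every token's key, and the softmax-normalized scores weight a mix of the value vectors.

```python
import math

def attention(queries, keys, values):
    """Toy scaled dot-product attention.
    For each query, compute similarity to every key, softmax the
    scores, and return the weighted average of the value vectors."""
    d = len(keys[0])  # key dimension, used for scaling
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d)
                  for k in keys]
        m = max(scores)  # subtract max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        out.append([sum(w * v[i] for w, v in zip(weights, values))
                    for i in range(len(values[0]))])
    return out
```

Because every query attends to every key in one step, no sequential recurrence is needed, which is the point of Question 1 as well.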

Question 3. During the pre-training phase, what is the primary training objective for most large language models?

  • (a) Classify text into predefined categories
  • (b) Predict the next token in a sequence
  • (c) Maximize factual accuracy on benchmark tests
  • (d) Generate text that matches a human reference response

Question 4. Why does Professor Okonkwo describe hallucination as "a feature of the architecture, not a bug to be patched"?

  • (a) Because hallucination helps generate creative content that is more engaging.
  • (b) Because the training objective (next-token prediction) optimizes for plausibility, not factual accuracy, and the model has no internal fact-checking mechanism.
  • (c) Because hallucination only occurs in older models and has been eliminated in GPT-4 and later.
  • (d) Because hallucination is intentionally programmed to encourage human oversight.

Question 5. What is RLHF (Reinforcement Learning from Human Feedback)?

  • (a) A method for training LLMs on reinforcement learning tasks like game playing
  • (b) A training technique that aligns LLM outputs with human preferences by training a reward model on human evaluations
  • (c) A method for allowing LLMs to learn from user feedback during deployment
  • (d) A type of pre-training that uses human-labeled data instead of internet text

Question 6. In Athena's customer service chatbot evaluation, what was the incorrect information rate, and why was it considered unacceptable despite being a small percentage?

  • (a) 1.2%; because even a single error could result in a lawsuit
  • (b) 4.2%; because it would scale to approximately 13,500 customers per year receiving wrong information
  • (c) 8.5%; because it exceeded the industry standard of 5%
  • (d) 15%; because it required too many escalations to human agents

Question 7. Which of the following is NOT a documented LLM failure mode discussed in the chapter?

  • (a) Sycophancy — agreeing with the user even when the user is wrong
  • (b) Prompt injection — adversarial inputs that override the model's instructions
  • (c) Data poisoning — attackers corrupting the model's weights during inference
  • (d) Knowledge cutoffs — inability to answer questions about events after the training data cutoff

Question 8. According to the chapter, what is the recommended decision hierarchy for LLM customization?

  • (a) Build a custom model first, then fall back to fine-tuning if the cost is too high
  • (b) Start with prompting, try fine-tuning if prompting plateaus, and only consider custom models for unique use cases with substantial resources
  • (c) Fine-tune first for best performance, then use prompting for lower-priority tasks
  • (d) Use the largest available model for all tasks to maximize quality

Question 9. Why does the chapter recommend against selecting an LLM provider based solely on benchmark scores?

  • (a) Benchmarks are always fraudulent.
  • (b) Benchmarks measure standardized tasks that may not resemble your specific use case, and they can be inflated by data contamination.
  • (c) Benchmarks only measure speed, not quality.
  • (d) Benchmarks are only available for open-source models.

Question 10. What is Retrieval-Augmented Generation (RAG), and why did Athena adopt it for the customer service chatbot?

  • (a) RAG is a training technique that makes models larger. Athena used it to improve the chatbot's vocabulary.
  • (b) RAG retrieves relevant documents and provides them as context to the LLM, grounding its responses in actual data. Athena adopted it to reduce the chatbot's hallucination rate.
  • (c) RAG is a data compression technique that reduces API costs. Athena used it to manage expenses.
  • (d) RAG is a security protocol that prevents prompt injection. Athena adopted it for compliance.
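For reference, the retrieve-then-prompt pattern in Question 10 can be sketched minimally. This is an illustrative example, not the chapter's implementation: it ranks documents by naive word overlap (a real RAG system would use embedding similarity and a vector store) and pastes the top hit into the prompt so the model answers from retrieved text rather than memory.

```python
def retrieve(query, documents, k=1):
    """Rank documents by naive word overlap with the query;
    production RAG systems use embedding similarity instead."""
    def words(text):
        cleaned = "".join(c if c.isalnum() else " " for c in text.lower())
        return set(cleaned.split())
    q = words(query)
    ranked = sorted(documents, key=lambda d: len(q & words(d)), reverse=True)
    return ranked[:k]

def build_prompt(query, documents):
    """Ground the model by placing retrieved text in the prompt."""
    context = "\n".join(retrieve(query, documents))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Hypothetical knowledge-base snippets for illustration
docs = [
    "Returns are accepted within 30 days with a receipt.",
    "Shipping is free on orders over $50.",
]
prompt = build_prompt("Are returns accepted?", docs)
# The prompt now carries the returns-policy passage, grounding the
# model's answer in actual data instead of its parametric memory.
```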

Question 11. When would fine-tuning an LLM be preferred over prompt engineering?

  • (a) For any task where accuracy matters
  • (b) For high-volume production tasks with consistent formats where prompt engineering has plateaued and cost reduction is needed
  • (c) Whenever the model gives incorrect answers
  • (d) Only when using open-source models

Question 12. What is the key distinction between API-based and on-premise LLM deployment for enterprises?

  • (a) API-based deployment is always cheaper.
  • (b) On-premise deployment provides the latest frontier models, while API deployment uses older models.
  • (c) API deployment sends data to an external service (raising privacy concerns), while on-premise deployment keeps data within the organization's network.
  • (d) On-premise deployment does not require any technical expertise.

Question 13. In the chapter's product description generator example, what percentage of generated descriptions required only minor edits?

  • (a) 42%
  • (b) 60%
  • (c) 72%
  • (d) 88%

Question 14. Which LLM provider is positioned as the "open-source alternative" that allows enterprises to download, customize, and deploy models on their own infrastructure?

  • (a) OpenAI
  • (b) Anthropic
  • (c) Google
  • (d) Meta (Llama)

Question 15. What does the temperature parameter control in an LLM API call?

  • (a) The speed at which the model generates tokens
  • (b) The randomness of token selection — lower values produce more deterministic outputs, higher values produce more diverse outputs
  • (c) The maximum number of tokens in the response
  • (d) The factual accuracy of the response
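For reference, the effect asked about in Question 15 comes from dividing the model's logits by the temperature before the softmax. The sketch below shows this with hypothetical logit values; it is an illustration of the math, not code from the chapter.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then softmax.
    Low temperature sharpens the distribution (near-deterministic
    top-token picks); high temperature flattens it (more diverse)."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]  # hypothetical next-token logits
cold = softmax_with_temperature(logits, 0.2)  # top token dominates
hot = softmax_with_temperature(logits, 2.0)   # mass spreads out
```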

Short Answer

Question 16. In two to three sentences, explain why the chapter describes sycophancy as "particularly insidious for business use cases." Include a specific business scenario where sycophancy could lead to a poor decision.


Question 17. The chapter states that Athena's product description generator achieved a "60% reduction in content creation time." Explain how this metric was calculated and what human activities still account for the remaining 40%.


Question 18. Explain the concept of exponential backoff in error handling and why it is important for production LLM applications. What would happen if you retried immediately without any delay?
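For reference, the pattern named in Question 18 can be sketched as follows. This is an illustrative example assuming a hypothetical `request_fn` callable, not the chapter's code: each failed attempt doubles the wait before retrying, with a little random jitter so many clients do not retry in lockstep.

```python
import random
import time

def call_with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Retry a flaky call with exponentially growing, jittered delays.
    Retrying immediately would hammer an already-overloaded API and
    typically hit the same rate limit again."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of retries; surface the error
            # Delay doubles each attempt (1s, 2s, 4s, ...) plus jitter
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.1)
            time.sleep(delay)
```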


Question 19. Lena Park raises the liability question: who is responsible when an LLM chatbot gives incorrect information to a customer? Summarize Ravi Mehta's four-part governance framework for addressing this risk.


Question 20. The chapter compares LLM outputs to "a report from a brilliant intern who has read everything but experienced nothing, and who would rather make something up than admit ignorance." In three to four sentences, explain why this analogy is apt and identify one way in which it is misleading.


True or False (with Justification)

For each statement, indicate whether it is true or false and provide a one-sentence justification citing evidence from the chapter.

Question 21. The transformer architecture processes input tokens sequentially, one at a time, similar to recurrent neural networks.


Question 22. According to the chapter, hallucination rates for the best current models (GPT-4, Claude, Gemini) are effectively zero for factual claims.


Question 23. The chapter recommends that enterprises should always deploy the most capable (and most expensive) LLM model for every use case to minimize quality risk.


Question 24. Athena's RAG-grounded chatbot reduced the incorrect information rate from 4.2% to 0.6% in subsequent testing.


Question 25. According to the chapter, parameter count is a reliable and sufficient indicator of an LLM's capability — a model with more parameters will always outperform one with fewer parameters.


Answer key available in Appendix B.