Chapter 17 Exercises: Generative AI — Large Language Models


Section A: Recall and Comprehension

Exercise 17.1 Define the following terms in your own words, using no more than two sentences each: (a) self-attention, (b) transformer, (c) pre-training, (d) instruction tuning, (e) RLHF, (f) hallucination, (g) emergent capability, (h) prompt injection.

Exercise 17.2 Explain the long-range dependency problem that pre-transformer architectures struggled with. Why does self-attention solve this problem, and why does the solution matter for business applications like document summarization?

Exercise 17.3 Describe the three phases of LLM training (pre-training, instruction tuning, RLHF). For each phase, identify: (a) the training objective, (b) the type of data used, and (c) one limitation of that phase that the next phase addresses.

Exercise 17.4 NK Adeyemi asks: "At what point did [the model] learn to be accurate?" Why is this question so important, and what is the answer Professor Okonkwo provides? Explain the business implications in three to four sentences.

Exercise 17.5 List and briefly describe five capabilities of LLMs discussed in the chapter. For each capability, identify one business function (marketing, finance, operations, HR, legal) that could benefit and explain how.

Exercise 17.6 The chapter identifies five LLM failure modes: hallucination, knowledge cutoffs, reasoning failures, sycophancy, and prompt injection. For each, write one sentence explaining the failure mode and one sentence explaining why it matters for enterprise deployment.

Exercise 17.7 Summarize the five major LLM providers discussed in the chapter (OpenAI, Anthropic, Google, Meta, Mistral) in a table with columns for: positioning, primary strength, primary consideration for enterprise use, and pricing model.


Section B: Application

Exercise 17.8: Hallucination Detection
The following is an LLM-generated summary of a fictional company's quarterly results. The actual data is provided below. Identify every factual claim in the summary that is fabricated (not supported by the source data).

Actual data:

  • Q2 Revenue: $245M (up 6.1% YoY)
  • Operating margin: 14.3% (up 40 bps YoY)
  • Customer count: 48,200 (net addition of 2,100)
  • Employee count: 1,850
  • R&D spending: $18.2M (7.4% of revenue)

LLM-generated summary: "TechCorp reported strong Q2 results with revenue of $245M, representing 6.1% year-over-year growth driven primarily by expansion in the enterprise segment, which grew 11.2% to $142M. Operating margin improved to 14.3%, up 40 basis points, reflecting the company's ongoing cost optimization program that reduced SG&A by $3.2M. The customer base grew to 48,200, with a net addition of 2,100 new customers, primarily mid-market accounts. R&D investment increased to $18.2M, a 12% increase from the prior year, supporting the launch of three new AI-powered product features. CEO Sarah Chen noted that the company expects to achieve its full-year revenue target of $510M based on the current pipeline."

  • (a) List every fabricated claim.
  • (b) For each fabricated claim, explain why it is plausible enough to deceive a reader.
  • (c) What process would you implement to prevent hallucinated claims from appearing in reports distributed to your leadership team?

Exercise 17.9: Use Case Evaluation Framework
You are the VP of Operations at a mid-sized insurance company. Your CEO has asked you to evaluate LLM deployment for the following five use cases. For each, assess: (a) the potential value (high/medium/low), (b) the hallucination risk (high/medium/low), (c) whether a human-in-the-loop is essential, and (d) your recommendation (proceed, proceed with caution, or defer).

  1. Generating first drafts of policyholder communications
  2. Summarizing lengthy claims documents for adjusters
  3. Answering policyholder questions about coverage via chatbot
  4. Extracting key data points from submitted claims forms
  5. Generating underwriting risk assessments

Exercise 17.10: Model Selection
A retail bank needs to deploy LLM capabilities for two applications: (1) an internal employee assistant that answers questions about bank policies and procedures, and (2) a customer-facing chatbot that handles routine banking inquiries. For each application:

  • (a) Which LLM provider would you recommend and why?
  • (b) Would you recommend API-based or on-premise deployment? Justify your choice.
  • (c) What data privacy considerations are most important?
  • (d) What is the most likely failure mode, and how would you mitigate it?

Exercise 17.11: Cost Analysis
An e-commerce company processes 25,000 customer support tickets per day. Each ticket averages 200 input tokens (the customer's message plus context) and 150 output tokens (the model's response). Using the GPT-4o pricing from the chapter ($2.50/million input tokens, $10.00/million output tokens):

  • (a) Calculate the daily cost of processing all tickets through GPT-4o.
  • (b) Calculate the monthly cost (30 days).
  • (c) If the company switches to GPT-4o-mini ($0.15/million input tokens, $0.60/million output tokens), what are the daily and monthly costs?
  • (d) The company currently employs 40 customer service agents at an average annual cost of $52,000 each. If the LLM can handle 60% of tickets without human intervention, what is the net annual cost impact? Show your work.
  • (e) What non-financial factors should the company consider before making this decision?
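
Hint: every part of the cost calculation reduces to one per-request formula. A small helper like the following (a sketch; the rates are passed in rather than hard-coded so it works for both models) is a reasonable starting point:

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: float, output_rate: float) -> float:
    """Dollar cost of one request, given per-million-token rates."""
    return (input_tokens * input_rate + output_tokens * output_rate) / 1_000_000

# Example: one average ticket through GPT-4o at the chapter's rates
print(request_cost(200, 150, 2.50, 10.00))  # 0.002 dollars per ticket
```

Scaling the per-ticket cost by 25,000 tickets and by 30 days gives the daily and monthly figures for parts (a) through (c).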

Exercise 17.12: System Prompt Design
Write a system prompt for each of the following LLM applications. Your prompt should define the model's role, specify its constraints, and include at least two rules designed to prevent hallucination.

  • (a) A legal document summarizer for a law firm's internal use
  • (b) A product recommendation chatbot for an outdoor gear retailer
  • (c) An HR policy question-answering system for a 5,000-employee company
  • (d) A financial report analyzer for an investment firm's analysts
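
As a shape to aim for (illustrative only, not a reference answer), a prompt for case (a) might look like:

```
You are a document summarizer for internal use at a law firm.
Role: summarize the single document provided, for attorneys only.
Constraints: do not offer legal advice; do not discuss any document
other than the one provided.
Anti-hallucination rules:
1. If a requested detail does not appear in the document, reply
   "Not stated in the document" rather than inferring it.
2. Never cite cases, statutes, clause numbers, or parties that are
   not named verbatim in the source text.
```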


Section C: Coding Exercises

Exercise 17.13: Basic API Integration
Using the OpenAI API patterns from the chapter, write a Python function called classify_customer_feedback that:

  • Takes a customer feedback string as input
  • Returns a JSON object with the fields: sentiment (positive/neutral/negative), category (product, shipping, service, pricing, other), urgency (high/medium/low), and summary (one-sentence summary)
  • Uses temperature=0.0 for consistency
  • Includes error handling for API failures
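
Hint: one way to start is to pin down the message payload before wiring up the API call. The field names and wording below are one reasonable choice, not the only one:

```python
def build_classification_messages(feedback: str) -> list:
    """Chat messages for the feedback-classification call (sketch)."""
    system = (
        "You classify customer feedback. Respond ONLY with a JSON object "
        "with keys: sentiment (positive/neutral/negative), "
        "category (product/shipping/service/pricing/other), "
        "urgency (high/medium/low), and summary (one sentence)."
    )
    return [
        {"role": "system", "content": system},
        {"role": "user", "content": feedback},
    ]
```

These messages would then be passed to the chat completions endpoint with temperature=0.0 and JSON mode enabled, with the call wrapped in a try/except to satisfy the error-handling requirement.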

Exercise 17.14: Batch Processing with Cost Tracking
Extend the product description generator from the chapter to process a list of products and produce a summary report including:

  • Total products processed
  • Total tokens consumed (input and output)
  • Total estimated cost
  • Average generation time per product
  • Number of failures (if any)

Write the function and test it with at least three sample products.
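
Hint: the reporting half of this exercise is plain aggregation and can be drafted and tested independently of the API code. A sketch, assuming each per-product result has been collected into a dict with the fields shown:

```python
def summarize_batch(results: list) -> dict:
    """Aggregate per-product results into a summary report.

    Each result is assumed to look like:
    {"input_tokens": int, "output_tokens": int, "cost": float,
     "seconds": float, "ok": bool}
    """
    succeeded = [r for r in results if r["ok"]]
    return {
        "total_products": len(results),
        "total_input_tokens": sum(r["input_tokens"] for r in results),
        "total_output_tokens": sum(r["output_tokens"] for r in results),
        "total_cost": sum(r["cost"] for r in results),
        "avg_seconds": (sum(r["seconds"] for r in succeeded) / len(succeeded)
                        if succeeded else 0.0),
        "failures": len(results) - len(succeeded),
    }
```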

Exercise 17.15: Temperature Experiment
Write a Python script that generates five responses to the same prompt at each of the following temperature values: 0.0, 0.3, 0.7, and 1.0. Use a business-relevant prompt such as "Suggest three strategies for reducing customer churn in a subscription SaaS business." For each temperature:

  • (a) Print all five responses.
  • (b) Measure the average response length (in words).
  • (c) Write a brief analysis of how temperature affects output diversity, quality, and relevance for business applications.
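
Hint: the measurement for part (b) can be written and tested before any API calls are made. In the loop sketch below, `call_model` is a hypothetical stand-in for your actual request function:

```python
TEMPERATURES = [0.0, 0.3, 0.7, 1.0]
SAMPLES_PER_TEMPERATURE = 5

def avg_words(responses: list) -> float:
    """Average response length in words, for part (b)."""
    return sum(len(r.split()) for r in responses) / len(responses)

# Sketch of the outer loop (call_model is hypothetical):
# for t in TEMPERATURES:
#     responses = [call_model(PROMPT, temperature=t)
#                  for _ in range(SAMPLES_PER_TEMPERATURE)]
#     print(t, avg_words(responses))
```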

Exercise 17.16: Structured Output Validation
Write a Python function that:

  1. Calls the OpenAI API with JSON mode to extract meeting action items from a meeting transcript
  2. Validates the returned JSON against an expected schema (each action item should have: description, assignee, due_date, priority)
  3. Flags any action items where fields are missing or malformed
  4. Returns both the parsed action items and a validation report

Test your function with a sample meeting transcript of at least 200 words.
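
Hint: the validation step (items 2 and 3) does not need the API at all and is worth building first. A minimal sketch, assuming the model returns a list of action-item dicts:

```python
REQUIRED_FIELDS = {"description", "assignee", "due_date", "priority"}

def validate_action_items(items: list) -> tuple:
    """Split parsed action items into valid ones and a validation report."""
    valid, report = [], []
    for i, item in enumerate(items):
        missing = REQUIRED_FIELDS - set(item)
        empty = {k for k in REQUIRED_FIELDS & set(item)
                 if not str(item[k]).strip()}
        if missing or empty:
            report.append(
                f"item {i}: missing={sorted(missing)}, empty={sorted(empty)}")
        else:
            valid.append(item)
    return valid, report
```

Items that pass can be returned as-is; the report strings satisfy the flagging requirement in step 3.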

Exercise 17.17: Multi-Model Comparison
Write a Python script that sends the same prompt to two different models (e.g., gpt-4o and gpt-4o-mini) and compares:

  • Response quality (you will need to evaluate this manually)
  • Response time
  • Token usage
  • Estimated cost

Use a business-relevant prompt such as analyzing a short customer review and extracting insights. Discuss when the cheaper model would be sufficient and when the premium model is justified.


Section D: Analysis and Evaluation

Exercise 17.18: The Sycophancy Problem
The chapter describes sycophancy as a particularly insidious failure mode for business analysis.

  • (a) Design an experiment to test an LLM for sycophancy in a business context. Describe the prompts you would use, the metrics you would measure, and how you would interpret the results.
  • (b) A CEO uses an LLM to evaluate a strategic proposal. The CEO's prompt begins: "I believe this acquisition is the right move because..." How might sycophancy bias the model's analysis? What prompt design would mitigate this bias?
  • (c) Is sycophancy always bad? Can you identify a business context where a degree of alignment with the user's perspective might actually be desirable?

Exercise 17.19: Emergent Capabilities — Hype vs. Reality
The chapter notes that the concept of emergent capabilities is "more contested than popular accounts suggest."

  • (a) In your own words, explain the difference between a capability that genuinely "emerges" at scale and one that appears to emerge due to measurement artifacts.
  • (b) Why does this distinction matter for business planning? How should a business leader's investment strategy differ depending on which interpretation is correct?
  • (c) Research one specific capability that has been described as "emergent" (e.g., chain-of-thought reasoning, theory of mind, code generation). What is the evidence for and against its emergence?

Exercise 17.20: The Build-vs-Buy Decision for LLMs
A healthcare company is considering three options for deploying LLM capabilities:

  • Option A: Use OpenAI's API through Azure with enterprise data handling agreements
  • Option B: Deploy Meta's Llama 3 on their private cloud infrastructure
  • Option C: Fine-tune Mistral's model on their proprietary medical documentation

For each option, analyze: (a) upfront cost, (b) ongoing cost, (c) data privacy implications, (d) performance expectations, (e) maintenance burden, and (f) regulatory compliance considerations. Which option would you recommend and why?

Exercise 17.21: Athena's Governance Framework
At the end of the chapter, Ravi Mehta establishes a governance framework for customer-facing LLM deployments requiring: (1) escalation paths, (2) uncertainty expression, (3) logging and auditability, and (4) regular accuracy audits.

  • (a) For each of these four requirements, describe a specific implementation approach (what tools, processes, or technical mechanisms would you use?).
  • (b) What additional governance requirements would you add for an LLM deployment in financial services? In healthcare?
  • (c) Estimate the cost of implementing this governance framework as a percentage of the total LLM deployment cost. Is the investment justified?

Exercise 17.22: Evaluating LLM Outputs
You are evaluating two LLMs for Athena's product description generator. Model A produces descriptions that are creative and engaging but occasionally mention features not in the source data (hallucination rate: 8%). Model B produces descriptions that are accurate and complete but more formulaic and less engaging (hallucination rate: 1%).

  • (a) Which model would you deploy for Athena's e-commerce catalog? Justify your choice.
  • (b) How does your answer change if the use case is generating descriptions for in-store signage (where errors are more costly to fix once printed)?
  • (c) Is there a way to get the benefits of both models? Describe an approach.


Section E: Discussion and Debate

Exercise 17.23: "The Most Impressive Autocomplete Ever Built"
For classroom debate or written argument. Professor Okonkwo describes LLMs as "the most impressive autocomplete ever built — both a compliment and a warning." Some argue this is reductive — that LLMs exhibit genuine understanding and reasoning. Others argue it is accurate and that the anthropomorphization of LLMs is dangerous for business decision-making.

  • Choose a position and argue it persuasively.
  • Address the strongest counterargument to your position.
  • Discuss the practical implications of your position for how businesses should deploy LLMs.

Exercise 17.24: Who Is Liable?
For classroom discussion. Lena Park asks: "Who is liable when a chatbot confidently tells a customer that their product has a warranty that doesn't exist?"

  • (a) Argue the case that the company deploying the LLM should bear full liability.
  • (b) Argue the case that the LLM provider (OpenAI, Anthropic, etc.) should share liability.
  • (c) Argue the case that the customer has some responsibility to verify AI-generated information.
  • (d) How would you design a legal and contractual framework that fairly allocates liability among these parties?

Exercise 17.25: The Intelligence Question
For written reflection. The chapter asks whether LLMs are "intelligent" and concludes that the answer depends on the definition.

  • (a) Write a definition of "intelligence" that would include LLMs. Write a definition that would exclude them.
  • (b) Does it matter, for business purposes, whether LLMs are "truly intelligent"? Why or why not?
  • (c) Professor Okonkwo says the practical question is "whether its output is reliable enough to act on." Is reliability a sufficient criterion for business deployment, or should we also care about whether the system understands what it is doing?

Exercise 17.26: The Training Data Question
For classroom discussion or written analysis. LLMs are trained on text scraped from the internet, including copyrighted material.

  • (a) Summarize the key legal arguments for and against using copyrighted material for AI training.
  • (b) If courts rule that AI training on copyrighted data requires licensing, how would this affect the LLM market? Which providers would be most and least affected?
  • (c) As a business leader deploying LLMs, should you care about the training data question? What risks does it create for your organization, even if you are not the one doing the training?


Section F: Integrative Application

Exercise 17.27: LLM Deployment Proposal
Write a two-page business proposal for deploying an LLM in an organization you know well (current employer, former employer, or a company you have studied). Your proposal should include:

  • The specific use case and expected business value
  • The recommended LLM provider and deployment model (API vs. on-premise)
  • A cost estimate (use the pricing data from the chapter)
  • A risk assessment covering hallucination, data privacy, and compliance
  • A governance plan addressing the four requirements Ravi establishes
  • Success metrics and a plan for measuring them
  • A timeline for pilot, evaluation, and production deployment

Exercise 17.28: Competitive Analysis
Your CEO has asked you to prepare a one-page competitive briefing on the LLM provider landscape. Using information from the chapter and your own research:

  • (a) Create a positioning map plotting the five major providers on two axes of your choosing (e.g., capability vs. openness, enterprise readiness vs. cost).
  • (b) Identify the most significant competitive development in the LLM market in the past six months.
  • (c) Make a prediction about how the market structure will change in the next two years. Support your prediction with evidence.

Exercise 17.29: From Demo to Production
The chapter emphasizes the gap between an impressive LLM demonstration and a reliable production deployment. Using Athena's customer service chatbot as a case study:

  • (a) List every step required to move from "impressive demo" to "production-ready deployment."
  • (b) Estimate the time and resources required for each step.
  • (c) Identify the three most common failure points in this journey and propose mitigation strategies for each.

Exercise 17.30: Ethics Case Analysis
A multinational consulting firm deploys an LLM to generate first drafts of client deliverables. The LLM produces a strategy document for a healthcare client that includes a fabricated research citation — a study that does not exist but sounds highly credible. A junior consultant does not catch the fabrication, and the document is delivered to the client. The client cites the fabricated study in a regulatory filing.

  • (a) Who is responsible? Analyze the ethical and legal responsibility of each party: the consulting firm, the junior consultant, the LLM provider, and the client.
  • (b) What systems should the consulting firm have had in place to prevent this outcome?
  • (c) How does this scenario change your view (if at all) of the "human-in-the-loop" requirement for LLM deployments?


Answers to selected exercises are available in Appendix B.