Chapter 21 Exercises: AI-Powered Workflows


Section A: Recall and Comprehension

Exercise 21.1 Define the following terms in your own words, using no more than two sentences each: (a) Retrieval-Augmented Generation (RAG), (b) hallucination, (c) embedding, (d) vector database, (e) chunking, (f) AI agent, (g) function calling, (h) faithfulness.

Exercise 21.2 Describe the two phases of a RAG pipeline (indexing and querying). For each phase, list the steps in order and explain the purpose of each step.

Exercise 21.3 Explain why cosine similarity is preferred over Euclidean distance for comparing text embeddings in most RAG applications. What property of cosine similarity makes it more appropriate for semantic comparison?
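Before answering, it may help to compute both measures by hand. The sketch below is a minimal illustration (plain Python, no embedding model): a vector and a scaled copy of it point in the same direction, so their cosine similarity is 1.0 even though their Euclidean distance is large.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: sensitive to direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Straight-line distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# w is v scaled by 2: same direction, different magnitude.
v = [1.0, 2.0, 3.0]
w = [2.0, 4.0, 6.0]
cos_vw = cosine_similarity(v, w)    # 1.0 -- identical direction
dist_vw = euclidean_distance(v, w)  # sqrt(14), about 3.74 -- far apart by distance
```

Consider how this distinction matters when two texts express the same idea at different lengths or intensities.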

Exercise 21.4 Compare and contrast RAG and fine-tuning along the following dimensions: (a) knowledge update speed, (b) cost, (c) traceability, (d) hallucination control. Under what circumstances would you recommend fine-tuning over RAG?

Exercise 21.5 List three common chunking strategies discussed in the chapter. For each, identify one strength and one weakness. Which strategy did Tom find most effective for Athena's policy documents, and why?

Exercise 21.6 Explain the difference between dense retrieval (semantic search), sparse retrieval (keyword search), and hybrid search. Provide a specific business example where each approach would be most appropriate.

Exercise 21.7 What is the ReAct (Reasoning + Acting) pattern for AI agents? Describe the five steps of the agent loop and explain how it differs from a standard chatbot interaction.
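As a thought aid for this exercise, here is a minimal sketch of a ReAct-style loop. The `llm` callable and the dict shape it returns (`thought`, `action`, `args`) are assumptions made for illustration, not an interface from the chapter; map the loop's stages onto the five steps as the chapter defines them.

```python
def react_agent(task, llm, tools, max_steps=5):
    """Minimal ReAct-style loop. `llm` is any callable that maps the running
    transcript to a dict with 'thought', 'action', and 'args' (assumed shape);
    `tools` maps tool names to callables."""
    transcript = [f"Task: {task}"]
    for _ in range(max_steps):
        step = llm(transcript)                        # Reason: decide the next move
        transcript.append(f"Thought: {step['thought']}")
        if step["action"] == "answer":                # Terminate: respond to the user
            return step["args"]["text"]
        result = tools[step["action"]](**step["args"])  # Act: invoke the chosen tool
        transcript.append(f"Observation: {result}")     # Observe: feed the result back
    return "Stopped after max_steps without an answer."
```

Note the contrast with a standard chatbot: the model's output is not shown to the user directly; it drives tool calls whose results re-enter the context before the next reasoning step.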


Section B: Application

Exercise 21.8: Chunking Strategy Analysis Consider the following abridged product warranty document for a fictional electronics retailer:

"Section 1: Coverage Period. All electronics purchased from ElectraMax carry a 24-month manufacturer warranty from the date of purchase. Extended warranties of 36 or 48 months are available at the time of purchase.
Section 2: What's Covered. The warranty covers defects in materials and workmanship under normal use. This includes hardware failures, manufacturing defects, and component malfunctions.
Section 3: What's Not Covered. The warranty does not cover: damage from accidents, misuse, or unauthorized modifications; normal wear and tear; cosmetic damage; software issues; damage from power surges; or items with removed or altered serial numbers.
Section 4: Claim Process. To file a warranty claim, customers must contact ElectraMax support with their order number and a description of the issue. A support representative will troubleshoot the issue remotely. If the issue cannot be resolved remotely, the customer will receive a prepaid shipping label to return the item. Repairs are completed within 10-15 business days.
Section 5: Replacement Policy. If an item cannot be repaired, ElectraMax will replace it with an identical or comparable product. If the original product is discontinued, a store credit for the original purchase price will be issued."

(a) Chunk this document using fixed-size chunking (200 characters per chunk, no overlap). List the resulting chunks and identify any chunks that would be difficult to understand in isolation.

(b) Now chunk the document using document-aware chunking, splitting on section headers. List the resulting chunks. Compare the quality to your fixed-size chunks.

(c) For each chunking approach, describe what would happen if a customer asked: "My laptop stopped working after 18 months. What do I do?" Which approach is more likely to retrieve a useful answer, and why?
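You may chunk the document by hand or with code. The following starter sketch implements both strategies in plain Python; the function names and the `"Section "` marker are choices made here for this exercise, and you are free to adapt them.

```python
def fixed_size_chunks(text, size=200, overlap=0):
    """Split text into fixed-size character chunks; the last chunk may be shorter."""
    step = size - overlap
    return [text[i:i + size] for i in range(0, len(text), step)]

def section_chunks(text, marker="Section "):
    """Document-aware split on 'Section N:' headers, keeping each header
    attached to its body so every chunk is self-describing."""
    parts = text.split(marker)
    return [marker + p.strip() for p in parts if p.strip()]
```

Running both on the warranty text should make the difference concrete: fixed-size chunks can cut mid-sentence across section boundaries, while section chunks keep each policy topic intact.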

Exercise 21.9: RAG System Design You are the VP of Operations at a 500-person law firm. Attorneys currently spend an average of 45 minutes searching for relevant precedents and internal memos per case. You want to build a RAG system over the firm's document library (200,000 documents including case briefs, legal memos, contracts, and regulatory filings).

(a) Design the RAG pipeline architecture. Specify: document sources, chunking strategy, embedding model (and justify your choice), vector database (and justify your choice), retrieval strategy, and LLM for generation.

(b) Identify three categories of metadata you would attach to each chunk and explain how each would improve retrieval quality.

(c) What specific evaluation metrics would you track? Define at least four metrics and explain why each matters for a legal knowledge base.

(d) Identify three risks specific to deploying RAG in a legal context that do not apply to a retail customer service context like Athena's. How would you mitigate each risk?

Exercise 21.10: Agent Tool Design Design a set of tools (function definitions) for an AI agent that serves as a "Travel Concierge" for a corporate travel management company. The agent should be able to:

  • Search for available flights between two cities on a given date
  • Check the company's travel policy for approval requirements
  • Look up an employee's travel budget and remaining balance
  • Book a flight (with manager approval for trips over $1,000)
  • Send booking confirmation to the traveler and their manager

(a) Write the tool definitions (name, description, parameters) for each of the five tools above. Follow the design principles from the chapter.

(b) Describe the sequence of tool calls the agent would make for the following request: "Book the cheapest direct flight from Chicago to San Francisco next Tuesday for Sarah Chen from the Marketing department."

(c) At which step(s) should the agent require human confirmation before proceeding? Why?
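For part (a), one common format for a tool definition is an OpenAI-style function-calling schema. The example below sketches the first tool only, as a template; the name, parameter fields, and descriptions are illustrative choices, not a prescribed answer.

```python
# One possible schema for the flight-search tool, in OpenAI-style
# function-calling format. All names and fields here are illustrative.
search_flights_tool = {
    "name": "search_flights",
    "description": (
        "Search available flights between two cities on a given date. "
        "Returns a list of flights with airline, departure/arrival times, "
        "number of stops, and price."
    ),
    "parameters": {
        "type": "object",
        "properties": {
            "origin": {"type": "string", "description": "Departure city or airport code"},
            "destination": {"type": "string", "description": "Arrival city or airport code"},
            "date": {"type": "string", "description": "Travel date, YYYY-MM-DD"},
            "direct_only": {"type": "boolean", "description": "Restrict results to nonstop flights"},
        },
        "required": ["origin", "destination", "date"],
    },
}
```

Notice the design choices the template forces: a description precise enough for the model to know when to call the tool, and required versus optional parameters made explicit.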

Exercise 21.11: Athena's Knowledge Base Governance Ravi discovered that Athena's RAG system retrieved an outdated price-match policy that included Best Buy as a qualifying competitor.

(a) Design a knowledge base governance process for Athena with at least five components. For each component, specify: what it does, who is responsible, and how often it occurs.

(b) Propose a technical solution to the stale document problem that does not rely solely on human review. How would you automate the detection of potentially outdated documents?

(c) A customer service agent notices that the RAG system gave contradictory answers about the same policy on two different days. What are three possible causes of this inconsistency, and how would you investigate each?

Exercise 21.12: Embedding Exploration Using the concept of embeddings as vectors in semantic space, predict the relative cosine similarity between the following pairs of sentences. Rank the pairs from most similar (highest cosine similarity) to least similar and justify your ranking:

  • Pair A: "The quarterly revenue exceeded projections" / "Sales numbers were above forecast"
  • Pair B: "The quarterly revenue exceeded projections" / "The fiscal quarter ended March 31"
  • Pair C: "The quarterly revenue exceeded projections" / "Our team won the basketball championship"
  • Pair D: "Can I return this sweater?" / "What is your refund policy for clothing?"
  • Pair E: "Can I return this sweater?" / "The sweater is made of merino wool"

Section C: Analysis and Evaluation

Exercise 21.13: RAG vs. Vanilla LLM — Quantitative Analysis Athena's RAG system achieved 90 percent accuracy on policy questions, up from 72 percent with a vanilla LLM.

(a) In a day with 1,200 queries, how many incorrect responses does the vanilla LLM produce? How many does the RAG system produce?

(b) If each incorrect response that reaches a customer costs Athena an estimated $35 in goodwill credits, customer service follow-up, and potential escalation, calculate the daily cost of incorrect responses for both systems.

(c) The RAG system costs $0.50 per day to operate (embedding + LLM generation). Calculate the daily net savings of the RAG system over the vanilla LLM.

(d) Ravi wants to reach 95 percent accuracy. Identify three specific pipeline improvements that could help, and for each, estimate whether the improvement would affect retrieval quality, generation quality, or both.
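For parts (a)-(c), the arithmetic can be mechanized in a few lines; a sketch like the one below is useful for checking your work (solutions appear in Appendix B).

```python
QUERIES_PER_DAY = 1200
COST_PER_ERROR = 35.00   # goodwill credits, follow-up, escalation
RAG_DAILY_COST = 0.50    # embedding + LLM generation

def daily_errors(accuracy):
    """Incorrect responses per day at a given accuracy rate."""
    return round(QUERIES_PER_DAY * (1 - accuracy))

vanilla_errors = daily_errors(0.72)   # 336 incorrect responses
rag_errors = daily_errors(0.90)       # 120 incorrect responses

vanilla_cost = vanilla_errors * COST_PER_ERROR            # $11,760.00 per day
rag_cost = rag_errors * COST_PER_ERROR + RAG_DAILY_COST   # $4,200.50 per day
net_savings = vanilla_cost - rag_cost                     # $7,559.50 per day
```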

Exercise 21.14: Framework Selection — LangChain vs. LlamaIndex vs. Custom A Series B startup (60 employees, 10 engineers) wants to build a RAG-powered product that helps SaaS companies answer their customers' questions using their product documentation.

(a) Evaluate LangChain, LlamaIndex, and a custom-built pipeline for this use case. For each option, identify two advantages and two disadvantages specific to this startup's situation.

(b) The startup's CTO argues: "We should build custom from day one because we'll need to replace the framework eventually anyway." The VP of Engineering argues: "We should use LangChain because we need to ship in 8 weeks." Who is right, and under what conditions might the other be right?

(c) What would change about your recommendation if the startup had 500 customers (each with 100-5,000 documentation pages) and needed multi-tenant architecture?

Exercise 21.15: The Governance Imperative Professor Okonkwo says: "RAG does not solve the problem of data quality. It concentrates it."

(a) Explain what she means by "concentrates." How does RAG change the relationship between data quality and output quality compared to a vanilla LLM?

(b) In a vanilla LLM deployment, who is implicitly responsible for the accuracy of the model's responses? In a RAG deployment, who is responsible? How does this shift in responsibility affect organizational accountability?

(c) NK says: "The engineering is easy. The governance is hard." Evaluate this claim. Is the engineering of a production RAG system truly "easy"? What does NK mean, and is she right?

Exercise 21.16: Production Cost Modeling You are building a RAG system for a healthcare company's internal knowledge base. The system will index 50,000 clinical guidelines (average 3,000 words each), serve 500 clinician queries per day, and use GPT-4 for generation.

(a) Estimate the one-time indexing cost. Assume 1 token per 0.75 words for embedding, and OpenAI's text-embedding-3-small at $0.02 per million tokens. Show your work.

(b) Estimate the daily operating cost. Assume each query retrieves 5 chunks of 500 tokens each, the system prompt is 200 tokens, the user query averages 50 tokens, and the response averages 300 tokens. Use GPT-4 pricing of $30 per million input tokens and $60 per million output tokens.

(c) The CFO asks: "Could we use GPT-4o-mini instead of GPT-4 to save money?" GPT-4o-mini costs $0.15 per million input tokens and $0.60 per million output tokens. Calculate the cost reduction. Then discuss two risks of using a smaller model for clinical queries.

(d) Propose a tiered approach that uses GPT-4o-mini for some queries and GPT-4 for others. Define the criteria for routing queries to each model.
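The token arithmetic in parts (a)-(c) can be captured in two small helper functions, shown below with the prices and token counts stated in the exercise. Treat this as a calculator to check your own work, not as the full answer; the qualitative discussion in (c) and the routing design in (d) still need to be argued in prose.

```python
def embedding_cost(num_docs, words_per_doc, price_per_m_tokens, tokens_per_word=1 / 0.75):
    """One-time indexing cost: total tokens embedded times price per million."""
    tokens = num_docs * words_per_doc * tokens_per_word
    return tokens / 1_000_000 * price_per_m_tokens

def daily_generation_cost(queries, input_tokens_per_query, output_tokens_per_query,
                          input_price_per_m, output_price_per_m):
    """Daily LLM cost: input and output token streams priced separately."""
    input_cost = queries * input_tokens_per_query / 1_000_000 * input_price_per_m
    output_cost = queries * output_tokens_per_query / 1_000_000 * output_price_per_m
    return input_cost + output_cost

# Part (a): 50,000 guidelines x 3,000 words, embedded at $0.02 per 1M tokens.
index_cost = embedding_cost(50_000, 3_000, 0.02)

# Part (b): 5 chunks x 500 tokens + 200 system + 50 query = 2,750 input tokens/query.
gpt4_daily = daily_generation_cost(500, 2_750, 300, 30.00, 60.00)
mini_daily = daily_generation_cost(500, 2_750, 300, 0.15, 0.60)
```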


Section D: Synthesis and Design

Exercise 21.17: End-to-End RAG Project Plan You have been asked to build a RAG system for a university's student services department. The system should answer student questions about financial aid, course registration, academic policies, housing, and dining. The knowledge base consists of approximately 2,000 documents across five departmental websites, three PDF handbooks, and a FAQ database.

Write a project plan that includes:

(a) Architecture diagram: Sketch the complete system architecture from document sources to user interface.

(b) Chunking and metadata strategy: Specify your chunking approach, chunk size, overlap, and metadata schema. Justify your choices.

(c) Evaluation plan: Define at least five metrics, a test dataset of 50 query-answer pairs (describe how you would create it), and a process for ongoing quality monitoring.

(d) Governance plan: Define document ownership, update frequency, quality auditing, and escalation procedures.

(e) Risk register: Identify five risks (technical and organizational) and mitigation strategies for each.

(f) Timeline and milestones: Propose a realistic timeline from kickoff to production launch. Assume a team of two engineers and one product manager.

Exercise 21.18: Competitive Analysis — RAG Products Research and compare three commercially available RAG solutions (e.g., Amazon Bedrock Knowledge Bases, Azure AI Search + OpenAI, Google Vertex AI Search). For each:

(a) Describe the architecture and key components.

(b) Identify pricing model, scalability limits, and vendor lock-in risks.

(c) Evaluate suitability for a mid-size company (1,000 employees, 10,000 knowledge base documents, 500 queries per day).

(d) Make a recommendation and justify it. Consider the build-vs-buy framework from Chapter 1.


Solutions to selected exercises (21.1, 21.3, 21.5, 21.8a, 21.13a-c) are provided in Appendix B.