Chapter 39 Exercises: Building AI-Powered Applications

These exercises are organized into five tiers based on Bloom's taxonomy, progressing from basic recall to creative challenges.


Tier 1: Recall (Exercises 1-6)

These exercises test your ability to remember key concepts from the chapter.

Exercise 1: The AI Feature Spectrum

List the five levels of the AI feature spectrum described in Section 39.1, and provide one example application for each level that was NOT used in the chapter.

Exercise 2: RAG Phases

Name and describe the three phases of a Retrieval-Augmented Generation system. For each phase, identify the key technology or component involved.

Exercise 3: Conversation Management Strategies

List the three strategies for managing conversation history in a chatbot. For each strategy, identify one advantage and one disadvantage.

Exercise 4: Error Categories

Reproduce the table of common AI API error types from Section 39.5. For each error type, state whether it is retryable or non-retryable.

Exercise 5: Cost Optimization Techniques

List at least five cost optimization techniques for AI features discussed in Section 39.8. Rank them in order of typical impact from highest to lowest.

Exercise 6: Fallback Hierarchy

Describe the five-level fallback hierarchy for AI features presented in Section 39.10. Explain why the order matters.


Tier 2: Understand (Exercises 7-12)

These exercises test your ability to explain concepts in your own words.

Exercise 7: RAG vs. Fine-Tuning

Explain in your own words why RAG is often preferred over fine-tuning for adding domain-specific knowledge to an AI application. Under what circumstances might fine-tuning be a better choice?

Exercise 8: Streaming UX

Explain why streaming AI responses improves perceived performance even though the total time to generate the full response is the same. Describe at least two UI patterns that complement streaming to create a better user experience.

Exercise 9: Prompt Injection Analogy

The chapter compares prompt injection to SQL injection. Explain this analogy in detail. What is the shared vulnerability pattern? How are the mitigation strategies similar and different?

Exercise 10: Token Economics

A startup processes 50,000 customer support interactions per month. Each interaction averages 2,000 input tokens and 800 output tokens. Using the pricing table in Section 39.8, calculate the monthly cost for each of the following models: Claude 3.5 Haiku, Claude 3.5 Sonnet, and GPT-4o. Which model would you recommend and why?
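The arithmetic can be set up once and reused for each model. A minimal sketch (the prices below are placeholders; substitute the actual per-million-token rates from the pricing table in Section 39.8):

```python
def monthly_cost(interactions: int, input_tokens: int, output_tokens: int,
                 input_price_per_m: float, output_price_per_m: float) -> float:
    """Monthly cost in dollars, given per-million-token prices."""
    total_input = interactions * input_tokens      # tokens sent per month
    total_output = interactions * output_tokens    # tokens generated per month
    return (
        (total_input / 1_000_000) * input_price_per_m
        + (total_output / 1_000_000) * output_price_per_m
    )

# With hypothetical prices of $1 / $5 per million input/output tokens:
print(f"${monthly_cost(50_000, 2_000, 800, 1.0, 5.0):,.2f}")
```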

Exercise 11: Semantic Caching Trade-offs

Explain how semantic caching differs from exact-match caching. What additional infrastructure does semantic caching require? When would exact-match caching be sufficient, and when would semantic caching provide meaningful benefit?

Exercise 12: Quality Gate Philosophy

The chapter states "Never ship AI-generated content directly to users without at least one quality gate." Explain why this is important. What could go wrong without quality gates? Describe a scenario where a missing quality gate causes a real business problem.


Tier 3: Apply (Exercises 13-18)

These exercises ask you to apply the concepts to practical coding tasks.

Exercise 13: Build a Conversation Manager

Implement a ConversationManager class that supports the summarization strategy for history management. When the conversation exceeds a token threshold, the class should:

1. Summarize the older messages into a condensed summary using an AI call.
2. Replace the older messages with the summary.
3. Keep the most recent N messages intact.

Write the class with full type hints, docstrings, and a test script that demonstrates the summarization behavior.
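One possible skeleton to start from. The _summarize method is a stub standing in for the AI call the exercise describes, and the four-characters-per-token heuristic is an assumption, not the chapter's tokenizer:

```python
from dataclasses import dataclass

@dataclass
class Message:
    role: str
    content: str

class ConversationManager:
    def __init__(self, token_threshold: int = 1000, keep_recent: int = 4):
        self.token_threshold = token_threshold
        self.keep_recent = keep_recent
        self.messages: list[Message] = []

    def _token_count(self) -> int:
        # Rough heuristic (~4 characters per token); a real implementation
        # would use the provider's tokenizer.
        return sum(len(m.content) for m in self.messages) // 4

    def add(self, role: str, content: str) -> None:
        self.messages.append(Message(role, content))
        if self._token_count() > self.token_threshold:
            self._compact()

    def _compact(self) -> None:
        if len(self.messages) <= self.keep_recent:
            return
        older = self.messages[:-self.keep_recent]
        recent = self.messages[-self.keep_recent:]
        summary = self._summarize(older)  # an AI call in the full exercise
        self.messages = [Message("system", summary)] + recent

    def _summarize(self, older: list[Message]) -> str:
        # Stub: replace with an AI call that condenses the older messages.
        return f"[summary of {len(older)} earlier messages]"
```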

Exercise 14: Implement Semantic Caching

Build a SemanticCache class that:

1. Stores prompt-response pairs with their embeddings.
2. On a cache lookup, computes the embedding of the new prompt and finds the most similar cached prompt.
3. Returns the cached response if the similarity score exceeds a configurable threshold.
4. Tracks cache hit rates and average similarity scores.

Use a simple in-memory store and cosine similarity for the similarity calculation. Include a function that computes cosine similarity from scratch (without external libraries beyond basic math).
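For the from-scratch similarity function, only the dot product and the two vector norms are needed:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity implemented with basic math only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # define similarity against a zero vector as 0
    return dot / (norm_a * norm_b)
```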

Exercise 15: RAG Document Indexer

Create a DocumentIndexer class that:

1. Accepts documents as strings with metadata (title, source, date).
2. Splits documents into chunks using recursive character splitting.
3. Generates mock embeddings (random vectors for testing purposes).
4. Stores chunks with their embeddings and metadata.
5. Supports similarity search with metadata filtering.

The class should be usable as a standalone component that could later be connected to a real embedding API and vector database.
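Recursive character splitting can be sketched as: try the coarsest separator first, and recurse with finer separators on any piece that is still too large. A minimal version (a production splitter would also merge adjacent small pieces and add overlap between chunks):

```python
def recursive_split(text: str, chunk_size: int,
                    separators: tuple = ("\n\n", "\n", " ")) -> list[str]:
    """Split on the coarsest separator first; recurse on oversized pieces."""
    if len(text) <= chunk_size:
        return [text] if text.strip() else []
    if not separators:
        # No separator left: fall back to a hard character split.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, rest = separators[0], separators[1:]
    chunks = []
    for piece in text.split(sep):
        if len(piece) > chunk_size:
            chunks.extend(recursive_split(piece, chunk_size, rest))
        elif piece.strip():
            chunks.append(piece)
    return chunks
```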

Exercise 16: Model Router Implementation

Implement a ModelRouter class that:

1. Accepts routing rules as a list of conditions and target models.
2. Evaluates rules in priority order for each request.
3. Logs routing decisions for analysis.
4. Supports a default fallback model.
5. Tracks usage statistics per model.

Test it with at least five different request types and verify that each is routed to the expected model.
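One way to represent the routing rules is as (predicate, model) pairs checked in priority order. A minimal sketch; the model names and request fields below are placeholders, and logging is reduced to a usage counter:

```python
class ModelRouter:
    def __init__(self, rules, default_model: str):
        self.rules = rules            # list of (predicate, model_name) pairs
        self.default_model = default_model
        self.usage: dict[str, int] = {}   # model_name -> request count

    def route(self, request: dict) -> str:
        model = self.default_model
        for predicate, target in self.rules:
            if predicate(request):    # first matching rule wins
                model = target
                break
        self.usage[model] = self.usage.get(model, 0) + 1
        return model

rules = [
    (lambda r: r.get("type") == "classification", "small-model"),
    (lambda r: r.get("tokens", 0) > 4000, "large-model"),
]
router = ModelRouter(rules, default_model="mid-model")
```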

Exercise 17: AI Regression Test Suite

Build a regression test framework for AI features that:

1. Defines test cases as structured data (input prompt, expected criteria).
2. Runs all tests against a mock AI client.
3. Reports pass/fail status for each criterion.
4. Generates a summary report with overall pass rate and details of failures.

Create at least 10 test cases for a hypothetical customer support chatbot.

Exercise 18: Prompt Version Manager

Implement a file-based prompt versioning system that:

1. Stores prompts as YAML files with version metadata.
2. Supports creating new versions, listing all versions, and activating a specific version.
3. Maintains an audit log of all version changes.
4. Supports environment-specific active versions (dev, staging, production).

Write a CLI script that demonstrates creating three versions of a prompt and switching between them.
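One plausible on-disk layout for a versioned prompt file; the field names here are illustrative, not prescribed by the chapter:

```yaml
# prompts/support-greeting/v3.yaml -- illustrative layout
name: support-greeting
version: 3
created_at: "2024-11-02"
author: jdoe
changelog: "Softened tone; added refund-policy reminder."
active_in: [staging]          # environments where this version is live
template: |
  You are a friendly support assistant for {company_name}.
  Greet the customer and ask how you can help.
```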


Tier 4: Analyze (Exercises 19-24)

These exercises ask you to break down problems, compare approaches, and make reasoned judgments.

Exercise 19: Architecture Pattern Analysis

A food delivery app wants to add three AI features:

1. A chatbot for customer support (real-time, conversational).
2. Automatic classification of restaurant menu items into dietary categories (batch, high volume).
3. Personalized restaurant recommendations based on order history (medium frequency, personalized).

For each feature, recommend which architecture pattern (direct API, service layer, or async processing) is most appropriate. Justify your choice by analyzing latency requirements, volume, cost, and failure tolerance.

Exercise 20: Chunking Strategy Comparison

Given a technical documentation site with the following content types:

- API reference pages (structured, with code examples)
- Tutorial articles (narrative, with step-by-step instructions)
- FAQ pages (question-answer pairs)
- Changelog entries (short, dated, technical)

Analyze which chunking strategy would work best for each content type. Consider how each strategy preserves the meaning and usefulness of the chunks for a RAG system. Propose a unified chunking approach that handles all four types.

Exercise 21: Cost-Quality Trade-off Analysis

A company's AI feature uses Claude 3.5 Sonnet for all requests and processes 200,000 requests per month. Average tokens: 1,500 input, 600 output. Monthly cost is approximately $4,800.

Analyze the following optimization strategies and estimate the cost savings and quality impact of each:

1. Switch all requests to Claude 3.5 Haiku.
2. Implement exact-match caching (estimated 30% cache hit rate).
3. Implement model routing: 60% of requests to Haiku, 40% to Sonnet.
4. Reduce average input tokens by 25% through prompt compression.

Which combination of strategies would you recommend? Justify your answer with calculated costs.
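The caching and routing savings reduce to simple arithmetic over the $4,800 baseline. A sketch; the Haiku-to-Sonnet price ratio below is a placeholder, so derive the real ratio from the pricing table in Section 39.8:

```python
BASELINE = 4800.0  # current monthly cost from the exercise

def with_caching(monthly_cost: float, hit_rate: float) -> float:
    """Cached requests cost nothing, so cost scales with the miss rate."""
    return monthly_cost * (1 - hit_rate)

def with_routing(monthly_cost: float, cheap_share: float,
                 price_ratio: float) -> float:
    """price_ratio = cheap model price / expensive model price (placeholder)."""
    return monthly_cost * (cheap_share * price_ratio + (1 - cheap_share))

print(with_caching(BASELINE, 0.30))        # exact-match caching at 30% hits
print(with_routing(BASELINE, 0.60, 0.25))  # assumes Haiku at ~25% of Sonnet's price
```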

Exercise 22: Evaluation Framework Design

Design a comprehensive evaluation framework for a RAG-based Q&A system that answers questions about a company's product documentation. Your framework should:

1. Define at least five evaluation dimensions with clear rubrics.
2. Specify automated metrics for each dimension.
3. Include a human evaluation protocol.
4. Describe how to create and maintain a test dataset.
5. Define alerting thresholds for when quality drops below acceptable levels.

Present your framework as a structured document that could be used by a development team.

Exercise 23: Security Threat Model

Perform a security analysis of the following AI-powered application: a customer-facing chatbot that has access to a RAG knowledge base containing product documentation and pricing information. The chatbot can also look up order status using an API tool.

Identify at least five security threats, classify their severity (low/medium/high/critical), and propose mitigations for each. Consider prompt injection, data leakage, unauthorized tool use, and other attack vectors.

Exercise 24: Monitoring Dashboard Design

Design the monitoring dashboard for an AI-powered application that includes a chatbot, a content generation feature, and a document classification system. Specify:

1. The key metrics for each feature.
2. The visualizations (charts, tables, alerts) for each metric.
3. The refresh frequency for each dashboard component.
4. Alert conditions and escalation procedures.
5. How you would investigate a sudden drop in quality scores.

Present your design as a specifications document.


Tier 5: Create (Exercises 25-30)

These exercises challenge you to build complete systems that combine multiple concepts from the chapter.

Exercise 25: Build a Complete RAG System

Build a complete RAG system that:

1. Loads documents from a directory of text files.
2. Chunks and indexes them with embeddings (use a real embedding API or a mock).
3. Accepts user questions via a command-line interface.
4. Retrieves relevant chunks and generates answers.
5. Cites the source document and chunk for each answer.
6. Tracks and displays retrieval quality metrics (relevance scores).

The system should be a single Python script (or small package) that can be run from the command line. Include error handling, logging, and at least basic quality evaluation of the generated answers.

Exercise 26: Production-Ready Chatbot

Build a complete chatbot application with:

1. A FastAPI backend with streaming support.
2. Conversation management with the summarization strategy.
3. A system prompt that defines a specific persona.
4. User memory that persists across sessions (use SQLite).
5. Automatic escalation detection based on sentiment and explicit requests.
6. Request logging and cost tracking.

The chatbot should be fully functional and ready for deployment. Include a simple HTML frontend that demonstrates streaming responses.

Exercise 27: Content Generation Pipeline

Build a content generation pipeline for blog posts that:

1. Takes a topic and target audience as input.
2. Generates an outline with AI.
3. Expands each outline section into a draft.
4. Runs the draft through a quality gate (word count, readability, required sections).
5. If the quality gate fails, provides feedback and regenerates.
6. Outputs the final post with quality scores and metadata.

Implement the pipeline as a Python class with clear step-by-step execution and logging.

Exercise 28: Multi-Provider AI Client

Build a production-ready AI client library that:

1. Supports Anthropic and OpenAI as providers.
2. Implements automatic retries with exponential backoff.
3. Includes response caching with configurable TTL.
4. Tracks cost per request and cumulative cost per feature.
5. Supports automatic fallback from primary to secondary provider.
6. Provides both synchronous and asynchronous interfaces.
7. Logs all requests with structured data for monitoring.

Package the library as a reusable module with a clean API and comprehensive docstrings.
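For the retry requirement, exponential backoff with jitter follows a standard pattern. This sketch retries every exception, whereas a full implementation would re-raise non-retryable errors (such as authentication failures) immediately:

```python
import random
import time

def call_with_retries(fn, max_retries: int = 3, base_delay: float = 0.5):
    """Retry a callable with exponential backoff plus a little jitter."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            # Delay doubles each attempt; jitter avoids synchronized retries.
            delay = base_delay * (2 ** attempt) * (1 + random.random() * 0.1)
            time.sleep(delay)
```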

Exercise 29: AI Feature A/B Testing Platform

Build an A/B testing platform for AI prompts that:

1. Defines experiments with multiple prompt variants and traffic splits.
2. Deterministically assigns users to variants based on user ID.
3. Collects quality metrics (automated scores and simulated user feedback).
4. Computes statistical significance using a simple hypothesis test.
5. Generates experiment reports with recommendations.
6. Supports experiment lifecycle management (create, start, stop, analyze).

Include a demonstration script that runs a simulated experiment with 1,000 requests and produces a report.
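Deterministic assignment is commonly done by hashing the user ID into a bucket in [0, 1]; this is one common approach, not the only one:

```python
import hashlib

def assign_variant(user_id: str, experiment: str,
                   variants: list[str], weights: list[float]) -> str:
    """Deterministically map a user to a variant via a hash bucket."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF   # roughly uniform in [0, 1]
    cumulative = 0.0
    for variant, weight in zip(variants, weights):
        cumulative += weight
        if bucket <= cumulative:
            return variant
    return variants[-1]  # guard against floating-point rounding
```

Including the experiment name in the hash means the same user can land in different variants across different experiments, which avoids correlated assignments.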

Exercise 30: End-to-End AI-Powered Application

Build a complete AI-powered application that combines at least three concepts from this chapter. Choose one of the following scenarios or propose your own:

Option A — AI-Powered Study Assistant: A web application where users upload course materials (PDFs, notes), the system indexes them with RAG, and users can ask questions, generate summaries, and create practice quizzes from their materials.

Option B — Intelligent Support Ticketing: A system that receives support tickets, classifies them by category and priority using AI, generates suggested responses using RAG from a knowledge base, and routes complex tickets to human agents.

Option C — AI Content Studio: A platform for content creators that generates drafts from briefs, checks facts against a source database, adjusts tone and style to match brand guidelines, and provides quality scores and improvement suggestions.

Your application should include:

- At least two AI features from different sections of the chapter
- Error handling and graceful degradation
- Cost tracking
- Basic quality monitoring
- A user-facing interface (CLI, web, or API)


Solutions to coding exercises are available in code/exercise-solutions.py.