Chapter 23 Exercises: Cloud AI Services and APIs

DataField.Dev

Section A: Recall and Comprehension

Exercise 23.1 Define the following terms in your own words, using no more than two sentences each: (a) data gravity, (b) model distillation, (c) API gateway, (d) total cost of ownership (TCO), (e) vendor lock-in.

Exercise 23.2 Describe the four layers of the AI service stack (AI Infrastructure, AI Platform, Pre-trained APIs, AI Applications). For each layer, identify one advantage and one trade-off compared to the layer below it.

Exercise 23.3 List the five strategic questions Professor Okonkwo identifies as the basis for cloud AI vendor selection. For each question, explain why it matters more than comparing individual service features.

Exercise 23.4 Identify the seven components of total cost of ownership for cloud AI as described in the chapter. Which component does the chapter argue is most frequently underestimated, and why?

Exercise 23.5 Explain the difference between API lock-in, data lock-in, and knowledge lock-in. Which is the most difficult to reverse, and why?

Exercise 23.6 What is the "primary cloud with selective multi-cloud" strategy? How does it differ from both single-cloud and full multi-cloud approaches? What conditions make it the most appropriate choice?

Exercise 23.7 Describe the anonymization pipeline pattern used by Athena for sending customer service queries to Azure OpenAI Service. Why is this pattern necessary, and what trade-offs does it introduce?
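The anonymization pipeline pattern named in Exercise 23.7 can be sketched roughly as below. This is a minimal illustration and not Athena's actual implementation: the two regular expressions, the placeholder format, and the function names `anonymize` and `restore` are all assumptions, and a production pipeline would need much broader PII coverage (names, addresses, account numbers).

```python
import re

# Hypothetical patterns for illustration only; real PII detection needs more.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "PHONE": re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b"),
}


def anonymize(text):
    """Replace detected PII with placeholders; return masked text and the mapping."""
    mapping = {}
    for label, pattern in PII_PATTERNS.items():
        def _mask(match, label=label):
            token = f"[{label}_{len(mapping) + 1}]"
            mapping[token] = match.group(0)
            return token
        text = pattern.sub(_mask, text)
    return text, mapping


def restore(text, mapping):
    """Put the original values back into a response that echoes placeholders."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


query = "Email jane.doe@example.com or call 555-123-4567 about order 88."
masked, mapping = anonymize(query)
print(masked)  # only the masked text would be sent to the external LLM API
```

The mapping stays inside the organization's boundary, which is the point of the pattern. The trade-offs the exercise asks about show up here too: the extra step adds latency, and masking can remove context the model would have used.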

Section B: Application

Exercise 23.8: Cloud Provider Comparison. Using the five-question framework from the chapter, evaluate AWS, Azure, and Google Cloud Platform for the following hypothetical organization:

A 500-person healthcare technology company
All current infrastructure is on Azure (including Azure SQL Database and Azure Blob Storage)
The development team consists of 15 engineers, all proficient in the Microsoft stack (.NET, C#, Azure DevOps)
HIPAA compliance is mandatory
The company wants to build: (a) a medical document processing pipeline, (b) an LLM-powered clinical decision support tool, and (c) a patient readmission prediction model
Annual cloud budget is $800,000

Answer each of the five questions for this organization. Based on your analysis, which cloud strategy would you recommend? Justify your recommendation.

Exercise 23.9: TCO Calculation. A mid-sized e-commerce company is planning to deploy an AI-powered product recommendation engine using a cloud ML platform. Estimate the total cost of ownership for the first year, using the following assumptions:

Training: 4 GPU hours per week on an NVIDIA A100 (cloud cost: $3.50/hour)
Inference: 2 GPU instances running 24/7 for serving recommendations (cloud cost: $1.20/hour per instance)
Storage: 2 TB of training data and model artifacts ($0.023/GB/month)
Data transfer: 500 GB/month of data egress ($0.09/GB)
Engineering team: 2 ML engineers (fully loaded cost: $280,000/year each)
API calls to a pre-trained NLP service for processing product descriptions: 2 million calls/month ($1.00/1,000 calls)
Monitoring and management overhead: estimate at 15% of engineering time

Calculate the total first-year TCO. What percentage of the total cost is compute, and what percentage is engineering? What does this ratio suggest about where cost optimization efforts should focus?
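One way to organize the arithmetic for Exercise 23.9 is a small line-item table. The sketch below uses only the figures stated in the exercise. The conventions are assumptions: 52 weeks and 8,760 hours per year, decimal terabytes (1 TB = 1,000 GB), and a flag that controls whether the 15% monitoring overhead is counted as an additional cost or as part of the existing engineering salaries. The exercise leaves that last point open, so state which reading you use.

```python
HOURS_PER_YEAR = 24 * 365   # assumption: instances run all year, non-leap year
WEEKS_PER_YEAR = 52         # assumption
GB_PER_TB = 1_000           # assumption: decimal terabytes


def first_year_tco(overhead_is_additional=True):
    """Return first-year cost per line item, in dollars."""
    engineering = 2 * 280_000
    return {
        "training_compute": 4 * WEEKS_PER_YEAR * 3.50,
        "inference_compute": 2 * HOURS_PER_YEAR * 1.20,
        "storage": 2 * GB_PER_TB * 0.023 * 12,
        "data_transfer": 500 * 0.09 * 12,
        "nlp_api_calls": (2_000_000 / 1_000) * 1.00 * 12,
        "engineering": engineering,
        # Assumption: overhead is either extra cost or already inside salaries.
        "monitoring_overhead": 0.15 * engineering if overhead_is_additional else 0.0,
    }


def share(items, *keys):
    """Fraction of total cost contributed by the named line items."""
    return sum(items[k] for k in keys) / sum(items.values())


items = first_year_tco()
for name, cost in items.items():
    print(f"{name:20s} ${cost:>12,.2f}")
print(f"{'total':20s} ${sum(items.values()):>12,.2f}")
print(f"compute share:     {share(items, 'training_compute', 'inference_compute'):.1%}")
print(f"engineering share: {share(items, 'engineering'):.1%}")
```

Running it with `overhead_is_additional=False` shows how much the answer depends on that single interpretation.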

Exercise 23.10: LLM Cost Projection. A financial services company is deploying an internal AI assistant powered by GPT-4o via Azure OpenAI Service. Using the following usage projections, model the LLM API costs over 12 months:

Month 1: 500 users, average 5 queries/day, average 1,200 input tokens and 400 output tokens per query
User growth: 20% per month (new users adopt the tool as word spreads)
Token growth: 10% increase in average tokens per query every 3 months (as users learn to write more detailed prompts)
Pricing: $2.50/1M input tokens, $10.00/1M output tokens

(a) Create a month-by-month cost projection for the first 12 months. (b) What is the total annual cost? (c) At what month does the monthly cost exceed $10,000? (d) Propose three cost optimization strategies from the chapter that could reduce the projected cost by at least 30%.
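The projection in Exercise 23.10 can be modeled in a few lines. The sketch below is one reading of the stated inputs, not the only defensible one. It assumes 30 usage days per month, user growth compounding monthly from month 2, and the 10% token increase applying to both input and output tokens as a step at months 4, 7, and 10. The function names are illustrative.

```python
INPUT_PRICE = 2.50 / 1_000_000    # dollars per input token
OUTPUT_PRICE = 10.00 / 1_000_000  # dollars per output token
DAYS_PER_MONTH = 30               # assumption: every calendar day is a usage day


def monthly_cost(month):
    """Projected API cost in dollars for a 1-indexed month."""
    users = 500 * 1.20 ** (month - 1)          # fractional users kept for simplicity
    token_scale = 1.10 ** ((month - 1) // 3)   # step increase every three months
    queries = users * 5 * DAYS_PER_MONTH
    cost_per_query = (1_200 * INPUT_PRICE + 400 * OUTPUT_PRICE) * token_scale
    return queries * cost_per_query


def first_month_over(threshold, horizon=36):
    """First month whose cost exceeds threshold, or None within the horizon."""
    for month in range(1, horizon + 1):
        if monthly_cost(month) > threshold:
            return month
    return None


for month in range(1, 13):
    print(f"month {month:2d}: ${monthly_cost(month):>10,.2f}")
print(f"annual total: ${sum(monthly_cost(m) for m in range(1, 13)):,.2f}")
print("first month over $10,000:", first_month_over(10_000))
```

Under these assumptions month 1 costs about $525 and month 12 about $5,192, so the $10,000 threshold in part (c) falls outside the first 12 months, which is why the search horizon extends past month 12. A different reading, such as working days only or a different growth schedule, shifts all of these figures, so state your assumptions alongside your answer.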

Exercise 23.11: Vendor Lock-In Assessment. Select a cloud AI service you have used or studied (e.g., Amazon SageMaker, Azure OpenAI Service, Google BigQuery ML). Assess the degree of vendor lock-in it creates across the following dimensions:

(a) API lock-in: How provider-specific is the API? Are there open-source alternatives or abstraction layers?
(b) Data lock-in: What data formats and storage dependencies does the service create? What would it cost to export?
(c) Knowledge lock-in: How provider-specific is the expertise needed to use the service? Is it transferable?
(d) Switching cost estimate: If you needed to migrate to an equivalent service on a different cloud, what would the migration involve? Estimate the time and cost.

Exercise 23.12: Multi-Cloud Architecture Design. An international retail company has the following requirements:

Customer data must remain in the EU (GDPR compliance)
The company needs LLM capabilities for customer service chatbots in 12 languages
The supply chain team needs custom ML models for demand forecasting
The marketing team wants to use a pre-trained recommendation engine
The company's data warehouse is on Google BigQuery
The development team has experience with AWS but limited GCP experience

Design a cloud AI architecture for this company. Specify:

(a) Which cloud provider(s) to use for each workload and why
(b) How data will flow between services (include data transfer cost considerations)
(c) How PII will be protected when using external AI APIs
(d) What abstraction layers or gateway patterns you would implement
(e) The top three risks of your proposed architecture and mitigation strategies for each
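For the abstraction layer or gateway pattern in Exercise 23.12, a minimal sketch of the idea follows. It is not tied to any real SDK: `AIGateway`, `primary_llm`, and `fallback_llm` are hypothetical names, and the provider adapters are stubs standing in for calls to an actual cloud service. Application code depends only on the gateway, so swapping or adding a provider does not touch the callers.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# A provider adapter is any callable that maps a prompt to a completion.
Provider = Callable[[str], str]


@dataclass
class AIGateway:
    """Routes each workload to an ordered list of providers, with fallback."""
    routes: Dict[str, List[Provider]] = field(default_factory=dict)

    def register(self, workload: str, *providers: Provider) -> None:
        self.routes[workload] = list(providers)

    def complete(self, workload: str, prompt: str) -> str:
        errors = []
        for provider in self.routes.get(workload, []):
            try:
                return provider(prompt)
            except Exception as exc:  # try the next provider on any failure
                errors.append(exc)
        raise RuntimeError(f"no provider succeeded for {workload!r}: {errors}")


# Stub adapters (hypothetical); real ones would wrap a vendor SDK call.
def primary_llm(prompt: str) -> str:
    raise TimeoutError("primary provider unavailable")


def fallback_llm(prompt: str) -> str:
    return f"[fallback] {prompt}"


gateway = AIGateway()
gateway.register("chatbot", primary_llm, fallback_llm)
print(gateway.complete("chatbot", "Where is my order?"))
```

A gateway like this is also a natural place to put PII masking, cost logging, and per-region routing, which connects part (d) to parts (b) and (c).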

Section C: Analysis and Critical Thinking

Exercise 23.13: The Build vs. Buy vs. Rent Decision. The chapter describes a spectrum from raw infrastructure (IaaS) to fully managed AI applications (SaaS). For each of the following AI use cases, argue whether the organization should build (custom model on infrastructure), buy (managed ML platform), or rent (pre-trained API):

(a) A bank detecting fraudulent credit card transactions (10 million transactions per day)
(b) A law firm extracting key terms from legal contracts (500 contracts per month)
(c) A social media company moderating user-generated images (50 million images per day)
(d) A 50-person startup building an AI-powered writing assistant as its core product
(e) A hospital predicting patient readmission risk using its proprietary patient data

For each case, justify your recommendation by considering: competitive differentiation, data sensitivity, volume, required accuracy, team expertise, and budget.

Exercise 23.14: Evaluating the OpenAI-Microsoft Partnership. The chapter describes Microsoft's approximately $13 billion investment in OpenAI and the exclusive cloud hosting arrangement for OpenAI's models through Azure.

(a) From Microsoft's perspective, what are the strategic benefits and risks of this partnership?
(b) From a customer's perspective (say, an enterprise considering Azure OpenAI Service), what are the benefits and risks of building on a capability that depends on a third-party partnership?
(c) How should an enterprise mitigate the concentration risk of relying on a single LLM provider?
(d) If the OpenAI-Microsoft partnership were to change significantly (e.g., if OpenAI began hosting on multiple clouds), how would this affect Azure's competitive positioning?

Exercise 23.15: Cost Optimization Sprint. You are the VP of Data & AI at a retail company. Your monthly cloud AI spend has grown from $35,000 to $120,000 over the past year, driven primarily by:

- LLM API costs ($45,000/month, up 300% from one year ago)
- ML training compute ($30,000/month, steady)
- ML inference compute ($25,000/month, up 50%)
- Data storage ($12,000/month, growing 10% monthly)
- Data transfer ($8,000/month, growing 15% monthly)

The CFO has asked you to reduce monthly spend to $90,000 without degrading capability. Design a cost optimization plan:

(a) Which of the seven cost categories offers the largest opportunity?
(b) For each major cost driver, identify specific optimization techniques from the chapter.
(c) Estimate the savings from each optimization (be specific about assumptions).
(d) Are there any optimizations that require upfront investment before generating savings? If so, calculate the payback period.
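Two small calculations support Exercise 23.15: the gap between current and target spend, and the payback period asked for in part (d). The sketch below uses the spend figures from the exercise. The helper names are illustrative, the compounding model for storage and transfer is an assumption, and the upfront cost and savings in the payback example are hypothetical placeholders and not values from the chapter.

```python
current_spend = {
    "llm_api": 45_000,
    "ml_training": 30_000,
    "ml_inference": 25_000,
    "data_storage": 12_000,
    "data_transfer": 8_000,
}
TARGET = 90_000


def project(monthly_spend, monthly_growth, months):
    """Spend after compounding at a constant monthly growth rate (assumption)."""
    return monthly_spend * (1 + monthly_growth) ** months


def payback_months(upfront_cost, monthly_savings):
    """Months until cumulative savings cover the upfront investment."""
    if monthly_savings <= 0:
        raise ValueError("monthly_savings must be positive")
    return upfront_cost / monthly_savings


gap = sum(current_spend.values()) - TARGET
print(f"required monthly reduction: ${gap:,}")

# Categories with a stated monthly growth rate keep growing if left alone.
print(f"storage in 12 months:  ${project(12_000, 0.10, 12):,.0f}")
print(f"transfer in 12 months: ${project(8_000, 0.15, 12):,.0f}")

# Hypothetical example: a $60,000 engineering effort that saves $15,000/month.
print(f"payback: {payback_months(60_000, 15_000):.1f} months")
```

The projection lines are a reminder that a one-time cut to $90,000 does not hold if storage and transfer continue to compound, which is worth addressing in the plan.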

Exercise 23.16: Security Architecture Review. You are reviewing the cloud AI security posture of a healthcare technology company that has the following architecture:

Patient data is stored in Amazon S3 (encrypted at rest with AWS-managed keys)
ML models are trained on SageMaker using patient data
The company uses Azure OpenAI Service for a physician-facing chatbot that answers questions about treatment guidelines
Physicians sometimes paste patient notes into the chatbot to ask about differential diagnoses
The chatbot does not have an anonymization pipeline — patient data is sent directly to Azure OpenAI Service

Identify at least five security and compliance issues with this architecture. For each issue, recommend a specific remediation and estimate the effort required to implement it.

Exercise 23.17: Cloud AI Strategy Presentation. You are preparing a 10-minute presentation for your company's board of directors on cloud AI strategy. The board members are not technical but are concerned about (a) costs growing too quickly, (b) dependence on a single vendor, and (c) data security.

Outline your presentation. For each of the three concerns:

(a) Frame the issue in business terms (not technical jargon)
(b) Present the current state honestly
(c) Describe your strategy for managing the risk
(d) Propose one specific metric the board should track quarterly to monitor the risk

Section D: Athena Case Extension

Exercise 23.18: Athena's Next Cloud Decision. Athena's shelf analytics project (using Google Vision AI) has been successful, and the product team wants to expand computer vision to two new use cases: (a) automated checkout verification (comparing items scanned at self-checkout with what the camera sees in the cart) and (b) customer traffic flow analysis in stores.

These new use cases have different requirements from shelf analytics:

- Checkout verification requires real-time inference (< 200ms latency)
- Traffic flow analysis processes video streams, not individual images
- Both use cases involve customer-facing cameras that may capture faces

Should Athena continue using Google Vision AI for these new use cases, or should it consider alternatives? Analyze the decision using the five-question framework, considering latency requirements, data privacy implications, costs at scale, and Athena's existing architecture.

Exercise 23.19: Athena's LLM Cost Crisis. Six months after the events of this chapter, Athena's LLM API costs have reached $18,500/month, 23% over Ravi's projected budget. The growth is driven by three factors:

- The customer service assistant is handling 40% more interactions than projected
- The marketing team has begun using GPT-4o to generate product descriptions, adding $4,200/month in unexpected costs
- Average prompt length has increased 35% as users have learned to provide more context

Ravi needs to reduce monthly LLM costs to $14,000 without degrading the customer service assistant's quality.

(a) Design a cost reduction plan using at least four techniques from the chapter.
(b) For each technique, estimate the percentage reduction in LLM costs.
(c) Which techniques can be implemented immediately, and which require engineering investment?
(d) How should Ravi communicate the cost constraints to the marketing team without discouraging their AI adoption?
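When estimating percentage reductions for part (b) of Exercise 23.19, note that successive reductions multiply and do not add. The sketch below computes the reduction Ravi needs and the combined effect of several techniques. It assumes every technique applies to the whole LLM bill, and the 10% figures in the example are placeholders and not estimates from the chapter.

```python
def required_reduction(current, target):
    """Fraction by which spend must fall to reach the target."""
    return (current - target) / current


def combined_reduction(reductions):
    """Overall reduction when each technique cuts what the previous one left."""
    remaining = 1.0
    for r in reductions:
        remaining *= (1 - r)
    return 1 - remaining


needed = required_reduction(18_500, 14_000)
print(f"needed:   {needed:.1%}")

# Placeholder percentages for illustration only.
plan = [0.10, 0.10, 0.10]
print(f"combined: {combined_reduction(plan):.1%}")
```

A technique that affects only one workload, such as the marketing team's product descriptions, should be weighted by that workload's share of the bill before it is combined with the others.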

Exercise 23.20: Ravi's Enterprise Agreement Negotiation. Ravi is negotiating an enterprise agreement with AWS for Athena's cloud AI services. The current annual spend is $1.2 million, projected to grow to $1.8 million next year. AWS has offered a 3-year commitment at a 25% discount in exchange for a minimum annual spend of $1.5 million.

(a) What is the total savings over three years compared to on-demand pricing at projected spend levels?
(b) What are the risks of this commitment?
(c) What additional terms should Ravi negotiate beyond the discount?
(d) Should Ravi also negotiate enterprise agreements with Azure and Google for the secondary and tertiary workloads? What factors should inform this decision?
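The savings in part (a) of Exercise 23.20 depend on two things the exercise leaves open: the spend in years two and three, and whether the $1.5 million minimum is measured against list-price usage or against the discounted bill. The sketch below makes both explicit. Holding usage flat at $1.8 million for all three years is an assumption, and the function names and the `commit_on_list_price` flag are illustrative.

```python
def committed_cost(list_price_usage, discount=0.25, min_commit=1_500_000,
                   commit_on_list_price=True):
    """Annual bill under the commitment for a given on-demand-equivalent usage."""
    if commit_on_list_price:
        # Minimum applies to usage at list price; shortfall is billed, then discounted.
        return max(list_price_usage, min_commit) * (1 - discount)
    # Minimum applies to the discounted bill itself.
    return max(list_price_usage * (1 - discount), min_commit)


def total_savings(usage_by_year, **terms):
    """On-demand cost minus committed cost, summed over the contract years."""
    return sum(u - committed_cost(u, **terms) for u in usage_by_year)


projected = [1_800_000] * 3   # assumption: usage stays flat at next year's projection
downside = [1_200_000] * 3    # scenario: usage never grows beyond today's level

print(f"projected, list-price minimum:  ${total_savings(projected):,.0f}")
print(f"projected, discounted minimum:  "
      f"${total_savings(projected, commit_on_list_price=False):,.0f}")
print(f"downside,  list-price minimum:  ${total_savings(downside):,.0f}")
```

Comparing the projected and downside scenarios gives a concrete starting point for part (b), since the risk of the commitment is largely the risk that usage falls short of the minimum.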

Exercises 23.8-23.12 require the cloud provider comparison framework from this chapter. Exercises 23.13-23.17 develop critical thinking about strategic trade-offs. Exercises 23.18-23.20 extend the Athena case study with realistic business scenarios. Selected solutions appear in Appendix B.