Chapter 33 Exercises: AI Product Management


Section A: The AI PM Role and Probabilistic Thinking

Exercise 33.1 — The Framing Exercise

You are the AI PM for a bank's new AI-powered fraud detection system. The model correctly identifies 94% of fraudulent transactions (recall = 94%) but also flags 3% of legitimate transactions as suspicious (false positive rate = 3%). The Head of Fraud Prevention is impressed. The Head of Customer Experience is alarmed: "You're going to freeze the accounts of 3 out of every 100 legitimate customers?"

a) Write four different one-sentence descriptions of this model's performance — one using the error frame, one using the improvement frame, one using the comparison frame, and one using the outcome frame. Use realistic but invented numbers for the current baseline and financial impact.

b) Which framing would you lead with in a presentation to the CEO? Why?

c) The Head of Customer Experience asks you to reduce the false positive rate to below 1%. What tradeoff would this likely require, and how would you explain it to her?
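Part (a)'s framings land differently once you see what the alert stream actually contains. A quick sketch, assuming a hypothetical fraud base rate of 0.2% (the exercise leaves the base rate unstated):

```python
# Alert composition for the fraud model. Recall and false positive
# rate come from the exercise; the base rate is an invented assumption.
recall = 0.94               # fraction of fraud caught
false_positive_rate = 0.03  # fraction of legitimate transactions flagged
base_rate = 0.002           # ASSUMED: 2 in 1,000 transactions are fraudulent

transactions = 1_000_000
fraud = transactions * base_rate
legit = transactions - fraud

true_alerts = fraud * recall                # fraud correctly flagged
false_alerts = legit * false_positive_rate  # legitimate customers flagged
precision = true_alerts / (true_alerts + false_alerts)

print(f"total alerts: {true_alerts + false_alerts:,.0f}")
print(f"precision: {precision:.1%}")
```

At this assumed base rate, roughly 1 alert in 17 is actual fraud; the rest are legitimate customers. Any honest comparison or outcome frame needs the base rate to make that visible.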

Exercise 33.2 — AI PM vs. Traditional PM

For each of the following PM activities, explain how it differs when the product is AI-powered versus a traditional software feature. Be specific about what changes and why.

a) Writing user stories
b) Defining acceptance criteria
c) Planning a product launch
d) Responding to a bug report
e) Building a product roadmap
f) Conducting user research

Exercise 33.3 — The Perfection Trap

A fintech company has developed an AI-powered investment recommendation tool. The model has been in development for fourteen months. It started at 62% accuracy (measured by user-accepted recommendations) and is now at 79%. The product team wants to launch, but the Chief Investment Officer insists on waiting until the model reaches 90%.

a) What questions would you ask the CIO to understand whether the 90% threshold is justified?

b) The current (non-AI) process involves human financial advisors whose recommendation acceptance rate is 74%. How does this information change the launch decision?

c) Design a launch strategy that addresses the CIO's concerns while avoiding an indefinite delay.

Exercise 33.4 — Skill Stack Self-Assessment

Rate yourself (1 = novice, 5 = expert) on each element of the AI PM skill stack described in Section 33.1: business strategy, user empathy, ML literacy, ethical reasoning, and communication/translation.

a) For your lowest-rated skill, describe three specific actions you could take in the next six months to improve it.

b) For your highest-rated skill, describe how you would leverage it as an AI PM to compensate for gaps in other areas.

c) Which skill do you believe is hardest to develop? Why?


Section B: User Research and Requirements

Exercise 33.5 — Mental Model Mapping

You are launching an AI-powered email assistant that drafts replies to incoming messages. Before launch, you conduct user interviews with 20 employees at a mid-sized consulting firm.

a) Write five interview questions designed to reveal users' mental models of how the AI email assistant works. (Do not ask "How do you think the AI works?" — that question is too direct and often yields unreliable answers.)

b) For each of the five mental models described in Section 33.4 (magic, database, human-like, surveillance, random), describe one design decision you would make differently based on which mental model is most prevalent among your users.

c) Design a brief onboarding flow (3-5 screens) for the email assistant that calibrates user expectations without overwhelming them with technical detail.

Exercise 33.6 — AI User Stories and Acceptance Criteria

Write a complete set of AI-enhanced user stories and acceptance criteria for ONE of the following products:

a) A streaming music service's "Discover Weekly" playlist feature
b) A ride-sharing app's estimated time of arrival (ETA) feature
c) A job recruitment platform's candidate-matching feature
d) A grocery delivery app's "reorder suggestions" feature

For each product, write:

- Three user stories (following the AI-enhanced format from Section 33.5)
- Performance acceptance criteria (with specific thresholds)
- Fairness acceptance criteria
- Reliability acceptance criteria (including fallback behavior)
- User experience acceptance criteria (including transparency features)

Exercise 33.7 — The "Good Enough" Debate

A healthcare company is building an AI system that analyzes skin images to screen for potential melanoma. The model currently achieves:

- Sensitivity (recall): 91% — it correctly identifies 91% of melanoma cases
- Specificity: 85% — 15% of benign images are flagged as potentially malignant

The dermatologists on staff have sensitivity of approximately 86% and specificity of approximately 92%.

a) Is this model "good enough" to deploy as a first-line screening tool? Argue both sides.

b) Is this model "good enough" to deploy as a second-opinion tool that supplements the dermatologist's judgment? How does this change your analysis?

c) What additional information would you need to make a definitive recommendation? List at least five factors.

d) Design a deployment strategy that manages risk while still delivering value.
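Parts (a) and (b) hinge on prevalence, which the exercise does not state. A sketch of how positive predictive value shifts the comparison, under an assumed 2% melanoma prevalence among screened images:

```python
# PPV of the model vs. the dermatologists. Sensitivity and specificity
# come from the exercise; the 2% prevalence is an invented assumption.
def ppv(sensitivity, specificity, prevalence):
    true_pos = prevalence * sensitivity
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

prevalence = 0.02  # ASSUMED prevalence in the screened population
model_ppv = ppv(0.91, 0.85, prevalence)
derm_ppv = ppv(0.86, 0.92, prevalence)
print(f"model PPV: {model_ppv:.1%}, dermatologist PPV: {derm_ppv:.1%}")
```

Under this assumption, the model catches more melanomas (higher sensitivity) but each of its positive calls is less trustworthy than a dermatologist's, which is exactly the distinction between a first-line screen and a second opinion.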


Section C: Failure Mode Design

Exercise 33.8 — Graceful Degradation Hierarchy

Design a five-level graceful degradation hierarchy (following the model in Section 33.8) for each of the following AI products:

a) A customer service chatbot for a telecommunications company
b) An AI-powered language translation feature in a video conferencing tool
c) An autonomous checkout system (like Amazon's "Just Walk Out" technology) in a grocery store
d) An AI-driven content moderation system for a social media platform

For each level, specify: the trigger condition, the user experience, and the monitoring metric that would detect this level of degradation.
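Before filling in the table, it can help to see a hierarchy as an ordered fallback chain in code. A minimal sketch for option (a), with invented levels, thresholds, and a placeholder FAQ matcher:

```python
# Sketch of one possible five-level degradation chain for a telecom
# support chatbot. Level definitions and thresholds are illustrative
# assumptions, not a production design.
def matches_faq(query):
    # Placeholder: a real system would retrieve over an FAQ index.
    return "billing" in query.lower()

def respond(query, confidence, latency_ms, model_up=True):
    if model_up and latency_ms < 500 and confidence >= 0.8:
        return "L1: full AI answer"
    if model_up and confidence >= 0.5:
        return "L2: hedged AI answer with source links"
    if model_up:
        return "L3: clarifying question back to the user"
    if matches_faq(query):
        return "L4: static FAQ article"
    return "L5: hand off to a human agent"
```

Each `if` condition is the trigger, each return value is the user experience, and the remaining design work is attaching a monitoring metric to each branch (e.g., the fraction of sessions resolved at each level).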

Exercise 33.9 — Failure Mode Analysis

You are the AI PM for a large e-commerce platform's search ranking algorithm. The algorithm uses ML to rank search results based on predicted relevance, purchase probability, and margin contribution.

a) Identify at least six distinct failure modes for this system (at least one each of: model failure, data failure, infrastructure failure, concept drift, and edge case failure).

b) For each failure mode, describe: (i) how the failure would manifest to the user, (ii) how long it might take to detect the failure, and (iii) what the business impact would be if the failure persisted for one week.

c) Prioritize the six failure modes by likelihood × impact and recommend the top three to address first.
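The ranking in part (c) is mechanical once each mode is scored; a toy sketch with invented failure modes and 1-5 scores:

```python
# Prioritizing failure modes by likelihood x impact, highest first.
# The modes and scores below are invented placeholders.
failure_modes = [
    {"name": "concept drift (seasonal demand shift)", "likelihood": 4, "impact": 3},
    {"name": "feature pipeline outage",               "likelihood": 2, "impact": 5},
    {"name": "margin-weight misconfiguration",        "likelihood": 1, "impact": 4},
]
ranked = sorted(failure_modes,
                key=lambda m: m["likelihood"] * m["impact"],
                reverse=True)
for m in ranked:
    print(m["likelihood"] * m["impact"], m["name"])
```

The hard part is defending the scores, not the sort: a mode with low likelihood but catastrophic impact may still deserve the top slot once detection time from part (b) is factored in.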

Exercise 33.10 — Human Escalation Design

Design a human escalation system for an AI-powered medical symptom checker app. The app asks users about their symptoms and suggests possible conditions and next steps.

a) Define five specific trigger conditions for human escalation (be precise about thresholds, keywords, or patterns).

b) For each trigger, describe the escalation path — who does the user get connected to, and how?

c) Design the context handoff — what information does the human agent see when they receive the escalation?

d) How would you measure the effectiveness of the escalation system? Define three metrics.
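A concrete way to start part (a) is to write each trigger as an explicit predicate over the session. The keywords, thresholds, and session fields below are invented illustrations, not clinical guidance:

```python
# Sketch of escalation triggers for a symptom-checker session.
# All terms, thresholds, and field names are illustrative assumptions.
RED_FLAG_TERMS = {"chest pain", "can't breathe", "suicidal"}

def escalation_reason(session):
    """Return the reason to escalate, or None if the AI may proceed."""
    text = session["free_text"].lower()
    if any(term in text for term in RED_FLAG_TERMS):
        return "red-flag symptom language"
    if session["model_confidence"] < 0.4:
        return "low model confidence"
    if session["age"] < 2:
        return "infant patient"
    if session["repeat_visits_7d"] >= 3:
        return "repeated sessions for the same complaint"
    if session["user_requested_human"]:
        return "explicit user request"
    return None
```

Returning the reason (rather than a bare boolean) doubles as the start of the context handoff in part (c): the human agent sees why the session was escalated.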


Section D: Metrics and Stakeholder Communication

Exercise 33.11 — Metrics Dashboard Design

You are the AI PM for a news app's personalized content feed. Design a three-tier metrics dashboard (daily monitoring, weekly review, monthly business review) following NK's model from Section 33.7.

a) List 4-6 metrics for each tier. For each metric, specify: the metric name, the measurement method, the target value, and the alert threshold.

b) Identify two metrics that could be in tension with each other (e.g., engagement vs. diversity). How would you manage this tension?

c) Design a "fairness scorecard" that ensures the personalized feed serves all user segments equitably. What dimensions would you measure fairness across?

Exercise 33.12 — Stakeholder Translation

The data science team presents you with the following technical results from a model update for a customer churn prediction system:

"We retrained the XGBoost model with 47 new features including NLP-derived sentiment scores from support tickets. AUC-ROC improved from 0.83 to 0.89. Precision at 80% recall went from 0.61 to 0.74. The model now identifies churners an average of 12 days earlier in their lifecycle. Training time increased from 3 hours to 11 hours due to NLP feature computation. Inference latency increased from 45ms to 120ms."

Translate this into three different one-page updates:

a) A Slack message to the VP of Customer Success (who cares about retention rates and cost savings)
b) An email to the CTO (who cares about infrastructure costs and system performance)
c) A paragraph for the quarterly board update (which focuses on strategic AI capabilities)
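Whichever audience you write for, it helps to first convert the precision numbers into operational terms. A back-of-envelope sketch with an assumed customer base and churn rate (neither appears in the data science summary):

```python
# What "precision at 80% recall: 0.61 -> 0.74" means in alert volume.
# Customer count and churn rate are ASSUMED for illustration.
customers = 10_000
monthly_churn_rate = 0.08   # assumption
recall = 0.80               # from the summary

churners = customers * monthly_churn_rate
caught = churners * recall  # churners correctly flagged at this recall

for precision in (0.61, 0.74):
    total_alerts = caught / precision
    false_alerts = total_alerts - caught
    print(f"precision {precision}: {total_alerts:.0f} alerts, "
          f"{false_alerts:.0f} wasted outreach calls")
```

Under these assumptions, the retrained model flags the same churners with roughly 180 fewer wasted retention calls per month, which is the kind of number the VP of Customer Success actually cares about.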

Exercise 33.13 — Managing Hype

For each of the following stakeholder statements, write a response that (a) validates the stakeholder's underlying concern, (b) provides accurate context, and (c) redirects toward a productive conversation:

a) CEO: "I just saw that Amazon is using generative AI for product descriptions. We need to do the same thing by next quarter."

b) VP of Sales: "Can we use AI to predict exactly which deals will close? That would solve our forecasting problem."

c) General Counsel: "I'm not comfortable launching any AI features until we can guarantee zero errors."

d) Board Member: "I read that AI is going to replace 40% of jobs. What's our AI headcount reduction plan?"

e) Head of Marketing: "The competitor's AI chatbot is all over social media. We look like we're falling behind."


Section E: Product Strategy and Roadmapping

Exercise 33.14 — AI MVP Design

You are the AI PM for a B2B SaaS company that provides project management software. You want to add an AI feature that predicts project completion dates based on historical project data, team composition, and current progress.

a) Design a Wizard of Oz MVP for this feature. What would users see? Who performs the AI's job behind the scenes? How long would you run the MVP, and what data would you collect?

b) Design a rules-based MVP. What simple rules or heuristics would approximate the AI's behavior? What percentage of the AI's eventual value do you estimate the rules-based version would deliver?

c) Design a data-first MVP. What instrumentation would you add to the existing product to collect the data the future AI model will need? How would you validate that the data is being collected correctly?

d) Which MVP approach would you recommend, and why?

Exercise 33.15 — AI Product Roadmap

You are the AI PM for a financial services company that has just launched an AI-powered credit scoring model. The model has been live for three months, performing at an AUC of 0.87 (compared to the legacy rules-based system's equivalent of 0.76). Build a 12-month product roadmap.

a) Categorize each planned initiative as Model Improvement, Feature Development, or Infrastructure Investment.

b) Allocate approximate effort percentages for each quarter, following (but adapting) Okonkwo's stage-based heuristics.

c) Create a feature track (deliverables with dates) and a performance track (targets with ranges).

d) Identify two "dependencies" — initiatives that must be completed before others can begin.

e) Identify one risk that could derail the roadmap and describe your mitigation strategy.

Exercise 33.16 — The AI Product Canvas

Extend the ML Canvas from Chapter 6 into an "AI Product Canvas" by adding product-specific sections. Your canvas should include all 10 sections from the original ML Canvas plus at least five additional sections specific to AI product management. For each additional section, explain why it is necessary and provide an example entry.

Use one of the following products as your example:

a) An AI-powered resume screening system for a staffing agency
b) A personalized nutrition and meal planning app
c) An AI assistant for real estate agents that generates property listing descriptions


Section F: Integration and Synthesis

Exercise 33.17 — The Full AI Product Spec

Write a complete AI Product Specification (2-4 pages) for an AI-powered feature of your choice. The spec should include:

  1. Product overview and business case
  2. Target users and user research insights
  3. AI-enhanced user stories (at least 3)
  4. Performance, fairness, and reliability acceptance criteria
  5. Failure mode analysis and graceful degradation plan
  6. Metrics framework (engagement, quality, trust, fairness, business outcomes)
  7. MVP strategy
  8. Launch plan (including graduated rollout)
  9. Stakeholder communication plan
  10. Three-quarter roadmap

Exercise 33.18 — Case Analysis: NK's Launch

Review NK's loyalty personalization engine launch described in Section 33.12.

a) Identify three decisions NK made that reflect best practices from this chapter. For each, explain what made it effective.

b) Identify two things NK could have done differently or better. What would you have recommended?

c) NK's A/B test showed a +181% click-through rate lift but only a +30% purchase conversion lift. What are three hypotheses that could explain this gap, and how would you test each one?

d) If you were NK's manager, what would you prioritize for the personalization engine in the next quarter, and why?
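For part (c), a quick funnel decomposition sharpens the hypotheses. On one reading of the numbers (assuming both lifts are measured on the same per-session base), purchases = clicks × per-click conversion, so:

```python
# Funnel decomposition of NK's A/B results. Interprets the two lifts
# as multipliers on clicks and purchases respectively (an assumption).
ctr_lift = 1.81        # +181% click-through
purchase_lift = 0.30   # +30% purchase conversion

clicks_multiplier = 1 + ctr_lift          # clicks grew 2.81x
purchases_multiplier = 1 + purchase_lift  # purchases grew 1.30x
per_click_conversion = purchases_multiplier / clicks_multiplier

print(f"per-click conversion changed by {per_click_conversion:.2f}x")
```

Under this reading, each click now converts at less than half the baseline rate: the engine is generating many more clicks of much lower purchase intent, and the three hypotheses in part (c) need to explain where that low-intent traffic comes from.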

Exercise 33.19 — Ethical Product Design

You are the AI PM for a social media platform. Your team has built a content recommendation algorithm that maximizes user engagement (time spent on the platform). Early A/B tests show a 22% increase in daily active usage — an extraordinary result.

However, a deeper analysis reveals:

- Users in the treatment group report 15% higher rates of "feeling worse about themselves" in post-session surveys
- Content with emotional or outrage-provoking headlines receives 4x more algorithmic amplification than neutral content
- Users aged 13-17 show a 35% increase in late-night usage (past midnight)

a) Is this product "successful"? By whose definition?

b) As the AI PM, what changes would you make to the product before launch? Be specific.

c) How would you present these findings to an executive team that is focused on user engagement growth?

d) Design a set of "responsible AI" product requirements that balance engagement with user well-being.

e) Connecting to Chapter 25 (bias) and Chapter 26 (transparency): what transparency features would you build into this product?

Exercise 33.20 — Competitive Analysis

Choose a real AI-powered product (such as Spotify's recommendation engine, Grammarly's writing assistant, Notion AI, or Waze's route optimization). Conduct an analysis that covers:

a) What is the core AI capability, and how does it create user value?

b) What is the likely performance range, and what does "wrong" look like for this product?

c) How does the product handle the transparency challenge? (Does it explain its AI? How?)

d) What fallback strategy does the product appear to use when the AI fails?

e) What feedback mechanisms does the product use to improve the AI?

f) If you were the AI PM for this product, what would be your top three priorities for the next quarter?


These exercises span the full range of AI product management competencies: probabilistic thinking, user research, requirements definition, failure mode design, metrics, stakeholder communication, roadmapping, and ethical reasoning. For exercises that ask you to write product specifications or roadmaps, aim for the level of detail and rigor you would bring to a real product review with your executive team.