Answers to Selected Exercises
This appendix provides worked solutions for selected exercises from each chapter. Solutions are chosen to illustrate key concepts, demonstrate analytical reasoning, and provide model answers that combine technical rigor with business relevance. Python code solutions are syntactically correct and designed to run independently. For exercises requiring dataset-specific answers (e.g., analysis of the Athena sales data), representative solutions are provided using the synthetic data generators from each chapter.
Chapter 1: The AI-Powered Organization
Exercise 1.1 — Definitions
(a) Artificial intelligence is the broad field of computer science focused on creating systems capable of performing tasks that typically require human intelligence, including perception, reasoning, learning, and decision-making. It encompasses everything from simple rule-based systems to complex neural networks.
(b) Machine learning is a subset of AI in which systems learn patterns from data rather than being explicitly programmed with rules. It enables computers to improve performance on a task through experience (data) without being told the exact steps to follow.
(c) Deep learning is a subset of machine learning that uses neural networks with many layers (hence "deep") to learn hierarchical representations of data. It has driven breakthroughs in image recognition, natural language processing, and generative AI.
(d) Generative AI refers to AI systems that can create new content — text, images, code, audio, video — rather than simply classifying or predicting. These systems learn the statistical structure of their training data and generate novel outputs that resemble it.
(e) Large language model (LLM) is a type of generative AI trained on massive text corpora that can understand and generate human language. Models like GPT-4, Claude, and Gemini contain billions of parameters and exhibit capabilities including summarization, translation, reasoning, and code generation.
Exercise 1.3 — AI Maturity Model
| Stage | Characteristic | Business Risk |
|---|---|---|
| Stage 1: Awareness | Organization recognizes AI's potential but has no formal initiatives | Risk of falling behind competitors who are already experimenting |
| Stage 2: Experimentation | Isolated AI pilots and proof-of-concepts in individual departments | Pilots fail to scale because they lack organizational support and data infrastructure |
| Stage 3: Systematic | Formal AI strategy exists with dedicated resources and governance | Overinvestment in technology without corresponding organizational change management |
| Stage 4: Transformative | AI is embedded in core business processes and decision-making | Organizational dependency on AI systems creates fragility if models degrade |
| Stage 5: Pioneering | AI drives competitive differentiation and creates new business models | Ethical and regulatory risks multiply as AI becomes deeply embedded in operations |
Exercise 1.11 — Athena's Priorities
Recommended prioritization (most urgent to least urgent):
- (b) Unify customer data across the four siloed systems. This is the foundational blocker. Every AI initiative — from personalization to demand forecasting — requires integrated, accessible data. Without this, all other investments will underperform.
- (d) Establish a data governance framework with data owners for key datasets. Governance must accompany data unification. Without clear ownership and quality standards, unified data will quickly degrade, and AI models trained on poor data will produce poor decisions.
- (e) Launch AI literacy training for the executive team. Leadership buy-in and informed decision-making are prerequisites for successful AI adoption. Executives who do not understand AI's capabilities and limitations will make poor investment decisions and set unrealistic expectations.
- (f) Replace the legacy POS system. The POS system is a critical data source for any customer-facing AI. If it cannot capture the data needed for modern analytics, it constrains everything downstream.
- (a) Hire a team of data scientists to begin building ML models. Data scientists need clean, accessible data to be productive. Hiring before the data infrastructure is in place leads to expensive talent sitting idle or building models on unreliable data.
- (c) Deploy an enterprise AI chatbot for customer service. This is a visible, flashy initiative — but it is the riskiest to deploy without data integration, governance, and AI literacy in place. A chatbot launched prematurely on fragmented data will deliver a poor customer experience and damage internal confidence in AI.
The logic behind this ordering is that Ravi must build the foundation (data, governance, literacy) before building the house (models, applications). This reflects the chapter's central argument that AI value creation requires organizational readiness, not just technology adoption.
Chapter 2: Thinking Like a Data Scientist
Exercise 2.1 — CRISP-DM Phases
- Business Understanding: Define the business problem, success criteria, and project scope — ensures the data science work is solving the right problem.
- Data Understanding: Collect, explore, and assess available data — determines whether sufficient data exists and identifies quality issues.
- Data Preparation: Clean, transform, and engineer features from raw data — typically the most time-consuming phase, consuming 60–80% of project effort.
- Modeling: Select and train machine learning algorithms — involves experimentation, hyperparameter tuning, and iterative refinement.
- Evaluation: Assess model performance against business criteria — goes beyond statistical metrics to include business value, fairness, and deployment feasibility.
- Deployment: Integrate the model into business operations — includes monitoring, maintenance, and organizational change management.
Exercise 2.7 — Confounding Variables
(a) Two confounding variables: (1) Customer engagement level — customers who read newsletters are likely already more engaged with the brand, and this engagement (not the newsletter itself) drives higher purchasing. (2) Customer tenure — longer-tenured customers are more likely both to subscribe to a newsletter and to purchase more, creating a spurious relationship.
(b) Reverse causation: Customers who purchase more are more likely to sign up for the newsletter because they have a stronger relationship with the brand. The purchasing behavior causes the newsletter subscription, not the other way around.
(c) Experiment: Conduct a randomized controlled trial (A/B test). Randomly select 5,000 customers who are not currently newsletter subscribers. Send half of them the newsletter for three months (treatment group) and withhold it from the other half (control group). Compare purchasing behavior between the two groups after the test period. Randomization ensures that pre-existing differences in engagement are balanced between groups, isolating the causal effect of the newsletter.
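The design in (c) can be sketched as a quick simulation. The purchase rates below (20% baseline, 25% with the newsletter) are illustrative assumptions, not figures from the exercise; the point is that randomization plus a simple two-proportion z-test isolates the newsletter's causal effect.

```python
import numpy as np

rng = np.random.default_rng(42)
n_per_group = 2500  # 5,000 customers split evenly at random

# Hypothetical purchase rates: 20% for control, 25% for treatment
control = rng.random(n_per_group) < 0.20
treatment = rng.random(n_per_group) < 0.25

p_c, p_t = control.mean(), treatment.mean()

# Two-proportion z-test under the pooled null of no effect
p_pool = (control.sum() + treatment.sum()) / (2 * n_per_group)
se = np.sqrt(p_pool * (1 - p_pool) * (2 / n_per_group))
z = (p_t - p_c) / se
print(f"Control: {p_c:.3f} | Treatment: {p_t:.3f} | z = {z:.2f}")
```

Because assignment is random, any pre-existing differences in engagement or tenure are balanced across groups in expectation, so a large z-statistic can be attributed to the newsletter itself.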
Chapter 3: Python for the Business Professional
Exercise 2 — Type Conversion
```python
revenue_str = "$1,250,000"
revenue_float = float(revenue_str.replace("$", "").replace(",", ""))
print(f"Revenue as float: {revenue_float}")
# Output: Revenue as float: 1250000.0
```
The key insight is that financial data often arrives as formatted strings. The .replace() method removes non-numeric characters, and chaining multiple calls handles both the dollar sign and commas in a single expression. This pattern appears constantly in real-world data cleaning.
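A slightly more general helper, as a sketch: the parentheses-for-negatives convention handled below is a common accounting format, not something the exercise requires.

```python
def parse_currency(value: str) -> float:
    """Convert strings like '$1,250,000' or '($500)' to floats.

    Parentheses denote negative amounts in accounting notation.
    """
    cleaned = value.replace("$", "").replace(",", "").strip()
    if cleaned.startswith("(") and cleaned.endswith(")"):
        return -float(cleaned[1:-1])
    return float(cleaned)

print(parse_currency("$1,250,000"))    # 1250000.0
print(parse_currency("($4,500.25)"))   # -4500.25
```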
Exercise 5 — Conditional Logic: Discount Tiers
```python
def calculate_discount(order_value):
    """Assign discount rate based on order value tiers."""
    if order_value >= 500:
        discount_rate = 0.20
    elif order_value >= 200:
        discount_rate = 0.10
    elif order_value >= 100:
        discount_rate = 0.05
    else:
        discount_rate = 0.00
    return discount_rate

# Test with four order values
test_values = [75, 150, 350, 600]
for value in test_values:
    rate = calculate_discount(value)
    final_price = value * (1 - rate)
    print(f"Order: ${value:.2f} | Discount: {rate:.0%} | "
          f"Final: ${final_price:.2f}")

# Output:
# Order: $75.00 | Discount: 0% | Final: $75.00
# Order: $150.00 | Discount: 5% | Final: $142.50
# Order: $350.00 | Discount: 10% | Final: $315.00
# Order: $600.00 | Discount: 20% | Final: $480.00
```
Exercise 10 — Functions: Business Calculator
```python
import math

def compound_growth(principal, rate, years):
    """Return future value given compound growth."""
    return principal * (1 + rate) ** years

def payback_period(investment, monthly_profit):
    """Return months to recoup investment, rounded up."""
    return math.ceil(investment / monthly_profit)

def format_currency(amount):
    """Return a formatted currency string."""
    return f"${amount:,.2f}"

# Tests
print(compound_growth(100000, 0.08, 5))   # ≈146932.81
print(compound_growth(50000, 0.12, 10))   # ≈155292.41
print(payback_period(500000, 45000))      # 12 months
print(payback_period(120000, 8500))       # 15 months
print(format_currency(1234567.89))        # $1,234,567.89
print(format_currency(42.5))              # $42.50
```
Exercise 11 — Creating a DataFrame
```python
import pandas as pd

data = {
    'Office': ['HQ', 'West', 'South', 'Midwest', 'Northeast'],
    'City': ['New York', 'San Francisco', 'Austin', 'Chicago', 'Boston'],
    'Employees': [120, 85, 45, 60, 35],
    'Revenue': [4500000, 3200000, 1800000, 2100000, 1200000],
    'Year_Opened': [2010, 2015, 2018, 2016, 2020]
}
df = pd.DataFrame(data)

print("Shape:", df.shape)  # (5, 5)
print("\nData Types:")
print(df.dtypes)
print("\nDescriptive Statistics:")
print(df.describe())

# Output (shape): (5, 5)
# Data types: Office and City are object; Employees, Revenue,
# Year_Opened are int64.
# Descriptive statistics show mean Revenue of $2,560,000,
# mean Employees of 69, etc.
```
Exercise 16 — GroupBy Basics (representative approach)
```python
# Assuming df is the Athena sales DataFrame with columns:
# store, region, category, revenue, units_sold, month

# 1. Total revenue by region
print(df.groupby('region')['revenue'].sum().sort_values(ascending=False))

# 2. Average revenue by category
print(df.groupby('category')['revenue'].mean().sort_values(ascending=False))

# 3. Total units sold by store
print(df.groupby('store')['units_sold'].sum().sort_values(ascending=False))

# 4. Region with highest total revenue
top_region = df.groupby('region')['revenue'].sum().idxmax()
print(f"Highest revenue region: {top_region}")

# 5. Category with lowest average revenue
low_cat = df.groupby('category')['revenue'].mean().idxmin()
print(f"Lowest avg revenue category: {low_cat}")
```
The groupby() method is the pandas equivalent of a SQL GROUP BY clause. The pattern df.groupby('column')['target'].agg_function() is the single most important pandas pattern for business analysis because almost every business question involves comparing metrics across groups (regions, categories, time periods, customer segments).
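The pattern can be seen end to end on a tiny, self-contained example (the column names and numbers below are illustrative, not the Athena data). Named aggregation lets one groupby answer several questions at once:

```python
import pandas as pd

# Tiny illustrative dataset (not the Athena sales data)
sales = pd.DataFrame({
    'region':  ['North', 'North', 'South', 'South', 'West'],
    'revenue': [1200, 800, 1500, 700, 900],
    'units':   [30, 20, 45, 25, 22],
})

# One groupby, several named aggregations at once
summary = sales.groupby('region').agg(
    total_revenue=('revenue', 'sum'),
    avg_revenue=('revenue', 'mean'),
    total_units=('units', 'sum'),
)
print(summary)
print("Top region:", summary['total_revenue'].idxmax())  # South
```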
Chapter 4: Data Strategy and Data Literacy
Exercise 4.3 — Six Dimensions of Data Quality
| Dimension | Definition | Retail Example of Failure |
|---|---|---|
| Accuracy | Data correctly represents real-world values | Product weight recorded as 5 kg instead of 0.5 kg, causing incorrect shipping cost calculations |
| Completeness | Required data values are present | 30% of customer records missing email addresses, preventing email marketing campaigns |
| Consistency | Same data is recorded the same way across systems | POS system records "San Francisco" while warehouse system records "SF" — customer orders cannot be matched |
| Timeliness | Data is available when needed | Inventory counts updated only weekly, causing stockouts between updates |
| Validity | Data conforms to defined rules and formats | Phone numbers stored in inconsistent formats (some with dashes, some with parentheses, some with country codes) |
| Uniqueness | Each entity is represented once | Same customer appears as three separate records across online, in-store, and mobile channels — distorting CLV calculations |
Exercise 4.8 (representative) — Data Governance RACI
A RACI matrix for a key data asset (Customer Master Record):
| Activity | Data Owner (VP Marketing) | Data Steward (CRM Manager) | Data Custodian (IT) | Data Consumer (Analyst) |
|---|---|---|---|---|
| Define data quality standards | A (Accountable) | R (Responsible) | C (Consulted) | I (Informed) |
| Monitor data quality | I | R | C | I |
| Grant access to data | A | R | R | — |
| Fix data quality issues | I | R | R | C |
| Use data for analysis | I | C | I | R |
The data owner has ultimate accountability but does not perform hands-on work. The data steward is the working-level decision maker. This distinction is critical: without clear ownership, data quality becomes "everyone's responsibility," which in practice means no one's responsibility.
Chapter 5: Exploratory Data Analysis
Exercise 1 — Reading Summary Statistics
(a) The mean ($47,200) is substantially higher than the median ($31,800), indicating a right-skewed distribution. This is typical for revenue data because most days have moderate revenue, but occasional high-revenue days (promotions, holidays) pull the mean upward. Revenue data is almost always right-skewed because it has a natural floor (zero) but no practical ceiling.
(b) IQR = Q3 - Q1 = $62,400 - $18,900 = **$43,500**. Lower bound = Q1 - 1.5 x IQR = $18,900 - $65,250 = **-$46,350** (effectively $0, since revenue cannot be negative). Upper bound = Q3 + 1.5 x IQR = $62,400 + $65,250 = **$127,650**. Days with revenue above $127,650 would be flagged as outliers. Given the max is $312,000, there are clearly some extreme revenue days.
(c) Reporting $47,200 as the "average" is misleading because the skewness inflates the mean. A better report would state: "Our **median** daily revenue is $31,800, with a typical range of $18,900 to $62,400 (interquartile range). The mean of $47,200 is higher due to occasional high-revenue days." The median is a more representative measure of "typical" performance for skewed data.
(d) A skewness of 2.41 means the distribution has a long tail to the right — most days cluster at lower revenue values, but a meaningful number of days produce revenue far above the average. In business terms: "Revenue on most days is moderate, but we have occasional big days that significantly boost our averages."
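The fence calculation in (b) takes only a few lines to verify, using the quartiles stated in the exercise:

```python
q1, q3 = 18_900, 62_400          # quartiles from the exercise
iqr = q3 - q1                    # 43,500
lower = max(q1 - 1.5 * iqr, 0)   # floor at $0: revenue cannot be negative
upper = q3 + 1.5 * iqr           # 127,650
print(f"IQR = ${iqr:,} | outlier fences: ${lower:,} to ${upper:,}")
```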
Exercise 7 — Basic Distribution Plot (Python)
```python
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(42)
salaries = np.random.lognormal(mean=11.0, sigma=0.5, size=500)

fig, ax = plt.subplots(figsize=(10, 6))
ax.hist(salaries, bins=30, color='steelblue', edgecolor='white',
        alpha=0.8)

mean_val = np.mean(salaries)
median_val = np.median(salaries)
ax.axvline(mean_val, color='crimson', linewidth=2, linestyle='--',
           label=f'Mean: ${mean_val:,.0f}')
ax.axvline(median_val, color='darkgreen', linewidth=2, linestyle='-',
           label=f'Median: ${median_val:,.0f}')

ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_xlabel('Annual Salary ($)', fontsize=12)
ax.set_ylabel('Number of Employees', fontsize=12)
ax.set_title('Most Employees Earn Below the Company Average:\n'
             'Right-Skewed Salary Distribution (n=500)',
             fontsize=14, fontweight='bold')
ax.legend(fontsize=11)

plt.tight_layout()
plt.savefig('salary_distribution.png', dpi=150, bbox_inches='tight')
plt.show()
```
The title communicates an insight ("Most employees earn below the company average") rather than merely describing the chart ("Employee Salary Distribution"). The vertical lines for mean and median, with the mean visibly to the right, reinforce the skewness message visually.
Chapter 6: The Business of Machine Learning
Exercise 6.1 — Translating Business Problems
(a) "We want to reduce customer churn."
- Prediction target: Binary label — will the customer churn (1) or not (0) within the next 180 days?
- Prediction type: Classification
- Prediction window: 180 days from the scoring date
- Action: Customers with churn probability above threshold X receive a targeted retention offer (discount, loyalty reward, or personal outreach)
(b) "We need to forecast how many units of each product to stock next month."
- Prediction target: Continuous value — quantity of units demanded per SKU per store per week
- Prediction type: Regression (demand forecasting)
- Prediction window: 4 weeks ahead, updated weekly
- Action: Supply chain team uses forecasts to set reorder quantities, safety stock levels, and warehouse allocation
Exercise 6.3 — The "Predict Everything" Trap
(a) This request suffers from what Professor Okonkwo calls the "omniscient algorithm" misconception. Asking a single model to predict who, what, when, how much, and which channel conflates five distinct prediction problems, each with different target variables, feature requirements, and evaluation criteria. A model trained to predict whether a customer will buy optimizes for a different objective than a model predicting how much they will spend. Combining them produces a system that does none of these tasks well.
(b) A better starting point is to identify the single highest-value decision the marketing team needs to support. For example: "Which existing customers are most likely to make a purchase in the next 30 days?" This is a binary classification problem with a clear target, a defined time window, and a specific action (target those customers with marketing spend). Once this model delivers value, additional models can be built incrementally — predicting spend amount, optimal channel, or product category — each scoped as its own project with its own success criteria.
Chapter 7: Supervised Learning — Classification
Exercise 7.8 — The Accuracy Paradox
(a) A model that predicts every single transaction as legitimate (never flags any fraud) would achieve 9,950/10,000 = 99.5% accuracy because 99.5% of transactions genuinely are legitimate. This model catches zero fraud — it is useless — yet it reports impressive accuracy. This is the accuracy paradox: when classes are heavily imbalanced, accuracy rewards the model for always predicting the majority class.
(b) Recall (also called sensitivity) is the most appropriate metric because it measures the percentage of actual fraud cases the model detects. Precision is also important because it measures what fraction of flagged transactions are actually fraudulent (reducing false alarms). The F1 score (harmonic mean of precision and recall) or the AUC-ROC provide a balanced assessment. In fraud detection specifically, recall is usually prioritized because the cost of missing fraud ($2,000+ per incident) vastly exceeds the cost of investigating a false alarm ($15–$50).
(c) With 80% recall: the model catches 80% of 50 fraudulent transactions = 40 caught, 10 missed. With 60% precision: of all transactions flagged as fraud, 60% are actually fraudulent. If 40 are true positives, then total flagged = 40/0.60 = approximately 67 flagged, meaning about 27 legitimate transactions are incorrectly flagged. The 10 missed fraudulent transactions cost the company $20,000+ in losses. The 27 false alarms require investigation, costing perhaps $15 each = $405 in operational cost. This asymmetry is why recall is prioritized in fraud detection.
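The arithmetic in (c) can be checked directly; the $2,000 loss per missed fraud and $15 per investigation come from the costs stated in the exercise:

```python
recall, precision = 0.80, 0.60
actual_fraud = 50

caught = int(recall * actual_fraud)        # true positives: 40
missed = actual_fraud - caught             # false negatives: 10
flagged = caught / precision               # total alerts: ~66.7
false_alarms = round(flagged - caught)     # false positives: ~27

loss = missed * 2_000                      # missed fraud at $2,000 each
investigation = false_alarms * 15          # false alarms at $15 each
print(f"Caught: {caught} | Missed: {missed} | False alarms: {false_alarms}")
print(f"Fraud losses: ${loss:,} | Investigation cost: ${investigation:,}")
```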
Exercise 7.10 — Threshold Selection
(a) Calculations:
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.30 | 340/(340+480) = 0.415 | 340/400 = 0.850 | 0.557 |
| 0.50 | 260/(260+200) = 0.565 | 260/400 = 0.650 | 0.605 |
| 0.70 | 160/(160+60) = 0.727 | 160/400 = 0.400 | 0.516 |
(b) Cost-benefit at each threshold:
- At 0.30: TP value = 340 x $400 = $136,000; FP cost = 480 x $15 = $7,200; FN cost = 60 x $400 = $24,000. Net value = $136,000 - $7,200 - $24,000 = $104,800
- At 0.50: TP value = 260 x $400 = $104,000; FP cost = 200 x $15 = $3,000; FN cost = 140 x $400 = $56,000. Net value = $104,000 - $3,000 - $56,000 = $45,000
- At 0.70: TP value = 160 x $400 = $64,000; FP cost = 60 x $15 = $900; FN cost = 240 x $400 = $96,000. Net value = $64,000 - $900 - $96,000 = -$32,900
The 0.30 threshold maximizes net value at $104,800 because the cost of a missed churner ($400) vastly exceeds the cost of an unnecessary retention offer ($15). A lower threshold catches more churners, and the false positive cost is negligible.
(c) The highest F1 score (0.605) is at the 0.50 threshold, but this produces only $45,000 in net value — less than half the value at the 0.30 threshold. **F1 treats precision and recall as equally important, but the business context does not.** When the cost asymmetry between false positives and false negatives is large (as it is here — $400 vs. $15), the optimal business threshold will almost always differ from the F1-maximizing threshold.
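The cost-benefit sweep in (b) reduces to a few lines, using the confusion counts per threshold from part (a):

```python
# (TP, FP, FN) counts at each threshold, taken from part (a)
scenarios = {0.30: (340, 480, 60), 0.50: (260, 200, 140), 0.70: (160, 60, 240)}
VALUE_TP, COST_FP, COST_FN = 400, 15, 400  # saved churner, wasted offer, missed churner

net_value = {t: tp * VALUE_TP - fp * COST_FP - fn * COST_FN
             for t, (tp, fp, fn) in scenarios.items()}
for t, net in net_value.items():
    print(f"threshold {t:.2f}: net value ${net:,}")

best = max(net_value, key=net_value.get)
print(f"Best threshold: {best}")  # 0.3
```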
Exercise 7.12 — Data Exploration (Python)
```python
import pandas as pd
import numpy as np

# Using the chapter's generate_athena_churn_data() function
# (simplified synthetic equivalent shown here)
np.random.seed(42)
n = 10000
df = pd.DataFrame({
    'purchase_count_12m': np.random.poisson(8, n),
    'days_since_last_purchase': np.random.exponential(60, n).astype(int),
    'avg_order_value': np.random.lognormal(3.5, 0.8, n),
    'return_rate': np.random.beta(2, 10, n),
    'email_open_rate': np.random.beta(3, 7, n),
    'loyalty_tier': np.random.choice(
        ['Bronze', 'Silver', 'Gold', 'Platinum'], n,
        p=[0.40, 0.30, 0.20, 0.10]),
    'online_ratio': np.random.beta(5, 5, n),
})

# Simulate churn: higher churn for low purchase counts, long recency
churn_prob = 1 / (1 + np.exp(-(
    -2.0
    - 0.3 * df['purchase_count_12m']
    + 0.015 * df['days_since_last_purchase']
    - 0.005 * df['avg_order_value']
    + 2.0 * df['return_rate']
)))
df['churned'] = (np.random.random(n) < churn_prob).astype(int)

# (a) Churn rate by loyalty tier
print("Churn rate by loyalty tier:")
print(df.groupby('loyalty_tier')['churned'].mean()
        .sort_values(ascending=False))

# (b) Summary by churn status
numeric_cols = ['purchase_count_12m', 'days_since_last_purchase',
                'avg_order_value', 'return_rate', 'email_open_rate']
summary = df.groupby('churned')[numeric_cols].agg(['mean', 'median'])
print("\nSummary statistics by churn status:")
print(summary.round(2))

# (c) Correlation with churn
correlations = (df[numeric_cols + ['churned']].corr()['churned']
                .drop('churned').abs().sort_values(ascending=False))
print("\nTop 3 features correlated with churn:")
print(correlations.head(3))
```
In the chapter's dataset, the Bronze tier shows the highest churn rate, consistent with the intuition that less-engaged, lower-loyalty customers are more likely to leave. (The simplified generator above draws loyalty_tier independently of churn, so tier differences in its output are noise.) The top correlated features are typically days_since_last_purchase (positive — longer recency means higher churn), purchase_count_12m (negative — more purchases means lower churn), and return_rate (positive — more returns means higher churn).
Chapter 8: Supervised Learning — Regression
Exercise 8.8 — Interpreting Regression Coefficients
(a) Interpretations:
- Intercept (120.0): Baseline monthly revenue is $120,000 when all other features are zero — this is the model's starting point.
- Marketing spend (2.3): Each additional $1,000 in marketing spend is associated with a $2,300 increase in revenue, holding other factors constant.
- Number of products (0.05): Each additional product listed is associated with a $50 increase in revenue — individually small but significant at scale.
- Average rating (45.0): Each one-point increase in customer rating is associated with a $45,000 increase in revenue — the most impactful controllable feature.
- Holiday month (85.0): Holiday months generate an additional $85,000 in revenue compared to non-holiday months.
- Competitor price index (-1.8): Each one-point increase in the competitor price index is associated with a $1,800 decrease in revenue — plausibly because a higher index reflects periods when Athena's pricing is less competitive relative to rivals.
(b) Predicted revenue = 120.0 + (2.3 x 50) + (0.05 x 2000) + (45.0 x 4.2) + (85.0 x 0) + (-1.8 x 100) = 120 + 115 + 100 + 189 + 0 - 180 = $344,000/month.
(c) R-squared of 0.72 means the model explains 72% of the variation in monthly revenue. Whether this is "good enough" depends on: (1) what alternative the model replaces (if the previous approach was gut feeling, 72% is excellent), (2) the magnitude of the remaining 28% in dollar terms, and (3) how the model will be used (directional planning vs. precise budgeting).
(d) Correlation of 0.85 between marketing spend and number of products suggests multicollinearity, which makes individual coefficients unreliable — the model cannot distinguish the independent effect of each. Solutions include: removing one of the correlated features, combining them into a single feature (e.g., "marketing intensity"), or using ridge regularization to stabilize the coefficients.
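The prediction in (b) is just the intercept plus a dot product of coefficients and inputs (all coefficients in $ thousands, per the exercise):

```python
import numpy as np

coefs = np.array([2.3, 0.05, 45.0, 85.0, -1.8])  # model coefficients
x = np.array([50, 2000, 4.2, 0, 100])            # part (b) feature values
prediction = 120.0 + coefs @ x                   # intercept + dot product
print(f"Predicted revenue: ${prediction * 1000:,.0f}/month")
```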
Exercise 8.13 — Safety Stock Calculation
(a) safety_stock = z x sigma x sqrt(lead_time) = 1.65 x 35 x sqrt(14) = 1.65 x 35 x 3.742 = 216 units
(b) At 99% service level: safety_stock = 2.33 x 35 x 3.742 = 305 units. Increase = 305 - 216 = 89 units (41.2% increase). Moving from 95% to 99% service level requires 41% more safety stock — a clear demonstration of the diminishing returns of service level improvements.
(c) With improved model (sigma = 22): safety_stock = 1.65 x 22 x 3.742 = 136 units. Savings = 216 - 136 = 80 fewer units of safety stock needed.
(d) Daily holding cost savings = 80 units x $12/day = **$960/day.**
(e) Payback period = $200,000 / $960 per day = 208 days (approximately 7 months). This is a compelling payback period — the model pays for itself in under a year through inventory savings alone, not counting the revenue benefits of fewer stockouts.
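All four parts follow from a small helper implementing the safety stock formula from the exercise:

```python
import math

def safety_stock(z: float, sigma: float, lead_time_days: int) -> int:
    """Safety stock = z * sigma * sqrt(lead time), rounded to whole units."""
    return round(z * sigma * math.sqrt(lead_time_days))

base = safety_stock(1.65, 35, 14)        # 95% service level -> 216 units
improved = safety_stock(1.65, 22, 14)    # better forecast   -> 136 units
savings_per_day = (base - improved) * 12  # $12/unit/day holding cost
payback_days = 200_000 / savings_per_day  # ~208 days
print(base, improved, savings_per_day, round(payback_days))
```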
Chapter 9: Unsupervised Learning
Exercise 9.4 — K-Means by Hand
(a) Distances from each point to C1 = (1,1) and C2 = (5,7):
| Point | X | Y | Dist to C1 | Dist to C2 | Assignment |
|---|---|---|---|---|---|
| A | 1 | 1 | 0 | 7.21 | Cluster 1 |
| B | 1.5 | 2 | 1.12 | 6.10 | Cluster 1 |
| C | 3 | 4 | 3.61 | 3.61 | Either (tie — assign to C1) |
| D | 5 | 7 | 7.21 | 0 | Cluster 2 |
| E | 3.5 | 5 | 4.72 | 2.50 | Cluster 2 |
(b) New centroids: C1 = mean of {A, B, C} = ((1+1.5+3)/3, (1+2+4)/3) = (1.833, 2.333). C2 = mean of {D, E} = ((5+3.5)/2, (7+5)/2) = (4.25, 6.0).
(c) Recalculate distances with new centroids:
| Point | Dist to C1 (1.83, 2.33) | Dist to C2 (4.25, 6.0) | Assignment |
|---|---|---|---|
| A | 1.57 | 5.96 | Cluster 1 |
| B | 0.47 | 4.85 | Cluster 1 |
| C | 2.03 | 2.36 | Cluster 1 |
| D | 5.64 | 1.25 | Cluster 2 |
| E | 3.14 | 1.25 | Cluster 2 |
No points changed clusters. The algorithm has converged.
(d) Final clusters: Cluster 1 = {A, B, C} and Cluster 2 = {D, E}.
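The hand calculation can be verified with a few lines of NumPy; argmin breaks point C's tie toward the first centroid, matching the hand assignment:

```python
import numpy as np

points = np.array([[1, 1], [1.5, 2], [3, 4], [5, 7], [3.5, 5]])
centroids = np.array([[1.0, 1.0], [5.0, 7.0]])

for _ in range(10):
    # Assign each point to its nearest centroid (ties go to the first)
    dists = np.linalg.norm(points[:, None] - centroids[None], axis=2)
    labels = dists.argmin(axis=1)
    # Recompute centroids as the mean of each cluster's points
    new_centroids = np.array([points[labels == k].mean(axis=0)
                              for k in range(2)])
    if np.allclose(new_centroids, centroids):
        break  # converged: assignments no longer change
    centroids = new_centroids

print(labels)                  # [0 0 0 1 1]
print(centroids.round(3))      # [[1.833 2.333] [4.25 6.]]
```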
Exercise 9.5 — Choosing K
(a) Elbow method: The largest drops in inertia occur from K=2 to K=3 (decrease of 16,300), K=3 to K=4 (decrease of 9,400), and K=4 to K=5 (decrease of 4,400). After K=5, decreases become much smaller (1,500, 1,100, 600...). The "elbow" is at K=5.
(b) Silhouette score: The peak silhouette score (0.61) occurs at K=5, confirming the elbow analysis.
(c) Both methods agree on K=5. However, the marketing executive's constraint (K=3) is also legitimate if three campaigns is a genuine operational constraint. The solution is to run K=5 for analytical understanding and then strategically merge similar segments down to three campaign groups. This respects both the statistical structure and the business reality.
(d) Using K=3 solely because of a campaign constraint is not statistically optimal, but it is a valid business decision. The response should be: "The data suggests five natural segments. Let me show you the five-segment solution, then we can merge the two most similar pairs to create three actionable campaign groups. This way, we respect the data while working within your operational constraints."
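Both diagnostics can be sketched with scikit-learn on synthetic data; make_blobs with five true clusters stands in for the exercise's customer data, and the exact inertia and silhouette values will differ from the exercise's table:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Synthetic data with five true clusters (illustrative only)
X, _ = make_blobs(n_samples=500, centers=5, cluster_std=1.0,
                  random_state=42)

for k in range(2, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    sil = silhouette_score(X, km.labels_)
    print(f"K={k}: inertia={km.inertia_:,.0f}, silhouette={sil:.2f}")
```

Plotting inertia against K reveals the elbow; the silhouette column gives the second, independent vote on K.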
Exercise 9.14 — RFM Analysis
(a) RFM Scoring (1 = lowest, 5 = highest):
| Customer | Recency Score | Frequency Score | Monetary Score | RFM Total |
|---|---|---|---|---|
| Dave | 5 (2 days) | 5 (48) | 5 ($8,500) | 15 |
| Alice | 4 (5 days) | 4 (24) | 4 ($3,200) | 12 |
| Carol | 3 (15 days) | 3 (12) | 3 ($1,800) | 9 |
| Frank | 2 (30 days) | 2 (8) | 2 ($600) | 6 |
| Bob | 1 (90 days) | 1 (3) | 1 ($150) | 3 |
| Eve | 1 (180 days) | 1 (1) | 1 ($45) | 3 |
(b) Best customers: Dave (5-5-5) and Alice (4-4-4) — recent, frequent, high-value. At-risk: Frank (moderate across all dimensions — not fully disengaged but declining). Lost: Eve (180 days since last purchase, single purchase, minimal spend) and Bob (90 days, infrequent, low spend).
(c) Frank and Bob. Frank is the highest-value target for retention because he has established purchasing behavior (8 purchases, $600) but is showing early signs of disengagement (30 days since last purchase). A timely offer could re-engage him before he drifts further. Bob is worth a "win-back" attempt because his 90-day gap is recoverable, whereas Eve's 180-day gap and single purchase suggest she may have been a one-time buyer rather than a disengaged loyal customer.
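A rank-based version of the scoring can be sketched in pandas. With only six customers the ranks run 1 to 6 rather than the 1 to 5 quintile scores in the table (a large customer base would use pd.qcut into quintiles), but the ordering matches:

```python
import pandas as pd

customers = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol', 'Dave', 'Eve', 'Frank'],
    'recency_days': [5, 90, 15, 2, 180, 30],
    'frequency': [24, 3, 12, 48, 1, 8],
    'monetary': [3200, 150, 1800, 8500, 45, 600],
}).set_index('name')

# Rank-based 1-6 scores; lower recency is better, so rank descending
scores = pd.DataFrame({
    'R': customers['recency_days'].rank(ascending=False),
    'F': customers['frequency'].rank(),
    'M': customers['monetary'].rank(),
})
scores['RFM'] = scores.sum(axis=1)
print(scores.sort_values('RFM', ascending=False))
```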
Chapter 10: Recommendation Systems
Exercise 10.4 — Ratings Matrix Computation
(a) Cosine similarity between User 1 and User 2 using overlapping items (A and D):
User 1 vector: [5, 2] (Movies A and D); User 2 vector: [4, 1]
cos(U1, U2) = (5x4 + 2x1) / (sqrt(25+4) x sqrt(16+1)) = 22 / (5.385 x 4.123) = 22 / 22.20 = 0.991
(b) Cosine similarity between User 1 and User 4 using overlapping items (A, B, D):
User 1 vector: [5, 4, 2]; User 4 vector: [5, 4, 1]
cos(U1, U4) = (25+16+2) / (sqrt(45) x sqrt(42)) = 43 / (6.708 x 6.481) = 43 / 43.47 = 0.989
(c) User 1 and User 2 have slightly higher similarity (0.991 vs. 0.989), but the difference is negligible. Using User 4's rating (which has more overlapping items and thus is more reliable), the predicted rating for User 1 on Movie E = 4 (User 4 rated Movie E as 4).
(d) Two overlapping items is highly unreliable because a single extreme rating can dominate the similarity calculation. In production systems, a minimum overlap of 5–10 items is typically required before computing similarity to ensure statistical stability.
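Both similarity calculations in (a) and (b) can be verified with a small helper:

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two rating vectors."""
    u, v = np.asarray(u, float), np.asarray(v, float)
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

print(round(cosine_similarity([5, 2], [4, 1]), 3))        # 0.991
print(round(cosine_similarity([5, 4, 2], [5, 4, 1]), 3))  # 0.989
```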
Exercise 10.7 — The Sparsity Challenge
(a) Matrix density = (2,000,000 x 12) / (2,000,000 x 500,000) = 24,000,000 / 1,000,000,000,000 = 0.0024%. For every 100,000 cells in the matrix, only about 2.4 are filled.
(b) At 99.998% sparsity, most user pairs have zero or one overlapping items, making similarity computation impossible for the vast majority of user pairs. Item-based similarity fares slightly better (items have more interactions than individual users) but still faces severe sparsity.
(c) Two mitigation techniques: (1) Matrix factorization (e.g., SVD or ALS) — decomposes the sparse matrix into dense latent factor vectors, enabling comparison even without direct overlap. Trade-off: latent factors are not interpretable, and cold-start users still have no factors. (2) Hybrid approach — supplement collaborative filtering with content-based features (product descriptions, categories), so recommendations can be made based on item attributes when interaction data is too sparse. Trade-off: requires maintaining a content feature pipeline alongside the collaborative system.
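The first technique can be sketched with scikit-learn's TruncatedSVD on a toy sparse matrix; the dimensions and density below are illustrative, far smaller than the exercise's 2,000,000 x 500,000 matrix:

```python
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD

# Toy interaction matrix: 1,000 users x 500 items, ~0.5% of cells filled
interactions = sparse_random(1000, 500, density=0.005, random_state=42)

# Factorize into 20 latent dimensions; the factor matrices are dense
# even though the input is sparse, so any user-item pair can be scored
svd = TruncatedSVD(n_components=20, random_state=42)
user_factors = svd.fit_transform(interactions)   # shape (1000, 20)
item_factors = svd.components_.T                 # shape (500, 20)
print(user_factors.shape, item_factors.shape)
```

A predicted score for user u and item i is then the dot product of their factor vectors, sidestepping the need for direct rating overlap.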
Exercise 10.15 — NDCG Calculation
(a) DCG@5 = 1/log2(2) + 0/log2(3) + 1/log2(4) + 1/log2(5) + 0/log2(6) = 1/1.0 + 0 + 1/2.0 + 1/2.322 + 0 = 1.0 + 0 + 0.5 + 0.431 + 0 = 1.931
(b) Ideal ranking places all relevant items first: positions 1, 2, 3 have relevant items. IDCG@5 = 1/log2(2) + 1/log2(3) + 1/log2(4) + 0 + 0 = 1.0 + 0.631 + 0.5 = 2.131
(c) NDCG@5 = 1.931 / 2.131 = 0.906
(d) If A and C swap: DCG = 1/log2(4) + 0 + 1/log2(2) + 1/log2(5) + 0 = 0.5 + 0 + 1.0 + 0.431 + 0 = 1.931 — identical, because positions 1 and 3 both hold relevant items, so swapping them leaves the relevance at every position unchanged. In general, however, ranking the most relevant items earlier is always preferred because the logarithmic discount penalizes lower positions more heavily.
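The DCG and NDCG arithmetic can be verified with a short function (positions are 1-indexed, so position i is discounted by log2(i + 1)):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain: sum of rel_i / log2(i + 1), 1-indexed."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances))

ranked = [1, 0, 1, 1, 0]              # relevance at positions 1-5
ideal = sorted(ranked, reverse=True)  # all relevant items first
ndcg = dcg(ranked) / dcg(ideal)
print(round(dcg(ranked), 3), round(dcg(ideal), 3), round(ndcg, 3))
```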
Chapter 11: Model Evaluation and Selection
Exercise 11.8 — Confusion Matrix Interpretation
(a) Accuracy = (45 + 9800) / 10000 = 98.45%
(b) Precision = 45 / (45 + 150) = 23.08% — only 23% of flagged transactions are actually fraud.
(c) Recall = 45 / (45 + 5) = 90.0% — the model catches 90% of actual fraud.
(d) F1 score = 2 x (0.2308 x 0.90) / (0.2308 + 0.90) = 2 x 0.2077 / 1.1308 = 0.367
(e) False positive rate = 150 / (150 + 9800) = 1.51%
(f) Specificity = 9800 / (9800 + 150) = 98.49%
Despite 98.45% accuracy and 90% recall, the precision of only 23% means that for every actual fraud case caught, the system also incorrectly flags about 3.3 legitimate transactions. Whether this is acceptable depends on the cost of investigating false positives versus the cost of missed fraud.
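All six metrics follow directly from the four confusion-matrix cells, which makes the answers easy to verify:

```python
# Confusion-matrix cells from the exercise.
tp, fn, fp, tn = 45, 5, 150, 9800

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)
f1          = 2 * precision * recall / (precision + recall)
fpr         = fp / (fp + tn)
specificity = tn / (tn + fp)

print(f"accuracy={accuracy:.4f}  precision={precision:.4f}  "
      f"recall={recall:.4f}  f1={f1:.3f}  "
      f"fpr={fpr:.4f}  specificity={specificity:.4f}")
```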
Exercise 11.11 — Threshold Optimization
(a) Expected profit at each threshold:
- 0.3: (85 x $480) + (200 x -$20) + (15 x -$500) = $40,800 - $4,000 - $7,500 = $29,300
- 0.5: (60 x $480) + (80 x -$20) + (40 x -$500) = $28,800 - $1,600 - $20,000 = $7,200
- 0.7: (30 x $480) + (20 x -$20) + (70 x -$500) = $14,400 - $400 - $35,000 = -$21,000
(b) The 0.3 threshold maximizes profit at $29,300.
(c) Precision, recall, and F1:
| Threshold | Precision | Recall | F1 |
|---|---|---|---|
| 0.3 | 85/285 = 0.298 | 85/100 = 0.850 | 0.441 |
| 0.5 | 60/140 = 0.429 | 60/100 = 0.600 | 0.500 |
| 0.7 | 30/50 = 0.600 | 30/100 = 0.300 | 0.400 |
(d) The highest F1 (0.500) is at threshold 0.5, which yields only $7,200 in profit — less than a quarter of the profit at threshold 0.3. **F1 does not maximize profit** because F1 treats precision and recall as equally important, whereas the business context heavily favors recall (cost of missed churner = $500) over precision (cost of unnecessary offer = $20). The profit-maximizing threshold always depends on the specific cost structure, not on a generic statistical metric.
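A small sweep makes the profit comparison in (a)-(b) reproducible. The payoff values ($480 per retained churner, -$20 per unnecessary offer, -$500 per missed churner) and the per-threshold counts are the ones assumed in the exercise:

```python
# (tp, fp, fn) counts at each decision threshold.
outcomes = {0.3: (85, 200, 15), 0.5: (60, 80, 40), 0.7: (30, 20, 70)}

def expected_profit(tp, fp, fn,
                    value_saved=480, offer_cost=20, churn_cost=500):
    """Profit under the exercise's cost structure."""
    return tp * value_saved - fp * offer_cost - fn * churn_cost

profits = {t: expected_profit(*counts) for t, counts in outcomes.items()}
best = max(profits, key=profits.get)
print(profits)   # {0.3: 29300, 0.5: 7200, 0.7: -21000}
print(best)      # 0.3
```

Swapping in a different cost structure (e.g., a cheaper retention offer) shifts the optimal threshold, which is the point of part (d).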
Chapter 12: From Model to Production — MLOps
Exercise 12.1(c) — Feature Store Gap
Root cause: Data pipeline gap. The features average_transaction_amount_30d and transactions_this_week are aggregate features that require historical lookback queries. In development, these were computed from a static dataset where all historical data was available. In production, the features must be computed in real-time, but the database queries take 45 seconds — far too slow for a fraud detection system that must respond in milliseconds.
Prevention: Implement a feature store (e.g., Feast, Tecton, or a custom solution) that pre-computes aggregate features on a schedule and serves them from a low-latency cache. During the data preparation phase, the team should have validated that every feature used in training could be computed within the production latency requirement. This is the "training-serving skew" problem — a gap between how features are computed in training versus serving.
Exercise 12.3 — The Business Case for MLOps
Current state: 8 models x 14 weeks deployment time = 112 engineering-weeks. With MLOps: 8 models x 4 weeks = 32 engineering-weeks. Time savings = 80 engineering-weeks.
But the real value is time-to-value acceleration. Each model generates $500K/year, so a 10-week deployment acceleration means each model starts generating value 10 weeks earlier: 10 weeks x $500K / 52 weeks = $96,154 in accelerated value per model. For 8 models: 8 x $96,154 = **$769,231 in total accelerated value**.
Investment: $350,000/year. First-year net value: $769,231 - $350,000 = **$419,231**. ROI = $419,231 / $350,000 ≈ **120%**. This does not even account for the compound effect of deploying future models faster or the reduced risk of deployment failures.
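The ROI arithmetic is straightforward to reproduce; all inputs are the figures quoted above:

```python
models, old_weeks, new_weeks = 8, 14, 4
annual_value_per_model = 500_000
platform_cost = 350_000

weeks_saved = old_weeks - new_weeks               # 10 weeks per model
accel_value = models * weeks_saved * annual_value_per_model / 52
net = accel_value - platform_cost
roi = net / platform_cost
print(f"accelerated value: ${accel_value:,.0f}")  # ~$769,231
print(f"first-year ROI: {roi:.0%}")               # ~120%
```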
Chapter 13: Neural Networks Demystified
Exercise 13.1 — Key Definitions
(a) Neuron (node): The basic computational unit of a neural network that receives inputs, applies weights and a bias, sums them, and passes the result through an activation function. It is loosely inspired by biological neurons but is fundamentally a mathematical operation.
(b) Activation function: A nonlinear function applied to a neuron's output that enables the network to learn complex, nonlinear patterns. Without activation functions, a deep network would collapse to a single linear transformation regardless of depth.
(c) Backpropagation: The algorithm that computes how much each weight contributed to the network's prediction error, propagating the error signal backward from the output layer through hidden layers. It enables gradient descent by providing the gradient of the loss function with respect to each weight.
Exercise 13.5 — Build vs. Buy for Neural Networks
For a mid-size retailer considering image-based product tagging, the recommended approach is buy (use a cloud API like Google Cloud Vision or AWS Rekognition) for the following reasons:
- Data requirement: Training a custom CNN requires thousands of labeled images per category. A retailer with 5,000 SKUs would need 50,000+ labeled images — a significant labeling investment.
- Expertise: Custom neural network development requires deep learning expertise that most mid-size retailers do not have in-house.
- Maintenance: Neural networks require GPU infrastructure, monitoring for drift, and periodic retraining — ongoing costs that cloud APIs handle transparently.
- Time-to-value: A cloud API can be integrated in days; a custom model takes months.
The exception would be if the retailer has highly specialized visual classification needs (e.g., fabric texture analysis) that generic APIs handle poorly and that represent a significant competitive advantage.
Chapter 14: NLP for Business
Exercise 14.1 — NLP Definitions
(a) Tokenization is the process of splitting text into individual units (tokens) — typically words, subwords, or characters — that serve as the input to NLP models. The choice of tokenization strategy affects model performance because it determines the vocabulary and granularity of text representation.
(b) Stopwords are common words (e.g., "the," "is," "and," "of") that appear frequently in all documents and carry little discriminative information. Removing them reduces noise and dimensionality, allowing the model to focus on content-bearing words, though modern transformer models typically retain stopwords because they provide syntactic context.
(c) Lemmatization reduces words to their base (dictionary) form using linguistic rules — "running" becomes "run," "better" becomes "good." Unlike stemming (which applies crude rules like removing suffixes), lemmatization produces actual words, making output more interpretable for business users.
Exercise 14.10 — TF-IDF Feature Engineering
(b) Highest IDF scores (terms appearing in few documents):
- "crashes" — appears in only 2 of 5 documents (high IDF because it is distinctive)
- "steep" — appears in only 1 document
- "useless" — appears in only 1 document
- "intuitive" — appears in only 1 document
- "confusing" — appears in only 1 document
Terms like "software," "the," and "is" appear in multiple documents and would have low IDF scores.
(c) Useful distinguishing bigrams:
- "software crashes" — strongly negative, product-specific
- "tech support" — service-related, appears in both positive and negative contexts (distinguishing based on accompanying words)
- "learning curve" — typically negative in product reviews
- "easy install" or "easy to" — strongly positive
- "love it" — strongly positive
Exercise 14.15 — ReviewAnalyzer Extension (Python)
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
import pandas as pd
class ReviewAnalyzerExtended:
"""Extended ReviewAnalyzer with similarity search."""
def __init__(self):
self.vectorizer = TfidfVectorizer(
max_features=5000,
stop_words='english',
ngram_range=(1, 2)
)
self.tfidf_matrix = None
def fit(self, corpus_texts):
"""Fit TF-IDF vectorizer on the corpus."""
self.tfidf_matrix = self.vectorizer.fit_transform(corpus_texts)
return self
def find_similar_reviews(self, query_text, corpus_df,
top_n=5):
"""
Find the most similar reviews to a query text.
Parameters:
query_text: str — the review to find matches for
corpus_df: DataFrame with 'text' column
top_n: int — number of similar reviews to return
Returns:
DataFrame with top_n most similar reviews and
similarity scores
"""
query_vec = self.vectorizer.transform([query_text])
similarities = cosine_similarity(
query_vec, self.tfidf_matrix
).flatten()
top_indices = similarities.argsort()[::-1][:top_n]
results = corpus_df.iloc[top_indices].copy()
results['similarity_score'] = similarities[top_indices]
return results[['text', 'similarity_score']]
# Usage example
corpus = pd.DataFrame({'text': [
"The jacket quality is excellent, very warm.",
"Shipping took forever but the product was worth it.",
"Great coat, perfect for winter. Arrived quickly.",
"Poor stitching, material feels cheap.",
"Loved the jacket but delivery was very slow.",
]})
analyzer = ReviewAnalyzerExtended()
analyzer.fit(corpus['text'])
query = "The jacket quality is great but shipping was slow"
results = analyzer.find_similar_reviews(query, corpus, top_n=3)
print(results)
This similarity search would be useful for Athena's customer support team because agents could quickly find previous cases with similar complaints, see how they were resolved, and apply consistent solutions. It also enables trend detection — if many recent queries match a specific complaint pattern, it signals a systemic issue.
Chapter 15: Computer Vision for Business
Exercise 15.1 — CV Application Classification
(a) Quality inspection on a manufacturing line: Image classification (binary: defective/non-defective) or object detection (locating the specific defect region). This is one of the highest-ROI computer vision applications because even small improvements in defect detection rates translate to large savings in warranty costs and customer returns.
(b) Counting people in a retail store: Object detection (identifying and counting individual people in video frames). The key challenge is occlusion — people partially hidden behind displays or other shoppers — which requires models trained specifically on crowded retail environments.
(c) Reading text from scanned invoices: OCR (Optical Character Recognition), which is a specialized form of image classification applied to character-level regions. Modern approaches use transformer-based document understanding models (like LayoutLM) that consider both text content and spatial layout.
Exercise 15.3 — Transfer Learning Decision
For a mid-size hospital with 2,000 labeled X-ray images, transfer learning is strongly recommended over training from scratch. A pre-trained model (e.g., ResNet or EfficientNet trained on ImageNet) has already learned to detect edges, textures, and shapes from millions of images. Fine-tuning the last few layers on 2,000 X-rays can achieve strong performance because the low-level features (edges, gradients) transfer well across domains. Training from scratch with only 2,000 images would result in severe overfitting and poor generalization.
Chapter 16: Time Series Forecasting
Exercise 16.9 — Decomposition Analysis
(a) Trend: The data shows a clear upward linear trend. Year 1 annual total: $642,000. Year 4 annual total: $943,000. Annual growth ≈ ($943K - $642K) / (3 years x $642K) ≈ **15.6% per year**, or approximately $100K per year in absolute terms. The growth appears linear rather than exponential because the absolute annual increase is roughly constant.
(b) Seasonal pattern: Peak months are June-July (summer), with a secondary rise in November-December. Trough months are January-February. The seasonal amplitude in Year 1 (max 65K - min 38K = 27K) versus Year 4 (max 95K - min 58K = 37K) grows proportionally with the level, suggesting multiplicative seasonality.
(c) January Year 5 estimate: Using multiplicative decomposition, January's seasonal index ≈ January Y1 / average Y1 month ≈ $42K / ($642K/12) = 42/53.5 ≈ 0.785. Projected Year 5 average month ≈ $943K x 1.156 / 12 ≈ $90.9K. Estimated January Y5 ≈ 0.785 x $90.9K ≈ **$71,300**.
(d) Additional helpful information: day-of-week effects (if daily data were available), weather patterns, competitive activity, marketing calendar, and whether any new locations opened during the period.
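The part (c) estimate can be reproduced directly from the figures above:

```python
# Figures from the exercise: Year 1 and Year 4 annual totals,
# January Year 1 revenue, and the ~15.6%/yr growth from part (a).
y1_total, y4_total = 642_000, 943_000
jan_y1 = 42_000

growth = (y4_total - y1_total) / (3 * y1_total)  # ~0.156 per year
seasonal_index = jan_y1 / (y1_total / 12)        # ~0.785
y5_avg_month = y4_total * (1 + growth) / 12      # ~$90.9K
jan_y5 = seasonal_index * y5_avg_month           # close to $71,300

print(f"seasonal index: {seasonal_index:.3f}")
print(f"estimated January Y5: ${jan_y5:,.0f}")
```

A statsmodels `seasonal_decompose` on the full monthly series would compute the seasonal indices properly (averaging each month across all four years); the single-year index here matches the shortcut taken in the worked answer.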
Exercise 16.11 — Prophet Configuration (Python)
from prophet import Prophet
import pandas as pd
# Define holidays (closures)
holidays = pd.DataFrame({
    'holiday': ['thanksgiving', 'christmas'] * 3,
    'ds': pd.to_datetime([
        '2024-11-28', '2024-12-25',
        '2025-11-27', '2025-12-25',
        '2026-11-26', '2026-12-25'
    ]),
    'lower_window': [0, 0, 0, 0, 0, 0],
    'upper_window': [0, 0, 0, 0, 0, 0],
})
# Configure Prophet model for restaurant chain
model = Prophet(
    holidays=holidays,            # Closure days defined above
    growth='logistic',            # Trend acceleration from new stores
    yearly_seasonality=True,      # Annual seasonal pattern
    weekly_seasonality=True,      # Weekend vs. weekday patterns
    daily_seasonality=False,      # Not needed for daily revenue
    changepoint_prior_scale=0.15, # Allow more flexibility for
                                  # trend changes from new stores
    seasonality_prior_scale=10.0, # Strong seasonality expected
    seasonality_mode='multiplicative'  # Revenue level affects
                                       # seasonal amplitude
)
# Add monthly promotion regressor
# (binary column in the dataframe: 1 if first weekend
# of month, 0 otherwise)
model.add_regressor('is_promo_weekend', mode='multiplicative')
# Add custom monthly seasonality for finer control
model.add_seasonality(
name='monthly',
period=30.5,
fourier_order=5,
mode='multiplicative'
)
The key design decisions: multiplicative mode because revenue level affects seasonal amplitude; logistic growth to model the trend acceleration from new store openings (requires setting cap in the dataframe); holidays with lower_window=0 and upper_window=0 because closures affect only the specific day; and promotional weekends as an external regressor rather than a seasonality because they are irregular events.
Chapter 17: Generative AI — Large Language Models
Exercise 17.8 — Hallucination Detection
(a) Fabricated claims in the LLM summary:
- "driven primarily by expansion in the enterprise segment, which grew 11.2% to $142M" — No segment breakdown exists in the actual data.
- "reflecting the company's ongoing cost optimization program that reduced SG&A by $3.2M" — No information about cost programs or SG&A in the source data.
- "primarily mid-market accounts" — No breakdown of customer additions by segment in the source data.
- "a 12% increase from the prior year" — R&D spending percentage is given (7.4% of revenue) but the year-over-year R&D growth rate is not in the source data.
- "supporting the launch of three new AI-powered product features" — No product launch information in the source data.
- "CEO Sarah Chen noted..." and the full-year target of $510M — No CEO name, quote, or guidance figure in the source data.
(b) Each fabrication is plausible because it follows the logical pattern of a real earnings summary: segment breakdowns are common, cost optimization programs are typical drivers of margin improvement, and CEO quotes with forward guidance are standard in earnings releases. The LLM generates text that looks like an earnings summary because it has seen thousands of them during training.
(c) Implement a source-verification protocol: (1) every factual claim in an LLM-generated summary must have a traceable source citation, (2) use a RAG architecture that grounds generation in source documents rather than the model's parametric knowledge, (3) require human review of all externally-shared LLM-generated content with a specific checklist of "verify each number and attribution."
Exercise 17.11 — Cost Analysis
(a) Daily cost (GPT-4o): Input = 25,000 x 200 = 5,000,000 tokens x $2.50/M = $12.50. Output = 25,000 x 150 = 3,750,000 tokens x $10.00/M = $37.50. Daily total = $50.00.
(b) Monthly cost = $50.00 x 30 = **$1,500/month.**
(c) GPT-4o-mini: Input = 5,000,000 x $0.15/M = $0.75. Output = 3,750,000 x $0.60/M = $2.25. Daily = $3.00; Monthly = $90.00. Cost reduction = 94%.
(d) Current agent cost: 40 agents x $52,000 = $2,080,000/year. If the LLM handles 60% of tickets, agent workload falls by 60%, potentially enabling a reduction to 16 agents (40 x 0.40). Savings = 24 agents x $52,000 = $1,248,000/year. LLM annual cost (GPT-4o): $1,500 x 12 = $18,000. Net annual savings = $1,248,000 - $18,000 = $1,230,000. Even using the more expensive model, the economics are compelling.
(e) Non-financial factors: quality of AI-generated responses versus human agents, customer satisfaction with automated interactions, employee morale and retention, legal liability for AI-generated advice, edge cases requiring human empathy and judgment, and reputational risk if the chatbot makes visible errors.
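A small cost helper reproduces parts (a)-(c); the token counts and per-million prices are the ones given in the exercise:

```python
def monthly_llm_cost(queries_per_day, in_tokens, out_tokens,
                     in_price_per_m, out_price_per_m, days=30):
    """Monthly API cost from per-query token counts and
    per-million-token prices."""
    daily_in = queries_per_day * in_tokens / 1e6 * in_price_per_m
    daily_out = queries_per_day * out_tokens / 1e6 * out_price_per_m
    return (daily_in + daily_out) * days

gpt4o = monthly_llm_cost(25_000, 200, 150, 2.50, 10.00)
mini  = monthly_llm_cost(25_000, 200, 150, 0.15, 0.60)
print(gpt4o, mini)                            # 1500.0 ~90.0
print(f"reduction: {1 - mini / gpt4o:.0%}")   # 94%
```

Parameterizing the calculation this way makes it easy to re-run as providers change their per-token pricing.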
Exercise 17.13 — Basic API Integration (Python)
import json
from openai import OpenAI
client = OpenAI() # Uses OPENAI_API_KEY env variable
def classify_customer_feedback(feedback_text: str) -> dict:
"""
Classify customer feedback into structured categories.
Returns:
dict with keys: sentiment, category, urgency, summary
"""
system_prompt = """You are a customer feedback classifier.
Analyze the provided feedback and return a JSON object with
exactly these fields:
- sentiment: one of "positive", "neutral", or "negative"
- category: one of "product", "shipping", "service",
"pricing", or "other"
- urgency: one of "high", "medium", or "low"
(high = safety issue or order problem requiring immediate
action; medium = complaint needing follow-up;
low = general feedback)
- summary: one sentence summarizing the feedback
Return ONLY valid JSON, no additional text."""
try:
response = client.chat.completions.create(
model="gpt-4o",
temperature=0.0,
response_format={"type": "json_object"},
messages=[
{"role": "system", "content": system_prompt},
{"role": "user", "content": feedback_text}
]
)
result = json.loads(
response.choices[0].message.content
)
# Validate expected keys
required_keys = {
'sentiment', 'category', 'urgency', 'summary'
}
if not required_keys.issubset(result.keys()):
missing = required_keys - set(result.keys())
raise ValueError(
f"Missing keys in response: {missing}"
)
return result
except Exception as e:
return {
"sentiment": "unknown",
"category": "other",
"urgency": "medium",
"summary": f"Classification failed: {str(e)}"
}
# Example usage
feedback = ("I ordered a winter jacket two weeks ago and it "
"still hasn't arrived. Customer service put me on "
"hold for 30 minutes. Very frustrated.")
result = classify_customer_feedback(feedback)
print(json.dumps(result, indent=2))
# Expected output:
# {
# "sentiment": "negative",
# "category": "shipping",
# "urgency": "high",
# "summary": "Customer is frustrated by a two-week shipping
# delay and poor customer service experience."
# }
Chapter 18: Generative AI — Multimodal
Exercise 18.1 — Multimodal Definitions
(a) Multimodal AI refers to systems that can process and generate content across multiple data types (modalities) — text, images, audio, video, and code — within a single model. This contrasts with unimodal systems that handle only one type of input.
(b) Vision-language model is a multimodal AI system that can jointly understand images and text, enabling tasks like image captioning, visual question answering, and document understanding. Examples include GPT-4V, Gemini, and Claude's vision capabilities.
Exercise 18.5 — Business Use Case Evaluation
For a real estate company considering multimodal AI for property listings:
- Value: High — automated property description generation from photos could save listing agents 30–60 minutes per property. With 10,000 listings per year, this represents 5,000–10,000 hours of labor savings.
- Risk: Medium — the model might misidentify room types (calling a den a bedroom, which has legal implications for listing accuracy) or miss important property defects visible in photos.
- Recommendation: Proceed with caution — use multimodal AI to generate first-draft descriptions that agents review and edit, rather than publishing directly. Include a specific validation checklist for room counts, square footage claims, and amenity descriptions.
Chapter 19: Prompt Engineering Fundamentals
Exercise 19.3 — Zero-Shot vs. Few-Shot
Zero-shot prompting provides only the instruction without any examples — the model must infer the task from the instruction alone. Few-shot prompting provides several input-output examples before the actual task, demonstrating the desired format and reasoning pattern.
When to use zero-shot: Tasks where the instruction is clear, the format is standard, and the model has strong prior training on the task type. Example: "Summarize this paragraph in one sentence" — summarization is well-understood and does not require examples.
When to use few-shot: Tasks with specific formatting requirements, domain-specific conventions, or ambiguous classification criteria. Example: classifying customer support tickets into company-specific categories (e.g., "Billing — Dispute," "Billing — Inquiry," "Technical — Outage"). Without examples showing the boundary between "Dispute" and "Inquiry," the model will apply its own interpretation, which may not match the company's definitions.
Exercise 19.13 — Prompt Debugging Challenge
(a) "Tell me about our company's performance."
- Problem: Missing context — the LLM has no access to "our company's" data. The prompt is vague ("performance" on what dimension?) and assumes the model knows which company "our" refers to.
- Rewritten: "You are a financial analyst at Athena Retail Group. Using the following quarterly data [insert data], analyze revenue performance across the four regions, identify the strongest and weakest performing region, and explain the key drivers of the performance gap."
(b) The multi-role, multi-task prompt.
- Problem: Task overload — asking a single prompt to perform five complex tasks simultaneously guarantees that each task receives superficial treatment. Multiple conflicting roles create confusion.
- Rewritten: "You are a marketing strategist. Using the following customer segmentation data [insert data], recommend three targeted marketing campaigns for our highest-value customer segment. For each campaign, specify the channel, message, and expected KPI."
(c) "Explain why our Q4 sales were disappointing and suggest that we need to increase our digital advertising budget."
- Problem: Leading the witness — the prompt pre-determines the conclusion ("suggest that we need to increase our digital advertising budget"), which makes the LLM produce a sycophantic confirmation rather than an objective analysis.
- Rewritten: "Our Q4 sales were 12% below target. Using the following channel performance data [insert data], identify the three most likely contributing factors to the shortfall. For each factor, recommend a specific corrective action with an estimated impact. Do not assume any particular solution in advance."
Exercise 19.14 — Building a PromptBuilder (Python)
class PromptBuilder:
"""Systematic prompt construction with six components."""
def __init__(self, name: str):
self.name = name
self.role = ""
self.instruction = ""
self.context = ""
self.examples = []
self.output_format = ""
self.constraints = []
self.temperature = 0.7
self.max_tokens = 1000
self.versions = []
def set_role(self, role: str):
self.role = role
return self
def set_instruction(self, instruction: str):
self.instruction = instruction
return self
def set_context(self, context: str):
self.context = context
return self
def add_example(self, input_text: str, output_text: str):
self.examples.append({
'input': input_text, 'output': output_text
})
return self
def set_output_format(self, fmt: str):
self.output_format = fmt
return self
def add_constraint(self, constraint: str):
self.constraints.append(constraint)
return self
def set_params(self, temperature=None, max_tokens=None):
if temperature is not None:
self.temperature = temperature
if max_tokens is not None:
self.max_tokens = max_tokens
return self
def build(self) -> str:
parts = []
if self.role:
parts.append(f"ROLE: {self.role}")
if self.instruction:
parts.append(f"\nINSTRUCTION: {self.instruction}")
if self.context:
parts.append(f"\nCONTEXT: {self.context}")
if self.examples:
parts.append("\nEXAMPLES:")
for i, ex in enumerate(self.examples, 1):
parts.append(f" Example {i}:")
parts.append(f" Input: {ex['input']}")
parts.append(f" Output: {ex['output']}")
if self.output_format:
parts.append(
f"\nOUTPUT FORMAT: {self.output_format}"
)
if self.constraints:
parts.append("\nCONSTRAINTS:")
for c in self.constraints:
parts.append(f" - {c}")
return "\n".join(parts)
def save_version(self, version_id: str):
self.versions.append({
'version': version_id,
'prompt': self.build()
})
return self
def preview(self):
print(f"=== {self.name} ===")
print(self.build())
print(f"\n[temperature={self.temperature}, "
f"max_tokens={self.max_tokens}]")
# Meeting agenda generator
agenda = PromptBuilder("meeting-agenda-generator")
agenda.set_role(
"You are a professional executive assistant who creates "
"clear, action-oriented meeting agendas."
)
agenda.set_instruction(
"Given a list of discussion topics, create a structured "
"meeting agenda with time allocations, discussion leads, "
"and desired outcomes for each item."
)
agenda.set_context(
"The agenda is for Athena Retail Group's weekly "
"leadership meeting (60 minutes, 8 attendees)."
)
agenda.add_example(
input_text="Topics: Q3 results review, holiday hiring "
"plan, new POS system update",
output_text="AGENDA — Leadership Team Weekly\n"
"Duration: 60 minutes\n\n"
"1. Q3 Results Review (20 min) — CFO\n"
" Outcome: Align on variance drivers\n"
"2. Holiday Hiring Plan (25 min) — VP HR\n"
" Outcome: Approve headcount request\n"
"3. POS System Update (10 min) — CTO\n"
" Outcome: Confirm go-live date\n"
"4. Open Items (5 min) — Chair"
)
agenda.add_example(
input_text="Topics: AI chatbot pilot results, "
"customer satisfaction dip, budget reforecast",
output_text="AGENDA — Leadership Team Weekly\n"
"Duration: 60 minutes\n\n"
"1. AI Chatbot Pilot Results (15 min) — "
"VP Data & AI\n"
" Outcome: Decide expand, modify, or pause\n"
"2. Customer Satisfaction Analysis (25 min) "
"— VP CX\n"
" Outcome: Identify root causes, assign "
"owners\n"
"3. Budget Reforecast (15 min) — CFO\n"
" Outcome: Approve revised Q4 targets\n"
"4. Open Items (5 min) — Chair"
)
agenda.set_output_format(
"Structured agenda with numbered items, time allocations "
"totaling 60 minutes, discussion lead for each item, "
"and a one-line desired outcome."
)
agenda.add_constraint("Total time must not exceed 60 minutes")
agenda.add_constraint(
"No single item should exceed 25 minutes"
)
agenda.set_params(temperature=0.3, max_tokens=500)
agenda.save_version("1.0")
agenda.preview()
Chapter 20: Advanced Prompt Engineering
Exercise 20.1 — Definitions
(a) Chain-of-thought (CoT) prompting instructs the LLM to show its reasoning step by step before arriving at an answer, reducing errors on complex reasoning tasks. The simplest form appends "Let's think through this step by step" to the prompt.
(b) Tree-of-thought (ToT) prompting generates multiple reasoning paths simultaneously, evaluates each path, and selects the best one. It is useful for strategic decisions where multiple options should be compared before committing to a recommendation.
(d) Prompt chaining breaks a complex task into a sequence of simpler prompts, where each prompt's output feeds into the next prompt's input. It mirrors how experienced analysts decompose problems into manageable steps.
Exercise 20.8 — Chain-of-Thought for Financial Analysis
You are a financial analyst evaluating a potential new store
location for Athena Retail Group. Think through this analysis
step by step.
DATA:
- Market population: 185,000
- Median household income: $78,500
- Nearest Athena store: 42 miles
- Nearest competitor: 8 miles
- Estimated build-out cost: $3.2M
- Estimated annual revenue (Year 1): $6.8M
- Estimated annual operating cost: $5.9M
STEP 1 — MARKET ASSESSMENT:
Evaluate the market size and income level. Is the population
sufficient to support a retail location? How does the median
income compare to Athena's target demographic?
STEP 2 — COMPETITIVE ANALYSIS:
The nearest Athena store is 42 miles away (low cannibalization
risk). The nearest competitor is 8 miles away. Assess the
competitive dynamics. Is 8 miles enough distance for
differentiation?
STEP 3 — FINANCIAL PROJECTIONS (3-YEAR):
Calculate Year 1 profit, then project Years 2 and 3 assuming
8% annual revenue growth and 3% annual cost inflation.
Show your calculations for each year.
STEP 4 — BREAKEVEN CALCULATION:
Given the $3.2M build-out cost and the annual profit stream,
calculate the payback period. Show the cumulative cash flow
by year until breakeven.
STEP 5 — RECOMMENDATION:
Based on your analysis, provide a clear GO or NO-GO
recommendation with three supporting reasons and one key risk
to monitor.
Exercise 20.18 — Cost-Benefit Analysis of Self-Consistency
(a) Current annual API cost: 50,000 emails/day x 365 days x $0.002 = **$36,500/year**. With 5x self-consistency: **$182,500/year**. Annual cost increase: **$146,000**.
(b) Current misclassifications: 50,000 x 0.09 = 4,500/day. With self-consistency: 50,000 x 0.04 = 2,000/day. Daily improvement: 2,500 fewer misclassifications. Annual improvement: 2,500 x 365 = 912,500 fewer misclassifications. Annual savings: 912,500 x $15 = **$13,687,500.**
(c) Net ROI: ($13,687,500 - $146,000) / $146,000 ≈ 9,275%. The investment is overwhelmingly justified.
(d) Self-consistency becomes cost-negative when the misclassification cost is low enough that the savings no longer exceed the API cost increase. Break-even: 912,500 x cost per error = $146,000, giving a **break-even cost per error of $0.16**. If each misclassification costs less than $0.16, self-consistency is not worth the API cost. At $15 per misclassification, the investment is a clear win.
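The break-even relationship in (d) generalizes to any volume, sampling factor, and error-rate pair; a small helper (the function name and parameters are illustrative) reproduces the $0.16 figure:

```python
def self_consistency_breakeven(volume_per_day, price_per_call,
                               samples, base_error, sc_error):
    """Misclassification cost at which the extra API spend of
    n-sample self-consistency exactly pays for itself."""
    extra_cost = volume_per_day * 365 * price_per_call * (samples - 1)
    errors_avoided = volume_per_day * 365 * (base_error - sc_error)
    return extra_cost / errors_avoided

be = self_consistency_breakeven(50_000, 0.002, 5, 0.09, 0.04)
print(f"break-even cost per error: ${be:.2f}")   # $0.16
```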
Chapter 21: AI-Powered Workflows
Exercise 21.1 — Definitions
(a) Retrieval-Augmented Generation (RAG) is an architecture that enhances LLM responses by first retrieving relevant documents from a knowledge base and then using those documents as context for generation. It reduces hallucination by grounding responses in actual source material rather than relying solely on the model's parametric knowledge.
(c) Embedding is a dense numerical vector representation of text (or other data) in a high-dimensional space where semantically similar items are positioned near each other. Embeddings enable semantic search — finding documents that are related in meaning, not just keyword overlap.
(e) Chunking is the process of splitting large documents into smaller segments before embedding and indexing them in a vector database. The chunk size and strategy directly affect retrieval quality: too small and chunks lack context, too large and they dilute the relevant information with irrelevant content.
Exercise 21.8(a) — Chunking Strategy Analysis
Fixed-size chunking at 200 characters produces approximately these chunks:
- "Section 1: Coverage Period. All electronics purchased from ElectraMax carry a 24-month manufacturer warranty from the date of purchase. Extended warranties of 36 or 48 months are available at th"
- "e time of purchase. Section 2: What's Covered. The warranty covers defects in materials and workmanship under normal use. This includes hardware failures, manufacturing defects, and component malfu"
- "nctions. Section 3: What's Not Covered. The warranty does not cover: damage from accidents, misuse, or unauthorized modifications; normal wear and tear; cosmetic damage; software issues; damage f"
Problem: The chunk boundaries fall mid-word and mid-section ("available at th / e time of purchase"), making the chunks difficult to understand in isolation. A customer asking about the coverage period would retrieve Chunk 1, which cuts off before the extended-warranty information is complete.
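The failure mode is easy to demonstrate by contrasting naive fixed-size chunking with a sentence-aware variant. The document text below is abbreviated from the warranty excerpt, and the regex splitter is a simplified sketch (a production system would use a proper sentence tokenizer):

```python
import re

def fixed_size_chunks(text, size=200):
    """Naive fixed-size chunking: cuts every `size` characters,
    regardless of word, sentence, or section boundaries."""
    return [text[i:i + size] for i in range(0, len(text), size)]

def sentence_chunks(text, max_size=200):
    """Sentence-aware chunking: packs whole sentences into chunks
    up to max_size characters, never splitting mid-sentence."""
    sentences = re.split(r'(?<=[.!?])\s+', text.strip())
    chunks, current = [], ""
    for s in sentences:
        if current and len(current) + len(s) + 1 > max_size:
            chunks.append(current)
            current = s
        else:
            current = f"{current} {s}".strip()
    if current:
        chunks.append(current)
    return chunks

doc = ("Section 1: Coverage Period. All electronics carry a "
       "24-month warranty. Section 2: What's Covered. Defects in "
       "materials and workmanship under normal use are covered.")
print(fixed_size_chunks(doc, 80)[0])  # cuts at an arbitrary boundary
print(sentence_chunks(doc, 80))       # whole sentences only
```

Better still is structure-aware chunking that splits on the "Section N:" headings, so each chunk carries its own heading as context.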
Exercise 21.13(a-c) — RAG vs. Vanilla LLM Quantitative Analysis
(a) Vanilla LLM: 1,200 x 0.28 (error rate) = 336 incorrect responses/day. RAG system: 1,200 x 0.10 = 120 incorrect responses/day.
(b) Vanilla LLM daily cost of errors: 336 x $35 = **$11,760**. RAG daily cost of errors: 120 x $35 = **$4,200**.
(c) Daily savings: $11,760 - $4,200 - $0.50 (RAG operating cost) = **$7,559.50/day**. Annualized: approximately **$2.76 million** in avoided customer experience costs. The $0.50/day operating cost is negligible compared to the error reduction value.
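The calculations in (a)-(c) can be scripted directly, using the figures given in the exercise:

```python
daily_queries = 1_200
cost_per_error = 35.0      # customer-experience cost per incorrect response
rag_op_cost = 0.50         # RAG daily operating cost

vanilla_errors = daily_queries * 0.28   # (a) 336 incorrect responses/day
rag_errors = daily_queries * 0.10       # (a) 120 incorrect responses/day

vanilla_cost = vanilla_errors * cost_per_error   # (b) $11,760/day
rag_cost = rag_errors * cost_per_error           # (b) $4,200/day

daily_savings = vanilla_cost - rag_cost - rag_op_cost   # (c) $7,559.50/day
print(f"Daily savings: ${daily_savings:,.2f}")
print(f"Annualized: ${daily_savings * 365:,.0f}")
```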
Chapter 22: No-Code / Low-Code AI
Exercise 22.1 — Platform Classification
For a marketing team at a mid-size company wanting to build a customer segmentation tool:
- No-code (e.g., Obviously AI, DataRobot): Best if the team has no technical members and needs results within days. Limitation: limited customization and potential vendor lock-in.
- Low-code (e.g., H2O, Azure ML Studio): Best if the team has one member with basic Python skills who can customize pre-built components. Offers more flexibility than no-code.
- Pro-code (scikit-learn, custom Python): Best if the team has data scientists and needs full control over feature engineering, model selection, and deployment. Requires the most skill but provides the most flexibility.
The recommendation for a marketing team with no data scientists: start with a low-code platform. It provides guardrails that prevent common mistakes (data leakage, improper validation) while allowing enough customization to handle company-specific segmentation needs. Transition to pro-code only if the segmentation becomes a core competitive differentiator requiring custom algorithms.
Chapter 23: Cloud AI Services and APIs
Exercise 23.3 — Cost Comparison
For processing 100,000 product images for automated tagging:
Cloud API (e.g., Google Vision API): $1.50 per 1,000 images = $150 total. No infrastructure cost. Instant scalability. Vendor dependency for pricing changes.
Custom model on cloud GPU: ~40 GPU-hours for training at $3/hour = $120. Plus development time (40 hours at $100/hour = $4,000). Plus inference infrastructure. Total upfront cost: ~$4,500.
Recommendation: For a one-time batch of 100,000 images, the cloud API at $150 is dramatically cheaper. The break-even point where a custom model becomes cost-effective is approximately 3 million images per year (where ongoing API costs exceed the amortized development cost) — or when the classification task is so specialized that the generic API cannot achieve acceptable accuracy.
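The break-even volume quoted above follows from the two cost structures. This sketch uses the rough estimate's simplification of ignoring the custom model's ongoing inference costs:

```python
api_cost_per_image = 150 / 100_000   # $1.50 per 1,000 images
custom_upfront_cost = 4_500          # GPU training time plus development labor

# Annual volume at which cumulative API spend equals the custom model's
# upfront cost
break_even_images = custom_upfront_cost / api_cost_per_image
print(f"Break-even: {break_even_images:,.0f} images/year")
```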
Chapter 24: AI for Marketing and Customer Experience
Exercise 24.1 — Marketing AI Use Case Mapping
| Customer Journey Stage | AI Application | Primary Technique | Key Metric |
|---|---|---|---|
| Awareness | Audience lookalike modeling | Classification | Cost per qualified lead |
| Consideration | Personalized content recommendations | Collaborative filtering | Engagement rate uplift |
| Purchase | Dynamic pricing optimization | Regression / RL | Revenue per visitor |
| Post-Purchase | Churn prediction with proactive outreach | Classification (Ch. 7) | Retention rate |
| Advocacy | Sentiment analysis of reviews and social | NLP (Ch. 14) | Net Promoter Score |
Exercise 24.5 — Attribution Modeling
Last-touch attribution assigns all credit to the final marketing touchpoint before conversion. This systematically overvalues bottom-of-funnel channels (search ads, retargeting) and undervalues top-of-funnel channels (brand advertising, content marketing) that introduce customers to the brand. A data-driven attribution model using ML (e.g., Shapley value attribution) distributes credit proportionally across all touchpoints based on their marginal contribution, providing a more accurate picture of channel effectiveness.
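To make the Shapley idea concrete, here is a two-channel toy calculation. The coalition values (incremental conversions) are invented for the example, not taken from the chapter:

```python
from itertools import permutations

# Hypothetical incremental conversions produced by each coalition of channels
value = {
    frozenset(): 0.0,
    frozenset({"brand_ad"}): 20.0,
    frozenset({"search_ad"}): 60.0,
    frozenset({"brand_ad", "search_ad"}): 100.0,
}
channels = ["brand_ad", "search_ad"]

def shapley(channel: str) -> float:
    """Average marginal contribution of `channel` over all arrival orders."""
    perms = list(permutations(channels))
    total = 0.0
    for order in perms:
        arrived_before = frozenset(order[:order.index(channel)])
        total += value[arrived_before | {channel}] - value[arrived_before]
    return total / len(perms)

for ch in channels:
    print(ch, shapley(ch))   # brand_ad 30.0, search_ad 70.0
```

Last-touch attribution would credit all 100 conversions to search_ad; the Shapley split (30/70) still sums to the total but recognizes the top-of-funnel channel's contribution.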
Chapter 25: Bias in AI Systems
Exercise 25.1 — Classifying Bias Sources
(a) Representation bias (primary) and evaluation bias (secondary). The training data over-represents American English speakers, and the evaluation benchmark shares this same bias — so the model appears to perform well because it is tested on the same population it was trained on. The poor performance on other accents is invisible until deployment.
(b) Historical bias. The model learns patterns from a period when discriminatory lending practices (redlining, cosigner requirements) restricted credit access in certain neighborhoods. The lower repayment rates in these neighborhoods are an artifact of historical discrimination, not a reflection of borrowers' actual creditworthiness.
(c) Measurement bias (also called aggregation bias). Using a single diagnostic threshold across all patients assumes the underlying measurement means the same thing for everyone. When biological variation causes the baseline to differ by sex and ethnicity, a universal threshold misclassifies patients from groups whose normal range is different from the majority.
(d) Deployment bias. The model was designed for one purpose (flagging high-risk customers for retention offers) but is being used for a different purpose (excluding customers from promotions). This repurposing inverts the model's intended value — customers identified as needing more attention are instead given less.
Exercise 25.6 — Disparate Impact Ratio
(a) Selection rate for Group A: 120/300 = 40.0%. Selection rate for Group B: 50/200 = 25.0%.
(b) Disparate impact ratio = 25.0% / 40.0% = 0.625 (62.5%).
(c) The four-fifths (80%) rule requires a ratio of at least 0.80. At 0.625, the model fails the four-fifths rule — Group B's selection rate is only 62.5% of Group A's, well below the 80% threshold.
(d) To meet the four-fifths rule: Group B selection rate must be at least 0.80 x 40% = 32%. Required selections from Group B = 0.32 x 200 = 64 candidates (an increase of 14 from the current 50).
(e) The four-fifths rule is a guideline from the EEOC's Uniform Guidelines on Employee Selection Procedures (1978). It is not a statute, but courts regularly use it as evidence of adverse impact. Failing the four-fifths rule shifts the burden of proof to the employer to demonstrate that the selection criterion is job-related and consistent with business necessity.
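The selection-rate arithmetic in (a)-(d) generalizes to a small helper (a sketch; the function name is illustrative):

```python
def disparate_impact_ratio(selected_a: int, total_a: int,
                           selected_b: int, total_b: int) -> float:
    """Lower group's selection rate divided by the higher group's."""
    rate_a = selected_a / total_a
    rate_b = selected_b / total_b
    low, high = sorted([rate_a, rate_b])
    return low / high

ratio = disparate_impact_ratio(120, 300, 50, 200)
print(f"Disparate impact ratio: {ratio:.3f}")   # 0.625
print("Four-fifths rule:", "pass" if ratio >= 0.80 else "FAIL")

# (d) Minimum Group B selections to reach 80% of Group A's rate
required_b = 0.80 * (120 / 300) * 200
print(f"Required Group B selections: {required_b:.0f}")   # 64
```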
Exercise 25.9 — Calculating Bias in Athena's Model
(a) Under 35: Candidates 1, 2, 4, 7, 9, 11 — predicted positive: 1, 2, 4, 7, 11 = 5 out of 6. Selection rate = 83.3%. 35+: Candidates 3, 5, 6, 8, 10, 12 — predicted positive: 5 = 1 out of 6. Selection rate = 16.7%.
(b) Disparate impact ratio (age) = 16.7% / 83.3% = 0.200 (20%). This dramatically fails the four-fifths rule.
(c) Four-Year Degree: Candidates 1, 2, 5, 7, 10, 11 — predicted positive: 1, 2, 5, 7, 11 = 5 out of 6. Selection rate = 83.3%. Other: Candidates 3, 4, 6, 8, 9, 12 — predicted positive: 4 = 1 out of 6. Selection rate = 16.7%.
(d) Disparate impact ratio (education) = 16.7% / 83.3% = 0.200 (20%). Also fails the four-fifths rule severely.
(e) Both dimensions show identical disparate impact ratios (0.200), both severely failing the four-fifths rule. In this small sample, age and education are highly correlated — younger candidates tend to have four-year degrees. Neither dimension passes the four-fifths threshold of 0.80, and both are far below it, indicating systematic bias in the model.
Chapter 26: Fairness, Explainability, and Transparency
Exercise 26.7 — Interpreting SHAP Values
(a) Sum of SHAP values: 0.18 + 0.14 + 0.09 + 0.07 + (-0.03) + (-0.05) + 0.03 = 0.43. Base value + SHAP sum = 0.35 + 0.43 = 0.78. This equals the predicted churn probability (0.78), confirming additivity. SHAP values always sum to the difference between the prediction and the base value.
(b) Plain-language narrative for a customer service representative: "This customer has a high churn risk (78%) driven primarily by three factors: they have cut their purchasing frequency in half over the past month, they haven't made a purchase in 42 days, and they have contacted support three times recently. On the positive side, they have been a customer for three years, which suggests loyalty that could be re-engaged. A personalized outreach with a relevant offer could address their declining engagement."
(c) If purchase frequency increased from 0.5 to 3.0, we would expect the SHAP contribution to shift from positive (+0.18, pushing toward churn) to negative (pushing away from churn), because high purchase frequency is associated with retention. However, the exact new SHAP value depends on interactions with other features — SHAP values are context-dependent, not linear functions of the feature value.
(d) The interpretation is incorrect. A SHAP value of -0.03 means that for this specific customer, average order value pushes the prediction slightly toward non-churn. It does not mean the feature is unimportant globally. The feature might have large SHAP values for other customers. Global feature importance (mean absolute SHAP) provides a better picture of a feature's overall importance across all predictions.
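The additivity check in (a) is one line of arithmetic, using the seven SHAP values from the exercise:

```python
base_value = 0.35   # average predicted churn probability (the SHAP base value)
shap_values = [0.18, 0.14, 0.09, 0.07, -0.03, -0.05, 0.03]

# SHAP additivity: base value plus contributions equals the prediction
prediction = base_value + sum(shap_values)
print(f"Base + SHAP sum = {prediction:.2f}")   # 0.78, the predicted probability
```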
Exercise 26.10 — Writing a Model Card (selected sections)
Model Details: Fine-tuned BERT classifier for hotel review sentiment (positive/neutral/negative). Trained on 50,000 labeled reviews from TripAdvisor, Booking.com, and Google Reviews. Model architecture: bert-base-uncased with a classification head. Last updated: [date]. Maintained by: CX Analytics Team.
Ethical Considerations: The model was trained primarily on English-language data, resulting in significantly lower accuracy for French reviews (71% vs. 92% for English). This means French-speaking customers' negative reviews are more likely to be misclassified and may not receive timely follow-up. Additionally, the model may exhibit biases present in the training platforms — for example, TripAdvisor skews toward leisure travelers, potentially underrepresenting business traveler sentiment patterns. The model should not be used for employee performance evaluation based on review sentiment, as individual reviews may reflect factors outside an employee's control.
Caveats and Recommendations: (1) Do not deploy for French-language reviews until accuracy reaches at least 85% — use human review for French reviews in the interim. (2) Monitor for performance degradation over time as language patterns and guest expectations evolve. (3) Implement quarterly re-evaluation on a held-out test set stratified by language. (4) Negative classifications should trigger human review before any customer-facing action is taken.
Chapter 27: AI Governance Frameworks
Exercise 27.1 — Governance Framework Components
The five pillars of an AI governance framework:
- Accountability: Clear ownership of every AI system — who approved it, who monitors it, who is responsible when it fails.
- Transparency: Documentation of what the model does, how it was built, what data it uses, and what its known limitations are.
- Fairness: Systematic bias testing before deployment and ongoing monitoring for discriminatory outcomes.
- Privacy: Compliance with data protection regulations and minimization of personal data usage.
- Safety and Security: Protection against adversarial attacks, prompt injection, and unintended harmful outputs.
Exercise 27.5 — Risk Tiering
| Tier | Criteria | Examples | Governance Requirements |
|---|---|---|---|
| Tier 1 (Low) | Internal analytics, no individual decisions | Dashboard reports, aggregate forecasts | Standard documentation, annual review |
| Tier 2 (Medium) | Influences but does not automate decisions about people | Lead scoring, product recommendations | Bias audit, human review of edge cases, quarterly monitoring |
| Tier 3 (High) | Automates decisions directly affecting individuals | Hiring screening, credit scoring, medical diagnosis | Full bias audit, external review, model card, impact assessment, continuous monitoring, escalation paths |
Chapter 28: AI Regulation — Global Landscape
Exercise 28.1 — Regulatory Comparison
| Dimension | EU AI Act | US (Sector-Based) | China |
|---|---|---|---|
| Approach | Risk-based, horizontal regulation | Sector-specific, fragmented | State-directed, algorithm registration |
| Scope | All AI systems sold or used in EU | Varies by sector (FDA, SEC, EEOC) | All AI serving Chinese users |
| Enforcement | National authorities + EU AI Office | Sector regulators + state AGs | Cyberspace Administration of China |
| Key requirement | Risk classification and conformity assessment | Varies (e.g., FDA pre-market approval for medical AI) | Algorithm registration and transparency |
| Business impact | Compliance burden proportional to risk tier | Compliance varies by industry | Must register recommendation algorithms |
Exercise 28.5 — Compliance Readiness Assessment
For a US-based fintech company expanding to Europe, the top three compliance actions are: (1) Classify all AI systems by EU AI Act risk tier — credit scoring likely qualifies as "high-risk," requiring conformity assessment. (2) Implement documentation requirements: technical documentation, data provenance records, and logging of automated decisions per GDPR Article 22. (3) Establish a human review process for any AI-driven credit decisions that significantly affect individuals, ensuring the right to explanation is supported.
Chapter 29: Privacy, Security, and AI
Exercise 29.1 — Privacy by Design Principles
The seven foundational principles applied to a customer recommendation system:
- Proactive not reactive: Build privacy protections into the recommendation algorithm from the start — do not wait for a privacy complaint.
- Privacy as the default: Customers should be opted out of personalized recommendations by default, with clear opt-in.
- Privacy embedded in design: Use techniques like differential privacy or federated learning to generate recommendations without centralizing raw user data.
- Full functionality: Provide quality recommendations even for users who limit data sharing, using content-based methods rather than requiring full behavioral tracking.
- End-to-end security: Encrypt recommendation data in transit and at rest; limit retention of behavioral data.
- Visibility and transparency: Clearly explain what data is used for recommendations and provide a "why was this recommended" explanation.
- Respect for user privacy: Allow users to delete their recommendation history and reset their profile.
Chapter 30: Responsible AI in Practice
Exercise 30.1 — Responsible AI Maturity Assessment
For an organization at Level 2 (Defined) on the responsible AI maturity model:
- Current state: The organization has written AI ethics principles and assigned responsibility for responsible AI, but practices are inconsistent across teams.
- Next step to Level 3 (Implemented): Embed responsible AI practices into the ML development lifecycle — require bias audits as a gate in the deployment pipeline, not as an optional review.
- Key investment: Train all data scientists in bias detection and fairness metrics (Chapters 25–26 content) and integrate the `BiasDetector` and `ExplainabilityDashboard` tools into standard workflows.
Chapter 31: AI Strategy for the C-Suite
Exercise 31.1 — Strategic AI Positioning
Using Porter's Five Forces framework to analyze AI's strategic impact on the retail industry:
- Threat of new entrants: AI lowers barriers in some areas (e.g., no-code tools enable small retailers to deploy personalization) but raises them in others (data network effects create scale advantages for incumbents with more customer data).
- Supplier power: AI reduces supplier power by enabling better demand forecasting and inventory optimization, giving retailers more negotiating leverage through reduced dependency on just-in-time supplier relationships.
- Buyer power: AI-powered personalization increases switching costs for customers (recommendations improve with usage), reducing buyer power. However, AI also enables price comparison and transparency, potentially increasing buyer power.
- Threat of substitutes: AI enables entirely new shopping experiences (visual search, conversational commerce) that could substitute for traditional retail.
- Competitive rivalry: AI intensifies rivalry by enabling faster response to competitive moves (real-time pricing, dynamic promotions) and creating a data arms race.
Chapter 32: Building and Managing AI Teams
Exercise 32.3 — AI Team Structure
For a company at Stage 2 (Experimentation) with two data scientists, the recommended team structure is a centralized model (Center of Excellence):
- Rationale: With only two data scientists, distributing them across business units would leave each unit with insufficient coverage. A centralized team ensures knowledge sharing, consistent practices, and efficient resource allocation.
- Transition plan: As the team grows to 6–8 members, transition to a hub-and-spoke model where the central team maintains standards and infrastructure while embedded analysts work directly with business units.
Exercise 32.5 — Hiring Assessment
For a "10x ML engineer" job posting that lists 25 required technologies:
The posting is problematic because it describes a unicorn that does not exist. No single person is expert in deep learning, MLOps, distributed systems, front-end development, and business strategy simultaneously. The posting will either receive no qualified applicants or attract overconfident candidates who claim expertise they do not have. Better approach: define the three most critical skills for the role's first six months and hire for those, with a growth plan for acquiring additional skills.
Chapter 33: AI Product Management
Exercise 33.1 — AI Product Requirements
A key difference between AI product management and traditional product management is non-deterministic behavior. A traditional software product produces the same output every time for the same input. An AI product may produce different outputs (recommendations, predictions, generated text) for the same input depending on model updates, data drift, or stochastic elements. This means AI product managers must specify acceptable output ranges and define monitoring strategies for when outputs deviate from expectations, rather than specifying exact output specifications.
Exercise 33.5 — User Story for AI Features
"As a customer service manager, I want the AI ticket router to correctly categorize at least 90% of incoming tickets so that my agents spend less time on manual triage and more time resolving customer issues."
Acceptance criteria: (1) Accuracy >= 90% measured weekly on a random sample of 200 tickets. (2) Tickets with model confidence below 70% are routed to a human triage queue rather than auto-assigned. (3) Response latency <= 2 seconds from ticket submission to routing. (4) Dashboard shows daily routing accuracy and confidence distribution.
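Acceptance criterion (2) amounts to a confidence gate in the routing logic. A minimal sketch, with hypothetical queue names:

```python
CONFIDENCE_THRESHOLD = 0.70   # from acceptance criterion (2)

def route_ticket(predicted_queue: str, confidence: float) -> str:
    """Auto-assign confident predictions; send the rest to human triage."""
    if confidence < CONFIDENCE_THRESHOLD:
        return "human_triage"
    return predicted_queue

print(route_ticket("billing", 0.92))   # billing
print(route_ticket("billing", 0.55))   # human_triage
```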
Chapter 34: Measuring AI ROI
Exercise 34.5 — TCO Calculation
(a) 5-Year TCO = Development + Deployment + (Annual Ops x 5) + Retirement = $680,000 + $240,000 + ($120,000 x 5) + $45,000 = $1,565,000
(b) TCO Multiplier = 5-Year TCO / Initial Development Cost = $1,565,000 / $680,000 = 2.30x — the true cost is 2.3 times the initial development estimate. This is consistent with the chapter's finding that TCO multipliers typically range from 2x to 4x.
(c) NPV at 10% discount rate with $500,000 annual value:
Year 0: -$920,000 (development + deployment)
Year 1: ($500,000 - $120,000) / 1.10 = $345,455
Year 2: $380,000 / 1.21 = $314,050
Year 3: $380,000 / 1.331 = $285,500
Year 4: $380,000 / 1.4641 = $259,545
Year 5: ($380,000 - $45,000) / 1.6105 = $208,011
NPV = -$920,000 + $345,455 + $314,050 + $285,500 + $259,545 + $208,011 = $492,561
(d) Break-even annual value: Set NPV = 0 and solve for annual value V.
-$920,000 + (V - $120,000) x PV annuity factor (5 years, 10%) - $45,000/1.6105 = 0
PV annuity factor = 3.7908. So: (V - $120,000) x 3.7908 = $920,000 + $27,941 = $947,941. V - $120,000 ≈ $250,065. **V ≈ $370,065/year** — the project needs to generate at least approximately $370,000 in annual value to break even over five years.
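The figures in (a)-(d) can be checked in a few lines of Python; exact compounding gives results within a few dollars of the hand-rounded discount factors used above:

```python
# Verify Exercise 34.5: TCO, NPV, and break-even annual value (in dollars)
dev, deploy, annual_ops, retirement = 680_000, 240_000, 120_000, 45_000
annual_value, rate, years = 500_000, 0.10, 5

# (a) 5-year total cost of ownership
tco = dev + deploy + annual_ops * years + retirement   # $1,565,000

# (c) NPV: upfront costs at year 0, retirement cost in the final year
npv = -(dev + deploy)
for year in range(1, years + 1):
    net = annual_value - annual_ops
    if year == years:
        net -= retirement
    npv += net / (1 + rate) ** year

# (d) Break-even annual value: ops plus amortized upfront and retirement costs
annuity = (1 - (1 + rate) ** -years) / rate   # ~3.7908
break_even = annual_ops + (dev + deploy + retirement / (1 + rate) ** years) / annuity

print(f"TCO: ${tco:,.0f}")
print(f"NPV: ${npv:,.0f}")
print(f"Break-even annual value: ${break_even:,.0f}")
```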
Exercise 34.16 — AIROICalculator Setup (Python)
class AIROICalculator:
"""Simplified AI ROI calculator for business cases."""
def __init__(self, project_name: str,
time_horizon_years: int = 5,
discount_rate: float = 0.10):
self.project_name = project_name
self.horizon = time_horizon_years
self.discount_rate = discount_rate
self.costs = {}
self.value_streams = []
def add_cost(self, name: str, amount: float,
category: str = 'development'):
self.costs[name] = {
'amount': amount, 'category': category
}
def add_value_stream(self, name: str,
annual_value: float,
confidence: float = 1.0,
ramp_months: int = 0):
self.value_streams.append({
'name': name,
'annual_value': annual_value,
'confidence': confidence,
'ramp_months': ramp_months
})
def calculate_npv(self):
# Total upfront costs
upfront = sum(
c['amount'] for c in self.costs.values()
if c['category'] in ('development', 'deployment')
)
annual_ops = sum(
c['amount'] for c in self.costs.values()
if c['category'] == 'operations'
)
retirement = sum(
c['amount'] for c in self.costs.values()
if c['category'] == 'retirement'
)
# Annual value (confidence-weighted)
annual_value = sum(
vs['annual_value'] * vs['confidence']
for vs in self.value_streams
)
npv = -upfront
for year in range(1, self.horizon + 1):
net = annual_value - annual_ops
if year == self.horizon:
net -= retirement
npv += net / (1 + self.discount_rate) ** year
return npv
def executive_summary(self):
total_cost = sum(c['amount'] for c in self.costs.values())
annual_value = sum(
vs['annual_value'] * vs['confidence']
for vs in self.value_streams
)
npv = self.calculate_npv()
print(f"=== {self.project_name} — Executive Summary ===")
print(f"Total Investment: ${total_cost:,.0f}")
print(f"Annual Value (confidence-weighted): "
f"${annual_value:,.0f}")
print(f"NPV ({self.horizon}-year, "
f"{self.discount_rate:.0%} discount): "
f"${npv:,.0f}")
        print(f"Simple ROI: "
              f"{(annual_value * self.horizon - total_cost) / total_cost:.1%}")
# Setup for airline dynamic pricing project
calc = AIROICalculator("Airline Dynamic Pricing",
time_horizon_years=5,
discount_rate=0.10)
# Add costs
calc.add_cost("Development", 1_200_000,
category="development")
calc.add_cost("Deployment", 400_000,
category="deployment")
calc.add_cost("Annual Operations", 180_000,
category="operations")
calc.add_cost("Retirement", 60_000,
category="retirement")
# Add value streams
calc.add_value_stream("Revenue Increase", 5_500_000,
confidence=0.70, ramp_months=6)
calc.add_value_stream("Analyst Cost Savings", 800_000,
confidence=0.85, ramp_months=3)
calc.add_value_stream("Competitive Data Value", 600_000,
confidence=0.40, ramp_months=12)
calc.executive_summary()
Chapter 35: Change Management for AI
Exercise 35.1 — ADKAR Analysis
For Athena's supply chain team adopting AI-powered demand forecasting:
| ADKAR Element | Assessment | Action |
|---|---|---|
| Awareness (of need to change) | Medium — team knows forecasting could improve but underestimates AI's potential | Share specific examples of forecast error costs and competitor adoption |
| Desire (to participate) | Low — planners fear being replaced by algorithms | Emphasize that AI handles routine forecasts, freeing planners for strategic analysis |
| Knowledge (of how to change) | Low — team has no ML experience | Three-tier training: AI awareness (all), dashboard literacy (planners), model feedback (leads) |
| Ability (to implement) | Medium — requires new tools and workflows | Phased rollout: AI recommendations alongside existing process for 3 months |
| Reinforcement (to sustain) | Not yet addressed | Celebrate early wins publicly, track and share accuracy improvements weekly |
The weakest element is Desire — addressing fear of replacement is the single most important change management investment for this team.
Chapter 36: Industry Applications of AI
Exercise 36.1 — Industry AI Maturity
| Industry | AI Maturity | Highest-Value Application | Key Barrier |
|---|---|---|---|
| Financial Services | High (Stage 3-4) | Fraud detection, algorithmic trading | Regulatory constraints on model explainability |
| Healthcare | Medium (Stage 2-3) | Clinical decision support, drug discovery | Data interoperability (EHR fragmentation), regulatory approval |
| Manufacturing | Medium (Stage 2-3) | Predictive maintenance, quality inspection | Legacy equipment lacking sensors, shop floor connectivity |
| Retail | Medium-High (Stage 3) | Demand forecasting, personalization | Data silos between online and offline channels |
| Agriculture | Low (Stage 1-2) | Precision agriculture, yield prediction | Connectivity in rural areas, farmer technology adoption |
Chapter 37: Emerging AI Technologies
Exercise 37.1 — Technology Readiness Assessment
For AI agents (autonomous multi-step AI systems):
- Current state: Capable of multi-step tool use for well-defined tasks (research, data analysis, code generation) but unreliable for high-stakes autonomous decisions.
- Business readiness: Medium — suitable for internal productivity tools with human oversight, not yet ready for customer-facing autonomous operation.
- Timeline to mainstream enterprise adoption: 2–4 years for supervised agent workflows; 5+ years for fully autonomous agents in high-stakes domains.
- Key risk: Agent systems can take irreversible actions (sending emails, making purchases, modifying databases) based on incorrect reasoning, and the multi-step nature makes errors harder to detect and reverse.
Chapter 38: AI, Society, and the Future of Work
Exercise 38.1 — Job Impact Analysis
Using the task-based framework from the chapter to analyze the role of "Marketing Analyst":
| Task | % of Role | AI Augmentation Potential | AI Automation Potential |
|---|---|---|---|
| Data collection and cleaning | 25% | High (automated pipelines) | High (80% automatable) |
| Report generation | 20% | High (LLM-assisted drafting) | Medium (60% automatable) |
| Statistical analysis | 15% | High (AutoML, AI assistants) | Medium (50% automatable) |
| Insight interpretation | 20% | Medium (AI suggests patterns) | Low (requires domain knowledge) |
| Stakeholder presentation | 10% | Medium (slide generation) | Low (requires persuasion, judgment) |
| Strategic recommendations | 10% | Low (AI provides inputs) | Very Low (requires business context) |
Net assessment: The marketing analyst role will not disappear but will transform. Approximately 40-50% of current tasks can be automated, shifting the role from data wrangling to insight interpretation and strategic communication. Analysts who embrace AI tools will be significantly more productive; those who resist will find their manual data skills increasingly commoditized.
Chapter 39: Capstone — AI Transformation Plan
Exercise 39.2 — AI Maturity Assessment (Python)
import numpy as np
class AIMaturityAssessment:
"""Assess organizational AI maturity across six dimensions."""
DIMENSIONS = [
'strategy', 'data', 'technology',
'talent', 'governance', 'culture'
]
def __init__(self, org_name: str):
self.org_name = org_name
self.current_scores = {}
self.target_scores = {}
def set_current(self, **scores):
"""Set current maturity scores (1-5 scale)."""
for dim in self.DIMENSIONS:
if dim in scores:
assert 1 <= scores[dim] <= 5, (
f"{dim} must be 1-5"
)
self.current_scores[dim] = scores[dim]
def set_target(self, **scores):
"""Set target maturity scores (1-5 scale)."""
for dim in self.DIMENSIONS:
if dim in scores:
assert 1 <= scores[dim] <= 5
self.target_scores[dim] = scores[dim]
def gap_analysis(self):
"""Identify dimensions with largest gaps."""
gaps = {}
for dim in self.DIMENSIONS:
current = self.current_scores.get(dim, 1)
target = self.target_scores.get(dim, current)
gaps[dim] = target - current
sorted_gaps = sorted(
gaps.items(), key=lambda x: x[1], reverse=True
)
print(f"\n=== Gap Analysis: {self.org_name} ===")
print(f"{'Dimension':<15} {'Current':>8} {'Target':>8} "
f"{'Gap':>6}")
print("-" * 40)
for dim, gap in sorted_gaps:
print(f"{dim:<15} {self.current_scores.get(dim,1):>8} "
f"{self.target_scores.get(dim,1):>8} "
f"{gap:>6}")
return sorted_gaps
def overall_maturity(self):
"""Calculate overall maturity level."""
if not self.current_scores:
return 1.0
avg = np.mean(list(self.current_scores.values()))
return round(avg, 1)
def summary(self):
"""Print executive summary."""
overall = self.overall_maturity()
print(f"\n=== AI Maturity Assessment: "
f"{self.org_name} ===")
print(f"Overall Maturity: {overall}/5.0")
if overall < 2.0:
level = "Stage 1: Awareness"
elif overall < 3.0:
level = "Stage 2: Experimentation"
elif overall < 4.0:
level = "Stage 3: Systematic"
elif overall < 4.5:
level = "Stage 4: Transformative"
else:
level = "Stage 5: Pioneering"
print(f"Maturity Level: {level}")
print(f"\nDimension Scores:")
for dim in self.DIMENSIONS:
score = self.current_scores.get(dim, 'N/A')
print(f" {dim:<15}: {score}")
# Example usage for a mid-size retailer
assessment = AIMaturityAssessment("Athena Retail Group")
assessment.set_current(
strategy=3, data=2, technology=3,
talent=2, governance=1, culture=2
)
assessment.set_target(
strategy=4, data=4, technology=4,
talent=3, governance=3, culture=3
)
assessment.summary()
assessment.gap_analysis()
# Output:
# Overall Maturity: 2.2/5.0
# Maturity Level: Stage 2: Experimentation
#
# Gap Analysis:
# governance 1 3 2
# data 2 4 2
# strategy 3 4 1
# technology 3 4 1
# talent 2 3 1
# culture 2 3 1
#
# Largest gaps are in governance and data — the foundation
# must be strengthened before scaling AI initiatives.
The gap analysis reveals that governance and data are the two dimensions with the largest gaps. This is consistent with the textbook's recurring theme that data quality and governance are prerequisites for successful AI adoption — not afterthoughts. The transformation plan should prioritize closing these foundational gaps in Phase 1 before investing heavily in advanced technology or talent.
Chapter 40: Leading in the AI Era
Exercise 40.1 — Personal AI Leadership Framework
A strong answer to "What kind of AI leader do I want to become?" should integrate three dimensions:
- Technical fluency — Not expert-level coding, but the ability to ask the right questions of technical teams, evaluate AI vendor claims, and understand the difference between a proof-of-concept and a production system. (Chapters 1–12)
- Ethical grounding — A personal framework for navigating bias, fairness, privacy, and the societal impact of AI decisions. The willingness to slow down or halt a project that delivers business value but causes harm. (Chapters 25–30)
- Strategic vision — The ability to connect AI capabilities to business strategy, build organizational capacity for AI adoption, and communicate AI's value and limitations to boards, regulators, and the public. (Chapters 31–39)
The most effective AI leaders are not the most technical — they are the ones who can hold all three dimensions in mind simultaneously and make decisions that balance commercial value, technical feasibility, and human impact.
As Professor Okonkwo tells the class in the final lecture: "You now know enough to be dangerous. Use that knowledge responsibly. The organizations you lead will build AI systems that affect millions of people. That is not a technical responsibility. It is a human one."
This appendix provides solutions to approximately 160 selected exercises across all 40 chapters. For exercises requiring extensive original research, dataset analysis, or written deliverables (case analyses, strategic proposals, capstone components), representative frameworks and model answers are provided rather than exhaustive responses. Complete code solutions are available in the online supplement.