Chapter 33 Exercises: Introduction to Machine Learning for Business

These exercises span five tiers of difficulty. Complete them in order — each tier builds on the previous one.


Tier 1: Concept Comprehension (No Coding Required)

These exercises check your understanding of the core ideas in Chapter 33.

Exercise 1.1 — ML Type Classification

For each business problem below, identify whether it is best suited to (a) supervised classification, (b) supervised regression, (c) unsupervised clustering, (d) anomaly detection, or (e) not an ML problem at all. Explain your reasoning in one sentence.

  1. Predicting whether a loan applicant will default within 12 months
  2. Grouping 10,000 customers into behavior-based segments with no predefined categories
  3. Forecasting next month's total revenue
  4. Determining why customer satisfaction scores dropped last quarter
  5. Flagging unusual transactions in a payment system in real time
  6. Deciding the optimal price for a new product
  7. Automatically routing incoming support emails to the correct team
  8. Calculating average order value by region and product category

Exercise 1.2 — The Base Rate Problem

A marketing team tells you their new lead-scoring model achieves 92% accuracy. Before congratulating them, you ask to see the class distribution. You learn that 91% of leads in the training set did not convert.

a) What accuracy would a naive model achieve by always predicting "will not convert"?
b) What is the actual improvement the model provides over the naive baseline?
c) What metric would better capture the model's ability to find converting leads?
d) What does this tell you about using accuracy as a primary metric for imbalanced classification problems?
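If you want to sanity-check your arithmetic for parts (a) and (b) after working them on paper, a few lines of Python will do. The sample size of 1,000 below is an arbitrary choice; any size gives the same rates:

```python
# Quick check for parts (a) and (b) -- run it after answering on paper.
n_leads = 1000                          # arbitrary sample size
non_converters = 0.91 * n_leads         # 91% did not convert
naive_accuracy = non_converters / n_leads
model_accuracy = 0.92

print("naive baseline accuracy:", naive_accuracy)
print("improvement over baseline:", model_accuracy - naive_accuracy)
```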

Exercise 1.3 — Overfitting Diagnosis

A data scientist reports the following evaluation results for a churn prediction model:

  • Training set accuracy: 97.3%
  • Test set accuracy: 71.4%

a) What does this gap tell you about the model?
b) What is the technical term for this problem?
c) Name three strategies the data scientist could use to address it.
d) Which is more trustworthy — the training set accuracy or the test set accuracy? Why?

Exercise 1.4 — ML Workflow Sequencing

The following steps of the ML workflow are listed out of order. Arrange them in the correct sequence and briefly explain why that order matters.

  • Train the model on the training data
  • Define the success criteria and evaluation metrics
  • Deploy the model and set up monitoring
  • Collect and clean the data
  • Split the data into training and test sets
  • Frame the business problem as an ML problem
  • Evaluate the model on the test set
  • Identify the features and the label

Exercise 1.5 — The Simpler Solution Test

For each scenario below, decide whether ML is warranted or whether a simpler solution should be tried first. Justify your answer.

  1. A retailer wants to know which of its 500 products to put on sale each week. Historical sales data is available but no one has ever analyzed it systematically.

  2. A B2B company wants to automatically flag support tickets mentioning competitor names. They have 50,000 historical tickets and a labeling budget.

  3. A 10-person startup wants to predict which users will convert from a free to a paid plan. They have 200 total users and 40 conversions to date.

  4. A bank with 2 million customers wants to proactively identify customers who will likely overdraft their accounts, so it can offer an overdraft protection product.


Tier 2: Terminology and Metrics (Light Coding)

Exercise 2.1 — Confusion Matrix Math

Given the following confusion matrix for a fraud detection model:

                Predicted: Not Fraud    Predicted: Fraud
Actual: Not Fraud       9,420                  180
Actual: Fraud             85                   315

Calculate:

a) Total predictions
b) Accuracy
c) Precision (for "Fraud" class)
d) Recall (for "Fraud" class)
e) F1 Score (for "Fraud" class)
f) What percentage of actual fraud cases did the model miss?
g) In a fraud detection context, is missing fraud cases (false negatives) or incorrectly flagging legitimate transactions (false positives) more costly? How does this affect which metric you optimize for?
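After working the numbers by hand, you can check them with a small generic helper (this is a sketch, not the chapter's code; the counts passed in at the bottom are the ones from the matrix above, with "Fraud" as the positive class):

```python
# Standard binary-classification metrics from raw confusion-matrix counts.
def binary_metrics(tp, fp, fn, tn):
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)       # of predicted frauds, how many were real
    recall = tp / (tp + fn)          # of real frauds, how many were caught
    f1 = 2 * precision * recall / (precision + recall)
    return {"total": total, "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}

# Counts from the confusion matrix in this exercise.
m = binary_metrics(tp=315, fp=180, fn=85, tn=9420)
print({k: round(v, 4) for k, v in m.items()})
```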

Exercise 2.2 — Cross-Validation vs. Train/Test Split

Write a short Python function that:

  1. Accepts a scikit-learn estimator, a feature matrix X, and a label vector y
  2. Performs both a simple 80/20 train/test split evaluation AND 5-fold cross-validation
  3. Prints both results, including the standard deviation of the CV scores
  4. Prints a recommendation on which estimate to trust more, given the dataset size

Test your function with LogisticRegression() on any suitable dataset.
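One possible shape for this function, assuming scikit-learn is installed. The choice of load_wine as the test dataset and the 1,000-row cutoff in the recommendation are arbitrary; pick your own:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

def compare_estimates(estimator, X, y, seed=42):
    """Report a single 80/20 split score and a 5-fold CV score."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed)
    estimator.fit(X_tr, y_tr)
    split_score = estimator.score(X_te, y_te)

    cv_scores = cross_val_score(estimator, X, y, cv=5)
    print(f"80/20 split accuracy: {split_score:.3f}")
    print(f"5-fold CV accuracy:   {cv_scores.mean():.3f} "
          f"(std {cv_scores.std():.3f})")
    if len(y) < 1000:  # arbitrary threshold for "small"
        print("Recommendation: trust the CV estimate; a single split is noisy.")
    else:
        print("Recommendation: a single held-out split is usually adequate.")
    return split_score, cv_scores

X, y = load_wine(return_X_y=True)
compare_estimates(LogisticRegression(max_iter=5000), X, y)
```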

Exercise 2.3 — Metric Selection

For each scenario, choose the single most appropriate evaluation metric from the list: [Accuracy, Precision, Recall, F1, ROC AUC, RMSE, MAE, R²]. Explain your choice.

  1. Predicting whether an email is spam or not spam. Roughly equal numbers of each class.
  2. Predicting whether a patient has a rare disease. 1% of patients actually have it.
  3. Predicting next week's unit sales. Large overestimates are especially costly (leads to overproduction).
  4. Predicting which customers to offer a free upgrade to. Offering upgrades to everyone is not feasible — you need to be selective.
  5. Explaining to a stakeholder what percentage of the variation in quarterly revenue your model captures.

Exercise 2.4 — Feature vs. Label Identification

For each ML problem below, identify (a) the features, (b) the label, and (c) whether it is classification or regression.

  1. Predicting employee attrition: you have HR records including tenure, salary, manager rating, last promotion date, and whether the employee left within 12 months.

  2. Predicting a house's sale price: you have square footage, number of bedrooms, zip code, age of the house, lot size, and the actual sale price from historical transactions.

  3. Predicting whether a marketing email will be opened: you have the subject line length, sending time, recipient's past open rate, email category, and — for each of the last 100,000 emails sent — whether it was opened.


Tier 3: Applied Coding (scikit-learn Fundamentals)

Exercise 3.1 — Your First Pipeline

Using scikit-learn's built-in load_breast_cancer() dataset (a binary classification problem):

  1. Load the dataset and convert it to a pandas DataFrame
  2. Perform a stratified 80/20 train/test split
  3. Build a Pipeline with StandardScaler and LogisticRegression
  4. Train the pipeline and evaluate on the test set
  5. Print: accuracy, ROC AUC, and a full classification report
  6. Perform 5-fold cross-validation and report mean and standard deviation of ROC AUC
  7. Comment on whether the test set performance is consistent with cross-validation performance
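A possible starting point for steps 1 through 6, assuming scikit-learn is installed (step 7 is commentary and is left to you):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: load as a DataFrame.
X, y = load_breast_cancer(return_X_y=True, as_frame=True)

# Step 2: stratified split keeps the class ratio identical in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Step 3: scaling and the model live in one pipeline, so the scaler is
# always fit on training data only.
pipe = Pipeline([("scale", StandardScaler()),
                 ("clf", LogisticRegression(max_iter=1000))])

# Steps 4-5: train, then evaluate on the held-out test set.
pipe.fit(X_train, y_train)
proba = pipe.predict_proba(X_test)[:, 1]
print("Accuracy:", pipe.score(X_test, y_test))
print("ROC AUC: ", roc_auc_score(y_test, proba))
print(classification_report(y_test, pipe.predict(X_test)))

# Step 6: 5-fold cross-validated ROC AUC on the full dataset.
cv_auc = cross_val_score(pipe, X, y, cv=5, scoring="roc_auc")
print(f"CV ROC AUC: {cv_auc.mean():.3f} (std {cv_auc.std():.3f})")
```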

Exercise 3.2 — Data Leakage Demonstration

This exercise demonstrates data leakage and why it matters.

  1. Load the load_wine() dataset from scikit-learn (multiclass classification).
  2. Incorrect approach: Fit a StandardScaler on the entire dataset (X), then split into train/test. Train a LogisticRegression on the scaled training data, evaluate on the scaled test data, and record the accuracy.
  3. Correct approach: Fit the StandardScaler only on the training data, then transform both sets. Train the same model, evaluate on the correctly scaled test data, and record the accuracy.
  4. Compare the two accuracy scores. Are they different? Why does data leakage usually lead to an optimistic performance estimate?
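The two approaches can be sketched as follows, assuming scikit-learn is installed. On a dataset this small and clean the gap may be tiny; the point is the procedure, not the size of the difference:

```python
from sklearn.datasets import load_wine
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# Incorrect: the scaler sees the test rows, so test-set statistics
# leak into the training procedure.
leaky = StandardScaler().fit(X)                  # fit on ALL data
model = LogisticRegression(max_iter=5000)
model.fit(leaky.transform(X_tr), y_tr)
acc_leaky = model.score(leaky.transform(X_te), y_te)

# Correct: the scaler is fit on training data only, then applied to both.
clean = StandardScaler().fit(X_tr)
model = LogisticRegression(max_iter=5000)
model.fit(clean.transform(X_tr), y_tr)
acc_clean = model.score(clean.transform(X_te), y_te)

print("leaky accuracy:", acc_leaky)
print("clean accuracy:", acc_clean)
```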

Exercise 3.3 — Evaluating a Classifier on Imbalanced Data

Generate a synthetic imbalanced classification dataset using sklearn.datasets.make_classification with weights=[0.9, 0.1] (90% class 0, 10% class 1).

  1. Split into train/test (stratified).
  2. Train a LogisticRegression on the training set.
  3. Evaluate accuracy on the test set. Comment on why the number is misleading.
  4. Compute precision, recall, and F1 for the minority class.
  5. Compute ROC AUC. Comment on whether this metric is more informative.
  6. Plot the ROC curve and label the AUC value.
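A possible setup for steps 1 through 5, assuming scikit-learn is installed (the ROC plot in step 6 is left to you; the sample size of 5,000 is an arbitrary choice):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# 90% class 0, 10% class 1.
X, y = make_classification(n_samples=5000, weights=[0.9, 0.1],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
pred = clf.predict(X_te)
proba = clf.predict_proba(X_te)[:, 1]

print("accuracy :", accuracy_score(y_te, pred))   # flattered by the 90% class
print("precision:", precision_score(y_te, pred))  # minority class (1)
print("recall   :", recall_score(y_te, pred))
print("f1       :", f1_score(y_te, pred))
print("roc auc  :", roc_auc_score(y_te, proba))
```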


Tier 4: Business Application (End-to-End Problems)

Exercise 4.1 — Employee Attrition Prediction

Using the IBM HR Analytics Employee Attrition dataset (available on Kaggle or via several open-data portals):

  1. Load the dataset and perform EDA: distributions, missing values, class balance.
  2. Frame the ML problem: what is the label? What are the features? What is the business question?
  3. Engineer at least two new features with business rationale.
  4. Build a complete pipeline: preprocessing + LogisticRegression.
  5. Evaluate with cross-validation and on a held-out test set.
  6. Identify the top 5 predictors based on model coefficients.
  7. Interpret the results in plain English for an HR director audience.
  8. List at least two questions the model cannot answer.
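Since the IBM dataset must be downloaded separately, here is a preprocessing-pipeline scaffold that runs on a tiny made-up frame with hypothetical column names; swap in the real columns and the full dataset once you have the file:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Stand-in data with hypothetical columns -- replace with the real dataset.
df = pd.DataFrame({
    "YearsAtCompany": [1, 7, 3, 12, 2, 5, 9, 4],
    "MonthlyIncome":  [2500, 6200, 3100, 9800, 2700, 4500, 7300, 3900],
    "Department":     ["Sales", "R&D", "Sales", "R&D",
                       "HR", "Sales", "R&D", "HR"],
    "Attrition":      [1, 0, 1, 0, 1, 0, 0, 1],   # the label
})

numeric = ["YearsAtCompany", "MonthlyIncome"]
categorical = ["Department"]

# Scale numeric columns, one-hot encode categoricals, then fit the model.
pre = ColumnTransformer([
    ("num", StandardScaler(), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipe = Pipeline([("pre", pre), ("clf", LogisticRegression(max_iter=1000))])
pipe.fit(df[numeric + categorical], df["Attrition"])
print("training accuracy:", pipe.score(df[numeric + categorical],
                                       df["Attrition"]))
```

On the real dataset, remember to split before evaluating; the training accuracy printed here is only a smoke test that the pipeline runs.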

Exercise 4.2 — The Baseline Challenge

This exercise forces you to think carefully about baselines before training.

Using any classification dataset of your choice:

  1. Calculate the following baselines:
     - Majority class classifier (always predict the most common class; sklearn's DummyClassifier(strategy="most_frequent"))
     - Uniform random classifier (predict each class with equal probability; DummyClassifier(strategy="uniform"))
     - Stratified random classifier (predict each class with its base rate probability; DummyClassifier(strategy="stratified"))
  2. Train a LogisticRegression model.
  3. Compare all results using precision, recall, F1, and ROC AUC.
  4. Write a one-paragraph "model performance memo" a non-technical manager could understand, stating clearly how much better the model is than doing nothing.
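A sketch of the baseline comparison using scikit-learn's DummyClassifier, shown here on the breast cancer dataset (any classification dataset works); it prints accuracy and F1, and you can extend it with precision, recall, and ROC AUC:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "majority":   DummyClassifier(strategy="most_frequent"),
    "uniform":    DummyClassifier(strategy="uniform", random_state=0),
    "stratified": DummyClassifier(strategy="stratified", random_state=0),
    "logreg":     make_pipeline(StandardScaler(),
                                LogisticRegression(max_iter=1000)),
}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    print(f"{name:10s} accuracy={model.score(X_te, y_te):.3f} "
          f"f1={f1_score(y_te, model.predict(X_te)):.3f}")
```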

Exercise 4.3 — Demand Forecasting (Regression)

Create a synthetic monthly sales dataset with the following features:

  • Month number (1–36, representing 3 years)
  • Is holiday month (binary: November and December = 1)
  • Marketing spend (random, in thousands)
  • Sales (derived from a formula with noise)

  1. Create this dataset using numpy, then convert to a DataFrame.
  2. Build a linear regression model to predict sales.
  3. Use a time-based split (train on months 1–24, test on months 25–36) rather than a random split. Explain why this matters for time-series data.
  4. Evaluate with RMSE and R². Interpret both in business terms.
  5. Plot actual vs. predicted sales for the test period.
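One way to sketch steps 1 through 4, assuming scikit-learn is installed. The sales formula below is an arbitrary choice; any formula with a trend, a holiday lift, and noise serves the purpose (the plot in step 5 is left to you):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)
month = np.arange(1, 37)
# Calendar months 11 and 12 in each of the three years.
is_holiday = np.isin((month - 1) % 12 + 1, [11, 12]).astype(int)
marketing = rng.uniform(10, 50, size=36)          # spend, in thousands
# Arbitrary formula: base + trend + holiday lift + marketing effect + noise.
sales = (200 + 5 * month + 80 * is_holiday + 3 * marketing
         + rng.normal(0, 20, size=36))

df = pd.DataFrame({"month": month, "is_holiday": is_holiday,
                   "marketing": marketing, "sales": sales})

# Time-based split: train on the first 24 months, test on the last 12,
# so the model never sees the future it is asked to predict.
train, test = df[df.month <= 24], df[df.month > 24]
features = ["month", "is_holiday", "marketing"]

model = LinearRegression().fit(train[features], train["sales"])
pred = model.predict(test[features])

rmse = mean_squared_error(test["sales"], pred) ** 0.5
r2 = r2_score(test["sales"], pred)
print(f"RMSE: {rmse:.1f}   R^2: {r2:.3f}")
```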


Tier 5: Critical Thinking and Synthesis

Exercise 5.1 — The ML Audit

Your company deployed a customer churn model six months ago, and you have been asked to audit it. Write a structured one-page audit plan covering:

  1. What performance metrics you would check, and how you would compute them on recent data
  2. How you would detect whether the model's predictive power has degraded since deployment (model drift)
  3. What business outcomes you would link to the model's predictions to assess real-world impact
  4. Two scenarios that would cause you to recommend retraining
  5. One scenario that would cause you to recommend decommissioning the model entirely

Exercise 5.2 — The Case Against ML

You are a data analyst at a company that wants to build an ML model to predict which sales territories will underperform next quarter. Write a one-page memo arguing against building this model, addressing:

  1. Data requirements the model needs that may not be met
  2. The simpler alternative and what it would achieve
  3. The maintenance cost of deploying and monitoring a model
  4. The risk of the model being wrong and what business damage that causes
  5. What conditions would change your recommendation

Exercise 5.3 — Design a Complete ML Problem

Choose a business problem from your own work experience or from a domain you know well. Write a complete problem framing document covering all five questions from Section 33.3:

  1. What exactly are you predicting? (Be specific — include the time window and the exact definition of the label)
  2. What data do you have and how would you construct the label?
  3. What features would you engineer and why?
  4. How would the output be used in practice?
  5. What does success look like? (Define at least two numeric thresholds)

Then identify the single biggest risk to this project succeeding and how you would mitigate it.