Chapter 33 Key Takeaways: Introduction to Machine Learning for Business

The One-Line Summary

Machine learning is pattern-finding in historical data — nothing more, nothing less. Its value in business comes from applying those patterns at scale, consistently, and to new cases the pattern-finder has never seen.


Core Concepts

What Machine Learning Is (and Is Not)

  • ML finds patterns in data and applies them to new cases. It is not magic, not AGI, and not a universal solution.
  • ML can only find patterns that exist in the data. No algorithm compensates for missing, incorrect, or irrelevant data.
  • Every ML model makes mistakes. The question is whether the rate and type of errors are acceptable for your business context.

Three Types of ML

  • Supervised learning (most common in business): learn from labeled examples to predict labels for new cases. Classification predicts categories; regression predicts numbers.
  • Unsupervised learning: find structure in unlabeled data. Customer segmentation and anomaly detection are common business applications.
  • Reinforcement learning: learn from reward signals. Less common in standard business analytics but relevant in dynamic pricing and recommendation systems.

The ML Workflow (Five Steps)

  1. Frame the business problem: define the exact prediction target, time window, data requirements, and success criteria before writing any code
  2. Collect and prepare data: this step takes 60–80% of the total time and is where most projects fail
  3. Train a model: start simple; logistic regression and decision trees solve most business problems
  4. Evaluate honestly: use the held-out test set, not the training set; compare against a meaningful baseline
  5. Deploy and monitor: a model in a notebook helps no one; expect performance to degrade over time
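Steps 2 through 4 above can be sketched in a few lines of scikit-learn. This is a minimal illustration on synthetic data — the feature values and the label rule are invented for the example, and step 1 (framing) and step 5 (deployment) happen outside the code:

```python
# Minimal sketch of steps 2-4 on synthetic data.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))                  # 500 cases, 4 features
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)  # hypothetical label rule

# Hold out a test set so evaluation is honest (step 4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

model = LogisticRegression()                   # start simple (step 3)
model.fit(X_train, y_train)

baseline = max(y_test.mean(), 1 - y_test.mean())  # majority-class baseline
accuracy = model.score(X_test, y_test)            # accuracy on held-out data
print(f"baseline={baseline:.2f}  model={accuracy:.2f}")
```

The comparison against the majority-class baseline on the last two lines is the "meaningful baseline" step 4 calls for.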

The scikit-learn API

Every estimator follows the same interface:

  • .fit(X, y) — train the model
  • .predict(X) — generate predictions
  • .transform(X) — apply a preprocessing transformation
  • .predict_proba(X) — return class probabilities (classifiers)
  • .score(X, y) — evaluate with the default metric

Use Pipeline to chain preprocessing and modeling steps. Pipelines prevent data leakage by ensuring transformers are fit only on training data.
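A minimal sketch of such a pipeline, on synthetic data (the step names "scale" and "model" are arbitrary labels chosen for this example):

```python
# Chaining scaling and modeling: the scaler is fit only on the
# training data, so no test-set statistics leak into training.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(loc=50, scale=10, size=(200, 3))  # unscaled features
y = (X[:, 0] > 50).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

pipe = Pipeline([
    ("scale", StandardScaler()),     # fit on training data only
    ("model", LogisticRegression()),
])
pipe.fit(X_train, y_train)           # one call fits both steps in order
preds = pipe.predict(X_test)         # scaler reuses training statistics
```

Calling .fit on the pipeline fits each transformer in sequence on the training data; .predict then applies the already-fitted transformers to new data, which is exactly the leakage-prevention property described above.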


When ML Is Worth It

ML is worth pursuing when all of these are true:

  • You have enough labeled historical data (at least several hundred labeled examples; more is better)
  • The pattern is stable over time
  • A simpler solution (rules, better reporting, process improvement) does not solve the problem adequately
  • You have the infrastructure to deploy and maintain a model
  • The improvement over the simpler solution justifies the complexity

If any of these conditions fail, start with the simpler solution.


The Hype Antidote

  • The base rate problem: always compare model performance to the majority-class baseline. A model that can't beat "always predict the most common outcome" is worthless.
  • The deployment gap: training a model is easy. Getting it into production and keeping it working is the hard part.
  • The interpretability trade-off: complex models often achieve higher accuracy but are harder to explain. In many business contexts, a slightly less accurate model you can explain is more valuable than a black box.
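The base rate problem can be made concrete with scikit-learn's DummyClassifier, which implements exactly the "always predict the most common outcome" strategy. A small sketch with invented 90/10 class imbalance:

```python
# With 90% negatives, "always predict the most common outcome"
# already scores 90% accuracy -- any real model must beat this.
import numpy as np
from sklearn.dummy import DummyClassifier

y = np.array([0] * 90 + [1] * 10)   # 90/10 class imbalance
X = np.zeros((100, 1))              # features are irrelevant to this strategy

baseline = DummyClassifier(strategy="most_frequent").fit(X, y)
print(baseline.score(X, y))         # 0.9
```

A model reporting "90% accuracy" on this data has learned nothing at all — which is why accuracy alone is misleading under class imbalance.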

Key Vocabulary

  • Features — input variables used to make a prediction (also: predictors, X)
  • Label — the variable being predicted (also: target, y, outcome)
  • Training set — data used to fit (train) the model
  • Test set — held-out data used only for final evaluation; never seen by the model during training
  • Overfitting — model performs well on training data but poorly on new data
  • Underfitting — model performs poorly on both training and new data
  • Parameters — values the model learns from data
  • Hyperparameters — settings chosen by the practitioner that control how learning happens
  • Data leakage — allowing information from the test set to influence training or preprocessing
  • Cross-validation — using multiple train/test splits to get a more reliable performance estimate
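Cross-validation is a one-liner in scikit-learn. A minimal sketch on synthetic data (the data and the choice of five folds are illustrative):

```python
# 5-fold cross-validation: five train/test splits, five scores,
# a more reliable estimate than any single split.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
X = rng.normal(size=(300, 4))
y = (X[:, 0] - X[:, 2] > 0).astype(int)

scores = cross_val_score(LogisticRegression(), X, y, cv=5)
print(scores.mean(), scores.std())   # report mean and spread, not one number
```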

Evaluation Metrics Quick Reference

Classification:

  • Accuracy: proportion correct — misleading with class imbalance
  • Precision: of predicted positives, what fraction were correct — optimize when false alarms are costly
  • Recall: of actual positives, what fraction were caught — optimize when missing cases is costly
  • F1: harmonic mean of precision and recall
  • ROC AUC: overall discrimination ability, threshold-independent; a good default for binary classification
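A tiny worked example of the classification metrics, with labels small enough to check by hand (the six cases are invented: three true positives, one false positive, one false negative, one true negative):

```python
from sklearn.metrics import (
    accuracy_score, precision_score, recall_score, f1_score,
)

y_true = [1, 0, 1, 1, 0, 1]   # actual outcomes
y_pred = [1, 0, 0, 1, 1, 1]   # model predictions

print(accuracy_score(y_true, y_pred))   # 4 of 6 correct = 0.667
print(precision_score(y_true, y_pred))  # 3 of 4 predicted positives = 0.75
print(recall_score(y_true, y_pred))     # 3 of 4 actual positives = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of 0.75 and 0.75 = 0.75
```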

Regression:

  • MAE: average absolute error, same units as the target, easy to interpret
  • RMSE: penalizes large errors more, good when large errors are especially costly
  • R²: proportion of variance explained; 1 is a perfect fit, 0 means no better than always predicting the mean, and values can go negative for models worse than that
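The regression metrics, on a four-point example small enough to verify by hand (the values are invented; errors are 10, −10, 20, −30):

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [100, 200, 300, 400]   # actual values
y_pred = [110, 190, 320, 370]   # predictions

mae = mean_absolute_error(y_true, y_pred)           # (10+10+20+30)/4 = 17.5
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # sqrt(375), about 19.4
r2 = r2_score(y_true, y_pred)                       # 1 - 1500/50000 = 0.97
print(mae, rmse, r2)
```

Note how RMSE (about 19.4) exceeds MAE (17.5): the squared penalty weights the single 30-unit miss more heavily than the smaller errors.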


The Two Questions to Ask Before Any ML Project

  1. "What would a non-ML solution achieve here?" — If the gap between the simple solution and ML is small, the simple solution wins.

  2. "What does the model need to be right about, and what does it need not to be wrong about?" — Define the acceptable false positive and false negative rates before you train anything. This determines your success criteria and your evaluation metric.


What Comes Next

Chapter 34 builds on this foundation with hands-on implementation of the four core predictive model families for business:

  • Linear regression (quantifying relationships, forecasting numbers)
  • Logistic regression (predicting probabilities, binary classification)
  • Decision trees (interpretable non-linear classification)
  • Random forests (higher accuracy, feature importance)

The churn prediction problem framed by Priya in Case Study 33-01 is solved end-to-end in Chapter 34.