Chapter 33 Key Takeaways: Introduction to Machine Learning for Business
The One-Line Summary
Machine learning is pattern-finding in historical data — nothing more, nothing less. Its value in business comes from applying those patterns at scale, consistently, and to new cases the pattern-finder has never seen.
Core Concepts
What Machine Learning Is (and Is Not)
- ML finds patterns in data and applies them to new cases. It is not magic, not AGI, and not a universal solution.
- ML can only find patterns that exist in the data. No algorithm compensates for missing, incorrect, or irrelevant data.
- Every ML model makes mistakes. The question is whether the rate and type of errors are acceptable for your business context.
Three Types of ML
- Supervised learning (most common in business): learn from labeled examples to predict labels for new cases. Classification predicts categories; regression predicts numbers.
- Unsupervised learning: find structure in unlabeled data. Customer segmentation and anomaly detection are common business applications.
- Reinforcement learning: learn from reward signals. Less common in standard business analytics but relevant in dynamic pricing and recommendation systems.
The ML Workflow (Five Steps)
- Frame the business problem: define the exact prediction target, time window, data requirements, and success criteria before writing any code
- Collect and prepare data: this step takes 60–80% of the total time and is where most projects fail
- Train a model: start simple; logistic regression and decision trees solve most business problems
- Evaluate honestly: use the held-out test set, not the training set; compare against a meaningful baseline
- Deploy and monitor: a model in a notebook helps no one; expect performance to degrade over time
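The five steps above (minus deployment) can be sketched in a few lines of scikit-learn. This is a minimal illustration, not a template: the synthetic dataset stands in for prepared business data, and in a real project the data-preparation step would dominate.

```python
# Minimal sketch of the ML workflow using scikit-learn and synthetic data.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

# Stand-in for prepared business data (e.g. customer features and a churn label)
X, y = make_classification(n_samples=500, n_features=8, random_state=0)

# Hold out a test set for honest evaluation (step 4)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Start simple (step 3): logistic regression
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on held-out data, never on the training set
test_accuracy = model.score(X_test, y_test)
```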
The scikit-learn API
Every estimator follows the same interface:
- .fit(X, y) — train the model
- .predict(X) — generate predictions
- .transform(X) — apply a preprocessing transformation
- .predict_proba(X) — return class probabilities (classifiers)
- .score(X, y) — evaluate with default metric
Use Pipeline to chain preprocessing and modeling steps. Pipelines prevent data leakage by ensuring transformers are fit only on training data.
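A minimal sketch of that uniform interface and of Pipeline, again on synthetic stand-in data: the scaler's statistics are learned from the training set only, so the test set cannot leak into preprocessing.

```python
# Sketch: chaining a scaler and a classifier in a Pipeline to prevent leakage.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=300, n_features=5, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=1)

pipe = Pipeline([
    ("scale", StandardScaler()),    # fit learns mean/std from training data only
    ("clf", LogisticRegression()),
])
pipe.fit(X_train, y_train)          # fits the scaler, then the classifier
probs = pipe.predict_proba(X_test)  # same .predict_proba interface, end to end
score = pipe.score(X_test, y_test)  # default metric (accuracy for classifiers)
```

The whole pipeline behaves like one estimator: `.fit`, `.predict`, and `.score` work on it exactly as they do on a bare model.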
When ML Is Worth It
ML is worth pursuing when all of these are true:
- You have enough labeled historical data (at least several hundred labeled examples; more is better)
- The pattern is stable over time
- A simpler solution (rules, better reporting, process improvement) does not solve the problem adequately
- You have the infrastructure to deploy and maintain a model
- The improvement over the simpler solution justifies the complexity
If any of these conditions fail, start with the simpler solution.
The Hype Antidote
- The base rate problem: always compare model performance to the majority-class baseline. A model that can't beat "always predict the most common outcome" is worthless.
- The deployment gap: training a model is easy. Getting it into production and keeping it working is the hard part.
- The interpretability trade-off: complex models often achieve higher accuracy but are harder to explain. In many business contexts, a slightly less accurate model you can explain is more valuable than a black box.
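The base-rate check is one line of code with scikit-learn's `DummyClassifier`. A hypothetical imbalanced dataset (roughly 90/10, as is typical for churn or fraud) makes the point: the majority-class baseline already scores around 0.9 accuracy, and any real model must clear that bar.

```python
# Sketch of the base-rate check: compare a model to the majority-class baseline.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.dummy import DummyClassifier
from sklearn.tree import DecisionTreeClassifier

# Roughly 90/10 class imbalance, as in many churn or fraud problems
X, y = make_classification(n_samples=1000, weights=[0.9], random_state=2)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=2)

# "Always predict the most common outcome"
baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = DecisionTreeClassifier(max_depth=4, random_state=2).fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)  # high purely from imbalance
model_acc = model.score(X_test, y_test)        # must beat baseline_acc to matter
```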
Key Vocabulary
| Term | Definition |
|---|---|
| Features | Input variables used to make a prediction (also: predictors, X) |
| Label | The variable being predicted (also: target, y, outcome) |
| Training set | Data used to fit (train) the model |
| Test set | Held-out data used only for final evaluation — never seen by the model during training |
| Overfitting | Model performs well on training data but poorly on new data |
| Underfitting | Model performs poorly on both training and new data |
| Parameters | Values the model learns from data |
| Hyperparameters | Settings chosen by the practitioner that control how learning happens |
| Data leakage | Allowing information from the test set to influence training or preprocessing |
| Cross-validation | Using multiple train/test splits to get a more reliable performance estimate |
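The last vocabulary entry, cross-validation, is a one-liner in scikit-learn. A minimal sketch on synthetic data: five splits, each fold held out once, yielding a mean and spread rather than a single possibly lucky score.

```python
# Sketch: cross-validation gives a more reliable estimate than one split.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=400, random_state=3)

# cv=5: five train/test splits; each fold serves as the test set once
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
mean_score, spread = scores.mean(), scores.std()
```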
Evaluation Metrics Quick Reference
Classification:
- Accuracy: proportion correct — misleading with class imbalance
- Precision: of predicted positives, what fraction were correct — optimize when false alarms are costly
- Recall: of actual positives, what fraction were caught — optimize when missing cases is costly
- F1: harmonic mean of precision and recall
- ROC AUC: overall discrimination ability, threshold-independent, good default for binary classification
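A small worked example of the classification metrics, using hand-picked toy labels so the arithmetic is visible. Out of eight cases there are three actual positives; the predictions miss one of them and raise one false alarm.

```python
# Sketch: the classification metrics on toy predictions.
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]  # one missed positive, one false alarm

acc = accuracy_score(y_true, y_pred)    # 6 of 8 correct = 0.75
prec = precision_score(y_true, y_pred)  # 2 of 3 predicted positives correct = 2/3
rec = recall_score(y_true, y_pred)      # 2 of 3 actual positives caught = 2/3
f1 = f1_score(y_true, y_pred)           # harmonic mean of 2/3 and 2/3 = 2/3
```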
Regression:
- MAE: average absolute error, same units as target, interpretable
- RMSE: penalizes large errors more, good when large errors are especially costly
- R²: proportion of variance explained; 1 is perfect, 0 matches always predicting the mean, and it can be negative for models worse than that
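A toy example of the regression metrics, with values chosen so that the MAE/RMSE difference is easy to check by hand: one large error (30) pulls RMSE above MAE.

```python
# Sketch: regression metrics on toy values.
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

y_true = [100.0, 200.0, 300.0, 400.0]
y_pred = [110.0, 190.0, 330.0, 380.0]  # errors: 10, 10, 30, 20

mae = mean_absolute_error(y_true, y_pred)           # (10+10+30+20)/4 = 17.5
rmse = np.sqrt(mean_squared_error(y_true, y_pred))  # sqrt(375) ≈ 19.4 > MAE
r2 = r2_score(y_true, y_pred)                       # fraction of variance explained
```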
The Two Questions to Ask Before Any ML Project
1. "What would a non-ML solution achieve here?" — If the gap between the simple solution and ML is small, the simple solution wins.
2. "What does the model need to be right about, and what does it need not to be wrong about?" — Define the acceptable false positive and false negative rates before you train anything. This determines your success criteria and your evaluation metric.
What Comes Next
Chapter 34 builds on this foundation with hands-on implementation of the four core predictive model families for business:
- Linear regression (quantifying relationships, forecasting numbers)
- Logistic regression (predicting probabilities, binary classification)
- Decision trees (interpretable non-linear classification)
- Random forests (higher accuracy, feature importance)
The churn prediction problem framed by Priya in Case Study 33-01 is solved end-to-end in Chapter 34.