Index
Alphabetical listing of key topics with chapter references. Bold entries indicate primary coverage; regular entries indicate secondary or brief mention.
A/B testing: 3, 16, 32, 34
Accuracy: 16, 11, 13, 14, 17, 33
Adjusted Rand Index (ARI): 20
Aggregation (SQL): 5, 6, 10
ARIMA: 25, 36
Association rules: 23, 24
AUC: see ROC AUC, PR AUC
Backpropagation: 36, 4
Bagging: 13, 14
Balanced accuracy: 16, 17
Base rate: 16, 17, 33
Baseline model: 2, 11, 16, 35
Batch prediction: 31, 32
Bayesian optimization: 18
Bias (algorithmic): 33, 19, 34, 35
Bias (statistical): 4, 11, 16
Bias-variance tradeoff: 1, 4, 11, 13, 14, 18
BigQuery: 5, 28
Binary classification: 1, 11, 13, 14, 16, 17
Binning: 6, 7
Boosting: 14, 13, 18
Box plot: 1, 6, 8
Business metric: 34, 2, 35
Calibration (model): 16, 33
Cardinality (high): 7, 6, 9
CatBoost: 14, 7, 18
Categorical encoding: 7, 6, 9, 10
Causal inference: 36, 3
Chi-squared test: 9, 3
Class imbalance: 17, 16, 22, 33
Classification report: 16, 17
Clustering: 20, 21, 22
Cohen's kappa: 16
Collaborative filtering: 24, 23
Collinearity: see Multicollinearity
Confidence interval: 3, 16
Confusion matrix: 16, 17, 33, 34
Content-based filtering: 24
Correlation: 6, 9, 11, 21
Cost-benefit analysis: 34, 16, 17
Cost matrix: 17, 16, 34
Cross-validation: 16, 11, 13, 14, 18, 19
CTEs (common table expressions): 5, 10
Curse of dimensionality: 21, 9, 15, 20
Dashboard: 32, 34
Data drift: 32, 31, 35
Data leakage: 2, 5, 6, 10, 16, 25
Data pipeline: 10, 2, 5, 29, 31
Data types (Python): 6, 7, 10, 28
Davies-Bouldin index: 20
DBSCAN: 20, 22
Decision boundary: 12, 13, 15
Decision tree: 13, 14, 19
Deep learning: 36, 26, 28
Demographic parity: 33
Deployment: 31, 29, 30, 32, 35
Difference-in-differences: 36, 3
Dimensionality reduction: 21, 9, 20
Disparate impact: 33
Docker: 31, 29
Drift detection: 32, 31, 35
Dummy variable: see One-hot encoding
Early stopping: 14, 18, 36
EDA (exploratory data analysis): 1, 2, 6, 8
Elastic net: 11, 9, 18
Elbow method: 20
Embedding: 7, 24, 26, 36
Ensemble methods: 13, 14, 18
Entropy: 13, 9, 26
Equalized odds: 33
Equal opportunity: 33
Evaluation metrics: 16, 17, 33, 34
Experiment tracking: 30, 18, 29, 31, 35
Exponential smoothing: 25
F1 score: 16, 17, 18
F-beta score: 16, 17
Fairness: 33, 19, 34, 35
Fairness-accuracy tradeoff: 33, 34
False negative: 16, 17, 33, 34
False positive: 16, 17, 22, 33, 34
FastAPI: 31, 29, 35
Feature engineering: 6, 7, 8, 9, 10, 25, 26, 27, 35
Feature importance: 13, 9, 14, 19
Feature selection: 9, 6, 11, 21
Feature store: 2, 31, 10
Flask: 31
Focal loss: 17
Fraud detection: 17, 22, 16
Gaussian Naive Bayes: 15, 26
Geospatial data: 27, 6
Gini impurity: 13, 9
Git: 29, 10, 30
Gradient boosting: 14, 13, 17, 18, 19, 35
Gradient descent: 4, 11, 14, 36
Grid search: 18, 14, 16
Groupby (pandas): 5, 6, 10
Guardrail metrics: 3, 32
Hierarchical clustering: 20
Histogram-based gradient boosting: 14, 28
Hospital readmission (Metro General anchor): 1, 4, 7, 11, 16, 17, 19, 33, 34, 35
Hyperparameter tuning: 18, 14, 16
Hypothesis testing: 3, 16
Imbalanced data: see Class imbalance
Impossibility theorem (fairness): 33
Imputation: 8, 7, 10
Inference time: 14, 28, 31
Information gain: 13, 9
Interaction features: 6, 11, 14
Interpretation: see Model interpretation
Isolation Forest: 22, 20
JSON: 5, 31, 29
Jupyter notebook: 1, 2, 29, 30
K-fold cross-validation: see Cross-validation
K-means clustering: 20, 21, 22
K-nearest neighbors (KNN): 15, 8, 20
Label encoding: 7, 6
Lasso regression: 11, 9, 18
Latency: 31, 28, 32
Learning curve: 16, 4, 18
Leakage: see Data leakage
LightGBM: 14, 7, 17, 18, 28, 35
LIME: 19, 33
Linear regression: 11, 4, 6, 9
Log loss: 16, 4, 14
Log transformation: 6, 11
Logistic regression: 11, 4, 12, 16, 17
MAE (mean absolute error): 16, 11, 25
Manufacturing predictive maintenance (TurbineTech anchor): 1, 8, 17, 22, 25, 28, 32, 35
MAPE (mean absolute percentage error): 16, 25
Market basket analysis: 23
MASE (mean absolute scaled error): 25
Matthews correlation coefficient: 16
Mean encoding: see Target encoding
Merge (pandas): 5, 6, 10
Minimum detectable effect (MDE): 3
Missing data: 8, 6, 7, 10
MCAR, MAR, MNAR: 8
MLflow: 30, 18, 29, 31, 35
MLOps: 29, 30, 31, 32, 36
Model card: 33, 31, 34
Model comparison: 16, 13, 14, 18
Model decay: 32, 16, 30, 31, 35
Model deployment: see Deployment
Model governance: 33, 34
Model interpretation: 19, 13, 14, 33, 34
Model monitoring: 32, 31, 35
Model registry: 2, 30, 31
Model selection: 16, 13, 14, 18
Multiclass classification: 15, 16, 13, 14
Multicollinearity: 9, 6, 11, 21
Mutual information: 9, 6
Naive Bayes: 15, 26
NDCG (normalized discounted cumulative gain): 24
Neural network: 36, 4, 26
NLP (natural language processing): 26, 21, 36
Normalization: 4, 6, 12, 15
Novelty effect: 3, 32
Null hypothesis: 3, 16
NumPy: 4, 6, 10
Observation unit: 2, 1, 5
One-hot encoding: 7, 6, 9, 13
Ordinal encoding: 7, 6
Outlier detection: see Outliers, Isolation Forest
Outliers: 6, 8, 22
Overfitting: 1, 4, 13, 14, 16, 18
P-value: 3, 16
Pandas: 1, 5, 6, 7, 8, 10, 28, 29
Partial dependence plot (PDP): 19, 14
PCA (principal component analysis): 21, 9, 20
Permutation importance: 9, 13, 19
Pipeline (scikit-learn): 10, 7, 8, 11, 18, 29, 31
Polars: 28, 10
Polynomial features: 6, 11
PostgreSQL: 5, 10, 28
Power (statistical): 3
PR AUC (precision-recall AUC): 16, 17
Precision: 16, 17, 33, 34
Precision@k: 24
Prediction interval: 25, 16
Predictive maintenance: see Manufacturing predictive maintenance
Predictive parity: 33
Problem framing: 1, 2, 34, 35
Production (ML in): 31, 32, 29, 30, 35
Progressive project (StreamFlow churn): 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 17, 18, 19, 29, 30, 31, 32, 33, 34, 35
Prophet (Facebook/Meta): 25
Protected attribute: 33, 34
Pruning (tree): 13, 18
PyTorch: 36
R-squared: 16, 1, 11
Random Forest: 13, 9, 14, 17, 19
Random search: 18, 14
Ranking metrics: 24, 16
Real-time prediction: 31, 28, 32
Recall: 16, 17, 33, 34
Recall@k: 24
Recommender systems: 24, 23
Regularization: 11, 4, 12, 14, 18, 36
Reproducibility: 10, 2, 5, 18, 29, 30, 31
REST API: 31, 29
Retraining: 32, 30, 31
Ridge regression: 11, 4, 9, 18
RMSE (root mean squared error): 16, 11, 25
ROC AUC: 16, 14, 17, 18
ROC curve: 16, 17
ROI of ML: 34, 2, 35
Rollback: 31, 32
Rolling features: 6, 25
SaaS churn (StreamFlow anchor): 1, 2, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 16, 17, 18, 19, 29, 30, 31, 32, 33, 34, 35
Sample size: 3, 16, 17
Scaling (feature): 4, 6, 10, 12, 15, 21
Scikit-learn: 4, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21
Seasonality: 25, 6
SHAP values: 19, 14, 33, 34, 35
ShopSmart (e-commerce anchor): 3, 4, 5, 16, 20, 23, 24, 26, 34
Silhouette score: 20, 21
SMOTE: 17, 16
Software engineering: 29, 10, 30, 31
Spark (PySpark): 28, 5, 36
Specificity: 16, 33
SQL: 5, 10, 28
Stacking: 14, 18
Stakeholder communication: 34, 2, 19, 35
Standardization: see Scaling (feature)
Stationarity: 25
Statistical significance: 3, 16
Stratified sampling: 16, 17
StreamFlow: see SaaS churn
Subgroup analysis: 33, 16, 3
Support vector machine (SVM): 12, 4, 16, 21
Support vectors: 12
Survival analysis: 25, 17
t-SNE: 21, 20
Target encoding: 7, 6, 9
Target variable: 1, 2, 5, 6
Technical debt (ML): 29, 2, 32
TF-IDF: 26, 9, 21
Threshold (classification): 16, 17, 33, 34
Time series: 25, 6, 32, 36
Tokenization: 26
Train-test split: 1, 2, 16
Transformer (architecture): 36, 26
Tree-based methods: 13, 14, 19
True positive rate: see Recall
t-test: 3
TurbineTech: see Manufacturing predictive maintenance
UMAP: 21, 20
Underfitting: 1, 4, 16, 18
Unit testing: 29, 10, 31
Validation set: 16, 2, 18
Variance (bias-variance): see Bias-variance tradeoff
Variance inflation factor (VIF): 9, 11
Version control: see Git
Voting classifier: 13, 18
WAPE: 25
Window functions (SQL): 5, 6, 10, 25
Word embedding: 26, 21, 36
Word2Vec: 26
XGBoost: 14, 7, 13, 17, 18, 19, 28, 35