Further Reading: Chapter 17

Class Imbalance and Cost-Sensitive Learning


Foundational Papers

1. "SMOTE: Synthetic Minority Over-sampling Technique" --- Chawla, Bowyer, Hall, and Kegelmeyer (2002) The paper that introduced SMOTE and launched an entire subfield. Chawla et al. demonstrated that interpolating between minority-class neighbors produces larger, more general decision regions than duplicating existing examples, which tends to make classifiers overfit the specific minority points. The original paper is clearly written and includes experiments on multiple datasets with varying imbalance ratios. Published in the Journal of Artificial Intelligence Research, Vol. 16. Over 25,000 citations --- one of the most cited papers in machine learning.
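SMOTE's core step, interpolating toward a random minority-class neighbor, fits in a few lines. The sketch below is illustrative only; imblearn's implementation adds sampling-ratio control, categorical handling, and edge cases.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_sketch(X_min, n_synthetic, k=5, seed=0):
    """Interpolate between minority examples and their k nearest
    minority-class neighbors, as in the SMOTE paper."""
    rng = np.random.default_rng(seed)
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    # Drop column 0: each point is its own nearest neighbor.
    neighbors = nn.kneighbors(X_min, return_distance=False)[:, 1:]
    out = []
    for _ in range(n_synthetic):
        i = rng.integers(len(X_min))        # pick a minority example...
        j = rng.choice(neighbors[i])        # ...and one of its neighbors
        gap = rng.random()                  # interpolation factor in [0, 1)
        out.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.asarray(out)

rng = np.random.default_rng(42)
X_min = rng.normal(size=(20, 2))            # toy minority class
X_syn = smote_sketch(X_min, n_synthetic=50)
print(X_syn.shape)
```

Every synthetic point lies on a segment between two real minority examples, which is why SMOTE expands the minority region rather than merely reweighting it.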

2. "ADASYN: Adaptive Synthetic Sampling Approach for Imbalanced Learning" --- He, Bai, Garcia, and Li (2008) The paper that introduced ADASYN, which adaptively generates more synthetic examples in regions where the minority class is harder to learn. ADASYN weights each minority example by a density ratio measuring how many of its neighbors are majority-class. Harder examples get more synthetic neighbors. Published in the IEEE International Joint Conference on Neural Networks. Read this after SMOTE if you want to understand why adaptive synthesis can improve on uniform synthesis in complex boundary regions.
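ADASYN's allocation rule can be sketched directly from the description above. The function below illustrates the weighting idea only (it is not imblearn's ADASYN); the toy data and parameter values are invented.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def adasyn_allocation(X, y, n_synthetic, k=5):
    """Decide how many synthetic points each minority example receives:
    proportional to the fraction of its k nearest neighbors (searched in
    the full dataset) that belong to the majority class."""
    X_min = X[y == 1]
    nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
    idx = nn.kneighbors(X_min, return_distance=False)[:, 1:]   # drop self
    r = np.array([(y[nb] == 0).mean() for nb in idx])  # per-example "hardness"
    weights = r / r.sum() if r.sum() > 0 else np.full(len(r), 1.0 / len(r))
    return np.rint(weights * n_synthetic).astype(int)

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (90, 2)), rng.normal(1, 1, (10, 2))])
y = np.array([0] * 90 + [1] * 10)
counts = adasyn_allocation(X, y, n_synthetic=40)
print(counts)   # boundary-adjacent examples get the most synthetic neighbors
```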

3. "The Relationship Between Precision-Recall and ROC Curves" --- Jesse Davis and Mark Goadrich (2006) The formal analysis of why PR curves are more informative than ROC curves for imbalanced datasets. Davis and Goadrich proved that a curve dominates in ROC space if and only if it dominates in PR space, but when neither curve dominates, PR curves can reveal differences that ROC curves visually compress. Essential reading for understanding why AUC-PR is the right primary metric for minority-class prediction problems. Published in ICML 2006.
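A quick scikit-learn experiment makes the practical point concrete: on synthetic data with roughly 1% positives, AUC-ROC can look strong while average precision (the usual AUC-PR estimate) stays modest.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=20000, weights=[0.99, 0.01],
                           n_informative=3, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X_tr, y_tr).predict_proba(X_te)[:, 1]

roc = roc_auc_score(y_te, scores)
ap = average_precision_score(y_te, scores)        # common AUC-PR estimate
print(f"AUC-ROC {roc:.3f}  vs  AUC-PR {ap:.3f}")  # AUC-PR is far less flattering
```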


Resampling Methods

4. Imbalanced Learning: Foundations, Algorithms, and Applications --- He and Ma, eds. (2013) The most comprehensive book on imbalanced learning. Covers the full taxonomy of resampling methods (over-sampling, under-sampling, hybrid), cost-sensitive approaches, ensemble methods for imbalanced data, and evaluation metrics. Chapters 3-5 on resampling are particularly relevant to this chapter. Chapter 7 on evaluation is a thorough treatment of why accuracy fails and what to use instead.

5. "A Survey of Predictive Modeling on Imbalanced Domains" --- Branco, Torgo, and Ribeiro (2016) A comprehensive survey covering both classification and regression with imbalanced data. The taxonomy of techniques (data-level, algorithm-level, and hybrid) provides a useful framework for choosing your approach. Published in ACM Computing Surveys, Vol. 49, No. 2. The coverage of regression imbalance (rare extreme values) extends the topic beyond the classification focus of most references.

6. "Borderline-SMOTE: A New Over-Sampling Method in Imbalanced Data Sets Learning" --- Han, Wang, and Mao (2005) Introduces Borderline-SMOTE, which generates synthetic examples only for minority instances near the decision boundary (those whose k nearest neighbors include majority-class examples). This targeted approach avoids synthesizing in safe regions where the minority class is already well-represented. Published in ICIC 2005. Implemented in imblearn as BorderlineSMOTE.


Cost-Sensitive Learning

7. "The Foundations of Cost-Sensitive Learning" --- Charles Elkan (2001) A foundational paper on cost-sensitive learning. Elkan shows that any cost-sensitive classification problem can be reduced to a threshold-tuning problem if the model produces well-calibrated probabilities. The paper provides the theoretical basis for the threshold-tuning approach emphasized in this chapter: with false-positive cost c_FP and false-negative cost c_FN, the optimal threshold is c_FP / (c_FP + c_FN), i.e., the cost ratio divided by one plus the cost ratio. Published in IJCAI 2001.
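The threshold rule is short enough to state in code; the 5:1 costs below are invented for illustration.

```python
def optimal_threshold(cost_fp, cost_fn):
    """Probability threshold that minimizes expected cost, assuming
    calibrated probabilities and zero cost for correct decisions."""
    return cost_fp / (cost_fp + cost_fn)

# A missed failure (false negative) costs 5x a false alarm (false positive):
t = optimal_threshold(cost_fp=1.0, cost_fn=5.0)
print(t)   # ~0.167: flag anything with at least a 1-in-6 failure probability
```

With equal costs the rule recovers the familiar 0.5 threshold; raising the false-negative cost pushes the threshold down, flagging more cases.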

8. "MetaCost: A General Method for Making Classifiers Cost-Sensitive" --- Pedro Domingos (1999) Domingos wraps any classifier in a cost-sensitive procedure: estimate class probabilities with bagging, relabel each training example with its cost-minimizing class, and retrain. The paper also discusses stratification, changing the training class distribution in proportion to the costs, which was the standard alternative at the time; this connection helps explain why class_weight and resampling often produce similar results. Published in KDD 1999.
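A small empirical check of the class-weight/resampling connection, using scikit-learn's LogisticRegression (where class_weight simply scales each example's term in the loss): weighting the minority class by 5 and duplicating each minority example 5 times yield essentially the same coefficients.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)

# Cost-sensitive route: weight minority-class errors 5x in the loss.
clf_w = LogisticRegression(class_weight={0: 1, 1: 5}, max_iter=5000).fit(X, y)

# Resampling route: make 4 extra copies of every minority example (5 total).
minority = np.flatnonzero(y == 1)
X_r = np.vstack([X, np.repeat(X[minority], 4, axis=0)])
y_r = np.concatenate([y, np.ones(4 * len(minority), dtype=int)])
clf_r = LogisticRegression(max_iter=5000).fit(X_r, y_r)

diff = np.max(np.abs(clf_w.coef_ - clf_r.coef_))
print(diff)   # tiny: the two routes learn essentially the same boundary
```

The two fits minimize the same objective, so any remaining difference is solver tolerance. For models without exact sample weighting, the equivalence is approximate rather than exact.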

9. Cost-Sensitive Machine Learning --- Sheng and Ling (2019) A focused monograph on cost-sensitive methods covering cost-sensitive decision trees, cost-sensitive boosting, cost-sensitive neural networks, and cost-sensitive evaluation. Chapters 2-4 provide the theoretical framework, and Chapters 5-8 cover specific algorithms. Useful as a reference when you need to move beyond class_weight to model-specific cost-sensitive implementations.


Threshold Tuning and Decision Theory

10. "Robust Classification for Imprecise Environments" --- Foster Provost and Tom Fawcett (2001) A detailed treatment of how to select the optimal operating point (threshold) on an ROC curve given a cost matrix and class prior, using iso-performance lines and the ROC convex hull. The key insight: the slope of the iso-performance line touching the curve at the optimal point equals the cost ratio times the class ratio. This connects ROC geometry directly to cost-optimal decision-making. Published in Machine Learning, Vol. 42.
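In practice you can find the cost-optimal operating point by sweeping the ROC curve and minimizing expected cost directly; the slope characterization above describes the same optimum geometrically. The costs and dataset below are invented for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve

X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
scores = LogisticRegression(max_iter=1000).fit(X, y).predict_proba(X)[:, 1]

fpr, tpr, thresholds = roc_curve(y, scores)
p_pos = y.mean()
c_fp, c_fn = 1.0, 10.0                       # illustrative cost matrix
# Expected cost at each operating point along the curve:
expected_cost = c_fp * fpr * (1 - p_pos) + c_fn * (1 - tpr) * p_pos
best = int(np.argmin(expected_cost))
print(f"threshold={thresholds[best]:.3f}  FPR={fpr[best]:.3f}  TPR={tpr[best]:.3f}")
```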

11. "An Introduction to ROC Analysis" --- Tom Fawcett (2006) The most widely cited tutorial on ROC curves, precision-recall curves, and threshold selection. Fawcett covers the geometric interpretation of ROC curves, the relationship between ROC and cost, and practical guidance on choosing operating points. Published in Pattern Recognition Letters, Vol. 27, No. 8. Read this alongside Davis and Goadrich (item 3) to understand both ROC and PR perspectives.

12. "Classifier Technology and the Illusion of Progress" --- David Hand (2006) A provocative paper arguing that much of the apparent progress in classifier performance is an artifact of using improper evaluation measures. Hand's critique of AUC in this line of work led to his H-measure (introduced in Hand, 2009), an alternative that does not depend on the (unknown) cost distribution. The paper's core message reinforces this chapter: the evaluation metric you choose determines whether your model appears to work. Published in Statistical Science, Vol. 21, No. 1.


Practical Guides and Libraries

13. imbalanced-learn (imblearn) Documentation The official documentation for the imbalanced-learn library, which extends scikit-learn with resampling methods (SMOTE, ADASYN, Borderline-SMOTE, Tomek links, etc.), ensemble methods for imbalanced data (BalancedRandomForestClassifier, EasyEnsembleClassifier), and pipelines that handle resampling inside cross-validation. The "All Examples" gallery provides code for every technique covered in this chapter. Available at imbalanced-learn.org.

14. Applied Predictive Modeling --- Kuhn and Johnson (2013) Chapter 16, "Remedies for Severe Class Imbalance," is one of the most practical treatments of imbalanced learning in any textbook. Kuhn and Johnson cover cost-sensitive learning, sampling methods, and evaluation metrics with a focus on when each technique helps and when it does not. Their discussion of the interaction between resampling and tree-based models is particularly relevant.

15. Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow --- Aurélien Géron (3rd edition, 2022) Chapter 3 covers precision-recall tradeoffs, threshold tuning, and the confusion matrix as tools for handling imbalanced classification. Géron's step-by-step approach to plotting the PR curve and choosing an operating point is the most accessible code-first introduction to threshold tuning available.


Ensemble Methods for Imbalanced Data

16. "Exploratory Undersampling for Class-Imbalance Learning" --- Liu, Wu, and Zhou (2009) Introduces two ensemble approaches for imbalanced data, EasyEnsemble and BalanceCascade. EasyEnsemble trains multiple classifiers, each on a balanced subset created by undersampling the majority class, then combines their predictions. BalanceCascade is a sequential variant that removes correctly classified majority examples at each stage. Both methods reduce the information loss of random undersampling while maintaining its computational efficiency. Published in IEEE Transactions on Systems, Man, and Cybernetics, Part B, Vol. 39, No. 2.
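The EasyEnsemble idea can be sketched in a few lines (imblearn ships a full implementation as EasyEnsembleClassifier; this sketch uses shallow trees instead of the paper's AdaBoost base learners):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=2000, weights=[0.95, 0.05], random_state=0)
rng = np.random.default_rng(0)
minority = np.flatnonzero(y == 1)
majority = np.flatnonzero(y == 0)

probas = []
for _ in range(10):
    # Balanced subset: every minority example plus an equal-size majority draw.
    subset = np.concatenate([minority,
                             rng.choice(majority, size=len(minority), replace=False)])
    clf = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X[subset], y[subset])
    probas.append(clf.predict_proba(X)[:, 1])

ensemble_score = np.mean(probas, axis=0)     # averaged minority probability
print(ensemble_score.shape)
```

Each learner sees all the minority data but only a slice of the majority, so the ensemble as a whole discards far less majority information than a single undersample would.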

17. "SMOTEBoost: Improving Prediction of the Minority Class in Boosting" --- Chawla, Lazarevic, Hall, and Bowyer (2003) Combines SMOTE with AdaBoost: at each boosting iteration, SMOTE is applied to the reweighted training set before learning the next weak classifier. This embeds resampling into the boosting process rather than applying it as a preprocessing step. Published in PKDD 2003. The idea generalizes: resampling can be integrated into any iterative algorithm.


Extreme Imbalance and Rare Events

18. "Learning from Imbalanced Data" --- Haibo He and Edwardo Garcia (2009) A comprehensive survey covering the full spectrum from mild to extreme imbalance. The authors categorize solutions into data-level (resampling), algorithm-level (cost-sensitive, one-class learning), and ensemble-level methods. The discussion of when each approach is appropriate, based on dataset size and imbalance ratio, is the most useful practical guide in the survey literature. Published in IEEE Transactions on Knowledge and Data Engineering, Vol. 21, No. 9.

19. "One-Class Classification: Taxonomy of Study and Review of Techniques" --- Khan and Madden (2014) For extreme imbalance (>1000:1), standard classification approaches may fail entirely because there are too few positive examples for the classifier to learn from. One-class classification --- learning only from the normal class and detecting deviations --- is an alternative. This survey covers isolation forests, one-class SVM, and autoencoders for anomaly detection. Published in Knowledge Engineering Review, Vol. 29, No. 3. Chapter 22 of this book covers anomaly detection in depth.
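A minimal one-class sketch with scikit-learn's IsolationForest: fit on normal data only and flag deviations, so the rare class is never needed for training. The data here are synthetic.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(0)
X_normal = rng.normal(0, 1, size=(5000, 4))      # "healthy" operation only
model = IsolationForest(random_state=0).fit(X_normal)

X_new = np.vstack([rng.normal(0, 1, (5, 4)),     # more normal points
                   rng.normal(6, 1, (5, 4))])    # obvious anomalies
pred = model.predict(X_new)                      # 1 = inlier, -1 = anomaly
print(pred)
```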


Industry Applications

20. "Predictive Maintenance Using Machine Learning" --- Carvalho et al. (2019) A survey of predictive maintenance applications with a focus on the imbalance problem. Covers feature engineering from sensor data, survival analysis as an alternative to binary classification, and the practical challenges of deploying failure prediction models in industrial settings. The discussion of alert fatigue and human factors is directly relevant to the TurbineTech case study. Published in Computers in Industry, Vol. 106.

21. "A Review on Customer Churn Prediction in Telecommunications" --- Amin et al. (2019) A meta-analysis of 32 churn prediction studies covering resampling methods, cost-sensitive approaches, and ensemble techniques. The authors find that cost-sensitive learning outperforms resampling in the majority of studies, consistent with the results in this chapter. Published in IEEE Access, Vol. 7. The supplementary tables comparing techniques across datasets provide useful benchmarks.


Video Resources

22. StatQuest with Josh Starmer --- "Dealing With Imbalanced Datasets" A 15-minute visual explanation of the imbalanced data problem, covering why accuracy fails, what SMOTE does, and how class weights work. Starmer's step-by-step visual style makes the geometric intuition behind SMOTE particularly clear. Available on YouTube.

23. Andrew Ng --- "Error Analysis for ML Systems" (Stanford CS229 Lecture) Ng's discussion of error analysis includes practical guidance on handling imbalanced data in production systems. His framework of computing the expected cost of each type of error and using it to choose the operating threshold is the same approach used in this chapter. Available on YouTube via Stanford's course recordings.


How to Use This List

If you read nothing else, read Elkan (item 7) on cost-sensitive learning and threshold tuning, and Davis and Goadrich (item 3) on AUC-PR vs. AUC-ROC. Together they take about 2 hours and will permanently change how you approach imbalanced classification.

If you use SMOTE regularly, read the original SMOTE paper (item 1) to understand the mechanics, then Domingos (item 8) to understand why class weights often achieve the same result with less complexity.

If you work with extreme imbalance (equipment failure, fraud, cybersecurity), read He and Garcia (item 18) for the full landscape and Khan and Madden (item 19) for one-class alternatives.

If you are in a practitioner role, start with the imblearn documentation (item 13) and Kuhn and Johnson (item 14). They provide the most directly actionable guidance.

If you want to understand the theory, start with Elkan (item 7) and Domingos (item 8), then move to Fawcett (item 10) on optimal ROC operating points. Together, these three papers provide the complete theoretical framework for cost-sensitive decision-making under imbalance.


This reading list supports Chapter 17: Class Imbalance and Cost-Sensitive Learning. Return to the chapter to review concepts before diving in.