Further Reading
Chapter 25: Machine Learning in Fraud Detection
Essential Reading
European Banking Authority (2020). Report on Big Data and Advanced Analytics. The EBA's comprehensive analysis of how financial institutions are using machine learning and advanced analytics in credit risk, fraud detection, AML, and customer segmentation. Covers the governance expectations and risk management considerations for AI/ML in financial services. Free at eba.europa.eu. Essential for understanding the regulatory expectations that frame the use of ML in fraud detection.
Federal Reserve (2011). SR 11-7: Guidance on Model Risk Management. The foundational US regulatory guidance on model risk management — covering model development, validation, and ongoing monitoring for all models used in risk management and financial decision-making. While a US document, the principles have been widely adopted globally and are the standard against which fraud detection model governance is assessed. Free at federalreserve.gov.
FCA (2022). AI and Machine Learning — Discussion Paper DP22/4. The FCA's discussion of AI and ML use in financial services — covering explainability, fairness, governance, and regulatory expectations. Essential for understanding the UK regulatory direction for ML-based fraud detection. Free at fca.org.uk.
For Practitioners
Dal Pozzolo, A., Caelen, O., Johnson, R.A., & Bontempi, G. (2015). Calibrating Probability with Undersampling for Unbalanced Classification. IEEE Symposium on Computational Intelligence and Data Mining. A seminal paper on class imbalance handling in fraud detection. Demonstrates that undersampling during training requires probability recalibration for correct threshold setting — a practical finding that affects production fraud model deployment.
Lundberg, S.M., & Lee, S.I. (2017). A Unified Approach to Interpreting Model Predictions. NeurIPS. The original SHAP paper. Provides the theoretical foundation for Shapley-value-based model explanations. Available at arxiv.org. Essential reading for anyone implementing explainability in fraud detection.
Phua, C., Lee, V., Smith, K., & Gayler, R. (2010). A Comprehensive Survey of Data Mining-Based Fraud Detection Research. arXiv:1009.6119. Comprehensive survey of fraud detection approaches across payment fraud, insurance fraud, healthcare fraud, and telecommunications fraud. Provides historical context for the evolution from rules-based to ML-based detection.
Bolton, R.J., & Hand, D.J. (2002). Statistical Fraud Detection: A Review. Statistical Science, 17(3), 235–255. Classic academic treatment of statistical approaches to fraud detection. Establishes the formal framework for class imbalance, behavioral profiling, and network-based detection that underpins modern ML approaches.
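The recalibration step described in the Dal Pozzolo et al. entry above can be sketched in a few lines. This is a minimal illustration, assuming the legitimate (majority) class was randomly undersampled and that `beta` is the fraction of legitimate transactions kept for training. The function name `recalibrate_undersampled` and the example numbers are hypothetical and are not taken from the paper.

```python
def recalibrate_undersampled(p_s: float, beta: float) -> float:
    """Map a score from a model trained on undersampled data back to the
    original class balance.

    p_s  : fraud probability output by the model trained after undersampling
    beta : fraction of majority-class (legitimate) rows kept, 0 < beta <= 1
    """
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    if not 0.0 <= p_s <= 1.0:
        raise ValueError("p_s must be a probability in [0, 1]")
    # Inverts p_s = p / (p + beta * (1 - p)), the effect of undersampling
    # the majority class on the posterior probability.
    return beta * p_s / (beta * p_s - p_s + 1.0)


# Hypothetical example: 1% of legitimate transactions were kept for training.
raw_score = 0.5
calibrated = recalibrate_undersampled(raw_score, beta=0.01)
print(f"raw={raw_score:.3f} calibrated={calibrated:.5f}")
```

A raw score of 0.5 from the undersampled model corresponds to a much smaller probability on the original population, which is why a threshold chosen on uncalibrated scores can be misleading in production.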
Technical References
XGBoost Documentation: xgboost.readthedocs.io — XGBoost is one of the most widely used gradient boosting implementations in production fraud detection. Documentation includes worked examples with tabular data.
LightGBM Documentation: lightgbm.readthedocs.io — Microsoft's LightGBM is often faster to train than XGBoost on large datasets, which makes it efficient for fraud detection at scale. Includes native categorical feature handling, useful for fields such as merchant category codes (MCCs).
SHAP Python Library: shap.readthedocs.io — The standard library for SHAP-based model explanation. TreeSHAP computes exact SHAP values for tree ensembles such as gradient-boosted models in O(T × L × D²) time (T trees, at most L leaves per tree, maximum depth D), which is typically fast enough for production use. Includes waterfall, summary, and force plots suited to fraud alert investigation dashboards.
imbalanced-learn: imbalanced-learn.org — Python library implementing SMOTE and other class imbalance handling techniques. Integrates with scikit-learn pipelines.
Scikit-learn: scikit-learn.org — Standard Python ML library with Isolation Forest implementation (sklearn.ensemble.IsolationForest), logistic regression, and model evaluation utilities including precision_recall_curve for threshold selection.
Feast (Feature Store): docs.feast.dev — Open-source feature store for ML. Relevant for the architectural pattern of pre-computing and serving behavioral features at real-time fraud detection latency.
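As an illustration of the threshold selection mentioned in the Scikit-learn entry above, the sketch below uses `sklearn.metrics.precision_recall_curve` to pick the alert threshold with the highest recall among those that meet a precision floor. The labels, scores, and the 0.7 precision floor are made-up values for illustration only; a real deployment would use held-out, calibrated model scores and a floor derived from investigation capacity and fraud costs.

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

# Hypothetical held-out labels (1 = fraud) and model scores.
y_true = np.array([0, 0, 0, 0, 1, 0, 1, 1])
scores = np.array([0.05, 0.10, 0.20, 0.30, 0.35, 0.60, 0.80, 0.90])

precision, recall, thresholds = precision_recall_curve(y_true, scores)

# Assumed business constraint: at least 70% of alerts must be true fraud.
min_precision = 0.7

# precision and recall have one more entry than thresholds (the final
# point has no threshold), so drop the last element before indexing.
candidates = np.flatnonzero(precision[:-1] >= min_precision)

# Among thresholds that satisfy the floor, keep the one with highest recall.
best = candidates[np.argmax(recall[:-1][candidates])]
threshold = thresholds[best]

print(f"threshold={threshold:.2f} "
      f"precision={precision[best]:.2f} recall={recall[best]:.2f}")
```

Maximising recall subject to a precision floor is only one possible rule; a cost-weighted objective is another common choice.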
Regulatory Primary Sources
| Document | Jurisdiction | Key Relevance |
|---|---|---|
| GDPR Recital 47 | EU | Legitimate interest for fraud prevention |
| GDPR Article 22 | EU | Right not to be subject to solely automated decisions |
| Data Protection Act 2018 Schedule 2 Para 14 | UK | Financial crime processing exemption from subject access |
| FCA Consumer Duty PS22/9 | UK | Good outcomes requirement; false positive harm |
| ECOA / Regulation B | US | Adverse action reasons for credit decisions |
| SR 11-7 | US | Model risk management for all models |
| EBA Guidelines on Internal Governance | EU | Model risk governance for EBA-supervised firms |
| PRA SS1/23 | UK | Model risk management principles for banks |
For the Curious
Provost, F., & Fawcett, T. (2013). Data Science for Business. O'Reilly Media. Accessible treatment of machine learning for business applications, with extensive discussion of classifier evaluation, precision-recall tradeoffs, and the cost-sensitive learning framework that underlies fraud detection threshold calibration. Chapter 8 (Visualizing Model Performance) is particularly relevant.
Baesens, B., Van Vlasselaer, V., & Verbeke, W. (2015). Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques. Wiley. The definitive practitioner text on fraud analytics. Covers rules-based, supervised, unsupervised, and network-based detection with a financial services focus. The chapters on card fraud are especially relevant to this chapter's content.
Freund, Y., & Schapire, R.E. (1999). A Short Introduction to Boosting. Journal of Japanese Society for Artificial Intelligence, 14(5), 771–780. Accessible introduction to boosting, centred on AdaBoost, the line of work from which gradient-boosted trees later developed. Useful background for understanding why gradient-boosted trees (GBT) so often perform well on tabular fraud data.
Vendor and Industry Resources
UK Finance (annual). Fraud: The Facts. Annual UK Finance report on fraud statistics across UK financial services — card fraud, APP fraud, online banking fraud. Provides the industry context for fraud rates, fraud typologies, and detection effectiveness. Free at ukfinance.org.uk.
Feedzai Research: feedzai.com/resources — Feedzai is a major fraud detection vendor whose research publications cover ML for financial crime. Practitioner-level content on feature engineering, model performance, and regulatory compliance.
Stripe Engineering Blog: stripe.com/blog/engineering — Stripe has published detailed technical posts on its fraud detection architecture (Radar). Particularly useful for understanding real-time scoring architecture and the feature store pattern in production.
Netflix Tech Blog (anomaly detection posts): netflixtechblog.com — While not financial services-specific, Netflix's engineering blog on anomaly detection and streaming feature computation is referenced by financial services fraud teams for its architectural insights. The concepts transfer directly to card fraud detection systems.