Chapter 27: Further Reading
MLOps Foundations
- Kreuzberger, D., Kühl, N., & Hirschl, S. (2023). "Machine Learning Operations (MLOps): Overview, Definition, and Architecture." IEEE Access, 11, 31866-31879. A comprehensive academic survey of MLOps that defines the term precisely and maps out its architectural components. Covers the full lifecycle from data management through model monitoring, with reference architectures and maturity models.
- Sculley, D., Holt, G., Golovin, D., et al. (2015). "Hidden Technical Debt in Machine Learning Systems." Advances in Neural Information Processing Systems, 28. The landmark Google paper identifying the sources of technical debt in ML systems: data dependencies, feedback loops, configuration debt, and system-level anti-patterns such as glue code and pipeline jungles. Establishes why MLOps is necessary, and remains essential reading on the maintenance burden of production ML.
- Amershi, S., Begel, A., Bird, C., et al. (2019). "Software Engineering for Machine Learning: A Case Study." Proceedings of the 41st International Conference on Software Engineering (ICSE), 291-300. Microsoft's account of ML in production, covering data management, model training, deployment, and monitoring, with practical insights from one of the largest ML deployments in the world.
scikit-learn Pipelines and Feature Engineering
- scikit-learn User Guide: Pipelines and Composite Estimators. https://scikit-learn.org/stable/modules/compose.html The official documentation for Pipeline, ColumnTransformer, and related classes. Includes detailed examples of combining transformers, parameter naming conventions, and integration with GridSearchCV; a minimal sketch of these pieces working together follows this list.
- Zheng, A., & Casari, A. (2018). Feature Engineering for Machine Learning. O'Reilly Media. A practical guide to feature engineering covering numeric transformations, text features, time-series features, and feature selection. A good complement to the pipeline construction approach in this chapter.
- Müller, A. C., & Guido, S. (2016). Introduction to Machine Learning with Python. O'Reilly Media. An accessible introduction to scikit-learn with detailed coverage of pipelines, cross-validation, and model selection. Particularly good on combining preprocessing and models inside cross-validation.
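The pieces described in these references click together quickly in a small example. The sketch below uses made-up column names and a logistic regression purely for illustration; it shows a ColumnTransformer feeding a Pipeline, and the `step__parameter` naming convention that GridSearchCV relies on.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Illustrative training frame: two numeric features, one categorical.
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "volume": rng.random(200) * 1000,
    "spread": rng.random(200),
    "category": rng.choice(["politics", "sports", "finance"], 200),
})
y = rng.integers(0, 2, 200)

# Preprocess numeric and categorical columns separately, then model.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["volume", "spread"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["category"]),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Parameters are addressed as <step>__<param>, nesting through the pipeline.
grid = GridSearchCV(pipeline, {"clf__C": [0.1, 1.0, 10.0]}, cv=5)
grid.fit(X, y)
print(grid.best_params_)
```

Because the preprocessing lives inside the pipeline, the grid search refits the scaler and encoder within each cross-validation fold, which avoids leaking information from the validation folds into preprocessing.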
Feature Stores
- Feast Documentation. https://docs.feast.dev/ The most popular open-source feature store. The documentation covers feature definitions, online/offline serving, point-in-time joins, and integration with ML pipelines. Feast is a good starting point for teams moving beyond ad hoc feature management; the idea behind point-in-time joins is sketched after this list.
- Tecton Documentation. https://docs.tecton.ai/ A managed feature store with strong integration into ML training and serving workflows. The documentation provides useful patterns for feature engineering pipelines even if you don't use the product.
- Li, W., Shetty, S., Paleyes, A., & Lawrence, N. D. (2023). "Feature Stores: A Hierarchy of Needs." arXiv:2302.12229. An academic treatment of feature stores, organizing capabilities into a hierarchy from basic storage through point-in-time retrieval, feature monitoring, and automated feature engineering. A useful framework for evaluating feature store implementations.
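Point-in-time correctness is the capability most worth understanding before adopting any of these tools. The sketch below illustrates the idea with made-up label and feature tables using pandas.merge_asof: each training row receives the most recent feature value observed at or before its timestamp, never a later one, which is exactly the leakage a feature store's point-in-time join prevents.

```python
import pandas as pd

# Illustrative label table: one row per training example.
labels = pd.DataFrame({
    "market_id": ["A", "A", "B"],
    "event_time": pd.to_datetime(["2024-01-05", "2024-01-20", "2024-01-10"]),
    "outcome": [1, 0, 1],
}).sort_values("event_time")

# Illustrative feature table: feature values as they were computed over time.
features = pd.DataFrame({
    "market_id": ["A", "A", "B", "B"],
    "feature_time": pd.to_datetime(["2024-01-01", "2024-01-15", "2024-01-02", "2024-01-12"]),
    "avg_price": [0.42, 0.55, 0.30, 0.35],
}).sort_values("feature_time")

# Point-in-time join: for each label, take the latest feature value
# at or before event_time, never a later one.
training = pd.merge_asof(
    labels,
    features,
    left_on="event_time",
    right_on="feature_time",
    by="market_id",
    direction="backward",
)
print(training[["market_id", "event_time", "avg_price", "outcome"]])
```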
Experiment Tracking and MLflow
- MLflow Documentation. https://mlflow.org/docs/latest/ The official MLflow documentation covering the Tracking API, Model Registry, Projects, and model deployment. The quickstart guide and tutorials are excellent for getting started; a minimal tracking example follows this list.
- Zaharia, M., Chen, A., Davidson, A., et al. (2018). "Accelerating the Machine Learning Lifecycle with MLflow." IEEE Data Engineering Bulletin, 41(4), 39-45. The MLflow paper from Databricks, explaining the motivation for unified experiment tracking and model management and the conceptual framework behind MLflow's design.
- Weights & Biases Documentation. https://docs.wandb.ai/ An alternative to MLflow for experiment tracking, with strong visualization capabilities. Worth comparing to understand the different design choices in experiment tracking tools.
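For orientation before reading the MLflow docs, here is a minimal sketch of the Tracking API on a toy scikit-learn model. The experiment name, parameter, and metric are illustrative, and the exact log_model signature varies slightly across MLflow versions.

```python
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import brier_score_loss

# Illustrative data and model; in practice these come from the training pipeline.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
model = LogisticRegression(max_iter=1000, C=0.5)

mlflow.set_experiment("prediction-market-models")  # experiment name is illustrative

with mlflow.start_run():
    model.fit(X, y)
    probs = model.predict_proba(X)[:, 1]

    # Core Tracking API: parameters, metrics, and the model artifact.
    mlflow.log_param("C", 0.5)
    mlflow.log_metric("train_brier", brier_score_loss(y, probs))
    mlflow.sklearn.log_model(model, "model")
```

Runs land in the local `mlruns/` directory by default; pointing the client at a tracking server is a configuration change, not a code change.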
Model Monitoring and Drift Detection
- Rabanser, S., Günnemann, S., & Lipton, Z. (2019). "Failing Loudly: An Empirical Study of Methods for Detecting Dataset Shift." Advances in Neural Information Processing Systems, 32. A comprehensive empirical comparison of methods for detecting distribution shift, including univariate tests (KS, chi-squared), multivariate tests (MMD), and classifier-based approaches. Essential for choosing the right drift detection method; a per-feature KS-test sketch follows this list.
- Lu, J., Liu, A., Dong, F., Gu, F., Gama, J., & Zhang, G. (2019). "Learning Under Concept Drift: A Review." IEEE Transactions on Knowledge and Data Engineering, 31(12), 2346-2363. A thorough review of concept drift: definitions, detection methods, and adaptation strategies. Covers gradual, sudden, incremental, and recurring drift patterns with corresponding detection algorithms.
- Bayram, F., Ahmed, B. S., & Kassler, A. (2022). "From Concept Drift to Model Degradation: An Overview on Performance-Aware Drift Detectors." Knowledge-Based Systems, 245, 108632. Surveys performance-aware drift detection methods that monitor model accuracy rather than (or in addition to) input distributions. Relevant for prediction market applications, where ground truth becomes available at market resolution.
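As a starting point for the univariate approach compared in Rabanser et al., the sketch below runs a two-sample Kolmogorov-Smirnov test per feature between a reference window and a recent window. The Bonferroni-style threshold and the synthetic data are illustrative choices, not recommendations from the paper.

```python
import numpy as np
import pandas as pd
from scipy import stats

def detect_drift(reference: pd.DataFrame, current: pd.DataFrame, alpha: float = 0.05) -> pd.DataFrame:
    """Per-feature two-sample KS test with an illustrative Bonferroni-corrected threshold."""
    threshold = alpha / len(reference.columns)
    rows = []
    for col in reference.columns:
        statistic, p_value = stats.ks_2samp(reference[col], current[col])
        rows.append({"feature": col, "ks_stat": statistic,
                     "p_value": p_value, "drifted": p_value < threshold})
    return pd.DataFrame(rows)

# Illustrative windows: the "current" window has a shifted mean in one feature.
rng = np.random.default_rng(0)
reference = pd.DataFrame({"volume": rng.normal(0, 1, 5000), "spread": rng.normal(0, 1, 5000)})
current = pd.DataFrame({"volume": rng.normal(0.3, 1, 5000), "spread": rng.normal(0, 1, 5000)})
print(detect_drift(reference, current))
```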
CI/CD for Machine Learning
- Sato, D., Wider, A., & Windheuser, C. (2019). "Continuous Delivery for Machine Learning." martinfowler.com. https://martinfowler.com/articles/cd4ml.html The ThoughtWorks article, hosted on Martin Fowler's site, on applying continuous delivery principles to ML systems. Covers data versioning, model validation, deployment strategies, and monitoring, and is the closest thing to a standard reference for ML CI/CD; a sketch of one automated validation gate in this spirit follows this list.
- Huyen, C. (2022). Designing Machine Learning Systems. O'Reilly Media. A comprehensive treatment of production ML design, covering data engineering, feature engineering, model development, deployment, and monitoring. Particularly strong on the operational challenges of maintaining ML systems over time, and highly recommended as a companion to this chapter.
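One recurring CD4ML idea is an automated promotion gate: CI refuses to deploy a candidate model unless it performs at least as well as the production model on the same held-out data. A minimal sketch of such a gate is below; the metric choice, regression threshold, and toy arrays are hypothetical.

```python
import numpy as np
from sklearn.metrics import brier_score_loss

def validate_candidate(y_true, candidate_probs, production_probs,
                       max_regression: float = 0.005) -> bool:
    """Promote the candidate only if its Brier score is not materially worse
    than production's on the same held-out set. Threshold is illustrative."""
    candidate_brier = brier_score_loss(y_true, candidate_probs)
    production_brier = brier_score_loss(y_true, production_probs)
    print(f"candidate={candidate_brier:.4f} production={production_brier:.4f}")
    return candidate_brier <= production_brier + max_regression

# Tiny illustrative check; a real CI job would load a held-out set and both
# models' predictions, then fail the build if the gate returns False.
y = np.array([1, 0, 1, 1, 0])
cand = np.array([0.8, 0.2, 0.7, 0.9, 0.1])
prod = np.array([0.7, 0.3, 0.6, 0.8, 0.2])
assert validate_candidate(y, cand, prod)
```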
Model Serving
- FastAPI Documentation. https://fastapi.tiangolo.com/ The documentation for FastAPI, the recommended framework for building ML prediction APIs. Covers request validation, async handling, dependency injection, and automatic OpenAPI documentation; a minimal prediction endpoint sketch follows this list.
- BentoML Documentation. https://docs.bentoml.com/ An ML model serving framework that handles model packaging, containerization, and deployment. Worth exploring for teams that need more sophisticated serving capabilities than a simple FastAPI endpoint.
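To make the FastAPI entry concrete, here is a minimal prediction endpoint sketch. The request schema, feature names, and the model.joblib artifact path are placeholders, and it assumes a serialized scikit-learn pipeline plus Pydantic v2; the point is the pattern of validating the request body and returning a probability.

```python
import joblib
import pandas as pd
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI(title="Prediction service")

# Hypothetical request schema; field names are placeholders.
class MarketFeatures(BaseModel):
    volume: float
    spread: float
    category: str

# Hypothetical artifact path; in practice the model comes from the registry.
model = joblib.load("model.joblib")

@app.post("/predict")
def predict(features: MarketFeatures) -> dict:
    # model_dump() is Pydantic v2; use .dict() on Pydantic v1.
    row = pd.DataFrame([features.model_dump()])
    prob = float(model.predict_proba(row)[0, 1])
    return {"probability": prob}

# Run with: uvicorn serve:app --reload  (assuming this file is saved as serve.py)
```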
Data Validation
- Pandera Documentation. https://pandera.readthedocs.io/ Statistical data validation for pandas DataFrames. The documentation covers schema definitions, custom checks, hypothesis testing, and integration with ML pipelines; Pandera is used in this chapter for training data validation and sketched briefly after this list.
- Great Expectations Documentation. https://docs.greatexpectations.io/ A more comprehensive data validation framework with profiling, documentation generation, and integration with data pipelines. Useful for larger-scale prediction market data infrastructure.
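A small Pandera sketch of the schema-plus-checks pattern, with hypothetical column names and bounds standing in for a real training-data contract:

```python
import pandas as pd
import pandera as pa

# Illustrative schema: column names and bounds are placeholders for the
# training-data contract a real pipeline would enforce.
schema = pa.DataFrameSchema(
    {
        "market_id": pa.Column(str, nullable=False),
        "price": pa.Column(float, pa.Check.in_range(0.0, 1.0)),
        "volume": pa.Column(float, pa.Check.ge(0)),
        "outcome": pa.Column(int, pa.Check.isin([0, 1])),
    },
    strict=True,  # reject unexpected columns
)

df = pd.DataFrame({
    "market_id": ["A", "B"],
    "price": [0.42, 0.87],
    "volume": [1000.0, 250.0],
    "outcome": [1, 0],
})

validated = schema.validate(df)  # raises a SchemaError on violations
print(validated.shape)
```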
Prediction Market-Specific Operations
- Tetlock, P. E., & Gardner, D. (2015). Superforecasting: The Art and Science of Prediction. Crown Publishers. Understanding what makes good forecasters is essential context for building ML systems that attempt to replicate or augment human forecasting. The chapters on calibration, updating, and the Brier score are directly relevant to monitoring prediction market models.
- Gneiting, T., & Raftery, A. E. (2007). "Strictly Proper Scoring Rules, Prediction, and Estimation." Journal of the American Statistical Association, 102(477), 359-378. A rigorous treatment of the proper scoring rules used to evaluate and monitor prediction market models. Understanding why the Brier score and log loss are proper (and why accuracy is not) is foundational for model monitoring; a short worked comparison follows this list.
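To connect the scoring-rule discussion to day-to-day monitoring, the short example below scores two hypothetical forecasters on the same set of resolved markets; the numbers are made up. Thresholded accuracy cannot tell them apart, while the Brier score and log loss reward the sharper, better-calibrated forecasts.

```python
import numpy as np
from sklearn.metrics import accuracy_score, brier_score_loss, log_loss

# Illustrative resolved markets: binary outcomes and two forecasters' probabilities.
outcomes = np.array([1, 0, 1, 0, 1, 0])
sharp = np.array([0.90, 0.10, 0.80, 0.30, 0.70, 0.60])   # confident where warranted
hedged = np.array([0.55, 0.45, 0.55, 0.45, 0.55, 0.55])  # barely commits either way

for name, probs in [("sharp", sharp), ("hedged", hedged)]:
    print(
        f"{name:>6}",
        f"accuracy={accuracy_score(outcomes, probs > 0.5):.3f}",
        f"brier={brier_score_loss(outcomes, probs):.3f}",
        f"log_loss={log_loss(outcomes, probs):.3f}",
    )
# Both forecasters have the same thresholded accuracy (5/6), but the proper
# scoring rules correctly prefer the sharper, better-calibrated forecasts.
```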
Related Chapters
- Chapter 23: Machine Learning for Prediction Markets -- The models that are deployed and monitored using the MLOps infrastructure from this chapter.
- Chapter 24: NLP and Sentiment Analysis -- NLP features that are stored in the feature store and fed into production pipelines.
- Chapter 25: Ensemble Methods -- Ensemble models whose components are individually tracked, versioned, and monitored.
- Chapter 26: Backtesting Prediction Market Strategies -- Backtesting validates strategies before the MLOps pipeline deploys them.