Chapter 9: Further Reading

Foundational Texts

  • Zheng, A. and Casari, A. (2018). Feature Engineering for Machine Learning. O'Reilly Media. A practical, hands-on guide to feature engineering covering numerical, categorical, text, and image features. Well-suited for engineers transitioning from theory to practice.

  • Kuhn, M. and Johnson, K. (2019). Feature Engineering and Selection: A Practical Approach for Predictive Models. CRC Press. Comprehensive treatment of feature engineering from a statistical modeling perspective, with excellent coverage of feature selection. Freely available at https://bookdown.org/max/FES/.

  • Müller, A. C. and Guido, S. (2016). Introduction to Machine Learning with Python. O'Reilly Media. Chapter 4 provides an accessible, code-driven introduction to feature engineering with scikit-learn, including pipelines and ColumnTransformers.

  • VanderPlas, J. (2016). Python Data Science Handbook. O'Reilly Media. Excellent treatment of pandas-based data manipulation and feature extraction. Freely available at https://jakevdp.github.io/PythonDataScienceHandbook/.

Key Papers

Feature Scaling and Transformation

  • Box, G. E. P. and Cox, D. R. (1964). "An Analysis of Transformations." Journal of the Royal Statistical Society, Series B, 26(2), 211--252. The foundational paper on power transformations for normalizing distributions, the basis for PowerTransformer in scikit-learn.

  • Yeo, I. and Johnson, R. A. (2000). "A New Family of Power Transformations to Improve Normality or Symmetry." Biometrika, 87(4), 954--959. Extends Box-Cox to handle zero and negative values; implemented in scikit-learn as the 'yeo-johnson' option of PowerTransformer.
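
For readers who want to see these transforms in code, the sketch below applies scikit-learn's PowerTransformer, which implements both Box-Cox and Yeo-Johnson, to a synthetic skewed feature; the data and settings are illustrative rather than prescriptive.

    # Minimal sketch: compare skew before and after a Yeo-Johnson transform.
    # PowerTransformer estimates the power parameter (lambda) by maximum likelihood.
    import numpy as np
    from scipy.stats import skew
    from sklearn.preprocessing import PowerTransformer

    rng = np.random.default_rng(0)
    # Right-skewed feature shifted to include negative values (Box-Cox would reject these).
    X = rng.exponential(scale=2.0, size=(1000, 1)) - 1.0

    pt = PowerTransformer(method="yeo-johnson", standardize=True)
    X_t = pt.fit_transform(X)

    print("estimated lambda:", pt.lambdas_)
    print("skew before:", skew(X.ravel()), "after:", skew(X_t.ravel()))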

Categorical Encoding

  • Micci-Barreca, D. (2001). "A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems." ACM SIGKDD Explorations, 3(1), 27--32. Introduces target encoding with regularization, a key technique for high-cardinality features.

  • Pargent, F., Pfisterer, F., Thomas, J., and Bischl, B. (2022). "Regularized Target Encoding Outperforms Traditional Methods in Supervised Machine Learning with High-Cardinality Features." Computational Statistics, 37, 2671--2692. Systematic comparison of encoding strategies, demonstrating regularized target encoding's advantages.
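
As a bridge from these papers to code, the sketch below uses scikit-learn's TargetEncoder (available in scikit-learn 1.3 and later), which smooths category means toward the global mean and cross-fits inside fit_transform to limit target leakage; the synthetic data is illustrative. The category_encoders library listed under Software Libraries provides a similar TargetEncoder with an explicit smoothing parameter.

    # Minimal sketch: regularized (smoothed, cross-fitted) target encoding of one
    # categorical column against a binary target.
    import numpy as np
    import pandas as pd
    from sklearn.preprocessing import TargetEncoder  # requires scikit-learn >= 1.3

    rng = np.random.default_rng(0)
    cities = rng.choice(["paris", "lyon", "nice", "lille"], size=200)
    rates = pd.Series(cities).map({"paris": 0.7, "lyon": 0.5, "nice": 0.3, "lille": 0.4})
    y = (rng.random(200) < rates.to_numpy()).astype(int)

    enc = TargetEncoder(smooth="auto", random_state=0)
    X_enc = enc.fit_transform(cities.reshape(-1, 1), y)  # cross-fitted on the training data
    print(X_enc[:5].ravel())
    print(enc.encodings_)  # per-category encodings used by transform() on new data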

Feature Selection

  • Guyon, I. and Elisseeff, A. (2003). "An Introduction to Variable and Feature Selection." Journal of Machine Learning Research, 3, 1157--1182. The definitive survey on feature selection methods (filter, wrapper, embedded), widely cited and still highly relevant.

  • Chandrashekar, G. and Sahin, F. (2014). "A Survey on Feature Selection Methods." Computers & Electrical Engineering, 40(1), 16--28. A more recent survey covering both classical and modern feature selection techniques.

  • Tibshirani, R. (1996). "Regression Shrinkage and Selection via the Lasso." Journal of the Royal Statistical Society, Series B, 58(1), 267--288. Introduces L1 regularization for simultaneous feature selection and regression, foundational for embedded methods.

  • Breiman, L. (2001). "Random Forests." Machine Learning, 45(1), 5--32. Introduces permutation importance and Gini importance for tree-based feature ranking.
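
The sketch below ties two of these entries to scikit-learn: embedded selection with the Lasso via SelectFromModel, and permutation importance as introduced by Breiman. The synthetic dataset and hyperparameters are illustrative.

    # Minimal sketch: embedded feature selection (Lasso) and permutation importance.
    from sklearn.datasets import make_regression
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.feature_selection import SelectFromModel
    from sklearn.inspection import permutation_importance
    from sklearn.linear_model import Lasso
    from sklearn.model_selection import train_test_split

    X, y = make_regression(n_samples=300, n_features=20, n_informative=5,
                           noise=10.0, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    # Embedded method (Tibshirani, 1996): keep features with non-zero Lasso coefficients.
    selector = SelectFromModel(Lasso(alpha=1.0), threshold=1e-5).fit(X_tr, y_tr)
    print("Lasso keeps", selector.get_support().sum(), "of", X.shape[1], "features")

    # Permutation importance (Breiman, 2001): score drop when one feature is shuffled.
    rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    result = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
    print("Most important feature index:", result.importances_mean.argmax())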

Pipelines and Reproducibility

  • Buitinck, L. et al. (2013). "API Design for Machine Learning Software: Experiences from the scikit-learn Project." arXiv:1309.0238. Describes the design principles behind scikit-learn's Pipeline, Transformer, and Estimator APIs.
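
The estimator/transformer conventions described in that paper are what make composed preprocessing possible; the sketch below shows the typical Pipeline-plus-ColumnTransformer pattern, with hypothetical column names.

    # Minimal sketch: every step exposes fit/transform, so the whole composition
    # can be fit, cross-validated, and persisted as a single estimator.
    from sklearn.compose import ColumnTransformer
    from sklearn.impute import SimpleImputer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import Pipeline
    from sklearn.preprocessing import OneHotEncoder, StandardScaler

    numeric = ["age", "income"]        # hypothetical numeric columns
    categorical = ["city", "plan"]     # hypothetical categorical columns

    preprocess = ColumnTransformer([
        ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                          ("scale", StandardScaler())]), numeric),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
    ])

    model = Pipeline([("prep", preprocess), ("clf", LogisticRegression(max_iter=1000))])
    # model.fit(X_train, y_train)  # X_train: a DataFrame with the columns above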

Data Leakage

  • Kaufman, S., Rosset, S., and Perlich, C. (2012). "Leakage in Data Mining: Formulation, Detection, and Avoidance." ACM Transactions on Knowledge Discovery from Data, 6(4), Article 15. Formalizes data leakage, classifies its types, and provides detection and prevention strategies.

  • Kapoor, S. and Narayanan, A. (2023). "Leakage and the Reproducibility Crisis in Machine-Learning-Based Science." Patterns, 4(9). Demonstrates how data leakage has affected thousands of published studies, with recommendations for prevention.
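
A common instance of the leakage these papers formalize is preprocessing fit on the full dataset before cross-validation; the sketch below contrasts that with refitting the preprocessing inside each fold via a Pipeline. With a plain scaler the numerical effect is usually small, but the same pattern applied to target encoding or feature selection can inflate scores substantially.

    # Minimal sketch: leaky vs. leakage-free preprocessing under cross-validation.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, y = make_classification(n_samples=300, n_features=20, random_state=0)

    # Leaky: the scaler sees every row, including rows later used for validation.
    X_leaky = StandardScaler().fit_transform(X)
    leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_leaky, y, cv=5)

    # Leakage-free: the scaler is refit on the training portion of each fold.
    pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    safe_scores = cross_val_score(pipe, X, y, cv=5)

    print("leaky:", leaky_scores.mean(), "safe:", safe_scores.mean())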

Online Resources and Tutorials

  • scikit-learn Preprocessing Guide: https://scikit-learn.org/stable/modules/preprocessing.html --- Comprehensive documentation for all scikit-learn transformers including scalers, encoders, and imputers.

  • scikit-learn Pipeline Guide: https://scikit-learn.org/stable/modules/compose.html --- Official guide for Pipeline, ColumnTransformer, and FeatureUnion.

  • scikit-learn Feature Selection Guide: https://scikit-learn.org/stable/modules/feature_selection.html --- Documentation for VarianceThreshold, SelectKBest, RFE, and SelectFromModel.

  • Category Encoders Library Documentation: https://contrib.scikit-learn.org/category_encoders/ --- Documentation for the category_encoders package providing 20+ encoding strategies with scikit-learn API compatibility.

  • Kaggle Feature Engineering Course: https://www.kaggle.com/learn/feature-engineering --- Free interactive course with practical notebooks covering target encoding, feature creation, and mutual information.

Software Libraries

  • scikit-learn (sklearn): Core library for pipelines (Pipeline, ColumnTransformer), preprocessing (StandardScaler, OneHotEncoder, OrdinalEncoder), feature selection (SelectKBest, RFE, SelectFromModel), and imputation (SimpleImputer, IterativeImputer, KNNImputer).

  • category_encoders (category_encoders): Provides target encoding, leave-one-out encoding, binary encoding, hashing encoding, and more. Integrates seamlessly with scikit-learn pipelines. Install with pip install category-encoders.

  • feature-engine (feature_engine): Specialized library for feature engineering with scikit-learn-compatible transformers for encoding, discretization, outlier handling, and missing data. Install with pip install feature-engine.

  • pandas (pandas): Essential for data manipulation, datetime feature extraction, and exploratory analysis before pipeline construction.

  • imbalanced-learn (imblearn): A scikit-learn contrib project providing pipeline-compatible samplers (SMOTE, ADASYN) for handling class imbalance; samplers require imblearn's own Pipeline rather than scikit-learn's. Install with pip install imbalanced-learn.
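
Because resampling must happen only on training folds, imbalanced-learn ships its own Pipeline that accepts samplers; the sketch below shows the usual pattern, with illustrative parameters.

    # Minimal sketch: SMOTE applied inside cross-validation via imblearn's pipeline.
    from sklearn.datasets import make_classification
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score
    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline  # imblearn's pipeline, not sklearn's

    X, y = make_classification(n_samples=500, weights=[0.9, 0.1], random_state=0)

    pipe = make_pipeline(SMOTE(random_state=0), LogisticRegression(max_iter=1000))
    scores = cross_val_score(pipe, X, y, cv=5, scoring="f1")
    print("mean F1:", scores.mean())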

Advanced Topics for Further Study

  • Automated Feature Engineering: Libraries like featuretools generate features automatically from relational datasets using deep feature synthesis. See Kanter and Veeramachaneni (2015), "Deep Feature Synthesis: Towards Automating Data Science Endeavors."

  • Feature Stores: Production systems for managing, sharing, and serving features at scale. See Feast (https://feast.dev/) and Tecton for open-source and managed solutions, respectively.

  • Neural Feature Learning: Deep learning models (Chapter 11+) learn features automatically from raw data. Understanding manual feature engineering helps interpret what deep models learn and when manual engineering still outperforms learned representations.

  • Causal Feature Selection: Using causal reasoning to select features that represent true causes rather than mere correlations. See Peters, Janzing, and Schölkopf (2017), Elements of Causal Inference.

  • Missing Data Theory: Little and Rubin (2019), Statistical Analysis with Missing Data, 3rd ed. Wiley. The authoritative reference on MCAR, MAR, and MNAR missing data mechanisms and their implications for imputation.