Further Reading: Chapter 6

Feature Engineering


Foundational Books

1. Feature Engineering and Selection: A Practical Approach for Predictive Models --- Max Kuhn and Kjell Johnson (2019) The definitive reference on feature engineering for applied machine learning. Chapters 5-8 cover numeric transformations, encoding categorical variables, engineering date and time features, and detecting problematic features. The treatment of interaction terms and non-linear transformations is the most thorough available. The code is in R, but the concepts are language-agnostic. Essential reading for anyone who takes feature engineering seriously.

2. The Art of Feature Engineering: Essentials for Machine Learning --- Pablo Duboue (Cambridge University Press, 2020) A broader treatment that covers feature engineering across domains: text, images, time series, and graphs. Particularly strong on the "thinking" side --- how to generate candidate features from domain knowledge. Chapters 2-4 on feature taxonomies and the engineering process complement the recipe approach introduced in this chapter. Academic in tone but practically grounded.

3. Feature Engineering for Machine Learning: Principles and Techniques for Data Scientists --- Alice Zheng and Amanda Casari (O'Reilly, 2018) A practical, code-heavy guide focused on Python. The chapters on numeric features, categorical encoding, and text features are directly relevant. The treatment of feature hashing and count-based features for high-cardinality categoricals extends what we covered in the target encoding section. Shorter and more accessible than Kuhn and Johnson, with a stronger Python emphasis.


Papers and Technical Articles

4. "A Few Useful Things to Know About Machine Learning" --- Pedro Domingos (2012) Communications of the ACM, Vol. 55, No. 10, pp. 78-87. The section "Learn Many Models, Not Just One" is often quoted, but "Intuition Fails in High Dimensions" and the discussion of feature engineering as the key differentiator in applied ML are the most relevant to this chapter. Domingos argues that time spent on feature engineering returns more value than time spent on algorithm selection. One of the most-cited papers in applied ML. Freely available online.

5. "A Preprocessing Scheme for High-Cardinality Categorical Attributes in Classification and Prediction Problems" --- Daniele Micci-Barreca (2001) ACM SIGKDD Explorations, Vol. 3, No. 1. The original paper introducing smoothed target encoding. The Bayesian shrinkage approach described here is the foundation for the category_encoders library's TargetEncoder. If you use target encoding in production, read this paper to understand the statistical reasoning behind the smoothing parameter.
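The shrinkage idea behind smoothed target encoding can be sketched in a few lines: blend each category's observed target mean with the global mean, weighted by the category's count, so rare categories shrink toward the prior. This is a simplified additive-smoothing variant (the paper itself uses a sigmoid weighting function); the function name and the choice of m are illustrative.

```python
def smoothed_target_encode(categories, targets, m=10.0):
    """Encode each category as a shrinkage blend of its own target mean
    and the global target mean (simplified Micci-Barreca-style smoothing)."""
    global_mean = sum(targets) / len(targets)
    sums, counts = {}, {}
    for cat, y in zip(categories, targets):
        sums[cat] = sums.get(cat, 0.0) + y
        counts[cat] = counts.get(cat, 0) + 1
    # Weight n/(n+m) on the category mean: frequent categories keep their
    # own mean, rare ones are pulled toward the global mean.
    return {
        cat: (sums[cat] + m * global_mean) / (counts[cat] + m)
        for cat in counts
    }
```

Production implementations additionally fit the encoding inside cross-validation folds to avoid target leakage, as discussed in the chapter.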

6. "An Empirical Analysis of Feature Engineering for Predictive Modeling" --- Jeff Heaton (2016) IEEE SoutheastCon 2016. A systematic comparison of feature engineering approaches across multiple datasets, showing that engineered features consistently outperform raw features regardless of algorithm choice. Provides empirical evidence for the claim made in this chapter: features matter more than algorithms.


Practical Guides and Tutorials

7. scikit-learn User Guide --- "Preprocessing Data" and "Feature Extraction" The official scikit-learn documentation for StandardScaler, PowerTransformer, PolynomialFeatures, KBinsDiscretizer, and related transformers. Particularly useful: the ColumnTransformer documentation, which shows how to apply different transformations to different feature subsets in a pipeline. The examples are minimal but precise. Available at scikit-learn.org.
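The ColumnTransformer pattern the documentation describes looks roughly like this; the column names and toy data below are invented for illustration.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame with one numeric and one categorical column
df = pd.DataFrame({
    "tenure_days": [30, 365, 90, 720],
    "plan": ["basic", "pro", "basic", "enterprise"],
})

# Apply a different transformation to each feature subset
pre = ColumnTransformer([
    ("num", StandardScaler(), ["tenure_days"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),
])
X = pre.fit_transform(df)  # 1 scaled column + 3 one-hot columns
```

The transformer drops into a scikit-learn Pipeline ahead of any estimator, so the same preprocessing is fit on training folds and applied to held-out data.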

8. "Feature Engineering A-Z" --- Towards Data Science Blog Series A multi-part blog series covering feature engineering techniques organized by data type: numeric, categorical, temporal, text, and geospatial. Each post includes Python code and real-world examples. The posts on temporal features (creating lag features, rolling windows, and seasonal decomposition) extend the temporal feature techniques covered in this chapter. Quality varies across posts, but the temporal and categorical installments are strong.
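The lag and rolling-window constructions those posts describe reduce to a few pandas calls; the daily series below is invented for illustration.

```python
import pandas as pd

# A small daily series (values are illustrative)
s = pd.Series([10.0, 12.0, 11.0, 15.0, 14.0, 18.0],
              index=pd.date_range("2024-01-01", periods=6, freq="D"))

feats = pd.DataFrame({
    "value": s,
    "lag_1": s.shift(1),                        # yesterday's value
    "roll_mean_3": s.rolling(window=3).mean(),  # trailing 3-day mean
    "roll_std_3": s.rolling(window=3).std(),    # trailing 3-day volatility
})
# Early rows are NaN until each window has enough history; drop or
# impute them before modeling.
```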

9. Kaggle Feature Engineering Course (Free) A hands-on course covering mutual information for feature discovery, clustering as a feature engineering technique, target encoding, and creating features from text and dates. Includes Kaggle notebook exercises. The mutual information section is a practical complement to the per-feature AUC validation technique described in this chapter. Registration required but free.
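As a taste of what the mutual-information material covers, scikit-learn's mutual_info_classif scores each feature's statistical dependence on the target; the synthetic two-feature dataset below is illustrative.

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
informative = rng.normal(size=(500, 1))  # drives the label
noise = rng.normal(size=(500, 1))        # unrelated to the label
y = (informative[:, 0] > 0).astype(int)

X = np.hstack([informative, noise])
mi = mutual_info_classif(X, y, random_state=0)
# The informative feature scores well above the noise feature
```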


Domain-Specific Feature Engineering

10. "RFM Analysis for Customer Segmentation" --- Various Sources RFM (Recency, Frequency, Monetary) analysis originated in direct mail marketing in the 1990s. The technique is well-documented in marketing analytics literature. For a modern treatment, search for "RFM analysis Python tutorial" --- multiple quality implementations exist. The adaptation of RFM to subscription analytics (replacing Monetary with engagement metrics) introduced in this chapter is a common industry practice.
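Given a transaction table, the classic RFM features fall out of a single groupby; the customer IDs, dates, amounts, and reference date below are invented for illustration.

```python
import pandas as pd

orders = pd.DataFrame({
    "customer": ["a", "a", "b", "b", "b", "c"],
    "date": pd.to_datetime(["2024-01-01", "2024-03-01", "2024-02-15",
                            "2024-03-10", "2024-03-20", "2024-01-20"]),
    "amount": [50.0, 30.0, 20.0, 25.0, 40.0, 100.0],
})
now = pd.Timestamp("2024-04-01")  # reference date for recency

rfm = orders.groupby("customer").agg(
    recency=("date", lambda d: (now - d.max()).days),  # days since last order
    frequency=("date", "count"),                       # number of orders
    monetary=("amount", "sum"),                        # total spend
)
```

For subscription analytics, the Monetary column would be swapped for an engagement metric, as described in the chapter.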

11. "Feature Engineering for Predictive Maintenance" --- Microsoft Azure AI Gallery A technical walkthrough of feature engineering for industrial IoT predictive maintenance, covering rolling window statistics, lag features, and failure-mode features. The approach mirrors TurbineTech Case Study 2 but at a larger scale with more sensor types. Includes code for computing features in PySpark for distributed processing. Available in Microsoft's Azure documentation.


Tools and Libraries

12. category_encoders Library --- Documentation The Python library implementing target encoding, binary encoding, hash encoding, leave-one-out encoding, and 15+ other categorical encoding schemes. The TargetEncoder class implements the smoothed target encoding with cross-validation described in this chapter. The library integrates with scikit-learn pipelines. Documentation at contrib.scikit-learn.org/category_encoders.

13. featuretools Library --- Documentation An open-source library for automated feature engineering ("deep feature synthesis"). Given relational tables, it automatically generates features by applying primitives (sum, mean, count, trend) across relationships. Useful for generating candidate features quickly, though the best features still come from domain knowledge applied manually. Documentation at featuretools.alteryx.com.


How to Use This List

If you read one thing, read Domingos (item 4). It is short, opinionated, and will recalibrate your intuition about where to invest your time in a data science project.

If you want a reference book for your desk, choose Kuhn and Johnson (item 1). It covers every technique mentioned in this chapter and many we did not have space for.

If you want to improve your target encoding implementation, read Micci-Barreca (item 5) for the theory and the category_encoders documentation (item 12) for the implementation.

If you work with sensor data or IoT applications, the Microsoft predictive maintenance guide (item 11) extends the TurbineTech case study with additional sensor types and distributed computing patterns.


This reading list supports Chapter 6: Feature Engineering. Return to the chapter to review concepts before diving in.