Key Takeaways: Chapter 6

Feature Engineering


  1. Feature engineering provides more predictive lift than algorithm choice. A logistic regression with well-engineered features routinely outperforms a gradient boosted tree with raw features. The information content of your features sets the ceiling for model performance. The algorithm determines how close to that ceiling you get. Invest in features first, algorithms second.

  2. The feature engineering recipe is: domain knowledge first, code second. Understand the domain. Ask "what would a human expert look at?" Translate that intuition into computable variables. The best features in the StreamFlow model came from a 60-minute conversation with a customer success manager --- not from automated feature generation or architecture search.

  3. Temporal features are the workhorses of behavioral prediction. Recency (when did they last do something?), frequency (how often do they do it?), and tenure (how long have they been here?) capture the dimensions that matter most in subscription, engagement, and transaction models. Always create multiple time windows (7-day, 30-day, 90-day) and let the model learn which horizon is most predictive.
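A minimal sketch of this idea (the function and field names are my own, not from the chapter): recency, tenure, and frequency over the three suggested windows for a single user.

```python
from datetime import date, timedelta

def temporal_features(events, as_of, signup):
    """Recency, frequency, and tenure for one user.

    events: list of activity dates; as_of: the prediction date;
    signup: the user's signup date. Illustrative only.
    """
    # Recency: days since the most recent event
    days_since_last = (as_of - max(events)).days if events else None
    feats = {
        "days_since_last_event": days_since_last,
        "tenure_days": (as_of - signup).days,
    }
    # Frequency over several windows; let the model pick the horizon
    for window in (7, 30, 90):
        start = as_of - timedelta(days=window)
        feats[f"events_last_{window}d"] = sum(start < d <= as_of for d in events)
    return feats
```

The same event list yields all three windows at once, so the model, not the engineer, decides which horizon carries the signal.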

  4. Trend features are more predictive than snapshot features. The direction of behavioral change --- usage declining, support tickets increasing, engagement decreasing --- signals future intent more strongly than the current level. A subscriber watching 20 hours per month and declining is a higher churn risk than one watching 10 hours per month and increasing.
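One simple way to turn "direction of change" into a number is a least-squares slope over recent periods. This sketch (values and names are illustrative) mirrors the contrast above: the heavier user has the negative slope.

```python
def trend_slope(values):
    """Least-squares slope of per-period measurements, oldest first.

    Negative slope = declining behavior. Plain closed-form fit,
    no libraries required.
    """
    n = len(values)
    xs = range(n)
    x_mean = sum(xs) / n
    y_mean = sum(values) / n
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, values))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Monthly viewing hours: higher level but declining vs lower level but rising
declining_heavy_user = [26, 24, 22, 20]
rising_light_user = [7, 8, 9, 10]
```

Feeding both the current level and the slope lets the model see that 20-and-falling is riskier than 10-and-rising.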

  5. Ratio features normalize for exposure and make comparisons fair. "Five support tickets" means different things for a 2-month subscriber and a 24-month subscriber. Tickets per tenure month, hours per session, and genre diversity per session are all ratios that control for opportunity and enable fair comparisons across users with different histories.
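A sketch of the tickets-per-tenure example (the helper and its arguments are hypothetical): the same five tickets mean very different things once normalized.

```python
def ratio_features(support_tickets, tenure_months, watch_hours, sessions):
    """Exposure-normalized ratios; max(..., 1) guards division by zero."""
    return {
        "tickets_per_tenure_month": support_tickets / max(tenure_months, 1),
        "hours_per_session": watch_hours / max(sessions, 1),
    }

# Five tickets from a 2-month subscriber vs a 24-month subscriber
new_user = ratio_features(5, 2, 40, 20)
veteran = ratio_features(5, 24, 40, 20)
```

The raw count is identical, but the ratio (2.5 vs roughly 0.2 tickets per month) exposes the difference in underlying behavior.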

  6. Interaction features encode compound domain logic. "New and inactive" is not just tenure < 3 months AND hours = 0. It is a specific compound risk signal that customer success teams recognize. Every interaction feature should have a business interpretation. If you cannot explain why two variables interact to a domain expert, the interaction is likely noise.
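To make the point concrete, here is a sketch of compound flags with explicit business interpretations. The first encodes the "new and inactive" signal described above; the second is my own illustrative example, not from the chapter.

```python
def compound_risk_flags(tenure_months, hours_last_30d, tickets_last_90d):
    """Interaction features as named, interpretable compound signals."""
    return {
        # "New and inactive": signed up recently and never built a habit
        "new_and_inactive": int(tenure_months < 3 and hours_last_30d == 0),
        # "Tenured but escalating": a loyal user suddenly filing tickets
        # (hypothetical second flag, for illustration)
        "tenured_with_tickets": int(tenure_months >= 12 and tickets_last_90d >= 3),
    }
```

Each flag has a name a customer success manager would recognize, which is the test the takeaway proposes for keeping an interaction.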

  7. Mathematical transformations serve linear models more than tree-based models. Log transformations, Box-Cox transformations, and polynomial features reshape distributions and capture non-linearities that linear models cannot learn on their own. Tree-based models split on rank order and are invariant to monotonic transformations. Apply transformations selectively, based on your model choice.
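A small illustration of the invariance claim: `log1p` handles zeros and compresses a right-skewed count for a linear model, but because it is monotonic, the rank order (all a tree splits on) is unchanged.

```python
import math

# Right-skewed counts, as usage data often is
raw_watch_counts = [0, 1, 10, 100, 1000]

# log1p(x) = log(1 + x): defined at zero, compresses the long right tail
logged = [math.log1p(x) for x in raw_watch_counts]
```

A linear model sees a much tamer distribution after the transform; a tree-based model would choose the same split points either way, which is why the takeaway says to transform selectively.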

  8. Target encoding is powerful but dangerous without cross-validation and smoothing. Naive target encoding (computing target means on the full dataset) causes severe data leakage. Correct target encoding uses cross-validation within the training set and smoothing to regularize categories with few observations. The smoothing formula blends the category mean with the global mean, weighted by sample size.
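The smoothing formula described here can be sketched as follows (the helper name and the smoothing weight `m` are assumptions; in real use the encoding must also be computed out-of-fold within the training set, which this snippet does not show):

```python
def smoothed_target_encode(categories, targets, m=10.0):
    """Blend each category's target mean with the global mean.

    enc(c) = (n_c * mean_c + m * global_mean) / (n_c + m)
    Categories with few observations are pulled toward the global mean.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = {}, {}
    for c, y in zip(categories, targets):
        sums[c] = sums.get(c, 0.0) + y
        counts[c] = counts.get(c, 0) + 1
    return {
        c: (sums[c] + m * global_mean) / (counts[c] + m)
        for c in sums
    }
```

Note what smoothing buys: a category seen once with target 1 is not encoded as 1.0, it is shrunk most of the way back to the global mean, which regularizes exactly the rare categories that naive encoding overfits.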

  9. Every feature must be computable using only information available at prediction time. This is the golden rule of feature engineering. No future data. No target leakage. No statistics computed on the test set. Violating this rule produces models that look excellent in evaluation and fail in production. Run a leakage audit --- a per-feature AUC check --- before trusting any results.
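One way to sketch the per-feature AUC check (a plain rank-based AUC with a hypothetical flagging threshold; feature names are illustrative): a feature that alone nearly perfectly separates the classes is a leakage suspect.

```python
def auc(y_true, scores):
    """Rank-based AUC: probability a positive outranks a negative."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def leakage_audit(feature_matrix, y, threshold=0.95):
    """Flag features whose standalone AUC is suspiciously high (either direction)."""
    flagged = []
    for name, values in feature_matrix.items():
        a = auc(y, values)
        if max(a, 1 - a) >= threshold:
            flagged.append(name)
    return flagged
```

No single honest feature should come close to perfect separation; anything the audit flags deserves a hard look at how and when it was computed.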

  10. Sensor feature engineering follows the same recipe as behavioral feature engineering, but the domain knowledge is physics instead of business. Rolling statistics capture current state. Rate-of-change features capture trends. Cross-sensor correlations capture relationships that change during degradation. Threshold features encode expert rules. The recipe is universal: understand the domain, ask what an expert would look at, and translate it into code.
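The four sensor feature families above can be sketched for a hypothetical temperature/vibration pair (window size, threshold, and names are all assumptions, not from the chapter):

```python
import statistics

def pearson(a, b):
    """Pearson correlation, computed directly for self-containment."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = (sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b)) ** 0.5
    return num / den

def sensor_features(temps, vibs, window=5, temp_limit=80.0):
    recent_t, recent_v = temps[-window:], vibs[-window:]
    return {
        "temp_roll_mean": statistics.mean(recent_t),   # rolling stats: current state
        "temp_roll_std": statistics.pstdev(recent_t),
        "temp_rate": temps[-1] - temps[-2],            # rate of change: trend
        "temp_vib_corr": pearson(recent_t, recent_v),  # cross-sensor relationship
        "over_temp_limit": int(temps[-1] > temp_limit),  # threshold: expert rule
    }
```

Swap the column names and thresholds and the same function shape serves any sensor pair, which is the takeaway's point: the recipe, not the domain, is what transfers.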


If You Remember One Thing

Feature engineering is where data science becomes a craft. It is the bridge between domain knowledge and model performance. The five-minute feature (days_since_last_login) that beat three weeks of neural architecture search is not an anomaly --- it is the norm. The most valuable thing you can do on any project is talk to someone who understands the problem, listen to what they pay attention to, and turn their expertise into computable variables. Algorithms are commodities. Features are craft.


These takeaways summarize Chapter 6: Feature Engineering. Return to the chapter for full context.