Further Reading: Chapter 18

Hyperparameter Tuning


Foundational Papers

1. "Random Search for Hyper-Parameter Optimization" --- James Bergstra and Yoshua Bengio (2012) The paper that demonstrated random search is more efficient than grid search for most hyperparameter tuning problems. Bergstra and Bengio showed that when only a few hyperparameters matter (which is typical), random search explores more unique values of the important dimensions for the same budget. The key figure --- comparing grid vs. random sampling on a 2D space where only one dimension matters --- is one of the most influential visualizations in applied machine learning. Published in the Journal of Machine Learning Research, Vol. 13.

2. "Algorithms for Hyper-Parameter Optimization" --- James Bergstra, Remi Bardenet, Yoshua Bengio, Balazs Kegl (2011) The paper that introduced the Tree-structured Parzen Estimator (TPE), which is Optuna's default surrogate model. TPE models the conditional probability of hyperparameters given good and bad performance, rather than modeling performance given hyperparameters (as Gaussian Process approaches do). This distinction makes TPE naturally suited to high-dimensional, mixed-type search spaces. Published in Advances in Neural Information Processing Systems (NeurIPS).

3. "Practical Bayesian Optimization of Machine Learning Algorithms" --- Jasper Snoek, Hugo Larochelle, Ryan Adams (2012) The paper that popularized Gaussian Process-based Bayesian optimization for hyperparameter tuning. Snoek et al. showed that Bayesian optimization consistently finds better hyperparameters than random search and grid search with fewer evaluations, across convolutional networks, SVMs, and boosted trees. Published in NeurIPS 2012. While Optuna's TPE has largely replaced GP-based approaches for practical tuning, this paper provides the clearest explanation of surrogate models and acquisition functions.


Bayesian Optimization Frameworks

4. "Optuna: A Next-generation Hyperparameter Optimization Framework" --- Akiba, Sano, Yanase, Ohta, and Koyama (2019) The paper introducing Optuna, the framework used throughout this chapter. Key contributions: define-by-run API (search spaces are defined in the objective function, not declaratively), efficient pruning of unpromising trials, and integration with major ML frameworks. Published at KDD 2019. The official documentation at optuna.readthedocs.io is comprehensive and includes tutorials for every major use case.

5. "BOHB: Robust and Efficient Hyperparameter Optimization at Scale" --- Falkner, Klein, and Hutter (2018) Combines Bayesian optimization (using TPE) with Hyperband's successive halving approach. BOHB starts with many cheap evaluations and progressively allocates more resources to promising configurations. It is the theoretical foundation for Optuna's pruning mechanism. Published at ICML 2018.

6. "Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization" --- Li, Jamieson, DeSalvo, Rostamizadeh, and Talwalkar (2018) Introduces the Hyperband algorithm, which frames hyperparameter tuning as a resource allocation problem. Instead of running every configuration to completion, Hyperband allocates small budgets to many configurations and progressively increases the budget for the best performers. This is the intellectual ancestor of scikit-learn's HalvingRandomSearchCV. Published in the Journal of Machine Learning Research, Vol. 18.


Search Space Design

7. "On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice" --- Yang and Shami (2020) A comprehensive survey of hyperparameter optimization methods, covering grid search, random search, Bayesian optimization, evolutionary algorithms, and gradient-based approaches. The practical contribution is a set of recommended search ranges for common algorithms (Random Forest, SVM, XGBoost, LightGBM, neural networks). Published in Neurocomputing, Vol. 415.

8. "Hyperparameter Importance Across Datasets" --- van Rijn and Hutter (2018) Analyzes which hyperparameters matter most across 38 datasets for Random Forest, SVM, and AdaBoost. For Random Forest, max_features and min_samples_leaf dominate; n_estimators matters little above 128. For SVM, C and gamma are critical. This paper provides empirical grounding for the tuning priorities recommended in this chapter. Published at KDD 2018.

9. "Tunability: Importance of Hyperparameters of Machine Learning Algorithms" --- Probst, Boulesteix, and Bischl (2019) Quantifies the gap between default and tuned performance across 38 datasets and 7 algorithms. Key finding: some algorithms (SVM, neural networks) are highly tunable, while others (Random Forest) perform well with defaults. XGBoost falls in between, with learning_rate, max_depth, and n_estimators showing the highest tunability. Published in the Journal of Machine Learning Research, Vol. 20.


Practical Guides to Tuning Specific Algorithms

10. XGBoost Documentation --- "Notes on Parameter Tuning" The official XGBoost tuning guide, written by the library authors. It provides a step-by-step protocol: (1) set a high learning rate and use early stopping to find n_estimators, (2) tune max_depth and min_child_weight, (3) tune gamma, (4) tune subsample and colsample_bytree, (5) tune regularization, (6) reduce the learning rate and increase n_estimators. Available at xgboost.readthedocs.io.
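
Step (1) is the one that most often trips people up, so here is a minimal sketch of it using xgb.cv; the dataset and exact numbers are illustrative. The later steps then tune the remaining parameters with n_estimators held at the value found here.

    # Protocol step (1): fix a relatively high learning rate and let early
    # stopping on cross-validated loss choose n_estimators.
    import xgboost as xgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=2000, random_state=0)
    dtrain = xgb.DMatrix(X, label=y)

    params = {"eta": 0.1, "max_depth": 6, "objective": "binary:logistic"}
    cv_results = xgb.cv(
        params,
        dtrain,
        num_boost_round=1000,
        nfold=5,
        metrics="logloss",
        early_stopping_rounds=50,  # stop once CV logloss stops improving
        seed=0,
    )
    print("n_estimators ~", len(cv_results))  # one row per kept round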

11. LightGBM Documentation --- "Parameters Tuning" LightGBM's official tuning guide. Key differences from XGBoost: LightGBM uses num_leaves instead of max_depth as the primary complexity control. The guide recommends setting num_leaves less than 2^max_depth to avoid overfitting. Also covers min_data_in_leaf, feature_fraction, and bagging_fraction. Available at lightgbm.readthedocs.io.
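
A small sketch of that constraint in practice (parameter values are illustrative): with max_depth capped, num_leaves is kept strictly below 2^max_depth so that depth, not leaf count, is the binding complexity limit.

    # Keep num_leaves below 2**max_depth, per the LightGBM tuning guide.
    import lightgbm as lgb
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=2000, random_state=0)
    train_set = lgb.Dataset(X, label=y)

    max_depth = 7
    params = {
        "objective": "binary",
        "max_depth": max_depth,
        "num_leaves": 2 ** max_depth - 1,  # 127 < 2**7
        "min_data_in_leaf": 20,
        "feature_fraction": 0.8,
        "bagging_fraction": 0.8,
        "bagging_freq": 1,  # bagging_fraction only applies when freq > 0
    }
    booster = lgb.train(params, train_set, num_boost_round=200)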

12. CatBoost Documentation --- "Parameter Tuning" CatBoost is generally less sensitive to hyperparameter tuning than XGBoost or LightGBM because of its ordered boosting and built-in categorical feature handling. The official guide focuses on iterations, learning_rate, depth, and l2_leaf_reg. Available at catboost.ai.
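
A minimal sketch with the four parameters the guide names, kept near their defaults (the values are illustrative):

    # CatBoost usually needs little tuning; these four parameters are the
    # ones the official guide suggests adjusting first.
    from catboost import CatBoostClassifier
    from sklearn.datasets import make_classification

    X, y = make_classification(n_samples=2000, random_state=0)

    model = CatBoostClassifier(
        iterations=500,
        learning_rate=0.05,
        depth=6,
        l2_leaf_reg=3.0,
        verbose=0,  # silence per-iteration logging
    ).fit(X, y)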


Feature Engineering vs. Tuning

13. Feature Engineering and Selection: A Practical Approach for Predictive Models --- Max Kuhn and Kjell Johnson (2019) The definitive book on feature engineering. Chapters 5-8 cover encoding strategies, interaction features, and feature selection methods. The central argument aligns with this chapter's thesis: better features matter more than better hyperparameters. Kuhn and Johnson demonstrate this empirically across dozens of datasets. Free at feat.engineering.

14. "Feature Engineering for Machine Learning" --- Alice Zheng and Amanda Casari (O'Reilly, 2018) A practical guide covering text features, image features, time-series features, and automated feature engineering. Shorter and more code-focused than Kuhn and Johnson. Especially useful for understanding when feature engineering is likely to provide larger gains than tuning.


AutoML and Automated Tuning

15. "Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms" --- Thornton, Hutter, Hoos, and Leyton-Brown (2013) The paper that started the AutoML movement by jointly optimizing algorithm selection and hyperparameter tuning. Auto-WEKA uses Bayesian optimization (SMAC) to search over both the choice of algorithm and its hyperparameters simultaneously. Published at KDD 2013.

16. "Efficient and Robust Automated Machine Learning" --- Feurer et al. (2015) Introduces Auto-sklearn, which extends Auto-WEKA to scikit-learn. Key innovations include meta-learning (using performance on previous datasets to warm-start the search) and ensemble construction (combining the top configurations instead of selecting a single winner). Published at NeurIPS 2015.

17. "Auto-Sklearn 2.0: Hands-free AutoML via Meta-Learning" --- Feurer et al. (2022) The follow-up to Auto-sklearn, removing the need for meta-learning initialization and simplifying the API. If you find yourself tuning the same model types repeatedly, Auto-sklearn 2.0 can automate the process. Published in the Journal of Machine Learning Research, Vol. 23.


Experiment Tracking

18. MLflow Documentation --- "Tracking Experiments" MLflow provides a framework for logging hyperparameter configurations, metrics, and artifacts. Optuna integrates directly with MLflow via optuna.integration.MLflowCallback, allowing you to log every trial automatically. This chapter's discussion of logging tuning results is a simplified version of what MLflow provides. Available at mlflow.org.
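
A minimal sketch of that integration (the toy objective is illustrative; in recent Optuna releases the callback lives in the separate optuna-integration package):

    # Log every Optuna trial to MLflow via the callback named above.
    import optuna
    from optuna.integration.mlflow import MLflowCallback

    mlflc = MLflowCallback(metric_name="objective_value")

    def objective(trial):
        x = trial.suggest_float("x", -10, 10)
        return (x - 2) ** 2  # toy objective

    study = optuna.create_study(study_name="tuning-demo", direction="minimize")
    study.optimize(objective, n_trials=20, callbacks=[mlflc])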

19. Weights & Biases Documentation --- "Hyperparameter Sweeps" W&B's sweep functionality supports grid, random, and Bayesian hyperparameter search with built-in visualization and experiment tracking. The dashboard provides real-time views of parameter importance, optimization history, and parallel coordinates plots. Available at docs.wandb.ai.


Video Resources

20. StatQuest with Josh Starmer --- "Hyperparameter Tuning" A visual walkthrough of grid search, random search, and cross-validation for hyperparameter tuning. Starmer's strength is making the geometry of search spaces intuitive. 15 minutes on YouTube.

21. Optuna Official Tutorial --- "10 Minutes to Optuna" The official getting-started tutorial walks through creating a study, defining an objective function, and visualizing results. Available on the Optuna documentation site and as a Colab notebook.


How to Use This List

If you read nothing else, read Bergstra and Bengio (item 1) on random search and Akiba et al. (item 4) on Optuna. Together they take about 2 hours and give you the theoretical and practical foundation for modern hyperparameter tuning.

If you want to understand which hyperparameters to prioritize, read van Rijn and Hutter (item 8) on hyperparameter importance and Probst et al. (item 9) on tunability. These papers provide empirical evidence for the tuning priorities in this chapter.

If you are working with XGBoost or LightGBM, the official tuning guides (items 10-11) are essential. They are written by the library authors and contain advice you will not find elsewhere.

If you suspect feature engineering would help more than tuning (it usually does), start with Kuhn and Johnson (item 13). It is the most comprehensive treatment of feature engineering available.

If you want to automate tuning entirely, explore Auto-sklearn (items 16-17) or Optuna's advanced features. But understand the manual process first --- AutoML is most effective when you understand what it is automating.


This reading list supports Chapter 18: Hyperparameter Tuning. Return to the chapter to review concepts before diving in.