Hyperparameter tuning is the systematic search for the configuration values that maximize a machine learning model's performance, usually measured on held-out validation data. Model parameters, such as the weights of a neural network, are learned during training. Hyperparameters, by contrast, are fixed before training begins and control the learning process itself: learning rate, batch size, number of layers, regularization strength, and choice of optimizer. Good tuning can be the difference between a model that barely outperforms random guessing and one that achieves state-of-the-art results. The key challenge is navigating large, high-dimensional search spaces efficiently. The number of grid combinations grows exponentially with the number of hyperparameters, so exhaustive search quickly becomes computationally infeasible. This is why intelligent search strategies (Bayesian optimization, multi-fidelity methods, and evolutionary approaches) are essential in practical machine learning workflows.
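The exponential growth of exhaustive search can be made concrete with a quick count. This sketch assumes a hypothetical grid in which every hyperparameter is discretized to five candidate values. With k-fold cross-validation, the number of model fits is k times larger again.

```python
import math

def n_grid_points(grid: dict) -> int:
    """Number of combinations an exhaustive grid search must evaluate."""
    return math.prod(len(values) for values in grid.values())

# Hypothetical grids: 5 candidate values for each of n_params hyperparameters.
# prints: 2 25 / 4 625 / 6 15625 / 8 390625 (one pair per line)
for n_params in (2, 4, 6, 8):
    grid = {f"hp{i}": list(range(5)) for i in range(n_params)}
    print(n_params, n_grid_points(grid))
```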
What This Cheat Sheet Covers
This topic spans 15 focused tables and 122 indexed concepts. The outline below walks through it table by table, from foundational concepts to advanced details.
Table 1: Core Search Strategies
| Method | Example | Description |
|---|---|---|
| Grid search | `GridSearchCV(estimator, param_grid={'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']}, cv=5)` | • Exhaustively evaluates every combination in a predefined grid (here 3 × 2 = 6 combinations, each fit 5 times for cross-validation) • guarantees the best-scoring combination within the grid, but the number of combinations grows exponentially with the number of hyperparameters. |
| Random search | `RandomizedSearchCV(estimator, param_distributions={'C': uniform(loc=0.1, scale=9.9), 'gamma': loguniform(1e-4, 1e-1)}, n_iter=100, cv=5)` | • Samples `n_iter` combinations from the given distributions (SciPy's `uniform(loc, scale)` spans [loc, loc + scale], so `C` is drawn from [0.1, 10]) • usually more efficient than grid search in high-dimensional spaces, especially when only a few hyperparameters strongly affect performance, and often finds good solutions with far fewer evaluations. |
| Quasi-random (QMC) search | `sampler = optuna.samplers.QMCSampler(qmc_type='sobol'); study = optuna.create_study(sampler=sampler); study.optimize(objective, n_trials=32)` | • Samples from low-discrepancy Sobol or Halton sequences, which cover the search space more uniformly than pseudo-random draws • typically outperforms pure random search at low to medium sample counts; Sobol points are best balanced when the trial count is a power of two (32 here). |
| Bayesian optimization | `BayesSearchCV(estimator, search_spaces={'C': Real(1e-6, 1e+6, prior='log-uniform')}, n_iter=50)` | Builds a probabilistic surrogate model of the objective function (for example, a Gaussian process) and uses an acquisition function to select the next configuration, balancing exploration of uncertain regions against exploitation of promising ones. |
| Successive halving | `HalvingRandomSearchCV(estimator, param_distributions, resource='n_samples', factor=3)` | Starts many configurations on a small budget (here, a subset of the training samples), keeps roughly the best 1/`factor` after each round, and gives the survivors `factor` times more resources. Experimental in scikit-learn: import `enable_halving_search_cv` from `sklearn.experimental` first. |
| Hyperband | `HyperBandScheduler(time_attr='training_iteration', max_t=81, reduction_factor=3)` | • Extends successive halving by running it several times ("brackets"), each trading off the number of configurations against the budget per configuration differently • removes the need to choose that trade-off (and hence the minimum budget) by hand. |
| BOHB | `from hpbandster.optimizers import BOHB; bohb = BOHB(configspace, run_id='run', min_budget=1, max_budget=81)` | • Combines Hyperband's multi-fidelity scheduling with TPE-based Bayesian optimization for choosing which configurations to try • designed for strong anytime performance at small budgets and strong final performance at large ones. |
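Why random search tends to beat a grid of the same size can be seen in a small from-scratch sketch. This is not scikit-learn's implementation. The objective is hypothetical: only `x` matters and `y` is irrelevant. With 9 evaluations, the 3 × 3 grid only ever tries 3 distinct values of `x`, whereas random search tries 9.

```python
import itertools
import random

def grid_search(objective, grid):
    """Evaluate every combination of the candidate values in `grid` (name -> list)."""
    names = list(grid)
    trials = [dict(zip(names, combo)) for combo in itertools.product(*grid.values())]
    return max(trials, key=objective), trials

def random_search(objective, bounds, n_iter, seed=0):
    """Evaluate `n_iter` configurations drawn uniformly from `bounds` (name -> (low, high))."""
    rng = random.Random(seed)
    trials = [{name: rng.uniform(lo, hi) for name, (lo, hi) in bounds.items()}
              for _ in range(n_iter)]
    return max(trials, key=objective), trials

# Hypothetical objective: peak at x = 0.73; y has no effect on the score.
def objective(params):
    return -(params["x"] - 0.73) ** 2

grid_best, grid_trials = grid_search(objective, {"x": [0.0, 0.5, 1.0], "y": [0.0, 0.5, 1.0]})
rand_best, rand_trials = random_search(objective, {"x": (0.0, 1.0), "y": (0.0, 1.0)}, n_iter=9)

# Same budget of 9 evaluations; count the distinct x values each method explored.
print(len({t["x"] for t in grid_trials}), len({t["x"] for t in rand_trials}))  # → 3 9
print(grid_best, rand_best)
```

The grid can only report `x = 0.5` as its best value, however the objective behaves between grid points. Random search spends every evaluation on a new value of the hyperparameter that matters.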
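Low-discrepancy sampling is easy to sketch for the Halton sequence, the simpler of the two sequences named in the QMC row. Sobol sequences, which Optuna's `QMCSampler` uses in the example, need direction numbers and are not shown. The log-scaled SVM-style search space below is a hypothetical choice. Each dimension uses the radical inverse in a different prime base, so successive points fill gaps left by earlier ones instead of clustering.

```python
def radical_inverse(index: int, base: int) -> float:
    """Van der Corput radical inverse: reflect the base-`base` digits of `index`
    about the radix point (e.g. 1, 2, 3 in base 2 -> 0.5, 0.25, 0.75)."""
    result, scale = 0.0, 1.0 / base
    while index > 0:
        index, digit = divmod(index, base)
        result += digit * scale
        scale /= base
    return result

def halton(n_points: int, bases: tuple = (2, 3)) -> list:
    """First `n_points` of the Halton sequence in [0, 1)^d, one prime base per
    dimension; index 0 (the all-zeros point) is skipped."""
    return [tuple(radical_inverse(i, b) for b in bases) for i in range(1, n_points + 1)]

def to_search_space(unit_point, log10_bounds):
    """Map a unit-cube point onto log-uniform ranges given as (low, high) exponents."""
    return [10 ** (lo + u * (hi - lo)) for u, (lo, hi) in zip(unit_point, log10_bounds)]

# Hypothetical space: C in [1e-3, 1e3], gamma in [1e-4, 1e-1], both on a log scale.
log10_bounds = [(-3.0, 3.0), (-4.0, -1.0)]
for point in halton(8):
    C, gamma = to_search_space(point, log10_bounds)
    print(f"C={C:.3g}  gamma={gamma:.3g}")
```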
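The surrogate-plus-acquisition loop behind Bayesian optimization can be sketched in pure Python for a single hyperparameter rescaled to [0, 1]. The sketch uses a zero-mean Gaussian process with a fixed-length-scale RBF kernel as the surrogate and expected improvement (EI) as the acquisition function. It is not scikit-optimize's `BayesSearchCV` implementation, which fits kernel hyperparameters and uses its own default acquisition strategy. The objective, the three starting points, and the 201-point candidate grid used to maximize EI are all assumptions.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel with unit signal variance."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(K):
    """Lower-triangular L with L @ L.T == K (K symmetric positive definite)."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(K[i][i] - s) if i == j else (K[i][j] - s) / L[j][j]
    return L

def forward_solve(L, b):
    """Solve L y = b for lower-triangular L."""
    y = []
    for i, bi in enumerate(b):
        y.append((bi - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def fit_gp(xs, zs, jitter=1e-6):
    """Zero-mean GP posterior; returns a function x -> (mean, std)."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    w = forward_solve(L, zs)                              # L^{-1} z
    def predict(x):
        v = forward_solve(L, [rbf(a, x) for a in xs])     # L^{-1} k_*
        mean = sum(vi * wi for vi, wi in zip(v, w))       # k_*^T K^{-1} z
        var = max(1.0 - sum(vi * vi for vi in v), 1e-12)  # k(x, x) = 1 for this kernel
        return mean, math.sqrt(var)
    return predict

def expected_improvement(mean, std, best, xi=0.01):
    """EI for maximization under a Gaussian predictive distribution."""
    gap = mean - best - xi
    z = gap / std
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    return gap * cdf + std * pdf

def bayes_opt(objective, init_xs=(0.1, 0.5, 0.9), n_iter=10):
    """Maximize `objective` on [0, 1] with a GP surrogate and EI acquisition."""
    xs = list(init_xs)
    ys = [objective(x) for x in xs]
    candidates = [i / 200 for i in range(201)]  # EI is maximized over a dense grid
    for _ in range(n_iter):
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu) / sd for y in ys]        # standardize to match the unit-variance prior
        predict = fit_gp(xs, zs)
        best = max(zs)
        x_next = max(candidates, key=lambda x: expected_improvement(*predict(x), best))
        xs.append(x_next)
        ys.append(objective(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

# Hypothetical validation score as a function of a rescaled hyperparameter; peak at x = 0.3.
def toy_score(x):
    return math.exp(-((x - 0.3) / 0.1) ** 2)

x_best, y_best = bayes_opt(toy_score)
print(f"best x = {x_best:.3f}, score = {y_best:.3f}")
```

EI is large where the surrogate either predicts a high score (exploitation) or is very uncertain (exploration). The `xi` margin nudges the search away from re-sampling the current best point.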
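The elimination schedule of successive halving can be sketched from scratch as below. This is not scikit-learn's `HalvingRandomSearchCV`, which has its own rules for candidate counts and resources. The learning-curve function is hypothetical and deliberately simple: its ranking never changes with budget, so the true best configuration always survives. Real learning curves can cross, which is exactly the risk successive halving accepts in exchange for speed.

```python
import math

def successive_halving(configs, evaluate, min_budget=1, factor=3):
    """Evaluate all configs on a small budget, keep the best 1/factor of them,
    multiply the budget by factor, and repeat until one config remains.
    `evaluate(config, budget)` returns a score where higher is better."""
    survivors, budget, history = list(configs), min_budget, []
    while len(survivors) > 1:
        history.append((budget, len(survivors)))
        ranked = sorted(survivors, key=lambda c: evaluate(c, budget), reverse=True)
        survivors = ranked[: max(1, len(survivors) // factor)]
        budget *= factor
    return survivors[0], history

# Hypothetical learning curve: final accuracy peaks at lr = 1e-2 and is
# approached gradually as the budget (e.g., epochs) grows.
def toy_curve(lr, budget):
    final_acc = 1.0 - abs(math.log10(lr) + 2.0) / 10
    return final_acc * (1.0 - math.exp(-budget / 5.0))

learning_rates = [10 ** (-4 + 4 * i / 26) for i in range(27)]  # 27 log-spaced values in [1e-4, 1]
best_lr, history = successive_halving(learning_rates, toy_curve)
print(history)  # [(1, 27), (3, 9), (9, 3)]
print(best_lr)
```

With `factor=3`, 27 candidates shrink to 9, then 3, then 1, while the per-candidate budget grows from 1 to 3 to 9.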
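Hyperband's brackets can be computed with a few lines of integer arithmetic, using the same maximum resource (81) and reduction factor (3) as the Hyperband row. This sketch follows the bracket formulas of the original Hyperband algorithm. Ray Tune's `HyperBandScheduler` implements its own variant, so its exact trial counts may differ. Each bracket is one run of successive halving: the most aggressive bracket starts 81 configurations at resource 1, and the most conservative runs 5 configurations at the full resource of 81.

```python
def hyperband_brackets(max_resource: int, eta: int = 3):
    """Return Hyperband's brackets, most aggressive first, each as a list of
    (n_configs, resource_per_config) rounds of successive halving."""
    s_max = 0
    while eta ** (s_max + 1) <= max_resource:  # s_max = floor(log_eta(max_resource))
        s_max += 1
    brackets = []
    for s in range(s_max, -1, -1):
        n = -(-(s_max + 1) * eta ** s // (s + 1))  # ceil((s_max + 1) * eta^s / (s + 1))
        r = max_resource / eta ** s                # starting resource per configuration
        brackets.append([(n // eta ** i, r * eta ** i) for i in range(s + 1)])
    return brackets

for s, rounds in zip(range(4, -1, -1), hyperband_brackets(81, 3)):
    print(f"bracket s={s}:", [(n, int(r)) for n, r in rounds])
```

Running every bracket hedges between "many configurations, little training each" and "few configurations, full training each", so no single minimum budget has to be guessed in advance.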