Hyperparameter Tuning Cheat Sheet

Updated 2026-04-28

Hyperparameter tuning is the systematic process of finding optimal configuration values for machine learning models to maximize performance. Unlike model parameters learned during training (such as weights in neural networks), hyperparameters are set before training begins and control the learning process itself—learning rate, batch size, number of layers, regularization strength, and optimizer choice. Effective tuning can mean the difference between a model that barely outperforms random guessing and one that achieves state-of-the-art results. The key challenge lies in navigating vast, high-dimensional search spaces efficiently: exhaustive search becomes computationally infeasible as the number of hyperparameters grows, making intelligent search strategies—Bayesian optimization, multi-fidelity methods, and evolutionary approaches—essential for practical machine learning workflows.
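
To make the scaling claim concrete, the following minimal sketch counts how many trials a full grid needs as hyperparameters are added. The choice of 5 candidate values per hyperparameter and the helper name `grid_size` are assumptions made for illustration only.

```python
from itertools import product


def grid_size(values_per_hyperparameter: int, n_hyperparameters: int) -> int:
    """Number of configurations in a full grid (illustrative helper)."""
    return values_per_hyperparameter ** n_hyperparameters


# Assumed setup: 5 candidate values for every hyperparameter.
sizes = {d: grid_size(5, d) for d in (2, 4, 6, 8)}

# Cross-check the formula by actually enumerating a 4-dimensional grid.
enumerated = sum(1 for _ in product(range(5), repeat=4))

for d, n in sizes.items():
    print(f"{d} hyperparameters -> {n} configurations")
```

Going from 2 to 8 hyperparameters multiplies the grid by 5 for every added dimension, which is why exhaustive search stops being practical well before the search space looks large on paper.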

What This Cheat Sheet Covers

This topic spans 15 focused tables and 122 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.

Table 1: Core Search Strategies
Table 2: Search Space Definition
Table 3: Popular Frameworks and Tools
Table 4: Cross-Validation Strategies
Table 5: Acquisition Functions (Bayesian Optimization)
Table 6: Surrogate Models
Table 7: Early Stopping and Pruning
Table 8: Parallel and Distributed Optimization
Table 9: Multi-Objective Optimization
Table 10: Advanced Techniques
Table 11: Common Hyperparameter Types
Table 12: Learning Rate Schedules
Table 13: LLM Fine-Tuning Hyperparameters
Table 14: Best Practices and Tips
Table 15: Common Pitfalls

Table 1: Core Search Strategies

How you walk the search space largely determines how quickly you find a good configuration. The strategies here climb a ladder of sophistication: brute-force grid search and random search come first, then Bayesian methods that learn where to look next, then multi-fidelity approaches like Hyperband and ASHA that cut losses on weak trials early, and finally evolutionary algorithms that breed better configurations over generations.
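
As a rough illustration of the first two rungs of that ladder, the sketch below builds a 3 x 3 grid and an equally sized random sample using only the standard library. The helper names `grid_candidates` and `random_candidates`, the bounds, and the seed are assumptions for illustration; they are not part of any library API.

```python
import itertools
import math
import random


def grid_candidates(axes):
    """Every combination of the listed values (illustrative helper)."""
    names = list(axes)
    return [dict(zip(names, combo)) for combo in itertools.product(*axes.values())]


def random_candidates(bounds, n, seed=0):
    """Draw n configurations, sampling each value log-uniformly within its bounds."""
    rng = random.Random(seed)
    return [
        {
            name: 10 ** rng.uniform(math.log10(low), math.log10(high))
            for name, (low, high) in bounds.items()
        }
        for _ in range(n)
    ]


grid = grid_candidates({"C": [0.1, 1, 10], "gamma": [1e-4, 1e-3, 1e-2]})
rand = random_candidates({"C": (0.1, 10), "gamma": (1e-4, 1e-1)}, n=len(grid))

# Same trial budget, but the grid only ever tries 3 distinct values of C.
distinct_c_grid = len({cfg["C"] for cfg in grid})
distinct_c_rand = len({cfg["C"] for cfg in rand})
print(len(grid), distinct_c_grid, distinct_c_rand)
```

With the same number of trials, the random sample probes a different value of each hyperparameter on every trial, which helps when only one or two hyperparameters really matter.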

Method	Example	Description
Grid Search	`GridSearchCV(estimator, param_grid={'C': [0.1, 1, 10], 'kernel': ['rbf', 'linear']}, cv=5)`	• Exhaustively evaluates all combinations from a predefined grid • guarantees the best within the grid but scales exponentially with the number of hyperparameters.
Random Search	`RandomizedSearchCV(estimator, param_distributions={'C': uniform(0.1, 10), 'gamma': loguniform(1e-4, 1e-1)}, n_iter=100, cv=5)`	• Samples random combinations from distributions • more efficient than grid search for high-dimensional spaces and often finds good solutions with far fewer evaluations.
QMC Sampling (Quasi-Monte Carlo)	`sampler = optuna.samplers.QMCSampler(qmc_type='sobol')` `study.optimize(objective, n_trials=32)`	• Samples from low-discrepancy Sobol or Halton sequences that cover the search space more uniformly than random numbers • often outperforms pure random search at low to medium sample counts.
Bayesian Optimization	`BayesSearchCV(estimator, search_spaces={'C': Real(1e-6, 1e+6, prior='log-uniform')}, n_iter=50)`	Builds a probabilistic surrogate model of the objective function and uses acquisition functions to intelligently select the next configuration, balancing exploration and exploitation.
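Successive Halving	`from sklearn.experimental import enable_halving_search_cv` `HalvingRandomSearchCV(estimator, param_distributions, resource='n_samples', factor=3)`	• Starts with many configurations on small budgets, keeps roughly the top 1/`factor` after each round, and gives the survivors a proportionally larger budget • scikit-learn has shipped this estimator as experimental, so the enabling import may be required.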
Hyperband	`HyperBandScheduler(time_attr='training_iteration', max_t=81, reduction_factor=3)`	• Extends successive halving by running several brackets, each with a different trade-off between the number of configurations and the budget per configuration • hedges between aggressive and conservative early stopping, so that trade-off does not have to be chosen by hand.
BOHB (Bayesian Optimization + Hyperband)	`from hpbandster.optimizers import BOHB` `bohb = BOHB(configspace, run_id='run', min_budget=1, max_budget=81)`	• Combines Hyperband's multi-fidelity efficiency with TPE-based Bayesian optimization • achieves strong anytime performance at small budgets and strong final performance at large ones.
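
The elimination loop behind the Successive Halving row can be sketched in a few lines of plain Python. This is a simplified illustration, not the scikit-learn or Ray Tune implementation: `toy_score` is a hypothetical, noise-free objective that peaks at a learning rate of 1e-2, and the 27 starting configurations, the seed, and `factor=3` are assumed values.

```python
import math
import random


def toy_score(config, budget):
    """Hypothetical objective (higher is better); improves slightly with budget."""
    distance = abs(math.log10(config["lr"]) + 2.0)  # distance from lr = 1e-2 in log space
    return -distance - 1.0 / budget


def successive_halving(configs, min_budget, factor, score_fn):
    """Keep the top 1/factor of configurations each rung and multiply the budget by factor."""
    survivors = list(configs)
    budget = min_budget
    history = []  # (budget per configuration, number of configurations evaluated)
    while True:
        ranked = sorted(survivors, key=lambda c: score_fn(c, budget), reverse=True)
        history.append((budget, len(ranked)))
        if len(ranked) == 1:
            return ranked[0], history
        survivors = ranked[: max(1, len(ranked) // factor)]
        budget *= factor


rng = random.Random(0)
configs = [{"lr": 10 ** rng.uniform(-5, 0)} for _ in range(27)]
best, history = successive_halving(configs, min_budget=1, factor=3, score_fn=toy_score)

total_budget = sum(budget * n for budget, n in history)
print(history, total_budget)
```

In this run the rungs use 27, 9, 3, and 1 configurations at budgets 1, 3, 9, and 27, for 108 budget units in total, compared with 729 if all 27 configurations were trained at the full budget. Real objectives are noisy at low budgets, so a configuration that would have won can be eliminated early; Hyperband's multiple brackets are one way to hedge against that.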
