Neural Architecture Search (NAS) is an automated machine learning technique that searches for high-performing neural network architectures for a given task by algorithmically exploring large design spaces, replacing manual architecture engineering with systematic search. NAS emerged in response to the slow, expertise-intensive process of hand-designing network topologies. In effect, models design models, and the resulting architectures often match or surpass human-designed counterparts. The field has three core components: search space definition (the set of possible architectures), search strategy (the algorithm that explores this space), and performance estimation (how candidate architectures are evaluated). Techniques such as weight sharing, differentiable search, and zero-cost proxies have cut search costs from thousands of GPU days in early reinforcement-learning and evolutionary approaches to a few GPU days or even hours. A key insight: the quality of the search space often matters more than the sophistication of the search algorithm, since even random search can find strong architectures in a well-designed space. Understanding the tradeoffs between search efficiency, architecture quality, and hardware constraints is central to practical NAS deployment, in domains ranging from computer vision to natural language processing and large language model optimization.
What This Cheat Sheet Covers
This topic spans 18 focused tables and 118 indexed concepts. Below is a complete table-by-table outline of this topic, spanning foundational concepts through advanced details.
Table 1: Core NAS Components
The three-component framework of search space, strategy, and performance estimation defines every NAS system; how each is designed determines both cost and result quality. Modern NAS adds a fourth practical concern—hardware constraints—turning architecture discovery into a constrained multi-objective problem.
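
The framework described above can be sketched in a few lines. This is a minimal illustration, not a real NAS system: the operation names, `OP_COST`, `OP_GAIN`, `estimate_score`, and the budget are all hypothetical, and the score is a synthetic function standing in for trained-model accuracy. Random search is used as the strategy only because it is the simplest one to show.

```python
import random

# Hypothetical search space: each of NUM_EDGES edges picks one operation.
OPERATIONS = ["conv3x3", "conv5x5", "max_pool", "skip"]
NUM_EDGES = 6

# Made-up per-operation cost (a stand-in for latency or parameter count).
OP_COST = {"conv3x3": 3.0, "conv5x5": 5.0, "max_pool": 1.0, "skip": 0.0}
# Made-up per-operation usefulness, used only by the synthetic estimator.
OP_GAIN = {"conv3x3": 1.0, "conv5x5": 1.2, "max_pool": 0.4, "skip": 0.1}


def space_size() -> int:
    """Search space size: one independent choice per edge."""
    return len(OPERATIONS) ** NUM_EDGES


def sample_architecture(rng: random.Random) -> tuple:
    """Search strategy (random search): sample one operation per edge."""
    return tuple(rng.choice(OPERATIONS) for _ in range(NUM_EDGES))


def cost(arch: tuple) -> float:
    """Hardware constraint proxy: total cost of the chosen operations."""
    return sum(OP_COST[op] for op in arch)


def estimate_score(arch: tuple) -> float:
    """Performance estimation stand-in (synthetic, not real accuracy)."""
    return sum(OP_GAIN[op] for op in arch)


def random_search(num_samples: int, budget: float, seed: int = 0):
    """Return the best-scoring sampled architecture that fits the budget."""
    rng = random.Random(seed)
    best_arch, best_score = None, float("-inf")
    for _ in range(num_samples):
        arch = sample_architecture(rng)
        if cost(arch) > budget:
            continue  # violates the hardware constraint, so skip it
        score = estimate_score(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


if __name__ == "__main__":
    print("space size:", space_size())
    arch, score = random_search(num_samples=200, budget=18.0)
    print("best feasible architecture:", arch, "score:", round(score, 2))
```

Swapping `sample_architecture` for an evolutionary or gradient-based proposer, or `estimate_score` for short training runs or a zero-cost proxy, changes one component without touching the others, which is the practical value of the decomposition.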
| Component | Example | Description |
|---|---|---|
| Search Space | `operations = [3x3_conv, 5x5_conv, max_pool, skip]`<br>`space_size = len(operations) ** num_edges` | • Defines the set of all architectures the search algorithm can explore<br>• Determines both expressiveness (can the optimal architecture be represented?) and search difficulty (how large is the space?) |
| Search Strategy | `candidates = evolutionary_search(population, fitness, generations)`<br>`best_arch = select_top(candidates)` | • The algorithm used to navigate the search space and propose candidate architectures<br>• Ranges from random sampling to reinforcement learning, evolutionary algorithms, gradient-based methods, and generative approaches |
| Performance Estimation | `accuracy = train_epochs(model, 50)`<br>vs.<br>`proxy = zero_cost_metric(model)` | • Method for evaluating candidate architectures to guide the search<br>• Can involve full training, early stopping, weight sharing, or zero-cost proxies; each trades speed against reliability |
| Bi-level Optimization | $\min_{\alpha} L_{\mathrm{val}}(w^{*}(\alpha), \alpha)$<br>where $w^{*}(\alpha) = \arg\min_{w} L_{\mathrm{train}}(w, \alpha)$ | • Formulation in which architecture parameters (α) are optimized on validation data while network weights (w) are optimized on training data<br>• The outer loop searches over architectures; the inner loop trains them |
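
The bi-level formulation in the last row can be made concrete with a toy problem. The sketch below assumes a single scalar weight `w` and a single scalar architecture parameter `alpha`, with made-up quadratic losses. The inner loop trains `w` by gradient descent on the training loss, and the outer loop updates `alpha` using a finite-difference gradient of the validation loss. This is a plain nested-loop illustration, not the approximation used by any particular differentiable NAS method, and all function names are hypothetical.

```python
def train_loss(w: float, alpha: float) -> float:
    """Toy training loss: the weight is pulled toward alpha."""
    return (w - alpha) ** 2


def val_loss(w: float, alpha: float) -> float:
    """Toy validation loss: prefers w near 1 and penalizes large alpha."""
    return (w - 1.0) ** 2 + 0.1 * alpha ** 2


def inner_solve(alpha: float, steps: int = 50, lr: float = 0.25) -> float:
    """Inner loop: approximate w*(alpha) = argmin_w train_loss(w, alpha)."""
    w = 0.0
    for _ in range(steps):
        grad_w = 2.0 * (w - alpha)  # d train_loss / d w
        w -= lr * grad_w
    return w


def outer_objective(alpha: float) -> float:
    """Validation loss evaluated at the trained weights w*(alpha)."""
    return val_loss(inner_solve(alpha), alpha)


def search(alpha0: float = 0.0, outer_steps: int = 200,
           lr: float = 0.1, eps: float = 1e-4):
    """Outer loop: gradient descent on alpha via central differences."""
    alpha = alpha0
    for _ in range(outer_steps):
        grad_alpha = (outer_objective(alpha + eps)
                      - outer_objective(alpha - eps)) / (2.0 * eps)
        alpha -= lr * grad_alpha
    return alpha, inner_solve(alpha)


if __name__ == "__main__":
    alpha, w = search()
    print("alpha:", alpha, "w*(alpha):", w, "val loss:", val_loss(w, alpha))
```

For these particular losses the inner problem has the closed form w*(α) = α, so the outer objective reduces to (α − 1)² + 0.1α², whose minimizer is α = 10/11 ≈ 0.909. The search should settle near that value. Re-running the inner loop for every outer step is what makes naive bi-level search expensive, which is the cost that weight sharing and gradient approximations aim to reduce.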