Neural Architecture Search (NAS) Cheat Sheet

Updated 2026-05-25

Neural Architecture Search (NAS) is an automated machine learning technique that searches for high-performing neural network architectures for specific tasks by algorithmically exploring large design spaces, replacing manual architecture engineering with systematic search methods. NAS emerged as a response to the time-consuming, expertise-intensive process of manually designing network topologies. It lets models design models, and the architectures it finds often match or surpass human-designed counterparts. The field rests on three core components: the search space (the set of possible architectures), the search strategy (the algorithm that explores this space), and performance estimation (how candidate architectures are evaluated). Early searches consumed thousands of GPU days; techniques such as weight sharing, differentiable search, and zero-cost proxies have since cut search costs to a few GPU days or less.

A key insight: the quality of the search space often matters more than the sophistication of the search algorithm, because even random search can find strong architectures in a well-designed space. Understanding the tradeoffs among search efficiency, architecture quality, and hardware constraints is central to deploying NAS in practice, in domains ranging from computer vision to natural language processing and large language model optimization.
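
To make the three components and the random-search baseline concrete, here is a minimal Python sketch under stated assumptions. The operation names, the six-edge cell, and the `evaluate` callback are hypothetical stand-ins; in a real search, `evaluate` would train each candidate or score it with a proxy.

```python
import random
from typing import Callable, Tuple

# Hypothetical cell-style search space: each edge of a small DAG picks one operation.
OPERATIONS = ["conv_3x3", "conv_5x5", "max_pool", "skip"]
NUM_EDGES = 6

Architecture = Tuple[str, ...]


def search_space_size(num_ops: int, num_edges: int) -> int:
    # Edges choose independently, so the space holds num_ops ** num_edges architectures.
    return num_ops ** num_edges


def sample_architecture(rng: random.Random) -> Architecture:
    # Search space: one operation drawn uniformly at random for every edge.
    return tuple(rng.choice(OPERATIONS) for _ in range(NUM_EDGES))


def random_search(
    evaluate: Callable[[Architecture], float],
    num_samples: int,
    seed: int = 0,
) -> Tuple[Architecture, float]:
    # Search strategy: uniform random sampling.
    # Performance estimation: whatever `evaluate` does (full training, a proxy, ...).
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    rng = random.Random(seed)
    best_arch: Architecture = ()
    best_score = float("-inf")
    for _ in range(num_samples):
        arch = sample_architecture(rng)
        score = evaluate(arch)
        if score > best_score:
            best_arch, best_score = arch, score
    return best_arch, best_score


if __name__ == "__main__":
    print(search_space_size(len(OPERATIONS), NUM_EDGES))  # 4 ** 6 = 4096
    # Toy stand-in for evaluation: count 3x3 convolutions (not a real accuracy).
    print(random_search(lambda a: a.count("conv_3x3"), num_samples=100))
```

Even this tiny space has 4,096 members, which is why the strategy and the cost of each `evaluate` call dominate practical NAS budgets.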

What This Cheat Sheet Covers

This topic spans 18 focused tables and 118 indexed concepts. The outline below lists every table, from foundational concepts through advanced details.

Table 1: Core NAS Components
Table 2: Search Space Types
Table 3: Search Strategy Algorithms
Table 4: Reinforcement Learning-Based NAS
Table 5: Evolutionary Algorithms for NAS
Table 6: Differentiable & Gradient-Based NAS
Table 7: One-Shot & Few-Shot NAS Methods
Table 8: Performance Estimation Strategies
Table 9: Zero-Cost Proxies & Training-Free NAS
Table 10: Hardware-Aware NAS
Table 11: Transfer Learning & Meta-Learning in NAS
Table 12: Multi-Objective NAS
Table 13: Architecture Encoding & Representation
Table 14: NAS Benchmarks & Evaluation
Table 15: AutoML Platforms & Tools
Table 16: Famous NAS-Discovered Architectures
Table 17: NAS for Transformers & Large Language Models
Table 18: NAS Challenges & Considerations

Table 1: Core NAS Components

The three-component framework of search space, search strategy, and performance estimation underlies most NAS systems; how each component is designed determines both search cost and result quality. Modern NAS adds a fourth practical concern, hardware constraints such as latency, memory, or energy budgets, which turns architecture discovery into a constrained multi-objective problem.
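
A minimal sketch of what "constrained multi-objective" means once candidates have been scored. The `Candidate` type, the accuracy and latency numbers, and the budget are invented for illustration; in practice latency would be measured on the target device or predicted by a latency model.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    name: str
    accuracy: float    # higher is better
    latency_ms: float  # lower is better (measured or predicted on the target device)


def dominates(a: Candidate, b: Candidate) -> bool:
    # a dominates b if it is no worse on both objectives and strictly better on at least one.
    no_worse = a.accuracy >= b.accuracy and a.latency_ms <= b.latency_ms
    strictly_better = a.accuracy > b.accuracy or a.latency_ms < b.latency_ms
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    # Keep every candidate that no other candidate dominates.
    return [c for c in candidates if not any(dominates(o, c) for o in candidates)]


def best_under_budget(candidates: List[Candidate], budget_ms: float) -> Optional[Candidate]:
    # Hard constraint: discard anything over budget, then maximize accuracy.
    feasible = [c for c in candidates if c.latency_ms <= budget_ms]
    return max(feasible, key=lambda c: c.accuracy, default=None)


if __name__ == "__main__":
    # Invented numbers for illustration only.
    pool = [
        Candidate("A", 0.70, 5.0),
        Candidate("B", 0.75, 12.0),
        Candidate("C", 0.72, 15.0),
        Candidate("D", 0.78, 30.0),
    ]
    print([c.name for c in pareto_front(pool)])     # ['A', 'B', 'D']
    print(best_under_budget(pool, budget_ms=15.0))  # Candidate(name='B', accuracy=0.75, latency_ms=12.0)
```

The Pareto front keeps every tradeoff worth considering, while the budgeted selection collapses the problem to a single choice for one deployment target; C is dropped from the front because B is both faster and more accurate.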

Component	Example	Description
Search Space	`operations = ["conv_3x3", "conv_5x5",` `"max_pool", "skip"]` `space_size = len(operations) ** num_edges`	• Defines the set of all possible architectures that the search algorithm can explore • determines both expressiveness (can optimal architectures be represented?) and search difficulty (how large is the space?) • when each edge picks one operation independently, the size grows exponentially with the number of edges.
Search Strategy	`candidates = evolutionary_search(` `population, fitness, generations)` `best_arch = select_top(candidates)`	• The algorithm used to navigate the search space and propose candidate architectures • ranges from random sampling to more sophisticated methods such as reinforcement learning, evolutionary algorithms, gradient-based optimization, or generative models.
Performance Estimation	`accuracy = train_epochs(model, 50)` `vs` `proxy = zero_cost_metric(model)`	• Method for evaluating candidate architectures to guide the search • can involve full training, early stopping, weight sharing, or zero-cost proxies—each trades speed against reliability.
Bilevel Optimization	`min_alpha L_val(w_star(alpha), alpha)` `where w_star(alpha) = argmin_w L_train(w, alpha)`	• Formulation in which architecture parameters (α) are optimized on validation data while network weights (w) are optimized on training data • the outer loop searches architectures; the inner loop trains their weights.
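
The search-strategy row's `evolutionary_search` is pseudocode. One concrete evolutionary strategy is regularized (aging) evolution, the method used to discover AmoebaNet. The sketch below follows its tournament-selection-plus-aging loop, but the search space, the single-edge point mutation, and the toy fitness are simplified assumptions rather than the original setup.

```python
import collections
import random
from typing import Callable, Deque, List, Tuple

# Hypothetical search space, same shape as a small cell: one operation per edge.
OPERATIONS = ["conv_3x3", "conv_5x5", "max_pool", "skip"]
NUM_EDGES = 6

Architecture = Tuple[str, ...]
Individual = Tuple[Architecture, float]  # (architecture, fitness)


def mutate(arch: Architecture, rng: random.Random) -> Architecture:
    # Point mutation: swap the operation on one random edge for a different one.
    edge = rng.randrange(len(arch))
    alternatives = [op for op in OPERATIONS if op != arch[edge]]
    child = list(arch)
    child[edge] = rng.choice(alternatives)
    return tuple(child)


def regularized_evolution(
    fitness: Callable[[Architecture], float],
    population_size: int = 20,
    sample_size: int = 5,
    cycles: int = 200,
    seed: int = 0,
) -> Individual:
    rng = random.Random(seed)
    population: Deque[Individual] = collections.deque()
    history: List[Individual] = []
    # Seed the population with random architectures.
    while len(population) < population_size:
        arch = tuple(rng.choice(OPERATIONS) for _ in range(NUM_EDGES))
        population.append((arch, fitness(arch)))
        history.append(population[-1])
    for _ in range(cycles):
        # Tournament selection: the fittest of a random sample becomes the parent.
        tournament = rng.sample(list(population), sample_size)
        parent_arch, _ = max(tournament, key=lambda ind: ind[1])
        child_arch = mutate(parent_arch, rng)
        population.append((child_arch, fitness(child_arch)))
        history.append(population[-1])
        # Aging: remove the oldest individual rather than the worst one.
        population.popleft()
    # Report the best architecture ever evaluated, not just the final population's best.
    return max(history, key=lambda ind: ind[1])
```

Removing the oldest rather than the worst individual means an architecture survives only if its descendants keep winning tournaments, which discourages the search from locking onto a single lucky evaluation.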

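Whether a cheap estimator (early stopping, weight sharing, or a zero-cost proxy) can stand in for full training is usually judged by how well it preserves the ranking of architectures, often measured with a rank correlation such as Kendall's τ between proxy scores and fully trained accuracies. A minimal pure-Python tau-a sketch follows; the input values are invented for illustration, not measurements.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(proxy_scores: Sequence[float], true_scores: Sequence[float]) -> float:
    # Kendall's tau-a: (concordant pairs - discordant pairs) / all pairs.
    # Pairs tied in either list count as neither concordant nor discordant.
    n = len(proxy_scores)
    if n != len(true_scores) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (proxy_scores[i] - proxy_scores[j]) * (true_scores[i] - true_scores[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) // 2)


if __name__ == "__main__":
    # Invented values: a proxy that misorders one pair of architectures.
    proxy = [1.0, 2.0, 3.0, 4.0]
    accuracy = [0.60, 0.70, 0.65, 0.80]
    print(kendall_tau(proxy, accuracy))  # 5 concordant, 1 discordant of 6 pairs
```

A τ near 1 means the proxy ranks candidates almost as full training would; the closer it falls toward 0, the less its speedup is worth. Tau-a is used here for brevity; SciPy's `scipy.stats.kendalltau` defaults to tau-b, which adjusts for ties.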
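
The bilevel row is shorthand for the formulation popularized by DARTS. Written out with the continuous relaxation DARTS uses to make the architecture parameters α differentiable (this describes that one method, not every gradient-based NAS approach), where each edge (i, j) mixes all candidate operations in the set O:

```latex
\bar{o}^{(i,j)}(x) = \sum_{o \in \mathcal{O}}
  \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(x)

\min_{\alpha}\; \mathcal{L}_{\text{val}}\big(w^{*}(\alpha), \alpha\big)
\quad \text{s.t.} \quad
w^{*}(\alpha) = \arg\min_{w}\; \mathcal{L}_{\text{train}}(w, \alpha)

\nabla_{\alpha} \mathcal{L}_{\text{val}}\big(w^{*}(\alpha), \alpha\big)
\approx \nabla_{\alpha} \mathcal{L}_{\text{val}}\big(w - \xi \nabla_{w} \mathcal{L}_{\text{train}}(w, \alpha),\, \alpha\big)
```

The last line replaces the expensive inner optimization with a single gradient step of size ξ. Setting ξ = 0 gives the cheaper first-order approximation, which simply alternates a weight update on training data with an architecture update on validation data. After the search, each mixed operation is replaced by its highest-weighted candidate to obtain a discrete architecture.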