Physical AI, also called embodied intelligence, integrates artificial intelligence into physical systems that perceive, reason about, and act in the real world. Unlike traditional AI, which processes abstract data, Physical AI covers robots, autonomous vehicles, humanoid systems, and smart devices that must understand physics, manipulate objects, navigate spaces, and collaborate safely with humans. The field grew rapidly in 2025-2026, driven by vision-language-action (VLA) foundation models, world simulators, and advances in sim-to-real transfer. Modern Physical AI systems combine multimodal perception (vision, touch, force), foundation-model reasoning, and learned motor policies to achieve general-purpose manipulation that previously required task-specific programming. This shift from narrow automation to generalist robot intelligence fundamentally changes how machines learn to act in unstructured environments.
What This Cheat Sheet Covers
This topic spans 16 focused tables and 85 indexed concepts. The outline below goes table by table, from foundational concepts to advanced details.
Table 1: Foundation Models for Robotics
Foundation models pre-trained on massive datasets enable robots to generalize across tasks, embodiments, and environments without task-specific retraining. These models, particularly vision-language-action (VLA) architectures, learn from Internet-scale vision-language data, robot trajectory datasets, and simulation, producing policies that can transfer zero-shot to novel manipulation scenarios. In 2025-2026 the field converged rapidly on VLA models as the dominant paradigm, with models scaling from millions to billions of parameters. Unlike classical robotics, which required explicit programming for each task, foundation models learn manipulation strategies from patterns across diverse demonstrations, enabling the emergence of general-purpose robot intelligence.
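How does a VLA model built on a language-model backbone emit motor commands? One common design, used by RT-2 and OpenVLA, discretizes each continuous action dimension into 256 bins so that actions become ordinary output tokens; π0-style models instead generate continuous actions with a flow-matching action expert. The minimal sketch below illustrates only the binning idea, assuming known per-dimension bounds `low` and `high` and a hypothetical 7-D action layout (3 position deltas, 3 rotation deltas, 1 gripper). The function names are illustrative, not any model's actual API.

```python
from typing import List, Sequence

# Assumption: 256 uniform bins per action dimension, as in RT-2 and OpenVLA.
NUM_BINS = 256


def tokenize_action(
    action: Sequence[float],
    low: Sequence[float],
    high: Sequence[float],
    num_bins: int = NUM_BINS,
) -> List[int]:
    """Map each continuous action dimension to a bin index in [0, num_bins - 1]."""
    tokens = []
    for a, lo, hi in zip(action, low, high):
        a = min(max(a, lo), hi)      # clip to the bounds
        frac = (a - lo) / (hi - lo)  # rescale to [0, 1]
        # int() floors here because frac >= 0; a == hi would give num_bins,
        # so fold it into the last bin.
        tokens.append(min(int(frac * num_bins), num_bins - 1))
    return tokens


def detokenize_action(
    tokens: Sequence[int],
    low: Sequence[float],
    high: Sequence[float],
    num_bins: int = NUM_BINS,
) -> List[float]:
    """Map bin indices back to continuous values at the bin centers."""
    return [lo + (t + 0.5) / num_bins * (hi - lo) for t, lo, hi in zip(tokens, low, high)]


# Hypothetical 7-D action: 3 position deltas, 3 rotation deltas, 1 gripper.
low, high = [-1.0] * 7, [1.0] * 7
tokens = tokenize_action([0.0, 0.5, -0.5, 0.1, -0.2, 0.3, 1.0], low, high)
print(tokens)  # [128, 192, 64, 140, 102, 166, 255]
```

Decoding returns each bin's center, so the per-dimension round-trip error is at most half a bin width, (high - low) / (2 * num_bins). OpenVLA sets the bounds from the 1st and 99th percentiles of the training actions rather than the raw min and max, so outliers do not stretch the bins.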
| Model | Description |
|---|---|
| Bi-arm robots with on-device VLA | Google DeepMind's VLA model for robots of any shape, enabling perception, reasoning, tool use, and human interaction; built on a state-of-the-art LLM adapted for robotic control without significant architecture changes |
| Open-source 7B-parameter VLA | Open-weight vision-language-action model with a vision encoder and action head that supports different robot embodiments; downloadable and fine-tunable for research and commercial use |
| Cross-embodiment policy transfer | Trained on the Open X-Embodiment dataset (1M+ trajectories, 22 robots, 527 skills from 21 institutions); demonstrates positive transfer, improving multiple robots' capabilities through shared learning |
| π0 0.7 steerable model (April 2026) | General-purpose robot foundation model that learns from Internet-scale vision-language pre-training, open-source datasets, and proprietary dexterous tasks from 8 robots; outputs low-level motor commands directly |
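Cross-embodiment training means one policy learns from robots whose action spaces differ in dimensionality and scale. The sketch below shows one simple, hypothetical way to build a shared training target: normalize each robot's actions with that robot's own statistics, zero-pad to a fixed shared length, and carry a mask so padded dimensions do not contribute to the loss. `RobotSpec`, `SHARED_DIM`, and the helper functions are illustrative assumptions, not the procedure used to train the Open X-Embodiment models.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Hypothetical width of the shared action vector; must be at least the
# largest per-robot action dimension in the training mix.
SHARED_DIM = 8


@dataclass
class RobotSpec:
    """Hypothetical per-robot metadata: per-dimension action mean and std."""
    name: str
    mean: List[float]
    std: List[float]


def to_shared_action(
    action: Sequence[float], spec: RobotSpec, shared_dim: int = SHARED_DIM
) -> Tuple[List[float], List[bool]]:
    """Normalize with the robot's own stats, then zero-pad to shared_dim.

    Returns the padded vector and a mask marking which entries are real.
    """
    if len(action) != len(spec.mean):
        raise ValueError(f"{spec.name}: expected {len(spec.mean)} dims, got {len(action)}")
    if len(action) > shared_dim:
        raise ValueError(f"{spec.name}: {len(action)} dims exceed shared_dim={shared_dim}")
    normalized = [(a - m) / s for a, m, s in zip(action, spec.mean, spec.std)]
    pad = shared_dim - len(normalized)
    return normalized + [0.0] * pad, [True] * len(normalized) + [False] * pad


def from_shared_action(shared: Sequence[float], spec: RobotSpec) -> List[float]:
    """Drop padding and undo normalization to recover robot-native actions."""
    dims = len(spec.mean)
    return [x * s + m for x, m, s in zip(shared[:dims], spec.mean, spec.std)]


def masked_mse(pred: Sequence[float], target: Sequence[float], mask: Sequence[bool]) -> float:
    """Mean squared error over real (unmasked) dimensions only."""
    errs = [(p - t) ** 2 for p, t, keep in zip(pred, target, mask) if keep]
    return sum(errs) / len(errs) if errs else 0.0
```

Keeping statistics per robot puts, say, a gripper command and a joint velocity on comparable scales before they share one network, and the mask keeps a 6-DoF arm from being penalized on dimensions it does not have.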