## TL;DR
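Embodied AI brings intelligence into the physical world: robots that see, understand, plan, and act. Vision-language-action (VLA) models such as RT-2 and π0 transfer knowledge from web-scale pretraining to robot control, while generalist policies such as Octo are trained on large multi-robot datasets.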
## Core Explanation
Unlike disembodied AI, which operates on text or images alone, embodied agents must handle partial observability (sensors capture only a limited view of the scene), non-stationary environments, and real-time constraints. Key capabilities are visual grounding (mapping language to physical objects), task planning (decomposing goals into action sequences), and manipulation (for example, dexterous grasping).
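To make these capabilities concrete, here is a minimal toy sketch of partial observability, grounding, and planning on a 2D grid. It is illustrative only: the names `SceneObject`, `visible_objects`, `ground`, and `plan`, the grid world, and the keyword-matching grounding rule are all assumptions made for this example, not how any of the models named in this document work.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class SceneObject:
    """Hypothetical object in a toy grid world."""
    name: str
    color: str
    position: Tuple[int, int]  # (x, y) grid cell


def visible_objects(scene: List[SceneObject],
                    robot_pos: Tuple[int, int],
                    sensor_range: int) -> List[SceneObject]:
    """Partial observability: keep only objects within Manhattan sensor range."""
    rx, ry = robot_pos
    return [
        obj for obj in scene
        if abs(obj.position[0] - rx) + abs(obj.position[1] - ry) <= sensor_range
    ]


def ground(instruction: str,
           observed: List[SceneObject]) -> Optional[SceneObject]:
    """Toy grounding: first observed object whose color and name both
    appear as words in the instruction; None if nothing matches."""
    words = instruction.lower().split()
    for obj in observed:
        if obj.color in words and obj.name in words:
            return obj
    return None


def plan(robot_pos: Tuple[int, int], target: SceneObject) -> List[str]:
    """Task planning: decompose 'fetch target' into primitive actions
    (move along x, then along y, then grasp)."""
    x, y = robot_pos
    tx, ty = target.position
    actions: List[str] = []
    actions += ["move_east" if tx > x else "move_west"] * abs(tx - x)
    actions += ["move_north" if ty > y else "move_south"] * abs(ty - y)
    actions.append("grasp")
    return actions


scene = [
    SceneObject("cup", "red", (2, 1)),
    SceneObject("cup", "blue", (0, 3)),
    SceneObject("block", "red", (9, 9)),  # outside sensor range below
]
seen = visible_objects(scene, robot_pos=(0, 0), sensor_range=5)
target = ground("pick up the red cup", seen)
if target is not None:
    print(plan((0, 0), target))
```

In this sketch the red block cannot be grounded because it lies outside the sensor range, which is the practical consequence of partial observability: the agent can only act on what it currently perceives.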
## Detailed Analysis
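Foundation models for robotics include RT-2 (Google DeepMind), Octo (an open-source generalist robot policy), and π0 (Physical Intelligence; a single model shared across robot embodiments). Simulators such as Isaac Sim and MuJoCo can generate large volumes of synthetic training data cheaply, although policies trained in simulation still have to transfer to real hardware. Imitation learning, in which a policy is trained to reproduce actions from human demonstrations, is scaling rapidly.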
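Imitation learning in its simplest form is behavior cloning: supervised regression from observed states to demonstrated actions. The sketch below fits a one-dimensional linear policy to demonstrations by closed-form least squares. Everything in it is a hypothetical stand-in: the scripted `expert_action` plays the role of a human demonstrator, the 1D state plays the role of a robot state, and real systems use neural network policies rather than a line fit.

```python
from typing import List, Tuple


def expert_action(state: float, goal: float = 1.0, gain: float = 0.5) -> float:
    """Hypothetical demonstrator: a proportional controller toward `goal`."""
    return gain * (goal - state)


def collect_demonstrations(states: List[float]) -> List[Tuple[float, float]]:
    """Record (state, action) pairs from the demonstrator."""
    return [(s, expert_action(s)) for s in states]


def fit_linear_policy(demos: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Behavior cloning as 1D least squares: action ~ w * state + b."""
    n = len(demos)
    mean_s = sum(s for s, _ in demos) / n
    mean_a = sum(a for _, a in demos) / n
    var_s = sum((s - mean_s) ** 2 for s, _ in demos)
    cov_sa = sum((s - mean_s) * (a - mean_a) for s, a in demos)
    w = cov_sa / var_s
    b = mean_a - w * mean_s
    return w, b


def rollout(w: float, b: float, state: float, steps: int) -> float:
    """Run the cloned policy in a toy environment where state += action."""
    for _ in range(steps):
        state += w * state + b
    return state


demos = collect_demonstrations([i / 10 for i in range(-20, 21)])
w, b = fit_linear_policy(demos)
final_state = rollout(w, b, state=0.0, steps=20)
print(f"w={w:.3f}, b={b:.3f}, final state={final_state:.4f}")
```

Because the demonstrator here is exactly linear, the cloned policy recovers it and the rollout settles near the goal. With real demonstrations the fit is imperfect, and small errors can compound over a rollout, which is one reason demonstration data quantity and coverage matter.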
## Further Reading
- Google DeepMind: Robotics Research
- Toyota Research Institute: Large Behavior Models (LBM)
- Physical Intelligence (π)