RUNLOCALAI · v38

Independently operated catalog for local-AI hardware and software. Hand-written verdicts. Source-cited claims. Reproducible commands when we have them.

© 2026 runlocalai.co · Independently operated
Agents & agentic AI

Embodied AI

Embodied AI refers to AI systems that interact with the physical world through a body with sensors and actuators, rather than operating purely in software. For operators running local AI, the term comes up when deploying models on robots, drones, or edge devices that must process real-time sensor data (cameras, LIDAR, microphones) and generate motor commands or other physical actions. The key constraint is latency: inference must finish within the control period, typically a few milliseconds to a few tens of milliseconds, to enable closed-loop control. That often means quantized models (e.g., INT8 for vision networks, Q4 or Q8 for LLMs) running on edge hardware such as NVIDIA Jetson Orin modules or Apple M-series chips. Memory and power budgets are tight, so model size and batch size (usually 1 for real-time inference) are tuned to fit the hardware.
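To make the memory side of that constraint concrete, here is a rough back-of-the-envelope sketch in Python. It estimates weight storage only, as parameters times bits per weight; activations, KV cache, and the per-block scale overhead of real quantization formats are ignored. The function names and the 7-billion-parameter example are illustrative assumptions, not figures from this entry.

```python
def weight_memory_gb(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


def fits_in_budget(num_params: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit inside the given memory budget."""
    return weight_memory_gb(num_params, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    # Hypothetical 7-billion-parameter model at three precisions.
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit weights: {weight_memory_gb(7e9, bits):.1f} GB")
```

Treat the result as a lower bound: a model whose weights barely fit will usually not run once runtime buffers and sensor pipelines are added.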

Deeper dive

Embodied AI contrasts with disembodied AI (e.g., chatbots or image generators), which processes text or images without acting on the physical world. The body can be a robotic arm, a legged robot, a drone, or even a smartphone with sensors. The model typically runs inside a perception-action loop: sense (e.g., a camera frame) -> infer (e.g., object detection, path planning) -> act (e.g., motor torque). Each pass through the loop has to finish within the control period, which is what imposes the real-time requirement. For local AI operators, a common stack is ROS 2 for messaging with ONNX Runtime or TensorRT for inference on edge hardware. Quantization (e.g., INT8) and model pruning are standard ways to meet latency targets. A well-known embodied AI model is RT-2 (Robotic Transformer 2) from Google DeepMind, but its weights have not been publicly released and its published variants have billions of parameters, so it is not something operators can run locally on a Jetson Orin; openly released vision-language-action models such as OpenVLA are the practical option for local experiments. The field also includes sim-to-real transfer, where models trained in simulation (e.g., Isaac Sim) are deployed on real hardware.
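The sense -> infer -> act loop can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not a ROS 2 node: `run_control_loop` and its callback parameters are hypothetical names, and the `sense`, `infer`, and `act` callables stand in for a camera driver, a model call, and a motor interface.

```python
import time
from typing import Any, Callable


def run_control_loop(
    sense: Callable[[], Any],
    infer: Callable[[Any], Any],
    act: Callable[[Any], None],
    period_s: float,
    steps: int,
) -> int:
    """Run sense -> infer -> act for `steps` cycles; return the number of missed deadlines."""
    missed = 0
    next_deadline = time.perf_counter() + period_s
    for _ in range(steps):
        observation = sense()          # e.g. grab a camera frame
        command = infer(observation)   # e.g. run the quantized model
        act(command)                   # e.g. send a motor command
        remaining = next_deadline - time.perf_counter()
        if remaining < 0:
            missed += 1                # this cycle overran its period
        else:
            time.sleep(remaining)      # wait out the rest of the period
        next_deadline += period_s      # deadlines stay on a fixed grid
    return missed


if __name__ == "__main__":
    commands = []
    misses = run_control_loop(
        sense=lambda: 1.0,              # stub sensor reading
        infer=lambda obs: -0.5 * obs,   # stub proportional controller
        act=commands.append,            # stub actuator: record the command
        period_s=0.05,                  # 20 Hz
        steps=20,
    )
    print(f"{len(commands)} commands sent, {misses} missed deadlines")
```

Counting missed deadlines rather than silently running late is the point of the sketch: on a real robot, a rising miss count is the signal to shrink the model or lower the loop rate.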

Practical example

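An operator deploying a mobile robot with a Jetson Orin NX 16GB runs a quantized YOLOv8n (INT8) for object detection at 30 FPS and a small policy network (e.g., 1M parameters) for collision avoidance. Total memory usage is ~2 GB of the module's 16 GB, which the CPU and GPU share, leaving room for sensor processing. End-to-end inference latency must stay under ~33 ms (1000 ms / 30) to keep up with the camera frame rate. Switching to a much larger vision-language-action model changes the picture: RT-2's published variants have billions of parameters (and are not publicly available), so even at INT4 the weights alone take several gigabytes. Because Jetson memory is already unified, there is no separate system RAM to offload layers to, and the operator should expect the control rate to fall well below the camera frame rate.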
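The frame-rate arithmetic can be checked with a short sketch. The per-stage latencies below are hypothetical placeholders chosen for illustration, not measurements from a Jetson; in practice they would come from profiling on the target device.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


def budget_report(stage_ms: dict[str, float], fps: float) -> tuple[float, float, bool]:
    """Return (total latency, remaining slack, whether the pipeline fits the frame budget)."""
    total = sum(stage_ms.values())
    budget = frame_budget_ms(fps)
    return total, budget - total, total <= budget


if __name__ == "__main__":
    # Hypothetical per-stage latencies in milliseconds.
    stages = {"preprocess": 4.0, "detector": 18.0, "policy": 1.0, "postprocess": 3.0}
    total, slack, ok = budget_report(stages, fps=30)
    print(f"total {total:.1f} ms, slack {slack:.1f} ms, fits 30 FPS: {ok}")
```

The same pipeline that fits a 30 FPS camera would miss a 60 FPS budget of about 16.7 ms, which is why the frame rate and the model choice have to be decided together.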

Workflow example

In a typical workflow, the operator first trains a policy in simulation (e.g., using RLlib or Isaac Gym), then exports the model to ONNX. Quantization happens either with ONNX Runtime's quantization tooling or when TensorRT builds an INT8 engine from the ONNX file. On the robot, a ROS 2 node loads the optimized model, subscribes to camera topics, and publishes motor commands at 20 Hz. For a language-guided robot using llama.cpp, the operator would quantize a small LLM (e.g., Phi-3-mini) to Q4_K_M and run it on the Jetson with a short context (e.g., 512 tokens) to bound prompt-processing time. Context length alone does not guarantee a 100 ms response: each generated token adds latency, so multi-token outputs are better produced outside the 20 Hz control loop.
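One way to keep a 20 Hz command loop responsive while a language model produces instructions is to run the model call in a background thread and let the fast loop read whatever goal is currently available. The sketch below shows that pattern with the standard library only. It is not something this workflow prescribes: `LatestGoal`, `start_planner`, and `control_ticks` are hypothetical names, and the `plan` callable stands in for a llama.cpp invocation.

```python
import threading
import time
from typing import Callable


class LatestGoal:
    """Thread-safe holder for the most recent goal from a slow planner."""

    def __init__(self, initial: str) -> None:
        self._lock = threading.Lock()
        self._goal = initial

    def set(self, goal: str) -> None:
        with self._lock:
            self._goal = goal

    def get(self) -> str:
        with self._lock:
            return self._goal


def start_planner(plan: Callable[[], str], goal: LatestGoal) -> threading.Thread:
    """Run one slow planning call in the background and store its result."""
    thread = threading.Thread(target=lambda: goal.set(plan()), daemon=True)
    thread.start()
    return thread


def control_ticks(goal: LatestGoal, rate_hz: float, ticks: int) -> list[str]:
    """Fast loop: read the latest goal at a fixed rate without waiting on the planner."""
    seen = []
    for _ in range(ticks):
        seen.append(goal.get())
        time.sleep(1.0 / rate_hz)
    return seen


if __name__ == "__main__":
    def slow_plan() -> str:
        time.sleep(0.2)  # stand-in for LLM latency
        return "go to dock"

    goal = LatestGoal("stop")
    planner = start_planner(slow_plan, goal)
    print(control_ticks(goal, rate_hz=20, ticks=10))
    planner.join()
```

The fast loop keeps acting on the safe default until the planner finishes, so a slow or stalled model call degrades the robot's behavior instead of freezing its control loop.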

Reviewed by Fredoline Eruo. See our editorial policy.
