RHO: Your Coding Agent is Secretly a Roboticist
2026-06-15 • Robotics
AI summary
The authors propose Robotics Harness Optimization (RHO), a method for improving how large language models (LLMs) generate robot control code. Instead of generating code over multiple turns while the robot is operating, RHO has tool-enabled coding agents search at training time for multi-file policy repositories that compose perception, planning, and control primitives, guided by environment reward and execution feedback rather than demonstrations. On LIBERO-PRO and Robosuite, the resulting policies reach higher success rates than prior multi-turn systems without corrective LLM code edits at deployment. On RAI's O3DE benchmark, where an LLM stays in the control loop, optimizing the agent's harness raises held-out success while reducing wall-clock time and tool calls.
Large Language Models, Code-as-Policies, Robotics Control, Neurosymbolic Policies, Multi-file Code Repositories, Policy Optimization, Pick-and-Place Tasks, Real-time Robot Control, Environment Reward Feedback, Robotic Benchmarking
Authors
Karim Elmaaroufi, Justin Svegliato, Sarunas Kalade, Graham Schelle, Sanjit A. Seshia, Matei Zaharia
Abstract
Code-as-Policies (CaP) has shown that large language models (LLMs) can write code to solve robotics tasks by composing perception, planning, and control primitives. Recent CaP systems, however, rely on multi-turn code-generation loops at test time, which is often infeasible for real-time robot control. We introduce Robotics Harness Optimization (RHO), a novel paradigm in which tool-enabled coding agents, at training time, propose and search for interpretable, neurosymbolic multi-file policy repositories (Repositories-as-Policies) that compose these primitives rather than a single prompt, function, or file. RHO searches with reflective feedback from environment reward and execution rather than teleoperation demonstrations. It generalizes to perturbed pick-and-place settings like LIBERO-PRO, where OpenVLA scores 0.0% and $\pi_{0.5}$ averages 12.83%. Using the same low-level primitives, RHO reaches a 45.0% success rate, 2.5x higher than the strongest multi-turn agentic system, and 3.5x higher than $\pi_{0.5}$. On Robosuite, RHO sets a new state-of-the-art of 70.0%, exceeding the prior multi-turn record of 68.29% while using single-turn execution with no corrective LLM code edits at deployment. When an LLM is used in the control loop, as on RAI's O3DE benchmark, RHO optimizes the deployed agent's multi-file harness of prompts, tools, and control code, improving held-out success from 23.5% to 44.3% with 20% less wall-clock time and 27% fewer tool calls.
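To make the training-time search concrete, here is a minimal Python sketch of a reward-guided loop over multi-file repositories. It is an illustration under stated assumptions, not the authors' implementation: the `Repo` type, `search_policy_repo`, `toy_propose`, `toy_evaluate`, the file names, and the target grasp height are all hypothetical. In the paper's setting, the proposal step would be a tool-enabled coding agent and the evaluation step would be environment rollouts. Here both are replaced by deterministic toy stand-ins so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Hypothetical representation: a repository is a mapping of file name -> contents.
Repo = Dict[str, str]

# Hypothetical "correct" grasp height for the toy task (not from the paper).
TARGET_HEIGHT = 0.12


@dataclass
class Candidate:
    repo: Repo
    reward: float
    feedback: str


def search_policy_repo(
    seed_repo: Repo,
    propose: Callable[[Repo, str], Repo],
    evaluate: Callable[[Repo], Tuple[float, str]],
    iterations: int = 20,
) -> Candidate:
    """Greedy search: keep a proposed repository only if its reward improves."""
    reward, feedback = evaluate(seed_repo)
    best = Candidate(seed_repo, reward, feedback)
    for _ in range(iterations):
        # The proposer sees the current best repository and its feedback.
        new_repo = propose(best.repo, best.feedback)
        reward, feedback = evaluate(new_repo)
        if reward > best.reward:
            best = Candidate(new_repo, reward, feedback)
    return best


def toy_evaluate(repo: Repo) -> Tuple[float, str]:
    """Toy stand-in for environment rollouts: reward plus textual feedback."""
    height = float(repo["params.txt"])
    reward = 1.0 - abs(height - TARGET_HEIGHT)
    if height > TARGET_HEIGHT:
        feedback = "grasp too high"
    elif height < TARGET_HEIGHT:
        feedback = "grasp too low"
    else:
        feedback = "ok"
    return reward, feedback


def toy_propose(repo: Repo, feedback: str) -> Repo:
    """Toy stand-in for a coding agent: edits one file based on the feedback."""
    height = float(repo["params.txt"])
    if feedback == "grasp too high":
        height -= 0.03
    elif feedback == "grasp too low":
        height += 0.03
    new_repo = dict(repo)  # copy so the previous candidate is left untouched
    new_repo["params.txt"] = f"{height:.2f}"
    return new_repo


seed = {"policy.py": "def act(obs): ...", "params.txt": "0.30"}
best = search_policy_repo(seed, toy_propose, toy_evaluate)
print(best.repo["params.txt"], best.reward, best.feedback)
```

The point of the sketch is the division of labor: all editing happens inside the search loop, and the returned repository is then run as-is, which mirrors the abstract's claim of single-turn execution with no corrective code edits at deployment. A realistic search would likely keep several candidates and richer execution traces instead of this greedy single-candidate rule.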