HORIZON: Recoverability-Governed Curriculum for Physical-Domain Scaling
2026-06-03 • Robotics
AI summary
The authors explore how to teach robots to handle harder physical tasks reliably by gradually increasing the difficulty based on what the current robot policy can recover from. They propose a method called HORIZON that carefully expands the range of physical conditions, checking and rolling back when the robot can't recover, to keep learning effective. Their experiments show that trying to increase difficulty too much or too quickly can actually harm learning, and that simply combining experts trained separately isn't enough. Overall, they show that making robots better at handling diverse real-world physics is about growing the challenges in a controlled way, guided by the robot's ability to bounce back.
Keywords: on-policy training, robot locomotion, curriculum learning, domain randomization, recoverability, physical-domain expansion, robot policy, quadruped robots
Authors
Chenhao Bai, Liqin Lu, Kaijun Wang, Hui Chen, Jin-Chuan Shi, Yuyang Liu, Hao Chen, Chunhua Shen
Abstract
Scaling robust robot policies requires more than broader randomization, because physical-domain experience must remain organized and learnable throughout training. We study when a policy can benefit from harder physics and identify recoverability as a central constraint in on-policy physical-domain scaling. In on-policy training, new dynamics are useful only insofar as they remain close enough to the current policy to generate corrective on-policy data, rather than collapsing rollouts into unrecoverable failures. Using quadruped locomotion as a physically demanding benchmark for embodied generalization, we introduce HORIZON, a checkpointed frontier curriculum that expands physical domains only within the current policy's recoverable boundary. HORIZON uses rollback and boundary refinement to govern each expansion step, turning fixed randomization into a continual process of physical-domain growth. Experiments reveal three regularities of physical-domain expansion. First, direct domain widening is uneven across physical axes and often unlearnable without staged ordering. Second, domain composition is non-monotonic, and adding more domains beyond a compact core can dilute recoverable joint samples and reduce overall robustness. Third, offline distillation of isolated experts cannot substitute for the joint interaction generated by on-policy curriculum. Together, these results frame physical-domain generalization as a continual growth problem for embodied control, with recoverability as the organizing principle for on-policy expansion.
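The abstract describes the expansion loop only at a high level, so the following is a minimal illustrative sketch of a checkpointed frontier curriculum with rollback and boundary refinement, not the authors' implementation. It assumes a single scalar physical axis, a hypothetical `train_and_eval` callback that returns a candidate policy and its measured recovery rate, and a hypothetical acceptance threshold. The function names, the step-halving rule, and the toy dynamics are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple


def frontier_curriculum(
    policy: float,
    train_and_eval: Callable[[float, float], Tuple[float, float]],
    start: float,
    target: float,
    step: float,
    threshold: float = 0.8,   # hypothetical acceptance threshold on recovery rate
    min_step: float = 0.01,   # stop refining once the step is this small
    max_rounds: int = 200,
) -> Tuple[float, float, List[float]]:
    """Grow a domain bound from `start` toward `target`, keeping only
    expansions that the current policy can recover from."""
    bound = start
    history = [start]
    for _ in range(max_rounds):
        if bound >= target or step < min_step:
            break
        candidate = min(bound + step, target)
        new_policy, recovery = train_and_eval(policy, candidate)
        if recovery >= threshold:
            # Accept: the trained policy becomes the new checkpoint.
            policy, bound = new_policy, candidate
            history.append(bound)
        else:
            # Roll back: discard new_policy, keep the last checkpoint,
            # and refine the boundary with a smaller expansion step.
            step /= 2
    return policy, bound, history


def toy_train_and_eval(policy: float, bound: float) -> Tuple[float, float]:
    """Toy stand-in for on-policy training (an assumption, not real physics).
    `policy` is a scalar competence level; a domain is recoverable only if
    it lies within 0.5 of what the policy already handles."""
    if bound - policy <= 0.5:
        return bound, 1.0      # corrective data available, policy improves
    return policy - 0.1, 0.0   # rollouts collapse, policy degrades


policy, bound, history = frontier_curriculum(
    1.0, toy_train_and_eval, start=1.0, target=3.0, step=2.0
)
print(history)
```

In this toy setting, jumping straight to the target bound yields a recovery rate of zero, while the staged loop reaches the same bound through a sequence of accepted checkpoints. That mirrors the abstract's claim about direct widening only qualitatively; the numbers carry no empirical meaning.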