World-Task Factorization for Robot Learning

2026-06-01, Robotics

Robotics, Machine Learning, Multiagent Systems
AI summary

The authors study how to make robot learning generalize across tasks, environments, and teammates by separating information about the world from the specifics of the task. They argue this separation is principled because the world’s properties exist regardless of what the robot is trying to do, while tasks are defined over what the world admits. Their system uses gradients as the interface between the two parts: a differentiable world model propagates cost gradients to the actuators, and a compact learned policy modulates those gradient paths. Across three problems, the method outperformed end-to-end baselines and analytical heuristics, generalized zero-shot to out-of-distribution configurations, and transferred to real hardware without retraining.

robot learning, policy factorization, world-task separation, Bayesian model evidence, differentiable graph, gradient propagation, generalization, zero-shot transfer, sensorimotor modalities, reinforcement learning
Authors
Eduardo Sebastián, Adrian Pfisterer, Vito Mengers, Oliver Brock, Amanda Prorok
Abstract
Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. To achieve this, we must structurally factor the policy, a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge from data scaling, to hand-designing it via hierarchies, skill libraries, or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: the factorization aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam's razor penalty on task parameters. We instantiate this factorization by pairing AICON, a differentiable graph of recursive estimators and interconnections that is compositional, operates without task-specific data, and propagates cost gradients to actuators, with a compact, learned policy that modulates gradient paths. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization across three problems that encompass heterogeneous robots, environments, task logic, and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.
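To make the idea of gradients as an interface concrete, the following is a minimal toy sketch, not the AICON implementation or its API. It assumes a point robot in the plane, two hand-written cost gradients standing in for the "gradient paths" of a world model, and a fixed gate vector standing in for the output of the compact learned policy. All function names, costs, and parameter values are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy world: a point robot, a goal, and one circular obstacle.
# The "world" side supplies analytical cost gradients; the "task" side only
# chooses how strongly each gradient path drives the actuators.

def goal_gradient(pos, goal):
    """Gradient of 0.5 * ||pos - goal||^2 with respect to pos."""
    return [pos[0] - goal[0], pos[1] - goal[1]]

def obstacle_gradient(pos, obstacle, radius):
    """Gradient of 0.5 * max(0, radius - d)^2 w.r.t. pos, d = ||pos - obstacle||."""
    dx, dy = pos[0] - obstacle[0], pos[1] - obstacle[1]
    d = math.hypot(dx, dy)
    if d >= radius or d == 0.0:
        return [0.0, 0.0]
    scale = -(radius - d) / d
    return [scale * dx, scale * dy]

def modulated_step(pos, path_gradients, gates, step_size=0.1):
    """Descend the gate-weighted sum of the per-path gradients."""
    gx = sum(g * grad[0] for g, grad in zip(gates, path_gradients))
    gy = sum(g * grad[1] for g, grad in zip(gates, path_gradients))
    return [pos[0] - step_size * gx, pos[1] - step_size * gy]

def rollout(start, goal, obstacle, radius, gates, steps=200):
    """Run the gated gradient descent; return final position and minimum clearance."""
    pos = list(start)
    min_clearance = float("inf")
    for _ in range(steps):
        grads = [goal_gradient(pos, goal),
                 obstacle_gradient(pos, obstacle, radius)]
        pos = modulated_step(pos, grads, gates)
        clearance = math.hypot(pos[0] - obstacle[0], pos[1] - obstacle[1])
        min_clearance = min(min_clearance, clearance)
    return pos, min_clearance

if __name__ == "__main__":
    start, goal, obstacle, radius = (0.0, 0.1), (4.0, 0.0), (2.0, 0.0), 1.0
    # Gates are fixed by hand here; in the paper's setting a learned policy
    # would produce the modulation.
    for gates in ([1.0, 0.0], [1.0, 4.0]):
        final, clearance = rollout(start, goal, obstacle, radius, gates)
        print(gates, [round(v, 3) for v in final], round(clearance, 3))
```

In this sketch the task-side parameters are just two numbers, while the geometry of the scene enters only through the analytical gradients. Moving the goal or the obstacle changes the gradients without touching the gates, which is the kind of low-dimensional learning with structural generalization that the abstract describes.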