Chronos: A Physics-Informed Full-History Framework for Non-Markovian Long-Horizon Manipulation
2026-06-29 • Robotics
AI summary
The authors present Chronos, a robot control approach that remembers the entire history of observations instead of just the current moment. This helps the robot handle tasks where past events change what the robot should do, like remembering different phases of a task. Chronos creates a timeline of state tokens aligned with real time and uses physics-inspired models to make movements smoother and more accurate. In tests, it performed much better than existing methods, especially on tasks needing memory, and did so using far fewer model parameters. Their work shows that history should be part of the robot's internal state, not just extra information.
Markovian, Non-Markovian, Imitation learning, State space model, Implicit maximum likelihood estimation, Schrödinger bridge, Robotic manipulation, Latent state, Long-horizon control, Proprioception
Authors
Yulin Zhou, Yimeng Wang, Nengyu Wang, Shaojia Xing, Shiyun Tu, Xiang Li, Jingkai Zhang, Ningbo Jiang, Yuankai Lin, Hua Yang, Xiangrui Zeng, Zhouping Yin
Abstract
General-purpose robot policies should be modeled as dynamical systems, yet many vision-language-action (VLA) and generative imitation policies still rely on present observations or short windows. This Markovian shortcut fails in memory-dependent manipulation: identical observations can demand different actions after different histories. We present Chronos, a physics-informed full-history framework for non-Markovian long-horizon manipulation. The key idea is to elevate observation history from auxiliary context to the latent state of the policy dynamics. At each physical control step, Chronos forms one state-representative token by fusing observation and proprioception, so the token sequence is aligned one-to-one with physical time. A selective state space model propagates this causal historical state, which conditions a multimodal coarse action prior through implicit maximum likelihood estimation (IMLE). This prior is then refined by a second-order Schrödinger-inspired bridge that predicts acceleration fields, yielding smoother and more physically grounded robot motion. Across 16 simulated tasks and 4 real-world experiments, Chronos is evaluated on precision insertion, general manipulation, and memory-dependent long-horizon control. On RMBench, where success requires remembering task phase, Chronos achieves 73.6% average success, outperforming the Markovian VLA baseline pi0.5 by 62.4 percentage points, a 6.6x relative gain, while using 10x fewer parameters. It also surpasses the memory VLA Mem-0 by 22.8 points while using over 30x fewer parameters. In real-world dual-arm experiments using a single RGB camera, Chronos achieves 78% average success over four tasks, including 72% on the three memory-dependent tasks, whereas pi0.5 achieves 7% overall and 0% on the memory-dependent subset. These results suggest that history should not be treated as auxiliary context, but as the latent state of the manipulation policy.
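The central claim of the abstract, that identical observations can require different actions after different histories, can be illustrated with a toy sketch. The code below is not the Chronos implementation. It assumes a hand-written gated recurrence as a stand-in for a selective state space model, with made-up fixed "weights", scalar features, and hypothetical helper names (`fuse`, `selective_scan`, `markovian_view`). It only shows that a causal state carried across one token per control step can separate two histories that end in the same observation, while a policy that reads only the last token cannot.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def fuse(observation, proprioception):
    """Build one token per control step by concatenating the two feature lists.

    Assumption: plain concatenation stands in for whatever fusion the paper uses.
    """
    return list(observation) + list(proprioception)


def selective_scan(tokens, state_dim=4):
    """Toy input-dependent recurrence: h_t = g_t * h_{t-1} + (1 - g_t) * u_t.

    The retention gate g_t and the drive u_t both depend on the current token,
    which is the sense in which the recurrence is "selective". The weights here
    are fixed illustrative constants, not learned parameters.
    """
    h = [0.0] * state_dim
    for token in tokens:
        s = sum(token)
        for i in range(state_dim):
            gate = sigmoid(s - i)            # how much of the old state to keep
            drive = math.tanh(s * (i + 1))   # new information written to the state
            h[i] = gate * h[i] + (1.0 - gate) * drive
    return h


def markovian_view(tokens):
    """A Markovian policy input: only the most recent token is visible."""
    return tokens[-1]


# Two episodes with different early cues but the same final observation.
final_token = fuse([0.2], [0.1])
history_a = [fuse([1.0], [0.1]), final_token]
history_b = [fuse([-1.0], [0.1]), final_token]

state_a = selective_scan(history_a)
state_b = selective_scan(history_b)

print("same last token:", markovian_view(history_a) == markovian_view(history_b))
print("state after history A:", [round(v, 3) for v in state_a])
print("state after history B:", [round(v, 3) for v in state_b])
```

Because the gate never reaches exactly zero, part of the earlier cue survives in the state, so a downstream action head conditioned on `state_a` or `state_b` could in principle choose different actions where a head conditioned on `markovian_view` could not.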