OASIS: Observation-Action Space Alignment via SE(3) Trajectory Prediction for Robotic Manipulation
2026-05-25 • Robotics
Robotics, Artificial Intelligence
AI summary
The authors developed OASIS, a robotic control method that improves how robots plan their movements by connecting what the robot sees more directly with how it physically moves. Instead of mapping camera observations straight to motor commands, OASIS first predicts a precise 3D trajectory for the robot's hand (its position and orientation over time) and then generates actions consistent with that trajectory. This makes the robot's actions more accurate and easier to interpret, and it yields higher success rates and better generalization to unfamiliar situations than previous methods in both simulated and real-world tasks. The approach combines vision, language instructions, and depth data to guide the robot's actions more directly.
vision-language-action models, world action models, rigid-body geometry, SE(3) trajectory prediction, 3D feature encoding, end-effector trajectory, visuomotor policy, robotic manipulation, depth sensing, pose supervision
Authors
Xinzhe Chen, Sihua Ren, Liqi Huang, Haowen Sun, Mingyang Li, Xingyu Chen, Zeyang Liu, Xuguang Lan
Abstract
Recent vision-language-action (VLA) models and world action models (WAMs) advance robotic manipulation by enriching intermediate representations with auxiliary spatial features or future visual-state prediction. However, these representations largely remain within the observation space and do not share the rigid-body geometry of the action space, forcing the action decoder to implicitly recover this geometry. We propose OASIS, a visuomotor policy that aligns the intermediate representation with the action space via $SE(3)$ end-effector trajectory prediction. OASIS couples a 3D-aware feature encoder that fuses vision-language and metric-depth features with an $SE(3)$ trajectory predictor that produces a camera-frame end-effector trajectory. Conditioned on the predictor's pose-supervised hidden states, the action decoder generates action chunks consistent with rigid-body motion. Across simulation and real-world experiments, OASIS outperforms VLA and WAM baselines in success rate and out-of-distribution generalization. Our project page is available at https://npuhandsome.github.io/OASIS_web.
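The abstract names the ingredients (a camera-frame $SE(3)$ end-effector trajectory and pose supervision) but not their exact form. The following is a minimal sketch under stated assumptions, not the OASIS implementation: each waypoint is assumed to be a rigid transform given by a rotation matrix R and translation t; demonstration poses are assumed to be recorded in the robot base frame and mapped into the camera frame using a known camera pose `base_T_cam`; and pose supervision is assumed to be a translation L2 error plus a geodesic rotation angle, weighted by a hypothetical `rot_weight`. All function names here are hypothetical illustrations.

```python
import math
from typing import List, Tuple

Mat3 = List[List[float]]
Vec3 = List[float]
Pose = Tuple[Mat3, Vec3]  # (R, t) acting on points as x_out = R @ x + t


def mat_mul(a: Mat3, b: Mat3) -> Mat3:
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def mat_vec(a: Mat3, v: Vec3) -> Vec3:
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def transpose(a: Mat3) -> Mat3:
    return [[a[j][i] for j in range(3)] for i in range(3)]


def rot_z(theta: float) -> Mat3:
    """Rotation about the z-axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def compose(p: Pose, q: Pose) -> Pose:
    """SE(3) composition p * q: apply q first, then p."""
    (rp, tp), (rq, tq) = p, q
    t = [a + b for a, b in zip(mat_vec(rp, tq), tp)]
    return mat_mul(rp, rq), t


def inverse(p: Pose) -> Pose:
    """Closed-form SE(3) inverse: (R, t)^-1 = (R^T, -R^T t)."""
    r, t = p
    rt = transpose(r)
    return rt, [-x for x in mat_vec(rt, t)]


def to_camera_frame(traj_base: List[Pose], base_T_cam: Pose) -> List[Pose]:
    """Assumed convention: cam_T_ee = (base_T_cam)^-1 * base_T_ee for every waypoint."""
    cam_T_base = inverse(base_T_cam)
    return [compose(cam_T_base, p) for p in traj_base]


def geodesic_angle(r1: Mat3, r2: Mat3) -> float:
    """Angle of the relative rotation R1^T R2 (geodesic distance on SO(3)), in radians."""
    rel = mat_mul(transpose(r1), r2)
    cos_theta = (rel[0][0] + rel[1][1] + rel[2][2] - 1.0) / 2.0
    # The skew-symmetric part of rel has norm 2*sin(theta); atan2 stays accurate near 0 and pi.
    sin_theta = math.hypot(rel[2][1] - rel[1][2], rel[0][2] - rel[2][0], rel[1][0] - rel[0][1]) / 2.0
    return math.atan2(sin_theta, cos_theta)


def pose_loss(pred: List[Pose], target: List[Pose], rot_weight: float = 1.0) -> float:
    """Mean over waypoints of translation L2 error plus weighted geodesic rotation error."""
    if not pred or len(pred) != len(target):
        raise ValueError("pred and target must be non-empty and of equal length")
    total = 0.0
    for (rp, tp), (rg, tg) in zip(pred, target):
        total += math.dist(tp, tg) + rot_weight * geodesic_angle(rp, rg)
    return total / len(pred)


if __name__ == "__main__":
    # Hypothetical setup: camera 1 m above the base origin, axes aligned with the base.
    base_T_cam = (rot_z(0.0), [0.0, 0.0, 1.0])
    traj_base = [(rot_z(0.1 * k), [0.3 + 0.05 * k, 0.0, 0.2]) for k in range(3)]
    traj_cam = to_camera_frame(traj_base, base_T_cam)
    print("first waypoint in camera frame:", traj_cam[0][1])
    print("loss against itself:", pose_loss(traj_cam, traj_cam))
```

Two design choices in this sketch are worth noting. The rotation error is measured as an angle on SO(3) rather than as an elementwise matrix difference, so it stays meaningful for any pair of rotations. Expressing the targets in the camera frame places them in the same coordinate system as the depth features, which is one plausible reading of the observation-action alignment the abstract describes.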