The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

2026-06-01 · Robotics

Robotics, Machine Learning
AI summary

The authors point out a common mistake in robot action learning: 3D poses (rotation plus position) are treated as flat vectors of plain numbers. This lets predicted rotations drift away from valid rotations, breaks consistency under changes of coordinate frame, and produces needlessly indirect motions. They propose Lie Diffuser Actor (LDA), a diffusion method that works directly on SE(3), the group of 3D rigid-body poses, and so avoids these errors by construction. On the CALVIN ABC→D benchmark, LDA raises average task length from 3.27 to 3.51, and in real-robot tests it outperforms the baseline on most tasks. Overall, the method better respects the geometry of robot movement.

SE(3), SO(3), robotic manipulation, diffusion models, Lie groups, manifold, equivariance, geodesic trajectory, score prediction, exponential map
Authors
Bing-Cheng Chuang, I-Hsuan Chu, Bor-Jiun Lin, YuanFu Yang, Min Sun, Chun-Yi Lee
Abstract
Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet commit a fundamental geometric error we term the $\textbf{Euclidean Fallacy}$: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors. This approximation induces (1) manifold drift violating SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce $\textbf{Lie Diffuser Actor (LDA)}$, a diffusion framework operating intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC$\rightarrow$D, LDA improves average task length from $3.27$ to $3.51$ ($+7.3\%$). We further validate our method on a real robot, where it outperforms the baseline on the majority of tasks.
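
To make the manifold-drift claim concrete, here is a minimal sketch that is not the authors' implementation. It covers only the rotation part, SO(3), and uses a single noise step rather than a full SDE. It compares adding Gaussian noise directly to the nine entries of a rotation matrix against sampling the noise in the tangent space and retracting it with the exponential map (Rodrigues' formula), applied by right multiplication as a left-invariant perturbation would be. The helper names (`hat`, `so3_exp`, `orthogonality_error`), the noise scale `sigma`, and the starting rotation are all illustrative assumptions.

```python
import math
import random


def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return [[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]]


def matmul(A, B):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def so3_exp(w):
    """Exponential map so(3) -> SO(3) via Rodrigues' formula."""
    theta = math.sqrt(sum(c * c for c in w))
    K = hat(w)
    K2 = matmul(K, K)
    if theta < 1e-8:
        # Small-angle limits of sin(t)/t and (1 - cos t)/t^2.
        a, b = 1.0, 0.5
    else:
        a = math.sin(theta) / theta
        b = (1.0 - math.cos(theta)) / theta ** 2
    return [[(1.0 if i == j else 0.0) + a * K[i][j] + b * K2[i][j]
             for j in range(3)] for i in range(3)]


def orthogonality_error(R):
    """Frobenius norm of R^T R - I; zero exactly when R is orthogonal."""
    Rt = [list(col) for col in zip(*R)]
    G = matmul(Rt, R)
    return math.sqrt(sum((G[i][j] - (1.0 if i == j else 0.0)) ** 2
                         for i in range(3) for j in range(3)))


rng = random.Random(0)          # fixed seed so the run is repeatable
sigma = 0.3                     # assumed noise scale, for illustration only
R0 = so3_exp([0.2, -0.5, 0.9])  # an arbitrary valid starting rotation

# Euclidean treatment: perturb the 9 matrix entries as if they were free.
R_euclid = [[R0[i][j] + rng.gauss(0.0, sigma) for j in range(3)]
            for i in range(3)]

# Tangent-space treatment: sample noise in so(3), retract with exp,
# and apply it on the right of R0.
xi = [rng.gauss(0.0, sigma) for _ in range(3)]
R_lie = matmul(R0, so3_exp(xi))

err_euclid = orthogonality_error(R_euclid)
err_lie = orthogonality_error(R_lie)
print(f"Euclidean noise:     ||R^T R - I|| = {err_euclid:.3e}")
print(f"Tangent-space noise: ||R^T R - I|| = {err_lie:.3e}")
```

The Euclidean sample is no longer a rotation matrix and would have to be projected back onto SO(3). The tangent-space sample stays orthogonal up to floating-point error, which is the sense in which drift is removed "by construction". Extending the sketch to full SE(3) poses would also need the translation part of the exponential map, which is omitted here.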