The Lie We Tell: Correcting the Euclidean Fallacy in Vision Language Action Policies via Score Matching on Tangent Space

2026-06-01 • Robotics

Robotics, Machine Learning

AI summary

The authors identify a common mistake in robot action learning: diffusion-based policies treat 3D poses (rotation plus position) as ordinary flat vectors. This lets predicted rotations drift away from valid rotation matrices, breaks consistency under changes of coordinate frame, and produces inefficient motion paths. They propose Lie Diffuser Actor (LDA), a diffusion method that operates directly on SE(3), the group of rigid-body poses, and so avoids these errors by construction. On the CALVIN ABC→D benchmark, LDA raises average task length from 3.27 to 3.51, and real-robot tests show it outperforming the baseline on most tasks.

SE(3), SO(3), robotic manipulation, diffusion models, Lie groups, manifold, equivariance, geodesic trajectory, score prediction, exponential map

Authors

Bing-Cheng Chuang, I-Hsuan Chu, Bor-Jiun Lin, YuanFu Yang, Min Sun, Chun-Yi Lee

Abstract

Diffusion-based Vision-Language-Action policies achieve remarkable success in robotic manipulation, yet commit a fundamental geometric error we term the $\textbf{Euclidean Fallacy}$: representing SE(3) poses as flat $\mathbb{R}^{12}$ vectors. This approximation induces (1) manifold drift violating SO(3) constraints, (2) broken equivariance under coordinate transformations, and (3) non-geodesic trajectories with excessive kinematic cost. We introduce $\textbf{Lie Diffuser Actor (LDA)}$, a diffusion framework operating intrinsically on SE(3). Our method injects noise through left-invariant SDEs, predicts scores in the tangent space, and retracts samples via the exponential map. This formulation eliminates manifold drift by construction while guaranteeing coordinate-frame equivariance and geodesic optimality. On CALVIN ABC$\rightarrow$D, LDA improves average task length from $3.27$ to $3.51$ ($+7.3\%$). We further validate our method on a real robot, where it outperforms the baseline on the majority of tasks.
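
The contrast between flat $\mathbb{R}^{12}$ noise and tangent-space noise can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names (`se3_exp`, `perturb_on_manifold`, `perturb_euclidean`), the noise scale, and the discretization are all assumptions made for illustration. In particular, right-multiplying a pose by the exponential of Gaussian tangent noise is one common way to discretize a left-invariant process, and may differ from the scheme used in the paper.

```python
import numpy as np

def hat(w):
    """Map a 3-vector to its skew-symmetric matrix in so(3)."""
    x, y, z = w
    return np.array([[0.0, -z, y], [z, 0.0, -x], [-y, x, 0.0]])

def se3_exp(xi):
    """Exponential map from a twist xi = (v, w) in R^6 to a 4x4 SE(3) matrix."""
    v, w = xi[:3], xi[3:]
    theta = np.linalg.norm(w)
    W = hat(w)
    if theta < 1e-8:
        # Low-order Taylor expansion near the identity
        R = np.eye(3) + W + 0.5 * W @ W
        V = np.eye(3) + 0.5 * W + W @ W / 6.0
    else:
        a = np.sin(theta) / theta
        b = (1.0 - np.cos(theta)) / theta**2
        c = (theta - np.sin(theta)) / theta**3
        R = np.eye(3) + a * W + b * W @ W  # Rodrigues formula
        V = np.eye(3) + b * W + c * W @ W  # couples rotation and translation
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ v
    return T

def orthogonality_error(T):
    """Frobenius norm of R^T R - I; zero for a valid rotation block."""
    R = T[:3, :3]
    return np.linalg.norm(R.T @ R - np.eye(3))

def perturb_on_manifold(T, sigma, rng):
    """Assumed discretization: sample tangent noise, retract with exp."""
    xi = sigma * rng.standard_normal(6)
    return T @ se3_exp(xi)

def perturb_euclidean(T, sigma, rng):
    """Flat baseline: add Gaussian noise to the 12 entries of [R | t]."""
    noisy = T.copy()
    noisy[:3, :] += sigma * rng.standard_normal((3, 4))
    return noisy

def run_chain(step, n_steps=50, sigma=0.1, seed=0):
    """Apply a perturbation n_steps times, starting from a fixed pose."""
    rng = np.random.default_rng(seed)
    T = se3_exp(np.array([0.1, 0.2, 0.3, 0.4, 0.5, 0.6]))
    for _ in range(n_steps):
        T = step(T, sigma, rng)
    return T

if __name__ == "__main__":
    print("manifold  :", orthogonality_error(run_chain(perturb_on_manifold)))
    print("euclidean :", orthogonality_error(run_chain(perturb_euclidean)))
```

Under these assumptions, the tangent-space chain keeps the rotation block orthogonal up to floating-point error, whereas the flat chain leaves SO(3) after the first noise step. The sketch covers only noise injection and retraction; the learned score network and the reverse-time sampler described in the abstract are omitted.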
