Sphere-VIO: Fast and Robust Visual-Inertial Odometry via Unified Spherical Representation for Heterogeneous Multi-Camera Systems

2026-06-29 • Robotics

AI summary

The authors present Sphere-VIO, a new system that helps track movement using multiple cameras of different types working together. They created a method to map images from all cameras onto a single spherical view, making it easier to connect features seen by different cameras without complicated stitching. Their approach includes a way to track features quickly and reliably across cameras and a filtering process that keeps the system running smoothly in real-time, even on less powerful devices. Tests show Sphere-VIO balances accuracy, speed, and compatibility better than previous methods.

Visual-Inertial Odometry, Multi-camera systems, Spherical Panorama Model, Feature tracking, Depth estimation, Error-State Kalman Filter (ESKF), Schur complement, Real-time state estimation, Cross-camera feature association, Omnidirectional imaging

Authors

Yueteng Yang, Yusen Xie, Hao Wei, Qianhao Wang, Boyu Zhou, Fei Gao, Jun Ma, Jinni Zhou

Abstract

Multi-camera visual-inertial odometry (VIO) overcomes the inherent limitations of pure visual systems by expanding the field of view. However, existing algorithms are typically tailored for fixed camera setups and lack unified compatibility with heterogeneous multi-camera systems. Meanwhile, due to the absence of a unified cross-camera representation and association mechanism, current methods struggle to achieve a balance among robust cross-camera feature tracking, stable depth estimation, and reliable real-time performance. To address these issues, we present Sphere-VIO, a lightweight filter-based VIO framework with unified spherical representation for heterogeneous multi-camera systems. Specifically, we first propose a Unified Spherical Panorama Model (USPM) that supports all standard camera models and enables bidirectional fast mapping between multi-camera images and a shared spherical space without sequential stitching, simplifying cross-camera feature management and improving triangulation efficiency. Second, we design a parallel-accelerated depth-guided semi-direct tracking pipeline, namely Hierarchical Omnidirectional Feature Alignment (HOFA), with global spherical constraints for robust cross-camera matching, and fuse multi-camera depth observations into a standard depth filter for stable initialization. Finally, we develop a multi-camera-adapted ESKF backend that employs spherical bearing residuals and Schur complement marginalization to minimize computational overhead, enabling accurate real-time state estimation on resource-constrained devices. Extensive experiments on public benchmarks and a custom omnidirectional dataset show that Sphere-VIO achieves superior trade-offs between accuracy, robustness, efficiency, and cross-camera generality.
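
To make the idea of a shared spherical space concrete, the following is a minimal sketch and not the authors' USPM implementation. It assumes pinhole cameras only, ignores the translation offsets between cameras (rotation-only extrinsics), and uses hypothetical names throughout (`pixel_to_bearing`, `bearing_to_lonlat`, `bearing_residual`, `R_bc`). It shows how pixels from differently oriented cameras can be expressed as unit bearings in one body frame, mapped back to any camera, and compared through a 2D bearing residual of the kind an ESKF update could consume.

```python
import numpy as np

def pixel_to_bearing(uv, K, R_bc):
    """Back-project a pixel through a pinhole model (assumed) and rotate
    the ray into the shared body frame. Returns a unit vector."""
    ray_c = np.linalg.solve(K, np.array([uv[0], uv[1], 1.0]))
    ray_b = R_bc @ ray_c
    return ray_b / np.linalg.norm(ray_b)

def bearing_to_pixel(b, K, R_bc):
    """Project a body-frame bearing into one camera.
    Returns None when the ray points behind that camera."""
    ray_c = R_bc.T @ b
    if ray_c[2] <= 0.0:
        return None
    p = K @ (ray_c / ray_c[2])
    return p[:2]

def bearing_to_lonlat(b):
    """Spherical coordinates of a unit bearing: longitude about the
    y axis, latitude toward +y (an illustrative convention)."""
    lon = np.arctan2(b[0], b[2])
    lat = np.arcsin(np.clip(b[1], -1.0, 1.0))
    return lon, lat

def lonlat_to_bearing(lon, lat):
    """Inverse of bearing_to_lonlat."""
    return np.array([np.cos(lat) * np.sin(lon),
                     np.sin(lat),
                     np.cos(lat) * np.cos(lon)])

def bearing_residual(b_obs, b_pred):
    """2D residual: the observed bearing projected onto an orthonormal
    basis of the tangent plane at the predicted bearing."""
    helper = np.array([1.0, 0.0, 0.0]) if abs(b_pred[0]) < 0.9 \
        else np.array([0.0, 1.0, 0.0])
    t1 = np.cross(b_pred, helper)
    t1 /= np.linalg.norm(t1)
    t2 = np.cross(b_pred, t1)
    return np.array([t1 @ b_obs, t2 @ b_obs])

# Illustrative intrinsics and two camera orientations (all assumed values).
K = np.array([[400.0, 0.0, 320.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])
R_front = np.eye(3)
R_right = np.array([[0.0, 0.0, 1.0],   # camera z axis maps to body x axis
                    [0.0, 1.0, 0.0],
                    [-1.0, 0.0, 0.0]])

b_front = pixel_to_bearing((320.0, 240.0), K, R_front)
b_right = pixel_to_bearing((320.0, 240.0), K, R_right)
```

Because every observation becomes a unit vector in the same frame, a feature leaving one camera's image can be looked up in a neighbouring camera by calling `bearing_to_pixel` with that camera's extrinsics. A full system would also need the inter-camera translation and the non-pinhole models that the abstract says are supported.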
