Robust and Efficient Monocular 3D Gaussian SLAM for Kilometer-Scale Outdoor Scenes
2026-06-29 • Computer Vision and Pattern Recognition
AI summary
The authors developed KiloGS-SLAM, a system for 3D mapping of kilometer-scale outdoor areas using a single camera. Its tracking method switches between two pose solvers (Essential matrix and PnP) to keep the camera's estimated position accurate, and can call on a foundation model when drift becomes severe. The authors also manage memory by initializing, densifying, and pruning the 3D map in chunks, which reduces redundancy without losing fine detail. Their experiments on three outdoor datasets show the system handles sequences of over 10,000 frames on a single GPU.
Monocular SLAM, 3D Gaussian Splatting, Pose Tracking, Essential Matrix, PnP (Perspective-n-Point), Motion-Adaptive Tracking, Memory Management, Chunk-Based Mapping, Multi-View Densification, Geometric Degeneracies
Authors
Sicheng Yu, Dongxu Shen, Beizhen Zhao, Guanzhi Ding, Hao Wang
Abstract
Scaling monocular 3D Gaussian Splatting (3DGS) SLAM to kilometer-level outdoor environments poses two tightly coupled challenges: fragile long-term pose tracking and excessive memory overhead during large-scale mapping. In this paper, we propose KiloGS-SLAM, a highly efficient and robust monocular 3DGS-SLAM system that jointly addresses both bottlenecks. Since high-fidelity scene reconstruction fundamentally relies on drift-free camera poses, we first introduce a motion-adaptive hybrid tracking module. This module features a condition-triggered three-tier solving pipeline. It dynamically switches between Essential matrix and PnP models to handle geometric degeneracies. An on-demand foundation model can also be activated to rescue the trajectory from catastrophic drift. To ensure the system can sustain these long trajectories without memory exhaustion, we subsequently design a lifecycle-managed Gaussian mapping strategy. By integrating probabilistic initialization with chunk-based multi-view densification and pruning, this full-pipeline optimization effectively reduces primitive redundancy while preserving high-frequency details. Together, the robust tracking guarantees the geometric foundation required for accurate mapping, while the memory-efficient lifecycle-managed mapping enables large-scale operation. Extensive experiments across three challenging outdoor datasets demonstrate that our approach achieves state-of-the-art tracking accuracy and rendering quality, successfully scaling to sequences of over 10,000 frames on a single GPU.
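The condition-triggered switching described in the abstract can be illustrated with a minimal sketch. The abstract does not state the actual trigger conditions, tier ordering, or thresholds, so everything below is an assumption made for illustration: the `FrameStats` fields, the `select_solver` function, and the default thresholds are hypothetical and are not taken from KiloGS-SLAM. The sketch relies only on two general facts: Essential matrix estimation from 2D-2D matches becomes ill-conditioned when parallax is small, and PnP requires 2D-3D correspondences to existing map points.

```python
from dataclasses import dataclass
from enum import Enum


class Solver(Enum):
    ESSENTIAL = "essential_matrix"   # relative pose from 2D-2D matches
    PNP = "pnp"                      # absolute pose from 2D-3D matches
    FOUNDATION = "foundation_model"  # on-demand rescue tier


@dataclass
class FrameStats:
    """Hypothetical per-frame statistics used to pick a solver."""
    num_2d2d_matches: int       # feature matches to the previous frame
    num_2d3d_matches: int       # matches to already-mapped 3D points
    median_parallax_px: float   # median pixel displacement of the matches
    drift_detected: bool        # flag raised by some external drift check


def select_solver(stats: FrameStats,
                  min_2d2d: int = 8,
                  min_2d3d: int = 6,
                  min_parallax_px: float = 1.0) -> Solver:
    """Pick one tier per frame. Ordering and thresholds are assumptions."""
    # Assumed rescue condition: hand the frame to the foundation model
    # as soon as drift is flagged.
    if stats.drift_detected:
        return Solver.FOUNDATION
    # Epipolar geometry needs enough matches and enough parallax;
    # near-zero parallax (e.g. pure rotation) is a degenerate case.
    if (stats.num_2d2d_matches >= min_2d2d
            and stats.median_parallax_px >= min_parallax_px):
        return Solver.ESSENTIAL
    # Fall back to PnP when enough mapped 3D points are visible.
    if stats.num_2d3d_matches >= min_2d3d:
        return Solver.PNP
    # Neither geometric model is well supported.
    return Solver.FOUNDATION
```

For example, under these assumed thresholds, `select_solver(FrameStats(200, 150, 0.2, False))` selects the PnP tier because the parallax is too small for a reliable Essential matrix estimate. A real system would attach an actual pose solver to each tier and would likely verify the result before accepting it.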