Accelerating Transformer-Based Monocular SLAM via Geometric Utility Scoring
2026-04-09 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence, Robotics
AI summary
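The authors focus on speeding up monocular SLAM (Simultaneous Localization and Mapping) systems built on Geometric Foundation Models, which supply robust 3D priors without camera calibration. They observe that current systems run expensive dense geometric decoding on a frame just to decide whether it contains new geometry, so redundant frames are rejected late and the computation is wasted. To fix this, they propose LeanGate, a lightweight feed-forward network that predicts a geometric utility score for each frame before the heavy feature extraction and matching stages. LeanGate bypasses over 90% of redundant frames, reduces tracking FLOPs by more than 85%, and achieves a 5x end-to-end throughput speedup while maintaining the tracking and mapping accuracy of dense baselines.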
Monocular SLAM, Geometric Foundation Models, Keyframe selection, Frame gating, 3D mapping, Computational efficiency, Feed-forward network, Feature extraction, Tracking accuracy, Throughput speedup
Authors
Xinmiao Xiong, Bangya Liu, Hao Wang, Dayou Li, Nuo Chen, Andrew Feng, Mingyu Ding, Suman Banerjee, Yang Zhou, Zhiwen Fan
Abstract
Geometric Foundation Models (GFMs) have recently advanced monocular SLAM by providing robust, calibration-free 3D priors. However, deploying these models on dense video streams introduces significant computational redundancy. Current GFM-based SLAM systems typically rely on post hoc keyframe selection. Because of this, they must perform expensive dense geometric decoding simply to determine whether a frame contains novel geometry, resulting in late rejection and wasted computation. To mitigate this inefficiency, we propose LeanGate, a lightweight feed-forward frame-gating network. LeanGate predicts a geometric utility score to assess a frame's mapping value prior to the heavy GFM feature extraction and matching stages. As a predictive plug-and-play module, our approach bypasses over 90% of redundant frames. Evaluations on standard SLAM benchmarks demonstrate that LeanGate reduces tracking FLOPs by more than 85% and achieves a 5x end-to-end throughput speedup. Furthermore, it maintains the tracking and mapping accuracy of dense baselines.
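The gating idea in the abstract, scoring a frame before any heavy processing and skipping it when the score is low, can be illustrated with a minimal Python sketch. This is not the authors' implementation: `gate_stream`, `toy_utility`, `GateStats`, the threshold, and the choice to score each frame against the last kept frame are all hypothetical, and the mean absolute difference below is only a stand-in for the learned geometric utility score.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]  # hypothetical: a frame flattened to a list of values


@dataclass
class GateStats:
    seen: int = 0  # frames that reached the gate
    kept: int = 0  # frames passed on to the heavy stage

    @property
    def skip_ratio(self) -> float:
        # Fraction of frames that never reached the heavy stage.
        return 0.0 if self.seen == 0 else 1.0 - self.kept / self.seen


def toy_utility(reference: Frame, frame: Frame) -> float:
    # Stand-in for a learned gate (assumption): mean absolute difference
    # between the last kept frame and the candidate frame.
    return sum(abs(a - b) for a, b in zip(reference, frame)) / len(frame)


def gate_stream(
    frames: Sequence[Frame],
    score_fn: Callable[[Frame, Frame], float],
    threshold: float,
    heavy_fn: Callable[[Frame], float],
) -> Tuple[List[float], GateStats]:
    """Run heavy_fn only on frames whose utility score clears the threshold."""
    stats = GateStats()
    last_kept: Optional[Frame] = None
    outputs: List[float] = []
    for frame in frames:
        stats.seen += 1
        # The first frame is always kept; later frames are scored cheaply
        # before any expensive work is done on them.
        if last_kept is None or score_fn(last_kept, frame) >= threshold:
            outputs.append(heavy_fn(frame))  # expensive stage, placeholder here
            last_kept = frame
            stats.kept += 1
    return outputs, stats


if __name__ == "__main__":
    stream = [[0.0, 0.0], [0.01, 0.0], [0.02, 0.01], [1.0, 1.0], [1.01, 1.0]]
    _, stats = gate_stream(stream, toy_utility, threshold=0.5, heavy_fn=sum)
    print(f"kept {stats.kept} of {stats.seen} frames")
```

In this toy stream only the first and fourth frames reach `heavy_fn`; the other three are rejected by the cheap score alone, which is the early-rejection behavior the abstract contrasts with post hoc keyframe selection.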