VGP-Nav: Metric-Aware Visual Geometric Perception for Robot Navigation

2026-06-08 • Robotics

AI summary

The authors present VGP-Nav, a system that helps robots navigate using just one regular camera instead of multiple sensors like LiDAR and cameras combined. Their method uses the ground plane to understand real-world sizes and distances, which is normally hard with a single camera. This lets the robot figure out where it is and detect obstacles accurately in metric terms, all from just visual input. Their tests show it works well in different places and can be used on actual robots, making navigation simpler and cheaper.

monocular vision, robotic navigation, global localization, obstacle perception, scale ambiguity, ground-plane geometry, metric perception, RGB camera, sensor fusion, autonomous robots

Authors

Hewei Pan, Weiye Zhu, Zekai Zhang, Zitong Huang, Rongtao Xu, Jinbao Wang, Feng Zheng

Abstract

Reliable robotic navigation requires the seamless integration of accurate global localization and dense, metric-consistent obstacle perception. A common strategy to achieve these capabilities involves integrating diverse sensing modalities: cameras offer rich visual features for localization, while active sensors like LiDAR provide direct metric measurements. However, such multi-sensor configurations require complex spatiotemporal calibration and increase deployment overhead. Although vision-only approaches offer a low-cost and scalable alternative, existing monocular visual systems typically struggle to simultaneously achieve efficient, globally consistent localization and dense, metric-consistent geometric perception. To bridge this gap, we propose VGP-Nav, a unified framework for Metric-Aware Visual Geometric Perception that relies solely on monocular RGB input to jointly support metric localization and obstacle perception. Our key insight is to anchor localization-grounded visual geometry to physically meaningful scale constraints derived from ground-plane geometry, thereby providing a reliable metric reference for monocular perception. VGP-Nav resolves monocular scale ambiguity online and produces localization-grounded, metric obstacle representations that are directly applicable to downstream planning. Extensive experiments demonstrate strong generalization across diverse environments and successful deployment on real mobile robots, highlighting the practicality of our approach for scalable, low-cost, and safe autonomous navigation.
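
The abstract does not spell out how the ground-plane constraint is applied, so the following is only a generic illustration of the idea, not the authors' method. One common way to resolve monocular scale is to fit a plane to up-to-scale 3D points believed to lie on the ground, measure the camera's distance to that plane in reconstruction units, and compare it with the camera's mounting height. The sketch assumes the mounting height is known, that the ground points are already segmented, and that the camera frame has y pointing roughly downward; the function names (`fit_ground_plane`, `metric_scale`, `to_metric`) are hypothetical.

```python
import math


def _det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_ground_plane(points):
    """Least-squares fit of y = a*x + b*z + c to (x, y, z) camera-frame points.

    Assumption: the points are ground points in an up-to-scale reconstruction.
    """
    if len(points) < 3:
        raise ValueError("need at least three ground points")
    sxx = sxz = sx = szz = sz = sxy = szy = sy = 0.0
    for x, y, z in points:
        sxx += x * x
        sxz += x * z
        sx += x
        szz += z * z
        sz += z
        sxy += x * y
        szy += z * y
        sy += y
    # Normal equations A * [a, b, c]^T = rhs, solved with Cramer's rule.
    A = [[sxx, sxz, sx], [sxz, szz, sz], [sx, sz, float(len(points))]]
    rhs = [sxy, szy, sy]
    det = _det3(A)
    if abs(det) < 1e-12:
        raise ValueError("ground points are degenerate (e.g. collinear)")
    solution = []
    for col in range(3):
        m = [row[:] for row in A]
        for r in range(3):
            m[r][col] = rhs[r]
        solution.append(_det3(m) / det)
    return tuple(solution)


def metric_scale(points, camera_height_m):
    """Scale factor mapping reconstruction units to metres.

    Assumption: camera_height_m is the known height of the camera above the ground.
    """
    a, b, c = fit_ground_plane(points)
    # Plane a*x - y + b*z + c = 0; distance from the camera origin to it.
    distance = abs(c) / math.sqrt(a * a + 1.0 + b * b)
    if distance < 1e-9:
        raise ValueError("camera appears to lie on the fitted plane")
    return camera_height_m / distance


def to_metric(points, scale):
    """Apply the recovered scale to up-to-scale points."""
    return [(scale * x, scale * y, scale * z) for x, y, z in points]
```

With ground points lying 0.5 reconstruction units below a camera mounted 1.5 m high, `metric_scale` yields a factor of 3, and `to_metric` then rescales any obstacle points from the same reconstruction into metres. A practical system would also need robust fitting (for example, outlier rejection) and temporal smoothing, which this sketch omits.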
