G-DRAGON: Geospatial Reasoning and Dynamic Planning for Retrieval-Augmented Outdoor Navigation
2026-05-25 • Robotics
AI summary
The authors developed G-DRAGON, a system that helps autonomous ground robots navigate large outdoor spaces by understanding natural language commands. Unlike previous methods, their approach accurately connects language instructions to map locations and plans routes, while also handling close-up exploration when the robot reaches its destination. They tested their system in both simulated and real urban environments, where it successfully completed missions like searching for people over distances up to 500 meters. Their work improves robot navigation by combining map data and exploration techniques in a new way.
Visual-Language Navigation, OpenStreetMap, Large Language Models, SLAM (Simultaneous Localization and Mapping), Frontier-based Exploration, Semantic Voxel Mapping, Unmanned Ground Vehicle, Geospatial Waypoints, Topological Route Planning
Authors
Dongzhihan Wang, Yi Du, Jianan Sun, Yuan Xue, Yingchen Zhang, Bing Xiao, Chen Wang, Liang Xu
Abstract
Autonomous ground robots operating in large-scale outdoor environments require both robust long-range navigation and fine-grained "last-mile" exploration. Recent advances in visual-language navigation (VLN) work well on short-range tasks but lack the geospatial grounding needed for long-distance missions. Some OpenStreetMap (OSM)-based methods rely on cloud-based Large Language Models (LLMs), which are prone to factual hallucination, and cannot conduct "last-mile" exploration based on human instructions. To address these challenges, we present G-DRAGON, a retrieval-augmented framework for outdoor, open-world navigation. The framework maps natural-language commands to versioned, local OSM entities via generative retrieval with a lightweight LLM, yielding accurate coordinates for global route planning. A high-level planning module bridges global topological routes with the SLAM system, projecting geospatial waypoints into the robot's navigable frame. For the "last mile," the framework transitions to frontier-based exploration and open-set semantic voxel mapping to localize open-vocabulary targets. Experimental results in simulation demonstrate that our framework outperforms state-of-the-art baselines. Furthermore, we validate the system in unseen real-world urban environments on an Unmanned Ground Vehicle (UGV), successfully completing person-search missions with trajectories of up to 500 m.
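The abstract does not specify how geospatial waypoints are projected into the robot's navigable frame. As a rough illustration only, the sketch below converts latitude/longitude waypoints into east/north offsets from a reference origin using a spherical, small-area (equirectangular) approximation, then rotates them into a planar map frame. The function names, the `map_yaw_rad` parameter, and the choice of approximation are all assumptions made for this example and are not taken from G-DRAGON.

```python
import math

# Mean Earth radius in meters (spherical approximation; an assumption of this sketch).
EARTH_RADIUS_M = 6_371_000.0


def geodetic_to_local_en(lat_deg, lon_deg, origin_lat_deg, origin_lon_deg):
    """Approximate east/north offsets in meters of a point from an origin.

    Uses an equirectangular approximation, which is reasonable only for
    short distances (for example, a few hundred meters).
    """
    lat0 = math.radians(origin_lat_deg)
    d_lat = math.radians(lat_deg - origin_lat_deg)
    d_lon = math.radians(lon_deg - origin_lon_deg)
    east = EARTH_RADIUS_M * d_lon * math.cos(lat0)
    north = EARTH_RADIUS_M * d_lat
    return east, north


def en_to_map_frame(east, north, map_yaw_rad):
    """Express an east/north offset in a map frame whose x-axis is rotated
    map_yaw_rad counterclockwise from east (hypothetical convention)."""
    c, s = math.cos(map_yaw_rad), math.sin(map_yaw_rad)
    x = c * east + s * north
    y = -s * east + c * north
    return x, y


def project_waypoints(waypoints, origin, map_yaw_rad=0.0):
    """Project a list of (lat, lon) waypoints into planar map coordinates."""
    projected = []
    for lat, lon in waypoints:
        east, north = geodetic_to_local_en(lat, lon, origin[0], origin[1])
        projected.append(en_to_map_frame(east, north, map_yaw_rad))
    return projected


if __name__ == "__main__":
    # Illustrative coordinates only; not taken from the paper's experiments.
    origin = (40.0, -75.0)
    route = [(40.0, -75.0), (40.001, -75.0), (40.001, -74.999)]
    for x, y in project_waypoints(route, origin):
        print(f"x={x:.1f} m, y={y:.1f} m")
```

In a real system the origin and yaw would come from aligning the SLAM map with a GNSS fix or a similar reference, and a proper geodetic library would replace the spherical approximation where accuracy over longer distances matters.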