Towards Precise Intent-Aligned VLA Aerial Navigation via Expert-Guided GRPO

2026-06-01 • Robotics


AI summary

The authors developed a way to improve how drones follow detailed flight instructions using Vision-Language-Action (VLA) models. They argue that standard supervised fine-tuning falls short because of scarce data, limited generalization, and weak supervision for nuanced commands. To address this, they created a reinforcement learning method called EG-GRPO that adds a small number of expert examples to the drone's own online rollouts. A pipeline that runs simulation and inference in parallel cuts rollout time by 43.5%, and the method raises the success rate to 2.13x that of the supervised baseline while improving intent alignment by 60.9%.

Vision-Language-Action models, Unmanned Aerial Vehicles (UAVs), Supervised Fine-Tuning, Reinforcement Learning, Expert Guidance, Policy Optimization, Simulation, Intent Alignment, Aerial Navigation, Few-shot Learning

Authors

Tianyang Chen, Wenjun Li, Xin Zhou, Yuze Wu, Fei Gao

Abstract

Vision-Language-Action (VLA) models offer a promising end-to-end paradigm for unmanned aerial vehicles (UAVs) to accomplish complex tasks specified by fine-grained instructions. However, standard supervised fine-tuning (SFT) suffers from data scarcity, limited generalization, and weak supervision for nuanced and complicated human intents. Reinforcement fine-tuning offers a natural way to mitigate these challenges and align policy behaviors with human intents through designable feedback, but applying it to aerial navigation remains challenging due to inefficient exploration in expansive continuous spaces. To address these challenges, we introduce an efficient reinforcement learning (RL) framework for VLA-based aerial navigation. At its core, we propose EG-GRPO (Expert-Guided Group Relative Policy Optimization) to augment online rollouts with few-shot expert data. Additionally, we design a heterogeneous pipeline enabling parallel simulation and inference, which reduces rollout time by 43.5%. Across multiple tasks specified by complex human intents, EG-GRPO improves the success rate to 2.13x that of the SFT baseline, while improving intent alignment performance by 60.9%. These results demonstrate that our framework can move aerial navigation toward precise intent-aligned flight.
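The abstract does not spell out how EG-GRPO combines expert data with online rollouts, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It uses the standard GRPO group-relative advantage (each reward normalized by the group's mean and standard deviation) and assumes, hypothetically, that expert rewards are pooled into the same group before normalizing. The function names `group_relative_advantages` and `expert_augmented_advantages` and the reward values are illustrative assumptions.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standard GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def expert_augmented_advantages(online_rewards, expert_rewards, eps=1e-8):
    """Assumption (not from the paper): pool expert rewards into the group.

    Returns the advantages for the online rollouts and for the expert
    samples separately, both computed against the pooled statistics.
    """
    pooled = list(online_rewards) + list(expert_rewards)
    advantages = group_relative_advantages(pooled, eps)
    n_online = len(online_rewards)
    return advantages[:n_online], advantages[n_online:]


# Illustrative case: every online rollout fails, so the plain group has
# zero variance and every advantage is zero (no learning signal).
online = [0.0, 0.0, 0.0, 0.0]
plain = group_relative_advantages(online)

# Adding one successful expert sample restores a non-zero signal.
online_adv, expert_adv = expert_augmented_advantages(online, [1.0])
```

Under these assumptions, a group in which all online rollouts fail yields no gradient with plain group normalization, whereas a single successful expert sample gives the expert a positive advantage and the failed rollouts a negative one. This is one way expert data could ease the inefficient exploration that the abstract describes.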
