Game-Theoretic Multi-Agent Reinforcement Learning for Swarm Trajectory Planning in Low-Altitude Wireless Networks

2026-06-15 • Computer Science and Game Theory

AI summary

The authors study how swarms of low-flying drones (UAVs) connected to cellular base stations can plan their paths to complete missions efficiently while sharing limited wireless resources. They point out that when several drones are served by the same base station, its resources are split among them, so each drone's route changes the data rates and route choices of the others, especially across multiple cells. To address this, they model the problem as a congestion game and train the drones with a multi-agent reinforcement learning method (centralized training, decentralized execution) so that they coordinate better. In simulations, the approach outperforms QMIX, independent Q-learning, and random baselines in aggregate utility and mission success rate.

Low-Altitude Economy, UAV (Unmanned Aerial Vehicle), Wireless Resource Allocation, Multi-Cell Networks, 5G New Radio, Proximal Policy Optimization, Multi-Agent Reinforcement Learning, Trajectory Planning, Congestion Game, Communication Throughput

Authors

Nguyen Duc Minh Quang, Ruoxi Chong, Zhiqiang Wei, Chang Liu, Derrick Wing Kwan Ng

Abstract

The Low-Altitude Economy (LAE) is rapidly expanding, giving rise to low-altitude wireless networks (LAWNs), where large-scale cellular-connected unmanned aerial vehicle (UAV) deployments support heterogeneous mission-critical applications over multi-cell ground base station (GBS) infrastructures. To ensure mission success, each UAV must jointly optimize communication throughput and mission completion efficiency. In fifth-generation (5G) new radio (NR) systems, the equal resource block (RB) allocation policy induces strong strategic coupling among UAV trajectories: when a UAV enters a GBS cell, it reduces the RB share available to all co-served UAVs, thereby altering their achievable rates and trajectory incentives through shared wireless resources. Existing studies either ignore this coupling or focus on single-cell infrastructure, leaving the multi-cell, congestion-aware UAV trajectory planning problem insufficiently addressed. To fill this gap, we formulate the problem as a cooperative stochastic congestion game with a communication-and-mission-aware utility function, and propose a centralized-training decentralized-execution multi-agent proximal policy optimization (CTDE-MAPPO) algorithm to maximize social welfare under multi-cell RB congestion. Simulation results show that the proposed method outperforms QMIX, independent Q-learning, and random baselines in terms of aggregate utility and mission success rate, while achieving stable convergence within practical training budgets.
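
The coupling induced by equal RB allocation can be illustrated with a minimal sketch. It is not the authors' system model: the function name `per_uav_rates`, the 100 RBs per GBS, the 180 kHz RB bandwidth (12 subcarriers at 15 kHz spacing), and the fixed SNR values are all assumptions chosen for illustration, and the rate is a plain Shannon-capacity estimate.

```python
import math
from collections import Counter

# Illustrative assumptions, not values from the paper.
NUM_RBS = 100              # resource blocks available at each GBS
RB_BANDWIDTH_HZ = 180e3    # one RB: 12 subcarriers x 15 kHz spacing


def per_uav_rates(association, snr_linear,
                  num_rbs=NUM_RBS, rb_bandwidth_hz=RB_BANDWIDTH_HZ):
    """Return the achievable rate (bit/s) of each UAV under equal RB sharing.

    association: dict mapping UAV id -> id of its serving GBS
    snr_linear:  dict mapping UAV id -> linear (not dB) SNR on its link
    """
    # Number of UAVs currently served by each GBS.
    load = Counter(association.values())
    rates = {}
    for uav, gbs in association.items():
        # Equal allocation: every co-served UAV gets the same RB share.
        rb_share = num_rbs / load[gbs]
        rates[uav] = rb_share * rb_bandwidth_hz * math.log2(1.0 + snr_linear[uav])
    return rates


# Hypothetical scenario: two UAVs, two cells, identical link quality.
snr = {"uav1": 15.0, "uav2": 15.0}

# uav1 and uav2 are served by different cells.
before = per_uav_rates({"uav1": "gbsA", "uav2": "gbsB"}, snr)

# uav2 flies into gbsA's cell; uav1's trajectory has not changed.
after = per_uav_rates({"uav1": "gbsA", "uav2": "gbsA"}, snr)

print(before["uav1"], after["uav1"])
```

In this toy setting, `uav1` loses half of its RB share, and therefore half of its rate, purely because `uav2` changed cells. This is the sense in which one UAV's trajectory alters the achievable rates and incentives of the others.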
