Network Distributed Multi-Agent Reinforcement Learning for Consensus Control of Quadcopters

2026-06-01 · Robotics

Robotics, Artificial Intelligence, Machine Learning
AI summary

The authors present a framework called ND-MARL for controlling groups of quadcopters so that they reach consensus on positions while each one communicates with only two neighbors. Unlike traditional methods that rely on either a central planner or fully decentralized execution, their approach lets each quadcopter act on information from its neighbors through a distributed policy. They train a high-level planner with Multi-Agent Soft Actor-Critic (MASAC) and pair it with a low-level controller that tracks the planner's target positions. Policies trained on three agents are deployed to swarms of up to 250 agents without retraining; convergence stays consistent, though steady-state spread grows at large team sizes because information propagates sparsely. The authors conclude that the method handles distributed, communication-aware consensus control effectively.

Multi-Agent Reinforcement Learning, Quadcopter Consensus Control, Distributed Policy, 2-Neighbor Communication Topology, Multi-Agent Soft Actor-Critic, Hierarchical Control, Zero-Shot Scalability, Swarm Communication Graph, Planner-Tracker Integration
Authors
Youssef Mahran, Zeyad Gamal, Aamir Ahmad, Ayman El-Badawy
Abstract
This paper proposes a Network Distributed Multi-Agent Reinforcement Learning (ND-MARL) framework for quadcopter consensus control. Compared to conventional MARL formulations that rely on centralized planning or fully decentralized execution, ND-MARL incorporates the swarm communication graph into the decision process. Under a 2-Neighbor communication topology, each agent observes information from only two neighbors and outputs an action through a distributed policy. A high-level distributed consensus planner is trained using Multi-Agent Soft Actor-Critic (MASAC) and embedded in a hierarchical stack to generate reference target positions tracked by a low-level quadcopter controller. Results demonstrate smooth consensus trajectories and planner-tracker integration when compared to a centralized MARL controller. Most notably, the learned controller exhibits zero-shot scalability: policies trained on a three-agent system are deployed to swarms of up to 250 agents under the same 2-Neighbor communication topology without retraining or fine-tuning, achieving consistent convergence, with steady-state spread increasing at large team sizes due to sparse information propagation. These findings highlight ND-MARL as a stable framework for distributed, communication-aware quadcopter consensus control.
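To make the 2-Neighbor idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes the two neighbors are arranged on a ring (the abstract does not state the graph layout) and uses a hand-written averaging rule, `placeholder_policy`, as a hypothetical stand-in for the learned MASAC planner. All function names and the `gain` parameter are illustrative. The only point it demonstrates is that a rule defined on one agent plus two neighbors can be applied unchanged to a swarm of any size.

```python
# Illustrative sketch only. The ring layout is an ASSUMED 2-Neighbor topology,
# and placeholder_policy is a hand-written rule, NOT the learned MASAC policy.

def ring_neighbors(i, n):
    """Indices of the two neighbors of agent i on an assumed ring of n agents."""
    return ((i - 1) % n, (i + 1) % n)

def local_observation(i, positions):
    """Agent i sees only its own position and those of its two neighbors."""
    left, right = ring_neighbors(i, len(positions))
    return positions[i], positions[left], positions[right]

def placeholder_policy(own, left, right, gain=0.3):
    """Hypothetical stand-in policy: move the target toward the local mean."""
    return tuple(
        o + gain * ((o + l + r) / 3.0 - o)
        for o, l, r in zip(own, left, right)
    )

def consensus_step(positions):
    """Every agent applies the same shared rule to its local observation."""
    return [
        placeholder_policy(*local_observation(i, positions))
        for i in range(len(positions))
    ]

def spread(positions):
    """Largest distance of any agent from the swarm centroid."""
    n = len(positions)
    dim = len(positions[0])
    centroid = [sum(p[k] for p in positions) / n for k in range(dim)]
    return max(
        sum((p[k] - centroid[k]) ** 2 for k in range(dim)) ** 0.5
        for p in positions
    )

def run(positions, steps):
    """Apply the shared rule for a fixed number of synchronous steps."""
    for _ in range(steps):
        positions = consensus_step(positions)
    return positions

# The same rule is used for 3 agents and for 250 agents, with no changes.
small = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
large = [(float(i), 0.0) for i in range(250)]
small_after = run(small, 50)
large_after = run(large, 50)
```

In this toy setting the three-agent swarm collapses to its centroid almost completely within 50 steps, while the 250-agent ring shrinks far more slowly because each update only mixes information between adjacent agents. That is loosely analogous to the sparse information propagation the abstract mentions, but it says nothing quantitative about the paper's learned controller.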