Plasticity-Enhanced Multi-Agent Mixture of Experts for Dynamic Objective Adaptation in UAVs-Assisted Emergency Communication Networks

2026-04-10 • Multiagent Systems

Multiagent Systems, Machine Learning, Networking and Internet Architecture

AI summary

The authors study drone networks that act as flying cell towers after disasters, where user movement and demand change rapidly, causing problems for AI systems adapting to new conditions. They introduce a new method called PE-MAMoE that uses multiple expert models and special controls to keep the AI flexible and avoid losing important skills. Their approach improves drone service quality, supports more users, and lowers connection conflicts in simulated mobile environments. They also provide theoretical guarantees and evidence that their method helps the AI recover from changes more reliably.

Unmanned Aerial Vehicles, Base Stations, Deep Reinforcement Learning, Plasticity, Mixture of Experts, Proximal Policy Optimization, Non-stationarity, Entropy Annealing, User Mobility, 3GPP Channels

Authors

Wen Qiu, Zhiqiang He, Wei Zhao, Hiroshi Masui

Abstract

Unmanned aerial vehicles serving as aerial base stations can rapidly restore connectivity after disasters, yet abrupt changes in user mobility and traffic demands shift the quality of service trade-offs and induce strong non-stationarity. Deep reinforcement learning policies suffer from plasticity loss under such shifts, as representation collapse and neuron dormancy impair adaptation. We propose plasticity enhanced multi-agent mixture of experts (PE-MAMoE), a centralized training with decentralized execution framework built on multi-agent proximal policy optimization. PE-MAMoE equips each UAV with a sparsely gated mixture of experts actor whose router selects a single specialist per step. A non-parametric Phase Controller injects brief, expert-only stochastic perturbations after phase switches, resets the action log-standard-deviation, anneals entropy and learning rate, and schedules the router temperature, all to re-plasticize the policy without destabilizing safe behaviors. We derive a dynamic regret bound showing the tracking error scales with both environment variation and cumulative noise energy. In a phase-driven simulator with mobile users and 3GPP-style channels, PE-MAMoE improves normalized interquartile mean return by 26.3% over the best baseline, increases served-user capacity by 12.8%, and reduces collisions by approximately 75%. Diagnostics confirm persistently higher expert feature rank and periodic dormant-neuron recovery at regime switches.
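The sparsely gated actor described in the abstract, where a router picks a single specialist per step under a scheduled temperature, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`softmax`, `route_top1`, `moe_actor_step`), the toy router and experts, and the choice of argmax selection are all assumptions made for illustration. The abstract does not say whether the router samples or takes the argmax. With argmax, the temperature changes only how sharp the routing distribution is, not which expert wins.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (hypothetical helper, not from the paper)."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def route_top1(logits, temperature=1.0):
    """Return (expert index, routing probabilities) using top-1 selection.

    Assumption: the specialist is chosen by argmax over the routing
    probabilities; a sampling router would draw from `probs` instead.
    """
    probs = softmax(logits, temperature)
    index = max(range(len(probs)), key=probs.__getitem__)
    return index, probs


def moe_actor_step(observation, router, experts, temperature=1.0):
    """Run only the selected expert on the observation (sparse gating)."""
    index, probs = route_top1(router(observation), temperature)
    return experts[index](observation), index, probs


# Toy usage with two hand-written experts (purely illustrative).
toy_router = lambda obs: [sum(obs), -sum(obs)]
toy_experts = [
    lambda obs: [0.1 * x for x in obs],   # stand-in for specialist 0
    lambda obs: [-0.1 * x for x in obs],  # stand-in for specialist 1
]
action, chosen, probs = moe_actor_step([1.0, 2.0], toy_router, toy_experts, temperature=2.0)
```

In this sketch, raising the temperature flattens `probs`, which in a trained router would spread gradient signal across experts. That is one plausible reason to schedule the temperature around phase switches, though the abstract does not spell out the schedule.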
