Diffusion Offline Reinforcement Learning for Fair and Energy-Efficient UAV-Assisted Wireless Networks

2026-06-15 · Machine Learning

AI summary

The authors developed a new method, called Diffusion-SAC, that helps drones (UAVs) in wireless networks choose their flight paths and schedule data transmissions more efficiently. It combines two techniques. The first is conservative Q-learning (CQL), which learns cautiously from previously collected data without further trial and error. The second is diffusion models, which generate candidate actions and help the learner perform well even when data is limited or conditions change. The approach reduces transmission energy and makes service fairer across devices. Simulations showed that it outperforms standard offline reinforcement learning methods, using less energy and achieving higher throughput. This could help future wireless systems such as 6G use resources more efficiently.
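The "learn cautiously from previously collected data" idea behind CQL can be made concrete with a small sketch of a conservative critic loss. It is the usual squared Bellman error plus a penalty that lowers a soft maximum (log-sum-exp) of Q over sampled actions relative to the Q-value of the logged action. The critic `q`, the precomputed Bellman targets, the single shared set of `sampled_actions`, and the weight `alpha` are simplifying assumptions for illustration. Practical CQL implementations sample actions per state (for example, from a uniform distribution and the current policy) and use neural critics. This is not the authors' code.

```python
import math
from typing import Callable, Sequence, Tuple

# Hypothetical critic signature: Q(state, action) -> scalar estimate.
QFn = Callable[[Sequence[float], Sequence[float]], float]


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_penalty(q: QFn, state: Sequence[float], data_action: Sequence[float],
                sampled_actions: Sequence[Sequence[float]]) -> float:
    """Soft maximum of Q over sampled actions minus Q of the logged action.

    Minimizing this term pushes down Q-values of actions the dataset does not
    support, relative to the action that was actually logged.
    """
    q_sampled = [q(state, a) for a in sampled_actions]
    return logsumexp(q_sampled) - q(state, data_action)


def cql_critic_loss(q: QFn,
                    batch: Sequence[Tuple[Sequence[float], Sequence[float], float]],
                    sampled_actions: Sequence[Sequence[float]],
                    alpha: float = 1.0) -> float:
    """Mean squared Bellman error plus alpha times the mean conservative penalty.

    Each batch item is (state, logged_action, bellman_target); the target
    r + gamma * V(s') is assumed to be precomputed from a target network.
    """
    bellman = sum((q(s, a) - y) ** 2 for s, a, y in batch) / len(batch)
    penalty = sum(cql_penalty(q, s, a, sampled_actions) for s, a, _ in batch) / len(batch)
    return bellman + alpha * penalty
```

As a quick check, with `q = lambda s, a: a[0]`, a logged action `[1.0]`, and samples `[[0.0], [1.0]]`, `cql_penalty` evaluates to log(1 + e) - 1 ≈ 0.313. The log-sum-exp over a finite sample only approximates the soft maximum over a continuous action space.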

Generative Artificial Intelligence, Wireless Communication, Signal Processing, 6G Networks, Offline Reinforcement Learning, Conservative Q-Learning (CQL), Denoising Diffusion Probabilistic Models (DDPMs), Unmanned Aerial Vehicle (UAV) Networks, Trajectory and Scheduling Control, Energy Efficiency
Authors
Eslam Eldeeb, Hirley Alves
Abstract
The integration of generative artificial intelligence with wireless communication and signal processing systems has opened new avenues for intelligent, data-driven decision-making in future 6G networks. This work proposes a diffusion soft actor-critic (Diffusion-SAC) approach that leverages offline reinforcement learning (RL) enhanced by denoising diffusion probabilistic models (DDPMs) to optimize trajectory and scheduling control in unmanned aerial vehicle (UAV) networks. While offline RL methods, such as conservative Q-learning (CQL), can learn from static datasets, they often struggle to generalize in low-data or dynamic conditions. To address this, we combine the robustness of CQL with the generative power of diffusion models, enabling expressive and signal-aware policy learning that generalizes beyond behavior policies. Applied to a UAV-assisted wireless network, the proposed framework minimizes transmission energy and improves fairness among devices. Simulations show that Diffusion-SAC outperforms standard offline RL baselines, achieving more stable convergence and higher rewards even with limited datasets. The method enhances data efficiency, reduces energy consumption, and increases throughput by more than 35% compared to existing algorithms, demonstrating its potential for robust policy learning in next-generation wireless control systems.
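To illustrate how a diffusion model can act as an expressive policy, here is a minimal sketch of DDPM-style action sampling. It starts from Gaussian noise and applies the standard DDPM reverse update for `T` steps, conditioning a noise predictor on the network state. The noise-prediction function `eps_model`, the linear beta schedule, the step count `T`, and the clipping range `max_action` are assumptions for illustration, not details taken from the paper. In an actor-critic method, `eps_model` would be a trained network, and a critic would typically evaluate the sampled actions.

```python
import math
import random
from typing import Callable, List, Optional, Sequence

# Hypothetical noise-prediction network conditioned on the state:
# eps_model(x_t, t, state) -> predicted noise with the same length as x_t.
EpsFn = Callable[[List[float], int, Sequence[float]], List[float]]


def linear_beta_schedule(T: int, beta_start: float = 1e-4,
                         beta_end: float = 0.02) -> List[float]:
    """Linearly spaced noise variances beta_0..beta_{T-1} (an assumed schedule)."""
    if T == 1:
        return [beta_start]
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]


def sample_action(eps_model: EpsFn, state: Sequence[float], action_dim: int,
                  T: int = 10, rng: Optional[random.Random] = None,
                  max_action: float = 1.0) -> List[float]:
    """Draw one action by running the DDPM reverse chain from Gaussian noise."""
    if rng is None:
        rng = random.Random()
    betas = linear_beta_schedule(T)
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)  # cumulative product of alphas up to step t

    x = [rng.gauss(0.0, 1.0) for _ in range(action_dim)]  # x_T ~ N(0, I)
    for t in reversed(range(T)):
        eps = eps_model(x, t, state)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        # Reverse-process mean: (x_t - beta_t / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        mean = [(xi - coef * ei) / math.sqrt(alphas[t]) for xi, ei in zip(x, eps)]
        if t > 0:
            sigma = math.sqrt(betas[t])  # variance choice sigma_t^2 = beta_t
            x = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
        else:
            x = mean  # no noise is added on the final step
    # Clip to the valid action range (the action encoding is an assumption here)
    return [max(-max_action, min(max_action, xi)) for xi in x]
```

A useful sanity check: if `eps_model` predicts the noise perfectly for a single target action, the final reverse step returns that action exactly. The choice σ_t² = β_t is one of the two variance settings discussed in the original DDPM work; the other uses the posterior variance.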