Joint Optimization of Training and Inference in Federated Edge Learning via Constrained Multi-Objective Deep Reinforcement Learning
2026-05-25 • Machine Learning
Machine Learning • Distributed, Parallel, and Cluster Computing • Networking and Internet Architecture
AI summary
The authors study a way for edge devices (like smartphones or sensors) to learn together without sharing raw data, which keeps privacy safe. They focus on managing both training the shared model and making quick predictions on devices that don't have much power or memory. To handle this, they create a method that balances accuracy, speed, and energy use by making smart choices about how devices work and communicate. Because the problem is very complex, they use a special type of learning called constrained multi-objective reinforcement learning to find good solutions. Their tests show this method works better than other approaches for different setups.
Federated edge learning • Edge intelligence • Multi-objective optimization • Markov decision process • Proximal policy optimization • Resource allocation • Data freshness • Model freshness • Latency • Energy consumption
Authors
Zhen Li, Jun Cai, Chao Yang, Haoran Gao
Abstract
Federated edge learning (FEEL) has recently emerged as a promising paradigm for achieving edge intelligence (EI) by enabling collaborative model training across edge devices while protecting data privacy. In this paper, we propose an online optimization framework that jointly manages federated training and inference on resource-constrained edge devices. We introduce a tandem-queue-inspired conversion mechanism that bridges inference requests and training data, and we incorporate both data freshness and model freshness into the accuracy formulation to capture temporal dynamics in real-world environments. To maximize inference accuracy while minimizing latency and energy consumption, we jointly optimize the mode selection and the communication and computation resource allocation of edge devices. We formulate this as a multi-objective optimization problem, which is NP-hard and further complicated by the online setting. To address these challenges, we transform the problem into a multi-objective Markov decision process (MOMDP) and develop a constrained multi-objective proximal policy optimization (C-MOPPO) algorithm. Specifically, C-MOPPO first learns a set of policies with different preferences across the three objectives, then leverages constrained policy optimization to enrich the Pareto front and obtain high-quality, dense solutions. Extensive experiments demonstrate that C-MOPPO achieves well-balanced trade-offs among the objectives and significantly outperforms baselines under various system configurations.
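The two generic notions the abstract relies on, preference weights over three objectives and a Pareto front, can be illustrated with a minimal sketch. This is not the authors' C-MOPPO algorithm. The function names, the linear scalarization, and the (accuracy, latency, energy) tuple layout are all assumptions made for illustration, and the numbers are invented.

```python
from typing import List, Tuple

# Hypothetical layout: (accuracy, latency, energy).
# Accuracy is maximized; latency and energy are minimized.
Objectives = Tuple[float, float, float]


def scalarize(obj: Objectives, weights: Tuple[float, float, float]) -> float:
    """Linear scalarization under a preference vector (an assumed form,
    not taken from the paper): reward accuracy, penalize latency and energy."""
    acc, lat, en = obj
    w_acc, w_lat, w_en = weights
    return w_acc * acc - w_lat * lat - w_en * en


def dominates(a: Objectives, b: Objectives) -> bool:
    """True if a is no worse than b on every objective and strictly better on one."""
    no_worse = a[0] >= b[0] and a[1] <= b[1] and a[2] <= b[2]
    strictly_better = a[0] > b[0] or a[1] < b[1] or a[2] < b[2]
    return no_worse and strictly_better


def pareto_front(points: List[Objectives]) -> List[Objectives]:
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Invented outcomes of three hypothetical policies.
candidates = [(0.9, 2.0, 5.0), (0.8, 1.0, 3.0), (0.7, 2.5, 6.0)]
front = pareto_front(candidates)
```

Here the third candidate is dominated by the first (lower accuracy, higher latency, higher energy), so only the first two remain on the front. Different preference vectors passed to `scalarize` would rank those two differently, which is the kind of trade-off a set of preference-conditioned policies is meant to expose.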