Boosting Multimodal Federated Learning via Chained Modality Optimization

2026-06-01 · Distributed, Parallel, and Cluster Computing

Distributed, Parallel, and Cluster Computing; Artificial Intelligence
AI summary

The authors address modality competition in multimodal federated learning, where dominant data types (modalities) overshadow weaker ones during training and degrade the global model. They propose FedMChain, which trains one modality at a time on each multimodal client so that every modality gets a dedicated optimization window, and adds an error-compensated regularizer to encourage modalities to complement one another. On the server, updates within each modality are combined with a sparse, sign-guided aggregation that relies on directional sign agreement to avoid destructive averaging and allows less frequent synchronization. Experiments on multimodal benchmarks show improved predictive performance with less frequent client-server communication than the baselines.

Multimodal Federated Learning; Modality Competition; Local Optimization; Cross-Modal Complementarity; Error-Compensated Regularizer; Sparse Aggregation; Sign-Guided Aggregation; Decentralized Learning; Communication Efficiency; Heterogeneous Data
Authors
Zixin Zhang, Fan Qi, Shuai Li, Xiaoshan Yang, Changsheng Xu
Abstract
Multimodal Federated Learning (MMFL) enables privacy-preserving collaborative learning across decentralized clients with heterogeneous data and modality availability. However, most existing MMFL methods cast multimodal training as a joint optimization problem, overlooking a key bottleneck: modality competition, where dominant modalities suppress weaker ones and lead to suboptimal global models. To address this, we propose FedMChain, a balanced MMFL framework that structures federated multimodal training as a chain of modality-wise phases. This phase-wise design gives each modality a dedicated local optimization window on multimodal clients to mitigate modality competition, and further promotes cross-modal complementarity via an error-compensated regularizer. On the server side, we employ a sparse sign-guided aggregation strategy that leverages directional sign agreement for robust intra-modality aggregation, avoids destructive averaging, and supports less frequent synchronization to reduce communication overhead. Extensive experiments on multimodal benchmarks demonstrate that FedMChain consistently improves predictive performance while requiring less frequent communication than baselines.
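
The abstract does not spell out the aggregation rule, so the following is only an illustrative sketch of what "directional sign agreement" could look like, not FedMChain's actual procedure. It assumes each client sends a flat update vector for one modality. The function name `sign_guided_aggregate` and the `min_agreement` threshold are hypothetical. For each coordinate, the sketch takes the majority sign across clients, averages only the updates that share that sign, and zeroes the coordinate when too few clients agree, which yields a sparse result.

```python
def sign(x):
    """Return -1, 0, or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def sign_guided_aggregate(updates, min_agreement=0.6):
    """Aggregate per-client update vectors coordinate by coordinate.

    updates: list of equal-length lists of floats, one per client.
    min_agreement: hypothetical threshold; the fraction of clients that
        must share the majority sign for a coordinate to be kept.
    """
    num_clients = len(updates)
    dim = len(updates[0])
    result = [0.0] * dim  # coordinates stay zero unless clients agree
    for j in range(dim):
        column = [u[j] for u in updates]
        # Majority direction; 0 means the signs are tied.
        majority = sign(sum(sign(v) for v in column))
        if majority == 0:
            continue
        # Keep only the updates that point in the majority direction.
        agreeing = [v for v in column if sign(v) == majority]
        if len(agreeing) / num_clients >= min_agreement:
            result[j] = sum(agreeing) / len(agreeing)
    return result


client_updates = [
    [1.0, -2.0, 0.5],
    [3.0, 2.0, 0.5],
    [2.0, -4.0, -0.5],
]
aggregated = sign_guided_aggregate(client_updates)
strict = sign_guided_aggregate(client_updates, min_agreement=0.9)
```

In this toy input, the second coordinate has two negative updates and one positive. Plain averaging would shrink it to about -1.33, while the sketch averages only the two agreeing updates and returns -3.0. Raising `min_agreement` to 0.9 drops every coordinate without near-unanimous agreement, leaving a sparser vector.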