FedMTFI: Feature Importance Based Optimized Multi Teacher Knowledge Distillation in Heterogeneous Federated Learning Environment

2026-06-01 · Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors developed FedMTFI, a method for improving federated learning, in which many devices jointly train AI models without sharing their private data. FedMTFI groups devices that have similar hardware and run the same type of model, and each group trains its own model on data that is unevenly distributed across its devices. These group models then act as teachers for a single global model, with the teaching emphasizing the features that Shapley values identify as most important. In the authors' experiments, FedMTFI achieves higher accuracy than standard federated learning methods, especially when the data is not identically distributed across devices (non-IID).
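The grouping-and-training step summarized above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: each client is assumed to report a coarse hardware label and a model-type name (the hypothetical fields `hardware_tier` and `model_type`), model parameters are flattened into a list of floats, and each cluster is aggregated with sample-weighted FedAvg, the aggregation rule the abstract names. The helper names `cluster_clients`, `fedavg`, and `build_prototypes` are illustrative.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

ClusterKey = Tuple[str, str]  # (hardware tier, model type)


@dataclass
class ClientUpdate:
    # Hypothetical fields for illustration; the paper's exact clustering criteria may differ.
    client_id: str
    hardware_tier: str    # coarse label such as "low" or "high"
    model_type: str       # architecture shared by every client in a cluster
    weights: List[float]  # flattened local model parameters after local training
    num_samples: int      # size of the client's private dataset (the data itself stays local)


def cluster_clients(updates: List[ClientUpdate]) -> Dict[ClusterKey, List[ClientUpdate]]:
    """Group clients that share the same hardware tier and model type."""
    clusters: Dict[ClusterKey, List[ClientUpdate]] = defaultdict(list)
    for u in updates:
        clusters[(u.hardware_tier, u.model_type)].append(u)
    return dict(clusters)


def fedavg(members: List[ClientUpdate]) -> List[float]:
    """FedAvg: average parameters, weighting each client by its number of samples."""
    if not members:
        raise ValueError("cannot aggregate an empty cluster")
    total = sum(m.num_samples for m in members)
    dim = len(members[0].weights)
    avg = [0.0] * dim
    for m in members:
        if len(m.weights) != dim:
            raise ValueError("all clients in a cluster must share one architecture")
        share = m.num_samples / total
        for i, w in enumerate(m.weights):
            avg[i] += share * w
    return avg


def build_prototypes(updates: List[ClientUpdate]) -> Dict[ClusterKey, List[float]]:
    """Produce one FedAvg prototype (a future teacher model) per cluster."""
    return {key: fedavg(members) for key, members in cluster_clients(updates).items()}


updates = [
    ClientUpdate("a", "low", "cnn_small", [1.0, 2.0], num_samples=10),
    ClientUpdate("b", "low", "cnn_small", [3.0, 4.0], num_samples=30),
    ClientUpdate("c", "high", "resnet", [0.5, 0.5, 0.5], num_samples=5),
]
prototypes = build_prototypes(updates)
print(prototypes)
```

Client `b` holds three times as much data as client `a`, so the low-tier prototype lands at `[2.5, 3.5]`, closer to `b`'s weights, while the single high-tier client passes through unchanged. Keying clusters on the model type is what makes FedAvg applicable at all: element-wise parameter averaging only makes sense when every member shares one architecture.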

Federated Learning, Non-IID Data, Knowledge Distillation, Multi-Teacher Learning, Shapley Values, Feature Importance, Client Clustering, FedAvg, Model Aggregation
Authors
Nazmus Shakib Shadin, Aaron Cummings, Xinyue Zhang, Bobin Deng
Abstract
Federated learning (FL) is a decentralized approach that enables collaborative model training without exposing raw data. Instead of transferring sensitive data, devices share only model weights, keeping personal data local and secure. However, in real-world settings, the data held by devices is often not evenly distributed, and devices frequently differ in computing power and memory capacity. These differences make it harder for FL to maintain consistent performance across the system. To address these issues, we propose FedMTFI, a novel architecture that combines multi-teacher knowledge distillation (MTKD) with feature importance to improve the FL process in heterogeneous environments. In FedMTFI, clients are clustered based on similar hardware and model types. Each cluster trains a specific model on data that is not independent and identically distributed (non-IID). Within a cluster, every client updates that model using only its own local private data. The server then aggregates the locally trained models in each cluster using FedAvg to form multiple prototype models. These prototypes then serve as teacher models to train a global, generalized student model using MTKD. A distinctive aspect of FedMTFI is its use of Shapley values (SHAP) to emphasize important features during distillation, which enhances both accuracy and interpretability. Experimental results show that FedMTFI achieves higher accuracy than traditional FL algorithms and is more effective under non-IID data conditions.
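The abstract does not say exactly how Shapley values enter the distillation objective, so the following is only one plausible reading, sketched for a single training sample. It assumes (1) the student matches the average of the teachers' temperature-softened predictions through a KL term scaled by the usual T² factor, with every teacher weighted equally, and (2) the student's intermediate features are pulled toward the mean teacher features by a squared error in which each feature is weighted by its normalized absolute SHAP attribution. It further assumes the SHAP values are precomputed and that all feature vectors have already been projected to a common dimension, which heterogeneous teachers would require. `fi_mtkd_loss`, `alpha`, and `temperature` are hypothetical names and settings.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> List[float]:
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    s = sum(exps)
    return [e / s for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def shap_weights(shap_values: Sequence[float]) -> List[float]:
    """Turn per-feature SHAP attributions into non-negative weights that sum to 1."""
    mags = [abs(v) for v in shap_values]
    total = sum(mags)
    if total == 0:
        return [1.0 / len(mags)] * len(mags)  # no signal: fall back to uniform weights
    return [m / total for m in mags]


def fi_mtkd_loss(student_logits, teacher_logits_list, student_feats,
                 teacher_feats_list, shap_values, temperature=2.0, alpha=0.5):
    """Hypothetical feature-importance-weighted multi-teacher distillation loss (one sample)."""
    # Soft-label term: average the teachers' softened distributions (equal teacher weights assumed).
    teacher_probs = [softmax(t, temperature) for t in teacher_logits_list]
    k = len(teacher_probs)
    avg_probs = [sum(p[c] for p in teacher_probs) / k for c in range(len(student_logits))]
    student_probs = softmax(student_logits, temperature)
    kd = kl_divergence(avg_probs, student_probs) * temperature ** 2
    # Feature term: squared error to the mean teacher feature, weighted by SHAP importance.
    w = shap_weights(shap_values)
    n = len(teacher_feats_list)
    mean_feat = [sum(f[i] for f in teacher_feats_list) / n for i in range(len(student_feats))]
    feat = sum(wi * (s - t) ** 2 for wi, s, t in zip(w, student_feats, mean_feat))
    return alpha * kd + (1 - alpha) * feat


same = [0.0, 1.0, 2.0]  # student and both teachers agree on the logits
teacher_feats = [[0.0, 0.0], [0.0, 0.0]]
loss_important = fi_mtkd_loss(same, [same, same], [1.0, 0.0], teacher_feats, shap_values=[1.0, 0.0])
loss_unimportant = fi_mtkd_loss(same, [same, same], [1.0, 0.0], teacher_feats, shap_values=[0.0, 1.0])
print(loss_important, loss_unimportant)
```

With identical logits the KL term vanishes, so the two calls isolate the feature term: the same feature error costs 0.5 when SHAP marks that feature as important and 0.0 when it does not. In a real system this per-sample loss would be averaged over a batch and minimized with an autodiff framework.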