Discovering Collaboration from Novelty: Random Network Distillation for Clustered Federated Learning
2026-06-29 • Machine Learning
AI summary
The authors address a common problem in Federated Learning: data can differ substantially from one client to another, so a single shared model may not work well for everyone. They propose a simple way to group similar clients before training by using Random Network Distillation, a technique that scores how novel a set of data looks to a small predictor trained on a client's local data, without sharing the data itself. Clients can then form meaningful groups on their own, without repeatedly evaluating the main model during training, which suits large, decentralized systems where the number of groups is not known ahead of time. By separating the grouping step from training, the approach aims to be more efficient and flexible.
Federated Learning, Non-IID data, Clustered Federated Learning, Random Network Distillation, Client similarity, Novelty detection, Distributed systems, Model clustering, Autonomous collaboration
Authors
Davide Domini, Gianluca Aguzzi, Ivana Dusparic, Danilo Pianini, Mirko Viroli
Abstract
Federated Learning often suffers under non-independent and identically distributed (non-IID) data, where a single global model may fail to represent the diversity of client distributions. Clustered Federated Learning mitigates this issue by training specialized models for groups of similar clients, but existing approaches often couple cluster assignment with the main training loop, increasing computational and communication costs. We propose a lightweight clustering approach based on Random Network Distillation. Each client trains a compact Random Network Distillation predictor on its local data and uses its prediction error as a novelty signal to estimate similarity with other clients. This enables the discovery of meaningful client groups before federated training, without sharing raw data or repeatedly evaluating the main model. Crucially, the resulting federations emerge from local novelty estimates at runtime, making the method suitable for autonomous large-scale distributed systems where neither the number of clusters nor the collaboration structure can be specified a priori. Overall, by decoupling clustering from learning, the method provides a task-agnostic and efficient mechanism for autonomous collaboration under non-IID data.
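
The idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: all clients share a fixed random target network through a common seed; the "compact predictor" is replaced by an affine model fitted in closed form with ridge regression; clients exchange predictors (not data) and each reports only a scalar prediction error; and two clients are linked when each one's predictor finds the other's data familiar, using a hypothetical `ratio` threshold. The names `make_target`, `fit_predictor`, `novelty`, and `discover_groups` are illustrative only.

```python
import numpy as np

def make_target(dim_in, dim_hidden=32, dim_out=16, seed=0):
    """Fixed random target network (assumed shared by all clients via the seed)."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(size=(dim_in, dim_hidden)) / np.sqrt(dim_in)
    b1 = rng.normal(size=dim_hidden)
    W2 = rng.normal(size=(dim_hidden, dim_out)) / np.sqrt(dim_hidden)
    return lambda X: np.tanh(X @ W1 + b1) @ W2

def _affine(X):
    # Append a constant column so the predictor can learn an intercept.
    return np.hstack([X, np.ones((X.shape[0], 1))])

def fit_predictor(X, target, ridge=1e-3):
    """Fit an affine predictor that imitates the target on local data only.
    (Simplification: closed-form ridge regression instead of a trained network.)"""
    A = _affine(X)
    return np.linalg.solve(A.T @ A + ridge * np.eye(A.shape[1]), A.T @ target(X))

def novelty(W, X, target):
    """Mean squared prediction error: low on familiar data, high on novel data."""
    return float(np.mean((_affine(X) @ W - target(X)) ** 2))

def discover_groups(datasets, target, ratio=10.0):
    """Group clients whose predictors find each other's data familiar.
    err[i, j] is the error of client i's predictor on client j's data; in a
    real deployment client j would compute it locally and share only the scalar."""
    n = len(datasets)
    preds = [fit_predictor(X, target) for X in datasets]
    err = np.array([[novelty(preds[i], datasets[j], target) for j in range(n)]
                    for i in range(n)])
    # Assumed rule: data is "familiar" if the error stays within `ratio`
    # times the predictor's error on its own training data.
    familiar = err <= ratio * np.diag(err)[:, None]
    linked = familiar & familiar.T
    # Groups are the connected components of the mutual-familiarity graph,
    # so the number of groups is not fixed in advance.
    labels, group = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start], stack = group, [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if linked[i, j] and labels[j] == -1:
                    labels[j] = group
                    stack.append(j)
        group += 1
    return labels, err

# Synthetic demo: six clients drawn from two different input distributions.
rng = np.random.default_rng(42)
centers = [2.0, 2.0, 2.0, -2.0, -2.0, -2.0]
datasets = [c + 0.5 * rng.normal(size=(200, 8)) for c in centers]
target = make_target(dim_in=8)
labels, err = discover_groups(datasets, target)
print(labels)
```

In this toy setting, the grouping step runs once before any federated training and never touches the main task model, which mirrors the decoupling described in the abstract. The threshold rule and the connected-components grouping are placeholders; the paper may use a different similarity measure or clustering procedure.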