Byzantine-Robust and Differentially Private Federated Optimization under Weaker Assumptions

2026-03-24

Machine Learning · Cryptography and Security
AI summary

The authors developed a new method called Byz-Clip21-SGD2M to help multiple users train a shared model together without sharing their private data. Their approach protects against tricky attacks from faulty users and keeps the training process private and secure. Unlike previous methods, theirs works under more realistic conditions and they prove it can reliably learn good models. They tested their method on common image tasks and showed it works well in practice.

Federated Learning · Differential Privacy · Byzantine Robustness · Gradient Clipping · Momentum Optimization · Convergence Guarantees · L-smoothness · Sub-Gaussian Noise · Robust Aggregation · CNN
Authors
Rustem Islamov, Grigory Malinovsky, Alexander Gaponov, Aurelien Lucchi, Peter Richtárik, Eduard Gorbunov
Abstract
Federated Learning (FL) enables heterogeneous clients to collaboratively train a shared model without centralizing their raw data, offering an inherent level of privacy. However, gradients and model updates can still leak sensitive information, while malicious participants may mount adversarial attacks such as Byzantine manipulation. These vulnerabilities highlight the need to address differential privacy (DP) and Byzantine robustness within a unified framework. Existing approaches, however, often rely on unrealistic assumptions such as bounded gradients, require auxiliary server-side datasets, or fail to provide convergence guarantees. We address these limitations by proposing Byz-Clip21-SGD2M, a new algorithm that integrates robust aggregation with double momentum and carefully designed clipping. We prove high-probability convergence guarantees under standard $L$-smoothness and $\sigma$-sub-Gaussian gradient noise assumptions, thereby relaxing conditions that dominate prior work. Our analysis recovers state-of-the-art convergence rates in the absence of adversaries and improves utility guarantees under Byzantine and DP settings. Empirical evaluations on CNN and MLP models trained on MNIST further validate the effectiveness of our approach.
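To make the three ingredients named in the abstract concrete — gradient clipping, double (client- and server-side) momentum, and robust aggregation — here is a minimal illustrative sketch. It is not the authors' Byz-Clip21-SGD2M algorithm: the aggregator (coordinate-wise median), the momentum parameters `beta1`/`beta2`, the clipping threshold `tau`, and the step structure are all assumptions chosen for illustration.

```python
import numpy as np

def clip(v, tau):
    """Standard norm clipping: scale v so its Euclidean norm is at most tau."""
    norm = np.linalg.norm(v)
    return v if norm <= tau else v * (tau / norm)

def robust_aggregate(updates):
    """Coordinate-wise median, one common robust aggregator
    (the paper's aggregator may differ)."""
    return np.median(np.stack(updates), axis=0)

def sketch_step(w, client_grads, client_moms, server_mom,
                beta1=0.9, beta2=0.9, tau=1.0, lr=0.1):
    """One illustrative round: each client applies momentum to its clipped
    gradient, the server robustly aggregates, then applies its own
    momentum (the "double momentum") before the model update."""
    updates = []
    for i, g in enumerate(client_grads):
        # client-side momentum over clipped stochastic gradients
        client_moms[i] = beta1 * client_moms[i] + (1 - beta1) * clip(g, tau)
        updates.append(client_moms[i])
    agg = robust_aggregate(updates)                      # limits Byzantine influence
    server_mom = beta2 * server_mom + (1 - beta2) * agg  # server-side momentum
    return w - lr * server_mom, client_moms, server_mom
```

In this toy setting, clipping bounds the norm of any single (possibly Byzantine) contribution, and the median keeps a minority of corrupted updates from steering the aggregate, which is the intuition behind combining the two.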