PrivFedTalk: Privacy-Aware Federated Diffusion with Identity-Stable Adapters for Personalized Talking-Head Generation

2026-04-09 • Cryptography and Security

Cryptography and Security • Artificial Intelligence • Computer Vision and Pattern Recognition • Machine Learning

AI summary

The authors present PrivFedTalk, a new way to create personalized talking-head videos without sharing sensitive face and speech data. Instead of sending raw data, each user’s device trains small, identity-specific parts of the model locally, while a shared model is trained across all users. They introduce methods to keep identity consistent and motion stable across video frames, and to protect privacy when model updates are shared. Their experiments show that these models can be trained end to end across devices under constrained resources. However, the authors note that more standardized testing is needed to fully confirm privacy and quality claims.

Talking-head generation • Federated learning • Diffusion models • Personalized models • LoRA adapters • Identity consistency • Differential privacy • Secure aggregation • Temporal denoising • Federated optimization

Authors

Soumya Mazumdar, Vineet Kumar Rakesh, Tapas Samanta

Abstract

Talking-head generation has advanced rapidly with diffusion-based generative models, but training usually depends on centralized face-video and speech datasets, raising major privacy concerns. The problem is more acute for personalized talking-head generation, where identity-specific data are highly sensitive and often cannot be pooled across users or devices. PrivFedTalk is presented as a privacy-aware federated framework for personalized talking-head generation that combines conditional latent diffusion with parameter-efficient identity adaptation. A shared diffusion backbone is trained across clients, while each client learns lightweight LoRA identity adapters from local private audio-visual data, avoiding raw data sharing and reducing communication cost. To address heterogeneous client distributions, Identity-Stable Federated Aggregation (ISFA) weights client updates using privacy-safe scalar reliability signals computed from on-device identity consistency and temporal stability estimates. Temporal-Denoising Consistency (TDC) regularization is introduced to reduce inter-frame drift, flicker, and identity drift during federated denoising. To limit update-side privacy risk, secure aggregation and client-level differential privacy are applied to adapter updates. The implementation supports both low-memory GPU execution and multi-GPU client-parallel training on heterogeneous shared hardware. Comparative experiments on the present setup across multiple training and aggregation conditions with PrivFedTalk, FedAvg, and FedProx show stable federated optimization and successful end-to-end training and evaluation under constrained resources. The results support the feasibility of privacy-aware personalized talking-head training in federated environments, while suggesting that stronger component-wise, privacy-utility, and qualitative claims need further standardized evaluation.
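
The abstract describes ISFA and the update-side privacy measures only at a high level, so the following is a minimal sketch of how reliability-weighted aggregation of clipped, noised adapter updates might look, not the authors' implementation. The function names, the softmax weighting rule, the temperature parameter, and the noise scale are all assumptions made for illustration. Secure aggregation is not modeled, and the noise level is not calibrated to any formal differential privacy guarantee.

```python
import math
import random
from typing import List, Sequence


def clip_update(update: Sequence[float], max_norm: float) -> List[float]:
    """Rescale an adapter update so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def reliability_weights(
    identity_scores: Sequence[float],
    stability_scores: Sequence[float],
    temperature: float = 1.0,
) -> List[float]:
    """Turn per-client scalar scores into aggregation weights.

    Assumption: a softmax over the sum of the two scores. The abstract
    only says that scalar reliability signals weight the client updates.
    """
    logits = [(i + s) / temperature for i, s in zip(identity_scores, stability_scores)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(
    updates: Sequence[Sequence[float]],
    weights: Sequence[float],
    max_norm: float,
    noise_multiplier: float,
    rng: random.Random,
) -> List[float]:
    """Weighted average of clipped updates plus Gaussian noise.

    Assumption: noise std = noise_multiplier * max_norm, chosen for
    illustration only and not tied to a specific (epsilon, delta).
    """
    clipped = [clip_update(u, max_norm) for u in updates]
    dim = len(clipped[0])
    averaged = [sum(w * u[j] for w, u in zip(weights, clipped)) for j in range(dim)]
    sigma = noise_multiplier * max_norm
    return [a + rng.gauss(0.0, sigma) for a in averaged]


if __name__ == "__main__":
    # Three hypothetical clients, each sending a 4-dimensional adapter update.
    client_updates = [
        [0.2, -0.1, 0.4, 0.0],
        [1.5, 2.0, -3.0, 0.5],
        [0.1, 0.1, 0.1, 0.1],
    ]
    weights = reliability_weights([0.9, 0.4, 0.8], [0.7, 0.3, 0.9])
    result = aggregate(client_updates, weights, max_norm=1.0,
                       noise_multiplier=0.1, rng=random.Random(0))
    print(weights)
    print(result)
```

In this sketch the server sees only the update vector and two scalars per client, which mirrors the abstract's description of privacy-safe scalar signals. Clients with higher identity-consistency and temporal-stability scores receive larger weights, and clipping bounds how much any single client can shift the aggregate.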
