Learnability and Privacy Vulnerability are Entangled in a Few Critical Weights
2026-03-13 • Machine Learning
Machine Learning • Artificial Intelligence • Cryptography and Security
AI summary
The authors observe that only a small fraction of a neural network's weights makes the model vulnerable to privacy attacks, yet these same weights are also important for keeping the model accurate. Rather than updating all weights, their method identifies these critical weights and 'rewinds' them before fine-tuning. This defends against membership inference attacks, which try to determine whether a given example was used in training, while keeping the model's utility intact.
Membership Inference Attacks, neural networks, weights, privacy preservation, fine-tuning, utility performance, rewinding weights, privacy vulnerability, machine learning
Authors
Xingli Fang, Jung-Eun Kim
Abstract
Prior approaches to membership privacy preservation usually update or retrain all weights in a neural network, which is costly and can cause unnecessary utility loss, or even worsen the misalignment in predictions between training and non-training data. In this work, we make three observations: i) privacy vulnerability resides in a very small fraction of weights; ii) however, most of those weights also critically affect utility; iii) the importance of a weight stems from its location rather than its value. Guided by these insights, we score weights to identify the critical ones and, instead of discarding the corresponding neurons, rewind only those weights before fine-tuning. Extensive experiments show that this mechanism exhibits superior resilience against Membership Inference Attacks in most cases while maintaining utility.
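The abstract's high-level mechanism — score weights for criticality, rewind only the top-scored ones to earlier values, then fine-tune — can be illustrated with a minimal sketch. Everything below is hypothetical: the paper does not specify its scoring function here, so this sketch uses |weight × gradient| as a stand-in saliency score, and the helper names (`score_weights`, `rewind_critical`) are invented for illustration.

```python
def score_weights(weights, grads):
    # Hypothetical saliency score: |w * g| per weight.
    # (The paper's actual criticality score is not given in the abstract.)
    return [abs(w * g) for w, g in zip(weights, grads)]

def rewind_critical(current, initial, scores, k):
    # Rewind the k highest-scored weights to their earlier values
    # (e.g., values saved at initialization or an early checkpoint).
    # All other weights are kept as-is: nothing is pruned or discarded,
    # matching the abstract's point that importance comes from a weight's
    # location, not its value.
    top = set(sorted(range(len(scores)),
                     key=lambda i: scores[i], reverse=True)[:k])
    return [initial[i] if i in top else w for i, w in enumerate(current)]

# Toy usage: three weights, one of which dominates the score.
current = [0.9, -0.1, 0.5]   # trained weights
initial = [0.1, 0.2, 0.3]    # earlier (rewind-target) values
grads   = [1.0, 0.01, 0.2]   # gradients at the trained point

scores = score_weights(current, grads)      # [0.9, 0.001, 0.1]
rewound = rewind_critical(current, initial, scores, k=1)
# Only the top-scored weight (index 0) is rewound: [0.1, -0.1, 0.5]
```

In the full method, the rewound weights would then be fine-tuned on the training data, so the model recovers utility while the privacy-leaking values are replaced.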