On Reliability of Efficient Membership Inference Vulnerability Evaluation

2026-05-25 · Machine Learning; Cryptography and Security
AI summary

The authors study how membership inference attacks (MIAs), a common way to test whether models leak private training data, can give misleading results when they are evaluated efficiently. They find that pooling attack scores across many individuals to measure the false positive rate (how often an attack wrongly flags a sample that was not in the training data) is not calibrated for each individual sample, which makes it unreliable for auditing differential privacy guarantees. To fix this, they propose a post-processing method that calibrates the false positive rate for each sample. They also identify a bias in a popular implementation of the likelihood-ratio attack (LiRA) that makes models appear more vulnerable than they really are.
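To illustrate why a pooled false positive rate can hide per-sample miscalibration, here is a toy simulation of our own, not taken from the paper. It assumes hypothetical non-member MIA scores drawn from a unit-variance Gaussian whose mean differs between "easy" and "hard" samples; the names `threshold_at_fpr`, `fpr` and `simulate` and all numbers are illustrative. The per-sample thresholding at the end is only a naive baseline for comparison. It is not the authors' proposed post-processing method, and it needs many non-member observations for every sample, which is exactly the compute cost the efficient pipeline tries to avoid.

```python
import random


def threshold_at_fpr(nonmember_scores, target_fpr):
    """Return a threshold with exactly int(target_fpr * n) of the n given
    non-member scores strictly above it (assumes no tied scores)."""
    ordered = sorted(nonmember_scores)
    n_flagged = int(target_fpr * len(ordered))
    return ordered[len(ordered) - n_flagged - 1]


def fpr(nonmember_scores, threshold):
    """Fraction of non-member scores that the attack flags as 'member'."""
    return sum(s > threshold for s in nonmember_scores) / len(nonmember_scores)


def simulate(n_samples=100, n_out=1000, shift=2.0, seed=0):
    """Hypothetical toy data: n_out non-member scores per sample from N(mu, 1).
    Odd-indexed samples are 'hard' (mu = shift): they score high even when
    they were not in the training data. Even-indexed samples have mu = 0."""
    rng = random.Random(seed)
    return {
        i: [rng.gauss(shift if i % 2 else 0.0, 1.0) for _ in range(n_out)]
        for i in range(n_samples)
    }


target = 0.01
scores = simulate()

# Pooled evaluation: a single threshold chosen on all samples' scores concatenated.
pooled = [s for per_sample in scores.values() for s in per_sample]
t_pooled = threshold_at_fpr(pooled, target)
pooled_fprs = {i: fpr(s, t_pooled) for i, s in scores.items()}

# Naive per-sample baseline: each sample gets its own threshold.
per_sample_fprs = {i: fpr(s, threshold_at_fpr(s, target)) for i, s in scores.items()}

easy = [pooled_fprs[i] for i in scores if i % 2 == 0]
hard = [pooled_fprs[i] for i in scores if i % 2 == 1]
print(f"pooled FPR overall:                {fpr(pooled, t_pooled):.4f}")
print(f"mean per-sample FPR, easy samples: {sum(easy) / len(easy):.4f}")
print(f"mean per-sample FPR, hard samples: {sum(hard) / len(hard):.4f}")
print(f"max FPR with per-sample thresholds: {max(per_sample_fprs.values()):.4f}")
```

With these settings the pooled threshold meets the 1% target on the concatenated scores, yet the "hard" samples are falsely flagged at roughly twice that rate while the "easy" ones are almost never flagged. An audit at a nominal 1% FPR therefore says little about any individual sample.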

membership inference attacks, false positive rate, true positive rate, differential privacy, likelihood-ratio attack, calibration, finite population bias, training data leakage, model vulnerability, privacy auditing
Authors
Joonas Jälkö, Gauri Pradhan, Ossi Räisä, Antti Honkela
Abstract
Membership inference attacks (MIAs) are popular methods for empirically assessing how much sensitive information about the training data leaks through models or statistics learned from the data. MIA vulnerability is often evaluated through the false positive rate (FPR) and true positive rate (TPR) of a binary classifier that tries to predict whether a particular sample was in the training data. However, reliably estimating the TPR, especially at low FPR values, requires many observations, which in the case of MIAs translates to many target models and a large computational cost. To avoid excessive compute requirements, MIA scores are often averaged over multiple individuals and multiple target models. We demonstrate two key weaknesses in this efficient MIA evaluation pipeline. First, we show that evaluating the TPR on MIA scores concatenated across multiple individuals, a practice commonly used to study vulnerabilities in the very low FPR regime, is not calibrated across the per-sample FPRs. This makes it unreliable as a tool for auditing differential privacy. To solve this, we propose a post-processing method that effectively calibrates the FPR across different samples. Second, we identify a finite population bias in the commonly used efficient likelihood-ratio attack (LiRA) implementation proposed by Carlini et al. (2022), which leads to a positive bias in the estimated per-sample vulnerability.
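For readers unfamiliar with LiRA, the following is a minimal sketch of the per-example likelihood-ratio score as we understand it from Carlini et al. (2022). For one target example, it logit-scales the model's confidence on the true label, fits one Gaussian to shadow models trained with the example (IN) and one to shadow models trained without it (OUT), and compares the target model's value under the two fits. The function names and the confidence values in the usage lines are hypothetical. This is not the efficient implementation the abstract refers to, and it does not model the finite population bias the authors report. The offline and fixed-variance variants described by Carlini et al. are also omitted.

```python
import math
import statistics


def logit_confidence(p, eps=1e-12):
    """Logit-scale a confidence on the true label: log(p / (1 - p))."""
    p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
    return math.log(p) - math.log(1.0 - p)


def normal_logpdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)


def lira_score(target_conf, in_confs, out_confs):
    """Log likelihood ratio of the target model's logit-scaled confidence under
    Gaussians fitted to IN and OUT shadow-model confidences (needs >= 2 of each).
    Larger values mean 'more likely a member'."""
    phi = logit_confidence(target_conf)
    phi_in = [logit_confidence(p) for p in in_confs]
    phi_out = [logit_confidence(p) for p in out_confs]
    mu_in, sd_in = statistics.fmean(phi_in), statistics.stdev(phi_in)
    mu_out, sd_out = statistics.fmean(phi_out), statistics.stdev(phi_out)
    return normal_logpdf(phi, mu_in, sd_in) - normal_logpdf(phi, mu_out, sd_out)


# Hypothetical shadow-model confidences for a single example.
in_confs = [0.97, 0.95, 0.98, 0.96]   # shadow models trained WITH the example
out_confs = [0.70, 0.60, 0.75, 0.65]  # shadow models trained WITHOUT it

print(lira_score(0.97, in_confs, out_confs))  # target behaves like an IN model
print(lira_score(0.62, in_confs, out_confs))  # target behaves like an OUT model
```

Thresholding this score, separately or pooled across examples, yields the FPR and TPR discussed above. Because the Gaussian means and variances are estimated from a finite number of shadow models, the score is itself an estimate, which is the setting in which the authors analyse bias.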