Amplifying Membership Signal Through Chained Regeneration
2026-06-30 • Machine Learning
Machine Learning, Artificial Intelligence
AI summary
The authors address the problem that large AI models sometimes memorize their training data, which can cause privacy and copyright issues. They propose MADreMIA, a method for detecting whether specific data was used in training: it regenerates content repeatedly, feeding each output back in as the next input, and checks how well the content holds up across those rounds. Unlike previous methods that look only at single outputs, their approach uses multiple steps and works with different kinds of models and data types. They show that it gives clearer signals for identifying memorized data without needing to train extra models.
Generative Models, Membership Inference Attack (MIA), Dataset Inference (DI), Model Autophagy Disorder (MAD), Shadow Models, Iterative Regeneration, Coherence, False Positive Rate (FPR), Diffusion Models, White-box and Black-box Attacks
Authors
Wojciech Łapacz, Stanisław Pawlak
Abstract
The tendency of large generative models to memorize training data makes sample verification critical for privacy auditing and copyright enforcement. Current membership inference attacks (MIA) and dataset inference (DI) attacks often rely on one-shot generations, which yield weak signals and limited sensitivity across modalities. Inspired by Model Autophagy Disorder (MAD), we introduce MADreMIA, a model-agnostic framework that enhances white-, gray-, and black-box MIA and DI. Rather than relying on shadow model training, which is often infeasible for large generative models, our framework enables scalable inference by exploiting the signals inherent in iterative trajectories. It chains generations across diverse modalities, with each output serving as the next input, to strengthen membership evidence at a low false positive rate (FPR). We demonstrate that memorized training samples exhibit significantly higher coherence and slower degradation during iterative regeneration than non-member samples. Our results show that MADreMIA provides richer signals across diverse model families and modalities; we present comprehensive evaluations for image autoregressive models (IARs), diffusion models, and language models, alongside preliminary results demonstrating its potential for audio models.
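The chained-regeneration idea can be illustrated with a minimal toy sketch. It is not the authors' implementation: the regenerator `make_toy_regenerator` (which reproduces a stored training vector when the input lies near one and otherwise adds noise), the cosine-similarity coherence measure, the mean-over-steps score, and all parameter values are assumptions chosen only to show the shape of the procedure. `tpr_at_fpr` computes the standard true positive rate at a fixed false positive rate.

```python
import numpy as np


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


def make_toy_regenerator(train, noise=0.5, snap_radius=1.0, seed=0):
    """Hypothetical stand-in for a generative model (assumption, not from the paper).

    Inputs close to a stored training vector are reproduced exactly
    ("memorized"); all other inputs drift by Gaussian noise.
    """
    rng = np.random.default_rng(seed)

    def regenerate(x):
        dists = np.linalg.norm(train - x, axis=1)
        nearest = int(np.argmin(dists))
        if dists[nearest] < snap_radius:
            return train[nearest].copy()
        return x + noise * rng.normal(size=x.shape)

    return regenerate


def chained_similarities(x0, regenerate, steps=5):
    """Feed each output back as the next input; record similarity to x0."""
    sims = []
    x = x0
    for _ in range(steps):
        x = regenerate(x)
        sims.append(cosine(x0, x))
    return np.array(sims)


def membership_score(x0, regenerate, steps=5):
    """Mean similarity along the chain; higher means slower degradation."""
    return float(np.mean(chained_similarities(x0, regenerate, steps)))


def tpr_at_fpr(member_scores, nonmember_scores, fpr=0.01):
    """True positive rate at the threshold that yields the given FPR."""
    threshold = np.quantile(nonmember_scores, 1.0 - fpr)
    return float(np.mean(np.asarray(member_scores) > threshold))


# Toy data: 20 "training" vectors and 20 held-out vectors in 16 dimensions.
data_rng = np.random.default_rng(1)
train = data_rng.normal(size=(20, 16))
held_out = data_rng.normal(size=(20, 16))

regenerate = make_toy_regenerator(train)
member_scores = [membership_score(x, regenerate) for x in train]
nonmember_scores = [membership_score(x, regenerate) for x in held_out]

print("mean member score:    ", np.mean(member_scores))
print("mean non-member score:", np.mean(nonmember_scores))
print("TPR at 1% FPR:        ", tpr_at_fpr(member_scores, nonmember_scores))
```

In this toy setting, members stay fixed under regeneration while held-out vectors drift, so the trajectory score separates the two groups. With a real model, `regenerate` would wrap an actual generation call, and the coherence measure would depend on the modality.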