Efficient Unlearning through Maximizing Relearning Convergence Delay
2026-04-10 • Machine Learning
Machine Learning, Computer Vision and Pattern Recognition
AI summary
The authors tackle the problem of removing unwanted data from trained models, which is hard because current methods only look at how the model predicts, not how it really 'understands' the data. They introduce a new way to measure forgetting by tracking how long it takes for a model to relearn removed data, capturing changes in both model weights and predictions. Using this, they build a method that effectively removes the unwanted data's influence while keeping overall accuracy. Their experiments show this method works better than previous ones and provide theoretical support for its effectiveness.
Keywords: machine unlearning, model weights, prediction space, weight decay, noise injection, relearning convergence delay, classification, generative models, data contamination, model evaluation metrics
Authors
Khoa Tran, Simon S. Woo
Abstract
Machine unlearning, the removal of mislabeled, contaminated, or problematic data from a pretrained model, remains challenging. Current unlearning approaches and evaluation metrics focus solely on model predictions, which limits insight into what the model still retains about the forgotten data. To address this issue, we introduce a new metric, relearning convergence delay, which captures changes in both weight space and prediction space and thus provides a more comprehensive assessment of the model's understanding of the forgotten dataset. This metric can be used to assess the risk that forgotten data can be recovered from the unlearned model. Based on this metric, we propose the Influence Eliminating Unlearning framework, which removes the influence of the forgetting set by degrading the model's performance on it, combined with weight decay and noise injection into the model's weights, while maintaining accuracy on the retaining set. Extensive experiments show that our method outperforms existing approaches on both existing metrics and our proposed relearning convergence delay metric, approaching ideal unlearning performance. We provide theoretical guarantees, including exponential convergence and upper bounds, as well as empirical evidence of strong retention and resistance to relearning in both classification and generative unlearning tasks.
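To make the two ideas in the abstract concrete, the following is a minimal sketch on a toy logistic-regression model. It is not the authors' implementation: the abstract does not give the exact definition of relearning convergence delay or the Influence Eliminating Unlearning objective, so this sketch assumes that the delay is the number of gradient steps on the forgetting set needed to bring its loss below a threshold, and that an unlearning update combines gradient ascent on the forgetting set, gradient descent on the retaining set, weight decay, and Gaussian weight noise. All function names, parameters, and default values (`relearning_delay`, `unlearn_step`, `target_loss`, `ascent`, `decay`, `sigma`) are hypothetical.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def bce_loss(w, X, y):
    """Mean binary cross-entropy of a linear logistic model with weights w."""
    p = np.clip(sigmoid(X @ w), 1e-12, 1.0 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))


def bce_grad(w, X, y):
    """Gradient of the mean binary cross-entropy with respect to w."""
    return X.T @ (sigmoid(X @ w) - y) / len(y)


def relearning_delay(w, X_f, y_f, target_loss=0.3, lr=0.5, max_steps=10_000):
    """Assumed proxy for relearning convergence delay: the number of
    gradient-descent steps on the forgetting set (X_f, y_f) needed before
    its loss drops to target_loss. Returns max_steps if never reached."""
    w = w.copy()
    for step in range(max_steps):
        if bce_loss(w, X_f, y_f) <= target_loss:
            return step
        w -= lr * bce_grad(w, X_f, y_f)
    return max_steps


def unlearn_step(w, X_f, y_f, X_r, y_r, rng,
                 lr=0.1, ascent=1.0, decay=1e-2, sigma=1e-2):
    """One illustrative unlearning update (an assumption, not the paper's
    objective): descend on the retaining-set loss, ascend on the
    forgetting-set loss, apply weight decay, then add Gaussian noise."""
    g = bce_grad(w, X_r, y_r) - ascent * bce_grad(w, X_f, y_f) + decay * w
    return w - lr * g + sigma * rng.standard_normal(w.shape)
```

Under these assumptions, a model is harder to "relearn" when `relearning_delay` is larger, so one would compare the delay before and after repeatedly applying `unlearn_step` while also checking that the loss on the retaining set stays low.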