CoreUnlearn: Rethinking Concept Unlearning through Disentangled Component-Level Erasure in Text-guided Diffusion Models

2026-06-01 • Cryptography and Security


AI summary

The authors present CoreUnlearn, a method for removing unwanted concepts from text-guided image generation models. Unlike earlier methods that depend on predefined reference phrases, CoreUnlearn decomposes a concept's embedding into components and removes only the component that is critical to erasing the undesirable content. This keeps the rest of the model's behavior largely intact. Experiments show that CoreUnlearn erases concepts effectively with minimal impact on overall model performance.

diffusion models, image synthesis, concept erasure, model fine-tuning, embedding decomposition, text-guided generation, alignment mechanism, privacy concerns, machine learning, model robustness

Authors

Mengnan Zhao, Lihe Zhang, Baocai Yin

Abstract

Text-guided diffusion models have revolutionized image synthesis but also raise ethical concerns, such as privacy violations and harmful content generation. To mitigate these issues, prevailing methods typically use an alignment mechanism with predefined erasure references to fine-tune pretrained model weights. However, these techniques are intrinsically limited by the representational capacity of the textual space and are highly sensitive to the choice of predefined erasure references; for example, suboptimal references may significantly degrade the model's utility during erasure. To overcome these limitations, we introduce CoreUnlearn, which aims to disentangle and remove the erasure-critical component of the undesirable concept. Specifically, CoreUnlearn comprises a Component Extraction Module (CEM) and a Swap Disentangling Strategy (SDS). Guided by SDS, CEM is pre-trained to decompose concept embeddings into distinct component types. Leveraging this decomposition, CoreUnlearn then fine-tunes the model weights to remove the erasure-critical component while retaining the non-critical ones. Extensive experiments demonstrate that CoreUnlearn achieves effective concept erasure with minimal impact on overall model performance.
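
The decompose-then-remove idea in the abstract can be pictured with a toy sketch. This is not the authors' implementation: the abstract does not describe the CEM architecture or the SDS training objective, so a fixed linear projection stands in for the learned decomposition here. The names `decompose`, `swap_critical`, and `critical_dir`, and all embedding values, are hypothetical and chosen only for illustration.

```python
# Toy illustration only. A fixed linear projection stands in for the learned
# Component Extraction Module (CEM); the real module is a trained network
# whose details are not given in the abstract.

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def decompose(embedding, critical_dir):
    """Split an embedding into (critical, non_critical) parts.

    critical_dir is an assumed unit vector spanning the erasure-critical
    subspace; in practice this would have to be learned.
    """
    scale = dot(embedding, critical_dir)
    critical = [scale * d for d in critical_dir]
    non_critical = [e - c for e, c in zip(embedding, critical)]
    return critical, non_critical

def swap_critical(emb_a, emb_b, critical_dir):
    """Exchange the critical parts of two embeddings (swap-style recombination)."""
    crit_a, rest_a = decompose(emb_a, critical_dir)
    crit_b, rest_b = decompose(emb_b, critical_dir)
    swapped_a = [r + c for r, c in zip(rest_a, crit_b)]
    swapped_b = [r + c for r, c in zip(rest_b, crit_a)]
    return swapped_a, swapped_b

critical_dir = [1.0, 0.0, 0.0]   # assumed direction, for illustration
unwanted = [2.0, 1.0, -1.0]      # made-up "undesirable concept" embedding
neutral = [0.0, 3.0, 0.5]        # made-up "neutral concept" embedding

# Removing the critical part leaves the non-critical components untouched.
critical, erased = decompose(unwanted, critical_dir)

# Swapping moves only the critical part between the two concepts.
swapped_unwanted, swapped_neutral = swap_critical(unwanted, neutral, critical_dir)
```

In this sketch, erasing amounts to dropping the projection onto `critical_dir`, and the swap shows why a clean split is useful: if the decomposition is disentangled, exchanging critical parts changes which embedding carries the unwanted content without altering anything else. How CoreUnlearn actually enforces this during CEM pre-training and weight fine-tuning is described in the paper, not here.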
