Continual Speaker Identity Unlearning with Minimal Interference
2026-05-25 • Sound
Sound, Artificial Intelligence
AI summary
The authors studied how to remove the ability of text-to-speech models to mimic specific speakers, a process called speaker identity unlearning. They found that previous methods fail when new unlearning requests come one after another, because they accidentally restore voices that were supposed to be forgotten. To fix this, the authors created a new method called CORTIS, which can forget speakers one by one without needing old data and without undoing prior unlearning. Their approach uses mathematical techniques to isolate and protect important parts of the model related to forgotten speakers. Tests showed their method works better over long sequences of unlearning requests.
machine unlearning, speaker identity unlearning, zero-shot text-to-speech, Fisher information, parameter masking, orthogonal projection, continual learning, privacy, VoiceBox
Authors
Jinju Kim, Yunsung Kang, Gyeong-Moon Park, Jong Hwan Ko
Abstract
Machine unlearning removes designated concepts or knowledge from pre-trained models. Recent work has extended this paradigm to speaker identity unlearning in zero-shot text-to-speech (ZS-TTS), the task of selectively erasing a model's ability to replicate a speaker's voice. Existing methods, however, implicitly assume that all unlearning requests arrive at once, which is unrealistic because privacy-motivated removal requests arrive sequentially over time. We show that this assumption breaks state-of-the-art methods: unlearning each new speaker fully revives previously unlearned speakers, reintroducing the very privacy risk unlearning was meant to eliminate. We present Cumulative ORThogonal Identity Suppression (CORTIS), the first framework for continual speaker identity unlearning in ZS-TTS that requires no access to previously unlearned speaker data. CORTIS combines Fisher-information-based parameter masking, which localizes updates to speaker-relevant weights, with orthogonal projection against subspaces spanned by prior unlearning updates. With VoiceBox, CORTIS unlearns each requested speaker while keeping previously unlearned speakers forgotten across long request sequences, substantially outperforming sequential application of prior methods. The demo is available at https://cumulativeortis.github.io/ .
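The two ingredients named in the abstract, Fisher-based masking and orthogonal projection against prior updates, can be illustrated on a toy parameter vector. The sketch below is not the authors' implementation. It assumes a diagonal empirical Fisher estimate (mean squared per-sample gradient), a top-fraction mask, masking applied before projection, and a Gram-Schmidt basis of earlier updates. All function names, the `keep_ratio` parameter, and the numbers are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fisher_mask(per_sample_grads, keep_ratio):
    """Binary mask keeping the parameters with the largest diagonal Fisher.

    Assumption: Fisher is approximated by the mean squared gradient
    per parameter, and the top `keep_ratio` fraction is kept.
    """
    n = len(per_sample_grads[0])
    count = len(per_sample_grads)
    fisher = [sum(g[i] ** 2 for g in per_sample_grads) / count for i in range(n)]
    k = max(1, round(keep_ratio * n))
    keep = set(sorted(range(n), key=lambda i: fisher[i], reverse=True)[:k])
    return [1.0 if i in keep else 0.0 for i in range(n)]


def project_out(update, basis):
    """Remove the part of `update` lying in span(basis); basis is orthonormal."""
    out = list(update)
    for b in basis:
        coeff = dot(out, b)
        out = [o - coeff * x for o, x in zip(out, b)]
    return out


def extend_basis(basis, update, eps=1e-12):
    """Gram-Schmidt step: add the new direction contributed by `update`."""
    residual = project_out(update, basis)
    norm = dot(residual, residual) ** 0.5
    if norm > eps:
        basis.append([r / norm for r in residual])
    return basis


# Toy data: an update already applied for an earlier speaker (hypothetical).
prior_update = [1.0, 0.0, 0.0, 0.0]
basis = extend_basis([], prior_update)

# Per-sample gradients and a raw unlearning update for the new speaker.
grads = [[2.0, 0.1, 1.0, 0.0], [2.0, -0.1, -1.0, 0.0]]
raw_update = [1.0, 1.0, 1.0, 1.0]

mask = fisher_mask(grads, keep_ratio=0.5)
masked = [m * u for m, u in zip(mask, raw_update)]
projected = project_out(masked, basis)
basis = extend_basis(basis, projected)  # protect this update in later rounds
```

In this toy case the mask keeps parameters 0 and 2, and the projection then removes the component along the earlier update, so the new step has zero inner product with `prior_update`. Only the accumulated basis is stored between requests, which matches the abstract's requirement of not revisiting earlier speakers' data. How the actual method orders or combines the two steps is not specified in the abstract.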