Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

2026-06-01 | Machine Learning

Machine Learning, Artificial Intelligence
AI summary

The authors created AdvCL to help language models learn new tasks one after another without forgetting old ones or becoming vulnerable to adversarial perturbations. Instead of only defending against these small, deliberately chosen input changes, AdvCL uses them as a control signal that keeps the model's internal representations stable and balanced between old and new tasks. AdvCL has three plug-in modules: one keeps learning locally smooth, one stops the model from aligning too strongly with the current task, and one keeps representations connected to previous tasks. Experiments show better performance and robustness, less forgetting, and stronger transfer, and each module can be added to replay, regularization, or dynamic-architecture methods.

continual learning, adversarial perturbations, language models, task forgetting, transfer learning, prototype, representation alignment, regularization, dynamic architectures
Authors
Ran Liu, Min Yu, Mingqi Liu, Jianguo Jiang, Gang Li, Rongsheng Li, Ning Li, Zhen Xu, Weiqing Huang, Ming Liu
Abstract
In dynamic environments, large language models need to keep adapting to new tasks, but continual learning (CL) often suffers from forgetting, limited transfer, and vulnerability to adversarial perturbations. To address this, we present AdvCL, which repurposes adversarial perturbations as a geometric control signal for stable continual adaptation. AdvCL combines three plug-in modules: Intra-Smooth promotes local smoothness via small adversarial perturbations; Proto-Clip uses similarity clipping to prevent excessive alignment to the current task prototype; and Inter-Align applies directional alignment toward the previous task prototype to reduce representational gaps. Experiments show consistent gains in both standard performance and robustness, with lower forgetting and stronger transfer. We further analyze key mechanisms by quantifying the sensitivity of Intra-Smooth to perturbation settings and the effect of Inter-Align on task similarity and geometric distance. In summary, the modules provide complementary gains when combined, and each can also be integrated individually into diverse CL paradigms, including replay, regularization, and dynamic architectures, thereby offering a geometric control mechanism for continual learning.
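The abstract describes the three modules only at a high level, so the following is a minimal, hypothetical Python sketch of how such loss terms could be combined. It is not the authors' implementation and rests on assumptions the source does not state: features are plain vectors, a task prototype is the mean feature vector of that task, and the classifier is a linear softmax head. Intra-Smooth is approximated with an L2 fast-gradient perturbation of the features followed by a KL consistency penalty. That perturbation is a standard way to build small adversarial perturbations and is swapped in here as an assumption. Proto-Clip is read as a cosine-alignment term clipped at a threshold `tau`, and Inter-Align as a cosine (directional) gap to previous-task prototypes. All function names, `eps`, `tau`, and the weights `lam` are hypothetical. `prototype_gap` mirrors the kind of similarity and distance measurement the abstract mentions, again under the same assumptions.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical sketch of AdvCL-style loss terms; NOT the authors' implementation.
Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Sequence[float]) -> Vector:
    """Assumed linear softmax head: logits = W x, one row of W per class."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(z: Sequence[float]) -> Vector:
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(a * a for a in v))


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b)) / (norm(a) * norm(b) + 1e-12)


def prototype(feats: Sequence[Sequence[float]]) -> Vector:
    """Assumed prototype definition: mean feature vector of one task."""
    return [sum(col) / len(feats) for col in zip(*feats)]


def intra_smooth(W: Matrix, h: Vector, y: int, eps: float) -> float:
    """KL(p(h) || p(h + delta)) with delta = eps * g / ||g||, g = dCE/dh."""
    p = softmax(matvec(W, h))
    err = [pk - (1.0 if k == y else 0.0) for k, pk in enumerate(p)]
    # For logits W h, the cross-entropy gradient w.r.t. h is W^T (p - onehot(y)).
    grad = [sum(W[k][j] * err[k] for k in range(len(W))) for j in range(len(h))]
    g = norm(grad)
    if g == 0.0:
        return 0.0
    delta = [eps * gj / g for gj in grad]  # small adversarial step with ||delta|| = eps
    q = softmax(matvec(W, [hj + dj for hj, dj in zip(h, delta)]))
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0.0)


def proto_clip(h: Vector, proto_cur: Vector, tau: float) -> float:
    """Assumed reading: pull toward the current prototype only until cosine reaches tau."""
    return max(0.0, tau - cosine(h, proto_cur))


def inter_align(h: Vector, prev_protos: Sequence[Vector]) -> float:
    """Mean directional gap (1 - cosine) between h and previous-task prototypes."""
    if not prev_protos:
        return 0.0
    return sum(1.0 - cosine(h, p) for p in prev_protos) / len(prev_protos)


def advcl_loss(W: Matrix, h: Vector, y: int, proto_cur: Vector,
               prev_protos: Sequence[Vector], eps: float = 0.1, tau: float = 0.8,
               lam: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Task cross-entropy plus three weighted plug-in terms (weights hypothetical)."""
    ce = -math.log(softmax(matvec(W, h))[y])
    return (ce + lam[0] * intra_smooth(W, h, y, eps)
            + lam[1] * proto_clip(h, proto_cur, tau)
            + lam[2] * inter_align(h, prev_protos))


def prototype_gap(a: Vector, b: Vector) -> Tuple[float, float]:
    """(cosine similarity, Euclidean distance) between two prototypes."""
    return cosine(a, b), norm([x - y for x, y in zip(a, b)])


if __name__ == "__main__":
    W = [[1.0, 0.0], [0.0, 1.0]]          # toy 2-class head
    old_feats = [[1.0, 0.2], [0.8, 0.0]]  # toy features from a previous task
    new_feats = [[0.1, 1.0], [0.0, 0.9]]  # toy features from the current task
    p_old, p_new = prototype(old_feats), prototype(new_feats)
    print(advcl_loss(W, new_feats[0], 1, p_new, [p_old]))
    print(prototype_gap(p_old, p_new))
```

Because `proto_clip` is clipped at `tau`, a representation that is already close to the current prototype receives no further pull. This is one way to read "preventing excessive alignment," though the paper's actual formulation may differ. Since every term is a separate function, each can be dropped or reweighted independently, in the spirit of the abstract's claim that the modules are individually pluggable.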