Theory of Continual Learning Against Data Poisoning Attacks
2026-06-29 • Machine Learning
Machine Learning • Computer Science and Game Theory
AI summary
The authors study how continual learning (CL) models, which learn a sequence of tasks one after another, can be attacked with poisoned data that makes training diverge or sharply increases error. They frame the attacker-defender interaction as an online zero-sum game for regularization-based CL and show that if an attacker poisons a linear proportion of tasks with unbounded noise or pattern shifts, no defense can succeed. They then consider two settings where defense may be possible: infrequent attacks and attacks with bounded noise. For the first, they propose a task-to-task verification mechanism that detects poisoned data and limits accumulated bias; for the second, a robust defense that makes the model less sensitive to poisoned features. Experiments on realistic tasks support the theoretical findings.
Continual Learning • Data Poisoning • Regularization • Adversarial Attacks • Zero-Sum Game • Learning Convergence • Robust Defense • Online Learning • Bias Reduction • Noise Injection
Authors
Yiting Hu, Lingjie Duan
Abstract
Continual learning (CL), where a model is trained on a sequence of data tasks, is increasingly being adopted across key fields such as large language models and image recognition, yet it remains highly vulnerable to data poisoning that triggers learning divergence or severe excess risk. Despite these threats, a principled theoretical foundation in CL for understanding attack and defense remains lacking. In this paper, we develop a theoretical framework to analyze strategic attacks and defenses in regularization-based CL, a cornerstone of recent CL theory. By framing the adversary-defender interaction as an online zero-sum game, we first establish a fundamental performance limit: no defense succeeds when an adversary poisons a linear proportion of tasks by injecting unbounded noise or pattern shifts in regularization-based CL. We then analyze two possibly defensible scenarios: infrequent attacks and bounded noise per attack. For the former regime, we propose a task-to-task verification mechanism to detect data poisoning and reduce cumulative bias for learning convergence. For the latter regime, we derive a robust defense that minimizes the model's sensitivity to poisoned features, provably accelerating the convergence rate. Extensive experiments on realistic tasks further validate our theoretical results.
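To make the setting concrete, the following toy sketch illustrates regularization-based CL under infrequent poisoning. It is not the authors' algorithm: the scalar regression model, the closed-form update, the `looks_poisoned` check, and all constants (`lam`, `threshold`, the poisoned task indices) are assumptions chosen for illustration. Each task is fitted with a quadratic penalty that pulls the new weight toward the previous one, and a hypothetical task-to-task check skips any task whose own least-squares fit jumps too far from the current model.

```python
import random

def make_task(rng, w_true, n=20, noise=0.1):
    """Scalar regression task: y = w_true * x + Gaussian noise."""
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + rng.gauss(0.0, noise) for x in xs]
    return xs, ys

def regularized_update(w_prev, xs, ys, lam):
    """Closed-form minimizer of sum (y - w*x)^2 + lam * (w - w_prev)^2."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (sxy + lam * w_prev) / (sxx + lam)

def looks_poisoned(w_prev, xs, ys, threshold):
    """Hypothetical check: flag a task whose own least-squares fit
    is farther than `threshold` from the previous weight."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return abs(sxy / sxx - w_prev) > threshold

def run(tasks, lam=5.0, threshold=None):
    """Learn tasks in order; if a threshold is given, skip flagged tasks."""
    w = 0.0
    for xs, ys in tasks:
        if threshold is not None and looks_poisoned(w, xs, ys, threshold):
            continue  # drop the suspicious task instead of learning from it
        w = regularized_update(w, xs, ys, lam)
    return w

# Assumed scenario: 20 tasks, 3 of them carry a shifted pattern.
rng = random.Random(0)
W_TRUE, W_POISON = 2.0, -10.0
POISONED = {5, 12, 19}
tasks = [make_task(rng, W_POISON if t in POISONED else W_TRUE)
         for t in range(20)]

err_plain = abs(run(tasks) - W_TRUE)
err_defended = abs(run(tasks, threshold=3.0) - W_TRUE)
print(f"error without check: {err_plain:.3f}")
print(f"error with check:    {err_defended:.3f}")
```

In this toy run the final task is poisoned, so the undefended model ends far from the true weight while the checked model stays close to it. A fixed threshold of this kind only works because the poisoned pattern is far from the clean one and attacks are rare; it says nothing about the bounded-noise regime or the impossibility result in the abstract.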