KeepLoRA++: Continual Learning with Layer-Scaled Residual Gradient Adaptation
2026-06-15 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition • Machine Learning
AI summary
The authors propose KeepLoRA++, a method that helps vision-language models retain old knowledge while learning new tasks. They studied how knowledge is stored across the layers of the model and found that general knowledge sits mostly in shallow layers, while task-specific information sits in deeper layers. Their approach protects important existing knowledge by scaling updates differently across layers and by confining updates to the residual, less dominant part of the parameter space. Experiments show that KeepLoRA++ outperforms representative baselines on image classification, visual question answering, and video understanding tasks.
Continual Learning • Vision-Language Models • Transformer Architecture • LoRA (Low-Rank Adaptation) • Principal Subspace • Gradient Projection • Knowledge Retention • Image Classification • Visual Question Answering • Video Understanding
Authors
Mao-Lin Luo, Yi-Lin Zhang, Zi-Hao Zhou, Yankun Hong, Xialiang Tong, Mingxuan Yuan, Tong Wei, Min-Ling Zhang
Abstract
Continual learning for pre-trained vision-language models requires balancing three competing objectives: retaining pre-trained knowledge, preserving knowledge from a sequence of learned tasks, and maintaining the plasticity to acquire new knowledge. This paper presents KeepLoRA++, which balances these objectives through a unified dual-dimensional knowledge retention mechanism. We analyze the knowledge distribution of the Transformer architecture from both inter-layer and intra-layer perspectives. The inter-layer perspective examines how retention is distributed across layers, while the intra-layer perspective focuses on the parameter space within each layer. Our analysis reveals a structural property: general transferable knowledge is mainly encoded in the shallow layers and the principal subspace of the parameters, while task-specific adaptations are localized in the deep layers and the residual subspace. Motivated by this insight, KeepLoRA++ introduces a layer-scaled residual gradient adaptation method. New tasks are learned by restricting LoRA parameter updates to the residual subspace, combined with a shallow-to-deep layer scaling, to prevent interference with previously acquired capabilities. Specifically, the gradient of a new task is projected onto a subspace orthogonal to both the principal subspace of the pre-trained model and the dominant directions of previous task features, while shallow layers are assigned smaller update magnitudes and deeper layers larger ones. Our theoretical analysis and empirical evaluations confirm that KeepLoRA++ successfully balances these three competing objectives, consistently outperforming representative baselines across image classification, visual question answering, and video understanding tasks.
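The projection and layer scaling described in the abstract can be illustrated with a minimal NumPy sketch. This is one plausible reading, not the authors' implementation: it works on a full gradient matrix and omits the LoRA factorization, it places the protected subspace in the layer's input space, and the function names, the `energy` threshold, and the linear `layer_scales` schedule are all hypothetical choices made for illustration.

```python
import numpy as np


def principal_basis(matrix, energy=0.9):
    """Left singular vectors capturing an `energy` fraction of squared singular values."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    cumulative = np.cumsum(s**2) / np.sum(s**2)
    k = min(int(np.searchsorted(cumulative, energy)) + 1, len(s))
    return u[:, :k]


def protected_basis(weight, prev_inputs, energy=0.9, tol=1e-8):
    """Orthonormal basis (in input space) of the directions to leave untouched.

    weight:      (d_out, d_in) pre-trained weight matrix
    prev_inputs: (d_in, n) inputs to this layer collected on earlier tasks
    """
    v_weight = principal_basis(weight.T, energy)   # principal input directions of W
    u_feat = principal_basis(prev_inputs, energy)  # dominant previous-task feature directions
    # Re-orthonormalize the union of both sets and drop numerically dependent directions.
    u, s, _ = np.linalg.svd(np.hstack([v_weight, u_feat]), full_matrices=False)
    return u[:, s > tol * s[0]]


def residual_gradient(grad, basis):
    """Remove the part of `grad` (d_out, d_in) that acts on the protected directions."""
    return grad - grad @ basis @ basis.T


def layer_scales(num_layers, min_scale=0.1):
    """Hypothetical shallow-to-deep schedule: small scale for layer 0, 1.0 for the last layer."""
    return np.linspace(min_scale, 1.0, num_layers)


# Toy example with random data (assumed shapes, for illustration only).
rng = np.random.default_rng(0)
d_out, d_in, num_layers = 6, 16, 4
weight = rng.normal(size=(d_out, d_in))
prev_inputs = rng.normal(size=(d_in, 2)) @ rng.normal(size=(2, 20))  # rank-2 features
grad = rng.normal(size=(d_out, d_in))

basis = protected_basis(weight, prev_inputs)
scales = layer_scales(num_layers)
# For brevity the same weight and gradient stand in for every layer.
updates = [scale * residual_gradient(grad, basis) for scale in scales]
```

Because every update satisfies `update @ basis ≈ 0`, the layer's response to inputs lying in the protected subspace is left unchanged, while the per-layer scale shrinks the step taken in shallow layers relative to deep ones.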