Few-Shot Domain Incremental Learning via Continual Vision-Language Consolidation
2026-06-29 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence, Machine Learning
AI summary
The authors address few-shot domain incremental learning (FSDIL), a setting in which a model must adapt to a stream of new domains from very few examples without forgetting what it learned earlier. They propose Continual Vision-Language Consolidation (CVLC), which combines vision and language information: visual class prototypes are fused with language prototypes built from LLM-generated templates and synonyms. CVLC reserves part of the model's latent space while training on the base domain, then adapts to each new domain with dual coalescent projection (DCP), a parameter-efficient fine-tuning method. In the authors' benchmark experiments, CVLC outperforms previous approaches by a margin of up to 16%.
Domain-Incremental Learning, Few-Shot Learning, Vision-Language Models, Latent Space, Prototype Learning, Parameter-Efficient Fine-Tuning, Large Language Models, Dual Coalescent Projection, Continual Learning, Domain Adaptation
Authors
Naeem Paeedeh, Mahardhika Pratama, Wolfgang Mayer, Mukesh Prasad, Weiping Ding, Yew-Soon Ong
Abstract
Existing domain-incremental learning (DIL) strategies require large amounts of data to adapt to new domains and overfit when data are scarce. This paper puts forward a relatively unexplored problem, few-shot domain incremental learning (FSDIL), which accounts for extreme data shortages in DIL. A novel algorithm, Continual Vision-Language Consolidation (CVLC), is proposed to address FSDIL. Its key idea is latent-space reservation in the base domain, coupled with dual coalescent projection (DCP) as a parameter-efficient fine-tuning method. First, the vision prototype is calibrated, while multiple templates and synonyms are generated via LLMs to induce the language prototype; the vision and language prototypes are then fused. Adaptation to a never-ending stream of new domains is carried out by DCP, which is fine-tuned so that the latent-space reservations made in the base domain prepare the model for unseen domains. CVLC is structured into shared and domain-specific components to combine general knowledge with domain-specific details. The advantage of our approach is demonstrated on a range of benchmark problems and in comparisons with prior art, where CVLC outperforms existing methods by a gap of up to 16%. Our code is publicly available at https://github.com/Naeem-Paeedeh/CVLC.
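The prototype construction described in the abstract can be illustrated with a minimal NumPy sketch. This is not the CVLC implementation: the abstract does not specify the calibration step or the fusion rule. The sketch therefore makes several assumptions. It uses plain class-mean vision prototypes and averages text embeddings over the LLM-generated templates and synonyms (prompt ensembling). It fuses the two with a hypothetical weight `alpha`. DCP and latent-space reservation are not modeled. All function names are hypothetical, and the toy vectors stand in for embeddings that would come from a vision-language encoder such as CLIP.

```python
import numpy as np


def l2_normalize(x: np.ndarray, axis: int = -1, eps: float = 1e-12) -> np.ndarray:
    """Scale vectors to unit L2 norm along `axis` (eps avoids division by zero)."""
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)


def vision_prototypes(feats: np.ndarray, labels: np.ndarray, num_classes: int) -> np.ndarray:
    """Per-class mean of normalized few-shot image embeddings.

    Assumption: a plain class mean; CVLC's prototype calibration is not reproduced.
    """
    feats = l2_normalize(feats)
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)


def language_prototypes(text_feats: np.ndarray) -> np.ndarray:
    """Average normalized prompt embeddings per class.

    text_feats: shape (num_classes, num_prompts, dim), one embedding per
    template/synonym prompt generated for that class.
    """
    return l2_normalize(l2_normalize(text_feats).mean(axis=1))


def fuse_prototypes(vis: np.ndarray, lang: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Hypothetical fusion rule: convex combination of the two modalities, renormalized."""
    return l2_normalize(alpha * vis + (1.0 - alpha) * lang)


def predict(queries: np.ndarray, protos: np.ndarray) -> np.ndarray:
    """Assign each query to the prototype with the highest cosine similarity."""
    return (l2_normalize(queries) @ protos.T).argmax(axis=1)


# Toy 4-D embeddings standing in for encoder outputs (2 classes, 2 shots each).
support = np.array([[1.0, 0.1, 0.0, 0.0], [0.9, 0.0, 0.1, 0.0],
                    [0.1, 1.0, 0.0, 0.0], [0.0, 0.9, 0.0, 0.1]])
labels = np.array([0, 0, 1, 1])
# Three prompt embeddings per class (e.g. different templates or synonyms).
text = np.array([[[1.0, 0.0, 0.0, 0.0], [1.0, 0.2, 0.0, 0.0], [0.9, 0.0, 0.0, 0.1]],
                 [[0.0, 1.0, 0.0, 0.0], [0.2, 1.0, 0.0, 0.0], [0.0, 0.9, 0.1, 0.0]]])

protos = fuse_prototypes(vision_prototypes(support, labels, 2), language_prototypes(text))
queries = np.array([[0.8, 0.2, 0.0, 0.0], [0.1, 0.7, 0.0, 0.0]])
preds = predict(queries, protos)
print(preds)  # → [0 1]
```

Queries are assigned to the nearest fused prototype by cosine similarity, which is a common prototype classifier. Setting `alpha` to 1 or 0 recovers a vision-only or language-only classifier. This makes it easy to check how much each modality contributes in such a sketch.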