Parameter-Efficient Fine-Tuning of Large Pretrained Models for Instance Segmentation Tasks
2026-06-01 • Computer Vision and Pattern Recognition
Computer Vision and Pattern Recognition, Artificial Intelligence
AI summary
The authors studied how to adapt large pretrained AI models to instance segmentation while updating only a small fraction of their parameters, which saves training time and computing power. They tested two techniques, adapters and LoRA, on two models and four datasets, and found that fine-tuning only 1-6% of the parameters can still achieve competitive results. Using 2-3 adapters per transformer block works well, and LoRA applied to deformable attention is highly parameter-efficient and in some cases outperforms the adapter configurations. Their work shows that the best tuning method depends on the specific model and dataset.
parameter-efficient fine-tuning, adapters, Low-Rank Adaptation (LoRA), transformer models, instance segmentation, deformable attention, transfer learning, benchmark datasets
Authors
Nermeen Abou Baker, David Rohrschneider, Uwe Handmann
Abstract
Research and applications in artificial intelligence have recently shifted with the rise of large pretrained models, which deliver state-of-the-art results across numerous tasks. However, the substantial growth in their parameter counts creates a need for parameter-efficient training strategies. Despite significant advances, little research has explored parameter-efficient fine-tuning (PEFT) methods for transformer-based instance segmentation models. To address this gap, this study investigates the effectiveness of two PEFT methods, adapters and Low-Rank Adaptation (LoRA), applied to two models across four benchmark datasets. Integrating sequentially arranged adapter modules and applying LoRA to deformable attention (explored here for the first time) achieve competitive performance while fine-tuning only about 1-6% of the model parameters, a marked improvement over the 40-55% updated in traditional fine-tuning. Key findings indicate that using 2-3 adapters per transformer block offers an optimal balance between performance and efficiency. Furthermore, LoRA exhibits strong parameter efficiency when applied to deformable attention and in certain cases surpasses the adapter configurations. These results show that the impact of PEFT techniques varies with dataset complexity and model architecture, underscoring the importance of context-specific tuning. Overall, this work demonstrates the potential of PEFT to enable scalable, customizable, and computationally efficient transfer learning for instance segmentation.
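To make the contrast between a small trainable fraction and full fine-tuning concrete, the following sketch counts trainable parameters for the two kinds of PEFT modules the abstract names. It is illustrative arithmetic only, not the authors' code or configuration. The hidden size of 256 with 8 heads, 4 feature levels and 4 sampling points (the Deformable DETR defaults), the projection names, the LoRA rank of 8, the adapter bottleneck of 64, and the choice of three adapters per block are all assumptions. The adapter is a standard bottleneck design (down-projection, nonlinearity, up-projection, residual connection), and LoRA learns the update to a frozen weight W of shape (d_out, d_in) as a low-rank product B A, where A has shape (r, d_in) and B has shape (d_out, r).

```python
# Illustrative parameter counting for two PEFT modules.
# All sizes below are ASSUMPTIONS for illustration, not the paper's settings.

def linear_params(d_in: int, d_out: int, bias: bool = True) -> int:
    """Parameters of a dense layer y = W x + b, with W of shape (d_out, d_in)."""
    return d_in * d_out + (d_out if bias else 0)


def adapter_params(d: int, bottleneck: int) -> int:
    """Bottleneck adapter: down-projection d -> bottleneck, nonlinearity,
    up-projection bottleneck -> d; the residual connection adds no parameters."""
    return linear_params(d, bottleneck) + linear_params(bottleneck, d)


def lora_params(d_in: int, d_out: int, rank: int) -> int:
    """LoRA update delta_W = B @ A, A of shape (rank, d_in), B of shape (d_out, rank).
    The pretrained W (and its bias) stay frozen, so only A and B are counted."""
    return rank * d_in + d_out * rank


# Assumed deformable-attention sizes (Deformable DETR defaults).
D_MODEL, N_HEADS, N_LEVELS, N_POINTS = 256, 8, 4, 4
LORA_RANK = 8             # hypothetical LoRA rank
ADAPTER_BOTTLENECK = 64   # hypothetical adapter bottleneck width
ADAPTERS_PER_BLOCK = 3    # within the 2-3 range the abstract recommends

# (d_in, d_out) of the linear layers in one deformable-attention module;
# names follow the Deformable DETR reference code and are assumed LoRA targets.
DEFORM_ATTN_PROJECTIONS = {
    "sampling_offsets": (D_MODEL, N_HEADS * N_LEVELS * N_POINTS * 2),
    "attention_weights": (D_MODEL, N_HEADS * N_LEVELS * N_POINTS),
    "value_proj": (D_MODEL, D_MODEL),
    "output_proj": (D_MODEL, D_MODEL),
}

full_params = sum(linear_params(i, o) for i, o in DEFORM_ATTN_PROJECTIONS.values())
lora_total = sum(lora_params(i, o, LORA_RANK) for i, o in DEFORM_ATTN_PROJECTIONS.values())
adapters_total = ADAPTERS_PER_BLOCK * adapter_params(D_MODEL, ADAPTER_BOTTLENECK)

if __name__ == "__main__":
    print(f"Deformable attention, full fine-tuning: {full_params:,} trainable params")
    print(f"LoRA rank {LORA_RANK} on the same projections: {lora_total:,} "
          f"({100 * lora_total / full_params:.1f}% of that module)")
    print(f"{ADAPTERS_PER_BLOCK} adapters, bottleneck {ADAPTER_BOTTLENECK}: "
          f"{adapters_total:,} trainable params per block")
```

With these assumed sizes, LoRA trains 15,360 parameters against 230,272 in the four frozen projections, and halving the rank halves that count. Whole-model percentages also depend on everything else that stays frozen, so these per-module ratios are not meant to reproduce the 1-6% figure reported in the abstract.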