Compress Then Adapt? No, Do It Together via Task-aware Union of Subspaces
2026-05-04 • Artificial Intelligence
AI summary
The authors point out that current methods for adapting large pretrained models either compress them first and then fine-tune, which risks leaving the compressed subspace misaligned with the downstream task. They propose JACTUS, a new method that combines compression and adaptation into one step by choosing which directions of the model to keep based on both the pretrained weights and the new task's needs. This yields smaller models that still perform well and are cheap to update for new tasks. Experiments show JACTUS outperforms existing methods on both vision and language understanding tasks while retaining fewer parameters.
pretrained models, parameter-efficient fine-tuning (PEFT), low-rank compression, adapter tuning, subspace alignment, vision transformer (ViT), Llama2, low-rank approximation, model adaptation, parameter budget
Authors
Jingze Ge, Yun Liu, Xue Geng, Wanqi Dong, Wang Zhe Mark, Min Wu, Xulei Yang
Abstract
Adapting large pretrained models to diverse tasks is now routine, yet the two dominant strategies of parameter-efficient fine-tuning (PEFT) and low-rank compression are typically composed in sequence. This decoupled practice first compresses and then fine-tunes adapters, potentially misaligning the compressed subspace with downstream objectives and squandering a global parameter budget. To overcome this limitation, we introduce JACTUS (Joint Adaptation and Compression with a Task-aware Union of Subspaces), a single framework that unifies compression and adaptation. From a small calibration set, JACTUS estimates input and pre-activation gradient covariances, forms their orthogonal union with the pretrained weight subspace, performs a projected low-rank approximation inside this union, allocates rank globally by marginal gain per parameter, and trains only a compact core matrix. This explicitly mitigates the potential misalignment between the compressed subspace and downstream objectives by coupling the directions preserved for compression with those required for adaptation, yielding a deployable low-rank model that avoids retaining full frozen weights while enabling fast and robust tuning. On vision, JACTUS attains an average 89.2% accuracy on ViT-Base across eight datasets at 80% retained parameters, surpassing strong 100% PEFT baselines (e.g., DoRA 87.9%). On language, JACTUS achieves an 80.9% average on Llama2-7B commonsense QA at the same 80% retained-parameter budget, outperforming 100% PEFT (e.g., DoRA 79.7%) and exceeding prior compress-then-fine-tune pipelines under the same retained-parameter budget. We will release code.
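The abstract outlines a per-layer pipeline: estimate calibration covariances, form a union of task and weight subspaces, take a projected low-rank approximation inside that union, and train only a small core matrix. The sketch below illustrates that idea in PyTorch; the function names, the eigendecomposition/QR construction, and the fixed `rank` argument are assumptions for illustration, since the paper's exact algebra and the global rank-allocation step are not specified here.

import torch


def estimate_covariances(layer_inputs, preact_grads):
    # Accumulate input and pre-activation-gradient covariances from a small
    # calibration set; both arguments are lists of [batch, dim] tensors.
    cov_in = sum(x.T @ x for x in layer_inputs) / sum(x.shape[0] for x in layer_inputs)
    cov_grad = sum(g.T @ g for g in preact_grads) / sum(g.shape[0] for g in preact_grads)
    return cov_in, cov_grad


def union_basis(cov, weight_vecs, k_task):
    # Orthogonal union of the leading eigenvectors of a calibration covariance
    # (task-relevant directions) with leading singular vectors of the weight.
    task_vecs = torch.linalg.eigh(cov)[1][:, -k_task:]  # eigh sorts ascending
    Q, _ = torch.linalg.qr(torch.cat([task_vecs, weight_vecs], dim=1))
    return Q


def jactus_like_factorize(W, cov_in, cov_grad, k_task=16, k_weight=16, rank=32):
    # Hypothetical joint compression + adaptation for one linear layer.
    # W: [out_dim, in_dim] pretrained weight. In the paper, `rank` would be
    # allocated globally by marginal gain per parameter; here it is fixed.
    U_w, _, Vh_w = torch.linalg.svd(W, full_matrices=False)
    Q_in = union_basis(cov_in, Vh_w.T[:, :k_weight], k_task)    # input-side union
    Q_out = union_basis(cov_grad, U_w[:, :k_weight], k_task)    # output-side union
    W_proj = Q_out.T @ W @ Q_in                                 # project into the union
    U, S, Vh = torch.linalg.svd(W_proj, full_matrices=False)
    A = Q_out @ U[:, :rank]                                     # frozen left factor
    B = Vh[:rank] @ Q_in.T                                      # frozen right factor
    core = torch.nn.Parameter(torch.diag(S[:rank]))             # trainable core matrix
    # Deployable low-rank replacement: W ≈ A @ core @ B; only `core` is tuned.
    return A, core, B

In this reading, the same basis serves both goals: the directions kept for compression are exactly those the calibration statistics flag as task-relevant, so adaptation happens inside the compressed subspace rather than after it.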