TaDA: Calibrated Probe Gating for Task-Domain LoRA Merging

2026-06-03 · Computation and Language

AI summary

The authors study how to combine two kinds of LoRA adapters, one trained for a specific task and one for a domain, into a single model. They find that the relative importance of each adapter depends on layer depth: domain adapters matter more in deeper layers, while shallower layers retain stronger task-relevant signals. Building on this, they propose TaDA, a training-free method that assigns a separate weight to each layer when merging the adapters. On language and image benchmarks, TaDA improves accuracy over other merging approaches and adds no inference overhead.

LoRA adapters, task adapters, domain adapters, transformer layers, layer-wise gating, model merging, singular value decomposition, Llama-2-7B, ViT-L/16, model efficiency
Authors
Huy Quoc To, Fuyi Li, Guangyan Huang, Ming Liu
Abstract
Combining a task LoRA adapter with a domain LoRA adapter into a single unified model is a practical yet largely unexplored challenge. Existing methods treat both adapters as symmetric peers, applying uniform weights across all layers. We argue that task and domain adapters exhibit a consistent depth-dependent asymmetry across transformer architectures. Domain dominance increases with layer depth, while shallower layers retain stronger task-relevant signals. Motivated by this observation, we propose $\textbf{TaDA}$ ($\textbf{Ta}$sk-$\textbf{D}$omain LoR$\textbf{A}$ Merging), a training-free algorithm that exploits this structure through calibrated probe-guided per-layer gating and per-component subspace-aware merging. The gating assigns individual weights per layer and projection type using a probe signal that is proven to be invariant to adapter weight magnitude. The merging discards conflicting singular directions before combining the remaining components. $\textbf{TaDA}$ produces a standard rank-$r$ LoRA adapter with zero inference overhead. On six scientific QA benchmarks with Llama-2-7B, TaDA achieves an average accuracy of 0.452, outperforming DARE-TIES by 3.6 percentage points and obtaining the best result on all six benchmarks. On six image classification benchmarks with ViT-L/16, TaDA reaches 85.9\% average accuracy, improving over the strongest merging baseline while leading in three of the six individual benchmarks.
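
The abstract gives no formulas, so the following is only a rough sketch of the general idea of gate-weighted LoRA merging that yields a standard rank-$r$ adapter. It is not the authors' algorithm. The function names `merge_lora_pair` and `depth_gates`, the scalar `gate`, and the linear depth schedule are all hypothetical. The sketch also omits the probe-based calibration and the step that discards conflicting singular directions. It assumes each adapter contributes an update $\Delta W = BA$ and re-factors the weighted sum with a truncated SVD.

```python
import numpy as np


def merge_lora_pair(B_task, A_task, B_dom, A_dom, gate, rank):
    """Gate-weighted merge of two LoRA updates, re-factored to `rank`.

    `gate` is a hypothetical per-layer weight in [0, 1]:
    1.0 keeps only the task adapter, 0.0 keeps only the domain adapter.
    """
    # Dense low-rank updates: delta_W = B @ A, shape (d_out, d_in).
    delta = gate * (B_task @ A_task) + (1.0 - gate) * (B_dom @ A_dom)

    # Truncated SVD gives the best rank-`rank` approximation of the sum.
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    sqrt_s = np.sqrt(S[:rank])

    # Split the singular values evenly between the two LoRA factors.
    B_merged = U[:, :rank] * sqrt_s           # (d_out, rank)
    A_merged = sqrt_s[:, None] * Vt[:rank]    # (rank, d_in)
    return B_merged, A_merged


def depth_gates(num_layers, shallow=0.7, deep=0.3):
    """Placeholder schedule (assumption): task weight decreases with depth."""
    return np.linspace(shallow, deep, num_layers)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d_out, d_in, r, num_layers = 16, 12, 4, 3
    for layer, g in enumerate(depth_gates(num_layers)):
        B_t, A_t = rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))
        B_d, A_d = rng.normal(size=(d_out, r)), rng.normal(size=(r, d_in))
        B, A = merge_lora_pair(B_t, A_t, B_d, A_d, gate=g, rank=r)
        print(layer, round(float(g), 2), B.shape, A.shape)
```

Because the output is again a pair of rank-$r$ factors, it can be loaded like any ordinary LoRA adapter. This is the property the abstract describes as zero inference overhead. The placeholder schedule only mirrors the reported trend that domain dominance grows with depth; TaDA derives its weights from a probe signal per layer and projection type.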