Post-Hoc Merging is Not Enough: Many-Shot Model Merging with Loss-Gap Balancing
2026-06-15 • Artificial Intelligence
AI summary
The authors study how to combine several language models trained for different tasks into one model that can handle all those tasks. They find that merging models all at once after training causes some tasks to lose important information. To fix this, the authors propose METIS, a method that merges models step-by-step while paying attention to which tasks need more focus. This approach helps keep information from being erased, especially improving the performance of tasks that originally did the worst.
model merging, large language model, multi-task learning, task interference, information erasure, post-hoc merging, iterative merging, loss-gap weighting, masking
Authors
Kyungjin Im, Miru Kim, Chanin Eom, Minhae Kwon
Abstract
Model merging has become a practical post-training strategy for building a single multi-task large language model (LLM) by combining multiple task-specialized models. However, most existing approaches rely on post-hoc merging, in which task-specific models are merged only once after training. This one-shot aggregation often suffers from task interference, leading to information erasure across individual tasks. In this work, we show that replacing post-hoc merging with an iterative many-shot merging protocol is effective in improving multi-task performance. Building on this insight, we propose METIS, Mitigating Erasure from Task Interference for Stable many-shot merging. METIS is a loss-aware many-shot merging method that addresses information erasure in post-hoc merging through task-wise loss-gap weighting and consensus-based masking. Notably, METIS exhibits significant performance improvement on the worst-performing task, effectively mitigating information erasure. (Project page: https://imkyungjin.github.io/METIS/)
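The abstract names two ingredients, task-wise loss-gap weighting and consensus-based masking, without giving their formulas. The sketch below is one plausible reading, not the authors' implementation. It assumes that a task's loss gap is the merged model's loss minus the specialized model's loss, that weights are a softmax over those gaps, and that the consensus mask keeps only task updates whose sign agrees with the weighted-majority sign for each parameter. The names `loss_gap_weights`, `merge_step`, and `temperature` are hypothetical, and parameters are flattened into plain lists for brevity.

```python
import math


def loss_gap_weights(merged_losses, expert_losses, temperature=1.0):
    """Softmax over per-task loss gaps (assumed form: merged loss - expert loss).

    Tasks on which the merged model lags its specialist the most get more weight.
    """
    gaps = [m - e for m, e in zip(merged_losses, expert_losses)]
    top = max(gaps)  # subtract the max for numerical stability
    exps = [math.exp((g - top) / temperature) for g in gaps]
    total = sum(exps)
    return [x / total for x in exps]


def sign(x):
    """Return -1, 0, or 1."""
    return (x > 0) - (x < 0)


def merge_step(base, task_vectors, merged_losses, expert_losses, temperature=1.0):
    """One merging round under the assumptions stated above.

    base:         flat list of current merged parameters
    task_vectors: one flat list per task (specialist parameters - base)
    Returns the updated parameters and the task weights that were used.
    """
    weights = loss_gap_weights(merged_losses, expert_losses, temperature)
    merged = []
    for j, theta in enumerate(base):
        column = [tv[j] for tv in task_vectors]
        # Assumed consensus rule: elect the weighted-majority sign per parameter.
        elected = sign(sum(w * d for w, d in zip(weights, column)))
        # Mask out task updates that disagree with the elected sign.
        kept = [(w, d) for w, d in zip(weights, column)
                if elected != 0 and sign(d) == elected]
        norm = sum(w for w, _ in kept)
        update = sum(w * d for w, d in kept) / norm if norm > 0 else 0.0
        merged.append(theta + update)
    return merged, weights
```

In a many-shot protocol, a step like this would be repeated between rounds of task-specific training, with the loss gaps re-measured each round, rather than applied once after training ends. How often to merge and how to estimate the losses are left open here because the abstract does not specify them.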