On the Limits of Model Merging for Multilinguality in Pre-Training
2026-05-25 • Computation and Language
AI summary
The authors studied how to make AI models work well in many languages. They tested combining (merging) models that were each pre-trained on a single language. Although the monolingual models performed strongly in their own languages, every merged combination suffered a performance collapse caused by interference between the models. Their analysis suggests that merging requires the models to have similar internal representations. So, while merging is flexible during fine-tuning, that flexibility does not carry over trivially to models pre-trained separately on different languages.
multilingual models, pre-training, monolingual models, model merging, fine-tuning, language interference, representational similarity, performance collapse
Authors
Seth Aycock, Fedor Vitiugin, Aleksandr Umnov, Christof Monz, Khalil Sima'an
Abstract
Endowing models with consistent multilingual performance can be achieved by mixing pre-training data or by post-training approaches such as language-specific model merging. In this work, we test whether merging can be applied to monolingually pre-trained models. We conduct a controlled study on the efficacy of mixed, merged, and monolingual pre-training setups. We find that while monolingual pre-training results in strong in-language performance, merging any combination of monolingual models leads to performance collapse due to interference. Our analysis suggests that representational similarity is a prerequisite for model merging. We therefore conclude that the flexibility of merging in fine-tuning does not extend trivially to language-specific pre-training.
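For readers unfamiliar with the term, the sketch below illustrates one common form of model merging: element-wise (optionally weighted) averaging of parameters across models that share an architecture. The abstract does not state which merging method the authors used, so this is an assumption for illustration only. The function name `merge_by_averaging`, the toy parameter name `layer.weight`, and the flat-list parameter format are all hypothetical.

```python
from typing import Dict, List, Optional, Sequence

# Hypothetical toy format: parameter name -> flat list of floats.
StateDict = Dict[str, List[float]]


def merge_by_averaging(
    models: Sequence[StateDict],
    weights: Optional[Sequence[float]] = None,
) -> StateDict:
    """Merge models by a weighted element-wise average of their parameters.

    Assumes all models share the same parameter names and shapes.
    This is an illustrative sketch, not the paper's procedure.
    """
    if not models:
        raise ValueError("need at least one model to merge")
    if weights is None:
        # Default to a uniform average over all models.
        weights = [1.0] * len(models)
    if len(weights) != len(models):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")

    names = set(models[0])
    for model in models[1:]:
        if set(model) != names:
            raise ValueError("models must share the same parameter names")

    merged: StateDict = {}
    for name in models[0]:
        size = len(models[0][name])
        if any(len(model[name]) != size for model in models):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # Normalised weighted sum of each parameter element.
        merged[name] = [
            sum(w * model[name][i] for w, model in zip(weights, models)) / total
            for i in range(size)
        ]
    return merged


# Toy usage: two "monolingual" models with a single two-element parameter.
model_en = {"layer.weight": [1.0, 2.0]}
model_de = {"layer.weight": [3.0, 6.0]}
merged = merge_by_averaging([model_en, model_de])
```

Averaging of this kind implicitly treats corresponding parameters in each model as playing comparable roles. That may help explain why the abstract identifies representational similarity as a prerequisite for merging.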