TANDEM: Bi-Level Data Mixture Optimization with Twin Networks

2026-06-03 | Machine Learning

AI summary

The authors developed TANDEM, a method for choosing how much training data to draw from each domain when training large language models. They cast data mixing as a bi-level optimization problem, reduce it to a single-level penalized form, and solve it with two networks trained together: a proxy model trained on the primary data and a reference model trained with additional data. The gap between the twin models indicates which domains benefit more from the additional data, and those domains are up-weighted. The authors report theoretical guarantees and empirical gains, including in data-restricted training and supervised fine-tuning.

large language models, domain adaptation, bi-level optimization, neural networks, data mixing, fine-tuning, proxy model, reference model, data weighting, optimization
Authors
Jiaxing Wang, Deping Xiang, Jin Xu, Mingyang Yi, Guoqiang Gong, Zicheng Zhang, Haoran Li, Pengzhang Liu, Zhen Chen, Ke Zhang, Ju Fan, Qixiang Jiang
Abstract
The capabilities of large language models (LLMs) depend heavily on training data drawn from various domains. Optimizing domain-specific mixture ratios can be modeled as a bi-level optimization problem, which we simplify into a single-level penalized form and solve with twin networks: a proxy model trained on primary data and a dynamically updated reference model trained with additional data. Our proposed method, Twin Networks for bi-level DatA mixturE optiMization (TANDEM), measures data efficacy through the difference between the twin models and up-weights domains that benefit more from the additional data. Compared to prior approaches, TANDEM provides theoretical guarantees and wider applicability. Furthermore, our bi-level perspective suggests new settings for studying domain reweighting, such as data-restricted scenarios and supervised fine-tuning, where optimized mixture ratios significantly improve performance. Extensive experiments validate TANDEM's effectiveness in all scenarios.
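
The reweighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not give the update rule, so this assumes an exponentiated-gradient step on the mixture weights and uses the per-domain loss gap between the proxy and reference models as the efficacy signal. The function name `update_mixture`, the `step_size` parameter, the domain names, and all loss values are hypothetical.

```python
import math

def update_mixture(weights, proxy_losses, reference_losses, step_size=1.0):
    """One assumed exponentiated-gradient step on domain mixture weights.

    gap_k = proxy_loss_k - reference_loss_k stands in for how much domain k
    benefits from the additional data seen by the reference model. This
    signal and this update rule are illustrative assumptions.
    """
    gaps = [p - r for p, r in zip(proxy_losses, reference_losses)]
    # Multiplicative update: larger gap -> larger weight.
    scaled = [w * math.exp(step_size * g) for w, g in zip(weights, gaps)]
    total = sum(scaled)
    # Renormalize so the weights remain a probability distribution.
    return [s / total for s in scaled]

# Hypothetical domains and per-domain validation losses.
domains = ["web", "code", "books"]
weights = [1 / 3, 1 / 3, 1 / 3]
proxy_losses = [2.9, 2.1, 3.0]      # proxy model: primary data only
reference_losses = [2.8, 1.6, 3.0]  # reference model: with additional data

new_weights = update_mixture(weights, proxy_losses, reference_losses)
for name, w in zip(domains, new_weights):
    print(f"{name}: {w:.3f}")
```

In this toy run the "code" domain shows the largest gap between the twin models, so it receives the largest weight, while "books", whose loss is unchanged, is down-weighted after renormalization. In an actual training loop the two models and the weights would be updated in alternation; that schedule is not described in the abstract and is left out here.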