Language-Aware Token Boosting: LLM Language Confusion Reduction Without Tuning

2026-06-08 • Computation and Language


AI summary

The authors found that large language models sometimes mix up languages when generating text in languages other than English. Instead of retraining the models, they created an approach that gives a small boost to tokens (words or word pieces) associated with the target language, which reduces this confusion. They also designed a version that adjusts the size of this boost based on how confident the model is that it is using the intended language. Their experiments showed that the method helps the model stay in the right language while preserving the quality of its summaries, with no extra training. They have released their code publicly.

Large Language Models, Language Confusion, Fine-Tuning, Multilingual Alignment, Token Perturbation, Language-Aware Token Boosting, Adaptive Mechanisms, Summarization Quality, Model Confidence, Tuning-Free Approach

Authors

Trapoom Ukarapol, Pakhapoom Sarapat, Nut Chukamphaeng

Abstract

Large language models (LLMs) sometimes exhibit language confusion, producing text in a language other than the one intended, when generating non-English text. Existing approaches typically rely on fine-tuning to mitigate this issue. In contrast, we propose a tuning-free paradigm for reducing language confusion. Within this paradigm, we introduce two methods: Language-Aware Token Boosting (LATB), which applies targeted perturbations to tokens associated with the desired language, and Adaptive Language-Aware Token Boosting (Adaptive-LATB), which dynamically adjusts these perturbations based on the model's confidence in the intended language. Experiments demonstrate that our methods improve multilingual alignment by reducing language confusion while maintaining summarization quality, without requiring any additional fine-tuning. Our code is publicly available at https://github.com/scbdatax/genai-datax-language-aware-token-boosting.
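The abstract does not say exactly how the perturbation or the confidence signal is computed. As a rough illustration only, the sketch below assumes that LATB adds a fixed bias to the next-token logits of a known set of target-language token ids, and that Adaptive-LATB scales that bias by one minus the probability mass the unmodified model already places on those tokens. The functions `latb`, `adaptive_latb`, `target_language_mass` and `greedy_pick`, and the parameters `delta` and `max_delta`, are hypothetical names and do not come from the authors' repository.

```python
import math
from typing import List, Sequence, Set


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one step of next-token logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def greedy_pick(logits: Sequence[float]) -> int:
    """Index of the highest-scoring token (greedy decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])


def latb(logits: Sequence[float], target_ids: Set[int], delta: float) -> List[float]:
    """Hypothetical LATB step: add a fixed bias `delta` to every
    target-language token's logit and leave all other logits unchanged."""
    return [x + delta if i in target_ids else x for i, x in enumerate(logits)]


def target_language_mass(logits: Sequence[float], target_ids: Set[int]) -> float:
    """Assumed confidence proxy: total probability the unmodified model
    assigns to target-language tokens at this step (between 0 and 1)."""
    probs = softmax(logits)
    return sum(probs[i] for i in target_ids)


def adaptive_latb(logits: Sequence[float], target_ids: Set[int], max_delta: float) -> List[float]:
    """Hypothetical Adaptive-LATB step: the bias shrinks toward 0 when the
    model is already confident in the target language and grows toward
    `max_delta` when it is not."""
    confidence = target_language_mass(logits, target_ids)
    return latb(logits, target_ids, max_delta * (1.0 - confidence))


if __name__ == "__main__":
    # Toy 4-token vocabulary: ids 0 and 1 stand in for target-language tokens,
    # ids 2 and 3 for tokens of an unintended language (e.g. English).
    logits = [1.0, 0.5, 2.0, 1.8]
    target = {0, 1}
    print(greedy_pick(logits))                     # 2: an off-target token wins
    print(greedy_pick(latb(logits, target, 1.5)))  # 0: the boost flips the choice
    print(adaptive_latb(logits, target, max_delta=2.0))
```

A fixed `delta` pushes toward the target language at every step, even when the model is already on track, which could distort otherwise fluent output. Tying the bias to the model's own confidence is one simple way to apply a strong push only when the model is drifting. The abstract does not state whether the authors use this particular scaling.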
