When Meaning Travels: A Granular Lens on Hybrid-MoE's Role in Idiomatic Understanding for Language Models

2026-06-01 | Computation and Language
AI summary

The authors studied how to better understand idioms—phrases with special figurative meanings—in less commonly studied Southeast Asian languages like Hindi, Bengali, and Thai. They created a new collection called Varnika, which includes many idioms along with pictures and tones to help computers learn their meanings. They also designed a special computer method, HybridMoE, that mixes different expert opinions to better capture the tricky meanings of idioms. The authors developed new ways to test how well the computer understands idioms literally, visually, and in meaning. Their approach improved computer models' ability to represent and interpret idioms across languages and visuals.

figurative language, idioms, multilingual education, multimodal corpus, Hybrid Mixture-of-Experts, semantic alignment, cross-linguistic transfer, idiomatic tones, masked embeddings, evaluation metrics
Authors
Sarmistha Das, Vaibhav Vishal, Shreyas Guha, Amaan Ali, Kitsuchart Pasupa, Sriparna Saha
Abstract
In the contemporary epoch of multilingual education, learning idioms provides a fascinating gateway towards creativity, cultural values, historical context, and diverse perspectives inherent to various linguistic traditions. This paper examines how to retain figurative and cultural semantics in low-resource Southeast Asian languages such as Hindi, Bengali, and Thai, where culturally rich idioms pose significant obstacles for computational modeling and cross-linguistic transfer due to their deep metaphorical complexity. To tackle such complexity, we present Varnika, a reconstructed multimodal idiom corpus comprising 3,533 multilingual idioms, enriched with seven idiomatic tones aligned with both textual and visual representations. Additionally, to support informative idiomatic understanding, we introduce a Hybrid Mixture-of-Experts (HybridMoE) framework that embeds multiple idiomatic expert opinions while mitigating expert sparsity by integrating outputs from both selected and unselected experts through controlled hybridization, further augmented with Idiomatic Property Signals via masked multimodal embeddings. To analyze performance across multiple dimensions, we propose the IDIO-TONE and Idiomatic Validation Score, a three-stage evaluation pipeline measuring (i) literal translation fidelity, (ii) visual-semantic alignment, and (iii) idiomatic meaning retention. Empirical evaluations highlight that HybridMoE achieves 5–6% performance gains across advanced vision language models, demonstrating improved representation of figurative language and culturally embedded meaning in multilingual multimodal settings.
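
The abstract describes HybridMoE only at a high level: outputs from both selected and unselected experts are combined through "controlled hybridization". The sketch below is one possible reading of that idea, not the authors' implementation. It assumes a softmax gate, top-k selection, and a single mixing coefficient; the names `hybrid_moe`, `k`, and `alpha` are hypothetical and do not come from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of gate scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(vectors, weights):
    """Element-wise weighted sum of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(w * v[i] for v, w in zip(vectors, weights)) for i in range(dim)]


def hybrid_moe(expert_outputs, gate_scores, k=2, alpha=0.1):
    """Blend top-k (selected) experts with the remaining (unselected) ones.

    ASSUMPTION: `alpha` is a hypothetical hybridization coefficient.
    alpha = 0 reduces to ordinary top-k routing; larger values let the
    unselected experts contribute more to the final representation.
    """
    probs = softmax(gate_scores)
    # Rank experts by gate probability, highest first.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    selected, unselected = order[:k], order[k:]

    def branch(indices):
        # Renormalize gate probabilities within the branch.
        mass = sum(probs[i] for i in indices)
        weights = [probs[i] / mass for i in indices]
        return weighted_sum([expert_outputs[i] for i in indices], weights)

    selected_out = branch(selected)
    if not unselected:
        return selected_out
    unselected_out = branch(unselected)
    # Controlled hybridization: convex combination of the two branches.
    return [(1 - alpha) * s + alpha * u
            for s, u in zip(selected_out, unselected_out)]


if __name__ == "__main__":
    experts = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(hybrid_moe(experts, [0.0, 0.0, 0.0], k=2, alpha=0.5))
```

In this reading, a convex combination keeps the output on the same scale as a standard sparse MoE layer while giving low-scoring experts a bounded influence. How the paper actually weights unselected experts, and how the Idiomatic Property Signals enter the computation, is not specified in the abstract.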