Domain-Specific Quality Estimation for Machine Translation in Low-Resource Scenarios
2026-03-07 • Computation and Language
Computation and Language • Artificial Intelligence • Machine Learning
AI summary
The authors studied how well large language models (LLMs) can estimate the quality of English-to-Indic machine translations without reference translations. They compared zero-shot, few-shot, and guideline-anchored prompting across four domains and found that closed-weight models perform well with prompting alone, whereas open-weight models struggle, especially in high-risk domains such as healthcare and legal. To address this, they used ALOPE, a framework that adapts the model with Low-Rank Adaptation (LoRA) and attaches regression heads to selected intermediate Transformer layers, which improved performance in the harder, semantically complex domains. They also extended ALOPE with the recently proposed Low-Rank Multiplicative Adaptation (LoRMA) and released their code and domain-specific datasets publicly.
Quality Estimation (QE) • machine translation • large language models (LLMs) • zero-shot learning • few-shot learning • prompting • Low-Rank Adaptation (LoRA) • Transformer layers • Low-Rank Multiplicative Adaptation (LoRMA) • English to Indic languages
Authors
Namrata Patil Gurav, Akashdeep Ranu, Archchana Sindhujan, Diptesh Kanojia
Abstract
Quality Estimation (QE) is essential for assessing machine translation quality in reference-less settings, particularly in domain-specific and low-resource language scenarios. In this paper, we investigate sentence-level QE for English-to-Indic machine translation across four domains (Healthcare, Legal, Tourism, and General) and five language pairs. We systematically compare zero-shot, few-shot, and guideline-anchored prompting across selected closed-weight and open-weight LLMs. Our findings indicate that while closed-weight models achieve strong performance through prompting alone, prompt-only approaches remain fragile for open-weight models, especially in high-risk domains. To address this, we adopt ALOPE, a framework for LLM-based QE that uses Low-Rank Adaptation with regression heads attached to selected intermediate Transformer layers. We also extend ALOPE with the recently proposed Low-Rank Multiplicative Adaptation (LoRMA). Our results show that intermediate-layer adaptation consistently improves QE performance, including gains in semantically complex domains, indicating a path toward more robust QE in practical scenarios. We publicly release our code and domain-specific QE datasets to support further research.
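The adaptation idea described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' ALOPE implementation: the toy tanh encoder, the chosen layer index, mean pooling over tokens, the zero-initialised LoRA factors, and the gradient-descent head fit are all illustrative assumptions. `LoRALinear` adds a trainable low-rank update to a frozen weight; `MultiplicativeLowRankLinear` shows one assumed multiplicative form, (I + BA)W0, in the spirit of LoRMA (the exact LoRMA parameterisation should be checked against its paper); and a linear regression head reads the pooled hidden states of an intermediate layer to predict a sentence-level QE score. For brevity only the head is fitted here; a full setup would presumably train the low-rank factors jointly with it.

```python
import numpy as np

rng = np.random.default_rng(0)


class LoRALinear:
    """Frozen weight W0 plus a trainable additive low-rank update: W = W0 + (alpha / r) * B @ A."""

    def __init__(self, d_in, d_out, r=4, alpha=8.0):
        # Random stand-in for a pretrained, frozen weight matrix.
        self.W0 = rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(d_out, d_in))
        self.A = rng.normal(0.0, 0.02, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))                   # trainable up-projection, zero-init
        self.scale = alpha / r

    def effective_weight(self):
        return self.W0 + self.scale * self.B @ self.A

    def __call__(self, x):
        return x @ self.effective_weight().T


class MultiplicativeLowRankLinear(LoRALinear):
    """Illustrative multiplicative variant (assumed form, not necessarily LoRMA's): W = (I + B @ A) @ W0."""

    def __init__(self, d_in, d_out, r=4):
        super().__init__(d_in, d_out, r)
        # The low-rank factors act on the output space of W0, so A is (r, d_out).
        self.A = rng.normal(0.0, 0.02, size=(r, d_out))

    def effective_weight(self):
        d_out = self.W0.shape[0]
        return (np.eye(d_out) + self.B @ self.A) @ self.W0


def encode(tokens, layers):
    """Run token vectors through a stack of layers and keep every intermediate hidden state."""
    hidden = [tokens]
    h = tokens
    for layer in layers:
        h = np.tanh(layer(h))
        hidden.append(h)
    return hidden  # hidden[k] has shape (seq_len, d)


def pooled_features(batch, layers, layer_idx):
    """Mean-pool one intermediate layer per sentence (the pooling choice is an assumption)."""
    return np.stack([encode(x, layers)[layer_idx].mean(axis=0) for x in batch])


def fit_regression_head(X, y, lr=0.01, steps=200):
    """Fit a linear regression head (w, b) on pooled features with gradient descent on MSE."""
    w, b = np.zeros(X.shape[1]), 0.0
    losses = []
    for _ in range(steps):
        err = X @ w + b - y
        losses.append(float(np.mean(err ** 2)))
        w -= lr * 2.0 * X.T @ err / len(y)
        b -= lr * 2.0 * err.mean()
    return w, b, losses


d, n_layers, seq_len, n_sent = 16, 6, 10, 32
layers = [LoRALinear(d, d) for _ in range(n_layers)]
batch = [rng.normal(size=(seq_len, d)) for _ in range(n_sent)]
scores = rng.uniform(0.0, 1.0, size=n_sent)  # synthetic stand-ins for human QE scores

X = pooled_features(batch, layers, layer_idx=3)  # head attached to an intermediate layer
w, b, losses = fit_regression_head(X, scores)
```

Attaching the head at `layer_idx=3` rather than the final layer mirrors the abstract's focus on intermediate Transformer layers; in practice the layer (or set of layers) would be selected empirically. Because `B` starts at zero, both adapted layers initially reproduce the frozen weight exactly, so adaptation begins from the pretrained model's behaviour.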