Testing the Deliteralization Hypothesis in Human and Machine Translation
2026-05-25 • Computation and Language
AI summary
The authors compare large language models (LLMs) with dedicated neural machine translation (NMT) systems and human translators on how literal their translations are. Human translations remain less literal than those of every system tested, though recent LLMs narrow the gap. When LLMs iteratively revise their own translations, the output becomes progressively less literal, which supports the deliteralization hypothesis from translation studies. As post-editors of human drafts, however, LLMs do the opposite of human post-editors: they tolerate literal passages and revise idiomatic ones. The results clarify how LLM translation and revision behavior differs from that of humans and earlier NMT systems.
Keywords: Machine Translation, Large Language Models, Deliteralization, Post-editing, Translation Fluency, Iterative Revision, Synthetic Literality Index, Idiomatic Translation, WMT24 Dataset
Authors
Malik Marmonier, Rachel Bawden, Benoît Sagot
Abstract
The recent shift from dedicated NMT systems to general-purpose LLMs has reshaped machine translation, with LLMs reported to produce more fluent, less literal output than their predecessors. We test whether this shift extends to the deliteralization hypothesis, the long-standing claim from translation studies that translations become progressively less literal as they are drafted and revised. Using the WMT24++ dataset, we compare the literality of human translations and post-editions to that of two NMT systems and six LLMs across 54 language pairs and three tasks: direct translation, iterative self-revision, and post-editing of human drafts. Literality is measured via a validated Synthetic Literality Index built from six heuristics. We find that (i) human translations remain significantly less literal than those of all tested MT systems, though recent LLMs narrow the gap; (ii) when prompted to iteratively revise their own output, LLMs deliteralize monotonically, providing the first evidence that the hypothesis applies natively to LLM generation; and (iii) as post-editors, LLMs invert the revision triggers of human post-editors, tolerating literal drafts and targeting idiomatic human formulations for revision.
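To make the two quantitative ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the six heuristics are combined, so averaging z-scored heuristic values is purely an assumption here, and the function names `composite_index` and `is_monotonic_decrease` are hypothetical. The second function shows one simple way to check whether literality scores fall across successive revision rounds.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardize a list of numbers (population standard deviation)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant heuristic carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def composite_index(heuristic_columns):
    """Combine several heuristic scores into one index per translation.

    ASSUMPTION: the index is the mean of z-scored heuristics. The source
    only states that the index is built from six heuristics.
    heuristic_columns: one list per heuristic, one value per translation.
    """
    standardized = [zscores(col) for col in heuristic_columns]
    n_items = len(heuristic_columns[0])
    return [mean(col[i] for col in standardized) for i in range(n_items)]


def is_monotonic_decrease(scores, strict=False):
    """True if each revision round scores no higher than the previous one."""
    pairs = list(zip(scores, scores[1:]))
    if strict:
        return all(later < earlier for earlier, later in pairs)
    return all(later <= earlier for earlier, later in pairs)
```

Under these assumptions, a model's scores over successive revision rounds can be passed to `is_monotonic_decrease` to test for the pattern described in finding (ii); the non-strict default allows rounds in which the score does not change.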