The Impact of Editorial Intervention on Detecting Native Language Traces

2026-05-11 • Computation and Language


AI summary

The authors studied how well systems can identify a writer's native language from their English writing, especially after AI tools have edited the text. They found that small corrections, such as fixing grammar, still leave clues about the author's first language, but bigger changes, such as rewording, make the native language much harder to identify. This suggests that deeper habits and cultural influences in writing, not just surface errors, reveal a writer's native language, and that heavy AI editing can hide these clues.

Native Language Identification, Non-native writing, Grammatical Error Correction, Paraphrasing, Lexico-semantic features, Pragmatic transfer, Cultural perspective, AI co-authorship, Profiling accuracy, Fluency edits

Authors

Ahmet Yavuz Uluslu, Mark Gales, Kate Knill, Gerold Schneider

Abstract

Native Language Identification (NLI) is the task of determining an author's native language (L1) from their writing in a non-native language. With the advent of human-AI co-authorship, non-native texts are routinely corrected and rewritten by large language models, fundamentally altering the linguistic features NLI models depend on. In this paper, we investigate the robustness of L1 traces across increasing degrees of editorial intervention. By processing 450 essays from the Write & Improve 2024 corpus through varying levels of grammatical error correction (GEC) and paraphrasing, we demonstrate that L1 attribution does not entirely depend on surface-level errors. Beyond surface errors, the detection models also leverage deeper L1 features: unidiomatic lexico-semantic choices, pragmatic transfer, and the author's underlying cultural perspective. We find that minimal edits preserve these structural traces and maintain high profiling accuracy. In contrast, fluency edits and paraphrasing normalize these L1 features, leading to a severe degradation in performance.
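The abstract orders interventions by degree: minimal GEC, fluency edits, and paraphrasing. One way to make "degree of intervention" concrete is a word-level edit rate between an essay and its edited version, i.e. token Levenshtein distance divided by the original length. This is an illustrative assumption, not a measure the abstract says the authors used; the function names and the learner sentences below are hypothetical and invented for the example.

```python
def word_edit_distance(source: str, target: str) -> int:
    """Minimum number of word insertions, deletions and substitutions
    needed to turn `source` into `target` (Levenshtein distance over tokens)."""
    a, b = source.split(), target.split()
    # prev[j] holds the distance between a[:i-1] and b[:j] (row i-1 of the DP table)
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr[j] = min(prev[j] + 1,         # delete tok_a
                          curr[j - 1] + 1,     # insert tok_b
                          prev[j - 1] + cost)  # substitute, or keep a match
        prev = curr
    return prev[len(b)]


def edit_rate(source: str, target: str) -> float:
    """Word edit distance normalised by source length (0.0 = untouched).
    An empty source is defined here as rate 0.0 (hypothetical convention)."""
    n = len(source.split())
    return word_edit_distance(source, target) / n if n else 0.0


# Hypothetical learner sentence and two hypothetical edits of it
original = "I am agree with this opinion because it make people more happy"
minimal = "I agree with this opinion because it makes people happier"
paraphrase = "In my view this idea is right, since it brings people greater happiness"

print(f"minimal edit rate:    {edit_rate(original, minimal):.2f}")  # → 0.33
print(f"paraphrase edit rate: {edit_rate(original, paraphrase):.2f}")
```

Under this assumed measure, the minimal correction touches 4 of 12 words, while the paraphrase rewrites most of the sentence, mirroring the abstract's contrast between edits that preserve L1 traces and edits that normalize them.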
