Does Continued Pretraining on a Learner Corpus Improve Automated Essay Scoring on English Proficiency Tests? Evidence from EFCAMDAT

2026-05-25 • Computation and Language

Computation and Language • Machine Learning

AI summary

The authors studied whether further training of language models on essays written by English learners helps automated essay scoring (AES) for English proficiency tests. They found that training on all learner essays gave mixed results, partly because the training data and test data differed in English level and writing style. When they focused training on essays with similar proficiency levels, the scoring improved more reliably for one test but did not always help when applying the model to different tests. Overall, the authors show that adapting models to learner essays helps but only if the training closely matches the target test conditions.

Automated Essay Scoring • Pretrained Transformer Models • Domain-Adaptive Pretraining • EFCAMDAT Corpus • English Proficiency Tests • FCE • IELTS • Few-Shot Learning • Cross-Dataset Transfer • CEFR Levels

Authors

Duy Anh Nguyen

Abstract

Recent automated essay scoring (AES) studies increasingly use pretrained transformer models, but these models are usually pretrained on general-domain English and may under-represent second-language learner writing. This study investigates whether domain-adaptive continued pretraining (DAPT) on the EFCAMDAT learner corpus improves transformer-based AES for English proficiency tests. We apply DAPT to three transformer encoders and evaluate them on FCE and IELTS in both in-domain scoring and few-shot cross-dataset transfer. Full-corpus DAPT produces mixed results across models, datasets, and metrics. Further analyses suggest that these mixed effects are partly explained by mismatches in proficiency, genre, and communicative purpose between EFCAMDAT and the downstream datasets. A proficiency-based ablation shows that targeted DAPT using CEFR-aligned subsets improves downstream scoring more reliably than full-corpus DAPT, especially for FCE with B1–B2 data. However, these gains do not consistently improve cross-dataset transfer. Overall, the findings suggest that continued pretraining on a learner-writing corpus can benefit in-domain AES for English assessment when the pretraining data is sufficiently aligned with the downstream assessment setting, but it does not automatically improve transferability across different English proficiency test datasets.
