Question Type, Cognitive Load, and CEFR Alignment: Evaluating LLM-Generated EFL Grammar Drill Exercises

2026-06-01 • Computers and Society

AI summary

The authors studied how well English learning content generated by large language models (LLMs) works for Japanese junior high school students using a grammar drill app. They found that multiple-choice questions were the easiest, fill-in-the-blank (cloze) tasks were the biggest obstacle to active recall, and drag-and-drop exercises took the most time. Their data also supported the CEFR-J grammar difficulty levels: accuracy fell and response times rose as the level increased. Overall, the authors suggest that LLMs can produce useful learning materials, but that developers should sequence question types carefully to help students move from recognizing answers to actively producing language.

LLM (Large Language Model), English as a Foreign Language (EFL), CEFR (Common European Framework of Reference for Languages), CEFR-J, cognitive load, multiple-choice questions, cloze tasks, drag-and-drop exercises, active recall, pedagogical viability

Authors

Steve Woollaston, Brendan Flanagan, Yuko Toyokawa, Hiroaki Ogata

Abstract

This study evaluates the pedagogical viability of LLM-generated English as a Foreign Language (EFL) learning content. Using log data from Japanese junior high school students practising on a grammar drilling application, we analysed how question modality affects student performance and whether the theoretical difficulty tiers of the localised CEFR framework (CEFR-J) accurately predict empirical task difficulty. Results reveal a clear performance hierarchy: multiple-choice questions carried the lowest cognitive load, cloze tasks posed the greatest barrier to active recall, and drag-and-drop exercises incurred the heaviest time penalties. Furthermore, learner data validated the CEFR-J grammar framework, showing a steady decline in accuracy and an increase in response times as proficiency levels advanced. These findings demonstrate that LLMs can successfully generate learning content, while highlighting the need for developers to sequence question modalities strategically so that learners transition from passive recognition to active linguistic production.
