Structure-Aware Modeling of Multiple-Choice Questions Improves Automatic Difficulty Estimation
2026-06-08 • Computation and Language
Computation and Language, Machine Learning
AI summary
The authors studied how to better predict the difficulty of multiple-choice questions by looking not just at the question and correct answer but also at the wrong answer choices (distractors). They found that encoding each distractor as a separate input helps the prediction models. Their best method outperformed a simpler model that used only the question and the correct answer, and a variant that ignores the order of the distractors reached nearly the same accuracy with about half as many parameters. This suggests that considering the structure and content of all parts of a question can improve automatic difficulty estimation for educational tests.
Automatic Question Difficulty Estimation (AQDE), Multiple-choice questions, Distractors, Predictive modeling, R-squared (R^2), Feature representation, Order-aware concatenation, Order-invariant summation, Natural Sciences, Social Sciences
Authors
Gabriel Ortega, Abelino Jiménez, Séverin Lions, Pablo Dartnell
Abstract
Automatic Question Difficulty Estimation (AQDE) holds growing promise for educational assessment because it has the potential to yield difficulty estimates that are competitive with expert judgment, while helping reduce the time and financial burden associated with pilot administrations and scaling to digital testing contexts. Prior AQDE studies report mixed evidence on whether adding distractors as additional text to the question stem and the correct key consistently improves difficulty prediction. We hypothesize that the effectiveness of distractor information depends on its structural representation, and that explicitly modeling distractors as separate components improves difficulty estimation over baselines that omit this information. To test this hypothesis, we designed controlled architectures that model MCQ components as distinct inputs to isolate the contribution of distractor content and order. Specifically, we represented distractors by encoding each distractor as its own text input and aggregating their representations either with order-aware concatenation (with positional tags) or with an order-invariant summation. We evaluated these architectures using two Chilean datasets (Natural and Social Sciences, 2016-2020; 4,114 multiple-choice questions). Compared to a simpler model that only used the question stem and the key, our best distractor-aware architecture achieved higher predictive performance, reaching R^2 = 0.83 for Natural Sciences and R^2 = 0.71 for Social Sciences items. An order-invariant variant achieved nearly the same accuracy with approximately half as many parameters, offering a favorable accuracy-efficiency trade-off. These results show that structural information (especially distractor content) drives gains in predictive accuracy, supporting the development of efficient, structure-aware models that are computationally viable for large-scale educational applications.
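
The two aggregation strategies named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the `encode` function is a hypothetical stand-in for a real text encoder, and the embedding width `DIM`, the `[D1]`-style positional tags, the example item, and the choice of three distractors are all assumptions made for illustration. The sketch shows only the structural difference: concatenation keeps one slot per distractor and so depends on order, while summation pools the distractors into a single vector and does not.

```python
import random
from typing import List, Sequence

DIM = 8  # toy embedding width (assumption, not from the paper)


def encode(text: str, dim: int = DIM) -> List[float]:
    """Hypothetical stand-in for a text encoder.

    Returns a deterministic pseudo-embedding derived from the text itself.
    """
    rng = random.Random(text)  # string seeds give reproducible values
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def aggregate_concat(stem: str, key: str, distractors: Sequence[str]) -> List[float]:
    """Order-aware: tag each distractor with its position, then concatenate."""
    features = encode(stem) + encode(key)
    for slot, distractor in enumerate(distractors, start=1):
        # The "[D1]", "[D2]", ... tag format is an assumed convention.
        features += encode(f"[D{slot}] {distractor}")
    return features


def aggregate_sum(stem: str, key: str, distractors: Sequence[str]) -> List[float]:
    """Order-invariant: sum the distractor embeddings element-wise."""
    pooled = [0.0] * DIM
    for distractor in distractors:
        pooled = [p + e for p, e in zip(pooled, encode(distractor))]
    return encode(stem) + encode(key) + pooled


# Made-up example item with three distractors.
stem = "Which organelle produces most of a cell's ATP?"
key = "Mitochondrion"
distractors = ["Ribosome", "Nucleus", "Golgi apparatus"]

concat_features = aggregate_concat(stem, key, distractors)
sum_features = aggregate_sum(stem, key, distractors)
print(len(concat_features))  # (2 + 3) * DIM = 40
print(len(sum_features))     # 3 * DIM = 24
```

In this toy setup the summed representation has a fixed width regardless of how many distractors an item has, so any prediction head placed on top of it needs fewer input weights than one placed on the concatenated representation. That is one plausible source of a parameter reduction; the abstract does not specify where the reported savings come from.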