When Can We Trust Early Warnings? Leakage-Excluded Early Outcome Prediction from LMS Interaction Logs

2026-05-25 | Artificial Intelligence

AI summary

The authors studied how to predict students' final course results early using data from online learning systems. They found that previous methods often used information that wouldn’t have been available yet, making predictions look better than they really were. To fix this, they created a strict testing method called LEAP that only uses data available up to a certain time point, preventing 'cheating' by looking ahead. They tested LEAP on a public dataset and showed how prediction accuracy improves as more weeks of data are included, with different machine learning models performing best at different times. They also showed that ignoring time constraints can make early predictions seem too optimistic.

Learning Management System (LMS), Early-warning models, Temporal leakage, Cutoff-based prediction, LEAP protocol, Feature provenance, ROC-AUC, Gradient Boosting, Random Forest, Open University Learning Analytics Dataset (OULAD)
Authors
Ngoc Luyen Le, Marie-Hélène Abel, Bertrand Laforge
Abstract
Early-warning models built from Learning Management System (LMS) logs aim to predict end-of-course outcomes early enough to enable timely learner support. However, reported "early" performance is often inflated by temporal leakage. This occurs when the pipeline uses information that would not yet be available at the time of prediction. We formalize cutoff-based early outcome prediction under a temporal availability constraint and introduce LEAP (Leakage-Excluded Early-Availability Protocol), which enforces cutoff-first truncation prior to joins and aggregation and audits feature provenance to prevent post-cutoff evidence from entering the benchmark. We instantiate LEAP on the public Open University Learning Analytics Dataset (OULAD) as a multi-step protocol for leakage-controlled evaluation across weekly cutoffs. Using several standard learning methods, we evaluate performance using ROC-AUC, PR-AUC, Brier score, and F1@0.5. Results show improving performance as the observation window expands, with a marked gain around week 3; Random Forest performs best at the earliest cutoffs, while Gradient Boosting dominates thereafter. Leakage ablations further show that temporal violations, especially through assessment information, can inflate apparent "early" performance.
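
The cutoff-first idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that raw rows carry a day offset relative to course start, that a weekly cutoff w exposes days strictly before 7w, and that field names (`id_student`, `date`, `sum_click`, `date_submitted`, `score`) resemble those in OULAD's click and assessment tables. The function names `truncate_at_cutoff`, `build_features`, and `audit_provenance` are hypothetical.

```python
from collections import defaultdict


def truncate_at_cutoff(rows, day_key, cutoff_day):
    """Keep only rows whose day offset falls strictly before the cutoff day."""
    return [row for row in rows if row[day_key] < cutoff_day]


def build_features(students, clicks, assessments, cutoff_week):
    """Build per-student features from evidence available before the cutoff."""
    cutoff_day = 7 * cutoff_week  # assumption: cutoff week w exposes days < 7*w

    # Cutoff-first: truncate every raw table BEFORE any join or aggregation,
    # so later steps cannot see post-cutoff rows even by accident.
    clicks_visible = truncate_at_cutoff(clicks, "date", cutoff_day)
    assessments_visible = truncate_at_cutoff(
        assessments, "date_submitted", cutoff_day
    )

    total_clicks = defaultdict(int)
    active_days = defaultdict(set)
    last_day = {}  # provenance: latest source day that fed each student's features
    for row in clicks_visible:
        sid = row["id_student"]
        total_clicks[sid] += row["sum_click"]
        active_days[sid].add(row["date"])
        last_day[sid] = max(last_day.get(sid, row["date"]), row["date"])

    scores = defaultdict(list)
    for row in assessments_visible:
        sid = row["id_student"]
        scores[sid].append(row["score"])
        day = row["date_submitted"]
        last_day[sid] = max(last_day.get(sid, day), day)

    features = {}
    for sid in students:  # every enrolled student gets a row, even with no activity
        s = scores[sid]
        features[sid] = {
            "total_clicks": total_clicks[sid],
            "active_days": len(active_days[sid]),
            "n_submitted": len(s),
            "mean_score": sum(s) / len(s) if s else None,
            "last_evidence_day": last_day.get(sid),
        }
    return features


def audit_provenance(features, cutoff_week):
    """Return ids of students whose features draw on post-cutoff evidence."""
    cutoff_day = 7 * cutoff_week
    return [
        sid
        for sid, f in features.items()
        if f["last_evidence_day"] is not None
        and f["last_evidence_day"] >= cutoff_day
    ]


# Toy data (invented for illustration only, not taken from OULAD).
clicks = [
    {"id_student": 1, "date": 0, "sum_click": 5},
    {"id_student": 1, "date": 3, "sum_click": 2},
    {"id_student": 1, "date": 20, "sum_click": 9},
    {"id_student": 2, "date": 8, "sum_click": 4},
]
assessments = [
    {"id_student": 1, "date_submitted": 5, "score": 80},
    {"id_student": 1, "date_submitted": 30, "score": 40},
]

week1 = build_features([1, 2], clicks, assessments, cutoff_week=1)
print(week1)
print(audit_provenance(week1, cutoff_week=1))
```

In this sketch the audit is a guard rather than a proof: it returns an empty list whenever features were built through `build_features`, and it flags students if a feature table built with a later cutoff is mistakenly evaluated at an earlier one.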