Eyettention II: A Dual-Sequence Architecture for Modeling Fixation Location, Within-Word Landing Position, and Fixation Duration in Reading
2026-06-01 • Computation and Language
AI summary
The authors present Eyettention II, a deep-learning model that predicts how people move their eyes while reading: which word they fixate, where within the word the fixation lands, and how long it lasts. The model is lightweight, can be trained on limited GPU resources, outperforms state-of-the-art models in scanpath prediction, and reproduces key psycholinguistic phenomena. Its generated scanpaths could support language technology and psycholinguistic research where collecting eye-tracking data is too costly. The model is also designed to align closely with cognitive theories of reading.
eye tracking, scanpath, fixation duration, deep learning, psycholinguistics, cognitive modeling, natural language processing, fixation location, language models, data scarcity
Authors
Shuwen Deng, Cui Ding, David R. Reich, Paul Prasse, Lena A. Jäger
Abstract
The way our eyes move while reading provides valuable insights into both the reader's cognitive processes and the properties of the text. In particular, eye-tracking-while-reading data has been shown to be highly beneficial in various technological applications, such as enhancing and interpreting language models and inferring a reader's characteristics. However, these applications often rely on large-scale, data-driven models, which demand extensive eye-tracking datasets that are challenging to obtain due to the resource-intensive nature of data collection. To address the challenge of data scarcity, we develop Eyettention II, an end-to-end trained deep-learning model capable of generating realistic scanpaths consisting of a complete set of fixation attributes in chronological order, including fixation location, within-word landing position, and fixation duration. Our model is lightweight, efficiently trainable on limited GPU resources, and closely aligned with cognitive theories. We demonstrate that Eyettention II surpasses state-of-the-art models in scanpath prediction and mirrors human-like gaze behavior by capturing key psycholinguistic phenomena. With its robust performance, Eyettention II holds the potential to drive advancements in natural language processing, facilitate the piloting of materials for psycholinguistic experiments, and uncover new insights beyond what is explicitly encoded in theoretical cognitive models.