3D Temporal Analysis for Autism Spectrum Disorder Screening During Attention Tasks

2026-06-03 | Computer Vision and Pattern Recognition

AI summary

The authors developed a method to help screen for Autism Spectrum Disorder (ASD) in children aged 7-12 using 3D analysis of video instead of traditional 2D methods. Their system tracks detailed head movements, including head translation, and facial expressions while children complete a Continuous Performance Test in virtual reality. Using LSTM and GRU temporal classifiers, they found that their 3D features were more accurate than 2D baselines. Combining 3D head pose and facial features, with PCA for dimensionality reduction, gave the best result (84.6% accuracy), suggesting the method could support more objective and automated ASD screening tools.

Autism Spectrum Disorder, 3D head pose estimation, facial expression analysis, LSTM, GRU, Virtual Reality, Continuous Performance Test, machine learning, PCA, multimodal fusion
Authors
Inam Qadir, Elizabeth B Varghese, Dena Al-Thani, Marwa Qaraqe
Abstract
Accurate Autism Spectrum Disorder (ASD) screening for school-age children is crucial to identify cases that may have been missed earlier and to enable timely interventions supporting social, cognitive, and academic development. Current ASD screening relies on subjective assessments and 2D analysis methods that fail to capture spatial displacement patterns characteristic of ASD behaviors. In this study, a novel 3D temporal analysis framework is presented, built on top of DECA (Detailed Expression Capture and Animation), a 3D modeling framework, to extract comprehensive head pose parameters (including translational components $T_x, T_y, T_z$) and facial expressions independent of pose variations. LSTM and GRU-based temporal classifiers were trained on the extracted 3D features from video data collected from 39 participants (19 with ASD, 20 typically developing (TD)) aged 7-12 years during Virtual Reality-Continuous Performance Test tasks. The GRU-based models demonstrated superior performance, with 3D head pose features achieving 83.9\% accuracy and 3D facial features reaching 81.4\% accuracy, outperforming 2D baseline approaches by 10.7\% and 7.5\%, respectively. Furthermore, multimodal fusion of 3D head pose and facial features with PCA-based dimensionality reduction achieved the highest accuracy of 84.6\%, outperforming unimodal approaches. This work establishes a foundation for objective, automated screening tools addressing current diagnostic limitations in ASD identification for school-age populations.
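
To make the fusion step in the abstract concrete, the following is a minimal NumPy sketch of one way such a pipeline could be wired together: per-frame head pose and expression vectors are concatenated, reduced with PCA, and passed through a single GRU layer whose final hidden state feeds a logistic output. It is an illustration under stated assumptions, not the authors' implementation. The feature sizes (6 pose values, 50 expression coefficients), the number of retained components, the hidden size, the random untrained weights, and all function names are hypothetical; the abstract does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes (not from the paper): 8 clips, 30 frames per clip,
# 6 pose values (3 rotation + T_x, T_y, T_z), 50 expression coefficients,
# 16 PCA components, 12 GRU hidden units.
N, T, D_POSE, D_EXPR, K, H = 8, 30, 6, 50, 16, 12

# Random stand-ins for per-frame 3D features extracted from video.
pose = rng.normal(size=(N, T, D_POSE))
expr = rng.normal(size=(N, T, D_EXPR))

def fuse(pose_seq, expr_seq):
    """Feature-level fusion: concatenate the two modalities frame by frame."""
    return np.concatenate([pose_seq, expr_seq], axis=-1)

def pca_fit(frames, k):
    """Return (mean, components) for the top-k principal directions."""
    mean = frames.mean(axis=0)
    # Rows of vt are the principal directions of the centered data.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    return mean, vt[:k]

def pca_transform(x, mean, components):
    """Project features (any leading shape) onto the PCA components."""
    return (x - mean) @ components.T

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_gru(input_dim, hidden_dim):
    """Small random weights; a real model would learn these by training."""
    p = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = 0.1 * rng.normal(size=(input_dim, hidden_dim))
        p["U" + gate] = 0.1 * rng.normal(size=(hidden_dim, hidden_dim))
        p["b" + gate] = np.zeros(hidden_dim)
    p["w_out"] = 0.1 * rng.normal(size=hidden_dim)
    p["b_out"] = 0.0
    return p

def gru_forward(seq, p):
    """Run one GRU layer over a (T, input_dim) sequence; return the last state."""
    h = np.zeros(p["Uz"].shape[0])
    for x in seq:
        z = sigmoid(x @ p["Wz"] + h @ p["Uz"] + p["bz"])            # update gate
        r = sigmoid(x @ p["Wr"] + h @ p["Ur"] + p["br"])            # reset gate
        cand = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"] + p["bh"])   # candidate state
        # One common convention; some libraries swap the roles of z and 1 - z.
        h = (1.0 - z) * h + z * cand
    return h

def classify(seq, p):
    """Probability of the positive class from the final hidden state."""
    return sigmoid(gru_forward(seq, p) @ p["w_out"] + p["b_out"])

fused = fuse(pose, expr)                                   # (N, T, 56)
mean, components = pca_fit(fused.reshape(-1, fused.shape[-1]), K)
reduced = pca_transform(fused, mean, components)           # (N, T, K)
params = init_gru(K, H)
probs = np.array([classify(clip, params) for clip in reduced])
```

In this sketch PCA is fitted on all frames at once for brevity. In an actual evaluation the PCA mean and components would be estimated on training participants only and then applied to held-out participants, to avoid leaking test data into the features.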