EVA-Net: Subject-Independent EEG Motor Decoding with Video-Derived Motor Priors

2026-06-01
Artificial Intelligence
AI summary

The authors address the challenge of decoding brain signals (EEG) to understand motor actions without needing individual calibration. They propose EVA-Net, a method that uses videos of actions to help the system learn common patterns across different people’s brain signals. First, their approach matches brain signals and video features in a shared space to reduce differences between subjects. Then, they use information from videos to improve an EEG-only classifier, making it better at recognizing actions across many people. Their tests show this video-based method works better than using text descriptions for guiding the decoding process.

Keywords
Brain-Computer Interface (BCI), EEG decoding, Cross-subject generalization, Motor semantics, Non-stationarity, Contrastive learning, Knowledge distillation, Multimodal learning, Video-based semantic prior
Authors
Ziyuan Li, Yueyu Sun, Yimeng Zhang
Abstract
Practical non-invasive Brain-Computer Interface (BCI) systems require EEG decoders with strong cross-subject generalization and minimal calibration. However, inter-subject variability and signal non-stationarity often entangle motor semantics with subject-specific noise, limiting subject-independent decoding. Recent multimodal approaches use text as a semantic anchor, yet text provides sparse and static supervision for inherently dynamic motor processes. To address this issue, we propose EVA-Net, a two-stage framework that uses action videos as semantic priors for subject-independent EEG motor decoding. In the first stage, EEG and video features are aligned in a shared space using cross-modal and supervised contrastive objectives to reduce subject-specific variation. In the second stage, video category prototypes and knowledge distillation transfer video-derived priors to an EEG-only classifier without adding inference overhead. Experiments on two public datasets show that EVA-Net achieves strong subject-independent decoding performance, including an 8.66% leave-one-subject-out (LOSO) accuracy gain on EEGMMI. Ablation results further suggest that video provides a more effective semantic anchor than the text baseline considered in this work.
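
The two stages described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not give the loss formulas, so the symmetric InfoNCE form, the teacher built from video-to-prototype similarities, the temperatures, and all function names (`cross_modal_infonce`, `class_prototypes`, `prototype_distillation`) are assumptions chosen for illustration. The supervised contrastive term and the encoders themselves are omitted.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def log_softmax(logits, axis=-1):
    """Numerically stable log-softmax."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))

def cross_modal_infonce(eeg, video, temperature=0.1):
    """Stage 1 (assumed form): symmetric InfoNCE over a batch in which
    the i-th EEG embedding is paired with the i-th video embedding."""
    e, v = l2_normalize(eeg), l2_normalize(video)
    logits = e @ v.T / temperature               # (batch, batch) cosine similarities
    idx = np.arange(len(e))
    loss_e2v = -log_softmax(logits, axis=1)[idx, idx].mean()  # EEG -> video
    loss_v2e = -log_softmax(logits, axis=0)[idx, idx].mean()  # video -> EEG
    return 0.5 * (loss_e2v + loss_v2e)

def class_prototypes(video, labels, num_classes):
    """Stage 2 (assumed form): one prototype per class, the normalized mean
    of that class's video embeddings. Every class must appear in `labels`."""
    v = l2_normalize(video)
    protos = np.stack([v[labels == c].mean(axis=0) for c in range(num_classes)])
    return l2_normalize(protos)

def prototype_distillation(eeg_logits, video, prototypes, temperature=0.5):
    """Stage 2 (assumed form): KL(teacher || student), where the teacher is a
    softmax over video-to-prototype similarities and the student is the
    EEG-only classifier. Video is needed only to compute this training loss."""
    teacher_log = log_softmax(l2_normalize(video) @ prototypes.T / temperature)
    student_log = log_softmax(eeg_logits / temperature)
    teacher = np.exp(teacher_log)
    return (teacher * (teacher_log - student_log)).sum(axis=1).mean()

# Toy usage with random stand-ins for encoder outputs (hypothetical shapes).
rng = np.random.default_rng(0)
video_emb = rng.normal(size=(8, 16))
eeg_emb = video_emb + 0.1 * rng.normal(size=(8, 16))
labels = np.array([0, 1, 2, 3, 0, 1, 2, 3])

align_loss = cross_modal_infonce(eeg_emb, video_emb)
protos = class_prototypes(video_emb, labels, num_classes=4)
kd_loss = prototype_distillation(rng.normal(size=(8, 4)), video_emb, protos)
```

Because the video branch enters only through the two training losses, the EEG classifier can be run alone at test time, which is consistent with the abstract's claim of no added inference overhead.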