EEG Benchmarking Needs a Task Specification Layer: NeuroDoc for Rulebook-Guided, Executable Benchmark Construction

2026-06-22 | Machine Learning

Machine Learning, Neural and Evolutionary Computing
AI summary

The authors observe that public EEG datasets lack a unified way to describe tasks, which makes fair model comparison difficult. They propose a structured task specification language and a shared rulebook that define EEG tasks consistently and turn heterogeneous recordings into standardized, reusable benchmark units. They also introduce two tools, NeuroDoc and NeuroAudit, to support drafting, reviewing, amending, and releasing these task definitions. Finally, they instantiate the resulting benchmark units across four EEG foundation model backbones as execution-based evidence that the approach yields reusable, auditable EEG benchmarks.

EEG, benchmarking, task specification, dataset standardization, foundation models, NeuroDoc, NeuroAudit, rulebook, executable tasks
Authors
Chengxuan Qin, Zhige Chen, Shu Peng, Rui Yang, Jiping Cui, Yikai Dong, Jun Li, Liu Peng, Zhida Shang, Mingze Tang, Kay Chen Tan, Jibin Wu
Abstract
Electroencephalography (EEG) foundation models increasingly rely on multi-dataset training and evaluation, yet public EEG datasets still lack a shared task specification layer that can turn heterogeneous recordings into reusable benchmark units. Existing standards organize files, metadata, and provenance, but they do not specify EEG tasks under a common language and rulebook, leaving critical task semantics scattered across papers, code, and manual interpretation. We investigate whether heterogeneous public EEG datasets can be standardized through a structured task specification language paired with a shared rulebook. Our methodology represents each benchmark entry as a task document synchronized with an executable task kernel, with the rulebook defining task fields, evidence requirements, document-kernel alignment, review states, and machine-checkable constraints. Using this methodology, we release a community-reviewed EEG benchmark corpus centered on 53 completed and reviewed entries with 245 task definitions spanning diverse paradigms, and we introduce NeuroDoc and NeuroAudit as the operational support layer for rulebook-guided drafting, upgrading, review, amendment, and release management. We further examine whether the resulting benchmark units can be instantiated in a shared downstream setting across four EEG foundation model backbones, providing execution-based evidence for reusable, auditable, and executable EEG benchmarking infrastructure.
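
To make the idea of a task document kept in sync with an executable task kernel more concrete, the following is a minimal illustrative sketch in Python. It is not the NeuroDoc or NeuroAudit implementation: the class names (`TaskDocument`, `TaskKernel`), the field names, the review state names, and the specific checks in `check_entry` are all assumptions made for illustration. The sketch only shows how rulebook-style constraints (required fields, evidence requirements, valid review states, and document-kernel alignment) could be checked by machine.

```python
from dataclasses import dataclass
from typing import Dict, List

# Hypothetical review states; the actual rulebook's state names are not given in the abstract.
REVIEW_STATES = ("draft", "in_review", "reviewed", "released")


@dataclass
class TaskDocument:
    """Human-readable description of one EEG task (hypothetical fields)."""
    task_id: str
    paradigm: str
    labels: List[str]      # class labels the task claims to define
    evidence: List[str]    # pointers to where the task semantics are documented
    review_state: str = "draft"


@dataclass
class TaskKernel:
    """Executable counterpart: maps raw event codes to task labels."""
    task_id: str
    label_map: Dict[int, str]

    def run(self, event_codes: List[int]) -> List[str]:
        # Keep only events that belong to this task and translate them to labels.
        return [self.label_map[c] for c in event_codes if c in self.label_map]


def check_entry(doc: TaskDocument, kernel: TaskKernel) -> List[str]:
    """Return a list of rule violations; an empty list means all checks pass."""
    violations = []
    if not doc.task_id or not doc.paradigm:
        violations.append("task_id and paradigm are required")
    if not doc.labels:
        violations.append("at least one label is required")
    if not doc.evidence:
        violations.append("at least one evidence pointer is required")
    if doc.review_state not in REVIEW_STATES:
        violations.append(f"unknown review state: {doc.review_state}")
    if doc.task_id != kernel.task_id:
        violations.append("document and kernel refer to different tasks")
    if set(doc.labels) != set(kernel.label_map.values()):
        violations.append("document labels do not match kernel labels")
    return violations


doc = TaskDocument(
    task_id="mi_left_right",
    paradigm="motor imagery",
    labels=["left", "right"],
    evidence=["dataset paper, experimental protocol section"],
    review_state="reviewed",
)
kernel = TaskKernel(task_id="mi_left_right", label_map={1: "left", 2: "right"})

print(check_entry(doc, kernel))   # [] when the document and kernel agree
print(kernel.run([1, 2, 9, 1]))   # event code 9 is not part of this task and is dropped
```

In this sketch, the document and the kernel are separate objects on purpose: the alignment check fails as soon as one is edited without the other, which is the kind of drift a shared rulebook is meant to catch.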