Zero-Shot Imagined Speech Decoding via Imagined-to-Listened MEG Mapping

2026-05-08

Machine Learning
AI summary

The authors studied how to decode what people imagine saying from brain signals recorded with a non-invasive method called MEG. Since imagined-speech data are limited and hard to align in time, they recorded brain activity from musicians both when listening to and when imagining the same speech and music. They built a three-step process: first, they learned to convert imagined brain signals into signals resembling those recorded during listening; second, they trained a decoder to identify words from listening brain signals; and third, they used this pipeline to decode imagined words from new subjects. Their results show it is possible to decode imagined speech better than chance and that more training data improves performance, making this a promising approach for brain-computer interfaces.

imagined speech · MEG (magnetoencephalography) · brain-computer interface · neural decoding · temporal alignment · contrastive decoding · semantic embeddings · acoustic embeddings · phonetic embeddings · musicians
Authors
Maryam Maghsoudi, Shihab Shamma
Abstract
Decoding imagined speech from non-invasive brain recordings is challenging because imagined datasets are scarce and difficult to align temporally across subjects and sessions. In this work, we propose a new approach to decoding imagined speech that leverages the richer and more reliably labeled recordings obtained while listening to speech. We collected paired listened and imagined MEG recordings to rhythmic, melodic, and spoken stimuli from trained musicians; using trained musicians helped improve temporal alignment across conditions. We then developed a three-stage decoding pipeline that revealed consistent and meaningful relationships between neural activity evoked by imagining and listening to the same stimuli. First, we trained six linear and neural models to map imagined MEG responses to listened responses. We evaluated these models against a null baseline from unseen subjects to validate that the predicted listening responses preserve stimulus-specific information. In the second stage, we trained a contrastive word decoder exclusively on the listened MEG responses and evaluated it using four embedding strategies, including semantic, acoustic, and phonetic representations. In the third stage, we processed the imagined MEG responses from held-out subjects through the mapping pipeline to compute the corresponding listening responses, which were then decoded by the listened decoder. Using rank-based analysis, we show that imagined words are decodable significantly above chance. We report here the results of a proof-of-concept implementation for decoding imagined speech, in which all evaluations are performed on held-out subjects. We also demonstrate that performance improves with training data size, suggesting that this approach is scalable and can be applied directly to realistic brain-computer interface scenarios.
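
To make the first stage concrete, here is a minimal sketch of an imagined-to-listened mapping. The paper trains six linear and neural mapping models; this sketch substitutes a single ridge regression over a channel-space map, and all array shapes, the synthetic data, and the pooling of time points are placeholder assumptions rather than the authors' implementation.

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder dimensions (assumed, not from the paper): epochs of
# imagined/listened MEG, shape (n_trials, n_channels, n_times),
# time-aligned per stimulus.
n_trials, n_channels, n_times = 200, 248, 120
rng = np.random.default_rng(0)
X_imagined = rng.standard_normal((n_trials, n_channels, n_times))
Y_listened = rng.standard_normal((n_trials, n_channels, n_times))

# Pool time points as samples and fit one linear channel-space map
# imagined -> listened (a stand-in for one of the paper's six models).
X = X_imagined.transpose(0, 2, 1).reshape(-1, n_channels)
Y = Y_listened.transpose(0, 2, 1).reshape(-1, n_channels)
mapper = Ridge(alpha=1.0).fit(X, Y)

# Map imagined epochs into "predicted listening" responses.
Y_pred = mapper.predict(X).reshape(n_trials, n_times, n_channels)
Y_pred = Y_pred.transpose(0, 2, 1)  # back to (trials, channels, times)
```

A stage-1 check in the spirit of the paper's null baseline would compare how well the predicted responses correlate with the true listened responses for matching stimuli versus responses drawn from unseen subjects.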
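Stages 2 and 3 can be sketched in the same spirit: a contrastive word decoder trained only on listened responses, then applied to mapped imagined responses and scored by rank. The InfoNCE-style loss, the linear encoder, the temperature, and every shape below are illustrative assumptions; the abstract does not specify the decoder architecture or the four embedding strategies at this level of detail.

```python
import torch
import torch.nn.functional as F

# Placeholder setup: listened MEG epochs paired with word embeddings
# (semantic, acoustic, or phonetic; random stand-ins here).
n_trials, n_feats, emb_dim = 256, 248 * 120, 64
meg_listened = torch.randn(n_trials, n_feats)
word_emb = F.normalize(torch.randn(n_trials, emb_dim), dim=-1)

# Linear encoder from MEG space into the word-embedding space.
encoder = torch.nn.Linear(n_feats, emb_dim)
opt = torch.optim.Adam(encoder.parameters(), lr=1e-3)

# Stage 2: contrastive training on listened responses only; matching
# MEG/word pairs sit on the diagonal of the similarity matrix.
for _ in range(100):
    z = F.normalize(encoder(meg_listened), dim=-1)
    logits = z @ word_emb.T / 0.07
    loss = F.cross_entropy(logits, torch.arange(n_trials))
    opt.zero_grad()
    loss.backward()
    opt.step()

# Stage 3: decode mapped imagined responses (a noisy stand-in tensor
# here) and evaluate with the rank of the true word among candidates.
meg_imagined_mapped = meg_listened + 0.1 * torch.randn_like(meg_listened)
with torch.no_grad():
    z = F.normalize(encoder(meg_imagined_mapped), dim=-1)
    sims = z @ word_emb.T
    ranks = (sims > sims.diag().unsqueeze(1)).sum(dim=1) + 1
print("median rank of true word:", ranks.median().item())
```

In this framing, "significantly above chance" corresponds to the median rank of the true word falling reliably below the midpoint of the candidate list, which is how a rank-based analysis of this kind is typically read.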