BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language

2026-06-29 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Machine Learning

AI summary

The authors developed BrainJanus, a model that links brain activity, images, and language within a single framework instead of treating brain encoding and decoding as separate tasks. They introduce a tokenizer that converts continuous brain signals into discrete tokens aligned with visual and language representations in a shared space. BrainJanus can generate brain signals from images or text and, in the reverse direction, generate images or text from brain signals, and the authors report superior performance across diverse benchmarks. The model also generalizes zero-shot and preserves interpretable biological topography. This approach could help unify how brain responses to different types of information are studied.

brain encoding, brain decoding, multimodal integration, tokenization, autoregressive models, zero-shot generalization, neural dynamics, vision-language models, biological topography

Authors

Haitao Wu, Qirui Zhang, Zhouheng Yao, Shangquan Sun, Qihao Zheng, Mianxin Liu, Chi Zhang, Wanli Ouyang, Chunfeng Song, Changqing Zhang, Jiamin Wu

Abstract

Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while overlooking the brain's intrinsic nature as a multimodal integration system. To address these limitations, we propose BrainJanus, the first unified brain model that integrates brain, vision, and language within a single framework. Specifically, we introduce a Unified Brain Tokenizer to quantize continuous neural dynamics into discrete tokens aligned with visual and linguistic representations in a shared Omni space. Building on this, we utilize an All-in-One autoregressive architecture that leverages next-token prediction to enable seamless any-to-any generation, which encompasses image-to-brain and text-to-brain encoding, and brain-to-image and brain-to-text decoding. Extensive experiments demonstrate that BrainJanus achieves superior performance across diverse benchmarks. Furthermore, our framework exhibits zero-shot generalization and preserves interpretable biological topography, highlighting its potential as a general-purpose brain modeling paradigm. The code is available at https://github.com/HaitaoWuTJU/BrainJanus.
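To make the tokenization idea concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not say how the Unified Brain Tokenizer quantizes neural dynamics, so this example assumes plain nearest-neighbor lookup in a fixed codebook, which is one common way to turn continuous vectors into discrete tokens. The function names `quantize` and `build_sequence`, the toy codebook, and the `<brain>` / `<text>` tag format are all hypothetical and chosen only for illustration.

```python
from typing import List, Sequence


def quantize(features: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each continuous feature vector to the index of its nearest codebook entry.

    Assumption: nearest-neighbor vector quantization with squared Euclidean
    distance. The paper's tokenizer may work differently.
    """
    tokens = []
    for vec in features:
        # Squared Euclidean distance from this vector to every codebook entry.
        dists = [sum((v - c) ** 2 for v, c in zip(vec, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def build_sequence(source_tag: str, source_tokens: List[int],
                   target_tag: str) -> List[str]:
    """Lay out a prompt for next-token prediction.

    Hypothetical format: source tokens wrapped in modality tags, followed by
    the opening tag of the target modality that a model would continue from.
    """
    body = [f"{source_tag}_{t}" for t in source_tokens]
    return [f"<{source_tag}>"] + body + [f"</{source_tag}>", f"<{target_tag}>"]


# Toy data: three 2-D "brain feature" vectors and a three-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]]
brain_features = [[0.9, 1.2], [0.1, -0.2], [-0.8, 0.7]]

brain_tokens = quantize(brain_features, codebook)
prompt = build_sequence("brain", brain_tokens, "text")
print(brain_tokens)
print(prompt)
```

Under these assumptions, swapping the source and target tags (for example `build_sequence("text", text_tokens, "brain")`) would express the encoding direction with the same sequence layout, which is the sense in which a single next-token predictor can cover any-to-any generation.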
