BrainJanus: A Unified Model Integrating Brain, Vision, and Language
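Source: arxiv.org
BrainJanus is the first unified brain model to integrate brain, vision, and language within a single framework. Its Unified Brain Tokenizer quantizes continuous neural dynamics into discrete tokens that are aligned with visual and linguistic representations in a shared Omni space. Built on an All-in-One autoregressive architecture, it uses next-token prediction to generate in any direction, covering image-to-brain and text-to-brain encoding as well as brain-to-image and brain-to-text decoding. Experiments show superior performance on multiple benchmarks, zero-shot generalization, and preservation of interpretable brain topography. The code is publicly available.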
Modeling the bidirectional correspondence between external sensory stimuli and internal neural activity has emerged as a critical frontier in neuroscience. However, existing approaches predominantly treat brain encoding and decoding as isolated tasks, relying heavily on unimodal alignment and external priors while overlooking the brain's intrinsic nature as a multimodal integration system. To address these limitations, we propose BrainJanus, the first unified brain model that integrates brain, vision, and language within a single framework. Specifically, we introduce a Unified Brain Tokenizer that quantizes continuous neural dynamics into discrete tokens aligned with visual and linguistic representations in a shared Omni space. Building on this, we use an All-in-One autoregressive architecture that leverages next-token prediction to enable seamless any-to-any generation, encompassing image-to-brain and text-to-brain encoding as well as brain-to-image and brain-to-text decoding. Extensive experiments demonstrate that BrainJanus achieves superior performance across diverse benchmarks. Furthermore, our framework exhibits zero-shot generalization and preserves interpretable biological topography, highlighting its potential as a general-purpose brain modeling paradigm. The code is available at https://github.com/HaitaoWuTJU/BrainJanus.
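The abstract does not spell out how the tokenizer or the token sequences are built, so the following is only a toy illustration of the two general ideas it names: quantizing continuous features into discrete tokens, and training on next-token prediction over a sequence that mixes modalities. It uses plain nearest-neighbor codebook lookup as a stand-in for the Unified Brain Tokenizer. The codebook, vocabulary sizes, id offsets, and function names are all hypothetical and are not taken from the BrainJanus code.

```python
# Toy sketch only: generic vector quantization plus a shared token id space.
# Nothing here reproduces the actual BrainJanus tokenizer or architecture.

def quantize(features, codebook):
    """Map each continuous feature vector to the index of its nearest codebook entry."""
    tokens = []
    for vec in features:
        dists = [sum((a - b) ** 2 for a, b in zip(vec, code)) for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens

# Hypothetical vocabulary layout: each modality owns a disjoint id range.
TEXT_VOCAB, IMAGE_VOCAB, BRAIN_VOCAB = 100, 50, 8
OFFSETS = {"text": 0, "image": TEXT_VOCAB, "brain": TEXT_VOCAB + IMAGE_VOCAB}

def to_shared_ids(tokens, modality):
    """Shift modality-local token indices into the shared id space."""
    return [t + OFFSETS[modality] for t in tokens]

def next_token_pairs(source_ids, target_ids):
    """Build (prefix, next id) training pairs for the target part of the sequence."""
    seq = source_ids + target_ids
    return [(seq[:i], seq[i]) for i in range(len(source_ids), len(seq))]

# Deterministic toy codebook with 8 two-dimensional entries.
codebook = [[float(k), float(k % 2)] for k in range(BRAIN_VOCAB)]

# Toy "neural" features lying close to codebook entries 3, 0, and 5.
features = [[3.1, 0.9], [0.2, -0.1], [4.9, 1.2]]
brain_tokens = quantize(features, codebook)        # [3, 0, 5]
brain_ids = to_shared_ids(brain_tokens, "brain")   # [153, 150, 155]

# Image-to-brain direction: image tokens form the prefix, brain tokens the target.
image_ids = to_shared_ids([7, 12], "image")        # [107, 112]
pairs = next_token_pairs(image_ids, brain_ids)
```

Swapping the roles of `image_ids` and `brain_ids` in `next_token_pairs` gives the brain-to-image direction, which is the sense in which a single next-token objective can cover both encoding and decoding in this sketch.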