# Brain-IT-VQA: From Brain Signals to Answers

- Source: HuggingFace Daily Papers (community trending papers)
- Published: 2026-05-28 08:00
- AIHOT score: 55
- AIHOT link: https://aihot.virxact.com/items/cmpw15rry03frsluk0a9rr3kz
- Original link: https://arxiv.org/abs/2605.29588

## AI Summary

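To address the challenge of decoding visual content from fMRI signals and answering questions about it, the study proposes the Brain-IT-VQA framework. Built on Brain-IT, the framework decodes language tokens from brain activity and integrates them with a large language model to answer visual questions, substantially outperforming existing methods. The work also introduces a new benchmark dataset, NSD-VQA, which provides on average 20 question-answer pairs per image across 20 controlled question categories, enabling more reliable and interpretable evaluation. Together, Brain-IT-VQA and NSD-VQA provide both a strong predictive framework and a tool for studying visual representations in the brain.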

## Main Text

Decoding visual content from fMRI signals recorded while a person views images, and specifically answering questions about the seen images, is a long-standing challenge. While significant progress has been made in recent years in visual question answering (VQA) from fMRI, performance remains limited. Moreover, although recent models can make increasingly accurate predictions, they have rarely been used as tools for understanding the structure of visual representations in the brain. We present Brain-IT-VQA, a framework for visual question answering from fMRI. Building on the Brain Interaction Transformer (Brain-IT), our method decodes language tokens from brain activity and integrates them with a language model to answer visual questions. Our model substantially outperforms previous fMRI-based captioning and VQA approaches. We further introduce NSD-VQA, a new dataset and benchmark for visual question answering from fMRI. Unlike existing image-fMRI VQA datasets, which typically provide only a few broad and weakly controlled questions per image, NSD-VQA provides on average 20 question-answer pairs per image across 20 controlled question categories that disentangle multiple levels of visual understanding. This enables more reliable and interpretable evaluation despite limited fMRI test data. Together, Brain-IT-VQA and NSD-VQA provide both a strong predictive framework and a tool for studying brain representations. Using this benchmark, we quantify which forms of visual and semantic information can be reliably decoded from fMRI responses to natural images. We further analyze the contributions of different brain regions across question types.
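
The abstract describes evaluation broken down by controlled question category. As a rough illustration only, per-category scoring might look like the sketch below. The record layout, the category names, and the exact-match scoring rule are all assumptions made for this example; the abstract does not specify the NSD-VQA data format or its metric.

```python
from collections import defaultdict


def per_category_accuracy(records):
    """Compute exact-match accuracy for each question category.

    `records` is an iterable of (category, predicted, reference) string
    triples. This layout is hypothetical, not the NSD-VQA format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for category, predicted, reference in records:
        total[category] += 1
        # Assumed scoring rule: case-insensitive exact match.
        if predicted.strip().lower() == reference.strip().lower():
            correct[category] += 1
    return {category: correct[category] / total[category] for category in total}


def macro_accuracy(per_category):
    """Average the per-category accuracies so each category counts equally."""
    return sum(per_category.values()) / len(per_category)


# Hypothetical predictions for two made-up categories.
example_records = [
    ("color", "Red", "red"),
    ("color", "blue", "green"),
    ("count", "2", "2"),
]
scores = per_category_accuracy(example_records)
print(scores)
print(macro_accuracy(scores))
```

Averaging over categories rather than over individual questions keeps a category with many questions from dominating the headline number, which is one plausible reason to report results per category when test data is limited.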
