Keywords
Computer science
Semantics (computer science)
Decoding methods
Encoder
Visual cortex
Coding (set theory)
Artificial intelligence
Word (group theory)
Natural language processing
Psychology
Neuroscience
Linguistics
Telecommunications
Philosophy
Set (abstract data type)
Programming language
Operating system
Authors
Jiaxuan Chen,Qi Yu,Yueming Wang,Gang Pan
Source
Venue: arXiv preprint (Cornell University)
Date: 2023-09
Citations: 3
Identifier
DOI:10.48550/arxiv.2309.15729
Abstract
Decoding seen visual content from non-invasive brain recordings has important scientific and practical value. Efforts have been made to recover seen images from brain signals, but most existing approaches cannot faithfully reflect the visual content due to insufficient image quality or semantic mismatches. Compared with reconstructing pixel-level images, language is a more efficient and effective way to convey visual information. Here we introduce a non-invasive neural decoder, termed MindGPT, which interprets perceived visual stimuli from fMRI signals into natural language. Specifically, our model builds on a visually guided neural encoder with a cross-attention mechanism, which lets us steer latent neural representations toward a desired language-semantic direction in an end-to-end manner through the collaborative use of the large language model GPT. We find that the resulting neural representations of MindGPT are explainable and can be used to evaluate the contributions of visual properties to language semantics. Our experiments show that the generated word sequences faithfully represent the visual information (including essential details) conveyed in the seen stimuli. The results also suggest that, for language decoding tasks, the higher visual cortex (HVC) is more semantically informative than the lower visual cortex (LVC), and that the HVC alone can recover most of the semantic information. The code of the MindGPT model will be publicly available at https://github.com/JxuanC/MindGPT.
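The abstract describes the architecture only at a high level. Below is a minimal, hypothetical PyTorch sketch of that design, not the authors' implementation: an fMRI encoder produces latent prefix tokens, a cross-attention layer lets those latents attend to visual-guidance features, and the resulting prefix conditions a frozen GPT-style language model. The module names, dimensions, the choice of GPT-2 via Hugging Face transformers, and the exact way visual guidance enters the model are all assumptions not stated in the abstract.

    # Minimal sketch of a cross-attention fMRI-to-text decoder in the spirit
    # of MindGPT. All names, sizes, and the GPT-2 backbone are assumptions.
    import torch
    import torch.nn as nn
    from transformers import GPT2LMHeadModel

    class MindGPTSketch(nn.Module):
        def __init__(self, n_voxels, n_prefix=10, d_model=768, d_visual=512):
            super().__init__()
            self.gpt = GPT2LMHeadModel.from_pretrained("gpt2")  # frozen LLM
            for p in self.gpt.parameters():
                p.requires_grad = False
            # Map the fMRI voxel vector to a short sequence of latent tokens.
            self.fmri_proj = nn.Linear(n_voxels, n_prefix * d_model)
            self.n_prefix, self.d_model = n_prefix, d_model
            # Cross-attention: fMRI latents (queries) attend to visual-guidance
            # features (keys/values), steering them toward language semantics.
            self.visual_proj = nn.Linear(d_visual, d_model)
            self.cross_attn = nn.MultiheadAttention(d_model, num_heads=8,
                                                    batch_first=True)

        def forward(self, fmri, visual_feats, input_ids, labels=None):
            # fmri: (B, n_voxels); visual_feats: (B, n_patches, d_visual)
            B = fmri.size(0)
            q = self.fmri_proj(fmri).view(B, self.n_prefix, self.d_model)
            kv = self.visual_proj(visual_feats)
            prefix, _ = self.cross_attn(q, kv, kv)         # visually guided latents
            tok_emb = self.gpt.transformer.wte(input_ids)  # caption embeddings
            inputs_embeds = torch.cat([prefix, tok_emb], dim=1)
            if labels is not None:
                # Mask the prefix positions out of the language-modeling loss.
                pad = torch.full((B, self.n_prefix), -100,
                                 dtype=labels.dtype, device=labels.device)
                labels = torch.cat([pad, labels], dim=1)
            return self.gpt(inputs_embeds=inputs_embeds, labels=labels)

In this sketch, training minimizes the caption language-modeling loss while gradients flow through the frozen GPT into the fMRI encoder and cross-attention, matching the abstract's end-to-end description; at inference, the prefix embeddings would be passed to the language model's generate method to produce a description. How MindGPT uses the visual branch at test time, when only fMRI is available, is not specified in the abstract.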