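Question answering
Computer science
Information retrieval
Set (abstract data type)
Natural language processing
Key (lock)
Artificial intelligence
Computer security
Programming language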
Authors
Jiuxiang You, Zhenguo Yang, Qing Li, Wenyin Liu
Identifier
DOI:10.1109/icme55011.2023.00011
Abstract
In this paper, we propose a Retriever-Reader framework with Visual Entity Linking (RR-VEL) for knowledge-based visual question answering. Given images and original questions, the visual entity linking (VEL) module extracts key entities from the images and substitutes them for the referents in the questions, which disambiguates the questions and yields entity-oriented queries with explicit entities. The Retriever then encodes the queries and knowledge items using BERT with a feed-forward layer and obtains a set of knowledge candidates. The Reader encodes the questions with image captions and the knowledge candidates in two separate branches, which avoids interference between them during self-attentive encoding. Finally, the decoder of the Reader fuses the encoded features to generate answers. Extensive experiments on two public datasets show that our method significantly outperforms existing baselines.
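The first two stages described in the abstract (replacing the question referent with a linked entity, then retrieving knowledge candidates for the resulting query) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `link_entity`, `encode`, `score`, and `retrieve` are hypothetical, the referent and entity are supplied by hand rather than detected from an image, and a normalised bag-of-words vector stands in for the BERT plus feed-forward encoder.

```python
import math
import re
from collections import Counter


def link_entity(question, referent, entity):
    """Replace the question referent with the entity linked from the image.

    Assumption: the referent span and the entity name are already known.
    """
    return question.replace(referent, entity)


def encode(text):
    """Toy stand-in for the BERT + feed-forward encoder.

    Returns an L2-normalised bag-of-words vector as a dict.
    """
    counts = Counter(re.findall(r"[a-z0-9]+", text.lower()))
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return {tok: c / norm for tok, c in counts.items()} if norm else {}


def score(q_vec, k_vec):
    """Dot product between two sparse vectors."""
    return sum(w * k_vec.get(tok, 0.0) for tok, w in q_vec.items())


def retrieve(query, knowledge, k=2):
    """Return the k knowledge items that score highest against the query."""
    q_vec = encode(query)
    ranked = sorted(knowledge, key=lambda item: score(q_vec, encode(item)), reverse=True)
    return ranked[:k]


# Illustrative knowledge items, not taken from the paper's datasets.
knowledge = [
    "The giant panda eats bamboo almost exclusively.",
    "The Eiffel Tower is located in Paris.",
    "Penguins live mostly in the Southern Hemisphere.",
]

query = link_entity("What does this animal eat?", "this animal", "the giant panda")
candidates = retrieve(query, knowledge, k=1)
print(query)
print(candidates)
```

Without the substitution, the original question shares no content words with the relevant knowledge item, which is the ambiguity that the entity-oriented query is meant to remove.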