Implicit-Knowledge-Guided Align Before Understanding for KB-VQA
Computer Science
Knowledge Management
Authors
Feng Mao, Lei Liao, Meng Yang
Identifier
DOI:10.1109/icassp48485.2024.10448108
Abstract
Visual Question Answering based on external Knowledge Bases (KB-VQA) has gained increasing attention in recent years. Because knowledge bases are large and no manual annotations are available for matching image-question pairs to knowledge entries, current models often suffer from low knowledge-retrieval accuracy, which degrades answer quality. To address these challenges, we propose an implicit-knowledge-guided align-before-understanding (IK-ALBUN) model: in the first stage, it improves knowledge retrieval by aligning multimodal information with the knowledge base; in the second stage, it performs multimodal semantic fusion and understanding, completing KB-VQA with the retrieved knowledge and an additionally designed contrastive loss. Experiments on several public datasets demonstrate the superior performance of the proposed method.
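The abstract does not give implementation details, so the following is only a minimal sketch of the kind of first-stage contrastive alignment it describes: an InfoNCE-style loss that pulls the embedding of an image-question pair toward the embedding of its paired knowledge entry. The function name, the temperature value, and the assumption of precomputed (B, D) embeddings are hypothetical illustrations, not the authors' code.

import torch
import torch.nn.functional as F

def align_loss(query_emb: torch.Tensor,
               knowledge_emb: torch.Tensor,
               temperature: float = 0.07) -> torch.Tensor:
    # Hypothetical sketch: InfoNCE-style contrastive loss aligning
    # multimodal query embeddings (image + question) with
    # knowledge-entry embeddings, not the paper's actual objective.
    #   query_emb:     (B, D) embeddings of image-question pairs
    #   knowledge_emb: (B, D) embeddings of their paired knowledge entries
    # Normalize so dot products become cosine similarities.
    q = F.normalize(query_emb, dim=-1)
    k = F.normalize(knowledge_emb, dim=-1)
    # (B, B) similarity matrix; diagonal entries are the positive pairs,
    # all other entries in a row/column act as in-batch negatives.
    logits = q @ k.t() / temperature
    targets = torch.arange(q.size(0), device=q.device)
    # Symmetric cross-entropy over both retrieval directions
    # (query -> knowledge and knowledge -> query).
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

At retrieval time, a model trained with such a loss can rank knowledge entries by cosine similarity to the query embedding and pass only the top-k candidates to the second-stage fusion and understanding module.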