Integrating Neural-Symbolic Reasoning With Variational Causal Inference Network for Explanatory Visual Question Answering

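Visual reasoning, Computer science, Causal reasoning, Inference, Artificial intelligence, Question answering, Causal inference, Abductive reasoning, Machine learning, Qualitative reasoning, Causal model, Artificial neural network, Construct (Python library), Cognition, Mathematics, Psychology, Neuroscience, Statistics, Econometrics, Programming language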

Authors

Dizhan Xue, Shengsheng Qian, Changsheng Xu

Source

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
Date: 2024-05-08 Volume (issue): 46 (12): 7893-7908 Citations: 10

Links

nih.gov, doi.org

Identifier

DOI: 10.1109/tpami.2024.3398012

Abstract

Recently, a novel multimodal reasoning task named Explanatory Visual Question Answering (EVQA) has been introduced, which combines answering visual questions with multimodal explanation generation to expound upon the underlying reasoning processes. In contrast to conventional Visual Question Answering (VQA) that merely concentrates on providing answers, EVQA aims to improve the explainability and verifiability of reasoning by providing user-friendly explanations. Despite the improved explainability of inferred results, the existing EVQA models still adopt black-box neural networks to infer results, lacking the explainability of the reasoning process. Moreover, existing EVQA models commonly predict answers and explanations in isolation, overlooking the inherent causal correlation between them. To handle these challenges, we propose a Program-guided Variational Causal Inference Network (Pro-VCIN) that integrates neural-symbolic reasoning with variational causal inference and constructs causal correlations between the predicted answers and explanations. First, we utilize pretrained models to extract visual features and convert questions into the corresponding programs. Second, we propose a multimodal program Transformer to translate programs and the related visual features into coherent and rational explanations of the reasoning processes. Finally, we propose a variational causal inference to construct the target structural causal model and predict answers based on the causal correlation to explanations. Comprehensive experiments conducted on EVQA benchmark datasets reveal the superiority of Pro-VCIN in terms of both performance and explainability over state-of-the-art EVQA methods.
