Connectionism
Computer science
Parsing
Artificial intelligence
Visual reasoning
Modalities
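Question answering
Natural language processing
Modality (human-computer interaction)
Cognitive science
Artificial neural network
Machine learning
Psychology
Social science
Sociology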
Authors
Aakansha Mishra, Miriyala Srinivas Soumitri, Vikram N Rajendiran
Identifier
DOI:10.1109/icassp48485.2024.10447493
Abstract
Reasoning conditioned on visual and linguistic information has gained considerable importance in recent years. Prior work in Visual Question Answering (VQA) has been predominantly connectionist. To address the limitations of connectionist AI models, symbolic models were proposed that allow explainable visual reasoning. In addition to semantic parsing, such models perform visual parsing, producing scene graphs on which accurate and explainable reasoning can be conditioned. However, real VQA scenarios cannot always be separated cleanly into connectionist (neural network) and conceptual modalities; rather, they depend on the relationships and interactions between the two. In this work, the authors propose a question-guided attention mechanism that combines explainable visual reasoning through scene graphs with a cross-modality multi-head attention mechanism. The contributions of the connectionist and conceptual modalities are learned through semantic parsing of the question in each VQA task. The method is evaluated on VQA2.0 and GQA, where it achieves 65.31% and 63.06% accuracy, respectively, which the authors report as better than the state of the art in explainable AI.
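The abstract gives no implementation details, so the following is only a minimal sketch of the general idea of question-guided weighting between two branches, not the authors' method. It assumes each branch (a neural cross-modal branch and a symbolic scene-graph branch) already yields a score per candidate answer, and that a gate derived from a question embedding decides how much each branch contributes. The function name `question_guided_fusion`, the gate weights, and all numbers are hypothetical.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def question_guided_fusion(question_vec, gate_weights, neural_scores, symbolic_scores):
    """Blend two branches' answer scores with question-dependent weights.

    gate_weights holds one (hypothetical) weight vector per branch; the dot
    product with the question embedding gives that branch's gate logit.
    """
    logits = [dot(w, question_vec) for w in gate_weights]
    w_neural, w_symbolic = softmax(logits)
    fused = [
        w_neural * n + w_symbolic * s
        for n, s in zip(neural_scores, symbolic_scores)
    ]
    return fused, (w_neural, w_symbolic)


# Toy inputs (all assumed): a 2-d question embedding and two candidate answers.
question_vec = [1.0, 0.0]
gate_weights = [[0.0, 1.0], [2.0, 0.0]]  # row 0: neural branch, row 1: symbolic branch
neural_scores = [0.7, 0.3]
symbolic_scores = [0.2, 0.8]

fused, weights = question_guided_fusion(
    question_vec, gate_weights, neural_scores, symbolic_scores
)
print("branch weights:", weights)
print("fused answer scores:", fused)
```

In this toy setup the question embedding favours the symbolic branch, so the fused scores follow the scene-graph branch's preferred answer. In a trained model the gate weights would be learned rather than fixed by hand.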