Question answering
Common sense
Commonsense reasoning
Distillation
Computer science
Artificial intelligence
Natural language processing
Knowledge extraction
Chemistry
Chromatography
Authors
Shuo Yang, Siwen Luo, Soyeon Caren Han
Source
Journal: Cornell University - arXiv
Date: 2024-11-04
Identifier
DOI: 10.48550/arxiv.2411.02722
Abstract
Existing Multimodal Large Language Models (MLLMs) and Visual Language Pretrained Models (VLPMs) have shown remarkable performance in general Visual Question Answering (VQA). However, these models struggle with VQA questions that require external commonsense knowledge due to the challenges in generating high-quality prompts and the high computational costs of fine-tuning. In this work, we propose a novel graph-based multimodal commonsense knowledge distillation framework that constructs a unified relational graph over commonsense knowledge, visual objects and questions through a Graph Convolutional Network (GCN) following a teacher-student environment. This proposed framework is flexible with any type of teacher and student models without further fine-tuning, and has achieved competitive performance on the ScienceQA dataset.
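The abstract describes a GCN applied to a unified relational graph whose nodes represent commonsense knowledge, visual objects and the question, trained under a teacher-student distillation objective. Below is a minimal sketch of that general pattern in PyTorch, not the authors' implementation: the node counts, feature dimensions, random adjacency matrix, graph-level mean pooling, and the stand-in teacher logits are all illustrative assumptions.

```python
# Sketch only: a small GCN over a unified graph of commonsense, visual-object
# and question nodes, distilled from a frozen teacher via KL divergence.
# All sizes and the random graph below are illustrative, not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F


def normalize_adjacency(adj: torch.Tensor) -> torch.Tensor:
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 used by standard GCNs."""
    adj = adj + torch.eye(adj.size(0))
    deg_inv_sqrt = adj.sum(dim=1).pow(-0.5)
    return deg_inv_sqrt.unsqueeze(1) * adj * deg_inv_sqrt.unsqueeze(0)


class GCNLayer(nn.Module):
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor, adj_norm: torch.Tensor) -> torch.Tensor:
        # Aggregate neighbor features, then apply a linear transform.
        return F.relu(self.linear(adj_norm @ x))


class StudentGCN(nn.Module):
    """Two GCN layers over the unified graph, pooled into answer logits."""

    def __init__(self, feat_dim: int, hidden_dim: int, num_answers: int):
        super().__init__()
        self.gcn1 = GCNLayer(feat_dim, hidden_dim)
        self.gcn2 = GCNLayer(hidden_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_answers)

    def forward(self, node_feats: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        adj_norm = normalize_adjacency(adj)
        h = self.gcn1(node_feats, adj_norm)
        h = self.gcn2(h, adj_norm)
        return self.classifier(h.mean(dim=0))  # mean-pool nodes into one graph vector


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    return F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2


# Toy usage: 5 commonsense nodes + 3 visual-object nodes + 1 question node.
num_nodes, feat_dim, num_answers = 9, 64, 4
node_feats = torch.randn(num_nodes, feat_dim)          # e.g. text/visual embeddings
adj = (torch.rand(num_nodes, num_nodes) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()                    # make the graph undirected

student = StudentGCN(feat_dim, hidden_dim=32, num_answers=num_answers)
teacher_logits = torch.randn(num_answers)              # stand-in for a frozen teacher model
student_logits = student(node_feats, adj)
loss = distillation_loss(student_logits.unsqueeze(0), teacher_logits.unsqueeze(0))
```

Because the teacher is only queried for its output distribution, this setup leaves both teacher and student architectures interchangeable, which is consistent with the abstract's claim that no further fine-tuning of the teacher is needed.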