Computer Science
Natural Language Processing
Context (archaeology)
Artificial Intelligence
Biology
Paleontology
Authors
Zhongjian Hu, Peng Yang, Yuanshuang Jiang, Zijian Bai
Identifier
DOI:10.1016/j.patcog.2024.110399
Abstract
Existing studies apply Large Language Models (LLMs) to knowledge-based Visual Question Answering (VQA) with encouraging results. Because of insufficient input information, previous methods fall short in constructing the prompt for the LLM and cannot fully activate its capacity. In addition, previous works adopt GPT-3 for inference, which incurs expensive costs. In this paper, we propose PCPA: a framework that Prompts the LLM with Context and Pre-Answer for VQA. Specifically, we adopt a vanilla VQA model to generate in-context examples and candidate answers, and add a pre-answer selection layer to generate pre-answers. We integrate the in-context examples and pre-answers into the prompt to inspire the LLM. Moreover, we choose LLaMA instead of GPT-3, as it is an open and free model, and build a small dataset to fine-tune it. Compared to existing baselines, PCPA improves accuracy by more than 2.1 and 1.5 points on OK-VQA and A-OKVQA, respectively.
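The abstract describes assembling a prompt from in-context examples (produced by a vanilla VQA model) plus candidate pre-answers for the target question. A minimal sketch of such prompt construction is below; the function name, template wording, and data layout are assumptions for illustration, since the paper's exact template is not given in the abstract.

```python
def build_pcpa_style_prompt(question, in_context_examples, pre_answers):
    """Assemble a PCPA-style prompt (illustrative sketch, not the
    paper's exact template).

    in_context_examples: list of {"question": str, "answer": str}
        dicts, e.g. produced by a vanilla VQA model.
    pre_answers: candidate answers for the target question, as
        selected by a pre-answer selection layer.
    """
    lines = []
    # In-context examples come first to condition the LLM.
    for ex in in_context_examples:
        lines.append(f"Question: {ex['question']}")
        lines.append(f"Answer: {ex['answer']}")
    # Target question, with its candidate pre-answers as hints.
    lines.append(f"Question: {question}")
    lines.append("Candidate answers: " + ", ".join(pre_answers))
    lines.append("Answer:")
    return "\n".join(lines)


prompt = build_pcpa_style_prompt(
    question="What season is shown in the picture?",
    in_context_examples=[
        {"question": "What sport is being played?", "answer": "tennis"},
    ],
    pre_answers=["winter", "autumn"],
)
```

The resulting string would then be passed to the LLM (LLaMA in the paper) for final answer generation.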