Computer science
Context (archaeology)
Language model
Artificial intelligence
Natural language processing
Cognitive science
Psychology
Biology
Paleontology
Authors
Wenjuan Han, Haozhe Zhao, Zefan Cai
Identifier
DOI:10.1145/3603165.3607368
Abstract
Pretrained vision-language models (VLMs) have made progress in developing multimodal models that improve various tasks. However, they lack reasoning and in-context learning (ICL) ability. Building on the success of large language models (LLMs) in general-purpose NLP tasks, researchers anticipate that VLMs should acquire the same strong reasoning and ICL ability through specific techniques, for example by benefiting from LLMs. To enable VLMs to solve vision-language problems via few-shot exemplars, we propose a vision-language model called MIC.
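To make the few-shot setting concrete, below is a minimal sketch of how interleaved image-text demonstrations could be assembled into an in-context prompt for a VLM. The Exemplar structure, the <image:...> placeholder token, and the build_icl_prompt helper are illustrative assumptions for this sketch, not the paper's actual interface or prompt format.

from dataclasses import dataclass
from typing import List

# Hypothetical structures for illustration only; the real MIC prompt
# format is defined by the paper, not reproduced here.

@dataclass
class Exemplar:
    image_path: str   # reference to the demonstration image
    question: str     # instruction or question about the image
    answer: str       # answer the model sees in-context

def build_icl_prompt(exemplars: List[Exemplar],
                     query_image: str,
                     query_question: str) -> str:
    """Interleave few-shot (image, question, answer) demonstrations
    with the final query; a placeholder token marks where image
    features would be injected by the VLM's vision encoder."""
    parts = []
    for i, ex in enumerate(exemplars):
        parts.append(f"Image {i}: <image:{ex.image_path}>")
        parts.append(f"Question: {ex.question}")
        parts.append(f"Answer: {ex.answer}")
    parts.append(f"Image {len(exemplars)}: <image:{query_image}>")
    parts.append(f"Question: {query_question}")
    parts.append("Answer:")  # the model completes this line
    return "\n".join(parts)

if __name__ == "__main__":
    demos = [
        Exemplar("cat.jpg", "What animal is shown?", "A cat."),
        Exemplar("bus.jpg", "What vehicle is shown?", "A bus."),
    ]
    print(build_icl_prompt(demos, "dog.jpg", "What animal is shown?"))

The point of the sketch is the prompt shape: a few solved (image, question, answer) triples followed by an unanswered query, which is what lets a VLM generalize to the new input without any weight updates.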