Exploring Effective Factors for Improving Visual In-Context Learning

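Computer science; Context (archaeology); Visualization; Artificial intelligence; Context model; Computer vision; Object (grammar); Biology; Paleontology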
Authors
Yanpeng Sun, Qiang Chen, Jian Wang, Jingdong Wang, Zechao Li
Source
Journal: IEEE Transactions on Image Processing [Institute of Electrical and Electronics Engineers]
Volume: 34, pp. 2147-2160; Citations: 3
Identifier
DOI:10.1109/tip.2025.3554410
Abstract

In-context learning (ICL) refers to understanding a new task from a few demonstrations (a.k.a. the prompt) and predicting on new inputs without tuning the model. While it has been widely studied in NLP, it is still a relatively new area of research in computer vision. To reveal the factors that influence the performance of visual in-context learning, this paper shows that prompt selection and prompt fusion are two major factors with a direct impact on inference performance. Prompt selection is the process of selecting the most suitable prompt for a query image. This is crucial because high-quality prompts help large-scale visual models comprehend new tasks rapidly and accurately. Prompt fusion combines the prompt and the query image to activate knowledge within large-scale visual models, and altering the prompt fusion method significantly changes performance on new tasks. Based on these findings, we propose a simple framework, prompt-SelF, to improve visual in-context learning. Specifically, we first use a pixel-level retrieval method to select a suitable prompt, then use different prompt fusion methods to activate diverse knowledge stored in the large-scale vision model, and finally ensemble the predictions obtained from the different prompt fusion methods to obtain the final prediction. We conducted extensive experiments on single-object segmentation and detection tasks to demonstrate the effectiveness of prompt-SelF. Remarkably, prompt-SelF outperforms the meta-learning-based method OSLSM in 1-shot segmentation for the first time, which indicates the great potential of visual in-context learning. The source code and models will be available at https://github.com/syp2ysy/prompt-SelF.
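
The three stages named in the abstract (select a prompt, fuse it with the query in several ways, ensemble the predictions) can be illustrated with a minimal sketch. The abstract does not specify the feature extractor, the fusion methods, or the ensembling rule, so everything below is an assumption: "pixel-level retrieval" is approximated as cosine similarity over unpooled (C, H, W) feature maps, "prompt fusion" as different arrangements of the tiles on a 2x2 canvas, and the ensemble as a simple average. All function names, the `LAYOUTS` table, and `toy_model` are hypothetical and are not taken from the prompt-SelF code.

```python
import numpy as np

def pixel_level_similarity(query_feat, cand_feat):
    """Cosine similarity of two (C, H, W) feature maps without spatial pooling (assumed)."""
    q, c = query_feat.reshape(-1), cand_feat.reshape(-1)
    return float(q @ c / (np.linalg.norm(q) * np.linalg.norm(c) + 1e-8))

def select_prompt(query_feat, cand_feats):
    """Index of the candidate whose feature map is most similar to the query."""
    scores = [pixel_level_similarity(query_feat, f) for f in cand_feats]
    return int(np.argmax(scores))

# Hypothetical fusion variants: each maps a role to a (row, col) cell of a 2x2 grid.
LAYOUTS = [
    {"prompt_img": (0, 0), "prompt_lbl": (0, 1), "query_img": (1, 0), "target": (1, 1)},
    {"prompt_img": (1, 0), "prompt_lbl": (1, 1), "query_img": (0, 0), "target": (0, 1)},
    {"prompt_img": (0, 1), "prompt_lbl": (0, 0), "query_img": (1, 1), "target": (1, 0)},
]

def fuse(prompt_img, prompt_lbl, query_img, layout):
    """Place the three known tiles on a 2x2 canvas; the target cell stays empty."""
    h, w = query_img.shape
    canvas = np.zeros((2 * h, 2 * w), dtype=np.float32)
    tiles = {"prompt_img": prompt_img, "prompt_lbl": prompt_lbl, "query_img": query_img}
    for name, tile in tiles.items():
        r, c = layout[name]
        canvas[r * h:(r + 1) * h, c * w:(c + 1) * w] = tile
    return canvas

def predict_with_ensemble(model, prompt_img, prompt_lbl, query_img, layouts=LAYOUTS):
    """Run the model once per layout and average the predicted target cells."""
    h, w = query_img.shape
    preds = []
    for layout in layouts:
        filled = model(fuse(prompt_img, prompt_lbl, query_img, layout), layout)
        r, c = layout["target"]
        preds.append(filled[r * h:(r + 1) * h, c * w:(c + 1) * w])
    return np.mean(preds, axis=0) > 0.5  # binary mask

def toy_model(canvas, layout):
    """Stand-in for a large vision model: thresholds the query tile into the target cell."""
    h, w = canvas.shape[0] // 2, canvas.shape[1] // 2
    out = canvas.copy()
    (qr, qc), (tr, tc) = layout["query_img"], layout["target"]
    out[tr * h:(tr + 1) * h, tc * w:(tc + 1) * w] = canvas[qr * h:(qr + 1) * h, qc * w:(qc + 1) * w] > 0.5
    return out

rng = np.random.default_rng(0)
query_feat = rng.normal(size=(4, 3, 3))
cand_feats = [rng.normal(size=(4, 3, 3)),
              query_feat + 0.01 * rng.normal(size=(4, 3, 3)),
              rng.normal(size=(4, 3, 3))]
best = select_prompt(query_feat, cand_feats)

query_img = rng.random((4, 4)).astype(np.float32)
prompt_img = rng.random((4, 4)).astype(np.float32)
prompt_lbl = (prompt_img > 0.5).astype(np.float32)
mask = predict_with_ensemble(toy_model, prompt_img, prompt_lbl, query_img)
```

In a real setting `toy_model` would be replaced by an inpainting-style vision model, and the candidate features would come from a pretrained encoder; the sketch only shows how the three stages connect.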