Bi-Modality Individual-Aware Prompt Tuning for Visual-Language Model

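Computer science, Modality (human-computer interaction), Artificial intelligence, Computer vision, Natural language processing
Authors
Hantao Yao,Rui Zhang,Haochang Lyu,Yongdong Zhang,Changsheng Xu
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
Volume (issue): 47 (8): 6352-6368 Citations: 2
Identifier
DOI:10.1109/tpami.2025.3557780
Abstract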

Prompt tuning is a valuable technique for adapting visual-language models (VLMs) to downstream tasks such as domain generalization and learning from a few examples. Previous methods use Context Optimization approaches to infer domain-shared or cross-modality prompt tokens, which improve generalization and discriminative ability in textual or visual contexts. However, because these prompt tokens are inferred from training data, they cannot adapt perfectly to the distribution of the test dataset. This work introduces Bi-modality Individual-aware Prompt Tuning (BIP), which explicitly incorporates each individual's essential prior knowledge into the learnable prompt to enhance its discriminability and generalization. The key idea of BIP is to apply Textual Knowledge Embedding (TKE) and Visual Knowledge Embedding (VKE) models to project class-aware textual essential knowledge and instance-aware essential knowledge into a class-aware prompt and an instance-aware prompt, respectively; the two components are referred to as Textual-level Class-aware Prompt tuning (TCP) and Visual-level Instance-aware Prompt tuning (VIP). TCP integrates the generated class-aware prompts into the Text Encoder to produce a dynamic class-aware classifier, improving generalization on unseen domains. VIP uses the instance-aware prompt to generate a dynamic visual embedding for each instance, enhancing the discriminative capability of the visual embedding. Comprehensive evaluations show that BIP can serve as a plug-and-play module that integrates easily with existing methods and achieves superior performance on 15 benchmarks across four tasks.
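
The data flow described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the architecture of TKE or VKE, so the two-layer MLP `KnowledgeEmbedding`, the stand-in `toy_encoder`, the function `bip_scores`, and all dimensions below are hypothetical, and random vectors take the place of the frozen encoder features that would supply the prior knowledge in practice. The sketch only shows the shape of the idea: each class and each instance gets its own prompt tokens, generated from its own prior knowledge, and classification compares the resulting embeddings.

```python
import math
import random

def matvec(weights, vec):
    """Multiply a weight matrix, stored as a list of rows, by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def normalize(vec):
    """Scale a vector to unit L2 norm (left unchanged if its norm is zero)."""
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def random_matrix(rows, cols, rng, scale=0.02):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

class KnowledgeEmbedding:
    """Hypothetical two-layer MLP: prior-knowledge vector -> n_tokens prompt tokens."""

    def __init__(self, dim, hidden, n_tokens, rng):
        self.w1 = random_matrix(hidden, dim, rng)
        self.w2 = random_matrix(n_tokens * dim, hidden, rng)
        self.dim = dim

    def __call__(self, knowledge):
        hidden = [max(h, 0.0) for h in matvec(self.w1, knowledge)]  # ReLU
        flat = matvec(self.w2, hidden)
        # Split the flat output into n_tokens prompt tokens of size dim.
        return [flat[i:i + self.dim] for i in range(0, len(flat), self.dim)]

def toy_encoder(base, prompt_tokens):
    """Stand-in for a frozen encoder: base embedding plus mean-pooled prompt tokens."""
    pooled = [sum(col) / len(prompt_tokens) for col in zip(*prompt_tokens)]
    return normalize([b + p for b, p in zip(base, pooled)])

def bip_scores(class_knowledge, image_knowledge, tke, vke):
    """Cosine similarity between every instance embedding and every class embedding."""
    # Class-aware branch: one prompt per class yields a dynamic classifier.
    text_emb = [toy_encoder(k, tke(k)) for k in class_knowledge]
    # Instance-aware branch: one prompt per image yields a dynamic visual embedding.
    image_emb = [toy_encoder(k, vke(k)) for k in image_knowledge]
    return [[sum(a * b for a, b in zip(img, txt)) for txt in text_emb]
            for img in image_emb]

rng = random.Random(0)
dim, n_classes, batch = 16, 5, 3
# Random placeholders for the prior knowledge of each class and each instance.
class_knowledge = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(n_classes)]
image_knowledge = [normalize([rng.gauss(0, 1) for _ in range(dim)]) for _ in range(batch)]
tke = KnowledgeEmbedding(dim, hidden=8, n_tokens=4, rng=rng)
vke = KnowledgeEmbedding(dim, hidden=8, n_tokens=4, rng=rng)

scores = bip_scores(class_knowledge, image_knowledge, tke, vke)
predictions = [max(range(n_classes), key=row.__getitem__) for row in scores]
print(len(scores), len(scores[0]), predictions)
```

In a real setting only the two embedding modules would be trained while the encoders stay frozen; the sketch omits training entirely and uses untrained random weights, so the predictions carry no meaning beyond demonstrating the shapes involved.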