Text-Video Retrieval with Global-Local Semantic Consistent Learning

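Computer science, Video retrieval, Information retrieval, Artificial intelligence, Image retrieval, Semantics (computer science), Natural language processing, Image (mathematics), Programming language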
Authors
Haonan Zhang, Pengpeng Zeng, Lianli Gao, Jingkuan Song, Yihang Duan, Xinyu Lyu, Heng Tao Shen
Source
Journal: IEEE Transactions on Image Processing [Institute of Electrical and Electronics Engineers]
Volume/issue: 1-1 Citations: 1
Identifier
DOI: 10.1109/tip.2025.3574925
Abstract

Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain represents the current state of the art for text-video retrieval. The primary approaches involve transferring text-video pairs to a common embedding space and leveraging cross-modal interactions on specific entities for semantic alignment. Though effective, these paradigms entail prohibitive computational costs, leading to inefficient retrieval. To address this, we propose a simple yet effective method, Global-Local Semantic Consistent Learning (GLSCL), which capitalizes on latent shared semantics across modalities for text-video retrieval. Specifically, we introduce a parameter-free global interaction module to explore coarse-grained alignment. We then devise a shared local interaction module that employs several learnable queries to capture latent semantic concepts for learning fine-grained alignment. Furthermore, an Inter-Consistency Loss (ICL) is devised to align the concepts of each visual query with those of its corresponding textual query, and an Intra-Diversity Loss (IDL) is developed to push apart the distributions within the visual (textual) queries so that they generate more discriminative concepts. Extensive experiments on five widely used benchmarks (i.e., MSR-VTT, MSVD, DiDeMo, LSMDC, and ActivityNet) substantiate the superior effectiveness and efficiency of the proposed method. Remarkably, our method achieves performance comparable to the state of the art (SOTA) while being nearly 220 times faster in terms of computational cost. Code is available at: https://github.com/zchoi/GLSCL.
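
The abstract names the two auxiliary losses but does not give their formulas. As a rough illustration of the idea only, the sketch below assumes cosine-based forms: an inter-consistency term that pulls the i-th visual query output toward the i-th textual query output, and an intra-diversity term that penalizes similarity between different queries of the same modality. The function names, the cosine formulation, and the `margin` parameter are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def inter_consistency_loss(visual_queries, textual_queries):
    """Assumed form: mean (1 - cosine) over index-matched visual/textual query outputs."""
    pairs = list(zip(visual_queries, textual_queries))
    return sum(1.0 - cosine(v, t) for v, t in pairs) / len(pairs)


def intra_diversity_loss(queries, margin=0.0):
    """Assumed form: mean hinge penalty on cosine similarity between
    distinct query outputs of a single modality."""
    total, count = 0.0, 0
    for i in range(len(queries)):
        for j in range(i + 1, len(queries)):
            total += max(0.0, cosine(queries[i], queries[j]) - margin)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    visual = [[1.0, 0.0], [0.0, 1.0]]     # two hypothetical visual query outputs
    textual = [[1.0, 0.0], [0.0, 1.0]]    # their textual counterparts
    collapsed = [[1.0, 0.0], [1.0, 0.0]]  # two queries encoding the same concept

    print(inter_consistency_loss(visual, textual))
    print(intra_diversity_loss(visual))
    print(intra_diversity_loss(collapsed))
```

Under these assumed forms, matched and mutually orthogonal queries incur no penalty from either term, while two queries that collapse onto the same direction receive the maximum diversity penalty. That is the behavior the abstract describes qualitatively.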
