Geometric Matching for Cross-Modal Retrieval

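Keywords: Matching (statistics); Modal verb; Computer science; Artificial intelligence; Pattern recognition (psychology); Information retrieval; Mathematics; Statistics; Materials science; Polymer chemistry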
Authors
Zheng Wang, Zhenwei Gao, Yang Yang, Guoqing Wang, Chengbo Jiao, Heng Tao Shen
Source
Journal: IEEE Transactions on Neural Networks and Learning Systems [Institute of Electrical and Electronics Engineers]
Volume/issue: : 1-13 Cited by: 3
Identifier
DOI:10.1109/tnnls.2024.3381347
Abstract

Despite significant progress, cross-modal retrieval still suffers from one-to-many matching cases, in which a given query can correspond to multiple semantic instances in another modality. However, existing approaches usually map heterogeneous data into the learned space as deterministic point vectors. Although such deterministic point embeddings perform remarkably well at matching the single most similar instance, they cannot sufficiently represent the rich semantics of one-to-many correspondence. To address this limitation, we intuitively extend a deterministic point into a closed geometry and develop geometric representation learning methods for cross-modal retrieval. A set of points inside such a geometry can then be semantically related to many candidates, which allows us to capture the semantic uncertainty effectively. We then introduce two types of geometric matching for one-to-many correspondence: point-to-rectangle matching (dubbed P2RM) and rectangle-to-rectangle matching (termed R2RM). The former treats all retrieved candidates as rectangles with zero volume (equivalent to points) and the query as a box, while the latter encodes all heterogeneous data into rectangles. We can therefore evaluate semantic similarity among heterogeneous data by the Euclidean distance from a point to a rectangle or by the volume of intersection between two rectangles. Additionally, both strategies can be easily applied to off-the-shelf approaches and further improve the retrieval performance of baselines. Under various evaluation metrics, extensive experiments and ablation studies on several commonly used datasets, two for image-text matching and two for video-text retrieval, demonstrate the effectiveness and superiority of our approach.
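
The two similarity measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes axis-aligned rectangles stored as lower and upper corner vectors, and the function names `point_to_box_distance` and `box_intersection_volume` are hypothetical. How the paper parameterizes or learns the rectangles is not stated in the abstract.

```python
import math
from typing import Sequence


def point_to_box_distance(
    point: Sequence[float],
    lower: Sequence[float],
    upper: Sequence[float],
) -> float:
    """Euclidean distance from a point to an axis-aligned box.

    Assumption: the box is given by its lower and upper corners.
    The distance is 0.0 when the point lies inside or on the box.
    """
    squared = 0.0
    for p, lo, hi in zip(point, lower, upper):
        # Per-dimension gap: positive only when p is outside [lo, hi].
        gap = max(lo - p, 0.0, p - hi)
        squared += gap * gap
    return math.sqrt(squared)


def box_intersection_volume(
    lower_a: Sequence[float],
    upper_a: Sequence[float],
    lower_b: Sequence[float],
    upper_b: Sequence[float],
) -> float:
    """Volume of the overlap of two axis-aligned boxes (0.0 if disjoint)."""
    volume = 1.0
    for la, ua, lb, ub in zip(lower_a, upper_a, lower_b, upper_b):
        # Overlap length along this dimension.
        side = min(ua, ub) - max(la, lb)
        if side <= 0.0:
            return 0.0
        volume *= side
    return volume


if __name__ == "__main__":
    # Point-to-rectangle: a candidate point against a query box.
    print(point_to_box_distance([4.0, 5.0], [0.0, 0.0], [1.0, 1.0]))
    # Rectangle-to-rectangle: overlap of two boxes.
    print(box_intersection_volume([0.0, 0.0], [2.0, 2.0], [1.0, 1.0], [3.0, 3.0]))
```

In a retrieval setting one would presumably rank candidates by ascending point-to-rectangle distance or by descending intersection volume; that ranking rule is an assumption here, since the abstract does not give the exact scoring function.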