Cross-Modal Retrieval With Partially Mismatched Pairs

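Overfitting, Computer science, Leverage (statistics), Modal verb, Artificial intelligence, Robustness (evolution), Benchmark (surveying), Estimator, Machine learning, Mathematics, Statistics, Artificial neural network, Geodesy, Gene, Biochemistry, Chemistry, Polymer chemistry, Geography
Authors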
Peng Hu,Zhenyu Huang,Dezhong Peng,Xu Wang,Xi Peng
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [IEEE Computer Society]
Volume (Issue): 45 (8): 9595-9610 Citations: 40
Identifier
DOI:10.1109/tpami.2023.3247939
Abstract

In this paper, we study a challenging but less-touched problem in cross-modal retrieval, i.e., partially mismatched pairs (PMPs). Specifically, in real-world scenarios, a huge amount of multimedia data (e.g., the Conceptual Captions dataset) is collected from the Internet, and thus it is inevitable that some irrelevant cross-modal pairs are wrongly treated as matched. Such a PMP problem remarkably degrades cross-modal retrieval performance. To tackle this problem, we derive a unified theoretical Robust Cross-modal Learning framework (RCL) with an unbiased estimator of the cross-modal retrieval risk, which aims to endow cross-modal retrieval methods with robustness against PMPs. In detail, our RCL adopts a novel complementary contrastive learning paradigm to address two challenges, i.e., the overfitting and underfitting issues. On the one hand, our method utilizes only the negative information, which is much less likely to be false than the positive information, thus avoiding overfitting to PMPs. However, such a robust strategy provides only weak supervision, which could induce underfitting and make models more difficult to train. On the other hand, to address the underfitting issue brought by weak supervision, we propose to leverage all available negative pairs to enhance the supervision contained in the negative information. Moreover, to further improve the performance, we propose to minimize the upper bounds of the risk so as to pay more attention to hard samples. To verify the effectiveness and robustness of the proposed method, we carry out comprehensive experiments on five widely used benchmark datasets, comparing against nine state-of-the-art approaches on the image-text and video-text retrieval tasks. The code is available at https://github.com/penghu-cs/RCL.
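The abstract does not give the loss itself, so the following is only a minimal sketch of what "learning from negative information only, over all available negative pairs" could look like. It assumes a softmax over each row of an image-text similarity matrix and penalizes probability mass placed on off-diagonal (negative) pairs via -log(1 - p). The function names, the temperature `tau`, and this particular loss form are illustrative assumptions, not the authors' implementation; the actual method is in the paper and the linked repository.

```python
import math


def _softmax(row, tau):
    """Numerically stable softmax of one similarity row at temperature tau."""
    scaled = [x / tau for x in row]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def _one_direction(sim, tau, eps):
    """Average over queries of -log(1 - p_ij) summed over all negatives j != i."""
    n = len(sim)
    total = 0.0
    for i in range(n):
        p = _softmax(sim[i], tau)
        for j in range(n):
            if j != i:  # only negative pairs are used as supervision
                total += -math.log(max(1.0 - p[j], eps))
    return total / n


def complementary_contrastive_loss(sim, tau=1.0, eps=1e-12):
    """Hypothetical complementary-style loss (assumed form, not from the paper).

    sim[i][j] is the similarity between image i and text j; the diagonal holds
    the nominally matched pairs, some of which may be mismatched.
    """
    sim_t = [list(col) for col in zip(*sim)]  # text-to-image direction
    return _one_direction(sim, tau, eps) + _one_direction(sim_t, tau, eps)


# A well-aligned batch versus one whose nominal pairs are all mismatched.
aligned = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]]
shifted = [[0.0, 5.0, 0.0], [0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]
print(complementary_contrastive_loss(aligned))
print(complementary_contrastive_loss(shifted))
```

In this sketch the diagonal entries never appear as a direct target; they enter only through the softmax normalization, so a wrongly paired diagonal entry is not explicitly pulled together. Every off-diagonal pair in the batch contributes a term, which is one way to read the abstract's point about using all available negatives to strengthen weak supervision.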