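Computer science
Similarity (geometry)
Artificial intelligence
Task (project management)
Information retrieval
Natural language processing
Internet
Noisy data
Noise (video)
Machine learning
World Wide Web
Management
Economics
Image (mathematics)
Authors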
Haochen Han, Kaiyao Miao, Qinghua Zheng, Minnan Luo
Identifier
DOI:10.1109/cvpr52729.2023.00726
Abstract
Despite the success of multimodal learning in cross-modal retrieval tasks, this progress relies on correct correspondence among multimedia data. However, collecting such ideal data is expensive and time-consuming. In practice, most widely used datasets are harvested from the Internet and inevitably contain mismatched pairs. Training on such noisy-correspondence datasets degrades performance because cross-modal retrieval methods can wrongly force mismatched data to be similar. To tackle this problem, we propose a Meta Similarity Correction Network (MSCN) to provide reliable similarity scores. We view a binary classification task as the meta-process that encourages the MSCN to learn discrimination from positive and negative meta-data. To further alleviate the influence of noise, we design an effective data purification strategy that uses meta-data as prior knowledge to remove noisy samples. Extensive experiments demonstrate the strengths of our method under both synthetic and real-world noise on Flickr30K, MS-COCO, and Conceptual Captions. Our code is publicly available at https://github.com/hhc1997/MSCN