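Computer science
Representation (politics)
Matching (statistics)
Image (mathematics)
Attention network
Information retrieval
Similarity (geometry)
Artificial intelligence
Closed captioning
Exploit
Pattern recognition (psychology)
Natural language processing
Data mining
Statistics
Politics
Computer security
Law
Mathematics
Political science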
Authors
Yan Wang,Yuting Su,Wenhui Li,Zhengya Sun,Zhiqiang Wei,Jie Nie,Xuanya Li,An-An Liu
Identifier
DOI:10.1016/j.ipm.2023.103280
Abstract
Image and text matching bridges the differences between the visual and textual modalities and plays a considerable role in cross-modal retrieval. Much progress has been achieved through semantic representation and alignment. However, the distribution of multimedia data is severely unbalanced and contains many low-frequency occurrences, which are often ignored and cause performance degradation, i.e., the long-tail effect. In this work, we propose a novel rare-aware attention network (RAAN), which explores and exploits rare textual content to tackle the long-tail effect in image and text matching. Specifically, we first design a rare-aware mining module, which contains global prior information construction and a rare fragment detector for modeling the characteristics of rare content. Then, the rare attention matching utilizes prior information as attention to guide the representation enhancement of rare content and introduces the rareness representation to strengthen the similarity calculation. Finally, we design a prior information loss to optimize the model together with the triplet loss. We perform quantitative and qualitative experiments on two large-scale databases and achieve leading performance. In particular, we conduct a zero-shot test for rare content and improve rSum by 21.0 and 41.5 on Flickr30K (155,000 image and text pairs) and MSCOCO (616,435 image and text pairs), respectively, demonstrating the effectiveness of the proposed method against the long-tail effect.
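The abstract reports improvements in rSum without defining it. In the image-text matching literature, rSum is commonly taken to be the sum of Recall@1, Recall@5 and Recall@10 over both retrieval directions (image-to-text and text-to-image), giving a maximum of 600. The sketch below illustrates that common definition only; it is not taken from the paper. The function names `recall_at_k` and `rsum` are hypothetical, and the code assumes a square similarity matrix with exactly one ground-truth match per query, which simplifies the usual benchmark setup where each image has several captions.

```python
import numpy as np

def recall_at_k(sim, ks=(1, 5, 10)):
    """Recall@K (in percent) for the queries along the rows of `sim`.

    Assumption: sim[i, j] scores query i against candidate j, and the
    ground-truth match of query i is candidate i (one match per query).
    """
    # Rank of the true match = number of candidates scored strictly higher.
    true_scores = np.diag(sim)[:, None]
    ranks = (sim > true_scores).sum(axis=1)
    return [100.0 * np.mean(ranks < k) for k in ks]

def rsum(sim):
    """Sum of R@1, R@5 and R@10 over both retrieval directions."""
    return sum(recall_at_k(sim)) + sum(recall_at_k(sim.T))

# Toy example: 3 images x 3 captions where every true pair scores highest.
sim = np.array([[0.9, 0.1, 0.2],
                [0.3, 0.8, 0.1],
                [0.2, 0.4, 0.7]])
print(rsum(sim))  # 600.0, the maximum, since every match is ranked first
```

Under this definition, a gain of 21.0 in rSum corresponds to the combined improvement across all six recall values rather than to any single one of them.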