Computer science
Information retrieval
Ranking (information retrieval)
Artificial intelligence
Semantics (computer science)
Image retrieval
Learning to rank
Consistency (knowledge bases)
Explicit semantic analysis
Feature (linguistics)
Visual word
Semantic computing
Image (mathematics)
Semantic Web
Semantic technology
Linguistics
Philosophy
Programming language
Authors
Qingrong Cheng, Zhenshan Tan, Keyu Wen, Cheng Chen, Xiaodong Gu
Identifier
DOI: 10.1109/tcsvt.2022.3182549
Abstract
Cross-modal retrieval aims at retrieving highly semantically relevant information across multiple modalities. Existing cross-modal retrieval methods mainly explore the semantic consistency between image and text while rarely considering the rankings of positive instances in the retrieval results. Moreover, these methods seldom take into account the cross-interaction between image and text, which limits their ability to learn semantic relations. In this paper, we propose a Unified framework with Ranking Learning (URL) for cross-modal retrieval. The unified framework consists of three sub-networks: a visual network, a textual network, and an interaction network. The visual and textual networks project image features and text features into their corresponding hidden spaces, and the interaction network then forces the target image-text representations to align in a common space. To unify semantics and rankings, we propose a new optimization paradigm that decouples semantic alignment from ranking learning: pre-alignment for semantic knowledge transfer, followed by ranking learning for the final retrieval. The former focuses on semantic pre-alignment optimized by semantic classification, while the latter revolves around the retrieval rankings. For ranking learning, we introduce a cross-AP loss that can directly optimize the retrieval metric average precision for cross-modal retrieval. We conduct experiments on four widely used benchmarks: the Wikipedia, Pascal Sentence, NUS-WIDE-10k, and PKU XMediaNet datasets. Extensive experimental results show that the proposed method obtains higher retrieval precision.
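The abstract states that the cross-AP loss directly optimizes average precision (AP), the standard ranking metric for retrieval, but does not reproduce the loss itself. As a reference point only, below is a minimal Python sketch of how AP is computed for a single query's ranked result list, assuming binary relevance labels; the function name average_precision and the toy ranking are illustrative and not taken from the paper.

def average_precision(ranked_relevance):
    """Average precision (AP) for one query.

    ranked_relevance: 0/1 flags ordered by descending retrieval score,
    where 1 marks a semantically relevant (positive) item.
    """
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(ranked_relevance, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this positive's rank
    return precision_sum / hits if hits else 0.0

# Example: positives retrieved at ranks 1, 3, and 4.
print(average_precision([1, 0, 1, 1, 0]))  # (1/1 + 2/3 + 3/4) / 3 ≈ 0.806

Because the discrete rank positions make plain AP non-differentiable, a training loss that targets it typically relies on some smooth relaxation; the abstract does not specify which relaxation the cross-AP loss uses.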