Hypersphere-Based Remote Sensing Cross-Modal Text–Image Retrieval via Curriculum Learning

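Hypersphere; Computer science; Artificial intelligence; Feature learning; Inference; Pattern recognition (psychology); Robustness (evolution); Feature extraction; Feature (linguistics); Embedding; MNIST database; Machine learning; Deep learning; Philosophy; Chemistry; Gene; Biochemistry; Linguistics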

Authors

W Zhang, Jihao Li, Shuoke Li, Jialiang Chen, Wenkai Zhang, Xin Gao, Xian Sun

Source

Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers]
Date: 2023-01-01 Volume/Issue: 61: 1-15 Citations: 13

Identifier

DOI: 10.1109/tgrs.2023.3318227

Abstract

Remote sensing cross-modal text-image retrieval (RSCTIR) is a flexible and human-centered approach to retrieving rich information from different modalities, and it has attracted considerable attention in recent years. It remains challenging because current methods usually ignore the varying difficulty levels of different sample pairs, which stem from the large image distribution difference and the high text similarity in the remote sensing (RS) field. Therefore, in this paper, we propose an innovative hypersphere-based visual semantic alignment (HVSA) network via curriculum learning. Specifically, we first design an adaptive alignment strategy based on curriculum learning, which aligns RS image-text pairs from easy to hard. Sample pairs with different levels of difficulty are treated unequally, and we obtain a better embedding representation when projecting the features onto the unit hypersphere. Then, to measure the robustness of cross-modal feature alignment on the unit hypersphere, we introduce the feature uniformity strategy. It reduces the occurrence of mismatching cases and improves generalization performance. Finally, we design the key-entity attention (KEA) mechanism to alleviate the problem of information imbalance among different modalities. KEA extracts information about the key entity that is aligned with the textual information. Despite its conciseness, our framework achieves state-of-the-art performance on classical datasets of RSCTIR tasks while enjoying faster inference. The summed recall of HVSA is 120.97 on RSICD and 198.94 on RSITMD, which is 2.50 and 10.49 points ahead of the current best methods, respectively. Extensive experiments demonstrate the competitiveness of our method. The code has been released at https://github.com/ZhangWeihang99/HVSA.
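
The following is a minimal illustrative sketch of three notions the abstract mentions: projecting features onto the unit hypersphere, a uniformity measure on that hypersphere, and summed recall. It is not the authors' implementation (that is in the repository linked above). All function names are hypothetical. Two things are assumptions rather than statements from the abstract: that summed recall is the sum of R@1, R@5 and R@10 over both retrieval directions, and that uniformity can be illustrated with the commonly used log-mean Gaussian potential over pairs of normalized embeddings.

```python
import math


def l2_normalize(v):
    """Project a vector onto the unit hypersphere by dividing by its L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def dot(a, b):
    """Dot product; equals cosine similarity for unit-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def recall_at_k(queries, gallery, k):
    """Percentage of queries whose match (same index in gallery) ranks in the top k."""
    hits = 0
    for i, q in enumerate(queries):
        sims = [dot(q, g) for g in gallery]
        ranked = sorted(range(len(gallery)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(queries)


def summed_recall(image_emb, text_emb, ks=(1, 5, 10)):
    """Assumed definition: sum of R@k over image-to-text and text-to-image retrieval."""
    images = [l2_normalize(v) for v in image_emb]
    texts = [l2_normalize(v) for v in text_emb]
    i2t = sum(recall_at_k(images, texts, k) for k in ks)
    t2i = sum(recall_at_k(texts, images, k) for k in ks)
    return i2t + t2i


def uniformity(embeddings, t=2.0):
    """Log of the mean Gaussian potential over distinct pairs (lower = more spread out).

    Illustrative measure only; the paper's feature uniformity strategy may differ.
    """
    points = [l2_normalize(v) for v in embeddings]
    if len(points) < 2:
        raise ValueError("need at least two embeddings")
    total, count = 0.0, 0
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            sq_dist = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
            total += math.exp(-t * sq_dist)
            count += 1
    return math.log(total / count)
```

Under the assumed definition, summed recall is bounded by 600 (six recall values of at most 100 each), which gives a sense of scale for the reported scores.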
