Authors
Jihong Guan, Yulou Shu, Wengen Li, Zihan Song, Yichao Zhang
Source
Journal: Remote Sensing
[Multidisciplinary Digital Publishing Institute]
Date: 2025-06-20
Volume (issue): 17 (13): 2117
Cited by: 2
Abstract
With the development of satellite technology, remote sensing images have become increasingly accessible, which makes multi-modal remote sensing retrieval ever more important. However, most existing methods compute similarity from global visual and textual features and ignore the positional correspondence between image regions and textual descriptions. To address this issue, we propose PR-CLIP, a novel cross-modal retrieval model that uses a cross-modal positional information reconstruction task to learn position-aware correlations between modalities. Specifically, PR-CLIP first uses a cross-modal positional information extraction module to extract the complementary features between images and texts. Next, a unimodal positional information filtering module filters this complementary information out of the unimodal features to produce the embeddings used for reconstruction. Finally, a cross-modal positional information reconstruction module reconstructs the unimodal embeddings of the images and texts from the complete embeddings of the opposite modality, guided by a cross-modal positional consistency loss that ensures reconstruction quality. At the retrieval inference stage, PR-CLIP computes the similarity directly between the unimodal features and does not run the reconstruction-task modules. By combining the advantages of dual-stream and single-stream models, PR-CLIP achieves a good balance between performance and efficiency. Extensive experiments on multiple public datasets demonstrate the effectiveness of PR-CLIP.
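The abstract notes that at inference time PR-CLIP skips the reconstruction modules and directly compares unimodal image and text features, as a dual-stream model does. The following is a minimal sketch of that retrieval step only. It assumes CLIP-style cosine similarity on L2-normalized embeddings, although the abstract does not name the similarity function. The feature vectors are toy inputs, not outputs of PR-CLIP's encoders, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a feature vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine_similarity_matrix(
    image_feats: Sequence[Sequence[float]],
    text_feats: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Return sim[i][j] = cosine similarity between image i and text j.

    Assumption: cosine similarity as in CLIP; the abstract only says that
    similarity is computed directly between unimodal features.
    """
    imgs = [l2_normalize(v) for v in image_feats]
    txts = [l2_normalize(v) for v in text_feats]
    # Dot products of unit vectors equal their cosine similarity.
    return [[sum(a * b for a, b in zip(img, txt)) for txt in txts] for img in imgs]


def rank_texts_for_image(sim_row: Sequence[float]) -> List[int]:
    """Text indices sorted from most to least similar for one image query."""
    return sorted(range(len(sim_row)), key=lambda j: sim_row[j], reverse=True)


# Toy embeddings standing in for precomputed unimodal features.
image_feats = [[1.0, 0.0], [0.0, 2.0]]
text_feats = [[0.0, 1.0], [3.0, 0.1]]
sim = cosine_similarity_matrix(image_feats, text_feats)
print(rank_texts_for_image(sim[0]))  # image 0 matches text 1 best: [1, 0]
print(rank_texts_for_image(sim[1]))  # image 1 matches text 0 best: [0, 1]
```

Because each side is encoded independently in this setting, image and text embeddings can be computed once and cached. That is the efficiency advantage the abstract attributes to dual-stream inference, compared with running a joint single-stream encoder for every image-text pair.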