Closed captioning
Computer science
Remote sensing
Image (mathematics)
Computer vision
Geology
Authors
Yunpeng Li, Xiangrong Zhang, Tianyang Zhang, Guanchun Wang, Xinlin Wang, Shuo Li
Source
Journal: Remote Sensing (Multidisciplinary Digital Publishing Institute)
Date: 2024-10-27
Volume/Issue: 16 (21): 3987
Citations: 2
Abstract
Recent Transformer-based works can generate high-quality captions for remote sensing images (RSIs). However, these methods generally feed global or grid visual features to a Transformer-based captioning model to associate cross-modal information, which limits performance. In this work, we investigate an unexplored direction for the remote sensing image captioning task: a novel patch-level region-aware module combined with a multi-label framework. Because RSIs are captured from an overhead perspective and at a significantly larger scale, the patch-level region-aware module is designed to filter redundant information in the RSI scene, which benefits the Transformer-based decoder through improved image perception. Technically, a trainable multi-label classifier contributes semantic features that supplement the region-aware features. Moreover, modeling the inner relations of the inputs is essential for understanding an RSI. We therefore introduce region-oriented attention, which associates region features with semantic labels, suppresses irrelevant regions to highlight relevant ones, and learns the related semantic information. Extensive qualitative and quantitative experimental results show the superiority of our approach on the RSICD, UCM-Captions, and Sydney-Captions datasets. The code for our method will be made publicly available.
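The abstract only outlines the region-oriented attention mechanism, so the PyTorch sketch below is an illustration of the idea rather than the authors' implementation: patch-level region features act as queries over embeddings of the multi-label classes, the classifier's label probabilities bias the attention, and a top-k relevance filter suppresses redundant regions. The class name RegionOrientedAttention, the projection layout, the keep_ratio parameter, and the top-k filtering rule are all assumptions not specified in the abstract.

```python
import torch
import torch.nn as nn


class RegionOrientedAttention(nn.Module):
    """Hypothetical sketch of region-oriented attention as described in the abstract:
    regions attend to semantic-label embeddings, and low-relevance regions are masked out."""

    def __init__(self, dim: int, num_labels: int, keep_ratio: float = 0.5):
        super().__init__()
        self.label_embed = nn.Embedding(num_labels, dim)  # embeddings of multi-label classes
        self.q_proj = nn.Linear(dim, dim)                 # queries from region features
        self.k_proj = nn.Linear(dim, dim)                 # keys from label embeddings
        self.v_proj = nn.Linear(dim, dim)                 # values from label embeddings
        self.keep_ratio = keep_ratio                      # assumed fraction of regions to keep

    def forward(self, regions: torch.Tensor, label_probs: torch.Tensor) -> torch.Tensor:
        # regions:     (B, N, dim)  patch-level region features
        # label_probs: (B, L)       multi-label classifier probabilities
        B, N, dim = regions.shape
        labels = self.label_embed.weight                  # (L, dim)

        # Cross-attention: each region attends to the semantic labels; the attention
        # logits are biased by the classifier's confidence in each label.
        q = self.q_proj(regions)                          # (B, N, dim)
        k = self.k_proj(labels).unsqueeze(0).expand(B, -1, -1)  # (B, L, dim)
        v = self.v_proj(labels).unsqueeze(0).expand(B, -1, -1)  # (B, L, dim)
        attn = torch.einsum("bnd,bld->bnl", q, k) / dim ** 0.5
        attn = attn + torch.log(label_probs.clamp_min(1e-6)).unsqueeze(1)
        attn = attn.softmax(dim=-1)                       # (B, N, L)
        semantic = torch.einsum("bnl,bld->bnd", attn, v)  # semantic info gathered per region

        # Region filtering: keep the regions with the strongest semantic response,
        # zeroing out the rest to drop redundant background patches.
        relevance = semantic.norm(dim=-1)                 # (B, N)
        k_keep = max(1, int(self.keep_ratio * N))
        topk = relevance.topk(k_keep, dim=1).indices      # (B, k_keep)
        mask = torch.zeros(B, N, device=regions.device).scatter_(1, topk, 1.0)
        return (regions + semantic) * mask.unsqueeze(-1)  # enriched, filtered region features


# Usage sketch: a 10x10 grid of patch features with 32 candidate semantic labels.
if __name__ == "__main__":
    module = RegionOrientedAttention(dim=256, num_labels=32)
    regions = torch.randn(2, 100, 256)
    label_probs = torch.sigmoid(torch.randn(2, 32))
    print(module(regions, label_probs).shape)  # torch.Size([2, 100, 256])
```

Biasing the attention logits with the log label probabilities is one simple way to let the multi-label predictions steer which semantic concepts each region absorbs; the paper may couple the classifier and the attention differently.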