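Closed captioning
Underwater
Image fusion
Computer science
Computer vision
Scale (ratio)
Remote sensing
Artificial intelligence
Image (mathematics)
Geology
Geography
Oceanography
Cartography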
Authors
Huanyu Li, Li Li, Hao Wang, Weibo Zhang, Peng Ren
Identifier
DOI:10.1109/tgrs.2025.3585119
Abstract
Underwater image captioning bridges the gap between visual perception and semantic understanding of underwater scenes, playing a crucial role in applications such as ocean geoscience and underwater remote sensing. Despite progress in this field, limitations remain in achieving accurate underwater image captioning. The main limitations are: (a) the underestimation of basic sketch features in underwater image captioning, and (b) insufficient consideration of the impact of scale differences in underwater objects. To overcome these limitations, we propose underwater image captioning with AquaSketch enhanced cross-scale information fusion. Our novel contributions are twofold: (a) A novel AquaSketch (i.e., aqua sketch) enhancement method is developed to reduce the impact of underwater image distortion on scene understanding, while enhancing both detailed and background information; and (b) A top-down dual-branch pyramid for cross-scale information fusion is proposed. This architecture fuses multi-scale feature information from two branches through an attention-based feature fusion structure, performing cross-scale fusion in a top-down manner. The resulting pyramid fusion features offer a comprehensive representation of underwater object information. Collectively, these contributions facilitate the generation of accurate and comprehensive underwater image captions. Experimental evaluations on three datasets demonstrate that our proposed underwater image captioning model achieves state-of-the-art performance in the field.
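The top-down, attention-based cross-scale fusion described in the abstract can be pictured with the NumPy sketch below. It is illustrative only and not the authors' implementation: the function names (`attention_fuse`, `upsample2x`, `top_down_fusion`), the channel-wise softmax weighting, the nearest-neighbour upsampling, and the additive merge are all assumptions, since the abstract does not specify these details.

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(a, b):
    """Fuse two (C, H, W) feature maps with per-channel attention weights.

    Assumption: the weights come from globally pooled channel descriptors;
    the paper's actual attention module may differ.
    """
    # One descriptor per channel for each branch, stacked to shape (2, C).
    scores = np.stack([a.mean(axis=(1, 2)), b.mean(axis=(1, 2))])
    # For every channel, the two branch weights sum to 1.
    w = softmax(scores, axis=0)
    return w[0][:, None, None] * a + w[1][:, None, None] * b


def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)


def top_down_fusion(branch_a, branch_b):
    """Fuse two feature pyramids level by level, then merge top-down.

    Both inputs are lists of (C, H, W) maps ordered fine to coarse, with
    each level half the spatial size of the previous one (an assumption).
    """
    fused = [attention_fuse(a, b) for a, b in zip(branch_a, branch_b)]
    out = [fused[-1]]  # start from the coarsest level
    for level in reversed(fused[:-1]):
        # Pass coarse context down to the next finer level.
        out.append(level + upsample2x(out[-1]))
    return out[::-1]  # return in fine-to-coarse order


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sizes = [16, 8, 4]
    pyramid_a = [rng.normal(size=(4, s, s)) for s in sizes]
    pyramid_b = [rng.normal(size=(4, s, s)) for s in sizes]
    for feat in top_down_fusion(pyramid_a, pyramid_b):
        print(feat.shape)
```

In this sketch each output level keeps its own resolution while accumulating context from every coarser level, which is one plausible reading of "cross-scale fusion in a top-down manner".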