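Closed captioning
Computer science
Encoder
Artificial intelligence
Image (mathematics)
Image retrieval
Similarity (geometry)
Information retrieval
Computer vision
Natural language processing
Operating system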
Authors
Genc Hoxha,Farid Melgani,Jacopo Slaghenauffi
Identifier
DOI:10.1109/m2garss47143.2020.9105191
Abstract
Remote sensing (RS) image captioning has recently attracted the attention of the community because it provides more semantic information than traditional tasks such as scene classification. Image captioning aims to generate a coherent and comprehensive description that summarizes the content of an image. The description can be obtained directly from the ground-truth descriptions of similar images (retrieval-based image captioning) or generated through an encoder-decoder framework. The former cannot generate new descriptions. The latter may be affected by misrecognition of scenes or semantic objects. In this paper, we address these issues by proposing a new framework that combines generation-based and retrieval-based image captioning. First, a CNN-RNN framework combined with beam search generates multiple captions for a target image. Then, the best caption is selected on the basis of its lexical similarity to the reference captions of the most similar images. Experimental results on the RSICD dataset are reported and discussed.
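
The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not say which image-similarity or lexical-similarity measures are used, so everything concrete here is an assumption: cosine similarity over precomputed image feature vectors, Jaccard overlap of lowercase word sets as the lexical score, and the hypothetical names `select_caption`, `archive_features`, and `archive_captions`. The candidate captions are assumed to come from a beam-search decoder that is not shown.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def jaccard(caption_a: str, caption_b: str) -> float:
    """Assumed lexical similarity: Jaccard overlap of lowercase word sets."""
    words_a = set(caption_a.lower().split())
    words_b = set(caption_b.lower().split())
    union = words_a | words_b
    return len(words_a & words_b) / len(union) if union else 0.0


def select_caption(
    candidates: List[str],
    target_feature: Sequence[float],
    archive_features: List[Sequence[float]],
    archive_captions: List[List[str]],
    k: int = 1,
) -> str:
    """Pick the candidate closest, lexically, to the captions of the k most similar images."""
    # Rank archive images by visual similarity to the target image.
    ranked = sorted(
        range(len(archive_features)),
        key=lambda i: cosine_similarity(target_feature, archive_features[i]),
        reverse=True,
    )
    # Pool the reference captions of the k nearest images.
    references = [cap for i in ranked[:k] for cap in archive_captions[i]]
    if not references:
        return candidates[0]  # nothing to compare against: keep the top beam

    def score(candidate: str) -> float:
        # Mean lexical similarity to all pooled reference captions.
        return sum(jaccard(candidate, ref) for ref in references) / len(references)

    return max(candidates, key=score)


# Toy data, invented for illustration only.
candidates = [
    "a river runs through a forest",
    "many planes are parked at an airport",
]
archive_features = [[1.0, 0.0], [0.0, 1.0]]
archive_captions = [
    ["several planes are parked near the airport terminal"],
    ["a green forest with a river"],
]
best = select_caption(candidates, [0.9, 0.1], archive_features, archive_captions, k=1)
print(best)
```

With this toy data the target feature is closest to the first archive image, so the airport caption is selected even though it is not the first beam. A real system would likely use CNN features for retrieval and might prefer an n-gram metric such as BLEU for the lexical score; those choices are not stated in the abstract.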