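Computer science
Pipeline (software)
Generator (circuit theory)
Speech recognition
Closed captioning
Domain (mathematics)
Artificial intelligence
Image (mathematics)
Speech synthesis
Deep learning
Line (geometry)
Natural language processing
Computer vision
Programming language
Power (physics)
Physics
Geometry
Mathematics
Quantum mechanics
Pure mathematics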
Authors
K. Bhargav Ram, B. Venkatesh, Sala Pooja Sai Sree, Chunduru Anilkumar, V. Reddy, Bhavya Kodumuri
Identifiers
DOI: 10.1109/icaiss58487.2023.10250554
Abstract
Image caption generation, the task of producing a textual description of a given picture, is one of the challenging tasks in artificial intelligence. Earlier approaches relied on a sophisticated pipeline of specially designed models, but recent advances in deep learning allow a single end-to-end model to generate a caption for an image. This paper presents an image caption and speech generator that produces a single-line description of a given image together with an audio (speech) rendering of that description. The proposed model uses VGG16 and LSTM networks to generate the image description and the GTTS (Google Text-to-Speech) API to convert it to speech.
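To make the VGG16 + LSTM description step more concrete, here is a minimal, dependency-free sketch of greedy word-by-word decoding, a common way such encoder-decoder captioners turn an image feature vector into a sentence. Several details are assumptions not stated in the abstract: greedy decoding (rather than beam search), the `startseq`/`endseq` boundary tokens, the 4096-dimensional feature vector (the size of VGG16's `fc2` layer), and the hypothetical `toy_predict_next` function, which stands in for the trained LSTM decoder.

```python
from typing import Callable, List, Sequence

# Assumed sentence-boundary tokens; the paper does not specify its vocabulary.
START, END = "startseq", "endseq"


def greedy_caption(
    features: Sequence[float],
    predict_next: Callable[[Sequence[float], List[str]], str],
    max_len: int = 20,
) -> str:
    """Greedy decoding: repeatedly ask the decoder for its most likely next
    word, conditioned on the image features and the words emitted so far,
    until it emits END or max_len words have been generated."""
    words = [START]
    for _ in range(max_len):
        next_word = predict_next(features, words)
        if next_word == END:
            break
        words.append(next_word)
    # Drop the START token before joining into a single-line caption.
    return " ".join(words[1:])


def toy_predict_next(features: Sequence[float], words: List[str]) -> str:
    """Hypothetical stand-in for a trained VGG16 + LSTM decoder: it ignores
    the features and looks up the next word from the previous word only."""
    table = {
        "startseq": "a",
        "a": "dog",
        "dog": "runs",
        "runs": "on",
        "on": "grass",
        "grass": END,
    }
    return table.get(words[-1], END)


if __name__ == "__main__":
    fake_vgg16_features = [0.0] * 4096  # placeholder for a real fc2 feature vector
    print(greedy_caption(fake_vgg16_features, toy_predict_next))
```

Separating the decoding loop from `predict_next` keeps the sketch independent of any deep learning framework; in a real system that function would run the trained LSTM on the image features plus the partial caption. The resulting string would then go to the speech step; with the `gtts` Python package this is roughly `gTTS(text=caption, lang="en").save("caption.mp3")`, which needs network access.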