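Closed captioning
Computer science
Field (mathematics)
Artificial intelligence
Multimedia
Adversarial system
Natural language processing
Human-computer interaction
Image (mathematics)
Mathematics
Pure mathematics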
Authors
Moloud Abdar, Meenakshi Kollati, K. Swaraja, Farhad Pourpanah, Daniel McDuff, Mohammad Ghavamzadeh, Shuicheng Yan, Abduallah Mohamed, Abbas Khosravi, Erik Cambria, Fatih Porikli
Source
Journal: Cornell University - arXiv
Date: 2023-01-01
Citations: 9
Identifier
DOI:10.48550/arxiv.2304.11431
Abstract
Video captioning (VC) is a fast-moving, cross-disciplinary area of research that bridges work in the fields of computer vision, natural language processing (NLP), linguistics, and human-computer interaction. In essence, VC involves understanding a video and describing it with language. Captioning is used in a host of applications, from creating more accessible interfaces (e.g., low-vision navigation) to video question answering (V-QA), video retrieval, and content generation. This survey covers deep learning-based VC, including, but not limited to, attention-based architectures, graph networks, reinforcement learning, adversarial networks, dense video captioning (DVC), and more. We discuss the datasets and evaluation metrics used in the field, as well as limitations, applications, challenges, and future directions for VC.