Computer science
Context
Artificial intelligence
Context model
Feature
Computer vision
Shot
Reading
Task
Cluster analysis
Feature extraction
Pattern recognition
Object
Authors
Dong Liu,Nagendra Kamath,Subhabrata Bhattacharya,Rohit Puri
Identifier
DOI:10.1109/tcsvt.2020.3042476
Abstract
Video scene detection is the task of temporally segmenting a video into its basic story units, called scenes. We propose a temporal context-aware scene detection method. For each shot in a video, we store the time-indexed features of its surrounding shots as its context memory. A context-reading operation is performed to read the most relevant information from the memory, which is used to update the feature of the query shot. To adaptively determine the temporal scale of context memory for different queries, we apply a bank of context memories of different temporal scales to generate multiple context reads, and adaptively aggregate them according to their confidence scores. The adaptive context-reading is guided by a structure-learning objective which encourages each shot to read the most appropriate context, such that the global structure of scenes can be revealed in the feature space. With the context-aware shot features learned by our method, we perform clustering to find the scene boundaries. Our experiments demonstrate that adaptively modeling temporal context yields state-of-the-art results on the existing video scene detection datasets. We also construct a large-scale dataset for the task, and our ablation studies on it show that the performance gains are due to the proposed adaptive context reading.
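The multi-scale context reading described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes dot-product attention as the context-reading operation, uses the maximum attention weight as a stand-in for the learned confidence score, and omits the structure-learning objective and the subsequent clustering step. All function and variable names here are hypothetical.

```python
import numpy as np

def context_read(query, memory):
    """Attention-style read: softmax over query-memory similarities,
    then a weighted sum of the memory entries.
    query: (d,) feature of the query shot; memory: (m, d) features of
    its surrounding shots. Returns the context read and a confidence
    score (here simply the peak attention weight, an assumption)."""
    scores = memory @ query                    # (m,) dot-product similarities
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                   # softmax attention weights
    read = weights @ memory                    # (d,) aggregated context read
    confidence = weights.max()                 # hypothetical confidence proxy
    return read, confidence

def adaptive_context_read(shots, i, scales=(2, 4, 8)):
    """For shot i, read from a bank of context memories of different
    temporal scales and aggregate the reads by normalized confidence,
    yielding an updated (context-aware) shot feature."""
    query = shots[i]
    reads, confs = [], []
    for s in scales:
        lo, hi = max(0, i - s), min(len(shots), i + s + 1)
        # context memory = time-indexed features of surrounding shots,
        # excluding the query shot itself
        memory = np.delete(shots[lo:hi], i - lo, axis=0)
        r, c = context_read(query, memory)
        reads.append(r)
        confs.append(c)
    confs = np.exp(confs) / np.exp(confs).sum()  # normalize confidences
    # update the query feature with the confidence-weighted context reads
    return query + sum(c * r for c, r in zip(confs, reads))

# Toy example: 12 shots with 4-dimensional features.
rng = np.random.default_rng(0)
shots = rng.normal(size=(12, 4))
updated = adaptive_context_read(shots, i=5)
print(updated.shape)  # → (4,)
```

In the full method, the context-aware features produced this way would then be clustered to locate scene boundaries.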