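Computer science
Relation (database)
Artificial intelligence
Computer vision
Object (grammar)
Object detection
Task (project management)
Set (abstract data type)
Video tracking
Predicate (mathematical logic)
Pattern recognition (psychology)
Data mining
Economics
Management
Programming language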
Authors
Xindi Shang, Tongwei Ren, Jingfan Guo, Hanwang Zhang, Tat‐Seng Chua
Identifier
DOI:10.1145/3123266.3123380
Abstract
As a bridge to connect vision and language, visual relations between objects in the form of a relation triplet $\langle subject, predicate, object \rangle$, such as "person-touch-dog" and "cat-above-sofa", provide a more comprehensive understanding of visual content beyond objects. In this paper, we propose a novel vision task named Video Visual Relation Detection (VidVRD) to perform visual relation detection in videos instead of still images (ImgVRD). Compared to still images, videos provide a more natural set of features for detecting visual relations, such as dynamic relations like "A-follow-B" and "A-towards-B", and temporally changing relations like "A-chase-B" followed by "A-hold-B". However, VidVRD is technically more challenging than ImgVRD due to the difficulties of accurate object tracking and the diverse relation appearances in the video domain. To this end, we propose a VidVRD method, which consists of object tracklet proposal, short-term relation prediction and greedy relational association. Moreover, we contribute the first dataset for VidVRD evaluation, which contains 1,000 videos with manually labeled visual relations, to validate our proposed method. On this dataset, our method achieves the best performance in comparison with the state-of-the-art baselines.
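The abstract names the three stages but gives no details, so the following is only a minimal illustrative sketch of what a relation triplet with a temporal extent might look like, and of one simple way short-term predictions could be greedily merged over time. The names `RelationInstance` and `greedy_associate`, the frame numbers, the scores, and the merge rule (same triplet, touching or overlapping frame ranges, keep the maximum score) are all assumptions made for illustration and are not taken from the paper. A real system would presumably also check that the merged predictions refer to the same subject and object tracklets, which is omitted here.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RelationInstance:
    """Hypothetical container for one predicted relation over a frame range."""
    triplet: Tuple[str, str, str]  # (subject, predicate, object)
    start: int                     # first frame, inclusive
    end: int                       # last frame, exclusive
    score: float                   # prediction confidence


def greedy_associate(predictions: List[RelationInstance]) -> List[RelationInstance]:
    """Merge predictions that share a triplet and touch or overlap in time.

    Assumed rule for illustration only: walk the predictions in order of
    (triplet, start) and extend the previous merged instance whenever the
    next prediction has the same triplet and starts no later than the
    previous one ends. The merged score is the maximum of the two scores.
    """
    merged: List[RelationInstance] = []
    for p in sorted(predictions, key=lambda r: (r.triplet, r.start)):
        last = merged[-1] if merged else None
        if last is not None and last.triplet == p.triplet and p.start <= last.end:
            merged[-1] = RelationInstance(
                last.triplet, last.start, max(last.end, p.end), max(last.score, p.score)
            )
        else:
            merged.append(p)
    return merged


# Made-up short-term predictions: "chase" over two overlapping segments,
# then "hold", then a later, separate "chase".
short_term = [
    RelationInstance(("person", "chase", "dog"), 0, 30, 0.6),
    RelationInstance(("person", "chase", "dog"), 15, 45, 0.8),
    RelationInstance(("person", "hold", "dog"), 45, 75, 0.7),
    RelationInstance(("person", "chase", "dog"), 90, 120, 0.5),
]
for r in greedy_associate(short_term):
    print(r.triplet, r.start, r.end, r.score)
```

In this toy input, the two overlapping "chase" predictions collapse into a single instance spanning frames 0 to 45, while the later "chase" stays separate because it does not touch the earlier one in time. This mirrors the abstract's example of a temporally changing relation ("A-chase-B" followed by "A-hold-B").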