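Action recognition
Computer science
Residual
Block (permutation group theory)
Artificial intelligence
Pattern recognition (psychology)
Convolutional neural network
Action (physics)
Mathematics
Algorithm
Class (philosophy)
Physics
Geometry
Quantum mechanics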
Authors
Du Tran, Heng Wang, Lorenzo Torresani, Jamie Ray, Yann LeCun, Manohar Paluri
Source
Journal: Computer Vision and Pattern Recognition
Date: 2018-06-01
Pages: 6450-6459
Citations: 3320
Identifier
DOI:10.1109/cvpr.2018.00675
Abstract
In this paper we discuss several forms of spatiotemporal convolutions for video analysis and study their effects on action recognition. Our motivation stems from the observation that 2D CNNs applied to individual frames of the video have remained solid performers in action recognition. In this work we empirically demonstrate the accuracy advantages of 3D CNNs over 2D CNNs within the framework of residual learning. Furthermore, we show that factorizing the 3D convolutional filters into separate spatial and temporal components yields significant gains in accuracy. Our empirical study leads to the design of a new spatiotemporal convolutional block, "R(2+1)D", which produces CNNs that achieve results comparable or superior to the state of the art on Sports-1M, Kinetics, UCF101, and HMDB51.
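The factorization mentioned in the abstract can be illustrated with a simple weight count. This is a minimal sketch and not the authors' implementation. It assumes a full 3D kernel of size t x d x d is replaced by a 1 x d x d spatial convolution followed by a t x 1 x 1 temporal convolution, and that the intermediate channel width is chosen so the factored pair stays within the weight budget of the full 3D filter. This record does not state those details. The function names, the default kernel size of 3, and the channel counts are hypothetical, and biases are ignored.

```python
def conv3d_params(n_in, n_out, t=3, d=3):
    """Weights in a full t x d x d 3D convolution (bias ignored)."""
    return n_in * n_out * t * d * d


def mid_channels(n_in, n_out, t=3, d=3):
    """Assumed intermediate width: the largest integer that keeps the
    factored pair within the weight budget of the full 3D convolution."""
    return (t * d * d * n_in * n_out) // (d * d * n_in + t * n_out)


def conv2plus1d_params(n_in, n_out, t=3, d=3):
    """Weights in a 1 x d x d spatial convolution (n_in -> m channels)
    followed by a t x 1 x 1 temporal convolution (m -> n_out channels)."""
    m = mid_channels(n_in, n_out, t, d)
    spatial = n_in * m * d * d   # 2D filters applied to each frame
    temporal = m * n_out * t     # 1D filters applied across frames
    return spatial + temporal


if __name__ == "__main__":
    for n_in, n_out in [(64, 64), (64, 128)]:
        print(
            n_in,
            n_out,
            mid_channels(n_in, n_out),
            conv3d_params(n_in, n_out),
            conv2plus1d_params(n_in, n_out),
        )
```

Under these assumptions the factored pair never uses more weights than the full 3D filter, so any accuracy difference between the two designs would not come from extra capacity. A nonlinearity is typically placed between the spatial and temporal convolutions, which the weight count above does not capture.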