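Computer science
Artificial intelligence
Convolution (computer science)
Benchmark (surveying)
Computation
Pattern recognition (psychology)
Convolutional neural network
Spatial analysis
Action recognition
Deep learning
Feature extraction
Key (lock)
Computer vision
Artificial neural network
Algorithm
Mathematics
Statistics
Computer security
Geodesy
Class (philosophy)
Geography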
Authors
Huilan Luo, Han Chen, Yiu-ming Cheung, Yan Yu
Identifier
DOI:10.1117/1.jei.31.4.043007
Abstract
Video action recognition methods based on deep learning fall into two types: those built on two-dimensional convolutional networks (2D-ConvNets) and those built on three-dimensional convolutional networks (3D-ConvNets). 2D-ConvNets learn spatial features efficiently but cannot capture temporal relationships directly. 3D-ConvNets learn spatial–temporal features jointly, but training them is time-consuming because of their large number of parameters. We therefore propose a spatial–temporal interaction (STI) module. In STI, a 2D spatial convolution and a one-dimensional (1D) temporal convolution are combined through an attention mechanism to learn spatial–temporal information effectively and efficiently. The computational cost of the proposed method is far lower than that of 3D convolution. The STI module can be combined with 2D-ConvNets to obtain the effect of 3D-ConvNets with far fewer parameters, and it can also be inserted into 3D-ConvNets to strengthen their ability to learn spatial–temporal features and thereby improve recognition accuracy. Experimental results show that the proposed method outperforms existing counterparts on benchmark datasets.
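To make the parameter argument concrete, the following minimal sketch compares the weight count of a full 3D convolution with that of a 2D spatial convolution followed by a 1D temporal convolution. It is an illustration under stated assumptions, not the authors' STI module: the abstract does not give layer shapes, so the channel width (64), the 3×3×3 kernel, the sequential arrangement, and the function names are all hypothetical, and the weights of the attention mechanism are not counted.

```python
def conv3d_params(c_in, c_out, k_t, k_h, k_w):
    """Weight count of one 3D convolution (bias omitted)."""
    return c_in * c_out * k_t * k_h * k_w


def factorized_params(c_in, c_out, k_t, k_h, k_w):
    """Weight count of a 2D spatial conv followed by a 1D temporal conv.

    Assumption: the temporal conv maps c_out channels to c_out channels.
    Bias terms and any attention weights are omitted.
    """
    spatial = c_in * c_out * k_h * k_w   # 2D conv over height and width
    temporal = c_out * c_out * k_t       # 1D conv over the time axis
    return spatial + temporal


# Hypothetical layer: 64 input channels, 64 output channels, 3x3x3 kernel.
full = conv3d_params(64, 64, 3, 3, 3)
split = factorized_params(64, 64, 3, 3, 3)
ratio = split / full

print(full, split, round(ratio, 3))
```

Under these assumed shapes the factorized pair needs about 44% of the weights of the 3D kernel. The figure only illustrates why separating spatial and temporal convolutions is cheaper; it is not a result reported by the paper.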