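Computer science
Convolutional neural network
Artificial intelligence
Pattern recognition (psychology)
Generalization
Feature (linguistics)
Contextual image classification
Frame (networking)
Feature extraction
Domain (mathematical analysis)
Retraining
Class (philosophy)
Scale (ratio)
Machine learning
Image (mathematics)
Mathematics
Mathematical analysis
Telecommunications
Linguistics
Philosophy
Physics
Quantum mechanics
International trade
Business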
Authors
Andrej Karpathy, George Toderici, Sanketh Shetty, Thomas Leung, Rahul Sukthankar, Li Fei-Fei
Identifier
DOI:10.1109/cvpr.2014.223
Abstract
Convolutional Neural Networks (CNNs) have been established as a powerful class of models for image recognition problems. Encouraged by these results, we provide an extensive empirical evaluation of CNNs on large-scale video classification using a new dataset of 1 million YouTube videos belonging to 487 classes. We study multiple approaches for extending the connectivity of a CNN in the time domain to take advantage of local spatio-temporal information and suggest a multiresolution, foveated architecture as a promising way of speeding up the training. Our best spatio-temporal networks display significant performance improvements compared to strong feature-based baselines (55.3% to 63.9%), but only a surprisingly modest improvement compared to single-frame models (59.3% to 60.9%). We further study the generalization performance of our best model by retraining the top layers on the UCF-101 Action Recognition dataset and observe significant performance improvements compared to the UCF-101 baseline model (63.3% up from 43.9%).
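To make the "multiresolution, foveated" idea more concrete, the following is a minimal sketch of how a single frame might be split into two half-size inputs: a low-resolution context stream covering the whole frame and a full-resolution fovea stream covering only the center. The function names (`center_crop`, `downsample_2x`, `foveated_streams`), the 2x2 averaging, the square even-sided frame, and the 8x8 toy size are all assumptions chosen for illustration; the abstract does not specify these details, and this is not the authors' implementation.

```python
def center_crop(frame, size):
    """Return the central size x size window of a 2-D frame (list of rows)."""
    h, w = len(frame), len(frame[0])
    top, left = (h - size) // 2, (w - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


def downsample_2x(frame):
    """Halve the resolution by averaging non-overlapping 2x2 blocks."""
    h = len(frame) // 2 * 2
    w = len(frame[0]) // 2 * 2
    return [
        [
            (frame[r][c] + frame[r][c + 1]
             + frame[r + 1][c] + frame[r + 1][c + 1]) / 4.0
            for c in range(0, w, 2)
        ]
        for r in range(0, h, 2)
    ]


def foveated_streams(frame):
    """Split one square frame into (context, fovea) inputs.

    Assumption: context is the whole frame at half resolution, and fovea
    is the center crop at full resolution with the same output size.
    """
    side = len(frame)
    context = downsample_2x(frame)
    fovea = center_crop(frame, side // 2)
    return context, fovea


# Toy 8x8 grayscale frame (hypothetical data) whose pixel value encodes
# its position, so the outputs are easy to inspect.
frame = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
context, fovea = foveated_streams(frame)
```

Under these assumptions each stream holds a quarter of the original pixels, so the two streams together feed the network half as many input values as the full-resolution frame would, which is one plausible source of the training speed-up the abstract mentions.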