Projectile
Representation (politics)
Action (physics)
Artificial intelligence
Computer science
Action learning
Action recognition
Pattern recognition (psychology)
Psychology
Political science
Class (philosophy)
Mathematics education
Physics
Chemistry
Politics
Cooperative learning
Law
Organic chemistry
Quantum mechanics
Teaching method
Authors
Xiao Wang, Yang Lu, Wan-chuan Yu, Yanwei Pang, Hanzi Wang
Identifiers
DOI: 10.1109/tcsvt.2024.3384875
Abstract
Few-shot action recognition aims to recognize novel action classes from only a few labeled samples and has recently received increasing attention. Its core objective is to enhance the discriminability of feature representations. In this paper, we propose a novel multi-view representation learning network (MRLN) that models intra-video and inter-video relations for few-shot action recognition. First, we propose a spatial-aware aggregation refinement module (SARM), consisting of a spatial-aware aggregation sub-module and a spatial-aware refinement sub-module, to explore the spatial context of samples at the frame level. Second, we design a temporal-channel enhancement module (TCEM), which captures temporal-aware and channel-aware features of samples with an elaborately designed temporal-aware enhancement sub-module and channel-aware enhancement sub-module. Third, we introduce a cross-video relation module (CVRM), which explores relations across videos via a self-attention mechanism. Moreover, we design a prototype-centered mean absolute error loss to improve the feature learning capability of MRLN. Extensive experiments on four prevalent few-shot action recognition benchmarks show that MRLN significantly outperforms a variety of state-of-the-art few-shot action recognition methods. In particular, under the 5-way 1-shot setting, MRLN achieves 75.7%, 86.9%, 65.5%, and 45.9% on the Kinetics, UCF101, HMDB51, and SSv2 datasets, respectively.
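The abstract does not give implementation details for the prototype-centered mean absolute error loss, so the following is only a minimal PyTorch sketch of one plausible reading: prototypes are taken as ProtoNet-style class-wise means of support features, and an L1 (mean absolute error) penalty pulls each query feature toward the prototype of its own class. The function name, shapes, and episode layout are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn.functional as F


def prototype_centered_mae_loss(support_feats, support_labels, query_feats, query_labels):
    """Hypothetical prototype-centered MAE loss sketch.

    support_feats: (n_support, d) pooled video features of the support set
    support_labels: (n_support,) integer class ids in [0, n_way)
    query_feats: (n_query, d) pooled video features of the query set
    query_labels: (n_query,) integer class ids in [0, n_way)
    """
    n_way = int(support_labels.max().item()) + 1
    # Class prototypes: mean of the support features belonging to each class.
    prototypes = torch.stack(
        [support_feats[support_labels == c].mean(dim=0) for c in range(n_way)]
    )  # (n_way, d)
    # L1 distance between each query feature and its own class prototype.
    target_protos = prototypes[query_labels]  # (n_query, d)
    return F.l1_loss(query_feats, target_protos)


# Toy usage: a 5-way episode with 8-dim features (shapes are illustrative only).
support_feats = torch.randn(25, 8)                     # 5 classes x 5 shots
support_labels = torch.arange(5).repeat_interleave(5)  # [0,0,0,0,0,1,1,...]
query_feats = torch.randn(10, 8)
query_labels = torch.arange(5).repeat(2)
loss = prototype_centered_mae_loss(support_feats, support_labels, query_feats, query_labels)
```

In this reading, the loss complements the usual episodic classification objective by explicitly tightening clusters around class prototypes; whether MRLN combines it with a cross-entropy term is not stated in the abstract.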