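Computer science
Projectile
Artificial intelligence
Action recognition
Projection (relational algebra)
Computer vision
Pattern recognition (psychology)
One shot
Feature extraction
Algorithm
Class (philosophy)
Mechanical engineering
Engineering
Organic chemistry
Chemistry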
Authors
Yangbo Feng,Junyu Gao,Changsheng Xu
Identifier
DOI:10.1109/tmm.2024.3399453
Abstract
In this paper, we propose a new task named incremental few-shot action recognition (IFSAR), which aims to learn new action classes incrementally from limited samples. Existing few-shot class-incremental learning methods are mainly designed for image datasets and cannot be directly applied to action recognition because of the complicated temporal evolution and spatial structure of videos. Moreover, the incremental and few-shot setting further intensifies the catastrophic forgetting and overfitting problems in the video domain. To address these issues, we propose a spatiotemporal orthogonal projection capsule network (STOP), which employs a spatiotemporal attention routing mechanism and an orthogonal projection capsule layer for effective IFSAR. The former encodes spatial and temporal transformation information and explores action part-whole relationships to prevent catastrophic forgetting, while the latter uses spatiotemporal features to maintain a sufficient distance between the prototypes of old and novel classes and thereby avoid overfitting. Extensive experimental results demonstrate that the proposed method outperforms a series of state-of-the-art approaches on the UCF-101, Kinetics-100, and HMDB-51 datasets.
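The abstract does not say how the orthogonal projection capsule layer is implemented. As a rough illustration of the general idea of keeping a novel-class prototype away from old-class prototypes, the sketch below removes from a novel prototype its component in the subspace spanned by the old prototypes, using plain Gram-Schmidt. This is a generic linear-algebra example, not the authors' STOP layer; every name (`orthonormal_basis`, `project_out`, `old_prototypes`, `novel_prototype`) and the toy 4-dimensional vectors are assumptions made for illustration.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def project_out(vector, basis):
    """Subtract from `vector` its components along each orthonormal basis vector."""
    w = list(vector)
    for b in basis:
        c = dot(w, b)
        w = [wi - c * bi for wi, bi in zip(w, b)]
    return w


def orthonormal_basis(vectors, eps=1e-12):
    """Gram-Schmidt: build an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = project_out(v, basis)
        norm = math.sqrt(dot(w, w))
        if norm > eps:  # skip vectors already inside the current span
            basis.append([wi / norm for wi in w])
    return basis


# Hypothetical toy prototypes (assumed values, not taken from the paper).
old_prototypes = [[1.0, 0.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
novel_prototype = [0.9, 0.2, 0.5, 0.1]

basis = orthonormal_basis(old_prototypes)
novel_projected = project_out(novel_prototype, basis)

# The projected novel prototype has no component along any old prototype.
overlaps = [dot(novel_projected, p) for p in old_prototypes]
```

In this toy setting the projected prototype is orthogonal to every old prototype, so a dot-product or cosine classifier would score it at zero similarity to the old classes. How the paper actually enforces the separation inside a capsule layer may differ.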