Overfitting
Discriminative model
Computer science
Artificial intelligence
Feature (linguistics)
Pattern recognition (psychology)
Context (archaeology)
Feature learning
Class (philosophy)
Machine learning
Feature extraction
Modality
Convolutional neural network
Dropout (neural networks)
Artificial neural network
Paleontology
Linguistics
Philosophy
Chemistry
Polymer chemistry
Biology
Authors
Xiao Wang, Yan Yan, Hai-Miao Hu, Bo Li, Hanzi Wang
Identifier
DOI:10.1109/tip.2024.3354104
Abstract
Few-shot action recognition aims to recognize new unseen categories with only a few labeled samples of each class. However, it still suffers from the limitation of inadequate data, which easily leads to overfitting and poor generalization. Therefore, we propose a cross-modal contrastive learning network (CCLN), consisting of an adversarial branch and a contrastive branch, for effective few-shot action recognition. In the adversarial branch, we design a prototypical generative adversarial network (PGAN) that synthesizes additional training samples, which mitigates the data scarcity problem and thereby alleviates overfitting. When training samples are limited, the obtained visual features are usually suboptimal for video understanding because they lack discriminative information. To address this issue, in the contrastive branch we propose a cross-modal contrastive learning module (CCLM) that exploits semantic information to obtain discriminative feature representations, enabling the network to enhance class-level feature learning. Moreover, since videos contain crucial sequence and ordering information, we introduce a spatial-temporal enhancement module (SEM) to model the spatial context within video frames and the temporal context across video frames. The experimental results show that the proposed CCLN outperforms state-of-the-art few-shot action recognition methods on four challenging benchmarks: Kinetics, UCF101, HMDB51 and SSv2.
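The contrastive branch described above pulls each visual feature toward the semantic embedding of its own class while pushing it away from other classes. A minimal sketch of such a cross-modal objective, using an InfoNCE-style loss in NumPy, is shown below; the function name, the temperature value, and the overall formulation are illustrative assumptions and not the paper's actual CCLM loss.

```python
import numpy as np

def cross_modal_contrastive_loss(visual, semantic, temperature=0.1):
    """InfoNCE-style cross-modal loss (illustrative sketch).

    visual:   (N, D) array of visual features, one per sample.
    semantic: (N, D) array of semantic (e.g. label-text) embeddings,
              row i belonging to the same class as visual row i.
    """
    # L2-normalize both modalities so dot products are cosine similarities.
    v = visual / np.linalg.norm(visual, axis=1, keepdims=True)
    s = semantic / np.linalg.norm(semantic, axis=1, keepdims=True)
    logits = v @ s.T / temperature                 # (N, N) similarity matrix
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # The matched (same-class) pairs lie on the diagonal.
    return -np.mean(np.diag(log_prob))
```

When a visual feature aligns with its own class's semantic embedding, the diagonal dominates and the loss is small; mismatched pairings drive the loss up, which is the class-level discrimination the contrastive branch is meant to provide.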