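Computer science
Artificial intelligence
Feature (linguistics)
Action (physics)
Pattern recognition (psychology)
Action recognition
Artificial neural network
Feature extraction
Feature selection
Key (lock)
Field (mathematics)
Authors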
Junchi Lu,Zhitong Liu,Bing Xu,Yu Fu,H. J. Yang
Identifier
DOI:10.1109/iccc68654.2025.11437800
Abstract
RGB-based human action recognition systems are vulnerable in complex environments and dynamic scenarios, and integrating the skeleton modality can mitigate this weakness. Multimodal action recognition methods that combine RGB and skeleton data have therefore attracted growing attention. However, the recognition performance of existing methods remains limited because their sampling methods, feature modeling strategies, and cross-modal fusion strategies are insufficiently optimized. To address these limitations, we propose multi-modal feature synergy in a dual-stream network with cross-attention for action recognition (MMActionFormer), which is designed to exploit the complementary semantic information in the RGB and skeleton modalities to improve action recognition performance. Specifically, we first design modality-specific sampling strategies based on the inherent advantages of RGB and skeleton data. Spatial cues derived from the skeleton then guide the adaptive cropping of key motion regions within RGB frames, which mitigates the confounding effect of irrelevant background clutter. Furthermore, a lightweight feature encoding module performs discriminative representation learning, retaining action-related key semantic features while reducing dimensionality and improving computational efficiency. Notably, a novel cross-attention mechanism models inter-modal dependencies and enables bidirectional feature refinement between RGB and skeleton representations. Experiments on four action datasets (UCF101, HMDB-51, Kinetics400, and Kinetics600) show that the proposed MMActionFormer effectively leverages the complementary properties of the RGB and skeleton modalities, thereby significantly improving recognition accuracy. Importantly, our framework achieves competitive performance compared with existing representative methods while significantly accelerating inference.
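The skeleton-guided cropping step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the crop region is computed, so everything here is an assumption made for illustration: the function names `skeleton_crop_box` and `crop_frame`, the `(x, y, confidence)` keypoint layout, the `margin` and `min_conf` parameters, and the choice of a padded bounding box around the detected joints. It is not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1), upper bounds exclusive


def skeleton_crop_box(
    keypoints: Sequence[Tuple[float, float, float]],
    frame_w: int,
    frame_h: int,
    margin: float = 0.1,
    min_conf: float = 0.0,
) -> Box:
    """Padded bounding box around skeleton joints (hypothetical helper).

    keypoints: (x, y, confidence) triples in pixel coordinates (assumed layout).
    margin:    padding as a fraction of the box width and height.
    min_conf:  joints with confidence <= min_conf are ignored.
    """
    pts = [(x, y) for x, y, c in keypoints if c > min_conf]
    if not pts:
        # No reliable joints: fall back to the full frame.
        return (0, 0, frame_w, frame_h)

    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    pad_x = margin * (x_max - x_min)
    pad_y = margin * (y_max - y_min)

    # Expand by the padding, then clamp to the frame boundaries.
    x0 = max(0, math.floor(x_min - pad_x))
    y0 = max(0, math.floor(y_min - pad_y))
    x1 = min(frame_w, math.floor(x_max + pad_x) + 1)
    y1 = min(frame_h, math.floor(y_max + pad_y) + 1)
    return (x0, y0, x1, y1)


def crop_frame(frame: List[list], box: Box) -> List[list]:
    """Crop a frame stored as a list of pixel rows to the given box."""
    x0, y0, x1, y1 = box
    return [row[x0:x1] for row in frame[y0:y1]]
```

In a video pipeline, one plausible variant would take the union of the per-frame boxes across a clip so that the crop stays stable over time; whether the paper does this is not stated in the abstract.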