
Multi-Modal Feature Synergy in Dual-Stream Networks with Cross-Attention for Action Recognition

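Computer science, Artificial intelligence, Feature (linguistics), Action (physics), Pattern recognition (psychology), Action recognition, Artificial neural network, Feature extraction, Feature selection, Key (lock), Field (mathematics)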
Authors
Junchi Lu, Zhitong Liu, Bing Xu, Yu Fu, H. J. Yang
Identifier
DOI:10.1109/iccc68654.2025.11437800
Abstract

The vulnerability of RGB-based human action recognition systems in complex environments and dynamic scenarios can be mitigated by integrating the skeleton modality. Multimodal action recognition methods that combine RGB and skeleton data have therefore been gaining attention. However, because sampling methods, feature modeling strategies, and cross-modal fusion strategies remain insufficiently optimized, the recognition performance of existing methods is limited. To address these limitations, we propose MMActionFormer, a dual-stream network with multi-modal feature synergy and cross-attention for action recognition, which is specifically designed to leverage the complementary semantic information between RGB and skeleton modalities to achieve better action recognition performance. Specifically, we first design modality-specific sampling strategies based on the inherent advantages of RGB and skeleton data. Subsequently, spatial cues derived from the skeleton guide the adaptive cropping of key motion regions within RGB frames, thereby mitigating the confounding effect of irrelevant background clutter. Furthermore, a lightweight feature encoding module performs discriminative representation learning, retaining action-related key semantic features while reducing dimensionality and improving computational efficiency. Notably, a novel cross-attention mechanism models inter-modal dependencies and facilitates bidirectional feature refinement between RGB and skeleton representations. Experiments conducted on action datasets (UCF101, HMDB-51, Kinetics400, and Kinetics600) show that the proposed MMActionFormer effectively leverages the complementary properties of RGB and skeleton modalities, thereby significantly improving recognition accuracy. Importantly, our framework achieves competitive performance compared with existing representative methods while significantly accelerating inference speed.
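
The abstract does not give the details of the cross-attention design, so the following is only a minimal sketch of what bidirectional cross-attention between two token sets can look like, not the authors' implementation. It assumes single-head scaled dot-product attention with a residual update; the function names (`cross_attend`, `bidirectional_refine`), the token counts, the feature dimension, and the random projection matrices are all hypothetical and chosen for illustration.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(queries, context, w_q, w_k, w_v):
    """Single-head cross-attention: each `queries` token attends over `context`."""
    q = matmul(queries, w_q)
    k = matmul(context, w_k)
    v = matmul(context, w_v)
    scale = math.sqrt(len(q[0]))
    k_t = [list(col) for col in zip(*k)]  # transpose of k
    scores = matmul(q, k_t)  # one row of scores per query token
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def bidirectional_refine(rgb, skel, rgb_proj, skel_proj):
    """Refine each modality with information attended from the other one.

    Assumption: a residual update (input + attended features); the source
    does not state how the refined features are combined.
    """
    rgb_update, rgb_weights = cross_attend(rgb, skel, *rgb_proj)
    skel_update, skel_weights = cross_attend(skel, rgb, *skel_proj)
    return add(rgb, rgb_update), add(skel, skel_update), rgb_weights, skel_weights


def random_matrix(rows, cols, rng, scale=0.1):
    """Gaussian-initialized matrix; stands in for learned parameters."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
dim = 8  # hypothetical feature dimension
rgb_tokens = random_matrix(6, dim, rng, scale=1.0)   # 6 hypothetical RGB tokens
skel_tokens = random_matrix(4, dim, rng, scale=1.0)  # 4 hypothetical skeleton tokens
rgb_proj = [random_matrix(dim, dim, rng) for _ in range(3)]   # W_q, W_k, W_v
skel_proj = [random_matrix(dim, dim, rng) for _ in range(3)]  # W_q, W_k, W_v

rgb_refined, skel_refined, rgb_w, skel_w = bidirectional_refine(
    rgb_tokens, skel_tokens, rgb_proj, skel_proj
)
print(len(rgb_refined), len(rgb_refined[0]))
print(len(skel_refined), len(skel_refined[0]))
```

Each modality keeps its own token count and feature dimension after refinement, and every row of the attention weights sums to 1, so an RGB token's update is a weighted average of projected skeleton tokens (and vice versa). A practical model would use learned, multi-head projections in a deep learning framework rather than plain lists.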