Authors
Jianghao Zhang, Xian Zhong, Wenxuan Liu, Kui Jiang, Zhengwei Yang, Zheng Wang
Identifier
DOI:10.1109/icip49359.2023.10222496
Abstract
Human action recognition has been an active research topic in recent years. Multiple modalities often convey heterogeneous but potentially complementary action information that a single modality does not hold. Some efforts have resorted to exploring cross-modal representations to improve modeling capability, but with limited gains due to the simple fusion of different modalities. To this end, we propose impliCit attention-based Cross-modal Collaborative Learning (C3L) for action recognition. Specifically, we apply a Modality Generalization network with Grayscale enhancement (MGG) to learn modality-specific representations and interactions (infrared and RGB). Then, we construct a unified representation space through the Uniform Modality Representation (UMR) module, which preserves modality information while enhancing overall representation ability. Finally, feature extractors adaptively leverage modality-specific knowledge to realize cross-modal collaborative learning. Extensive experiments on three widely used public benchmarks, InfAR, HMDB51, and UCF101, demonstrate the effectiveness and strength of the proposed method.
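The abstract contrasts simple fusion with attention-based collaboration between modalities. As an illustration only (not the authors' C3L implementation, whose details are not given here), a minimal NumPy sketch of attention-weighted fusion of two modality feature vectors, where the fusion weights are derived from the features themselves rather than fixed in advance; the scoring rule is a toy assumption:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_fuse(rgb_feat, ir_feat):
    """Fuse RGB and infrared feature vectors with weights computed
    from the features (a toy stand-in for implicit attention)."""
    stacked = np.stack([rgb_feat, ir_feat], axis=0)   # (2, d)
    # Toy scoring rule (assumption): score each modality by its
    # mean activation, then normalize scores into fusion weights.
    scores = stacked.mean(axis=1)                     # (2,)
    weights = softmax(scores)                         # (2,)
    fused = (weights[:, None] * stacked).sum(axis=0)  # (d,)
    return fused, weights

# Example with hypothetical 4-dimensional features:
rgb = np.array([0.2, 0.8, 0.5, 0.1])
ir = np.array([0.6, 0.1, 0.3, 0.9])
fused, w = attention_fuse(rgb, ir)
```

Unlike uniform averaging, the weights here adapt to the inputs, which is the basic idea behind letting each feature extractor leverage modality-specific knowledge.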