Computer science
Transformer
Pattern recognition (psychology)
Artificial intelligence
Action recognition
Theoretical computer science
Graph
Topology (electrical circuits)
Algorithm
Mathematics
Quantum mechanics
Combinatorics
Physics
Voltage
Class (philosophy)
Authors
Haowei Liu,Yongcheng Liu,Yuxin Chen,Chunfeng Yuan,Bing Li,Weiming Hu
Identifier
DOI:10.1109/tcsvt.2023.3240472
Abstract
In skeleton-based action recognition, the dominant paradigm has been to extract motion features with temporal convolution and to model spatial correlations with graph convolution. However, temporal convolution struggles to capture long-range dependencies effectively, and the commonly used multi-branch graph convolution incurs high complexity. In this paper, we propose TranSkeleton, a powerful Transformer framework that neatly unifies the spatial and temporal modeling of skeleton sequences. For temporal modeling, we propose a novel partition-aggregation temporal Transformer. It operates with hierarchical temporal partition and aggregation, and can capture both long-range dependencies and subtle temporal structures effectively. A difference-aware aggregation approach is designed to reduce information loss during temporal aggregation. For spatial modeling, we propose a topology-aware spatial Transformer that exploits prior information about human body topology to facilitate spatial correlation modeling. Extensive experiments on two challenging benchmark datasets demonstrate that TranSkeleton notably outperforms state-of-the-art methods.
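The abstract only describes the architecture at a high level. The following is a minimal, hedged PyTorch sketch of what the two core ideas could look like in practice: a spatial attention whose logits are biased by a body-topology prior, and a temporal block that attends within short windows and then aggregates each window with a difference-aware summary. This is not the authors' implementation; all names (TopologyAwareSpatialAttention, PartitionAggregationTemporalBlock, topo_bias, window_size) and design details are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn


class TopologyAwareSpatialAttention(nn.Module):
    """Self-attention over joints, biased by a skeleton adjacency (body-topology) prior.
    Illustrative sketch only, not the paper's module."""

    def __init__(self, dim: int, num_joints: int, adjacency: torch.Tensor):
        super().__init__()
        assert adjacency.shape == (num_joints, num_joints)
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5
        # Learnable bias initialized from the (assumed) joint adjacency matrix, so that
        # physically connected joints start out attending to each other more strongly.
        self.topo_bias = nn.Parameter(adjacency.float().clone())

    def forward(self, x):                          # x: (batch, joints, dim)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale + self.topo_bias
        attn = attn.softmax(dim=-1)
        return self.proj(attn @ v)


class PartitionAggregationTemporalBlock(nn.Module):
    """Attention inside fixed temporal windows, then a difference-aware aggregation of each
    window into a single token (one possible reading of the abstract, not the actual design)."""

    def __init__(self, dim: int, window_size: int = 4):
        super().__init__()
        self.window_size = window_size
        self.attn = nn.MultiheadAttention(dim, num_heads=4, batch_first=True)
        # The aggregation mixes the window mean with averaged frame-to-frame differences,
        # a guess at what "difference-aware aggregation" could mean.
        self.aggregate = nn.Linear(dim * 2, dim)

    def forward(self, x):                          # x: (batch, frames, dim)
        b, t, c = x.shape
        w = self.window_size
        assert t % w == 0, "frames must be divisible by the window size in this sketch"
        x = x.reshape(b * (t // w), w, c)          # partition into non-overlapping windows
        x, _ = self.attn(x, x, x)                  # local temporal attention within each window
        mean = x.mean(dim=1)                       # static window summary
        diff = (x[:, 1:] - x[:, :-1]).mean(dim=1)  # motion (difference) summary
        out = self.aggregate(torch.cat([mean, diff], dim=-1))
        return out.reshape(b, t // w, c)           # one token per window: shorter sequence
```

Under these assumptions, a sequence with 25 joints, 64 frames, and 64-dimensional features could be run through the two modules as follows; the identity adjacency is a placeholder where a real skeleton graph would go.

```python
adj = torch.eye(25)                                # placeholder adjacency for illustration
spatial = TopologyAwareSpatialAttention(dim=64, num_joints=25, adjacency=adj)
temporal = PartitionAggregationTemporalBlock(dim=64, window_size=4)

print(spatial(torch.randn(2, 25, 64)).shape)       # torch.Size([2, 25, 64])
print(temporal(torch.randn(2, 64, 64)).shape)      # torch.Size([2, 16, 64])
```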