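Computer science
Encoder
Artificial intelligence
Spatial correlation
Correlation
Encoding (memory)
Pattern recognition (psychology)
Spatial analysis
Computer vision
Data mining
Mathematics
Geometry
Telecommunications
Statistics
Operating system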
Authors
Yong Wang, Hongbo Kang, Doudou Wu, Wenming Yang, Longbin Zhang
Identifier
DOI: 10.1109/tmm.2023.3321438
Abstract
Transformers have achieved excellent performance in 3D human pose estimation. However, most transformer-based methods focus on encoding the global spatio-temporal correlation among all joints of the human body, and few studies address the local spatio-temporal correlation of each joint. In this article, we propose a Global and Local Spatio-Temporal Encoder (GLSTE) to model spatio-temporal correlation at both the global and the local level. Specifically, a Global Spatial Encoder (GSE) and a Global Temporal Encoder (GTE) are constructed to capture, respectively, the global spatial information of all joints within a single frame and the global temporal information across all frames. A Local Spatio-Temporal Encoder (LSTE) is constructed to capture the spatial and temporal information of each joint within a local window of N frames. Furthermore, we propose a parallel attention module with weight sharing to incorporate spatial and temporal information into each node simultaneously. Extensive experiments on two challenging datasets, Human3.6M and MPI-INF-3DHP, show that GLSTE outperforms state-of-the-art methods with fewer parameters and lower computational overhead. In particular, on Human3.6M, our method with 27 input frames outperforms the vast majority of recent state-of-the-art methods that use 81 or 243 input frames, which indicates that the model can learn more useful information from shorter input sequences.
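The three encoders in the abstract differ mainly in which joint tokens attend to one another. The pure-Python sketch below illustrates one plausible reading of that token grouping on a T × J × C pose-feature grid. It is not the authors' implementation. Assumptions: single-head attention with one shared C × C matrix `w` used for queries, keys, and values (standing in for the weight sharing between the parallel spatial and temporal branches); GTE attending per joint across all T frames; LSTE attending over all joints inside non-overlapping N-frame windows; and branch outputs fused by addition. All function names (`spatial_attention`, `temporal_attention`, `local_st_attention`, `parallel_block`) are hypothetical.

```python
import math

# Hypothetical layout: x is a T x J list of C-dim vectors,
# i.e. x[t][j] is the feature of joint j in frame t.

def _matvec(m, v):
    """Multiply a C x C matrix (list of rows) by a C-dim vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def _softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens, w):
    """Single-head scaled dot-product self-attention over a list of tokens.
    Simplification (assumption): Q = K = V = W x with one projection matrix w."""
    proj = [_matvec(w, tok) for tok in tokens]
    dim = len(proj[0])
    out = []
    for q in proj:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in proj]
        weights = _softmax(scores)
        out.append([sum(wt * v[d] for wt, v in zip(weights, proj)) for d in range(dim)])
    return out

def spatial_attention(x, w):
    """GSE-style: within each frame, every joint attends to all J joints."""
    return [self_attention(frame, w) for frame in x]

def temporal_attention(x, w):
    """GTE-style (assumption): each joint attends to itself across all T frames."""
    T, J = len(x), len(x[0])
    out = [[None] * J for _ in range(T)]
    for j in range(J):
        seq = self_attention([x[t][j] for t in range(T)], w)
        for t in range(T):
            out[t][j] = seq[t]
    return out

def local_st_attention(x, w, n):
    """LSTE-style (assumption): inside each non-overlapping window of n frames,
    every joint token attends to all J * n joint tokens of that window."""
    T, J = len(x), len(x[0])
    out = [[None] * J for _ in range(T)]
    for start in range(0, T, n):
        idx = [(t, j) for t in range(start, min(start + n, T)) for j in range(J)]
        seq = self_attention([x[t][j] for t, j in idx], w)
        for (t, j), vec in zip(idx, seq):
            out[t][j] = vec
    return out

def parallel_block(x, w):
    """Parallel spatial and temporal branches that share the same weights w;
    their outputs are summed per joint (the fusion rule is an assumption)."""
    s = spatial_attention(x, w)
    t = temporal_attention(x, w)
    return [[[a + b for a, b in zip(sv, tv)] for sv, tv in zip(sf, tf)]
            for sf, tf in zip(s, t)]

if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    # Toy input: T = 4 frames, J = 3 joints, C = 2 channels.
    x = [[[0.1 * t + 0.01 * j, 0.2 * j - 0.05 * t] for j in range(3)] for t in range(4)]
    y = local_st_attention(parallel_block(x, identity), identity, n=2)
    print(len(y), len(y[0]), len(y[0][0]))  # prints: 4 3 2
```

Each function preserves the T × J × C layout, so the pieces can be stacked in any order. The sketch only shows which tokens share an attention softmax; the paper's actual layer ordering, learned projections, multi-head structure, and fusion are not reproduced here.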