Pose
Computer science
Artificial intelligence
3D pose estimation
Computer vision
Pattern recognition (psychology)
Authors
Chunyang Xie,Dongheng Zhang,Zhi Wu,Cong Yu,Yang Hu,Yan Chen
Identifier
DOI:10.1109/tcsvt.2023.3287329
Abstract
Advanced human sensing technologies based on radio frequency (RF) signals have gained widespread attention in recent years. However, because RF signals are sparse and incomplete, fine-grained RF-based multi-person 3D pose estimation has progressed comparatively slowly. In this paper, we present the RF-based Pose Machine (RPM 2.0) for multi-person 3D pose estimation using RF signals. Specifically, we first develop a lightweight anchor-free detector module to locate and crop regions of interest from horizontal and vertical RF signals. We then treat the horizontal and vertical millimeter-wave radars as "RF cameras" with different viewing angles and propose a Multi-view Fusion Network that unprojects the RF signals into a unified latent feature space and computes their correlation for weighted fusion. Finally, a Spatio-Temporal Attention Network is designed to reconstruct the multi-person 3D skeleton sequences: its spatial attention module recovers invisible body parts using non-local correlations among joints, and its temporal attention module refines the 3D pose sequences using temporal coherency learned from frame queries. We evaluate RPM 2.0 and state-of-the-art methods on a large-scale dataset with multi-person 3D pose labels and corresponding radar signals. The experimental results show that RPM 2.0 outperforms all of the baseline methods, locating multi-person 3D key points with an average error of 73 mm, and generalizes well to new conditions such as occlusion and low illumination.
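The abstract reports an average key-point error of 73 mm but does not say how that figure is computed. A minimal sketch, assuming it is the mean Euclidean distance between predicted and ground-truth joints (often called mean per-joint position error), could look like the following. The function name `mean_joint_error_mm`, the `[person][joint][3]` layout, and the toy coordinates are all illustrative assumptions, not taken from the paper.

```python
import math


def mean_joint_error_mm(pred, truth):
    """Mean Euclidean distance (in mm) between matching 3D joints.

    Assumption: pred and truth are nested lists shaped [person][joint][3],
    with coordinates already expressed in millimeters and persons/joints
    listed in the same order in both inputs.
    """
    total = 0.0
    count = 0
    for pred_person, truth_person in zip(pred, truth):
        for pred_joint, truth_joint in zip(pred_person, truth_person):
            # math.dist gives the Euclidean distance between two points.
            total += math.dist(pred_joint, truth_joint)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


# Hypothetical toy data: one person, two joints.
truth = [[[0.0, 0.0, 0.0], [100.0, 0.0, 0.0]]]
pred = [[[30.0, 40.0, 0.0], [100.0, 0.0, 100.0]]]

# Joint errors are 50 mm and 100 mm, so the mean is 75 mm.
print(mean_joint_error_mm(pred, truth))  # 75.0
```

Under this assumed definition, every joint of every person contributes equally to the average, so a single badly placed joint raises the score as much as several slightly misplaced ones.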