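Computer science
Pose
Context (archaeology)
Artificial intelligence
Backbone network
Convolution (computer science)
Crowds
Feature (linguistics)
Articulated body pose estimation
Adaptability
State (computer science)
Computer vision
Network architecture
Pattern recognition (psychology)
Machine learning
3D pose estimation
Artificial neural network
Algorithm
Ecology
Biology
Philosophy
Paleontology
Linguistics
Computer security
Computer network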
Authors
Qun Li,Ziyi Zhang,Feng Zhang,Fu Xiao
Identifier
DOI:10.1109/tmm.2023.3248144
Abstract
Occlusion handling in crowded scenes is an intractable challenge for human pose estimation. To address this problem, we propose two novel feed-forward network structures named Global Feed-Forward Network (GFFN) and Dynamic Feed-Forward Network (DFFN), which are specifically designed for image-based tasks to capture both local and global contextual information within intermediate features and update feature representations with high adaptability for occlusions. By exploiting the context modeling ability of the proposed GFFN and DFFN, we present a novel backbone network, namely High-Resolution Context Network (HRNeXt), which learns high-resolution representations with abundant contextual information to better estimate poses of occluded human bodies. Compared to state-of-the-art pose estimation networks, our HRNeXt absorbs advantages of convolution operation and attention mechanism, and it is more efficient in terms of training data sizes, network parameters and computational costs. Experimental results show that our HRNeXt significantly outperforms state-of-the-art backbone networks on challenging pose estimation datasets with high occurrence of crowds and occlusions.