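Computer science
Artificial intelligence
Pose
Relation (database)
Monocular
Computer vision
Block (permutation group theory)
Noise (video)
Perception
3D pose estimation
Machine learning
Deep learning
Pattern recognition (psychology)
Power (physics)
Articulated body pose estimation
Feature (linguistics)
Noise reduction
Feature learning
Reduction (mathematics)
Joint (building)
Mechanism (biology)
Discriminant
Supervised learning
Visualization
Scheme (mathematics)
Data modeling
Solid modeling
Frame (networking)
Emphasis (telecommunications)
Contrast (vision)
Identification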
DOI:10.1109/tmm.2026.3654424
Abstract
Diffusion-based methods for monocular 3D human pose estimation (3D HPE) have recently achieved state-of-the-art performance by directly regressing the 3D joint coordinates from 2D observations. Although some methods incorporate a human body prior to improve denoising quality, the absence of structural relations and pose-aware guidance makes these models prone to generating unreasonable poses. The problem is especially noticeable under complex conditions such as occlusions and crowded scenes. To alleviate this, we present MMCPose, a novel Multi-modal Condition-driven 3D HPE framework based on diffusion models that capitalizes on the benefits of multi-modal conditioning input. Specifically, we propose a Multi-modal Condition Learning (MCL) strategy that incorporates multi-modal conditions, namely joint-wise relations, part-aware prompts, and pose-aware masks, to improve generation quality. The MCL block consists of three components: (i) Joint-wise Relation Condition Learning (JRCL), which models flexible joint-wise relations via a GCN to mitigate disturbances arising from confused joints; (ii) Part-aware Prompt Condition Learning (PPCL), which constructs multi-granular prompts from accessible texts and feasible knowledge of body parts, together with learnable prompts, to model implicit textual guidance; and (iii) Pose-aware Mask Condition Learning (PMCL), which designs a pose-specific mask to increase the model's emphasis on the pose region, improving precision in capturing intricate pose details. Furthermore, we explore a multi-modal condition-pose interaction learning (MCPI) mechanism that establishes interaction between the learned multi-modal conditions and the poses to maximize the effect of the conditions. This method fully unleashes the perceptual capability of multi-modal conditions in diffusion-based 3D HPE. Extensive evaluations on two popular benchmarks, Human3.6M and MPI-INF-3DHP, show that MMCPose achieves new state-of-the-art performance.
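The abstract states that JRCL models joint-wise relations via a GCN but gives no architectural details. As a rough illustration of the general idea only, the following sketch applies one standard graph-convolution step (symmetric-normalized adjacency with self-loops, then a linear projection and ReLU) to 2D joint coordinates. The 5-joint skeleton, the feature width of 16, the random weights, and all function names are assumptions made for this example; none of them come from the paper, and this is not the MMCPose implementation.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Build D^{-1/2} (A + I) D^{-1/2} for an undirected skeleton graph."""
    adj = np.eye(num_joints)  # self-loops so each joint keeps its own feature
    for i, j in edges:
        adj[i, j] = 1.0
        adj[j, i] = 1.0
    deg_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * deg_inv_sqrt[:, None] * deg_inv_sqrt[None, :]


def gcn_layer(joint_feats, adj_norm, weight):
    """One graph-convolution step: aggregate neighbours, project, apply ReLU."""
    return np.maximum(adj_norm @ joint_feats @ weight, 0.0)


# Hypothetical toy skeleton (assumption): 5 joints, joint 2 connects three others.
EDGES = [(0, 1), (1, 2), (2, 3), (2, 4)]
NUM_JOINTS = 5

rng = np.random.default_rng(0)
pose_2d = rng.normal(size=(NUM_JOINTS, 2))  # stand-in for detected 2D keypoints
weight = rng.normal(size=(2, 16))           # stand-in for learned parameters

adj_norm = normalized_adjacency(EDGES, NUM_JOINTS)
relation_cond = gcn_layer(pose_2d, adj_norm, weight)  # one feature row per joint
```

In a diffusion-based estimator, per-joint features of this kind could serve as one conditioning signal for the denoiser. How MMCPose actually builds and injects its conditions is not specified in the abstract.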