PoseScript: Linking 3D Human Poses and Natural Language

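Computer science, Artificial intelligence, Natural language, Natural (archaeology), Natural language processing, Computer vision, Human-computer interaction, History, Archaeology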
Authors
Ginger Delmas, Philippe Weinzaepfel, Thomas G. Lucas, Francesc Moreno-Noguer, Grégory Rogez
Source
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence [Institute of Electrical and Electronics Engineers]
Volume (issue): 47 (7): 5146-5159; cited by: 3
Identifier
DOI: 10.1109/tpami.2024.3407570
Abstract

Natural language plays a critical role in many computer vision applications, such as image captioning, visual question answering, and cross-modal retrieval, to provide fine-grained semantic information. Unfortunately, while human pose is key to human understanding, current 3D human pose datasets lack detailed language descriptions. To address this issue, we have introduced the PoseScript dataset. This dataset pairs more than six thousand 3D human poses from AMASS with rich human-annotated descriptions of the body parts and their spatial relationships. Additionally, to increase the size of the dataset to a scale that is compatible with data-hungry learning algorithms, we have proposed an elaborate captioning process that generates automatic synthetic descriptions in natural language from given 3D keypoints. This process extracts low-level pose information, known as "posecodes", using a set of simple but generic rules on the 3D keypoints. These posecodes are then combined into higher level textual descriptions using syntactic rules. With automatic annotations, the amount of available data significantly scales up (100k), making it possible to effectively pretrain deep models for finetuning on human captions. To showcase the potential of annotated poses, we present three multi-modal learning tasks that utilize the PoseScript dataset. Firstly, we develop a pipeline that maps 3D poses and textual descriptions into a joint embedding space, allowing for cross-modal retrieval of relevant poses from large-scale datasets. Secondly, we establish a baseline for a text-conditioned model generating 3D poses. Thirdly, we present a learned process for generating pose descriptions. These applications demonstrate the versatility and usefulness of annotated poses in various tasks and pave the way for future research in the field.
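
To make the rule-based captioning idea in the abstract more concrete, the sketch below computes two toy "posecodes" from 3D keypoints and joins them into a sentence with a fixed template. It is only an illustration under stated assumptions: the joint names, coordinates, thresholds, category labels, and sentence templates are hypothetical and are not taken from the PoseScript implementation or from the AMASS joint layout.

```python
import math

# Hypothetical keypoints in metres, with the y axis pointing up.
# These names and values are illustrative, not the AMASS/SMPL layout.
POSE = {
    "left_shoulder": (0.20, 1.40, 0.00),
    "left_elbow": (0.45, 1.40, 0.00),
    "left_wrist": (0.70, 1.40, 0.00),
    "right_wrist": (-0.20, 1.75, 0.05),
    "head": (0.00, 1.65, 0.00),
}


def joint_angle(a, b, c):
    """Angle in degrees at point b between segments b->a and b->c."""
    u = [p - q for p, q in zip(a, b)]
    v = [p - q for p, q in zip(c, b)]
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    cos = max(-1.0, min(1.0, dot / norm))  # clamp against rounding error
    return math.degrees(math.acos(cos))


def angle_posecode(pose, part, a, b, c):
    """Low-level rule: bin a joint angle into a coarse category (assumed thresholds)."""
    angle = joint_angle(pose[a], pose[b], pose[c])
    if angle > 160:
        label = "straight"
    elif angle > 100:
        label = "slightly bent"
    else:
        label = "bent"
    return f"the {part} is {label}"


def height_posecode(pose, part, joint, ref_part, ref_joint, margin=0.05):
    """Low-level rule: compare the heights of two joints (assumed margin)."""
    dy = pose[joint][1] - pose[ref_joint][1]
    if dy > margin:
        relation = "above"
    elif dy < -margin:
        relation = "below"
    else:
        relation = "level with"
    return f"the {part} is {relation} the {ref_part}"


def describe(pose):
    """Syntactic rule: join posecode phrases into one sentence."""
    phrases = [
        angle_posecode(pose, "left arm", "left_shoulder", "left_elbow", "left_wrist"),
        height_posecode(pose, "right hand", "right_wrist", "head", "head"),
    ]
    text = " and ".join(phrases)
    return text[0].upper() + text[1:] + "."


caption = describe(POSE)
print(caption)
```

A real system of this kind would need many more rules (distances, relative positions, contact with the ground) and some way to select and aggregate them; the sketch only shows the two-stage shape of the process, keypoints to categorical codes and codes to text.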