Keywords
Humanoid robot
Artificial intelligence
Computer science
Robot
Synchronization (communication)
Computer vision
Face (sociological concept)
Motion (physics)
Autoencoder
Gesture
Trajectory
Human-computer interaction
Heuristic
Speech recognition
Adaptability
Robotics
Hidden Markov model
Articulation (sociology)
Simplicity (philosophy)
Action (physics)
Degrees of freedom (physics and chemistry)
Robot control
Pipeline (software)
Social robot
Authors
Y. Charlie Hu, Jiong Lin, Judah Goldfeder, Philippe Martin Wyder, Yifeng Cao, Steven Tian, Yunzhe Wang, Jingran Wang, M. Wang, Jie Zeng, Cameron Mehlman, Yingke Wang, Delin Zeng, Boyuan Chen, Hod Lipson
Identifier
DOI:10.5061/dryad.j6q573nrc
Abstract
Lip motion carries outsized importance in human communication, capturing nearly half of our visual attention during conversation. Yet anthropomorphic robots often fail to achieve lip-audio synchronization, resulting in clumsy, lifeless lip behaviors. Two fundamental barriers underlie this challenge. First, robotic lips typically lack the mechanical complexity required to reproduce nuanced human mouth movements; second, existing synchronization methods depend on manually predefined movements and rules, restricting adaptability and realism. Here, we present a humanoid robot face designed to overcome these limitations, featuring soft silicone lips actuated by a ten-degree-of-freedom (10-DoF) mechanism. To achieve lip synchronization without predefined movements, we use a self-supervised learning pipeline based on a Variational Autoencoder (VAE) combined with a Facial Action Transformer, enabling the robot to autonomously infer realistic lip trajectories directly from speech audio. Our experimental results suggest that this method outperforms simple heuristics, such as amplitude-based baselines, in achieving visually coherent lip-audio synchronization. Furthermore, the learned synchronization generalizes across multiple linguistic contexts, enabling robot speech articulation in ten languages unseen during training.
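The abstract names a VAE and a Facial Action Transformer but gives no implementation details. The sketch below shows one plausible shape for such a pipeline in PyTorch: a VAE that learns a latent space of 10-DoF lip-actuator poses, and a transformer that maps per-frame speech-audio features to a trajectory in that latent space, which the VAE decoder turns into actuator commands. All layer sizes, feature dimensions, and module names here are illustrative assumptions, not the authors' code; in particular, the choice of mel-style audio features and the exact wiring between the two models are guesses.

```python
import torch
import torch.nn as nn

LIP_DOF = 10      # degrees of freedom of the lip mechanism (from the abstract)
LATENT = 16       # assumed latent size
AUDIO_FEAT = 80   # assumed per-frame audio feature size (e.g. mel bins)


class LipPoseVAE(nn.Module):
    """Variational autoencoder over single-frame 10-DoF lip poses."""

    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(LIP_DOF, 64), nn.ReLU())
        self.mu = nn.Linear(64, LATENT)
        self.logvar = nn.Linear(64, LATENT)
        self.dec = nn.Sequential(
            nn.Linear(LATENT, 64), nn.ReLU(), nn.Linear(64, LIP_DOF)
        )

    def forward(self, pose):
        h = self.enc(pose)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z = mu + sigma * eps
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar


class AudioToLatentTransformer(nn.Module):
    """Maps a sequence of audio frames to a latent lip-pose trajectory."""

    def __init__(self, n_layers=4, n_heads=4, d_model=128):
        super().__init__()
        self.proj_in = nn.Linear(AUDIO_FEAT, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model=d_model, nhead=n_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.proj_out = nn.Linear(d_model, LATENT)

    def forward(self, audio_frames):
        # audio_frames: (batch, time, AUDIO_FEAT) -> (batch, time, LATENT)
        return self.proj_out(self.encoder(self.proj_in(audio_frames)))


if __name__ == "__main__":
    vae = LipPoseVAE()
    a2l = AudioToLatentTransformer()
    audio = torch.randn(1, 50, AUDIO_FEAT)   # 50 frames of audio features
    latent_traj = a2l(audio)                 # predicted latent trajectory
    lip_traj = vae.dec(latent_traj)          # decode to 10-DoF lip poses
    print(lip_traj.shape)                    # torch.Size([1, 50, 10])
```

A pipeline of this shape supports the self-supervised setup the abstract describes: the VAE can be trained on the robot's own lip-pose data without labels, and the transformer then only needs paired audio and pose recordings, with no hand-authored viseme rules.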