Computer Science
Artificial Intelligence
Speech Recognition
Animation
Modal Verbs
Deep Learning
Authors
Swapna Agarwal, Dipanjan Das, Brojeshwar Bhowmick
Source
Journal: European Signal Processing Conference
Date: 2021-01-24
Pages: 690-694
Identifier
DOI: 10.23919/eusipco47968.2020.9287778
Abstract
Recent advances in Convolutional Neural Network (CNN) based approaches have been able to generate convincing talking heads. Personalizing such talking heads, however, requires training the model with a large number of examples of the target person, which is also time consuming. In this paper, we propose a meta-learning based few-shot approach for generating personalized 2D talking heads where the lip animation is driven by a given audio. The idea is that the model is meta-trained on a dataset covering a wide variety of subject ethnicities and vocabulary. We show that our meta-trained model is then capable of generating realistic animation for a previously unseen face and unseen audio when finetuned with only a few examples in a very short time (180 seconds). Since audio-driven facial expressions are mainly expressed through motion around the lips, we restrict ourselves to animating the lips only. We conducted experiments on two publicly available datasets, GRID and TCD-TIMIT, and on our own captured data of Asian subjects. Both qualitative and quantitative analyses show that animations generated by the meta-learned model surpass state-of-the-art methods in terms of both realism and time taken.
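The abstract describes a meta-train-then-finetune workflow: learn an initialization across many subjects, then adapt to an unseen subject from a handful of examples. The paper's actual model is a CNN; as a hedged illustration of the general idea only, the sketch below uses a Reptile-style meta-update on a toy 1-D regression problem, where each "subject" is a hypothetical task with its own slope (all names and parameters here are illustrative assumptions, not the authors' method).

```python
import numpy as np

# Toy Reptile-style meta-learning sketch (NOT the paper's CNN model):
# meta-train a linear model across many "subjects" (tasks), then adapt
# to an unseen subject from only a few examples.
rng = np.random.default_rng(0)

def make_task(slope, n=20):
    # Each "subject" is a 1-D regression task y = slope * x.
    x = rng.uniform(-1, 1, size=(n, 1))
    return x, slope * x

def sgd_steps(w, x, y, lr=0.1, steps=10):
    # Plain gradient descent on mean squared error.
    for _ in range(steps):
        grad = 2 * x.T @ (x @ w - y) / len(x)
        w = w - lr * grad
    return w

# Meta-training: the Reptile update pulls the shared initialization
# toward each task's adapted weights.
w_meta = np.zeros((1, 1))
for _ in range(200):
    x, y = make_task(rng.uniform(0.5, 1.5))
    w_task = sgd_steps(w_meta, x, y)
    w_meta += 0.1 * (w_task - w_meta)

# Few-shot adaptation: an unseen task, only 3 examples, 5 quick steps.
x_few, y_few = make_task(1.2, n=3)
w_adapted = sgd_steps(w_meta, x_few, y_few, steps=5)
```

Because the meta-learned initialization already sits near the task family's solutions, a few gradient steps on three samples move it toward the unseen task's optimum, mirroring the 180-second finetuning claim in spirit.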