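Computer science
Artificial intelligence
Feature extraction
Pattern recognition (psychology)
Autoencoder
Feature (linguistics)
Speech recognition
Convolutional neural network
Deep learning
Linguistics
Philosophy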
Authors
Bo Li, Kele Xu, Dawei Feng, Haibo Mi, Huaimin Wang, Jian Zhu
Identifier
DOI:10.1109/icassp.2019.8682806
Abstract
B-mode ultrasound tongue imaging is widely used in the speech production field. However, efficient interpretation of tongue image sequences is still greatly needed. Inspired by the recent success of unsupervised deep learning approaches, we explore unsupervised convolutional network architectures for feature extraction from ultrasound tongue images, which can be helpful for clinical linguistics and phonetics. In a quantitative comparison of different unsupervised feature extraction approaches, the denoising convolutional autoencoder (DCAE)-based method outperforms the other feature extraction methods on the reconstruction task and on the 2010 silent speech interface challenge. A word error rate of 6.17% is obtained with DCAE, compared to the state-of-the-art value of 6.45% obtained using the discrete cosine transform as the feature extractor. Our code is available at https://github.com/DeePBluE666/Source-code1.
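The word error rate quoted in the abstract is conventionally computed as the word-level edit distance (substitutions, deletions, and insertions) between the recognized and reference transcripts, divided by the number of reference words. The following is a minimal sketch of that conventional definition, assuming whitespace tokenization; the function name `word_error_rate` is hypothetical, and this is not the authors' evaluation code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr

    return prev[-1] / len(ref)
```

Under this definition, a hypothesis with one substituted word and one missing word against a six-word reference has a word error rate of 2/6.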