Keywords: Computer science, Speech recognition, Artificial intelligence, Pronunciation, Decoding methods, Hidden Markov model, Natural language processing, Machine learning, Pattern recognition, Telecommunications, Linguistics
Authors
Ruitao Li, Xiaochen Lai
Abstract
Mispronunciation Detection and Diagnosis (MDD) is a key component of Computer-Assisted Pronunciation Training (CAPT) systems. Mainstream MDD systems are built as DNN-HMM based automatic speech recognition (ASR) systems, which require a large amount of labeled data for training. In this paper, the self-supervised pre-trained model wav2vec 2.0 is applied to the MDD task. Self-supervised pre-training learns general representations from large amounts of unlabeled data, so only a small amount of labeled data is needed when fine-tuning for the downstream task. To exploit the prior text information, the audio features are combined with the text features through an attention mechanism, and both sources of information are used during decoding. Experiments conducted on the publicly available L2-Arctic and TIMIT datasets yield satisfactory results.
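As a rough illustration of the architecture described in the abstract, the sketch below fuses wav2vec 2.0 frame-level features with embeddings of the canonical (prior-text) phoneme sequence via cross-attention and produces frame-level phone posteriors for decoding. This is a minimal sketch under stated assumptions: the class name AttentionFusionMDD, the facebook/wav2vec2-base checkpoint, the phone-set sizes, and the single-layer residual fusion are illustrative choices, not the authors' exact model.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class AttentionFusionMDD(nn.Module):
    """Sketch of an MDD model: a self-supervised wav2vec 2.0 acoustic encoder
    whose frame features attend to embeddings of the canonical phoneme sequence."""

    def __init__(self, num_phones, text_vocab_size, hidden=768, heads=8):
        super().__init__()
        # Pretrained self-supervised acoustic encoder (example checkpoint).
        self.acoustic = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        # Embedding for the canonical phoneme sequence (prior text information).
        self.text_emb = nn.Embedding(text_vocab_size, hidden)
        # Cross-attention: acoustic frames (queries) attend to canonical phones (keys/values).
        self.cross_attn = nn.MultiheadAttention(hidden, heads, batch_first=True)
        # Frame-level classifier over the phone set (+1 for a CTC blank symbol).
        self.classifier = nn.Linear(hidden, num_phones + 1)

    def forward(self, waveform, canonical_phones):
        # waveform: (B, samples) raw 16 kHz audio; canonical_phones: (B, T_text) phone ids
        acoustic = self.acoustic(waveform).last_hidden_state        # (B, T_audio, H)
        text = self.text_emb(canonical_phones)                      # (B, T_text, H)
        fused, _ = self.cross_attn(query=acoustic, key=text, value=text)
        # Residual combination keeps both acoustic and text evidence for decoding.
        logits = self.classifier(acoustic + fused)                  # (B, T_audio, P+1)
        return logits.log_softmax(dim=-1)                           # CTC-style log-posteriors

At inference, the frame-level posteriors would typically be decoded (for example with CTC greedy or beam search) and the recognized phone sequence aligned against the canonical phones, so that insertions, deletions, and substitutions can be flagged as mispronunciations.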