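Fluency
Computer science
Ordinal regression
Utterance
Artificial intelligence
Binary classification
Natural language processing
Artificial neural network
Sample (material)
Classifier (UML)
Speech recognition
Machine learning
Psychology
Support vector machine
Chromatography
Mathematics education
Chemistry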
Authors
Shaoguang Mao, Zhiyong Wu, Jingshuai Jiang, Peiyun Liu, Frank K. Soong
Identifier
DOI: 10.1109/icassp.2019.8682187
Abstract
Automatic assessment of a language learner's speech fluency is highly desirable for language education, e.g. for English as a Second Language (ESL) learning. In this paper, we formulate fluency assessment as a problem of Ordinal Regression with Anchored Reference Samples (ORARS), where the fluency of a speech utterance is predicted by an ordinal regression neural network (NN) trained with anchored reference samples. ORARS is trained and tested by: (1) picking human-expert-labeled samples in each mean opinion score (MOS) bucket as the anchored reference samples and pairing them with input speech samples as training couplets; (2) training an NN-based binary classifier to determine which sample in a pair is better in fluency; and (3) predicting the rank (MOS) of a test sample based upon the posteriors of all binary comparisons between the test sample and all anchored reference samples. Experimentally, our proposed approach outperforms traditional NN-based methods and reaches "human parity", i.e. performance comparable to that of human experts, in fluency assessment of the collected ESL speech. To the best of our knowledge, this is the first attempt to assess speech fluency with an ordinal regression framework where a test input is paired with bucketed and anchored reference samples.
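The pairing and prediction steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not say how the posteriors are combined into a rank, so the aggregation rule in `predict_mos` (lowest bucket minus 0.5, plus the sum of per-bucket mean posteriors, assuming consecutive integer buckets) is an assumption made here for illustration. Skipping equal-MOS pairs in `make_couplets` is likewise an assumption, and `toy_better_prob` is a hypothetical stand-in for the trained NN-based binary classifier that operates on a single scalar feature per utterance.

```python
import math
from statistics import mean
from typing import Callable, Dict, List, Tuple

Sample = float  # hypothetical: one scalar feature per utterance


def make_couplets(
    train: List[Tuple[Sample, int]],
    anchors: Dict[int, List[Sample]],
) -> List[Tuple[Sample, Sample, int]]:
    """Pair each labeled training sample with every anchored reference sample.

    The label is 1 if the training sample has the higher MOS, else 0.
    Assumption: pairs with equal MOS are skipped.
    """
    couplets = []
    for x, mos in train:
        for bucket, refs in anchors.items():
            if mos == bucket:
                continue  # assumption: ties carry no "better" label
            for ref in refs:
                couplets.append((x, ref, int(mos > bucket)))
    return couplets


def predict_mos(
    test: Sample,
    anchors: Dict[int, List[Sample]],
    better_prob: Callable[[Sample, Sample], float],
) -> float:
    """Combine posteriors of all test-vs-anchor comparisons into a score.

    Assumed rule: average the posterior within each bucket, sum over
    buckets, and offset so that a sample tied with bucket k scores k.
    Buckets are assumed to be consecutive integers.
    """
    buckets = sorted(anchors)
    per_bucket = [
        mean(better_prob(test, ref) for ref in anchors[b]) for b in buckets
    ]
    return buckets[0] - 0.5 + sum(per_bucket)


def toy_better_prob(a: Sample, b: Sample) -> float:
    """Hypothetical comparator: P(a is more fluent than b) as a steep sigmoid."""
    return 1.0 / (1.0 + math.exp(-20.0 * (a - b)))


if __name__ == "__main__":
    anchors = {1: [1.0], 2: [2.0], 3: [3.0], 4: [4.0], 5: [5.0]}
    print(make_couplets([(2.5, 3)], anchors))
    print(round(predict_mos(4.0, anchors, toy_better_prob)))
```

Under this rule a test sample scores k when it confidently beats every bucket below k, ties with bucket k (posterior 0.5), and loses to every bucket above. In practice the comparator would be the trained binary classifier applied to acoustic features of the two utterances.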