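Cepstrum
Speech recognition
Computer science
Mel-frequency cepstrum
Parametric statistics
Set (abstract data type)
Syllable
Word (group theory)
Word recognition
Linear prediction
Dynamic time warping
Artificial intelligence
Pattern recognition (psychology)
Mathematics
Feature extraction
Linguistics
Statistics
Geometry
Philosophy
Programming language
Reading (process)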
Authors
S. Davis, P. Mermelstein
Source
Journal: IEEE Transactions on Acoustics, Speech, and Signal Processing
[Institute of Electrical and Electronics Engineers]
Date: 1980-08-01
Volume (issue): 28 (4): 357-366
Citations: 5224
Identifier
DOI:10.1109/tassp.1980.1163420
Abstract
Several parametric representations of the acoustic signal were compared with regard to word recognition performance in a syllable-oriented continuous speech recognition system. The vocabulary included many phonetically similar monosyllabic words, therefore the emphasis was on the ability to retain phonetically significant acoustic information in the face of syntactic and duration variations. For each parameter set (based on a mel-frequency cepstrum, a linear frequency cepstrum, a linear prediction cepstrum, a linear prediction spectrum, or a set of reflection coefficients), word templates were generated using an efficient dynamic warping method, and test data were time registered with the templates. A set of ten mel-frequency cepstrum coefficients computed every 6.4 ms resulted in the best performance, namely 96.5 percent and 95.0 percent recognition with each of two speakers. The superior performance of the mel-frequency cepstrum coefficients may be attributed to the fact that they better represent the perceptually relevant aspects of the short-term speech spectrum.
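The two techniques named in the abstract, mel-frequency cepstrum coefficients and template matching by dynamic warping, can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the log energies of a mel-spaced filter bank are already available for each frame (20 filters in the test below is an assumption; the abstract only states that ten coefficients were used), and it uses the textbook dynamic time warping recurrence with a Euclidean frame distance, since the abstract does not specify the "efficient dynamic warping method". The function names `mel_cepstrum` and `dtw_distance` are hypothetical.

```python
import math


def mel_cepstrum(log_energies, num_coeffs=10):
    """Cosine transform of log filter-bank energies for one frame.

    c_i = sum_{k=1..K} X_k * cos(i * (k - 0.5) * pi / K),  i = 1..num_coeffs
    where X_k is the log energy of the k-th mel-spaced filter (assumed input).
    """
    K = len(log_energies)
    return [
        sum(x * math.cos(i * (k - 0.5) * math.pi / K)
            for k, x in enumerate(log_energies, start=1))
        for i in range(1, num_coeffs + 1)
    ]


def frame_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(template, test):
    """Textbook dynamic time warping cost between two frame sequences.

    Allowed steps are (1, 0), (0, 1) and (1, 1); no slope constraint or
    path-length normalisation is applied (a simplifying assumption).
    """
    n, m = len(template), len(test)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = frame_distance(template[i - 1], test[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # template advances
                                 cost[i][j - 1],      # test advances
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]
```

In a recognizer of the kind the abstract describes, each test utterance would be converted to a sequence of coefficient vectors, compared against every word template with `dtw_distance`, and assigned to the template with the lowest cost. Because the warping absorbs differences in speaking rate, a time-stretched copy of a template scores a cost of zero against it.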