Computer science
Speech recognition
Wiener filter
TIMIT
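Speech enhancement
Artificial neural network
Artificial intelligence
Noise (video)
Utterance
Deep learning
Word error rate
Pattern recognition (psychology)
Algorithm
Hidden Markov model
Noise reduction
Image (mathematics)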
Authors
Haoyan Pei,Keliang Song,Tianyu Zhu
Identifier
DOI:10.1109/iccasit55263.2022.9987143
Abstract
Voice activity detection (VAD) algorithms based on deep neural networks (DNNs) ignore the temporal correlation of acoustic features between speech frames, which greatly degrades their performance in noisy environments. To address this problem, this paper proposes a hybrid network structure based on a DNN and a long short-term memory (LSTM) network, combining the nonlinear learning ability and long-sequence analysis ability of the two to learn how speech signals change over time. The network is optimized with the wavelet transform and the backpropagation through time (BPTT) algorithm. In addition, the signal processing framework incorporates the Wiener filtering algorithm to cope with noise types that were not seen during training. Compared with a standalone deep learning network or a standalone speech signal processing system, the DNN-LSTM-Wiener model has better acoustic modeling and speech recognition ability in realistic environments. Experiments on the TIMIT corpus compare the model with traditional acoustic models. The results show that the utterance error rate of the DNN-LSTM model combined with the Wiener filtering algorithm decreases to 21.68%, giving higher recognition accuracy while retaining accurate detection ability at lower signal-to-noise ratios.
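
The abstract does not say which Wiener filter variant is used, so the following is only a generic illustration of the idea: a spectral-domain Wiener gain applied frame by frame before any recognition model sees the audio. It assumes the first few frames contain noise only and estimates the signal-to-noise ratio by simple power subtraction. The function names (`wiener_gain`, `wiener_denoise`) and all parameters are hypothetical and are not taken from the paper.

```python
import numpy as np


def wiener_gain(noisy_power, noise_power, floor=1e-3):
    """Per-bin Wiener gain G = SNR / (1 + SNR).

    The SNR is estimated by power subtraction (an assumption of this sketch).
    """
    snr = np.maximum(noisy_power - noise_power, 0.0) / np.maximum(noise_power, 1e-12)
    gain = snr / (1.0 + snr)
    # A small gain floor avoids zeroing bins completely.
    return np.maximum(gain, floor)


def wiener_denoise(signal, noise_frames=5, frame_len=256, hop=128):
    """Frame-wise Wiener filtering with overlap-add reconstruction.

    Assumes the first `noise_frames` frames are noise only (hypothetical setup).
    """
    # Periodic Hann window: copies shifted by frame_len / 2 sum to one.
    window = 0.5 - 0.5 * np.cos(2.0 * np.pi * np.arange(frame_len) / frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack(
        [signal[i * hop:i * hop + frame_len] * window for i in range(n_frames)]
    )

    spec = np.fft.rfft(frames, axis=1)
    power = np.abs(spec) ** 2

    # Noise power spectrum estimated from the leading frames.
    noise_power = power[:noise_frames].mean(axis=0)
    gain = wiener_gain(power, noise_power)

    # Apply the gain, return to the time domain, and overlap-add.
    cleaned = np.fft.irfft(gain * spec, n=frame_len, axis=1)
    out = np.zeros(len(signal))
    for i in range(n_frames):
        out[i * hop:i * hop + frame_len] += cleaned[i]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    t = np.arange(4096)
    tone = np.where(t >= 1024, np.sin(2.0 * np.pi * 0.05 * t), 0.0)
    noisy = tone + 0.1 * rng.standard_normal(t.size)
    denoised = wiener_denoise(noisy)
    print(noisy.shape, denoised.shape)
```

In a pipeline like the one the abstract describes, a step of this kind would sit in front of feature extraction, so that the DNN-LSTM receives a signal with reduced noise even when the noise type was not part of its training data.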