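Cockpit
Computer science
Mechanism (biology)
Speech recognition
Speech enhancement
Speech processing
Human-computer interaction
Artificial intelligence
Engineering
Aeronautics
Epistemology
Philosophy
Noise reduction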
Authors
YingWei Tan, Xuefeng Ding
Identifier
DOI:10.1109/o-cocosda60357.2023.10482915
Abstract
Deep learning has significantly improved single-channel speech enhancement in terms of intelligibility and perceptual quality. Traditional approaches have primarily relied on a single model to predict the clean version of the speech signal. To exploit the advantages of different model structures, we design heterogeneous network frameworks with an attention mechanism. The frameworks operate at the waveform level: the weighted outputs of the different systems are combined to produce the final waveform. We propose two strategies for allocating the weights: scalar weights and vector-based attention weights. The proposed model is trained end-to-end and optimized in both the time and frequency domains, using multiple loss functions. Experiments are conducted on a synthesized dataset of car intelligent-cockpit environments. In the speech enhancement experiment, the proposed framework improves perceptual evaluation of speech quality (PESQ), short-time objective intelligibility (STOI), and scale-invariant source-to-noise ratio (SI-SNR) by 1.87, 8.38%, and 18.43, respectively, over the synthetic noisy data. In the speech recognition experiment, the presented algorithm achieves a 3.20% word error rate (WER) improvement over the same data.
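The waveform-level fusion described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`fuse_scalar`, `fuse_vector`, `si_snr`), the use of a softmax to normalize the weights, and the idea that the weight scores arrive as plain arrays are all assumptions made for illustration. In the paper the weights come from a trained attention mechanism, which is not reproduced here. The SI-SNR function follows the commonly used definition (zero-mean signals, projection of the estimate onto the target).

```python
import numpy as np


def softmax(x, axis=0):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def fuse_scalar(outputs, logits):
    """Combine K enhanced waveforms with one scalar weight per model.

    outputs: array of shape (K, T), one waveform per model (assumed layout).
    logits:  array of shape (K,), hypothetical unnormalized scores.
    """
    outputs = np.asarray(outputs, dtype=float)
    w = softmax(np.asarray(logits, dtype=float))  # shape (K,), sums to 1
    return np.tensordot(w, outputs, axes=1)       # shape (T,)


def fuse_vector(outputs, logits):
    """Combine K enhanced waveforms with one weight per model per sample.

    logits: array of shape (K, T); weights sum to 1 across models at each sample.
    """
    outputs = np.asarray(outputs, dtype=float)
    w = softmax(np.asarray(logits, dtype=float), axis=0)  # shape (K, T)
    return np.sum(w * outputs, axis=0)                    # shape (T,)


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a clean target."""
    estimate = estimate - np.mean(estimate)
    target = target - np.mean(target)
    # Project the estimate onto the target to isolate the target component.
    s_target = np.dot(estimate, target) / (np.dot(target, target) + eps) * target
    e_noise = estimate - s_target
    ratio = (np.dot(s_target, s_target) + eps) / (np.dot(e_noise, e_noise) + eps)
    return 10.0 * np.log10(ratio)
```

With equal scores, both fusion functions reduce to a plain average of the model outputs; a score that is much larger for one model makes the fused waveform follow that model almost exclusively. The per-sample variant lets the preferred model change over time, which is one plausible reading of "vector-based attention weights".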