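Spectrogram
Waveform
Computer science
Speech recognition
Robustness (evolution)
Encoder
Encoding (memory)
Artificial intelligence
Time domain
Feature (linguistics)
Pattern recognition (psychology)
Frequency domain
Computer vision
Telecommunications
Radar
Biochemistry
Chemistry
Linguistics
Philosophy
Gene
Operating system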
Authors
Hao Shi,Masato Mimura,Tatsuya Kawahara
Identifier
DOI:10.1109/taslp.2024.3407511
Abstract
While waveform-domain speech enhancement (SE) has been extensively investigated in recent years and achieves state-of-the-art performance on many datasets, spectrogram-based SE tends to show more robust and stable enhancement behavior. In this paper, we propose a waveform-spectrogram hybrid method (WaveSpecEnc) to improve the robustness of waveform-domain SE. In each encoder layer, WaveSpecEnc refines the corresponding temporal feature map using a spectrogram encoding. Incorporating spectral information yields robust perceptual quality for human listeners, but only a minor improvement in automatic speech recognition (ASR). We therefore extend the method for robust ASR by feeding the spectrogram encoding information to both the SE front-end and the ASR back-end (WaveSpecEnc+). Experimental results on the CHiME-4 dataset show that the proposed method consistently improves ASR performance on the real evaluation sets and outperforms other methods, including DEMUCS and Conv-TasNet. Refining in the shallow encoder layers is very effective, and the effect is confirmed even with a strong ASR baseline using WavLM.
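The idea of refining a temporal feature map with a spectrogram encoding can be illustrated with a small NumPy sketch. This is not the authors' architecture: the abstract does not specify how the refinement is done, so the sigmoid gating, the random projection `proj`, the function names, and all sizes (`n_fft`, `hop`, 8 channels) are assumptions chosen only for illustration.

```python
import numpy as np

def magnitude_spectrogram(wave, n_fft=64, hop=16):
    """Frame the waveform, apply a Hann window, and take |rFFT| per frame."""
    n_frames = 1 + (len(wave) - n_fft) // hop
    window = np.hanning(n_fft)
    frames = np.stack(
        [wave[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))  # shape: (n_frames, n_fft // 2 + 1)

def refine_with_spectrogram(temporal_feat, spec, proj):
    """Gate a temporal feature map (channels, frames) with projected spectral features.

    Hypothetical refinement: a sigmoid gate computed from the spectrogram.
    The actual WaveSpecEnc refinement may differ.
    """
    gate = 1.0 / (1.0 + np.exp(-(proj @ spec.T)))  # shape: (channels, frames)
    return temporal_feat * gate

rng = np.random.default_rng(0)
wave = rng.standard_normal(1024)                          # stand-in for a noisy waveform
spec = magnitude_spectrogram(wave)                        # spectral branch
temporal_feat = rng.standard_normal((8, spec.shape[0]))   # stand-in for an encoder layer output
proj = rng.standard_normal((8, spec.shape[1])) * 0.1      # stand-in for a learned projection
refined = refine_with_spectrogram(temporal_feat, spec, proj)
```

In a trained model the projection would be learned and the temporal feature map would come from a convolutional encoder layer; here both are random placeholders so that the shapes line up frame by frame.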