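Computer science
Speech recognition
Spectrogram
Leverage (statistics)
Convolutional neural network
Reverberation
Intelligibility (philosophy)
Deep learning
Speech processing
Artificial intelligence
Feature (linguistics)
Feature extraction
Pattern recognition (psychology)
Engineering
Philosophy
Epistemology
Linguistics
Electrical engineering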
Authors
Bajian Xiang, Wenyu Mao, Kaijun Tan, Huaxiang Lu
Identifier
DOI:10.1109/lsp.2024.3356420
Abstract
Reverberation significantly degrades speech intelligibility, posing a substantial challenge in speech processing. While advances in deep learning offer promising solutions, current methods often fail to integrate low-level and high-level feature representations effectively, which hurts overall performance. In addition, prior approaches rely heavily on loss functions based on quantitative error metrics, which may not fully capture the perceptual characteristics of speech signals. To address these concerns, we introduce CAT-DUnet, a U-Net architecture that integrates channel attention, time-frequency attention, and dilated convolution blocks to enhance feature fusion. We use structural similarity as the training objective to align more closely with human perception, and investigate how applying various reasonable transformations to the spectrograms affects the performance of this loss function. Extensive ablation experiments demonstrate the effectiveness of the proposed enhancements. Our model outperforms state-of-the-art models on 6 out of 7 metrics.
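The structural-similarity objective mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes 2-D magnitude spectrograms scaled to [0, 1], a uniform 7x7 window, and the commonly used SSIM constants k1 = 0.01 and k2 = 0.03; the function name `ssim_loss` and all of its parameters are hypothetical. NumPy is used for readability, whereas actual training would need a differentiable framework.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def ssim_loss(est, ref, win=7, data_range=1.0, k1=0.01, k2=0.03):
    """Return 1 - mean SSIM between two 2-D spectrograms (hypothetical helper).

    Assumes both inputs share the same shape (frequency x time) and that
    their values lie in [0, data_range].
    """
    est = np.asarray(est, dtype=np.float64)
    ref = np.asarray(ref, dtype=np.float64)

    # Stabilising constants from the standard SSIM definition.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2

    # Every win x win patch: shape (H - win + 1, W - win + 1, win, win).
    pe = sliding_window_view(est, (win, win))
    pr = sliding_window_view(ref, (win, win))

    # Local means, variances, and covariance under a uniform window.
    mu_e = pe.mean(axis=(-2, -1))
    mu_r = pr.mean(axis=(-2, -1))
    var_e = pe.var(axis=(-2, -1))
    var_r = pr.var(axis=(-2, -1))
    cov = (pe * pr).mean(axis=(-2, -1)) - mu_e * mu_r

    ssim_map = ((2 * mu_e * mu_r + c1) * (2 * cov + c2)) / (
        (mu_e ** 2 + mu_r ** 2 + c1) * (var_e + var_r + c2)
    )
    # SSIM is 1 for identical patches, so the loss is 0 in that case.
    return 1.0 - ssim_map.mean()
```

A transformation of the spectrogram (for example, a compression followed by rescaling to [0, 1]) would be applied to both inputs before calling `ssim_loss`; which transformations the paper evaluates is not stated in the abstract.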