Computer science
Speech enhancement
PESQ
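Deep learning
Artificial intelligence
Convolutional neural network
Encoder
Convolution (computer science)
Speech recognition
Computational complexity theory
Enhanced Data Rates for GSM Evolution
Bottleneck
Computer engineering
Edge device
Electronic engineering
Signal processing
Speech processing
Channel (broadcasting)
Noise reduction
Encoding (memory)
Edge enhancement
Adaptation (eye)
Computational model
Edge computing
Noise (video)
Autoencoder
Artificial neural network
Translation (biology)
Speech coding
Real-time computing
Decoding methods
Pattern recognition (psychology)
Noise measurement
Convolutional code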
Authors
Fazal E. Wahab, Zhongfu Ye, Nasir Saleem, Sami Bourouis, Amir Hussain
Identifier
DOI:10.1109/tce.2025.3598007
Abstract
Deep learning has significantly advanced speech enhancement (SE) by exploiting hierarchical representations to model complex speech patterns. However, deploying these models on resource-constrained edge devices remains challenging due to computational limitations and real-time processing requirements. Convolutional neural networks (CNNs) are limited by frequency translation equivariance, which reduces their sensitivity to the frequency-specific features essential for speech-noise separation. Transformer-based SE models capture global dependencies effectively but are computationally expensive and less suitable for low-latency edge processing. To address these challenges, this study proposes an efficient encoder-decoder architecture optimized for SE on edge devices. The model integrates adaptive frequency-aware gated convolution (AFAGC) in the encoder and a Ginformer-based bottleneck, ensuring robust real-time performance with minimal computational overhead. The encoder incorporates adaptive frequency band positional encoding to mitigate translation equivariance, while gated convolution selectively reweights frequency components to emphasize speech-relevant features. The Ginformer-based bottleneck uses low-rank projections to reduce self-attention complexity and SRU-based temporal gating to enhance noise adaptation and computational efficiency. Evaluation on the VoiceBank+DEMAND dataset demonstrates that the proposed model outperforms recent SE models, achieving a PESQ of 3.25 and an STOI of 95.5%. With only 1.32 million parameters and a real-time factor (RTF) of 0.14, it delivers high-quality speech enhancement suitable for real-time deployment on edge devices.
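The abstract says low-rank projections reduce self-attention complexity but gives no implementation details. The NumPy sketch below shows one common way this idea is realized: keys and values are projected from T time frames down to r summary rows, so the score matrix is T x r instead of T x T. It is not the paper's Ginformer bottleneck. The function name `low_rank_attention`, the random projection matrices `proj_k` and `proj_v`, and the sizes T, d, and r are all assumptions chosen for illustration; in a trained model the projections would be learned.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def low_rank_attention(q, k, v, proj_k, proj_v):
    """Single-head attention with keys/values compressed along time.

    q, k, v:        arrays of shape (T, d)
    proj_k, proj_v: arrays of shape (r, T), with r much smaller than T
                    (hypothetical projections; learned in a real model)
    Returns the attended output (T, d) and the attention weights (T, r).
    """
    d = q.shape[-1]
    k_low = proj_k @ k                      # (r, d): compressed keys
    v_low = proj_v @ v                      # (r, d): compressed values
    scores = q @ k_low.T / np.sqrt(d)       # (T, r) instead of (T, T)
    weights = softmax(scores, axis=-1)      # each row sums to 1
    return weights @ v_low, weights


# Illustrative sizes only (assumed, not taken from the paper).
rng = np.random.default_rng(0)
T, d, r = 200, 16, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
proj_k = rng.standard_normal((r, T)) / np.sqrt(T)
proj_v = rng.standard_normal((r, T)) / np.sqrt(T)

out, weights = low_rank_attention(q, k, v, proj_k, proj_v)
print(out.shape, weights.shape)
```

With these sizes the score matrix holds T x r = 1,600 entries rather than T x T = 40,000, which is the kind of saving that matters for low-latency processing on edge devices.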