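Computer science
Codec
Speech coding
Speech recognition
Adaptive Multi-Rate audio codec
Codec 2
Bandwidth (computing)
Scalability
Encoding
Intelligibility (philosophy)
Voice activity detection
Speech processing
Linear predictive coding
Computer hardware
Computer network
Biochemistry
Database
Gene
Epistemology
Philosophy
Chemistry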
Authors
Xiaoqiang Hu,Zhe Chen,Fuliang Yin
Abstract
In extremely noisy communication scenarios, a bone-conducted microphone (BCM) speech codec is often combined with speech bandwidth extension to improve BCM speech quality. However, this tandem approach leads to a complex system architecture. To address this problem, this paper proposes a scalable codec for BCM speech based on generative and diffusion probabilistic models. Specifically, a specialized codec architecture is constructed to encode BCM speech while complementing its high-frequency components. Then, a key feature extraction block is presented to address the diminishing memory capacity of shallow layers as the network depth increases. Next, because high-frequency detail may still be lacking, an overall refinement block is introduced to refine the reconstructed speech signals. Finally, based on the U-Net architecture, a diffusion probabilistic model is proposed to upsample the input audio signal from a bandwidth of 8 kHz to a high-resolution audio signal with a bandwidth of 20 kHz and a sampling rate of 48 kHz. The proposed method can simultaneously encode BCM speech and improve its quality using a single network. It supports different bitrate settings without architectural changes or retraining, and it dynamically adjusts the transmitted data in response to changing network load. Simulation experiments demonstrate its feasibility.
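The abstract does not say how the codec changes bitrate without retraining. One mechanism used by other neural codecs is residual vector quantization (RVQ), where the sender transmits only the first few quantizer stages when the network is loaded. The sketch below illustrates that general idea only; it is an assumption, not the authors' method. The function names, the random codebooks, and the frame rate and codebook size in the usage note are all hypothetical.

```python
import math
import random


def make_codebooks(n_stages, codebook_size, dim, seed=0):
    """Build random codebooks (hypothetical; a real codec would learn them)."""
    rng = random.Random(seed)
    books, scale = [], 1.0
    for _ in range(n_stages):
        books.append([[rng.gauss(0.0, scale) for _ in range(dim)]
                      for _ in range(codebook_size)])
        scale *= 0.5  # later stages model smaller residuals
    return books


def nearest(codebook, vec):
    """Return the index of the codeword closest to vec (squared Euclidean)."""
    best_idx, best_dist = 0, float("inf")
    for i, code in enumerate(codebook):
        dist = sum((a - b) ** 2 for a, b in zip(vec, code))
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx


def rvq_encode(vec, codebooks, n_active):
    """Quantize vec with only the first n_active stages (the bitrate knob)."""
    residual = list(vec)
    indices = []
    for book in codebooks[:n_active]:
        idx = nearest(book, residual)
        indices.append(idx)
        residual = [r - c for r, c in zip(residual, book[idx])]
    return indices


def rvq_decode(indices, codebooks):
    """Sum the selected codewords; works for any number of received stages."""
    out = [0.0] * len(codebooks[0][0])
    for idx, book in zip(indices, codebooks):
        out = [o + c for o, c in zip(out, book[idx])]
    return out


def bitrate_bps(frame_rate_hz, n_active, codebook_size):
    """Bits per second when n_active indices are sent for every frame."""
    return frame_rate_hz * n_active * math.log2(codebook_size)


codebooks = make_codebooks(n_stages=4, codebook_size=1024, dim=8)
frame = [0.3, -1.2, 0.7, 0.0, 0.5, -0.4, 1.1, -0.9]
low = rvq_encode(frame, codebooks, n_active=2)   # congested network
high = rvq_encode(frame, codebooks, n_active=4)  # idle network
```

With a hypothetical 50 frames per second and 1024-entry codebooks, each stage costs 10 bits per frame, so two stages give 1000 bit/s and four stages give 2000 bit/s. The low-rate indices are a prefix of the high-rate ones, so the same encoder and decoder serve every rate.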