Computer science
Transformer
Speech recognition
Benchmark (surveying)
Dual (grammatical number)
Artificial intelligence
Engineering
Voltage
Art
Literature
Geodesy
Geography
Electrical engineering
Authors
Dingding Han, Wensheng Zhang, Siling Feng, Mengxing Huang, Yuanyuan Wu
Identifier
DOI: 10.1109/prai59366.2023.10332104
Abstract
The Transformer allows each position to interact with all other positions in the input sequence, making it powerful at capturing global interaction information. In speech separation tasks, however, fine-grained local information in the speech sequence is crucial, and relying solely on self-attention mechanisms may not extract these local details effectively. To address this limitation, this paper proposes a dual-path hybrid attention transformer network (DPHAT-Net) for time-domain single-channel speech separation. Specifically, a hybrid attention transformer (HA-Transformer) module is designed to capture both global and local information in speech sequences. Furthermore, a Simple Recurrent Unit (SRU) is introduced in place of traditional positional encoding to better exploit the temporal position information in speech sequences. Experimental evaluations on the WSJ0-2mix benchmark dataset show that the proposed DPHAT-Net achieves state-of-the-art speech separation performance while maintaining a relatively small model size.
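The abstract names two ideas that are easy to miss in prose: an attention block that fuses global self-attention with a local branch, and a recurrent layer standing in for positional encoding. The sketch below is a minimal illustration of those two ideas only, not the authors' implementation: the class names are hypothetical, the local branch is assumed to be a depthwise convolution, and nn.GRU is substituted for the SRU, which PyTorch does not provide built in.

```python
import torch
import torch.nn as nn


class HybridAttentionBlock(nn.Module):
    """Illustrative block mixing global self-attention with a local
    depthwise-convolution branch (hypothetical; not the paper's exact design)."""

    def __init__(self, dim: int, num_heads: int = 4, kernel_size: int = 17):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        # Global branch: every frame attends to every other frame.
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Local branch: depthwise conv captures fine-grained neighborhood cues.
        self.local = nn.Conv1d(dim, dim, kernel_size,
                               padding=kernel_size // 2, groups=dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, dim)
        h = self.norm1(x)
        global_out, _ = self.attn(h, h, h)                         # global view
        local_out = self.local(h.transpose(1, 2)).transpose(1, 2)  # local view
        x = x + global_out + local_out                             # fuse both
        return x + self.ffn(self.norm2(x))


class RecurrentPositionalEncoder(nn.Module):
    """Stand-in for the SRU idea: a lightweight recurrent pass injects
    temporal order instead of fixed sinusoidal encodings (GRU used here
    purely as an assumption, since torch has no built-in SRU)."""

    def __init__(self, dim: int):
        super().__init__()
        self.rnn = nn.GRU(dim, dim, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out, _ = self.rnn(x)
        return x + out  # residual keeps the original features


if __name__ == "__main__":
    x = torch.randn(2, 160, 64)  # (batch, frames, feature dim)
    y = HybridAttentionBlock(64)(RecurrentPositionalEncoder(64)(x))
    print(y.shape)  # torch.Size([2, 160, 64])
```

The design point the abstract is making: summing a convolutional (local) branch with the self-attention (global) branch lets one block see both scales, and a recurrent pre-pass gives the model order information without any explicit positional-encoding table.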