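Computer science
Encoder
Speech recognition
Path (computing)
Convolutional code
Channel (broadcasting)
Set (abstract data type)
Attention network
Feature (linguistics)
Key (lock)
Testbed
Representation (politics)
Artificial intelligence
Algorithm
Telecommunications
Computer network
Decoding methods
Philosophy
Computer security
Linguistics
Programming language
Law
Operating system
Politics
Political science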
Authors
Jiaming Cheng, Cong Pang, Ruiyu Liang, Jingjie Fan, Li Zhao
Identifier
DOI:10.1109/icassp49357.2023.10095770
Abstract
This paper proposes a dual-path convolutional recurrent network with group attention for the ICASSP Signal Processing Grand Challenge: L3DAS23 Challenge. We design a structure based on a convolutional encoder-decoder, with frequency-time blocks based on group attention introduced in the middle. The encoder extracts a local representation from the complex spectrum, groups of time-frequency processing modules capture the correlations along the frequency axis and the time axis, and group attention extracts the key information in the feature flow. Our system ranks first in the 3D speech enhancement task of the L3DAS23 Challenge and significantly outperforms the baseline, achieving 0.101 WER and 0.902 STOI on the blind test set.
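The dual-path idea in the abstract (processing along the frequency axis, then along the time axis, inside channel groups) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the group attention mechanism, so the sketch assumes plain, parameter-free scaled dot-product self-attention applied separately to each channel group, and it omits the convolutional encoder-decoder and any recurrent layers. The names `self_attention`, `dual_path_block`, and `groups` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # x: (batch, length, channels). Parameter-free scaled dot-product
    # attention, used here as an ASSUMED stand-in for the paper's
    # (unspecified) group attention.
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(x.shape[-1])
    return softmax(scores, axis=-1) @ x

def dual_path_block(feat, groups=2):
    # feat: (channels, time, freq) feature map, e.g. an encoder output.
    c, t, f = feat.shape
    assert c % groups == 0, "channels must divide evenly into groups"
    out = feat.copy()
    for idx in np.split(np.arange(c), groups):
        x = out[idx]                      # (cg, t, f), one channel group
        # Frequency path: each time frame is a sequence over frequency bins.
        seq = x.transpose(1, 2, 0)        # (t, f, cg)
        seq = seq + self_attention(seq)   # residual connection
        # Time path: each frequency bin is a sequence over time frames.
        seq = seq.transpose(1, 0, 2)      # (f, t, cg)
        seq = seq + self_attention(seq)   # residual connection
        out[idx] = seq.transpose(2, 1, 0) # back to (cg, t, f)
    return out
```

Calling `dual_path_block(feat, groups=2)` on an array of shape `(4, T, F)` returns an array of the same shape. Because each group is processed independently in this sketch, changing the channels of one group leaves the output of the other group unchanged.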