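Computer science
Artificial intelligence
Convolutional neural network
Speech recognition
Word error rate
Pattern recognition (psychology)
Noise (video)
Speaker verification
Speaker recognition
Image (mathematics)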
Authors
Xiaoyi Qin,Na Li,Chao Weng,Dan Su,Ming Li
Identifier
DOI:10.1109/icassp43922.2022.9746294
Abstract
Recently, attention mechanisms such as the squeeze-and-excitation (SE) module and the convolutional block attention module (CBAM) have achieved great success in deep learning-based speaker verification systems. This paper introduces a simple yet effective alternative, the simple attention module (SimAM), for speaker verification. SimAM is a plug-and-play module that adds no extra model parameters. In addition, we propose a noisy label detection method that iteratively filters out training samples with noisy labels, since a large-scale dataset labeled by human annotators or automated processes may contain noisy labels. Because of the memorization effect of over-parameterized deep neural networks (DNNs), training on noisily labeled data can result in a performance drop. Experiments are conducted on the VoxCeleb dataset. The speaker verification model with SimAM achieves a 0.675% equal error rate (EER) on the VoxCeleb1 original test trials. Our proposed iterative noisy label detection method further reduces the EER to 0.643%.
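The abstract gives no formula for SimAM, so the following is only a minimal sketch of why such a module needs no learnable parameters. It assumes the commonly published SimAM formulation, in which each activation is weighted by a sigmoid of its squared deviation from the channel mean, normalized by the channel variance. The function names `simam_weights` and `simam`, the flat-list representation of one channel, and the regularizer `lam=1e-4` are illustrative assumptions, not details taken from this paper.

```python
import math


def simam_weights(channel, lam=1e-4):
    """Attention weights for one channel given as a flat list of activations.

    Assumption: follows the commonly published SimAM energy formulation;
    `lam` is a small regularizer chosen here for illustration.
    """
    count = len(channel)
    if count < 2:
        raise ValueError("need at least two activations per channel")
    mean = sum(channel) / count
    # Squared deviation of every activation from the channel mean.
    sq_dev = [(v - mean) ** 2 for v in channel]
    # Channel variance estimated with count - 1 in the denominator.
    variance = sum(sq_dev) / (count - 1)
    weights = []
    for d in sq_dev:
        # Larger deviation from the mean gives a larger inverse energy.
        inv_energy = d / (4.0 * (variance + lam)) + 0.5
        # Sigmoid maps the inverse energy to a weight between 0 and 1.
        weights.append(1.0 / (1.0 + math.exp(-inv_energy)))
    return weights


def simam(channel, lam=1e-4):
    """Rescale each activation by its weight; nothing here is trainable."""
    return [v * w for v, w in zip(channel, simam_weights(channel, lam))]


if __name__ == "__main__":
    feature = [0.1, 0.2, 0.1, 3.0]  # hypothetical activations of one channel
    print(simam_weights(feature))
    print(simam(feature))
```

Everything in the sketch is computed from the statistics of the input itself, which is one way to read the "plug-and-play" claim: the operation could be dropped between existing layers without changing the parameter count. How the authors place it inside their speaker verification network is not stated in the abstract.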