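Computer science
Normalization (sociology)
Speech recognition
Speaker recognition
Mel-frequency cepstrum
Feature extraction
Feature (linguistics)
Pattern recognition (psychology)
Artificial intelligence
Time domain
Frequency domain
Domain (mathematical analysis)
Time-frequency analysis
Algorithm
Mathematics
Telecommunications
Computer vision
Mathematical analysis
Linguistics
Philosophy
Sociology
Anthropology
Radar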
Authors
Jiqing Han, Yunfei Zi, Shengwu Xiong
Identifier
DOI:10.1007/978-981-99-7022-3_33
Abstract
Many existing speaker recognition algorithms extract features from a single domain, which cannot represent speech characteristics well and therefore lowers recognition accuracy. To solve this problem, we propose a time-frequency domain feature enhanced deep speaker (TFDS). The proposed algorithm combines the time domain and the frequency domain to enhance traditional MFCC feature extraction, making up for the shortcoming of algorithms that extract features in only a single domain. The deep speaker network architecture includes ResCNN, GRU, a time averaging layer, a style transformation layer, and a length normalization layer, and the loss is triplet loss. Experimental results on the LibriSpeech dataset show that TFDS has higher accuracy and a lower equal error rate (EER) than deep speaker, and that the time-frequency domain feature enhancement method can also be combined with other networks to improve the accuracy of speaker recognition.
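
As a rough illustration of the length normalization step and the triplet-style loss mentioned in the abstract, the sketch below scales embeddings to unit length and applies a hinge loss on cosine similarities. It is not the authors' implementation: the function names, the toy two-dimensional embeddings, and the margin of 0.1 are assumptions chosen for the example, and the abstract does not state which similarity measure or margin TFDS uses.

```python
import math


def length_normalize(vec):
    """Scale a vector to unit L2 norm (assumed form of length normalization)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def cosine_similarity(a, b):
    """Cosine similarity, computed as the dot product of unit-length vectors."""
    a, b = length_normalize(a), length_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.1):
    """Hinge loss: penalize the triplet unless the anchor is closer to the
    positive (same speaker) than to the negative (other speaker) by `margin`.
    The margin value is an assumption for this sketch."""
    s_ap = cosine_similarity(anchor, positive)
    s_an = cosine_similarity(anchor, negative)
    return max(0.0, s_an - s_ap + margin)


# Hypothetical toy embeddings: positive points the same way as the anchor,
# negative is orthogonal to it.
anchor = [1.0, 0.0]
positive = [2.0, 0.0]
negative = [0.0, 3.0]

easy = triplet_loss(anchor, positive, negative)  # well-separated triplet
hard = triplet_loss(anchor, negative, positive)  # roles swapped on purpose
print(easy, hard)
```

In this toy case the well-separated triplet contributes no loss, while the swapped triplet is penalized. In a real system the embeddings would come from the network's final layer rather than being written by hand.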