Computer science
Speech recognition
Speaker recognition
Personality
Speaker diarization
Psychology
Social psychology
Authors
Joon Gyu Maeng, Min Kyu Lee, Seung Yun, Sang Hun Kim
Identifier
DOI:10.1109/ictc52510.2021.9621038
Abstract
Recently, various speaker-dependent voice activity detection (VAD) methods have been proposed that detect a target speaker's speech in noisy environments. Speaker-dependent VAD is similar to knowledge distillation in that it learns the distribution of each speaker from a speaker embedding model trained on many speakers. That is, the key idea is to learn the distribution of speaker embedding vectors sufficiently well to enhance personality. In this paper, we propose new strategies to enhance the personality of speaker-dependent VAD. To better capture the personal characteristics of speakers, we consider several factors: model size, language, and gender. Our experiments show that the proposed strategies achieve a significant improvement in Average Precision (AP), reaching 0.959 and 0.935 compared with 0.735 and 0.530 for the baseline model on the respective language evaluation sets.