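Computer science
Speech recognition
Artificial intelligence
Task (project management)
Event (particle physics)
Focus (optics)
Field (mathematics)
Hidden Markov model
Pattern recognition (psychology)
Machine learning
Physics
Optics
Economics
Management
Pure mathematics
Quantum mechanics
Mathematics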
Authors
Liang Xu,Lizhong Wang,Sijun Bi,Hanyue Liu,Jing Wang
Identifier
DOI:10.1109/icassp49357.2023.10095687
Abstract
Sound event detection (SED) is an interesting but challenging task because of the scarcity of data and the diversity of sound events in real life. In this paper, we focus on the semi-supervised SED task and incorporate a model pre-trained in another field to help improve detection performance. Pre-trained models have been widely used in various speech-related tasks, such as automatic speech recognition and audio tagging. If the training dataset is large and general enough, the embedding features extracted by a pre-trained model will capture latent information relevant to the target task. We use the pre-trained model PANNs, which is suitable for the SED task, and propose two methods to fuse the features from PANNs with those of the original model. In addition, we propose a weight-raised temporal contrastive loss to improve the model's switching speed at event boundaries and its smoothness within events. Experimental results show that using the pre-trained model features outperforms the baseline by 8.5% and 9.1% in terms of polyphonic sound detection score (PSDS) on the DESED public evaluation dataset.
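The abstract does not say how the two fusion methods work. As a rough illustration only, here is one common way to combine a frame-level pre-trained embedding (for example, one extracted with PANNs) with an SED model's own frame features: resample the embedding along time to the model's frame count, then concatenate the two along the feature axis. The function names `align_time` and `concat_fusion` and the array shapes are hypothetical, and this sketch is not presented as either of the authors' methods.

```python
import numpy as np


def align_time(emb: np.ndarray, target_frames: int) -> np.ndarray:
    """Linearly interpolate a (T_src, D) embedding sequence to (target_frames, D).

    Hypothetical helper: assumes both sequences span the same audio clip,
    so frame positions are mapped onto a shared [0, 1] time axis.
    """
    emb = np.asarray(emb, dtype=np.float64)
    t_src, dim = emb.shape
    src_pos = np.linspace(0.0, 1.0, t_src)
    tgt_pos = np.linspace(0.0, 1.0, target_frames)
    out = np.empty((target_frames, dim), dtype=np.float64)
    for d in range(dim):
        # Interpolate each feature dimension independently over time.
        out[:, d] = np.interp(tgt_pos, src_pos, emb[:, d])
    return out


def concat_fusion(model_feats: np.ndarray, pretrained_emb: np.ndarray) -> np.ndarray:
    """Hypothetical concatenation fusion of (T, D1) model features with a
    (T_src, D2) pre-trained embedding, returning (T, D1 + D2)."""
    aligned = align_time(pretrained_emb, model_feats.shape[0])
    return np.concatenate([np.asarray(model_feats, dtype=np.float64), aligned], axis=1)


if __name__ == "__main__":
    # Hypothetical shapes: 156 model frames x 128 dims, 32 embedding frames x 2048 dims.
    model_feats = np.random.randn(156, 128)
    pretrained_emb = np.random.randn(32, 2048)
    fused = concat_fusion(model_feats, pretrained_emb)
    print(fused.shape)  # (156, 2176)
```

Concatenation keeps both feature sets intact and leaves it to the downstream layers (for example, a recurrent or classification head) to learn how to weight them. A weighted sum after projecting both inputs to a common dimension is another frequently used alternative.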