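Computer science
Pooling
Artificial intelligence
Convolutional neural network
Pattern recognition (psychology)
Artificial neural network
Image (mathematics)
Deep learning
Focus (optics)
Event (particle physics)
Transfer learning
Speech recognition
Machine learning
Quantum mechanics
Optics
Physics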
Authors
Xichang Cai,Dongchi Yu,Du-Xin Liu,Menglong Wu
Source
Journal: Journal of physics
[IOP Publishing]
Date: 2021-09-01
Volume (issue): 2010 (1): 012108-012108
Citations: 1
Identifier
DOI:10.1088/1742-6596/2010/1/012108
Abstract
In this paper, we propose a sound event detection (SED) method that uses a deep neural network trained on weakly labeled and unlabeled data. The proposed method uses a convolutional recurrent neural network (CRNN) to extract high-level features from audio clips. Inspired by the strong performance of transfer learning in image recognition, the convolutional neural network (CNN) in the proposed CRNN is an image-pretrained model. Although audio and images differ significantly, the image-pretrained CNN still performs competitively in SED and can effectively reduce the amount of training data needed. To learn from weakly labeled data, the proposed method uses a weighted pooling strategy that enables the network to focus on the frames of an audio clip that contain events. For unlabeled data, the proposed method uses mean teacher semi-supervised learning together with data augmentation. To demonstrate the performance of the proposed method, we conduct an experimental evaluation on the DCASE2021 Task4 dataset. The experimental results show that the proposed method outperforms the DCASE2021 Task4 baseline method.
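The abstract names two mechanisms without giving formulas: weighted pooling for weak (clip-level) labels and a mean teacher for unlabeled data. The NumPy sketch below illustrates the general idea of each under stated assumptions and is not the authors' implementation. The function names `weighted_pooling` and `ema_update`, the `eps` and `alpha` parameters, and the choice of using the frame probabilities themselves as pooling weights (often called linear softmax pooling) are all hypothetical choices made for illustration.

```python
import numpy as np


def weighted_pooling(frame_probs, frame_weights, eps=1e-8):
    """Aggregate frame-level event probabilities into clip-level ones.

    frame_probs:   array of shape (T, C), probabilities per frame and class.
    frame_weights: array of shape (T, C), non-negative weights.
    Frames with larger weights contribute more to the clip-level score,
    so the clip prediction is driven by frames that likely contain events.
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    frame_weights = np.asarray(frame_weights, dtype=float)
    weighted_sum = (frame_probs * frame_weights).sum(axis=0)
    return weighted_sum / (frame_weights.sum(axis=0) + eps)


def ema_update(teacher, student, alpha=0.999):
    """Mean teacher step: teacher parameters track an exponential
    moving average of the student parameters (dicts of arrays)."""
    return {k: alpha * teacher[k] + (1.0 - alpha) * student[k] for k in teacher}


if __name__ == "__main__":
    # One class, four frames; the event is active in the two middle frames.
    probs = np.array([[0.1], [0.9], [0.8], [0.1]])

    # Assumption: use the probabilities as their own weights.
    clip_weighted = weighted_pooling(probs, probs)
    clip_mean = probs.mean(axis=0)
    print("weighted pooling:", clip_weighted, "mean pooling:", clip_mean)

    # Toy parameter dictionaries standing in for network weights.
    teacher = {"w": np.array([1.0])}
    student = {"w": np.array([0.0])}
    print("teacher after EMA step:", ema_update(teacher, student, alpha=0.9))
```

In this toy case the weighted clip score is higher than the plain average because the two high-probability frames dominate the sum, which is the behavior the abstract describes as focusing on frames that contain events. In a mean teacher setup the teacher's predictions on unlabeled clips would typically serve as consistency targets for the student; that loss is omitted here.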