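Monaural
Computer science
Convolutional neural network
Pattern recognition (psychology)
Feature (linguistics)
Artificial intelligence
Speech recognition
Event (particle physics)
Recurrent neural network
Artificial neural network
Linguistics
Quantum mechanics
Physics
Philosophy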
Authors
Sharath Adavanne, Pasi Pertilä, Tuomas Virtanen
Source
Journal: Cornell University - arXiv
Date: 2017-01-01
Identifier
DOI: 10.48550/arxiv.1706.02291
Abstract
This paper proposes to use low-level spatial features extracted from multichannel audio for sound event detection. We extend the convolutional recurrent neural network to handle more than one type of these multichannel features by learning from each of them separately in the initial stages. We show that, instead of concatenating the features of each channel into a single feature vector, the network learns sound events in multichannel audio better when they are presented as separate layers of a volume. Using the proposed spatial features instead of monaural features on the same network gives an absolute F-score improvement of 6.1% on the publicly available TUT-SED 2016 dataset and 2.7% on the TUT-SED 2009 dataset, which is fifteen times larger.
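The two input layouts contrasted in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function names `concatenate_channels` and `stack_as_volume`, the array sizes (2 channels, 256 frames, 40 bands), and the random stand-in data are assumptions chosen only to show the difference in shape.

```python
import numpy as np


def concatenate_channels(features):
    """Join per-channel features into one vector per frame.

    features: array of shape (channels, frames, bands)
    returns:  array of shape (frames, channels * bands)
    """
    channels, frames, bands = features.shape
    # Move frames to the front, then lay the channels side by side
    # along the feature axis.
    return features.transpose(1, 0, 2).reshape(frames, channels * bands)


def stack_as_volume(features):
    """Keep each channel as a separate layer of a volume.

    features: array of shape (channels, frames, bands)
    returns:  array of shape (frames, bands, channels)
    """
    # Channels become the last axis, like colour planes of an image.
    return features.transpose(1, 2, 0)


# Hypothetical sizes: 2 audio channels, 256 frames, 40 bands per frame.
rng = np.random.default_rng(0)
features = rng.standard_normal((2, 256, 40))

flat = concatenate_channels(features)
volume = stack_as_volume(features)

print(flat.shape)    # (256, 80)
print(volume.shape)  # (256, 40, 2)
```

Both layouts hold the same numbers. In the volume layout, a 2-D convolution over the frame and band axes sees every channel at the same time-frequency position, whereas the concatenated layout places the channels at different offsets along the feature axis.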