Computer science
Artificial intelligence
Sound (geography)
Speech recognition
Computer vision
Spectrum (functional analysis)
Image (mathematics)
Pattern recognition (psychology)
Acoustics
Physics
Quantum mechanics
Authors
Shahin Amiriparian,Maurice Gerczuk,Sandra Ottl,Nicholas Cummins,Michael Freitag,Sergey Pugachevskiy,Alice Baird,Björn W. Schuller
Identifier
DOI:10.21437/interspeech.2017-434
Abstract
In this paper, we propose a method for automatically detecting various types of snore sounds using image classification convolutional neural network (CNN) descriptors extracted from audio file spectrograms. The descriptors, denoted as deep spectrum features, are derived from forwarding spectrograms through very deep task-independent pre-trained CNNs. Specifically, activations of fully connected layers from two common image classification CNNs, AlexNet and VGG19, are used as feature vectors. Moreover, we investigate the impact of differing spectrogram colour maps and two CNN architectures on the performance of the system. Results presented indicate that deep spectrum features extracted from the activations of the second fully connected layer of AlexNet using a viridis colour map are well suited to the task. This feature space, when combined with a support vector classifier, outperforms the more conventional knowledge-based features of 6 373 acoustic functionals used in the INTERSPEECH ComParE 2017 Snoring sub-challenge baseline system. In comparison to the baseline, unweighted average recall is increased from 40.6 % to 44.8 % on the development partition, and from 58.5 % to 67.0 % on the test partition.
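The first stage of the pipeline the abstract describes, rendering an audio clip as a viridis-coloured spectrogram image suitable for a pre-trained image CNN, can be sketched as below. This is a minimal illustration, not the authors' implementation: the STFT parameters (`n_fft`, `hop`) and the synthetic test signal are assumptions, and the subsequent steps (resizing to the CNN input size, forwarding through AlexNet, taking second fully-connected-layer activations, and training a support vector classifier) are only indicated in comments.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # no display needed
from matplotlib import colormaps


def audio_to_viridis_spectrogram(signal, n_fft=256, hop=128):
    """Render a magnitude spectrogram (in dB) as an RGB image via viridis.

    n_fft and hop are illustrative values, not taken from the paper.
    """
    window = np.hanning(n_fft)
    frames = [signal[i:i + n_fft] * window
              for i in range(0, len(signal) - n_fft, hop)]
    spec = np.abs(np.fft.rfft(np.stack(frames), axis=1)).T  # (freq, time)
    spec_db = 20.0 * np.log10(spec + 1e-10)
    # normalise to [0, 1], then map through the viridis colour map
    norm = (spec_db - spec_db.min()) / (spec_db.max() - spec_db.min() + 1e-10)
    rgb = colormaps["viridis"](norm)[..., :3]  # drop the alpha channel
    return rgb  # float array in [0, 1], shape (freq_bins, n_frames, 3)


# synthetic 1-second "audio" clip at 16 kHz, standing in for a snore recording
rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)
img = audio_to_viridis_spectrogram(audio)

# In the paper's pipeline, this image would next be resized to the CNN's
# input resolution and forwarded through a pre-trained AlexNet or VGG19
# (e.g. from torchvision); the activations of the second fully connected
# layer then serve as the feature vector for a support vector classifier.
```

With the illustrative parameters above, a 16 000-sample clip yields a 129 x 123 image (129 = n_fft/2 + 1 frequency bins). The colour-map choice matters because the pre-trained CNNs expect three-channel natural-image input; the paper reports viridis as the best-performing mapping.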