Computer science
Autoencoder
Artificial intelligence
Deep learning
Speech recognition
Pattern recognition (psychology)
Discriminative model
Music information retrieval
Waveform
Audio signal
Speech coding
Telecommunications
Art
Visual arts
Radar
Musical theatre
Authors
Richard Orjesek,Roman Jarina,Michal Chmulík
Identifier
DOI: 10.1007/s11042-021-11584-7
Abstract
Automatic music emotion recognition (MER) has received increased attention in music information retrieval and user interface development. Music emotion variation detection (or dynamic MER) also captures temporal changes of emotion, so the emotional content of music is expressed as a series of valence-arousal predictions. One of the central issues in MER is the extraction of emotion-related characteristics from the audio signal. We propose a deep neural network for mining emotion-related salient features directly from the raw audio waveform. The proposed architecture stacks a one-dimensional convolutional layer, an autoencoder-based layer with iterative reconstruction, and a bidirectional gated recurrent unit. Tests on the DEAM dataset show that, compared with other state-of-the-art systems, the proposed solution brings a significant improvement in regression accuracy, notably for the valence dimension. The proposed iterative reconstruction layer is shown to enhance the discriminative properties of the features and further increase regression accuracy.
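The stacked pipeline described in the abstract (raw waveform → 1-D convolution → autoencoder layer with iterative reconstruction → bidirectional recurrence → per-frame valence-arousal) can be sketched as below. This is a minimal numpy illustration, not the paper's implementation: all layer sizes are hypothetical, the weights are random and untrained, a plain tanh recurrence stands in for the GRU, and the residual re-encoding scheme in `autoencoder_iterative` is an assumed reading of "iterative reconstruction".

```python
import numpy as np

rng = np.random.default_rng(0)

def conv1d(x, w, b, stride=1):
    """Valid 1-D convolution over a raw waveform: x (T,), w (C, K) -> (C, T_out)."""
    C, K = w.shape
    T_out = (len(x) - K) // stride + 1
    out = np.empty((C, T_out))
    for t in range(T_out):
        out[:, t] = w @ x[t * stride : t * stride + K] + b
    return np.maximum(out, 0.0)  # ReLU

def autoencoder_iterative(h, We, Wd, n_iter=3):
    """Autoencoder layer with iterative reconstruction (assumed scheme):
    re-encode the reconstruction residual and accumulate it into the code."""
    z = np.tanh(We @ h)
    for _ in range(n_iter):
        resid = h - Wd @ z          # what the decoder failed to reconstruct
        z = z + np.tanh(We @ resid)  # refine the code with the residual
    return z

def rnn_bidir(H, Wf, Wb):
    """Toy bidirectional tanh recurrence (stand-in for the BiGRU)."""
    D, T = H.shape
    S = Wf.shape[0]
    fwd, bwd = np.zeros((S, T)), np.zeros((S, T))
    h = np.zeros(S)
    for t in range(T):               # forward pass over frames
        h = np.tanh(Wf @ np.concatenate([H[:, t], h]))
        fwd[:, t] = h
    h = np.zeros(S)
    for t in reversed(range(T)):     # backward pass over frames
        h = np.tanh(Wb @ np.concatenate([H[:, t], h]))
        bwd[:, t] = h
    return np.vstack([fwd, bwd])

# Hypothetical sizes, chosen for illustration only.
T, C, Z, S = 16000, 8, 16, 12        # samples, conv channels, code dim, state dim
x = rng.standard_normal(T)           # 1 s of "raw audio" at 16 kHz
feat = conv1d(x, rng.standard_normal((C, 256)) * 0.01,
              np.zeros(C), stride=256)            # frame-level conv features
We = rng.standard_normal((Z, C)) * 0.1            # encoder weights
Wd = rng.standard_normal((C, Z)) * 0.1            # decoder weights
codes = np.stack([autoencoder_iterative(feat[:, t], We, Wd)
                  for t in range(feat.shape[1])], axis=1)
Wf = rng.standard_normal((S, Z + S)) * 0.1
Wb = rng.standard_normal((S, Z + S)) * 0.1
H = rnn_bidir(codes, Wf, Wb)
Wo = rng.standard_normal((2, 2 * S)) * 0.1
va = np.tanh(Wo @ H)                 # per-frame (valence, arousal) in [-1, 1]
print(va.shape)                      # one 2-D prediction per conv frame
```

The tanh on the output squashes both dimensions into [-1, 1], matching the usual normalized valence-arousal range; the frame rate of the predictions is set by the convolution stride.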