Computer science
Feature (linguistics)
Pyramid (geometry)
Pattern recognition (psychology)
Artificial intelligence
SED
Event (particle physics)
Speech recognition
Mathematics
Geometry
Linguistics
Quantum mechanics
Physics
Philosophy
Programming language
Authors
Ji Won Kim, Geon Woo Lee, Chang‐Soo Park, Hong Kook Kim
Identifier
DOI:10.1109/icce56470.2023.10043590
Abstract
This paper proposes a sound event detection (SED) model that combines EfficientNet-B2 with an attentional pyramid network (APNet) module to effectively represent information from multi-resolution feature maps. Compared with the A2FPN-based SED model, the proposed model achieves lower computational complexity and better performance owing to the newly proposed APNet. The model builds on a pre-trained EfficientNet-B2 obtained from the pretraining, sampling, labeling, and aggregation (PSLA) framework. Multi-resolution feature maps extracted from EfficientNet-B2 are aggregated by the APNet module, and the aggregated feature map is passed to a detection network composed mainly of two bidirectional gated recurrent unit layers to detect sound events. The model is evaluated on Task 4 of the Detection and Classification of Acoustic Scenes and Events (DCASE) 2022 Challenge. The F1-score and polyphonic sound event detection scores 1 and 2 of the proposed model are higher by 0.4%, 0.009, and 0.014, respectively, than those of the A2FPN-based SED model. In addition, the proposed model requires fewer floating point operations than the A2FPN-based SED model.
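The abstract does not specify how APNet combines the feature maps, so the following is only a toy sketch of the general idea of attention-weighted aggregation across resolutions, not the authors' module. It assumes one-dimensional feature sequences, nearest-neighbour upsampling to the finest resolution, and a single softmax weight per pyramid level. The function names (`upsample_nearest`, `aggregate`) and the fixed `level_scores` are hypothetical; in a trained model such scores would be learned and typically data-dependent.

```python
import math


def upsample_nearest(seq, target_len):
    """Stretch a 1-D feature sequence to target_len by repeating nearest values."""
    n = len(seq)
    return [seq[min(i * n // target_len, n - 1)] for i in range(target_len)]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def aggregate(feature_maps, level_scores):
    """Fuse multi-resolution sequences using one attention weight per level.

    Assumption: every level is aligned to the longest sequence before fusion.
    """
    target_len = max(len(f) for f in feature_maps)
    aligned = [upsample_nearest(f, target_len) for f in feature_maps]
    weights = softmax(level_scores)
    return [
        sum(w * a[t] for w, a in zip(weights, aligned))
        for t in range(target_len)
    ]


# Hypothetical features of one clip at three time resolutions (8, 4, 2 frames).
fine = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
mid = [10.0, 20.0, 30.0, 40.0]
coarse = [100.0, 200.0]

# Equal scores give each level the same weight (1/3 each).
fused = aggregate([fine, mid, coarse], level_scores=[0.0, 0.0, 0.0])
```

With equal scores the fused sequence is a plain average of the aligned levels. Raising one level's score shifts the result toward that resolution. In the pipeline the abstract describes, a fused map of this kind would then be fed to the recurrent detection network.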