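Robustness (evolution)
Computer science
Sound source localization
Feature (linguistics)
Artificial intelligence
Convolution (computer science)
Signal processing
Pattern recognition (psychology)
Artificial neural network
Acoustics
Sound wave
Digital signal processing
Biochemistry
Chemistry
Physics
Linguistics
Philosophy
Gene
Computer hardware
Authors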
Xin-Cheng Zhu,H.-Y. Zhang,Haiyang Feng,Denghuang Zhao,Xiaojun Zhang,Zhi Tao
Identifier
DOI:10.1109/tim.2023.3348907
Abstract
Currently, sound source localization (SSL) techniques based on deep learning mainly rely on traditional signal processing methods to generate input features. However, the applicability of these features differs significantly across environments. This study proposes a new single SSL model, called the icosahedral feature attention network (IFAN), to overcome this limitation. The proposed IFAN takes as network inputs not only steered response power with phase transform (SRP-PHAT) but also a newly developed steered response power with least mean square (SRPLMS). The IFAN encodes spatial position information into its convolution kernels by introducing icosahedral convolutions. In addition, it uses the sigmoid function to adaptively learn optimal feature weights based on the input acoustic environment, capturing the spatial distribution information of the sound source. In single-source SSL and tracking scenarios, the proposed method outperforms other state-of-the-art models on the localization and tracking (LOCATA) challenge data corpus. Moreover, it is capable of learning complementary information even in acoustic simulations involving a wide range of reverberation conditions. The proposed IFAN can thus enhance robustness and performance in different environments.
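The sigmoid-based feature weighting mentioned in the abstract can be illustrated with a small sketch. This is not the IFAN implementation: the abstract does not specify the architecture, so the function `gate_features`, the flat list of candidate directions (standing in for an icosahedral grid), the sample power values, and the fixed `logits` (which in a trained network would be predicted from the input) are all assumptions made for illustration only.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gate_features(srp_phat_map, srp_lms_map, logits):
    """Weight two equally sized power maps with independent sigmoid gates.

    `logits` is a hypothetical pair of scores, one per feature type. Here
    they are fixed by hand; a network would predict them from the input.
    """
    if len(srp_phat_map) != len(srp_lms_map):
        raise ValueError("feature maps must have the same length")
    w_phat, w_lms = (sigmoid(z) for z in logits)
    fused = [w_phat * a + w_lms * b for a, b in zip(srp_phat_map, srp_lms_map)]
    return fused, (w_phat, w_lms)


def argmax(values):
    """Index of the largest value, i.e. the most likely candidate direction."""
    return max(range(len(values)), key=values.__getitem__)


# Made-up power values over four candidate directions (illustrative only).
srp_phat_map = [0.1, 0.2, 0.9, 0.3]
srp_lms_map = [0.2, 0.8, 0.4, 0.1]

# Gates favouring the SRP-PHAT map: the fused peak follows its maximum.
fused, weights = gate_features(srp_phat_map, srp_lms_map, logits=(2.0, -2.0))
peak_index = argmax(fused)

# Gates favouring the SRPLMS map: the fused peak moves to its maximum.
fused_alt, _ = gate_features(srp_phat_map, srp_lms_map, logits=(-2.0, 2.0))
peak_index_alt = argmax(fused_alt)
```

In this toy setting, changing the gate values changes which feature map dominates the fused peak, which is the intuition behind letting the weights depend on the acoustic environment.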