A neural network based algorithm for speaker localization in a multi-room environment

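Omnidirectional; Computer science; Microphone; Oracle Corporation; Mean squared error; Artificial neural network; Algorithm; Context (archaeology); Speech recognition; Pattern recognition (psychology); Artificial intelligence; Mathematics; Telecommunications; Sound pressure; Biology; Statistics; Software engineering; Paleontology; Azimuth; Geometry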
Authors
Fabio Vesperini, Paolo Vecchiotti, Emanuele Principi, Stefano Squartini, Francesco Piazza
Identifier
DOI:10.1109/mlsp.2016.7738817
Abstract

This paper proposes a neural-network-based speaker localization algorithm for multi-room domestic scenarios. The approach is fully data-driven: a neural network is fed GCC-PHAT (Generalized Cross-Correlation with Phase Transform) patterns, computed from the microphone signals, and estimates the speaker position in the room under analysis. We address a multi-room case study in which the acoustic scene of each room is influenced by sounds emitted in the other rooms. The algorithm is evaluated on the DIRHA dataset, which was recorded in a real home and provides signals from multiple wall and ceiling microphones in each room; we focus on speaker localization in two distinct neighbouring rooms. The experiments assume the presence of an oracle multi-room Voice Activity Detector (VAD). A three-stage optimization procedure is adopted to find the best network configuration and combination of GCC-PHAT patterns. An algorithm based on Time Difference of Arrival (TDOA), recently proposed in the literature for the same application context, serves as the term of comparison. The proposed algorithm outperforms this reference, with an average localization error, expressed as RMSE, of 525 mm against 1465 mm. Finally, we assess the algorithm's performance when a real VAD, recently proposed by some of the authors, is used. Although localization degrades (average RMSE of 770 mm), the result is still a remarkable improvement over the state of the art.
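
The abstract does not spell out how the GCC-PHAT patterns are computed, so the following is only a minimal sketch of the textbook GCC-PHAT computation for one microphone pair, written with NumPy. The function name `gcc_phat`, its parameters, the zero-padding length, and the small regularisation constant are illustrative assumptions, not the authors' implementation. The returned pattern is the kind of feature vector that could be fed to a network, and the location of its peak gives a TDOA estimate.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau=None):
    """Textbook GCC-PHAT for one microphone pair (illustrative sketch).

    Returns (tau, cc): the estimated delay of `sig` relative to `ref`
    in seconds (positive when `sig` lags `ref`), and the GCC-PHAT
    pattern restricted to lags in [-max_shift, +max_shift] samples.
    """
    # Zero-pad so the circular correlation behaves like a linear one.
    n = len(sig) + len(ref)
    SIG = np.fft.rfft(sig, n=n)
    REF = np.fft.rfft(ref, n=n)

    # Cross-power spectrum with PHAT weighting: keep phase, drop magnitude.
    cross = SIG * np.conj(REF)
    cross /= np.abs(cross) + 1e-12  # assumed regulariser to avoid 0/0

    cc = np.fft.irfft(cross, n=n)

    # Limit the lag range, e.g. to the largest physically possible delay.
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 is lag -max_shift and the centre is lag 0.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))

    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs, cc


if __name__ == "__main__":
    # Synthetic check: white noise delayed by a known number of samples.
    rng = np.random.default_rng(0)
    fs = 16000
    delay = 12
    ref = rng.standard_normal(4096)
    sig = np.concatenate((np.zeros(delay), ref[:-delay]))
    tau, pattern = gcc_phat(sig, ref, fs, max_tau=0.005)
    print("estimated delay in samples:", round(tau * fs))
```

In a multi-microphone setting such as the one described, one pattern would be computed per microphone pair and per frame; how the pairs are selected and combined is what the three-stage optimization mentioned in the abstract tunes, and that procedure is not reproduced here.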
