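Computer science
Personalization
Matching (statistics)
Speech recognition
Set (abstract data type)
Convolutional neural network
Transfer function
Artificial intelligence
Mathematics
Engineering
Statistics
Electrical engineering
World Wide Web
Programming language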
Authors
Bowen Zhi, Dmitry N. Zotkin, Ramani Duraiswami
Identifier
DOI:10.1109/icassp43922.2022.9746315
Abstract
Incorporating individualized head-related transfer functions (HRTFs) into a high fidelity sound engine can further improve the perceived quality and realism of binaurally-rendered spatial audio. Traditional methods to measure individual HRTFs tend to be cumbersome, expensive and require physical access to the subject. To address these issues, we develop a convolutional neural network model that, given a single photo of an ear, predicts pinna landmarks that can be used to extract anthropometric features commonly used for HRTF personalization, and match to a database of subjects whose HRTFs and pictures are available. We propose and evaluate a system utilizing this model to generate an individualized HRTF using a minimal set of easily obtainable measurements: single photographs of both ears, as well as head and ear scale for matching interaural time difference (ITD). To extend the reach of our database we employ ideas from Kendall shape theory to match ears non-dimensionally, match all ears to right ears, and make corresponding changes to the database HRIRs. We also apply HAT models to the HRIRs to provide better matching.
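The non-dimensional ear matching mentioned in the abstract can be illustrated with a generic Procrustes-style comparison in Kendall shape space. This is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not give the landmark set, the distance measure, or the matching rule, so the function names (`preshape`, `mirror_to_right`, `procrustes_distance`, `best_match`), the 2D landmark layout, and the nearest-neighbor selection are all hypothetical.

```python
import numpy as np

def preshape(landmarks):
    """Remove translation and scale: center the points, then normalize to unit size."""
    x = np.asarray(landmarks, dtype=float)
    x = x - x.mean(axis=0)
    size = np.linalg.norm(x)  # Frobenius norm, the centroid size
    if size == 0:
        raise ValueError("all landmarks coincide")
    return x / size

def mirror_to_right(landmarks):
    """Assumed convention: flip the x axis so a left ear can be compared with right ears."""
    x = np.asarray(landmarks, dtype=float).copy()
    x[:, 0] = -x[:, 0]
    return x

def procrustes_distance(a, b):
    """Geodesic distance between two landmark shapes, optimizing over rotations only."""
    za, zb = preshape(a), preshape(b)
    u, s, vt = np.linalg.svd(za.T @ zb)
    # Exclude reflections: flip the smallest singular value if the optimal
    # orthogonal alignment would have determinant -1.
    s = s.copy()
    s[-1] *= np.sign(np.linalg.det(u @ vt))
    return float(np.arccos(np.clip(s.sum(), -1.0, 1.0)))

def best_match(query, database):
    """Return the id of the database entry whose shape is closest to the query."""
    return min(database, key=lambda sid: procrustes_distance(query, database[sid]))

# Hypothetical toy data: four 2D landmarks per ear.
ear = np.array([[0.0, 0.0], [2.0, 0.0], [2.0, 1.0], [0.0, 3.0]])
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta),  np.cos(theta)]])
same_shape = 3.5 * ear @ rot.T + np.array([10.0, -4.0])  # rotated, scaled, shifted
other_shape = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 2.0], [0.0, 1.0]])

database = {"subject_a": same_shape, "subject_b": other_shape}
print(best_match(ear, database))
```

Because translation, scale, and rotation are factored out before comparison, a photograph taken at a different distance or angle in the image plane maps to the same shape. Physical scale would then have to be supplied separately, which is consistent with the abstract listing head and ear scale as additional inputs for ITD matching.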