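Keywords
Transformer
Computer science
Ambiguity
Speech recognition
Separation (statistics)
Permutation (music)
Source separation
Pattern recognition (psychology)
Artificial intelligence
Acoustics
Engineering
Machine learning
Voltage
Electrical engineering
Physics
Programming language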
Authors
Sangwon Lee, Han-Gyu Kim, Gil-Jin Jang
Source
Journal: Sensors
Publisher: Multidisciplinary Digital Publishing Institute (MDPI)
Date: 2025-08-08
Volume (issue): 25 (16), article 4905
Abstract
Most speech separation techniques require the number of talkers in the input mixture to be known in advance, but this information is not always available in real-world situations. To address this problem, we present a novel speech separation method that automatically determines the number of talkers in input mixture recordings. The proposed method extracts the voices of individual talkers one by one in a deflationary manner and stops the extraction sequence when a predefined termination criterion is satisfied. The backbone separation model is built on the transformer architecture and trained with permutation-invariant training (PIT) to avoid ambiguity in identifying talkers at the output. Experimental results on the Libri5Mix and Libri10Mix datasets show that the proposed method, which does not take the number of talkers as input, significantly outperforms state-of-the-art models that are given the number of talkers.
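The abstract names two ingredients: deflationary (one-talker-at-a-time) extraction with a termination criterion, and permutation-invariant training. The pure-Python sketch below illustrates both under stated assumptions and is not the authors' implementation. `extract_one` stands in for the trained transformer; here it is a hypothetical toy oracle (`make_oracle_extractor`) that peels off the known source most correlated with the residual. The residual-energy stopping rule and its `stop_ratio` threshold are assumptions, since the abstract does not say what the termination criterion is. The PIT loss uses scale-invariant SNR (SI-SNR), a common choice in separation work but not one the source confirms.

```python
import itertools
import math
from typing import Callable, List, Sequence, Tuple

Signal = List[float]


def si_snr(est: Sequence[float], ref: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    # Remove the means so a DC offset does not affect the score.
    m_est, m_ref = sum(est) / len(est), sum(ref) / len(ref)
    est = [x - m_est for x in est]
    ref = [x - m_ref for x in ref]
    # Project the estimate onto the reference: the "target" part of the estimate.
    scale = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    return 10 * math.log10((sum(t * t for t in target) + eps) /
                           (sum(n * n for n in noise) + eps))


def pit_loss(estimates: Sequence[Signal],
             references: Sequence[Signal]) -> Tuple[float, Tuple[int, ...]]:
    """Negative mean SI-SNR under the best output-to-talker assignment.

    perm[i] is the index of the estimate assigned to reference i.
    """
    n = len(references)
    best_score, best_perm = -math.inf, tuple(range(n))
    for perm in itertools.permutations(range(n)):  # all n! assignments
        score = sum(si_snr(estimates[perm[i]], references[i]) for i in range(n)) / n
        if score > best_score:
            best_score, best_perm = score, perm
    return -best_score, best_perm


def deflationary_separate(mixture: Sequence[float],
                          extract_one: Callable[[Signal], Signal],
                          max_talkers: int = 10,
                          stop_ratio: float = 1e-3) -> List[Signal]:
    """Extract one voice per step and subtract it from the residual.

    ASSUMPTION: stop when residual energy < stop_ratio * mixture energy
    (or after max_talkers steps); the paper's actual criterion may differ.
    """
    mix_energy = sum(x * x for x in mixture) + 1e-12
    residual = list(mixture)
    voices: List[Signal] = []
    while len(voices) < max_talkers:
        if sum(x * x for x in residual) / mix_energy < stop_ratio:
            break  # nothing substantial left: assume no more talkers
        voice = extract_one(residual)  # stands in for the trained model
        voices.append(voice)
        residual = [r - v for r, v in zip(residual, voice)]
    return voices  # len(voices) is the estimated number of talkers


def make_oracle_extractor(sources: Sequence[Signal]) -> Callable[[Signal], Signal]:
    """HYPOTHETICAL toy extractor: returns the known source most correlated
    with the residual. A real system would run a trained separation network."""
    remaining = [list(s) for s in sources]

    def extract_one(residual: Signal) -> Signal:
        best = max(range(len(remaining)),
                   key=lambda i: sum(a * b for a, b in zip(residual, remaining[i])))
        return remaining.pop(best)

    return extract_one


# Usage: three orthogonal sinusoids stand in for three talkers.
N = 64
talkers = [[a * math.sin(2 * math.pi * f * t / N) for t in range(N)]
           for a, f in [(1.0, 3), (0.5, 5), (0.8, 7)]]
mix = [sum(samples) for samples in zip(*talkers)]
voices = deflationary_separate(mix, make_oracle_extractor(talkers))
loss, perm = pit_loss(voices, talkers)
print(len(voices), perm)  # prints: 3 (0, 2, 1)
```

The extractor emits talkers in order of strength, not in reference order. PIT scores every output-to-talker assignment and keeps the best one, so the arbitrary output order is not penalized. This sketch enumerates all n! assignments, which is only practical for small n.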