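Robustness (evolution)
Computer science
Generalization
Detector
Artificial intelligence
Process (computing)
Machine learning
Speech recognition
Mathematics
Mathematical analysis
Telecommunications
Biochemistry
Chemistry
Gene
Operating system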
Authors
Menglu Li, Yasaman Ahmadiadli, Xiao-Ping Zhang
Identifier
DOI:10.1109/mmsp59012.2023.10337724
Abstract
The ASVspoof Challenges were launched to motivate research on Deepfake audio detection because of the threat such audio poses to society. However, state-of-the-art detection models perform unsatisfactorily on the Speech Deepfake (DF) subset of the challenge. The DF subset includes spoofed audio from various sources, so it better reflects the robustness of a detector. In this paper, we propose a novel detection architecture that improves robustness and generalization ability in two ways. First, we aggregate both learned embeddings and hand-crafted features to obtain more generalizable representations of Deepfake audio. Second, we formulate the training process as a bi-level optimization problem to make use of knowledge about different Deepfake generation methods. In our evaluations, the proposed method achieves the best detection performance reported in the literature for a single system, without the help of ensemble modeling or data augmentation.