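Interpretability
Computer science
Speech recognition
Disinformation
Cloning (programming)
Feature extraction
Artificial intelligence
Speaker recognition
Mel-frequency cepstrum
Perception
Word error rate
Feature (linguistics)
Pattern recognition (psychology)
Machine learning
Social media
Psychology
World Wide Web
Philosophy
Linguistics
Neuroscience
Programming language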
Authors
Sarah Barrington, Romit Barua, Gautham Koorma, Hany Farid
Identifier
DOI:10.1109/wifs58808.2023.10374911
Abstract
Synthetic-voice cloning technologies have seen significant advances in recent years, giving rise to a range of potential harms. From small- and large-scale financial fraud to disinformation campaigns, the need for reliable methods to differentiate real and synthesized voices is imperative. We describe three techniques for differentiating a real from a cloned voice designed to impersonate a specific person. These three approaches differ in their feature extraction stage with low-dimensional perceptual features offering high interpretability but lower accuracy, to generic spectral features, and end-to-end learned features offering less interpretability but higher accuracy. We show the efficacy of these approaches when trained on a single speaker's voice and when trained on multiple voices. The learned features consistently yield an equal error rate between 0% and 4%, and are reasonably robust to adversarial laundering.