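Computer science
Consistency (knowledge base)
Speech recognition
Fingerprint (computing)
Set (abstract data type)
Representation (politics)
Artificial intelligence
Reliability
Key (lock)
Pattern recognition (psychology)
Computer security
Politics
Political science
Law
Programming language
Authors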
Junlong Deng,Yanzhen Ren,Tong Zhang,H.-L. Zhu,Zongkun Sun
Identifier
DOI:10.1109/icassp48485.2024.10446798
Abstract
With the rapid development of audio deepfake technology, the credibility and authenticity of public opinion are facing a formidable challenge. Since the vocoder is a key component of audio deepfake generation and leaves distinctive fingerprint features, we propose VFD-Net (Vocoder Fingerprints Detection Net), a new vocoder architecture attribution scheme. It is based on patch-wise supervised contrastive learning (PCL), which captures the global consistency of the vocoder fingerprints and improves detection performance in cross-set testing and audio compression scenarios. PCL brings patches belonging to the same vocoder class closer together in the representation space, while pushing patches from different vocoder classes further apart. Comparative experimental results show that the average accuracy of our proposed approach exceeds that of state-of-the-art methods by 30%-45% under cross-set testing and AAC compression. Furthermore, our proposed approach achieves an 83.67% average accuracy in short-term fake audio detection on clips within one second. It can also be used to detect partially fake audio by analyzing the consistency of vocoder fingerprints.
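The pull-together, push-apart behaviour that the abstract attributes to PCL can be illustrated with a generic supervised contrastive loss. The sketch below is not taken from the paper: the abstract does not give the loss formula, so this uses the widely known supervised contrastive formulation as a stand-in, and the function name `supervised_contrastive_loss`, the `temperature` value, and the toy two-dimensional "patch embeddings" are all assumptions made for illustration. Patch extraction and the network that produces the embeddings are not shown.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, labels, temperature=0.1):
    """Generic supervised contrastive loss over a batch of patch embeddings.

    Hypothetical helper, not the authors' implementation.
    embeddings: (n, d) array, one row per patch (n >= 2).
    labels:     (n,) array of vocoder class ids, one per patch.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    labels = np.asarray(labels)

    # L2-normalise so the dot product is a cosine similarity.
    z = z / np.linalg.norm(z, axis=1, keepdims=True)
    n = z.shape[0]

    # Pairwise similarities, scaled by the temperature.
    sim = z @ z.T / temperature
    not_self = ~np.eye(n, dtype=bool)

    # Log-softmax of each anchor over all *other* patches (numerically stable).
    sim = sim - sim.max(axis=1, keepdims=True)
    exp_sim = np.exp(sim) * not_self
    log_prob = sim - np.log(exp_sim.sum(axis=1, keepdims=True))

    # Positives: other patches that share the anchor's vocoder class.
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0  # skip anchors with no same-class partner

    mean_log_prob_pos = (log_prob * positives).sum(axis=1)[has_pos] / pos_count[has_pos]
    return float(-mean_log_prob_pos.mean())


# Toy example: two tight clusters of 2-D "patch embeddings".
patches = np.array([[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.01, 1.0]])
loss_matching = supervised_contrastive_loss(patches, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(patches, [0, 1, 0, 1])
print(loss_matching < loss_mismatched)
```

When the class labels agree with the clusters, same-class patches are already close and the loss is small; when the labels cut across the clusters, the loss is large, and minimising it would move same-class patches together and different-class patches apart.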