Computer science
Mechanism (biology)
Human-computer interaction
Epistemology
Philosophy
Authors
Lian Huang, Jixiang Yang, Jinhong Zhao, Lian Huang
Abstract
Deep learning has driven significant progress in audio spoofing detection, and many countermeasures now detect spoofed audio produced by speech synthesis or voice conversion well. However, automatic speaker verification (ASV) systems remain vulnerable to spoofing attacks such as replay and deepfake audio. Deepfake audio, generated using text-to-speech (TTS) and voice conversion (VC) algorithms, poses a particularly significant challenge. To address this vulnerability, we propose a novel framework that combines hybrid features with a self-attention mechanism for enhanced spoofing detection. Our approach makes the following key contributions: (1) a dual-path feature extraction architecture in which parallel convolutional neural networks (CNNs) extract deep learning features while a Short-Time Fourier Transform (STFT) followed by Mel-frequency filtering extracts complementary Mel-spectrogram features; (2) a max-pooling-based feature fusion strategy that concatenates the extracted features to preserve crucial discriminative information; (3) a self-attention mechanism that dynamically weights and focuses on salient temporal-spectral patterns within the fused feature representation; (4) a ResNet-based classifier, augmented with linear layers, for robust spoofing classification. Evaluation on the ASVspoof 2021 dataset demonstrates the efficacy of the proposed framework. We achieve state-of-the-art performance, attaining an Equal Error Rate (EER) of 9.67% in the physical access (PA) scenario and 8.94% in the deepfake task. These results correspond to relative improvements of 74.60% and 60.05%, respectively, over the best-performing baseline systems. The findings indicate that the hybrid features capture richer utterance details, and are therefore more discriminative, than conventional single-modality feature representations. This work offers a promising new direction for developing robust ASV systems that are resilient to increasingly sophisticated spoofing attacks.
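The fusion and attention steps described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the feature shapes, the pooling window, the random projection matrices, and the helper names (`max_pool_time`, `self_attention`) are assumptions chosen for illustration, and random arrays stand in for the outputs of the CNN and Mel-spectrogram paths.

```python
import numpy as np

def max_pool_time(x, window):
    """Max-pool a (frames, dims) feature map along time with a non-overlapping window."""
    frames, dims = x.shape
    usable = (frames // window) * window      # drop trailing frames that do not fill a window
    return x[:usable].reshape(-1, window, dims).max(axis=1)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the time axis."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # (frames, frames) similarity matrix
    weights = softmax(scores, axis=-1)        # each row sums to 1
    return weights @ v, weights

rng = np.random.default_rng(0)
cnn_feats = rng.standard_normal((100, 64))    # hypothetical stand-in for the CNN path
mel_feats = rng.standard_normal((100, 80))    # hypothetical stand-in for the Mel path

# Pool each path along time, then concatenate along the feature axis.
fused = np.concatenate(
    [max_pool_time(cnn_feats, 4), max_pool_time(mel_feats, 4)], axis=1
)

# Random projections stand in for learned query/key/value weights.
d_model = fused.shape[1]
w_q, w_k, w_v = (
    rng.standard_normal((d_model, 32)) / np.sqrt(d_model) for _ in range(3)
)
attended, weights = self_attention(fused, w_q, w_k, w_v)
print(fused.shape, attended.shape)
```

In a full system the attended features would feed the ResNet-based classifier; that stage is omitted here because the abstract gives no details of its configuration.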
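For readers unfamiliar with the reported metric, the following sketch shows one common way to approximate an EER from detection scores, together with the arithmetic behind a relative improvement. The abstract does not define its relative-improvement formula, so the definition used here, (baseline - system) / baseline, is an assumption; the toy scores are invented, and the baseline EERs are back-calculated from the reported figures rather than quoted from the source.

```python
def equal_error_rate(bona_fide_scores, spoof_scores):
    """Approximate the EER by sweeping thresholds over the observed scores.

    Assumes a higher score means "more likely bona fide".
    """
    thresholds = sorted(set(bona_fide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bona_fide_scores) / len(bona_fide_scores)
        # False acceptance rate: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        if abs(far - frr) < best_gap:
            best_gap, eer = abs(far - frr), (far + frr) / 2
    return eer

def relative_improvement(baseline_eer, system_eer):
    """Assumed definition: fraction of the baseline error that was removed."""
    return (baseline_eer - system_eer) / baseline_eer

def implied_baseline(system_eer, rel_improvement):
    """Invert the assumed definition to recover the baseline EER."""
    return system_eer / (1.0 - rel_improvement)

# Toy scores, invented purely for illustration.
toy_eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.6, 0.3, 0.2, 0.1])

# Baseline EERs (in percent) implied by the figures reported in the abstract.
pa_baseline = implied_baseline(9.67, 0.7460)
df_baseline = implied_baseline(8.94, 0.6005)
print(toy_eer, round(pa_baseline, 2), round(df_baseline, 2))
```

Under the assumed definition, the reported figures imply baseline EERs of roughly 38.07% for the PA scenario and 22.38% for the deepfake task.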