Enhancing Deepfake Audio Detection: A ResNet Framework Based on Hybrid Features and Self‐Attention Mechanism

Computer science, Mechanism (biology), Human-computer interaction, Epistemology, Philosophy
Authors
Lian Huang,Jixiang Yang,Jinhong Zhao,Lian Huang
Source
Journal: Expert Systems [Wiley]
Volume (Issue): 42 (6) Citations: 3
Identifier
DOI:10.1111/exsy.70054
Abstract

Deep learning has driven significant progress in audio spoofing detection, and many countermeasures now reliably detect spoofed audio produced by speech synthesis or voice conversion. However, automatic speaker verification (ASV) systems remain vulnerable to spoofing attacks such as replay and deepfake audio. Deepfake audio, generated using text‐to‐speech (TTS) and voice conversion (VC) algorithms, poses a particularly significant challenge. To address this vulnerability, we propose a framework that combines hybrid features with a self‐attention mechanism for spoofing detection. Its key contributions are: (1) a dual‐path feature extraction architecture in which parallel convolutional neural networks (CNNs) capture learned deep features while a Short‐Time Fourier Transform (STFT) with Mel‐frequency filtering yields complementary Mel‐spectrogram features; (2) a max‐pooling‐based feature fusion strategy that concatenates the extracted features to preserve crucial discriminative information; (3) a self‐attention mechanism that dynamically weights salient temporal‐spectral patterns in the fused feature representation; and (4) a ResNet‐based classifier, augmented with linear layers, for spoofing classification. Evaluated on the ASVspoof 2021 dataset, the framework achieves state‐of‐the‐art performance, with an Equal Error Rate (EER) of 9.67% in the physical access (PA) scenario and 8.94% in the deepfake task. These results correspond to relative improvements of 74.60% and 60.05%, respectively, over the best‐performing baseline systems. The findings indicate that the hybrid features capture richer utterance details than conventional single‐modality feature representations. This work offers a promising direction for developing ASV systems that remain robust against increasingly sophisticated spoofing attacks.
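The abstract reports its results as EER and as relative improvement over a baseline. The sketch below shows one common way to compute both. It is a minimal illustration and not the authors' evaluation code. The function names and the toy scores are hypothetical, and it assumes that higher scores mean "more likely bona fide" and that "relative improvement" means relative EER reduction, (baseline - system) / baseline.

```python
from typing import Sequence, Tuple


def equal_error_rate(bonafide: Sequence[float],
                     spoof: Sequence[float]) -> Tuple[float, float]:
    """Return (eer, threshold) for a set of detection scores.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    The EER is taken at the candidate threshold where the false acceptance
    rate and the false rejection rate are closest to each other.
    """
    if len(bonafide) == 0 or len(spoof) == 0:
        raise ValueError("need at least one score per class")
    best_gap, eer, best_thr = float("inf"), 1.0, 0.0
    for thr in sorted(set(bonafide) | set(spoof)):
        far = sum(s >= thr for s in spoof) / len(spoof)        # spoof accepted
        frr = sum(s < thr for s in bonafide) / len(bonafide)   # bona fide rejected
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer, best_thr = gap, (far + frr) / 2, thr
    return eer, best_thr


def relative_reduction(baseline_eer: float, system_eer: float) -> float:
    """Relative EER reduction of a system with respect to a baseline."""
    return (baseline_eer - system_eer) / baseline_eer


def implied_baseline(system_eer: float, reduction: float) -> float:
    """Back-solve the baseline EER from a system EER and a relative reduction."""
    return system_eer / (1.0 - reduction)
```

Under that reading, `implied_baseline(9.67, 0.7460)` and `implied_baseline(8.94, 0.6005)` give baseline EERs of roughly 38.07% (PA) and 22.38% (deepfake). These values are back-calculated from the abstract's own figures and are not quoted from the paper.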