PVASS-MDD: Predictive Visual-Audio Alignment Self-Supervision for Multimodal Deepfake Detection

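Computer science, Audio-visual, Artificial intelligence, Modality (human-computer interaction), Machine learning, Pattern recognition (psychology), Multimedia
Authors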
Yang Yu,Xiaolong Liu,Rongrong Ni,Siyuan Yang,Yao Zhao,Alex C. Kot
Source
Journal: IEEE Transactions on Circuits and Systems for Video Technology [Institute of Electrical and Electronics Engineers]
Volume (issue): 34 (8): 6926-6936 Citations: 38
Identifier
DOI:10.1109/tcsvt.2023.3309899
Abstract

Deepfake techniques can forge the visual or audio signals in a video, which leads to inconsistencies between the visual and audio (VA) signals. Multimodal detection methods therefore expose deepfake videos by extracting VA inconsistencies. Recently, deepfake technology has begun to forge both modalities collaboratively to obtain more realistic deepfake videos, which poses new challenges for extracting VA inconsistencies. Recent multimodal detection methods first extract natural VA correspondences from real videos in a self-supervised manner, and then use the learned real correspondences as targets to guide the extraction of VA inconsistencies in the subsequent deepfake detection stage. However, the inherent VA relations are difficult to extract because of the modality gap, which limits the auxiliary benefit of these self-supervised methods. In this paper, we propose Predictive Visual-audio Alignment Self-supervision for Multimodal Deepfake Detection (PVASS-MDD), which consists of a PVASS auxiliary stage and an MDD stage. In the PVASS auxiliary stage, which uses real videos, we first devise a three-stream network that associates two augmented visual views with the corresponding audio clues, so as to explore common VA correspondences through cross-view learning. Second, we introduce a novel cross-modal predictive align module that eliminates VA gaps to provide inherent VA correspondences. In the MDD stage, we propose an auxiliary loss that uses the frozen PVASS network to align the VA features of real videos, to better assist the multimodal deepfake detector in capturing subtle VA inconsistencies. We conduct extensive experiments on widely used existing datasets and on the latest multimodal deepfake datasets. Our method obtains a significant performance improvement over state-of-the-art methods.
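
The abstract does not give the form of the MDD-stage auxiliary loss, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the alignment is measured as one minus cosine similarity between the detector's features and target features produced by a frozen network, that the term is applied to real samples only, and that it is added to the classification loss with a fixed weight. The names `cosine_alignment_loss`, `total_loss`, and `weight` are hypothetical.

```python
import numpy as np


def cosine_alignment_loss(features, targets, eps=1e-8):
    """Mean (1 - cosine similarity) over paired rows.

    features: detector features, shape (batch, dim).
    targets:  features from a frozen reference network, same shape.
    The cosine form is an assumption; the abstract does not specify it.
    """
    f = features / (np.linalg.norm(features, axis=1, keepdims=True) + eps)
    t = targets / (np.linalg.norm(targets, axis=1, keepdims=True) + eps)
    return float(np.mean(1.0 - np.sum(f * t, axis=1)))


def total_loss(cls_loss, features, targets, is_real, weight=0.5):
    """Classification loss plus a weighted alignment term on real samples.

    is_real: boolean flags, one per sample; fake samples are skipped
    because the alignment targets describe real videos only.
    weight:  hypothetical balancing coefficient (not from the paper).
    """
    real = np.asarray(is_real, dtype=bool)
    if not real.any():
        # No real samples in this batch: only the classification term remains.
        return cls_loss
    return cls_loss + weight * cosine_alignment_loss(features[real], targets[real])
```

In this sketch the frozen network contributes only fixed target features, so nothing in it is updated; in a real training loop the same idea would be expressed in an autograd framework with the reference network's parameters excluded from optimization.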