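Computer science
Anomaly detection
Feature (linguistics)
Autoregressive model
Artificial intelligence
Audio-visual
Task (project management)
Speech recognition
Pattern recognition (psychology)
Computer vision
Multimedia
Linguistics
Philosophy
Econometrics
Economics
Management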
Authors
Feng, Chao; Chen, Ziyang; Owens, Andrew
Source
Journal: Cornell University - arXiv
Date: 2023-01-04
Identifier
DOI: 10.48550/arxiv.2301.01767
Abstract
Manipulated videos often contain subtle inconsistencies between their visual and audio signals. We propose a video forensics method, based on anomaly detection, that can identify these inconsistencies, and that can be trained solely using real, unlabeled data. We train an autoregressive model to generate sequences of audio-visual features, using feature sets that capture the temporal synchronization between video frames and sound. At test time, we then flag videos that the model assigns low probability. Despite being trained entirely on real videos, our model obtains strong performance on the task of detecting manipulated speech videos. Project site: https://cfeng16.github.io/audio-visual-forensics
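The anomaly-detection idea in the abstract (fit a sequence model on real data only, then flag test sequences that receive low probability) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration, not the authors' method: a scalar Gaussian AR(1) model stands in for the paper's autoregressive model, the per-frame feature is a synthetic number standing in for an audio-visual synchronization feature, and the 1st-percentile threshold is a hypothetical choice.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_real(length=200):
    """Synthetic stand-in for a per-frame synchronization feature of a real video."""
    x = np.zeros(length)
    for t in range(1, length):
        x[t] = 0.9 * x[t - 1] + rng.normal(scale=0.1)
    return x

def fit_ar1(sequences):
    """Least-squares fit of x_t = a * x_{t-1} + b + noise, using real sequences only."""
    prev = np.concatenate([s[:-1] for s in sequences])
    nxt = np.concatenate([s[1:] for s in sequences])
    design = np.stack([prev, np.ones_like(prev)], axis=1)
    (a, b), *_ = np.linalg.lstsq(design, nxt, rcond=None)
    sigma2 = float(np.mean((nxt - (a * prev + b)) ** 2))
    return float(a), float(b), sigma2

def mean_log_likelihood(seq, params):
    """Average per-step Gaussian log-likelihood of a sequence under the fitted model."""
    a, b, sigma2 = params
    resid = seq[1:] - (a * seq[:-1] + b)
    return float(np.mean(-0.5 * np.log(2 * np.pi * sigma2) - resid ** 2 / (2 * sigma2)))

# Train on real, unlabeled sequences only.
train = [simulate_real() for _ in range(50)]
params = fit_ar1(train)

# Hypothetical threshold: the 1st percentile of scores on the real training data.
train_scores = np.array([mean_log_likelihood(s, params) for s in train])
threshold = float(np.percentile(train_scores, 1))

def is_flagged(seq):
    """Flag a sequence when the model assigns it unusually low likelihood."""
    return mean_log_likelihood(seq, params) < threshold

# A simulated manipulation: the second half loses its temporal consistency.
real_test = simulate_real()
fake_test = simulate_real()
fake_test[100:] += rng.normal(scale=1.0, size=100)

print("real score:", mean_log_likelihood(real_test, params))
print("fake score:", mean_log_likelihood(fake_test, params), "flagged:", is_flagged(fake_test))
```

The point of the sketch is only that no manipulated examples are needed at training time: the threshold and the model parameters both come from real sequences, and the manipulated sequence stands out because its transitions are improbable under that model.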