计算机科学
异常检测
人工智能
监督学习
机器学习
模式识别(心理学)
计算机视觉
人工神经网络
作者
Chao Huang,Weiliang Huang,Qiuping Jiang,Wei Wang,Jie Wen,Bob Zhang
标识
DOI:10.1109/tmm.2025.3557682
摘要
Efforts in weakly-supervised video anomaly detection center on detecting abnormal events within videos by coarse-grained labels, which has been successfully applied to many real-world applications. However, a significant limitation of most existing methods is that they are only effective for specific objects in specific scenarios, which makes them prone to misclassification or omission when confronted with previously unseen anomalies. Relative to conventional anomaly detection tasks, Open-world Weakly-supervised Video Anomaly Detection (OWVAD) poses greater challenges due to the absence of labels and fine-grained annotations for unknown anomalies. To address the above problem, we propose a multi-scale evidential vision-language model to achieve open-world video anomaly detection. Specifically, we leverage generalized visual-language associations derived from CLIP to harness the full potential of large pre-trained models in addressing the OWVAD task. Subsequently, we integrate a multi-scale temporal modeling module with a multimodal evidence collector to achieve precise frame-level detection of both seen and unseen anomalies. Extensive experiments on two widely-utilized benchmarks have conclusively validated the effectiveness of our method. The code will be made publicly available.
科研通智能强力驱动
Strongly Powered by AbleSci AI