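Anomaly detection
Computer science
Artificial intelligence
Metric (unit)
Generalization
Feature (linguistics)
Pattern recognition (psychology)
Anomaly (physics)
Noise (video)
Representation (politics)
Image (mathematics)
Computer vision
Coding (set theory)
Synthetic data
Diffusion
Simple (philosophy)
Metric (data warehouse)
Data point
Limiting
Unsupervised learning
Diffusion map
Feature extraction
Tracing (psycholinguistics)
Feature vector
Data mining
Data modeling
Object detection
Visualization
Supervised learning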
Authors
Hicsonmez, Samet; Shabayek, Abd El Rahman; Aouada, Djamila
Source
Journal: Cornell University - arXiv
Date: 2025-11-11
Identifier
DOI: 10.48550/arxiv.2511.08173
Abstract
Detecting visual anomalies in diverse, multi-class real-world images is a significant challenge. We introduce VLMDiff, a novel unsupervised multi-class visual anomaly detection framework. It integrates a Latent Diffusion Model (LDM) with a Vision-Language Model (VLM) for enhanced anomaly localization and detection. Specifically, a pre-trained VLM with a simple prompt extracts detailed image descriptions, which serve as additional conditioning for LDM training. Current diffusion-based methods rely on synthetic noise generation, which limits their generalization, and require per-class model training, which hinders scalability. VLMDiff, in contrast, leverages VLMs to obtain captions of normal images without manual annotations or additional training. These descriptions condition the diffusion model, which learns a robust feature representation of normal images for multi-class anomaly detection. Our method achieves competitive performance, improving the pixel-level Per-Region-Overlap (PRO) metric by up to 25 points on the Real-IAD dataset and 8 points on the COCO-AD dataset, outperforming state-of-the-art diffusion-based approaches. Code is available at https://github.com/giddyyupp/VLMDiff.
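For readers unfamiliar with the pixel-level PRO metric mentioned in the abstract, the following is a minimal sketch of how a per-region overlap score is commonly computed at a single threshold: each connected ground-truth anomaly region is scored by the fraction of its pixels that the anomaly map flags, and the fractions are averaged so that small regions count as much as large ones. This is an illustration under assumptions, not the authors' evaluation code; the function names `connected_regions` and `per_region_overlap`, the 4-connectivity choice, and the toy data are hypothetical. Benchmark scores are usually obtained by aggregating this value over many thresholds rather than at one fixed threshold.

```python
from collections import deque


def connected_regions(mask):
    """Split a binary mask (list of rows of 0/1) into 4-connected regions.

    Returns a list of regions, each a list of (row, col) pixel coordinates.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    regions = []
    for r in range(h):
        for c in range(w):
            if mask[r][c] and not seen[r][c]:
                seen[r][c] = True
                queue, region = deque([(r, c)]), []
                while queue:
                    y, x = queue.popleft()
                    region.append((y, x))
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                regions.append(region)
    return regions


def per_region_overlap(score_map, gt_mask, threshold):
    """Mean fraction of each ground-truth region covered by pixels scoring >= threshold.

    Returns None when the ground-truth mask contains no anomalous region.
    """
    regions = connected_regions(gt_mask)
    if not regions:
        return None
    overlaps = [
        sum(1 for y, x in region if score_map[y][x] >= threshold) / len(region)
        for region in regions
    ]
    return sum(overlaps) / len(overlaps)


# Toy example (hypothetical data): two ground-truth regions of 2 and 4 pixels.
gt = [
    [1, 1, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
scores = [
    [0.9, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.8, 0.8],
    [0.0, 0.0, 0.8, 0.2],
]
pro = per_region_overlap(scores, gt, threshold=0.5)
```

In the toy example the first region is half covered and the second is three-quarters covered, so the per-region average is 0.625, whereas a plain pixel-wise recall over both regions would give 4/6.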