Transformer
Anomaly detection
Architecture
Computer science
Shot
Dual (grammatical number)
One-shot
Artificial intelligence
Electrical engineering
Engineering
Voltage
History
Materials science
Mechanical engineering
Art
Literature
Archaeology
Metallurgy
Authors
Bingke Zhu, Zhaopeng Gu, Guibo Zhu, Yingying Chen, Ming Tang, Jinqiao Wang
Identifier
DOI:10.1109/tim.2024.3522624
Abstract
This article investigates the potential of dual CNN-Transformer architectures for generalizable few-shot anomaly detection (GFSAD), a practical yet understudied form of anomaly detection (AD). In GFSAD, a single model must be learned and shared across several categories while remaining adaptable to new categories given only a restricted number of normal images. Although CNN-Transformer architectures have achieved strong results in many vision tasks, their potential in GFSAD remains unexplored. In this article, we introduce ADFormer, a dual CNN-Transformer architecture that combines the strengths of CNNs and Transformers to learn discriminative features with both local and global receptive fields. We also incorporate a self-supervised bipartite matching approach in ADFormer that reconstructs query images from support images and then detects anomalies from high reconstruction loss. Additionally, we present a consistency-enhanced loss that improves the spatial and semantic consistency of features, thereby reducing the dependence on a large AD dataset for training. Experimental results show that ADFormer with the consistency-enhanced loss significantly improves GFSAD performance, considerably outperforming other AD methods on the MVTec AD, MPDD, and VisA datasets.
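The core scoring idea in the abstract, reconstructing query images from support images and flagging regions with high reconstruction loss, can be sketched in a few lines. The following is a minimal NumPy illustration of that principle only, not the authors' ADFormer: it uses a simple softmax-attention reconstruction over flattened patch features, and the function names, the temperature value, and the toy data are all hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def reconstruct_queries(query_feats, support_feats, temperature=0.1):
    """Reconstruct each query patch feature as an attention-weighted
    combination of support (normal) patch features. Patches that cannot
    be reconstructed from normal support patches get high anomaly scores.

    query_feats:   (Q, D) query patch features
    support_feats: (S, D) support patch features from normal images
    """
    sim = query_feats @ support_feats.T            # (Q, S) similarities
    weights = softmax(sim / temperature, axis=-1)  # attention over support
    recon = weights @ support_feats                # (Q, D) reconstruction
    # per-patch anomaly score = reconstruction error
    scores = np.linalg.norm(query_feats - recon, axis=-1)
    return recon, scores

# Toy demo: normal patches cluster near a prototype;
# one anomalous query patch lies far from every support patch.
rng = np.random.default_rng(0)
proto = rng.normal(size=16)
support = proto + 0.05 * rng.normal(size=(32, 16))    # normal support patches
normal_q = proto + 0.05 * rng.normal(size=(8, 16))    # normal query patches
anomaly_q = -proto + 0.05 * rng.normal(size=(1, 16))  # anomalous query patch
queries = np.vstack([normal_q, anomaly_q])

_, scores = reconstruct_queries(queries, support)
# The anomalous patch (last row) should score far higher than normal patches.
```

In this sketch the anomaly signal comes purely from the reconstruction residual; the paper's actual method additionally relies on the dual CNN-Transformer features and a bipartite matching formulation, which are not reproduced here.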