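Computer science
Artificial intelligence
Embedding
Convolutional neural network
Pattern recognition (psychology)
Transformer
Cognitive neuroscience of visual object recognition
Feature learning
Computer vision
Machine learning
Feature extraction
Physics
Quantum mechanics
Voltage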
Authors
Tong Su,Shuo Ye,Chengqun Song,Jun Cheng
Identifier
DOI:10.1109/icip46576.2022.9897963
Abstract
Fine-grained visual classification (FGVC) aims to accurately identify the subordinate categories of a target class. Convolutional neural network (CNN) based methods have shown that the attention mechanism can enhance the representation of local regions and improve recognition accuracy. Recently, the vision transformer (ViT) has shown great potential in image classification tasks by taking advantage of its inherent self-attention mechanism and its ability to acquire global information early. However, this global information acquisition brings the irrelevant environment into the interaction process, which makes it difficult for fine-grained tasks that rely on local differences to quickly learn discriminative features. To this end, we propose a hybrid network termed Mask-ViT, which can effectively avoid environmental interference and express more robust features by focusing on the instance itself. Specifically, Contour Knowledge Embedding (CKE) is employed to transfer prior location information to ViT and guide the subsequent recognition. Experiments on three benchmarks demonstrate the effectiveness of the proposed method.
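The abstract does not say how CKE passes location information to ViT, so the following is only a generic illustration of the underlying idea of focusing on the instance rather than its surroundings. It is not the authors' method. It assumes a binary instance mask is already available, and the function names `patch_foreground_ratio` and `keep_patches`, the patch size, and the threshold are all hypothetical choices made for this sketch.

```python
from typing import List


def patch_foreground_ratio(mask: List[List[int]], patch: int) -> List[float]:
    """Return the fraction of foreground pixels in each non-overlapping patch.

    Patches are visited in row-major order, matching the usual order of
    ViT patch tokens. `mask` is a binary 2D grid (1 = instance, 0 = background).
    """
    height, width = len(mask), len(mask[0])
    if height % patch or width % patch:
        raise ValueError("mask size must be divisible by the patch size")
    ratios = []
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            foreground = sum(
                mask[y][x]
                for y in range(top, top + patch)
                for x in range(left, left + patch)
            )
            ratios.append(foreground / (patch * patch))
    return ratios


def keep_patches(ratios: List[float], threshold: float = 0.5) -> List[int]:
    """Indices of patches whose foreground fraction reaches the threshold."""
    return [i for i, r in enumerate(ratios) if r >= threshold]


# Toy 4x4 mask: the instance fills the top-left patch and touches the one below.
mask = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
ratios = patch_foreground_ratio(mask, patch=2)
kept = keep_patches(ratios, threshold=0.5)
```

In a real pipeline the kept indices (or the ratios themselves, used as soft weights) could decide which patch tokens take part in self-attention; which of these, if either, Mask-ViT does is not stated in the abstract.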