Discriminative model
Artificial intelligence
Computer science
Contextual image classification
Pattern recognition (psychology)
Selection (genetic algorithm)
Similarity (geometry)
Image (mathematics)
Class (philosophy)
Computer vision
Authors
Min Yuan,Ningning Lv,Yufei Xie,Fuxiang Lu,Kun Zhan
Identifier
DOI:10.1109/icip49359.2023.10223197
Abstract
Fine-Grained Visual Classification (FGVC), which aims to identify objects at the subcategory level, is highly challenging due to large intra-class variation and subtle inter-class differences. To address these issues, this paper proposes a patch selection model derived from CLIP for fine-grained visual classification, named CLIP-FG. Unlike the original CLIP, which operates only at the level of whole texts and images, we compute the similarity between labels and image patches. The top-k image patches are selected, and their indices are fed into a Vision Transformer to focus on discriminative regions, improving fine-grained classification performance. Quantitative evaluations show that CLIP-FG performs competitively against mainstream methods.
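The core selection step the abstract describes can be sketched as a cosine-similarity ranking between a label embedding and patch embeddings, keeping the top-k patch indices. This is a minimal illustration only, assuming the label and patch embeddings have already been produced by CLIP's text and image encoders; the function name and toy data are hypothetical, not from the paper.

```python
import numpy as np

def select_topk_patches(label_emb, patch_embs, k):
    """Return indices of the k patches most similar to the label embedding.

    label_emb: (d,) text/label embedding (e.g. from a CLIP text encoder)
    patch_embs: (n, d) per-patch image embeddings
    Indices are ordered by descending cosine similarity.
    """
    # L2-normalize so the dot product equals cosine similarity
    label = label_emb / np.linalg.norm(label_emb)
    patches = patch_embs / np.linalg.norm(patch_embs, axis=1, keepdims=True)
    sims = patches @ label          # one similarity score per patch
    return np.argsort(-sims)[:k]    # top-k patch indices

# Toy example: 6 patches in a 3-d embedding space, label aligned with axis 0
label = np.array([1.0, 0.0, 0.0])
patches = np.array([
    [1.0, 0.0, 0.0],   # perfectly aligned
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],   # nearly aligned
    [-1.0, 0.0, 0.0],
    [0.5, 0.5, 0.0],
    [0.0, 0.0, 1.0],
])
idx = select_topk_patches(label, patches, k=2)  # -> [0, 2]
```

In the actual model, the selected indices would then restrict the Vision Transformer to the discriminative patches rather than the full patch grid.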