Artificial intelligence
Computer science
Segmentation
Computer vision
Transformer (deep learning)
Pattern recognition
Shot (few-shot learning)
Image segmentation
One-shot
Supervised learning
Engineering
Materials science
Electrical engineering
Voltage
Metallurgy
Mechanical engineering
Artificial neural network
Authors
Dahyun Kang,Piotr Koniusz,Minsu Cho,Naila Murray
Identifier
DOI:10.1109/cvpr52729.2023.01880
Abstract
We address the task of weakly-supervised few-shot image classification and segmentation, by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model is able to effectively learn to perform classification and segmentation in the absence of pixel-level labels during training, using only image-level labels. To do this, it uses attention maps, created from tokens generated by the self-supervised ViT backbone, as pixel-level pseudo-labels. We also explore a practical setup with "mixed" supervision, where a small number of training images contains ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer that was trained using the available ground-truth pixel-level labels. Experiments on Pascal-5^i and COCO-20^i demonstrate significant performance gains in a variety of supervision settings, and in particular when little-to-no pixel-level labels are available.
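The abstract's central idea — turning attention maps from a self-supervised ViT's tokens into pixel-level pseudo-labels — can be sketched roughly as follows. This is a minimal illustrative sketch, not the authors' implementation: the function name, the use of class-token similarity as the attention score, and the min-max thresholding are all assumptions for illustration.

```python
import numpy as np

def attention_pseudo_labels(patch_tokens, cls_token, threshold=0.5):
    """Sketch: derive a binary pseudo-mask over patches from ViT tokens.

    patch_tokens: (N, D) patch token features from a self-supervised ViT
    cls_token:    (D,) class-token feature
    Returns an (N,) array of {0, 1} pseudo-labels, one per patch.
    (Hypothetical helper; the paper's actual pseudo-label pipeline differs.)
    """
    # Scaled dot-product similarity of each patch token to the class token
    scores = patch_tokens @ cls_token / np.sqrt(patch_tokens.shape[1])
    # Softmax-normalize into an attention map over patches
    attn = np.exp(scores - scores.max())
    attn /= attn.sum()
    # Rescale to [0, 1] and threshold into a binary foreground mask
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    return (attn > threshold).astype(np.int64)
```

In training, such per-patch pseudo-labels (upsampled to pixel resolution) would stand in for ground-truth masks when only image-level labels are available, which is the weakly-supervised setting the abstract describes.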