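Artificial intelligence
Segmentation
Computer science
Pixel
Ground truth
Pattern recognition (psychology)
Transformer
Pascal (unit)
Computer vision
Security token
Image segmentation
One-shot
Engineering
Mechanical engineering
Computer security
Voltage
Electrical engineering
Programming language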
Authors
Dahyun Kang, Piotr Koniusz, Minsu Cho, Naila Murray
Source
Journal: Cornell University - arXiv
Date: 2023-07-07
Identifiers
DOI: 10.48550/arxiv.2307.03407
Abstract
We address the task of weakly-supervised few-shot image classification and segmentation by leveraging a Vision Transformer (ViT) pretrained with self-supervision. Our proposed method takes token representations from the self-supervised ViT and leverages their correlations, via self-attention, to produce classification and segmentation predictions through separate task heads. Our model learns to perform classification and segmentation without pixel-level labels during training, using only image-level labels. To do this, it uses attention maps, created from tokens generated by the self-supervised ViT backbone, as pixel-level pseudo-labels. We also explore a practical setup with "mixed" supervision, where a small number of training images contain ground-truth pixel-level labels and the remaining images have only image-level labels. For this mixed setup, we propose to improve the pseudo-labels using a pseudo-label enhancer trained on the available ground-truth pixel-level labels. Experiments on Pascal-5i and COCO-20i demonstrate significant performance gains in a variety of supervision settings, particularly when few or no pixel-level labels are available.
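The abstract states that attention maps built from the self-supervised ViT's tokens serve as pixel-level pseudo-labels, but it does not say how a map becomes a mask. The sketch below shows one plausible way to do this and should not be read as the authors' procedure. It assumes that the map is the [CLS]-to-patch attention of a single layer, that heads are averaged, that the map is min-max normalized and upsampled by nearest neighbour, and that a fixed threshold separates foreground from background. The function name `attention_to_pseudo_mask` and the `threshold` default are hypothetical.

```python
import numpy as np

def attention_to_pseudo_mask(cls_attn, grid_hw, image_hw, threshold=0.5):
    """Hypothetical helper: turn [CLS]->patch attention into a binary pseudo-label.

    cls_attn: array (num_heads, num_patches), attention from the [CLS] token
        to each patch token in one ViT layer (assumed input format).
    grid_hw: (h, w) patch grid, with h * w == num_patches.
    image_hw: (H, W) output size, assumed to be exact multiples of (h, w).
    threshold: assumed cut-off on the normalized map, not taken from the paper.
    """
    h, w = grid_hw
    H, W = image_hw
    if cls_attn.shape[1] != h * w:
        raise ValueError("num_patches must equal h * w")
    if H % h or W % w:
        raise ValueError("image size must be a multiple of the patch grid")
    # Average over attention heads, then lay the patch scores out on the 2-D grid.
    attn = cls_attn.mean(axis=0).reshape(h, w)
    # Min-max normalize to [0, 1] so that a single threshold is meaningful.
    attn = (attn - attn.min()) / (attn.max() - attn.min() + 1e-8)
    # Nearest-neighbour upsampling: repeat each patch score over its pixel block.
    up = np.repeat(np.repeat(attn, H // h, axis=0), W // w, axis=1)
    # Pixels at or above the threshold become foreground (1), the rest background (0).
    return (up >= threshold).astype(np.uint8)

# Usage with random attention for a 14x14 patch grid (e.g. 224x224 input, 16x16 patches).
rng = np.random.default_rng(0)
cls_attn = rng.random((6, 14 * 14))
mask = attention_to_pseudo_mask(cls_attn, (14, 14), (224, 224))
print(mask.shape, mask.dtype)  # (224, 224) uint8
```

A fixed threshold keeps the sketch simple. In the "mixed" setting described above, the paper instead refines pseudo-labels with a learned pseudo-label enhancer, which this sketch does not attempt to reproduce.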