计算机科学
人工智能
分割
图像分割
模式识别(心理学)
变压器
计算机视觉
自然语言处理
工程类
电气工程
电压
作者
Lingyu Yan,Jiangfeng Chen,Yuanyan Tang
标识
DOI:10.1117/1.jei.33.2.023029
摘要
Weakly supervised semantic segmentation (WSSS) using only image-level labels is a challenging task. Most existing methods utilize class activation map (CAM) to generate pixel-level pseudo labels for supervised training. However, the gap between classification and segmentation hinders the network from obtaining more comprehensive semantic information and generating more accurate pseudo masks for segmentation. To address this issue, we propose TSD-CAM, a transformer-based self distillation (SD) method that utilizes CAM similarity. TSD-CAM uses the similarity between CAMs generated from different views as a distillation target, providing additional supervision for the network and narrowing the gap between classification and segmentation. SD supervision allows the network to acquire more semantic information and refine CAMs to generate higher precision pseudo-labels. In addition, we propose the adaptive pixel refinement module, which adaptively refines and adjusts images based on pixel variations, further improving the precision of pseudo labels. Our method is a fully end-to-end single-stage approach that achieves state-of-the-art 71.3% mIoU on PASCAL VOC 2012 and 42.9% mIoU on the MS COCO 2014 dataset, and the proposed TSD-CAM can significantly outperform other single-stage competitors and achieve comparable performance with state-of-the-art multi-stage methods. Meanwhile, the effectiveness of our method is demonstrated by a large number of ablation experiments, and we provide a new way of thinking to solve the problems of WSSS. Our code is available at: https://github.com/pipizhum/TSD-CAM.
科研通智能强力驱动
Strongly Powered by AbleSci AI