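Computer science
Artificial intelligence
Segmentation
Transformer
Computer vision
Natural language processing
Image segmentation
Pattern recognition (psychology)
Machine learning
Engineering
Electrical engineering
Voltage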
Authors
Xinting Hu,Li Jiang,Bernt Schiele
Identifier
DOI:10.1109/cvpr52733.2024.00384
Abstract
We present S4Former, a novel approach to training Vision Transformers for Semi-Supervised Semantic Segmentation (S4). At its core, S4Former employs a Vision Transformer within a classic teacher-student framework, and then leverages three novel technical ingredients: PatchShuffle as a parameter-free perturbation technique, Patch-Adaptive Self-Attention (PASA) as a fine-grained feature modulation method, and the innovative Negative Class Ranking (NCR) regularization loss. Based on these regularization modules aligned with Transformer-specific characteristics across the image input, feature, and output dimensions, S4Former exploits the Transformer's ability to capture and differentiate consistent global contextual information in unlabeled images. Overall, S4Former not only defines a new state of the art in S4 but also maintains a streamlined and scalable architecture. Being readily compatible with existing frameworks, S4Former achieves strong improvements (up to 4.9%) on benchmarks like Pascal VOC 2012, COCO, and Cityscapes, with varying numbers of labeled data. The code is at https://github.com/JoyHuYY1412/S4Former.
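The abstract names PatchShuffle as a parameter-free perturbation on the image input but does not say how it operates. As a rough illustration only, the sketch below shows one generic way to permute non-overlapping image patches and later undo the permutation. It is an assumption about what a patch-level shuffle could look like, not the authors' implementation; the function names (split_patches, merge_patches, patch_shuffle, patch_unshuffle), the nested-list image format, and the patch size are all hypothetical.

```python
import random


def split_patches(image, patch):
    """Cut an H x W grid (list of rows) into patch x patch tiles, row-major."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be divisible by the patch size")
    tiles = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            tiles.append([row[left:left + patch] for row in image[top:top + patch]])
    return tiles


def merge_patches(tiles, h, w, patch):
    """Inverse of split_patches: lay row-major tiles back onto an H x W grid."""
    tiles_per_row = w // patch
    image = [[None] * w for _ in range(h)]
    for idx, tile in enumerate(tiles):
        top = (idx // tiles_per_row) * patch
        left = (idx % tiles_per_row) * patch
        for r, row in enumerate(tile):
            image[top + r][left:left + patch] = row
    return image


def patch_shuffle(image, patch, rng):
    """Hypothetical perturbation: randomly permute the image's patches.

    Returns the shuffled image and the permutation, so the original
    patch order can be restored later. No learnable parameters are involved.
    """
    h, w = len(image), len(image[0])
    tiles = split_patches(image, patch)
    perm = list(range(len(tiles)))
    rng.shuffle(perm)
    shuffled = [tiles[src] for src in perm]  # position i now holds tile perm[i]
    return merge_patches(shuffled, h, w, patch), perm


def patch_unshuffle(image, patch, perm):
    """Undo patch_shuffle by sending each tile back to its source position."""
    h, w = len(image), len(image[0])
    tiles = split_patches(image, patch)
    restored = [None] * len(tiles)
    for pos, src in enumerate(perm):
        restored[src] = tiles[pos]
    return merge_patches(restored, h, w, patch)


if __name__ == "__main__":
    # Toy 4 x 4 "image" whose pixel values are just their flat indices.
    image = [[r * 4 + c for c in range(4)] for r in range(4)]
    shuffled, perm = patch_shuffle(image, patch=2, rng=random.Random(0))
    assert patch_unshuffle(shuffled, 2, perm) == image
```

Returning the permutation alongside the shuffled image is a design choice made here so that patch-aligned predictions could be mapped back to the original layout; whether S4Former does anything similar is not stated in the abstract.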