Keywords: Computer science, Transformer, Segmentation, Artificial intelligence, Encoder, Pyramid (geometry), Deep learning, Parsing, Pattern recognition (psychology), Computer vision, Voltage, Quantum mechanics, Operating system, Optics, Physics
Authors
Teerapong Panboonyuen,Kulsawasd Jitkajornwanich,Siam Lawawirojwong,Panu Srestasathiern,Peerapon Vateekul
Source
Journal: Remote Sensing
[MDPI AG]
Date: 2021-12-15
Volume/Issue: 13 (24): 5100-5100
Citations: 26
Abstract
Transformers have demonstrated remarkable accomplishments in several natural language processing (NLP) tasks as well as image processing tasks. Herein, we present a deep learning (DL) model that improves the semantic segmentation network in two ways. First, it uses a pretrained Swin Transformer (SwinTF), a Vision Transformer (ViT) variant, as the backbone, transferring its weights to downstream tasks by attaching task-specific layers on top of the pretrained encoder. Second, three decoder designs, U-Net, pyramid scene parsing (PSP) network, and feature pyramid network (FPN), are applied to the network to perform pixel-level segmentation. The results are compared with other state-of-the-art (SOTA) image-labeling methods, such as the global convolutional network (GCN) and ViT. Extensive experiments show that our SwinTF with decoder designs reaches a new state of the art on the Thailand Isan Landsat-8 corpus (89.8% F1 score) and the Thailand North Landsat-8 corpus (63.12% F1 score), with competitive results on ISPRS Vaihingen. Moreover, both of our best-proposed methods (SwinTF-PSP and SwinTF-FPN) outperform SwinTF with supervised ImageNet-1K pre-training on the Thailand Landsat-8 and ISPRS Vaihingen corpora.
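The abstract's second contribution, attaching a decoder design such as an FPN to a pretrained hierarchical encoder, can be illustrated with a minimal sketch. This is not the authors' implementation: it is a toy NumPy version of the FPN top-down pathway, where multi-scale backbone feature maps (as a Swin Transformer's stages would produce) are projected to a common channel width by 1x1 "lateral" convolutions and fused coarse-to-fine by upsampling and addition. The function names, channel counts, and random lateral weights are illustrative assumptions.

```python
import numpy as np

def upsample2x(x):
    # Nearest-neighbor upsampling: (C, H, W) -> (C, 2H, 2W).
    return x.repeat(2, axis=1).repeat(2, axis=2)

def lateral(x, w):
    # A 1x1 convolution is just a per-pixel channel projection:
    # w has shape (C_out, C_in), x has shape (C_in, H, W).
    return np.einsum('oc,chw->ohw', w, x)

def fpn_decoder(features, out_ch=8, seed=0):
    """Toy FPN top-down fusion over backbone feature maps.

    `features` is ordered coarse -> fine (e.g. encoder stages at
    strides 32, 16, 8), each with shape (C_i, H_i, W_i), where each
    map is twice the spatial size of the previous one.
    Returns the fused multi-scale maps, finest last.
    """
    rng = np.random.default_rng(seed)  # random lateral weights (illustrative)
    top = None
    outs = []
    for f in features:
        lat = lateral(f, rng.standard_normal((out_ch, f.shape[0])) * 0.01)
        # Fuse: project this stage, add the upsampled coarser result.
        top = lat if top is None else lat + upsample2x(top)
        outs.append(top)
    return outs
```

In a real segmentation head, the finest fused map would be followed by a per-pixel classification layer and upsampled to the input resolution to produce the pixel-level labels described in the abstract.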