Keywords
Computer science, Segmentation, Artificial intelligence, Transformer, High resolution, Pattern recognition, Machine learning, Remote sensing, Physics, Quantum mechanics, Voltage, Geology
Authors
Yifei Huang, Zideng Feng, Junli Yang, Bin Wang, Jiaying Wang, Zhenglin Xian
Identifier
DOI:10.1109/icip46576.2022.9897710
Abstract
Semantic segmentation of remote sensing images (RSI) has long been a thriving research topic. Existing supervised learning methods usually require a huge amount of labeled data. Meanwhile, the large size, variation in object scales, and intricate details of RSI make it essential to capture both long-range context and local information. To address these problems, we propose Le-BEiT, a self-supervised Transformer with an improved positional encoding, Local-Enhanced Positional Encoding (LePE). Self-supervised learning relieves the demanding requirement for large amounts of labeled data, and the self-attention mechanism in the Transformer has a remarkable capability for capturing long-range context. Meanwhile, we use LePE as a substitute for Relative Positional Encoding (RPE) to represent local information more effectively. Moreover, considering the domain gap between natural images and RSI, we pre-train Le-BEiT on a very small high-resolution RSI dataset, GID, instead of ImageNet-22K. To investigate the influence of pre-training dataset size on segmentation accuracy, we further conduct experiments on a larger pre-training dataset, GID-DOTA, which is 1/100 the size of ImageNet-22K, and observe considerable accuracy improvements. Our method, which relies on a much smaller pre-training dataset, achieves accuracy competitive with its ImageNet-22K counterpart.
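The abstract states that LePE replaces Relative Positional Encoding so that local information is injected directly into attention. The paper's own implementation is not given here; the sketch below is a minimal illustration of a LePE-style attention block, assuming the commonly used formulation (as in the CSWin Transformer) in which a depthwise convolution over the value tensor is added to the attention output. All names (`LePEAttention`, `dim`, `num_heads`) are hypothetical, not taken from the paper.

```python
# Minimal sketch of a LePE-style attention block (assumed formulation:
# attention output + depthwise conv over V), not the authors' exact code.
import torch
import torch.nn as nn

class LePEAttention(nn.Module):
    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.scale = self.head_dim ** -0.5
        self.qkv = nn.Linear(dim, dim * 3, bias=False)
        self.proj = nn.Linear(dim, dim)
        # Depthwise 3x3 conv on V supplies the local positional encoding.
        self.lepe = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)

    def forward(self, x: torch.Tensor, H: int, W: int) -> torch.Tensor:
        # x: (B, N, C) token sequence with N == H * W
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)  # each: (B, heads, N, head_dim)

        # LePE branch: reshape V into a feature map, apply the depthwise conv.
        v_map = v.transpose(1, 2).reshape(B, N, C).transpose(1, 2).reshape(B, C, H, W)
        lepe = self.lepe(v_map).reshape(B, C, N).transpose(1, 2)  # (B, N, C)

        # Standard scaled dot-product attention for long-range context.
        attn = (q @ k.transpose(-2, -1)) * self.scale
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)

        # Local positional bias is added to the global attention output.
        return self.proj(out + lepe)

# Usage on a 16x16 grid of 64-dim tokens:
# block = LePEAttention(dim=64, num_heads=8)
# y = block(torch.randn(2, 256, 64), H=16, W=16)  # -> (2, 256, 64)
```

Because the depthwise convolution acts on a local 3x3 neighborhood of the value map, it contributes a position-dependent local bias, while the softmax attention term retains the global, long-range context the abstract emphasizes.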