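Computer science
Artificial intelligence
Segmentation
High resolution
Image segmentation
Computer vision
Remote sensing
Aerial imagery
Geology
Image (mathematics)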
Authors
Guohui Deng, Zhaocong Wu, Miaozhong Xu, Chengjun Wang, Zhiye Wang, Zhongyuan Lu
Identifier
DOI: 10.1109/tgrs.2023.3276172
Abstract
Semantic segmentation is a key means of understanding very-high-resolution (VHR) aerial imagery. With the rapid development of deep learning, methods built on convolutional neural networks (CNNs) are increasingly applied to VHR image segmentation. However, VHR images contain highly complex details and geographical objects are strongly spatially dependent, and CNN-based methods remain inadequate for both: the inherent locality of CNNs limits the size of the receptive field and therefore their ability to capture long-range context. To address this problem, we propose a novel transformer-based deep learning model called crisscross-global vision transformers (CGVT). CGVT exploits the transformer's inherent ability to capture long-range context to overcome the restricted receptive field. Specifically, we redesign the transformer's self-attention mechanism as crisscross-global attention. It consists of two parts: a crisscross transformer encoder block (CC-TEB) and a global squeeze transformer encoder block (GS-TEB). CC-TEB overcomes a limitation of the traditional self-attention design, namely the difficulty of applying it to VHR aerial image segmentation, and further strengthens the model's local feature representation. GS-TEB strengthens its global feature representation. Experiments on the popular ISPRS Vaihingen, IEEE GRSS Data Fusion Contest Zeebrugge, and LoveDA Semantic Segmentation Challenge datasets verify the effectiveness and superiority of the proposed method. It achieves state-of-the-art performance on the Zeebrugge and LoveDA datasets and currently ranks second on the Vaihingen dataset.
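The abstract does not specify how CC-TEB and GS-TEB compute attention, so the following is only a minimal, dependency-free sketch under stated assumptions. It is not the CGVT implementation. "Crisscross" is assumed to mean the row-plus-column attention pattern popularized by CCNet. "Global squeeze" is assumed to mean that every query attends to average-pooled key/value tokens. The function names, the pool size `s`, and the single-head, projection-free setup are all hypothetical.

```python
import math


def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def weighted_sum(weights, vectors):
    # Combine C-dim value vectors with attention weights.
    c = len(vectors[0])
    return [sum(w * vec[d] for w, vec in zip(weights, vectors)) for d in range(c)]


def criss_cross_attention(q, k, v):
    """Hypothetical single-head criss-cross attention (CCNet-style, assumed).

    q, k, v are H x W grids of C-dim vectors (nested lists). Position (i, j)
    attends only to the H + W - 1 positions in row i and column j.
    """
    h, w = len(q), len(q[0])
    scale = 1.0 / math.sqrt(len(q[0][0]))
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # Row i, then column j without repeating (i, j) itself.
            coords = [(i, x) for x in range(w)] + [(y, j) for y in range(h) if y != i]
            weights = softmax([dot(q[i][j], k[y][x]) * scale for y, x in coords])
            row.append(weighted_sum(weights, [v[y][x] for y, x in coords]))
        out.append(row)
    return out


def avg_pool(grid, s):
    # Hypothetical "squeeze": average non-overlapping s x s blocks into tokens.
    h, w = len(grid), len(grid[0])
    tokens = []
    for by in range(0, h, s):
        for bx in range(0, w, s):
            cells = [grid[y][x] for y in range(by, min(by + s, h))
                     for x in range(bx, min(bx + s, w))]
            tokens.append([sum(col) / len(cells) for col in zip(*cells)])
    return tokens


def global_squeeze_attention(q, k, v, s=2):
    """Hypothetical: every query attends to all pooled key/value tokens."""
    keys, vals = avg_pool(k, s), avg_pool(v, s)
    scale = 1.0 / math.sqrt(len(q[0][0]))
    return [[weighted_sum(softmax([dot(qv, kt) * scale for kt in keys]), vals)
             for qv in row] for row in q]


if __name__ == "__main__":
    grid = [[[float(i), float(j)] for j in range(3)] for i in range(3)]
    v = [[[0.0, 0.0] for _ in range(3)] for _ in range(3)]
    v[1][1] = [1.0, 1.0]  # single nonzero value at the center
    # (1, 1) lies outside row 0 and column 0, so this prints [0.0, 0.0].
    print(criss_cross_attention(grid, grid, v)[0][0])
    # The pooled top-left block contains (1, 1), so this output is nonzero.
    print(global_squeeze_attention(grid, grid, v, s=2)[0][0])
```

The demo contrasts the two patterns. Criss-cross attention sees only its own row and column, and each query scores H + W - 1 keys instead of H x W. Stacking such blocks lets information propagate further across the image. The pooled variant reaches every region of the image in one step, at the cost of spatial detail inside each pooled block. Whether CGVT combines them in this way is not stated in the abstract.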