Computer science
Segmentation
Artificial intelligence
Feature (linguistics)
Joint (building)
Semantic gap
Generative grammar
Generative model
Semantic data model
Image (mathematics)
Pattern recognition (psychology)
Sensor fusion
Image segmentation
Semantic feature
Training set
Feature learning
Semantics (computer science)
Data-driven
Data mining
Labeled data
Search engine indexing
Machine learning
Similarity (geometry)
Feature extraction
Synthetic data
Information retrieval
Computer vision
Image fusion
Authors
Runmin Dong, Shuai Yuan, Litong Feng, Jinxiao Zhang, Weijia Li, Mengxuan Chen, Bin Luo, Wei Zhang, Haohuan Fu
Identifier
DOI: 10.1016/j.inffus.2025.103839
Abstract
• A novel transferable image synthesis method for remote sensing semantic segmentation
• Improving the diversity of synthetic data by introducing unlabeled target data
• Integrating reference images, semantic masks, and text prompts during model training
• Facilitating multi-modal information interaction through a joint learning strategy
• Verifying that our data-centric method can further improve model-centric results

With the advancement of diffusion model-based generative methods, synthesizing pixel-level training datasets has emerged as a promising approach to mitigating the scarcity of annotated data in semantic segmentation tasks. However, a noticeable gap persists between data synthesis and semantic segmentation tasks in the remote sensing (RS) domain. Because RS data vary widely across sensors and spatial scales, relying solely on limited annotated data and pre-trained generative foundation models to synthesize training data yields only minor improvements in RS semantic segmentation. It is therefore crucial to incorporate large volumes of unlabeled external data into downstream tasks to enable more transferable image synthesis. Unlike training-free approaches that introduce reference (Ref) images primarily for shallow feature transfer, we propose a joint learning strategy that integrates Ref images, semantic masks, and text prompts during training. This facilitates multi-modal interaction and allows the model to capture deeper features such as content. To achieve effective multi-modal information fusion, the proposed Transferable Image Synthesis method (TISynth) avoids directly using real Ref images during training. Instead, it generates Ref images from augmented input images and facilitates interaction between Ref images and semantic information through text prompts and an all-in-attention module. As a data augmentation approach for semantic segmentation tasks, TISynth improves OA/mIoU/mAcc by 1.52%/2.32%/3.04% on FUSU-4k, 1.33%/1.06%/3.04% on GID-26k, and 1.15%/1.67%/2.04% on LoveDA (Rural → Urban), compared to a baseline trained only on the original data. Moreover, compared to state-of-the-art segmentation training data synthesis methods, our approach achieves superior performance across datasets of varying scales, resolutions, segmentation complexities, and domains. Our code is available at https://github.com/dongrunmin/TISynth.git.
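The abstract states that TISynth derives its Ref images from augmented input images during training rather than sampling real references. The following is a minimal PyTorch/torchvision sketch of that idea; the specific photometric transforms, the `make_ref_image` name, and all parameter values are illustrative assumptions, not the authors' recipe (see the repository above for the actual implementation).

```python
import torch
import torchvision.transforms as T

# Assumed augmentation pipeline: photometric-only transforms keep the
# spatial layout aligned with the semantic mask, so the pseudo-Ref image
# differs in appearance but not in geometry.
ref_augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.1),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

def make_ref_image(image: torch.Tensor) -> torch.Tensor:
    """Derive a pseudo-reference image from the training image itself.

    `image` is a (C, H, W) float tensor in [0, 1]. Because the Ref image
    is generated on the fly, no real reference pairs need to be collected.
    """
    return ref_augment(image)
```

One motivation for this design, as the abstract suggests, is that training with real Ref images would require paired references; deriving them from the input keeps every training triplet (image, mask, Ref) trivially consistent.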
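The all-in-attention module is only named in the abstract, not specified. The sketch below shows one plausible reading, assuming it lets generator features attend jointly over text, Ref-image, and semantic-mask tokens in a single cross-attention pass; the class name, dimensions, and residual layout are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class AllInAttentionSketch(nn.Module):
    """Assumed fusion block: U-Net features (queries) attend over the
    concatenation of text, Ref-image, and semantic-mask tokens, so the
    three conditions interact through one shared attention operation."""

    def __init__(self, dim: int = 320, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, text_tokens, ref_tokens, mask_tokens):
        # x: (B, N, dim) generator features; each condition is a
        # (B, L, dim) token sequence projected to the shared width.
        context = torch.cat([text_tokens, ref_tokens, mask_tokens], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        return self.norm(x + out)  # residual connection
```

This is a sketch of the general pattern, not the paper's module: the actual design may use separate attention branches, gating, or different normalization.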