Computer science
Segmentation
Artificial intelligence
Feature (linguistics)
Joint (building)
Semantic gap
Generative grammar
Generative model
Semantic data model
Image (mathematics)
Pattern recognition (psychology)
Sensor fusion
Image segmentation
Semantic feature
Training set
Feature learning
Semantics (computer science)
Data-driven
Data mining
Labeled data
Search engine indexing
Machine learning
Similarity (geometry)
Feature extraction
Synthetic data
Information retrieval
Computer vision
Image fusion
Authors
Runmin Dong, Shuai Yuan, Litong Feng, Jinxiao Zhang, Weijia Li, Mengxuan Chen, Bin Luo, Wei Zhang, Haohuan Fu
Identifier
DOI: 10.1016/j.inffus.2025.103839
Abstract
• A novel transferable image synthesis method for remote sensing semantic segmentation
• Improving the diversity of synthetic data by introducing unlabeled target data
• Integrating reference images, semantic masks, and text prompts during model training
• Facilitating multi-modal information interaction through a joint learning strategy
• Verifying that our data-centric method can further improve model-centric results

With the advancement of diffusion model-based generative methods, synthesizing pixel-level training datasets has emerged as a promising approach to mitigating the scarcity of annotated data in semantic segmentation tasks. However, a noticeable gap persists between data synthesis and semantic segmentation tasks in the remote sensing (RS) domain. Because RS data vary widely across sensors and spatial scales, relying solely on limited annotated data and pre-trained generative foundation models to synthesize training data yields only minor improvements in RS semantic segmentation. It is therefore crucial to incorporate large volumes of unlabeled external data into downstream tasks to enable more transferable image synthesis. Unlike training-free approaches that introduce reference (Ref) images primarily for shallow feature transfer, we propose a joint learning strategy that integrates Ref images, semantic masks, and text prompts during training. This facilitates multi-modal interaction and allows the model to capture deeper features such as content. To achieve effective multi-modal information fusion, the proposed Transferable Image Synthesis method (TISynth) avoids directly using real Ref images during training. Instead, it generates Ref images from augmented input images and facilitates interaction between Ref images and semantic information through text prompts and an all-in-attention module. As a data augmentation approach for semantic segmentation tasks, TISynth improves OA/mIoU/mAcc by 1.52%/2.32%/3.04% on FUSU-4k, 1.33%/1.06%/3.04% on GID-26k, and 1.15%/1.67%/2.04% on LoveDA (Rural → Urban), compared to a baseline trained only on the original data. Moreover, compared to state-of-the-art segmentation training data synthesis methods, our approach achieves superior performance across datasets of varying scales, resolutions, segmentation complexities, and domains. Our code is available at https://github.com/dongrunmin/TISynth.git.
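The abstract states that TISynth derives its Ref images from augmented input images during training rather than sampling real references. The following is a minimal PyTorch/torchvision sketch of that idea; the specific photometric transforms, the `make_ref_image` name, and all parameter values are illustrative assumptions, not the authors' recipe (see the repository above for the actual implementation).

```python
import torch
import torchvision.transforms as T

# Assumed augmentation pipeline: photometric-only transforms keep the
# spatial layout aligned with the semantic mask, so the pseudo-Ref image
# differs in appearance but not in geometry.
ref_augment = T.Compose([
    T.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.1),
    T.RandomGrayscale(p=0.1),
    T.GaussianBlur(kernel_size=5, sigma=(0.1, 2.0)),
])

def make_ref_image(image: torch.Tensor) -> torch.Tensor:
    """Derive a pseudo-reference image from the training image itself.

    `image` is a (C, H, W) float tensor in [0, 1]. Because the Ref image
    is generated on the fly, no real reference pairs need to be collected.
    """
    return ref_augment(image)
```

One motivation for this design, as the abstract suggests, is that training with real Ref images would require paired references; deriving them from the input keeps every training triplet (image, mask, Ref) trivially consistent.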
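The all-in-attention module is only named in the abstract, not specified. The sketch below shows one plausible reading, assuming it lets generator features attend jointly over text, Ref-image, and semantic-mask tokens in a single cross-attention pass; the class name, dimensions, and residual layout are all assumptions for illustration.

```python
import torch
import torch.nn as nn

class AllInAttentionSketch(nn.Module):
    """Assumed fusion block: U-Net features (queries) attend over the
    concatenation of text, Ref-image, and semantic-mask tokens, so the
    three conditions interact through one shared attention operation."""

    def __init__(self, dim: int = 320, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, x, text_tokens, ref_tokens, mask_tokens):
        # x: (B, N, dim) generator features; each condition is a
        # (B, L, dim) token sequence projected to the shared width.
        context = torch.cat([text_tokens, ref_tokens, mask_tokens], dim=1)
        out, _ = self.attn(query=x, key=context, value=context)
        return self.norm(x + out)  # residual connection
```

This is a sketch of the general pattern, not the paper's module: the actual design may use separate attention branches, gating, or different normalization.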