TPOV-Seg: Textually Enhanced Prompt Tuning of Vision-Language Models for Open-Vocabulary Remote Sensing Semantic Segmentation

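Computer science; Discriminative; Segmentation; Encoder; Artificial intelligence; Transformer; Vocabulary; Feature (linguistics); Remote sensing; Generalization; Remote sensing application; Feature extraction; Semantic mapping; Machine learning; Land cover; Closed captioning; Semantic feature; Natural language processing; Feature learning; Domain (mathematical analysis); Semantics (computer science); Pattern recognition (psychology); Image segmentation; Word (group theory); Semantic data model; Topic model; Thematic map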

Authors

Xiaokang Zhang,Chufeng Zhou,Jianzhong Huang,Lefei Zhang

Source

Journal: IEEE Transactions on Geoscience and Remote Sensing [Institute of Electrical and Electronics Engineers]
Date: 2025-01-01 Volume/issue: 63: 1-17 Citations: 1

Identifier

DOI: 10.1109/tgrs.2025.3624767

Abstract

Remote sensing semantic segmentation faces significant challenges in open-world scenarios due to domain gaps and the presence of unseen categories in the test datasets. Open-vocabulary semantic segmentation (OVSS) based on vision-language models (VLMs) has emerged as a promising paradigm for remote sensing imagery interpretation, which enables adaptation to new datasets with arbitrary semantic categories. However, current OVSS approaches often struggle to achieve fine-grained pixel-level localization and classification for unseen categories when relying solely on fixed textual prompts and pretrained VLM encoders. The model's generalization capability is further hindered by insufficiently fine-grained and adaptive textual representations. To address these limitations, we propose TPOV-Seg, a textually enhanced prompt tuning approach for OVSS. Specifically, a remote sensing-specific Text TempLator (TTL) is introduced to enrich textual prompts and semantic representations for land cover categories by incorporating synonymous vocabulary combinations. To efficiently align the text encoder with remote sensing characteristics, a Lightweight Text-aware Prompt Tuning (LTP-Tuning) strategy is proposed for contextual modeling of word embeddings adaptation. Furthermore, a Textual-Guided Channel-Aware Aggregator (TGCA) is developed to promote inter-channel feature interaction and facilitate semantic modeling, leveraging Grouped Cross-Channel Transformers and linear Transformers under the guidance of enhanced textual features from TTL. Extensive experiments on five large-scale remote sensing segmentation datasets demonstrate that TPOV-Seg outperforms existing methods in OVSS tasks, showing strong discriminative ability for unseen categories while maintaining robust cross-domain generalization. The source code will be available at https://github.com/zxk688/TPOVSeg.
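
The abstract says that TTL enriches textual prompts by combining synonymous vocabulary for each land cover category. The following is a minimal illustrative sketch of that general idea only, not the authors' implementation: the synonym lists, the prompt templates, and the function name `expand_prompts` are all hypothetical and do not come from the paper.

```python
from itertools import product

# Hypothetical synonym lists and templates, chosen for illustration only.
SYNONYMS = {
    "building": ["building", "house", "rooftop"],
    "water": ["water", "river", "lake"],
}
TEMPLATES = [
    "a satellite image of {}.",
    "an aerial photo of {}.",
]


def expand_prompts(category, synonyms=SYNONYMS, templates=TEMPLATES):
    """Fill every template with every synonym of a category.

    A category without a synonym entry falls back to its own name.
    """
    words = synonyms.get(category, [category])
    return [t.format(w) for t, w in product(templates, words)]


prompts = expand_prompts("building")
```

In a typical VLM pipeline, prompts like these would be passed through the text encoder and their embeddings pooled into one representation per category; how TPOV-Seg actually does this is not stated in the abstract.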
