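Remote sensing
Computer science
Leverage (statistics)
Data science
Tracing
Change detection
Artificial intelligence
Temporal scales
Chart
Semantic heterogeneity
Spatial analysis
Key (lock)
Semantic interpretation
Remote sensing application
Interpretation (philosophy)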
Authors
Chenyang Liu,J. Q. Zhang,Keyan Chen,Man Wang,Zhengxia Zou,Zhenwei Shi
Identifier
DOI:10.1109/mgrs.2025.3598283
Abstract
The interpretation of multitemporal remote sensing imagery is critical for monitoring Earth’s dynamic processes. However, previous change detection (CD) methods, which produce binary or semantic masks, fall short of providing human-readable insights into changes. Recent advances in vision–language models (VLMs) have opened a new frontier by fusing visual and linguistic modalities, enabling spatiotemporal vision–language understanding: models that not only capture spatial and temporal dependencies to recognize changes but also provide a richer interactive semantic analysis of temporal images (e.g., generate descriptive captions and answer natural language queries). In this survey, we present the first comprehensive review of remote sensing spatiotemporal VLMs (RS-STVLMs). The survey covers the evolution of models from early task-specific models to recent general foundation models that leverage powerful large language models (LLMs). We discuss progress in representative tasks, such as change captioning, change question answering, and change grounding. Moreover, we systematically dissect the fundamental components and key technologies underlying these models and review the datasets and evaluation metrics that have driven the field. By synthesizing task-level insights with a deep dive into shared architectural patterns, we aim to illuminate current achievements and chart promising directions for future research in spatiotemporal vision–language understanding for remote sensing. We will continue to track related work at https://github.com/Chen-Yang-Liu/Awesome-RS-SpatioTemporal-VLMs.