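Probabilistic logic
Scalability
Computer science
Change detection
Artificial intelligence
Generative grammar
Scale (ratio)
Process (computing)
Generative model
Remote sensing
Pixel
Data mining
Event (particle physics)
Transformer
Semantic change
Stochastic modeling
Machine learning
Data modeling
Time series
Stochastic process
Deep learning
Graphical model
Segmentation
Data assimilation
Statistical model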
Authors
Zhuo Zheng, Stefano Ermon, Dongjun Kim, Liangpei Zhang, Yanfei Zhong
Identifier
DOI:10.1109/tpami.2024.3475824
Abstract
Our understanding of the temporal dynamics of the Earth's surface has been significantly advanced by deep vision models, which often require a massive amount of labeled multi-temporal images for training. However, collecting, preprocessing, and annotating multi-temporal remote sensing images at scale is non-trivial since it is expensive and knowledge-intensive. In this paper, we present scalable multi-temporal change data generators based on generative models, which are cheap and automatic, alleviating these data problems. Our main idea is to simulate a stochastic change process over time. We describe the stochastic change process as a probabilistic graphical model, namely the generative probabilistic change model (GPCM), which factorizes the complex simulation problem into two more tractable sub-problems, i.e., condition-level change event simulation and image-level semantic change synthesis. To solve these two problems, we present Changen2, a GPCM implemented with a resolution-scalable diffusion transformer which can generate time series of remote sensing images and corresponding semantic and change labels from labeled and even unlabeled single-temporal images. Changen2 is a "generative change foundation model" that can be trained at scale via self-supervision, and is capable of producing change supervisory signals from unlabeled single-temporal images. Unlike existing "foundation models", our generative change foundation model synthesizes change data to train task-specific foundation models for change detection. The resulting model possesses inherent zero-shot change detection capabilities and excellent transferability. Comprehensive experiments suggest Changen2 has superior spatiotemporal scalability in data generation, e.g., a Changen2 model trained on 256-pixel single-temporal images can yield time series of any length and resolutions of 1,024 pixels. Changen2 pre-trained models exhibit superior zero-shot performance (narrowing the performance gap to 3% on LEVIR-CD and approximately 10% on both S2Looking and SECOND, compared to the fully supervised counterpart) and transferability across multiple types of change tasks, including ordinary and off-nadir building change, land-use/land-cover change, and disaster assessment.
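As a rough illustration of the first sub-problem named in the abstract (condition-level change event simulation), the toy sketch below takes a single-temporal semantic mask, randomly removes whole objects to obtain later masks, and derives a binary change label between two time steps. It is not the Changen2 implementation: the object-removal rule, the names `simulate_change_event`, `change_label`, and `simulate_mask_series`, and the parameter `p_remove` are all assumptions made for illustration. The second sub-problem, image-level semantic change synthesis with a diffusion transformer, is not sketched here.

```python
import random
from typing import List

Mask = List[List[int]]  # hypothetical encoding: 0 = background, >0 = object class


def simulate_change_event(mask: Mask, p_remove: float, rng: random.Random) -> Mask:
    """Toy change event: drop each 4-connected object with probability p_remove."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    next_mask = [row[:] for row in mask]
    for i in range(h):
        for j in range(w):
            if mask[i][j] == 0 or seen[i][j]:
                continue
            # Flood fill to collect one object (connected component of one class).
            stack, component = [(i, j)], []
            seen[i][j] = True
            while stack:
                y, x = stack.pop()
                component.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and not seen[ny][nx]
                            and mask[ny][nx] == mask[i][j]):
                        seen[ny][nx] = True
                        stack.append((ny, nx))
            # Assumed event model: the whole object disappears at the next step.
            if rng.random() < p_remove:
                for y, x in component:
                    next_mask[y][x] = 0
    return next_mask


def change_label(before: Mask, after: Mask) -> Mask:
    """Binary change label: 1 where the semantic class differs between two masks."""
    return [[int(a != b) for a, b in zip(ra, rb)] for ra, rb in zip(before, after)]


def simulate_mask_series(mask: Mask, steps: int, p_remove: float, seed: int = 0) -> List[Mask]:
    """Chain change events to get a mask time series of length steps + 1."""
    rng = random.Random(seed)
    series = [mask]
    for _ in range(steps):
        series.append(simulate_change_event(series[-1], p_remove, rng))
    return series


if __name__ == "__main__":
    mask = [[1, 1, 0, 0],
            [1, 1, 0, 0],
            [0, 0, 0, 1],
            [0, 0, 1, 1]]
    series = simulate_mask_series(mask, steps=3, p_remove=0.5)
    for t in range(1, len(series)):
        print(t, change_label(series[t - 1], series[t]))
```

In a full generator, each simulated mask would condition an image synthesis model so that every time step has an image, a semantic label, and a change label. This sketch stops at the label side because that part can be shown without a trained model.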