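Computer science
Decoupling (probability)
Computer vision
Image (mathematics)
Object (grammar)
Artificial intelligence
Information retrieval
Engineering
Control engineering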
Authors
Wenda Zhao,Zhepu Zhang,Fan Zhao,Haipeng Wang,You He,Huchuan Lu
Source
Journal: PubMed
Date: 2025-08-15
Volume/Issue: PP
Identifier
DOI:10.1109/tpami.2025.3599520
Abstract
Remote sensing images typically contain diverse objects with complex structures at varying locations against vast ground backgrounds. This makes it difficult for conventional generative models to produce remote sensing objects with correct shapes and clear textures. Integrating additional object-level controls is a potential way to improve generation quality, yet previous approaches inject object-related conditions by specifying object locations, which restricts the object layout of the generated results. To enable high object fidelity, high layout diversity, and object-customizable generation for remote sensing images, we propose a remote sensing image generation method based on object text decoupling, named OTD-GAN. OTD-GAN takes advantage of the inherent text-to-image generation procedure and adaptively integrates the decoupled textual representations of visual objects into the global captions, thus achieving object-level control without layout restrictions. Specifically, we design an object text decoupling module to predict a semantically consistent textual representation for each object. By decoupling the textual representation into a class-invariant part and an object-specific part, the converted representation captures general semantics shared by similar objects as well as distinguishing details of individual objects. We then use an object text semantic enhancement module to fuse the obtained object text representations with the global captions, enriching the object-related semantics within the textual modality. As a result, the generator benefits from the object conditions and improves generation quality while remaining flexible enough to create diverse layouts. Extensive experiments on remote sensing image-caption datasets, including NWPU-Captions and RSICD, demonstrate that our method achieves leading performance compared with existing state-of-the-art approaches.