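Computer science
Image editing
Artificial intelligence
Image (mathematics)
Encoder
Process (computing)
Benchmark (surveying)
Pixel
Task (project management)
Key (lock)
Computer vision
Natural language processing
Geography
Operating system
Computer security
Geodesy
Management
Economics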
Authors
Bo Li,Xiao Lin,Bin Liu,Zhifen He,Yu‐Kun Lai
Identifier
DOI:10.1109/tmm.2023.3289755
Abstract
Text-driven image editing aims to manipulate images under the guidance of natural language descriptions. Text is more natural and intuitive than many other interaction modes and has recently attracted increasing attention. However, unlike classical supervised learning tasks, there is to date no standard benchmark dataset for text-driven interactive image editing, which makes it hard to train an end-to-end model for pixel-aligned interactive image editing driven by text. Some methods follow the text-to-image paradigm by incorporating the target image into the text-to-image generation process. However, these methods rely on cross-modal text-to-image generation, which involves complicated and expensive models and can lead to inconsistent editing effects. In this article, a novel text-driven image editing method is proposed. Our key observation is that this task can be learned more efficiently using image-to-image translation. To ensure effective learning for image editing, our framework takes paired text and corresponding images for training and disentangles each image into content and attributes, so that the content is maintained while the attributes are modified according to the text. Our network is a lightweight encoder-decoder architecture that accomplishes pixel-aligned end-to-end training via cycle-consistent supervision. Quantitative and qualitative experimental results show that the proposed method achieves state-of-the-art performance.
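The idea of disentangling an image into content and attributes and supervising the edit with a cycle-consistency term can be illustrated with a toy sketch. This is not the authors' implementation: the abstract gives no architectural details, so the orthogonal linear map standing in for the encoder-decoder, the latent sizes, the L1 distance, and every function name below (`encode`, `decode`, `edit`, `cycle_loss`) are assumptions made purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical latent sizes; the toy "image" is just a flat vector.
D_CONTENT, D_ATTR = 6, 2
D_IMAGE = D_CONTENT + D_ATTR

# Assumption: an orthogonal matrix stands in for a learned encoder,
# so its transpose acts as an exact decoder.
W = np.linalg.qr(rng.normal(size=(D_IMAGE, D_IMAGE)))[0]

def encode(image):
    """Split an image into (content, attribute) codes."""
    z = W @ image
    return z[:D_CONTENT], z[D_CONTENT:]

def decode(content, attr):
    """Rebuild an image from content and attribute codes."""
    return W.T @ np.concatenate([content, attr])

def edit(image, target_attr):
    """Keep the content code and replace the attribute code."""
    content, _ = encode(image)
    return decode(content, target_attr)

def cycle_loss(image, target_attr):
    """Mean L1 distance between an image and its edit-then-revert result."""
    _, source_attr = encode(image)
    edited = edit(image, target_attr)
    restored = edit(edited, source_attr)
    return float(np.abs(restored - image).mean())

image = rng.normal(size=D_IMAGE)
text_attr = rng.normal(size=D_ATTR)  # stand-in for a text-derived attribute code
edited = edit(image, text_attr)
loss = cycle_loss(image, text_attr)
```

Because the toy encoder is exactly invertible, the cycle loss here is zero up to floating-point error. In a learned model the same quantity would be a training signal that pushes the network to preserve content while changing only the attributes, which is how cycle consistency can substitute for paired ground-truth edits.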