Image editing
Computer science
Generator (circuit theory)
Image (mathematics)
Location
Artificial intelligence
Face (sociological concept)
Domain (mathematical analysis)
Computer vision
Human-computer interaction
Mathematical analysis
Social science
Power (physics)
Linguistics
Physics
Philosophy
Mathematics
Quantum mechanics
Sociology
Authors
Changming Xiao, Yang Qi, Xiaoqiang Xu, Jianwei Zhang, Feng Zhu, Changshui Zhang
Identifier
DOI:10.1016/j.patcog.2023.109458
Abstract
Leveraging the abundant knowledge learned by pre-trained multi-modal models such as CLIP has recently proved effective for text-guided image editing. Although convincing results have been achieved by combining the image generator StyleGAN with CLIP, most methods must train a separate model for each prompt, and irrelevant regions are often altered after editing due to the lack of spatial disentanglement. We propose a novel framework that can edit different images according to different prompts within a single model. In addition, an innovative region-based spatial attention mechanism is adopted to explicitly guarantee the locality of editing. Experiments, mainly in the face domain, verify the feasibility of our framework and show that, with multi-text editing and local editing in place, our method supports practical applications such as sequential editing and regional style transfer.
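The abstract highlights a region-based spatial attention mechanism that confines edits to relevant regions. The paper's actual mechanism is not shown here; as a minimal sketch of the underlying idea, one simple way to enforce such locality is to blend edited and original content under a spatial mask, so pixels outside the region of interest are guaranteed to remain unchanged. The function name and toy data below are hypothetical.

```python
import numpy as np

def masked_blend(original, edited, mask):
    """Confine an edit to a spatial region (illustrative sketch only).

    original, edited: (H, W, C) float arrays.
    mask: (H, W) float array in [0, 1]; 1 marks the editable region.
    Outside the mask the original pixels are kept unchanged, which is
    one straightforward way to guarantee locality of editing.
    """
    m = mask[..., None]  # add a channel axis so the mask broadcasts over C
    return m * edited + (1.0 - m) * original

# Toy example: restrict the "edit" to the left half of a 4x4 image.
orig = np.zeros((4, 4, 3))   # original content: all zeros
edit = np.ones((4, 4, 3))    # edited content: all ones
mask = np.zeros((4, 4))
mask[:, :2] = 1.0            # only the left half may change
out = masked_blend(orig, edit, mask)
```

With a soft (non-binary) mask, the same blending yields a smooth transition at the region boundary instead of a hard seam.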