Authors
Weicheng Wang,Guoli Jia,Zhongqi Zhang,Liang Lin,Jufeng Yang
Identifier
DOI: 10.1109/cvpr52734.2025.01706
Abstract
Diffusion models pre-trained on large-scale paired image-text data achieve significant success in image editing. To convey more fine-grained visual details, subject-driven editing integrates subjects from user-provided reference images into existing scenes. However, it is challenging to obtain photorealistic results that faithfully simulate the contextual interactions, such as reflections, illumination, and shadows, induced by merging the target object into the source image. To address this issue, we propose PS-Diffusion, which ensures realistic and consistent object-scene blending while maintaining the invariance of subject appearance during editing. Specifically, we first divide the contextual interactions into those occurring in the foreground and the background areas. The effect of the former is estimated through intrinsic image decomposition, and the region of the latter is predicted by an additional background effect control branch. Moreover, we propose an effect attention module to disentangle the learning processes of interaction and subject, alleviating confusion between them. Additionally, we introduce a synthesized dataset, Replace-5K, consisting of 5,000 image pairs with invariant subjects and contextual interactions, generated via 3D rendering. Extensive quantitative and qualitative experiments on our dataset and two real-world datasets demonstrate that our method achieves state-of-the-art performance. The code is available at https://github.com/wei-cheng777/PS-Diffusion.
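The abstract's foreground-effect estimate relies on intrinsic image decomposition, i.e., splitting an image into reflectance (albedo) and shading so that illumination effects can be handled separately from the subject's appearance. The sketch below is a minimal Retinex-style toy illustration of that idea, not the paper's actual model: it assumes shading is the low-frequency component of log intensity (approximated here by a hypothetical `box_blur` helper) and takes albedo as the residual.

```python
import numpy as np

def box_blur(img, k=5):
    """Simple box filter used as a stand-in low-pass estimate of shading."""
    pad = k // 2
    padded = np.pad(img, pad, mode="edge")
    out = np.zeros_like(img)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def intrinsic_decompose(image, eps=1e-6):
    """Retinex-style split: image ≈ albedo * shading.

    Assumption (illustrative only): shading varies slowly, so it is
    taken as the blurred log intensity; albedo is what remains after
    dividing the shading back out.
    """
    log_i = np.log(image + eps)
    shading = np.exp(box_blur(log_i))
    albedo = image / (shading + eps)
    return albedo, shading

# Toy grayscale image in (0, 1]; the product of the two factors
# reconstructs the input by construction.
rng = np.random.default_rng(0)
img = rng.uniform(0.1, 1.0, size=(32, 32))
albedo, shading = intrinsic_decompose(img)
print(np.allclose(albedo * shading, img, atol=1e-3))
```

In the subject-driven editing setting described above, such a decomposition lets an editor relight the inserted subject by re-estimating shading from the target scene while keeping its albedo (appearance) unchanged.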