High-Resolution Image Synthesis with Latent Diffusion Models

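Computer science, Artificial intelligence, Pixel, Inference, Inpainting, Image translation, Computer vision, Image (mathematics)
Authors
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer
Identifier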
DOI:10.1109/cvpr52688.2022.01042
Abstract

By decomposing the image formation process into a sequential application of denoising autoencoders, diffusion models (DMs) achieve state-of-the-art synthesis results on image data and beyond. Additionally, their formulation allows for a guiding mechanism to control the image generation process without retraining. However, since these models typically operate directly in pixel space, optimization of powerful DMs often consumes hundreds of GPU days, and inference is expensive due to sequential evaluations. To enable DM training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders. In contrast to previous work, training diffusion models on such a representation makes it possible, for the first time, to reach a near-optimal point between complexity reduction and detail preservation, greatly boosting visual fidelity. By introducing cross-attention layers into the model architecture, we turn diffusion models into powerful and flexible generators for general conditioning inputs such as text or bounding boxes, and high-resolution synthesis becomes possible in a convolutional manner. Our latent diffusion models (LDMs) achieve new state-of-the-art scores for image inpainting and class-conditional image synthesis and highly competitive performance on various tasks, including unconditional image generation, text-to-image synthesis, and super-resolution, while significantly reducing computational requirements compared to pixel-based DMs.
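The central idea of the abstract, running the diffusion process on a compressed latent instead of on raw pixels, can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not taken from the paper: the average-pooling `toy_encode` merely stands in for a pretrained autoencoder's encoder, the downsampling factor `f`, the noise level `alpha_bar`, and all function names are hypothetical, and the noising rule is the standard DDPM-style forward step.

```python
import math
import random

def toy_encode(image, f):
    """Hypothetical stand-in for a pretrained encoder: average-pool an
    H x W grayscale image (list of lists) by a factor f per side.
    Assumes H and W are divisible by f."""
    h, w = len(image), len(image[0])
    latent = []
    for i in range(0, h, f):
        row = []
        for j in range(0, w, f):
            block = [image[a][b] for a in range(i, i + f) for b in range(j, j + f)]
            row.append(sum(block) / len(block))
        latent.append(row)
    return latent

def add_noise(latent, alpha_bar, rng):
    """DDPM-style forward noising applied to the latent, not the pixels:
    z_t = sqrt(alpha_bar) * z_0 + sqrt(1 - alpha_bar) * eps, eps ~ N(0, 1)."""
    s, n = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [[s * z + n * rng.gauss(0.0, 1.0) for z in row] for row in latent]

rng = random.Random(0)
image = [[rng.random() for _ in range(64)] for _ in range(64)]  # toy 64x64 "image"
z0 = toy_encode(image, f=8)                  # 8x8 latent
zt = add_noise(z0, alpha_bar=0.5, rng=rng)   # noised latent a denoiser would see

pixels = len(image) * len(image[0])
latents = len(z0) * len(z0[0])
print(pixels, latents, pixels // latents)    # 4096 64 64
```

In this toy setting each diffusion step touches 64 values instead of 4096, which is the kind of complexity reduction the abstract refers to. A real latent diffusion model would replace `toy_encode` with a learned encoder and pair it with a decoder that maps denoised latents back to pixels.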