2205.11487
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
23 May 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
Emily L. Denton
Seyed Kamyar Seyed Ghasemipour
Burcu Karagol Ayan
S. S. Mahdavi
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
Papers citing
"Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"
50 / 1,364 papers shown
Simpler Diffusion (SiD2): 1.5 FID on ImageNet512 with pixel-space diffusion
Emiel Hoogeboom
Thomas Mensink
Jonathan Heek
Kay Lamerigts
Ruiqi Gao
Tim Salimans
470
13
0
25 Oct 2024
Robust Watermarking Using Generative Priors Against Image Editing: From Benchmarking to Advances
Shilin Lu
Zihan Zhou
Jiayou Lu
Yuanzhi Zhu
A. Kong
WIGM
145
15
0
24 Oct 2024
Diff-Instruct++: Training One-step Text-to-image Generator Model to Align with Human Preferences
Weijian Luo
EGVM
114
9
0
24 Oct 2024
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Xiaoyu Zhang
Teng Zhou
Xinlong Zhang
Jia Wei
Yongchuan Tang
98
2
0
24 Oct 2024
Fast constrained sampling in pre-trained diffusion models
Alexandros Graikos
Nebojsa Jojic
Dimitris Samaras
DiffM
137
1
0
24 Oct 2024
FreCaS: Efficient Higher-Resolution Image Generation via Frequency-aware Cascaded Sampling
Zhengqiang Zhang
Ruihuang Li
Lei Zhang
104
3
0
24 Oct 2024
Progressive Compositionality in Text-to-Image Generative Models
Xu Han
Linghao Jin
Xiaofeng Liu
Paul Pu Liang
CoGe
149
4
0
22 Oct 2024
TopoDiffusionNet: A Topology-aware Diffusion Model
Saumya Gupta
Dimitris Samaras
Chong Chen
DiffM
146
4
0
22 Oct 2024
Triplane Grasping: Efficient 6-DoF Grasping with Single RGB Images
Yiming Li
Hanchi Ren
Yue Yang
Jingjing Deng
Xianghua Xie
116
0
0
21 Oct 2024
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao
Xuantong Liu
Xianbiao Qi
Shihao Zhao
Bojia Zi
Rong Xiao
Kai Han
Kwan-Yee K. Wong
198
3
0
18 Oct 2024
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model
ZiDong Wang
Zeyu Lu
Di Huang
Cai Zhou
Wanli Ouyang
Lei Bai
123
6
0
17 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
Xinming Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
174
32
0
17 Oct 2024
MagicTailor: Component-Controllable Personalization in Text-to-Image Diffusion Models
Donghao Zhou
Jiancheng Huang
J. Bai
Jiaze Wang
Hao Chen
Guangyong Chen
Xiaowei Hu
Pheng Ann Heng
146
5
0
17 Oct 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo
Shaobin Zhuang
Kunchang Li
Yu Qiao
Yali Wang
VLM
CLIP
128
1
0
16 Oct 2024
FlashAudio: Rectified Flows for Fast and High-Fidelity Text-to-Audio Generation
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
H. Lu
Zhou Zhao
Wei Xue
65
3
0
16 Oct 2024
Feature-guided score diffusion for sampling conditional densities
Zahra Kadkhodaie
S. Mallat
Eero P. Simoncelli
DiffM
90
0
0
15 Oct 2024
DIAR: Diffusion-model-guided Implicit Q-learning with Adaptive Revaluation
Jaehyun Park
Yunho Kim
Sejin Kim
Byung-Jun Lee
Sundong Kim
OffRL
83
1
0
15 Oct 2024
Ctrl-U: Robust Conditional Image Generation via Uncertainty-aware Reward Modeling
Guiyu Zhang
Huan-ang Gao
Zijian Jiang
Hao Zhao
Zhedong Zheng
EGVM
119
6
0
15 Oct 2024
High-Resolution Frame Interpolation with Patch-based Cascaded Diffusion
Junhwa Hur
Charles Herrmann
Saurabh Saxena
Janne Kontkanen
Wei-Sheng Lai
Yichang Shih
Michael Rubinstein
David J. Fleet
Deqing Sun
157
1
0
15 Oct 2024
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Luping Liu
Chao Du
Tianyu Pang
Zehan Wang
Chongxuan Li
Dong Xu
VLM
119
8
0
15 Oct 2024
big.LITTLE Vision Transformer for Efficient Visual Recognition
He Guo
Yulong Wang
Zixuan Ye
Jifeng Dai
Yuwen Xiong
ViT
92
0
0
14 Oct 2024
ForgeryGPT: Multimodal Large Language Model For Explainable Image Forgery Detection and Localization
Jiawei Li
Fanrui Zhang
Jiaying Zhu
Esther Sun
Qiang Zhang
Zheng-jun Zha
MLLM
160
14
0
14 Oct 2024
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Xiangru Zhu
Penglei Sun
Yaoxian Song
Yanghua Xiao
Zhixu Li
Chengyu Wang
Jun Huang
Bei Yang
Xiaoxiao Xu
EGVM
514
2
0
14 Oct 2024
Provable Convergence and Limitations of Geometric Tempering for Langevin Dynamics
Omar Chehab
Anna Korba
Austin Stromme
Adrien Vacher
165
4
0
13 Oct 2024
CtrLoRA: An Extensible and Efficient Framework for Controllable Image Generation
Yifeng Xu
Zhenliang He
Shiguang Shan
Xilin Chen
DiffM
65
6
0
12 Oct 2024
SceneCraft: Layout-Guided 3D Scene Generation
Xiuyu Yang
Yunze Man
Jun-Kun Chen
Yu-Xiong Wang
3DV
178
9
0
11 Oct 2024
Distillation of Discrete Diffusion through Dimensional Correlations
Satoshi Hayakawa
Yuhta Takida
Masaaki Imaizumi
Hiromi Wakaki
Yuki Mitsufuji
DiffM
172
4
0
11 Oct 2024
DART: Denoising Autoregressive Transformer for Scalable Text-to-Image Generation
Jiatao Gu
Yuyang Wang
Yizhe Zhang
Qihang Zhang
Dinghuai Zhang
Navdeep Jaitly
Josh Susskind
Shuangfei Zhai
DiffM
133
17
0
10 Oct 2024
Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation
Qingwen Bu
Hongyang Li
Li Chen
Jisong Cai
Jia Zeng
Heming Cui
Maoqing Yao
Yu Qiao
150
11
0
10 Oct 2024
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis
Jinbin Bai
Tian-Chun Ye
Wei Chow
Enxin Song
Qing-Guo Chen
Hefei Ling
Zhen Dong
Lei Zhu
162
19
0
10 Oct 2024
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Yukang Cao
Liang Pan
Kai Han
Kwan-Yee K. Wong
Ziwei Liu
VGen
129
6
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
184
102
0
09 Oct 2024
Pyramidal Flow Matching for Efficient Video Generative Modeling
Yang Jin
Zhicheng Sun
Ningyuan Li
Kun Xu
K. Xu
...
Nan Zhuang
Quzhe Huang
Yang Song
Yadong Mu
Zhouchen Lin
VGen
168
87
0
08 Oct 2024
Training-free Diffusion Model Alignment with Sampling Demons
Po-Hung Yeh
Kuang-Huei Lee
Jun-Cheng Chen
100
9
0
08 Oct 2024
Holistic Unlearning Benchmark: A Multi-Faceted Evaluation for Text-to-Image Diffusion Model Unlearning
Saemi Moon
M. Lee
Sangdon Park
Dongwoo Kim
94
3
0
08 Oct 2024
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon
Jong Chul Ye
DiffM
120
5
0
08 Oct 2024
Diffusion Models in 3D Vision: A Survey
Zhen Wang
Dongyuan Li
Xue Liu
Tianyu He
Jiang Bian
Renhe Jiang
MedIm
252
4
0
07 Oct 2024
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
Doohyuk Jang
Sihwan Park
J. Yang
Yeonsung Jung
Jihun Yun
Souvik Kundu
Sung-Yub Kim
Eunho Yang
139
10
0
04 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
246
19
0
03 Oct 2024
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models
Seyedmorteza Sadat
Otmar Hilliges
Romann M. Weber
DiffM
58
13
0
03 Oct 2024
EC-DIT: Scaling Diffusion Transformers with Adaptive Expert-Choice Routing
Haotian Sun
Tao Lei
Bowen Zhang
Yanghao Li
Haoshuo Huang
Ruoming Pang
Bo Dai
Nan Du
DiffM
MoE
195
9
0
02 Oct 2024
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
Pouyan Navard
Amin Karimi Monsefi
Mengxi Zhou
Wei-Lun Chao
Alper Yilmaz
R. Ramnath
DiffM
131
3
0
02 Oct 2024
Mining Your Own Secrets: Diffusion Classifier Scores for Continual Personalization of Text-to-Image Diffusion Models
Saurav Jha
Shiqi Yang
Masato Ishii
Mengjie Zhao
Christian Simon
Muhammad Jehanzeb Mirza
Dong Gong
Lina Yao
Shusuke Takahashi
Yuki Mitsufuji
DiffM
147
3
0
01 Oct 2024
Conditional Image Synthesis with Diffusion Models: A Survey
Zheyuan Zhan
Defang Chen
Jian-Ping Mei
Zhenghe Zhao
Jiawei Chen
Chun-Yen Chen
Siwei Lyu
Can Wang
VLM
107
10
0
28 Sep 2024
Discrete Policy: Learning Disentangled Action Space for Multi-Task Robotic Manipulation
Kun Wu
Yichen Zhu
Jinming Li
Junjie Wen
Ning Liu
Zhiyuan Xu
Qinru Qiu
184
8
0
27 Sep 2024
Multimodal Pragmatic Jailbreak on Text-to-image Models
Tong Liu
Zhixin Lai
Jiawen Wang
Gengyuan Zhang
Shuo Chen
Philip Torr
Vera Demberg
Volker Tresp
Jindong Gu
71
5
0
27 Sep 2024
Lotus: Diffusion-based Visual Foundation Model for High-quality Dense Prediction
Jing He
Haodong Li
Wei Yin
Yixun Liang
Leheng Li
Kaiqiang Zhou
Hongbo Zhang
Bingbing Liu
Ying-Cong Chen
DiffM
VLM
220
55
0
26 Sep 2024
Flexiffusion: Segment-wise Neural Architecture Search for Flexible Denoising Schedule
Hongtao Huang
Xiaojun Chang
L. Yao
79
0
0
26 Sep 2024
JoyType: A Robust Design for Multilingual Visual Text Creation
Chao Li
Chen Jiang
Xiaolong Liu
Jun Zhao
Guoxin Wang
DiffM
130
7
0
26 Sep 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
161
4
0
26 Sep 2024