2206.10789
Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
Papers citing
"Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"
50 / 899 papers shown
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang
Ligong Han
Arna Ghosh
Dimitris N. Metaxas
Jian Ren
DiffM
186
160
0
08 Dec 2022
X-Paste: Revisiting Scalable Copy-Paste for Instance Segmentation using CLIP and StableDiffusion
Hanqing Zhao
Dianmo Sheng
Jianmin Bao
Dongdong Chen
Dong Chen
...
Ce Liu
Wenbo Zhou
Qi Chu
Weiming Zhang
Neng H. Yu
VLM
DiffM
106
42
0
07 Dec 2022
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu
Xintao Wang
Yixiao Ge
Ying Shan
Xiaohu Qie
Mike Zheng Shou
DiffM
98
22
0
06 Dec 2022
Image Inpainting via Iteratively Decoupled Probabilistic Modeling
Wenbo Li
Xin Yu
Kun Zhou
Yibing Song
Zhe Lin
Jiaya Jia
DiffM
84
12
0
06 Dec 2022
M-VADER: A Model for Diffusion with Multimodal Context
Samuel Weinbach
Marco Bellagente
C. Eichenberg
Andrew M. Dai
R. Baldock
Souradeep Nanda
Bjorn Deiseroth
Koen Oostermeijer
H. Teufel
Andres Felipe Cruz Salinas
DiffM
189
11
0
06 Dec 2022
3D-LDM: Neural Implicit 3D Shape Generation with Latent Diffusion Models
Gimin Nam
Mariem Khlifi
Andrew Rodriguez
Alberto Tono
Linqi Zhou
Paul Guerrero
DiffM
78
68
0
01 Dec 2022
CLIPascene: Scene Sketching with Different Types and Levels of Abstraction
Yael Vinker
Yuval Alaluf
Daniel Cohen-Or
Ariel Shamir
CLIP
115
59
0
30 Nov 2022
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
155
738
0
30 Nov 2022
High-Fidelity Guided Image Synthesis with Latent Diffusion Models
Jaskirat Singh
Stephen Gould
Liang Zheng
DiffM
90
42
0
30 Nov 2022
Continuous diffusion for categorical data
Sander Dieleman
Laurent Sartran
Arman Roshannai
Nikolay Savinov
Yaroslav Ganin
...
Conor Durkan
Curtis Hawthorne
Rémi Leblond
Will Grathwohl
J. Adler
DiffM
121
106
0
28 Nov 2022
Unified Discrete Diffusion for Simultaneous Vision-Language Generation
Minghui Hu
Chuanxia Zheng
Heliang Zheng
Tat-Jen Cham
Chaoyue Wang
Zuopeng Yang
Dacheng Tao
Ponnuthurai Nagaratnam Suganthan
DiffM
131
26
0
27 Nov 2022
SpaText: Spatio-Textual Representation for Controllable Image Generation
Omri Avrahami
Thomas Hayes
Oran Gafni
Sonal Gupta
Yaniv Taigman
Devi Parikh
Dani Lischinski
Ohad Fried
Xiaoyue Yin
DiffM
133
210
0
25 Nov 2022
3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models
Gang Li
Heliang Zheng
Chaoyue Wang
Chang Li
C. Zheng
Dacheng Tao
DiffM
97
60
0
25 Nov 2022
Shifted Diffusion for Text-to-image Generation
Yufan Zhou
Bingchen Liu
Yizhe Zhu
Xiao Yang
Changyou Chen
Jinhui Xu
DiffM
135
45
0
24 Nov 2022
Paint by Example: Exemplar-based Image Editing with Diffusion Models
Binxin Yang
Shuyang Gu
Bo Zhang
Ting Zhang
Xuejin Chen
Xiaoyan Sun
Dong Chen
Fang Wen
DiffM
103
427
0
23 Nov 2022
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
R. Burgert
Kanchana Ranasinghe
Xiang Li
Michael S. Ryoo
DiffM
VLM
88
38
0
23 Nov 2022
ReCo: Region-Controlled Text-to-Image Generation
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Linjie Li
Kevin Qinghong Lin
...
Nan Duan
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
DiffM
105
150
0
23 Nov 2022
Inversion-Based Style Transfer with Diffusion Models
Yuxin Zhang
Nisha Huang
Fan Tang
Haibin Huang
Chongyang Ma
Weiming Dong
Changsheng Xu
DiffM
81
270
0
23 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-Jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
148
38
0
23 Nov 2022
Retrieval-Augmented Multimodal Language Modeling
Michihiro Yasunaga
Armen Aghajanyan
Weijia Shi
Rich James
J. Leskovec
Percy Liang
M. Lewis
Luke Zettlemoyer
Wen-tau Yih
RALM
104
108
0
22 Nov 2022
Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark
Vitali Petsiuk
Alexander E. Siemenn
Saisamrit Surbehera
Zad Chin
Keith Tyser
...
Ori Kerret
Tonio Buonassisi
Kate Saenko
Armando Solar-Lezama
Iddo Drori
VLM
56
36
0
22 Nov 2022
SceneComposer: Any-Level Semantic Image Synthesis
Yu Zeng
Zhe Lin
Jianming Zhang
Qing Liu
John Collomosse
Jason Kuen
Vishal M. Patel
DiffM
63
50
0
21 Nov 2022
VectorFusion: Text-to-SVG by Abstracting Pixel-Based Diffusion Models
Ajay Jain
Amber Xie
Pieter Abbeel
DiffM
87
95
0
21 Nov 2022
MagicVideo: Efficient Video Generation With Latent Diffusion Models
Daquan Zhou
Weimin Wang
Hanshu Yan
Weiwei Lv
Yizhe Zhu
Jiashi Feng
DiffM
VGen
129
390
0
20 Nov 2022
Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
Xichen Pan
Pengda Qin
Yuhong Li
Hui Xue
Wenhu Chen
DiffM
97
65
0
20 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
171
439
0
18 Nov 2022
Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models
Ninareh Mehrabi
Palash Goyal
Apurv Verma
Jwala Dhamala
Varun Kumar
Qian Hu
Kai-Wei Chang
R. Zemel
Aram Galstyan
Rahul Gupta
63
6
0
17 Nov 2022
Will Large-scale Generative Models Corrupt Future Datasets?
Ryuichiro Hataya
Han Bao
Hiromi Arai
59
58
0
15 Nov 2022
Cross-Reality Re-Rendering: Manipulating between Digital and Physical Realities
Siddhartha Datta
85
0
0
15 Nov 2022
A Novel Sampling Scheme for Text- and Image-Conditional Image Synthesis in Quantized Latent Spaces
Dominic Rampas
Pablo Pernias
Marc Aubreville
DiffM
61
12
0
14 Nov 2022
Large-Scale Bidirectional Training for Zero-Shot Image Captioning
Taehoon Kim
Mark A Marsden
Pyunghwan Ahn
Sangyun Kim
Sihaeng Lee
Alessandra Sala
S. Kim
VLM
62
4
0
13 Nov 2022
SSGVS: Semantic Scene Graph-to-Video Synthesis
Yuren Cong
Jinhui Yi
Bodo Rosenhahn
M. Yang
133
8
0
11 Nov 2022
Rickrolling the Artist: Injecting Backdoors into Text Encoders for Text-to-Image Synthesis
Lukas Struppek
Dominik Hintersdorf
Kristian Kersting
SILM
130
40
0
04 Nov 2022
Evaluating a Synthetic Image Dataset Generated with Stable Diffusion
Andreas Stöckl
82
23
0
03 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
213
832
0
02 Nov 2022
MagicMix: Semantic Mixing with Diffusion Models
Jun Hao Liew
Hanshu Yan
Daquan Zhou
Jiashi Feng
DiffM
233
64
0
28 Oct 2022
UPainting: Unified Text-to-Image Diffusion Generation with Cross-modal Guidance
Wei Li
Xue Xu
Xinyan Xiao
Jiacheng Liu
Hu Yang
...
Zhanpeng Wang
Zhifan Feng
Qiaoqiao She
Yajuan Lyu
Hua Wu
232
30
0
28 Oct 2022
Deep Generative Models on 3D Representations: A Survey
Zifan Shi
Sida Peng
Yinghao Xu
Andreas Geiger
Yiyi Liao
Yujun Shen
MedIm
3DV
98
0
0
27 Oct 2022
In-context Reinforcement Learning with Algorithm Distillation
Michael Laskin
Luyu Wang
Junhyuk Oh
Emilio Parisotto
Stephen Spencer
...
Ethan A. Brooks
Maxime Gazeau
Himanshu Sahni
Satinder Singh
Volodymyr Mnih
OffRL
82
133
0
25 Oct 2022
Lafite2: Few-shot Text-to-Image Generation
Yufan Zhou
Chunyuan Li
Changyou Chen
Jianfeng Gao
Jinhui Xu
DiffM
108
11
0
25 Oct 2022
Vitruvio: 3D Building Meshes via Single Perspective Sketches
Alberto Tono
Heyaojing Huang
Ashwin Agrawal
Martin Fischer
47
5
0
24 Oct 2022
Instance-Aware Image Completion
Ji-Ho Cho
Minguk Kang
Vibhav Vineet
Jaesik Park
ISeg
VLM
51
2
0
22 Oct 2022
SpaBERT: A Pretrained Language Model from Geographic Data for Geo-Entity Representation
Zekun Li
Jina Kim
Yao-Yi Chiang
Muhao Chen
133
31
0
21 Oct 2022
3DALL-E: Integrating Text-to-Image AI in 3D Design Workflows
Vivian Liu
Jo Vermeulen
G. Fitzmaurice
Justin Matejka
HAI
88
126
0
20 Oct 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
73
25
0
20 Oct 2022
Transcending Scaling Laws with 0.1% Extra Compute
Yi Tay
Jason W. Wei
Hyung Won Chung
Vinh Q. Tran
David R. So
...
Donald Metzler
Slav Petrov
N. Houlsby
Quoc V. Le
Mostafa Dehghani
LRM
109
71
0
20 Oct 2022
OCR-VQGAN: Taming Text-within-Image Generation
Juan A. Rodriguez
David Vazquez
I. Laradji
M. Pedersoli
Pau Rodríguez López
152
20
0
19 Oct 2022
Optimizing Hierarchical Image VAEs for Sample Quality
Eric Luhman
Troy Luhman
DRL
75
5
0
18 Oct 2022
Large-scale Text-to-Image Generation Models for Visual Artists' Creative Works
Hyung-Kwon Ko
Gwanmo Park
Hyeon Jeon
Jaemin Jo
Juho Kim
Jinwook Seo
107
142
0
16 Oct 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
231
3,520
0
16 Oct 2022