Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.10789
Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"
50 / 899 papers shown
Title
Online Detection of AI-Generated Images
David C. Epstein
Ishan Jain
Oliver Wang
Richard Y. Zhang
72
60
0
23 Oct 2023
Matryoshka Diffusion Models
Jiatao Gu
Shuangfei Zhai
Yizhen Zhang
Joshua M. Susskind
Navdeep Jaitly
DiffM
102
47
0
23 Oct 2023
Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation
Wenyu Guo
Qingkai Fang
Dong Yu
Yang Feng
75
7
0
20 Oct 2023
Object-aware Inversion and Reassembly for Image Editing
Zhen Yang
Dinggang Gui
Wen Wang
Hao Chen
Bohan Zhuang
Chunhua Shen
DiffM
102
19
0
18 Oct 2023
Getting aligned on representational alignment
Ilia Sucholutsky
Lukas Muttenthaler
Adrian Weller
Andi Peng
Andreea Bobu
...
Thomas Unterthiner
Andrew Kyle Lampinen
Klaus-Robert Muller
M. Toneva
Thomas Griffiths
158
93
0
18 Oct 2023
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
Abhaysinh Zala
Han Lin
Jaemin Cho
Mohit Bansal
91
16
0
18 Oct 2023
Scalable Diffusion for Materials Generation
Mengjiao Yang
KwangHwan Cho
Amil Merchant
Pieter Abbeel
Dale Schuurmans
Igor Mordatch
E. D. Cubuk
85
43
0
18 Oct 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
98
202
0
17 Oct 2023
Elucidating The Design Space of Classifier-Guided Diffusion Generation
Jiajun Ma
Tianyang Hu
Wei Cao
Jiacheng Sun
102
9
0
17 Oct 2023
A Survey on Video Diffusion Models
Zhen Xing
Qijun Feng
Haoran Chen
Qi Dai
Hang-Rui Hu
Hang Xu
Zuxuan Wu
Yu-Gang Jiang
EGVM
VGen
179
139
0
16 Oct 2023
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
Hanan Gani
Shariq Farooq Bhat
Muzammal Naseer
Salman Khan
Peter Wonka
DiffM
106
44
0
16 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
168
123
0
16 Oct 2023
LL-VQ-VAE: Learnable Lattice Vector-Quantization For Efficient Representations
Ahmed Khalil
Robert Piechocki
Raúl Santos-Rodríguez
54
2
0
13 Oct 2023
Compositional Abilities Emerge Multiplicatively: Exploring Diffusion Models on a Synthetic Task
Maya Okawa
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
CoGe
DiffM
119
65
0
13 Oct 2023
HyperHuman: Hyper-Realistic Human Generation with Latent Structural Diffusion
Xian Liu
Jian Ren
Aliaksandr Siarohin
Ivan Skorokhodov
Yanyu Li
Dahua Lin
Xihui Liu
Ziwei Liu
Sergey Tulyakov
86
61
0
12 Oct 2023
Idea2Img: Iterative Self-Refinement with GPT-4V(ision) for Automatic Image Design and Generation
Zhengyuan Yang
Jianfeng Wang
Linjie Li
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Lijuan Wang
LRM
MLLM
DiffM
35
25
0
12 Oct 2023
DrivingDiffusion: Layout-Guided multi-view driving scene video generation with latent diffusion model
Xiaofan Li
Yifu Zhang
Xiaoqing Ye
VGen
125
78
0
11 Oct 2023
Transformers for Green Semantic Communication: Less Energy, More Semantics
Shubhabrata Mukherjee
Cory Beard
Sejun Song
55
2
0
11 Oct 2023
FLATTEN: optical FLow-guided ATTENtion for consistent text-to-video editing
Yuren Cong
Mengmeng Xu
Christian Simon
Shoufa Chen
Jiawei Ren
Yanping Xie
Juan-Manuel Perez-Rua
Bodo Rosenhahn
Tao Xiang
Sen He
DiffM
VGen
122
87
0
09 Oct 2023
Language Model Beats Diffusion -- Tokenizer is Key to Visual Generation
Lijun Yu
José Lezama
N. B. Gundavarapu
Luca Versari
Kihyuk Sohn
...
Boqing Gong
Ming-Hsuan Yang
Irfan Essa
David A. Ross
Lu Jiang
137
325
0
09 Oct 2023
Perceptual Artifacts Localization for Image Synthesis Tasks
Lingzhi Zhang
Zhengjie Xu
Connelly Barnes
Yuqian Zhou
Qing Liu
He Zhang
Sohrab Amirghodsi
Zhe Lin
Eli Shechtman
Jianbo Shi
DiffM
87
27
0
09 Oct 2023
Demystifying Embedding Spaces using Large Language Models
Guy Tennenholtz
Yinlam Chow
Chih-Wei Hsu
Jihwan Jeong
Lior Shani
Azamat Tulepbergenov
Deepak Ramachandran
Martin Mladenov
Craig Boutilier
57
14
0
06 Oct 2023
Leveraging Unpaired Data for Vision-Language Generative Models via Cycle Consistency
Tianhong Li
Sangnie Bhardwaj
Yonglong Tian
Han Zhang
Jarred Barber
Dina Katabi
Guillaume Lajoie
Huiwen Chang
Dilip Krishnan
VLM
88
5
0
05 Oct 2023
Kandinsky: an Improved Text-to-Image Synthesis with Image Prior and Latent Diffusion
Anton Razzhigaev
Arseniy Shakhmatov
Anastasia Maltseva
V.Ya. Arkhipkin
Igor Pavlov
Ilya Ryabov
Angelina Kuts
Alexander Panchenko
Andrey Kuznetsov
Denis Dimitrov
118
82
0
05 Oct 2023
Hate Speech Detection in Limited Data Contexts using Synthetic Data Generation
Aman Khullar
Daniel K. Nkemelu
Cuong V. Nguyen
Michael L. Best
77
5
0
04 Oct 2023
The Role of Linguistic Priors in Measuring Compositional Generalization of Vision-Language Models
Chenwei Wu
Erran L. Li
Stefano Ermon
Patrick Haffner
Rong Ge
Zaiwei Zhang
VLM
CoGe
106
1
0
04 Oct 2023
MiniGPT-5: Interleaved Vision-and-Language Generation via Generative Vokens
Kaizhi Zheng
Xuehai He
Xin Eric Wang
MLLM
137
99
0
03 Oct 2023
TP2O: Creative Text Pair-to-Object Generation using Balance Swap-Sampling
Jun Li
Zedong Zhang
Jian Yang
DiffM
83
7
0
03 Oct 2023
Text-image Alignment for Diffusion-based Perception
Neehar Kondapaneni
Markus Marks
Manuel Knott
Rogério Guimarães
Pietro Perona
VLM
DiffM
118
34
0
29 Sep 2023
Intriguing properties of generative classifiers
P. Jaini
Kevin Clark
Robert Geirhos
BDL
106
39
0
28 Sep 2023
KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing
Jiarui Yao
Yifan Liu
Simon S. Du
Shifeng Chen
DiffM
64
24
0
28 Sep 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Ari Seff
Brian Cera
Dian Chen
Mason Ng
Aurick Zhou
Nigamaa Nayakanti
Khaled S. Refaat
Rami Al-Rfou
Benjamin Sapp
75
103
0
28 Sep 2023
Diverse and Aligned Audio-to-Video Generation via Text-to-Video Model Adaptation
Guy Yariv
Itai Gat
Sagie Benaim
Lior Wolf
Idan Schwartz
Yossi Adi
DiffM
VGen
108
41
0
28 Sep 2023
Targeted Image Data Augmentation Increases Basic Skills Captioning Robustness
Valentin Barriere
Felipe del Rio
Andres Carvallo De Ferari
Carlos Aspillaga
Eugenio Herrera-Berg
Cristian Buc Calderon
DiffM
65
0
0
27 Sep 2023
Emu: Enhancing Image Generation Models Using Photogenic Needles in a Haystack
Xiaoliang Dai
Ji Hou
Chih-Yao Ma
Sam S. Tsai
Jialiang Wang
...
Roshan Sumbaly
Vignesh Ramanathan
Zijian He
Peter Vajda
Devi Parikh
VLM
91
216
0
27 Sep 2023
Jointly Training Large Autoregressive Multimodal Models
Emanuele Aiello
L. Yu
Yixin Nie
Armen Aghajanyan
Barlas Oğuz
123
31
0
27 Sep 2023
FEC: Three Finetuning-free Methods to Enhance Consistency for Real Image Editing
Songyan Chen
Jiancheng Huang
DiffM
43
14
0
26 Sep 2023
Text-to-Image Generation for Abstract Concepts
Jiayi Liao
Xu Chen
Qiang Fu
Lun Du
Xiangnan He
Xiang Wang
Shi Han
Dongmei Zhang
111
16
0
26 Sep 2023
DreamLLM: Synergistic Multimodal Comprehension and Creation
Runpei Dong
Chunrui Han
Yuang Peng
Zekun Qi
Zheng Ge
...
Hao-Ran Wei
Xiangwen Kong
Xiangyu Zhang
Kaisheng Ma
Li Yi
MLLM
109
199
0
20 Sep 2023
Looking at words and points with attention: a benchmark for text-to-shape coherence
Andrea Amaduzzi
Giuseppe Lisanti
Samuele Salti
Luigi Di Stefano
49
2
0
14 Sep 2023
Masked Generative Modeling with Enhanced Sampling Scheme
Daesoo Lee
Erlend Aune
Sara Malacarne
DiffM
52
3
0
14 Sep 2023
InstaFlow: One Step is Enough for High-Quality Diffusion-Based Text-to-Image Generation
Xingchao Liu
Xiwen Zhang
Jianzhu Ma
Jian Peng
Qiang Liu
191
223
0
12 Sep 2023
PhotoVerse: Tuning-Free Image Customization with Text-to-Image Diffusion Models
Li Chen
Mengyi Zhao
Yiheng Liu
Mingxu Ding
Yangyang Song
...
Xu Wang
Hao Yang
Jing Liu
Kang Du
Min Zheng
DiffM
75
55
0
11 Sep 2023
ITI-GEN: Inclusive Text-to-Image Generation
Cheng Zhang
Xuanbai Chen
Siqi Chai
Chen Henry Wu
Dmitry Lagun
Thabo Beeler
Fernando de la Torre
VLM
122
58
0
11 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
115
507
0
11 Sep 2023
Unified Language-Vision Pretraining in LLM with Dynamic Discrete Visual Tokenization
Yang Jin
Kun Xu
Kun Xu
Liwei Chen
Chao Liao
...
Xiaoqiang Lei
Di Zhang
Wenwu Ou
Kun Gai
Yadong Mu
MLLM
VLM
79
50
0
09 Sep 2023
Exploring Sparse MoE in GANs for Text-conditioned Image Synthesis
Jiapeng Zhu
Ceyuan Yang
Kecheng Zheng
Yinghao Xu
Zifan Shi
Yujun Shen
MoE
97
8
0
07 Sep 2023
Reuse and Diffuse: Iterative Denoising for Text-to-Video Generation
Jiaxi Gu
Shicong Wang
Haoyu Zhao
Tianyi Lu
Xing Zhang
Zuxuan Wu
Songcen Xu
Wei Zhang
Yu-Gang Jiang
Hang Xu
DiffM
VGen
82
48
0
07 Sep 2023
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
L. Yu
Bowen Shi
Ramakanth Pasunuru
Benjamin Muller
O. Yu. Golovneva
...
Yaniv Taigman
Maryam Fazel-Zarandi
Asli Celikyilmaz
Luke Zettlemoyer
Armen Aghajanyan
MLLM
101
142
0
05 Sep 2023
Breaking Barriers to Creative Expression: Co-Designing and Implementing an Accessible Text-to-Image Interface
Atieh Taheri
Mohammad Izadi
Gururaj Shriram
Negar Rostamzadeh
Shaun Kane
DiffM
51
2
0
05 Sep 2023
Previous
1
2
3
...
10
11
12
...
16
17
18
Next