Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2206.10789
Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"
50 / 899 papers shown
Title
Unified Reward Model for Multimodal Understanding and Generation
Yibin Wang
Yuhang Zang
Hao Li
Cheng Jin
Jiadong Wang
EGVM
165
11
0
07 Mar 2025
Frequency Autoregressive Image Generation with Continuous Tokens
Hu Yu
Hao Luo
Hangjie Yuan
Yu Rong
Feng Zhao
VGen
94
10
0
07 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
188
1
0
03 Mar 2025
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu
Zhikai Li
Qingyi Gu
DiffM
69
0
0
03 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
179
2
0
02 Mar 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
223
11
0
27 Feb 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
Shanghang Zhang
172
0
0
27 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
Jing Liu
89
1
0
23 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
209
3
0
21 Feb 2025
Accelerating Diffusion Transformers with Token-wise Feature Caching
Chang Zou
Xuyang Liu
Ting Liu
Siteng Huang
Linfeng Zhang
172
24
0
20 Feb 2025
EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling
Theodoros Kouzelis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DRL
283
8
0
17 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
198
1
0
10 Feb 2025
LANTERN++: Enhancing Relaxed Speculative Decoding with Static Tree Drafting for Visual Auto-regressive Models
Sihwan Park
Doohyuk Jang
Sungyub Kim
Souvik Kundu
Eunho Yang
136
0
0
10 Feb 2025
FairT2I: Mitigating Social Bias in Text-to-Image Generation via Large Language Model-Assisted Detection and Attribute Rebalancing
Jinya Sakurai
Issei Sato
151
1
0
06 Feb 2025
HuViDPO:Enhancing Video Generation through Direct Preference Optimization for Human-Centric Alignment
Lifan Jiang
Boxi Wu
Jiahui Zhang
Xiaotong Guan
Shuang Chen
VGen
95
1
0
02 Feb 2025
CAT Pruning: Cluster-Aware Token Pruning For Text-to-Image Diffusion Models
Xinle Cheng
Zhuoming Chen
Zhihao Jia
DiffM
VLM
70
1
0
01 Feb 2025
PreciseCam: Precise Camera Control for Text-to-Image Generation
Edurne Bernal-Berdun
Ana Serrano
B. Masiá
Matheus Gadelha
Yannick Hold-Geoffroy
Xin Sun
Diego F. F. Gutierrez
DiffM
VGen
102
1
0
22 Jan 2025
VARGPT: Unified Understanding and Generation in a Visual Autoregressive Multimodal Large Language Model
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
MLLM
VLM
LRM
170
11
0
21 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
284
27
0
17 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
137
11
0
13 Jan 2025
Focus-N-Fix: Region-Aware Fine-Tuning for Text-to-Image Generation
Xiaoying Xing
Avinab Saha
Junfeng He
Susan Hao
Paul Vicol
...
Sahil Singla
Sarah Young
Yinxiao Li
Feng Yang
Deepak Ramachandran
DiffM
116
1
0
11 Jan 2025
INFELM: In-depth Fairness Evaluation of Large Text-To-Image Models
Di Jin
Xing Liu
Yu Liu
Jia Qing Yap
Andrea Wong
Adriana Crespo
Qi Lin
Zhiyuan Yin
Qiang Yan
Ryan Ye
EGVM
VLM
500
0
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu
Nuno Vasconcelos
Xinyu Wang
DiffM
89
6
0
08 Jan 2025
Learning the Language of Protein Structure
Benoit Gaujac
Jérémie Donà
Liviu Copoiu
Timothy Atkinson
Thomas Pierrot
Thomas D. Barrett
99
12
0
08 Jan 2025
Ethical-Lens: Curbing Malicious Usages of Open-Source Text-to-Image Models
Yuzhu Cai
Sheng Yin
Yuxi Wei
Chenxin Xu
Weibo Mao
Felix Juefei Xu
Siheng Chen
Yanfeng Wang
EGVM
200
3
0
03 Jan 2025
TexAVi: Generating Stereoscopic VR Video Clips from Text Descriptions
Vriksha Srihari
R. Bhavya
Shruti Jayaraman
V. Mary Anita Rajam
DiffM
VGen
128
0
0
02 Jan 2025
Grid Diffusion Models for Text-to-Video Generation
Taegyeong Lee
Soyeong Kwon
Taehwan Kim
155
8
0
31 Dec 2024
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Zhaoxiang Zhang
465
11
0
24 Dec 2024
When Worse is Better: Navigating the compression-generation tradeoff in visual tokenization
Vivek Ramanujan
Kushal Tirumala
Armen Aghajanyan
Luke Zettlemoyer
Ali Farhadi
DiffM
124
3
0
20 Dec 2024
Next Patch Prediction for Autoregressive Visual Generation
Yatian Pang
Peng Jin
Shuo Yang
Bin Lin
Bin Zhu
...
Liuhan Chen
Francis E. H. Tay
Ser-Nam Lim
Harry Yang
Li Yuan
253
10
0
19 Dec 2024
Parallelized Autoregressive Visual Generation
Yanjie Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
191
17
0
19 Dec 2024
Dialogue with the Machine and Dialogue with the Art World: Evaluating Generative AI for Culturally-Situated Creativity
Rida Qadri
Piotr Mirowski
Aroussiak Gabriellan
Farbod Mehr
Huma Gupta
Pamela Karimi
Remi Denton
119
1
0
18 Dec 2024
Dual-Schedule Inversion: Training- and Tuning-Free Inversion for Real Image Editing
Jiancheng Huang
Yi Huang
Jianzhuang Liu
Donghao Zhou
Yang Liu
Shifeng Chen
DiffM
156
2
0
15 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Yansen Wang
Kuan-Chieh Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
Xinze Wang
VGen
236
2
0
12 Dec 2024
[MASK] is All You Need
Vincent Tao Hu
Bjorn Ommer
DiffM
214
5
0
09 Dec 2024
Nested Diffusion Models Using Hierarchical Latent Priors
Xiao Zhang
Ruoxi Jiang
Rebecca Willett
Michael Maire
BDL
DiffM
118
1
0
08 Dec 2024
T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts
Ziwei Huang
Wanggui He
Quanyu Long
Yandi Wang
Haoyuan Li
...
Fangxun Shu
Long Chen
Hao Jiang
Leilei Gan
Leilei Gan
EGVM
521
4
0
05 Dec 2024
MFTF: Mask-free Training-free Object Level Layout Control Diffusion Model
Shan Yang
DiffM
85
0
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
261
4
0
02 Dec 2024
IQA-Adapter: Exploring Knowledge Transfer from Image Quality Assessment to Diffusion-based Generative Models
Khaled Abud
Sergey Lavrushkin
Alexey Kirillov
D. Vatolin
216
0
0
02 Dec 2024
DyMO: Training-Free Diffusion Model Alignment with Dynamic Multi-Objective Scheduling
Xin Xie
Dong Gong
179
1
0
01 Dec 2024
Continuous Concepts Removal in Text-to-image Diffusion Models
Tingxu Han
Weisong Sun
Yanrong Hu
Chunrong Fang
Yonglong Zhang
Shiqing Ma
Tao Zheng
Zhenyu Chen
Zhenting Wang
DiffM
190
3
0
30 Nov 2024
DreamBlend: Advancing Personalized Fine-tuning of Text-to-Image Diffusion Models
Shwetha Ram
T. Neiman
Qianli Feng
Andrew Stuart
S. D. Tran
Trishul Chilimbi
133
2
0
28 Nov 2024
Self-Cross Diffusion Guidance for Text-to-Image Synthesis of Similar Subjects
Weimin Qiu
Jieke Wang
Meng Tang
DiffM
185
1
0
28 Nov 2024
Enhancing MMDiT-Based Text-to-Image Models for Similar Subject Generation
Tianyi Wei
Dongdong Chen
Yifan Zhou
Xingang Pan
EGVM
137
3
0
27 Nov 2024
ModeDreamer: Mode Guiding Score Distillation for Text-to-3D Generation using Reference Image Prompts
Uy Dieu Tran
Minh Luu
P. Nguyen
K. Nguyen
Binh-Son Hua
DiffM
139
1
0
27 Nov 2024
Noise Diffusion for Enhancing Semantic Faithfulness in Text-to-Image Synthesis
Boming Miao
Cuiping Li
Xiaobei Wang
Andi Zhang
Rui Sun
Zizhe Wang
Yao Zhu
DiffM
126
0
0
25 Nov 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
251
6
0
25 Nov 2024
TPIE: Topology-Preserved Image Editing With Text Instructions
Nivetha Jayakumar
Srivardhan Reddy Gadila
Tonmoy Hossain
Yangfeng Ji
Miaomiao Zhang
DiffM
MedIm
139
0
0
22 Nov 2024
Text Embedding is Not All You Need: Attention Control for Text-to-Image Semantic Alignment with Text Self-Attention Maps
Jeeyung Kim
Erfan Esmaeili
Qiang Qiu
DiffM
137
1
0
21 Nov 2024
Previous
1
2
3
4
5
6
...
16
17
18
Next