ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2206.10789
  4. Cited By
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM
ArXiv (abs)PDFHTML

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 899 papers shown
Title
World Model on Million-Length Video And Language With Blockwise RingAttention
World Model on Million-Length Video And Language With Blockwise RingAttention
Hao Liu
Wilson Yan
Matei A. Zaharia
Pieter Abbeel
VGen
140
85
0
13 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model
  on 100K hours of data
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
113
88
0
12 Feb 2024
Rolling Diffusion Models
Rolling Diffusion Models
David Ruhe
Jonathan Heek
Tim Salimans
Emiel Hoogeboom
DiffM
100
41
0
12 Feb 2024
Human Aesthetic Preference-Based Large Text-to-Image Model
  Personalization: Kandinsky Generation as an Example
Human Aesthetic Preference-Based Large Text-to-Image Model Personalization: Kandinsky Generation as an Example
Aven Le Zhou
Yu-Ao Wang
Wei Wu
Kang Zhang
50
1
0
09 Feb 2024
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou
You Li
Fan Ma
Zongxin Yang
Yi Yang
DiffM
96
61
0
08 Feb 2024
CapHuman: Capture Your Moments in Parallel Universes
CapHuman: Capture Your Moments in Parallel Universes
Chao Liang
Fan Ma
Linchao Zhu
Yingying Deng
Yi Yang
DiffM
80
23
0
01 Feb 2024
Spatial-Aware Latent Initialization for Controllable Image Generation
Spatial-Aware Latent Initialization for Controllable Image Generation
Wenqiang Sun
Tengtao Li
Zehong Lin
Jun Zhang
94
11
0
29 Jan 2024
BootPIG: Bootstrapping Zero-shot Personalized Image Generation
  Capabilities in Pretrained Diffusion Models
BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models
Senthil Purushwalkam
Akash Gokul
Shafiq Joty
Nikhil Naik
DiffM
78
19
0
25 Jan 2024
Large-scale Reinforcement Learning for Diffusion Models
Large-scale Reinforcement Learning for Diffusion Models
Yinan Zhang
Eric Tzeng
Yilun Du
Dmitry Kislyuk
VLM
90
40
0
20 Jan 2024
Make-A-Shape: a Ten-Million-scale 3D Shape Model
Make-A-Shape: a Ten-Million-scale 3D Shape Model
Ka-Hei Hui
Aditya Sanghi
Arianna Rampini
Kamal Rahimi Malekshan
Zhengzhe Liu
Hooman Shayani
Chi-Wing Fu
DiffM
123
18
0
20 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via
  Multi-modal Feature Synchronizer
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
145
49
0
18 Jan 2024
DiffusionGPT: LLM-Driven Text-to-Image Generation System
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Jie Qin
Jie Wu
Weifeng Chen
Yuxi Ren
Huixian Li
Hefeng Wu
Xuefeng Xiao
Rui Wang
S. Wen
DiffM
105
35
0
18 Jan 2024
WorldDreamer: Towards General World Models for Video Generation via
  Predicting Masked Tokens
WorldDreamer: Towards General World Models for Video Generation via Predicting Masked Tokens
Xiaofeng Wang
Zheng Zhu
Guan Huang
Boyuan Wang
Xinze Chen
Jiwen Lu
VGen
76
41
0
18 Jan 2024
HexaGen3D: StableDiffusion is just one step away from Fast and Diverse
  Text-to-3D Generation
HexaGen3D: StableDiffusion is just one step away from Fast and Diverse Text-to-3D Generation
Antoine Mercier
Ramin Nakhli
Mahesh Reddy
R. Yasarla
Hong Cai
Fatih Porikli
Guillaume Berger
DiffM
97
16
0
15 Jan 2024
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for
  Text-to-Image Generation
Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation
Seung Hyun Lee
Yinxiao Li
Junjie Ke
Innfarn Yoo
Han Zhang
...
Junfeng He
Gang Li
Sangpil Kim
Irfan Essa
Feng Yang
EGVM
100
24
0
11 Jan 2024
Concept Alignment
Concept Alignment
Sunayana Rane
Polyphony J. Bruna
Ilia Sucholutsky
Christopher Kello
Thomas Griffiths
CVBM
69
8
0
09 Jan 2024
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with
  Large Language Models
Uni3D-LLM: Unifying Point Cloud Perception, Generation and Editing with Large Language Models
Dingning Liu
Xiaoshui Huang
Yuenan Hou
Zhihui Wang
Zhen-fei Yin
Yongshun Gong
Peng Gao
Wanli Ouyang
49
11
0
09 Jan 2024
Improving Diffusion-Based Image Synthesis with Context Prediction
Improving Diffusion-Based Image Synthesis with Context Prediction
Ling Yang
Jingwei Liu
Shenda Hong
Zhilong Zhang
Zhilin Huang
Zheming Cai
Wentao Zhang
Tengjiao Wang
DiffM
89
36
0
04 Jan 2024
Instruct-Imagen: Image Generation with Multi-modal Instruction
Instruct-Imagen: Image Generation with Multi-modal Instruction
Hexiang Hu
Kelvin C. K. Chan
Yu-Chuan Su
Wenhu Chen
Yandong Li
...
Xue Ben
Boqing Gong
William W. Cohen
Ming-Wei Chang
Xuhui Jia
MLLM
139
50
0
03 Jan 2024
Image Sculpting: Precise Object Editing with 3D Geometry Control
Image Sculpting: Precise Object Editing with 3D Geometry Control
Jiraphon Yenphraphai
Xichen Pan
Sainan Liu
Daniele Panozzo
Saining Xie
75
22
0
02 Jan 2024
Improving Image Restoration through Removing Degradations in Textual
  Representations
Improving Image Restoration through Removing Degradations in Textual Representations
Jingbo Lin
Zhilu Zhang
Yuxiang Wei
Dongwei Ren
Dongsheng Jiang
Wangmeng Zuo
72
30
0
28 Dec 2023
ZONE: Zero-Shot Instruction-Guided Local Editing
ZONE: Zero-Shot Instruction-Guided Local Editing
Shanglin Li
Bo-Wen Zeng
Yutang Feng
Sicheng Gao
Xuhui Liu
...
Li Lin
Xu Tang
Yao Hu
Jianzhuang Liu
Baochang Zhang
DiffM
101
35
0
28 Dec 2023
Semantic Guidance Tuning for Text-To-Image Diffusion Models
Hyun Kang
Dohae Lee
Myungjin Shin
In-Kwon Lee
51
1
0
26 Dec 2023
Cross Initialization for Personalized Text-to-Image Generation
Cross Initialization for Personalized Text-to-Image Generation
Lianyu Pang
Jian Yin
Haoran Xie
Qiping Wang
Qing Li
Xudong Mao
DiffM
100
7
0
26 Dec 2023
Emage: Non-Autoregressive Text-to-Image Generation
Emage: Non-Autoregressive Text-to-Image Generation
Zhangyin Feng
Runyi Hu
Liangxin Liu
Fan Zhang
Duyu Tang
Yong Dai
Xiaocheng Feng
Jiwei Li
Bing Qin
Shuming Shi
DiffMVLM
78
0
0
22 Dec 2023
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Generative AI Beyond LLMs: System Implications of Multi-Modal Generation
Alicia Golden
Samuel Hsia
Fei Sun
Bilge Acun
Basil Hosmer
...
Zachary DeVito
Jeff Johnson
Gu-Yeon Wei
David Brooks
Carole-Jean Wu
VLMDiffM
118
8
0
22 Dec 2023
VideoPoet: A Large Language Model for Zero-Shot Video Generation
VideoPoet: A Large Language Model for Zero-Shot Video Generation
Dan Kondratyuk
Lijun Yu
Xiuye Gu
José Lezama
Jonathan Huang
...
Irfan Essa
Huisheng Wang
David A. Ross
Bryan Seybold
Lu Jiang
VGen
152
273
0
21 Dec 2023
DreamTuner: Single Image is Enough for Subject-Driven Generation
DreamTuner: Single Image is Enough for Subject-Driven Generation
Miao Hua
Jiawei Liu
Fei Ding
Wei Liu
Jie Wu
Qian He
70
31
0
21 Dec 2023
StarVector: Generating Scalable Vector Graphics Code from Images and Text
StarVector: Generating Scalable Vector Graphics Code from Images and Text
Juan A. Rodriguez
Shubham Agarwal
I. Laradji
Pau Rodríguez
P. Rodríguez
Sai Rajeswar
David Vazquez
Christopher Pal
M. Pedersoli
102
11
0
17 Dec 2023
Rich Human Feedback for Text-to-Image Generation
Rich Human Feedback for Text-to-Image Generation
Youwei Liang
Junfeng He
Gang Li
Peizhao Li
Arseniy Klimovskiy
...
Yiwen Luo
Yang Li
Kai Kohlhoff
Deepak Ramachandran
Vidhya Navalpakkam
EGVM
83
86
0
15 Dec 2023
HeadArtist: Text-conditioned 3D Head Generation with Self Score
  Distillation
HeadArtist: Text-conditioned 3D Head Generation with Self Score Distillation
Hongyu Liu
Xuan Wang
Bo Liu
Yujun Shen
Yibing Song
Jing Liao
Qifeng Chen
DiffM
102
17
0
12 Dec 2023
Photorealistic Video Generation with Diffusion Models
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
135
201
0
11 Dec 2023
4M: Massively Multimodal Masked Modeling
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Ouguzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
99
74
0
11 Dec 2023
ControlNet-XS: Designing an Efficient and Effective Architecture for
  Controlling Text-to-Image Diffusion Models
ControlNet-XS: Designing an Efficient and Effective Architecture for Controlling Text-to-Image Diffusion Models
Denis Zavadski
Johann-Friedrich Feiden
Carsten Rother
DiffM
81
8
0
11 Dec 2023
Stellar: Systematic Evaluation of Human-Centric Personalized
  Text-to-Image Methods
Stellar: Systematic Evaluation of Human-Centric Personalized Text-to-Image Methods
Panos Achlioptas
Alexandros Benetatos
Iordanis Fostiropoulos
Dimitris Skourtis
119
9
0
11 Dec 2023
TabMT: Generating tabular data with masked transformers
TabMT: Generating tabular data with masked transformers
Manbir Gulati
Paul F. Roysdon
LMTD
99
38
0
11 Dec 2023
CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image
  Diffusion Models
CONFORM: Contrast is All You Need For High-Fidelity Text-to-Image Diffusion Models
Tuna Han Salih Meral
Enis Simsar
Federico Tombari
Pinar Yanardag
DiffMVLM
116
34
0
11 Dec 2023
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization
  Inversion for Zero-Shot Video Editing
A Video is Worth 256 Bases: Spatial-Temporal Expectation-Maximization Inversion for Zero-Shot Video Editing
Maomao Li
Yu Li
Tianyu Yang
Yunfei Liu
Dongxu Yue
Zhihui Lin
Dong Xu
VGen
36
9
0
10 Dec 2023
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained
  Object Insertion and Layout Control
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh
Jianming Zhang
Qing Liu
Cameron Smith
Zhe Lin
Liang Zheng
DiffM
80
11
0
08 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIPVLM
82
68
0
07 Dec 2023
Gen2Det: Generate to Detect
Gen2Det: Generate to Detect
Saksham Suri
Fanyi Xiao
Animesh Sinha
Sean Culatana
Raghuraman Krishnamoorthi
Chenchen Zhu
Abhinav Shrivastava
VLMDiffM
91
10
0
07 Dec 2023
GenTron: Diffusion Transformers for Image and Video Generation
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen
Mengmeng Xu
Jiawei Ren
Yuren Cong
Sen He
Yanping Xie
Animesh Sinha
Ping Luo
Tao Xiang
Juan-Manuel Perez-Rua
VGen
99
41
0
07 Dec 2023
Generating Illustrated Instructions
Generating Illustrated Instructions
Sachit Menon
Ishan Misra
Rohit Girdhar
DiffM
86
5
0
07 Dec 2023
KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion
  Models for Text-to-Image Synthesis
KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis
Youngwan Lee
Kwanyong Park
Yoorhim Cho
Yong-Ju Lee
Sung Ju Hwang
VLM
68
6
0
07 Dec 2023
Diffusion Illusions: Hiding Images in Plain Sight
Diffusion Illusions: Hiding Images in Plain Sight
R. Burgert
Xiang Li
Abe Leite
Kanchana Ranasinghe
Michael S. Ryoo
118
17
0
06 Dec 2023
Language-Informed Visual Concept Learning
Language-Informed Visual Concept Learning
Sharon Lee
Yunzhi Zhang
Shangzhe Wu
Jiajun Wu
CoGe
72
9
0
06 Dec 2023
Make-A-Storyboard: A General Framework for Storyboard with Disentangled
  and Merged Control
Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control
Jingkuan Song
Litao Guo
Lianli Gao
Hengtao Shen
Jingkuan Song
DiffM
70
3
0
06 Dec 2023
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer
Bichen Wu
Edgar Schoenfeld
Xiaoliang Dai
Ji Hou
...
Jonas Kohler
Christian Rupprecht
Zorah Lähner
Peter Vajda
Jialiang Wang
DiffM
95
78
0
06 Dec 2023
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Mismatch Quest: Visual and Textual Feedback for Image-Text Misalignment
Brian Gordon
Yonatan Bitton
Yonatan Shafir
Roopal Garg
Xi Chen
Dani Lischinski
Daniel Cohen-Or
Idan Szpektor
87
12
0
05 Dec 2023
Describing Differences in Image Sets with Natural Language
Describing Differences in Image Sets with Natural Language
Lisa Dunlap
Yuhui Zhang
Xiaohan Wang
Ruiqi Zhong
Trevor Darrell
Jacob Steinhardt
Joseph E. Gonzalez
Serena Yeung-Levy
CoGeVLM
125
32
0
05 Dec 2023
Previous
123...8910...161718
Next