arXiv: 2206.10789
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 874 papers shown
High-Fidelity Image Compression with Score-based Generative Models
Emiel Hoogeboom
E. Agustsson
Fabian Mentzer
Luca Versari
G. Toderici
Lucas Theis
DiffM
21
39
0
26 May 2023
Improved Visual Story Generation with Adaptive Context Modeling
Zhangyin Feng
Yuchen Ren
Xinmiao Yu
Xiaocheng Feng
Duyu Tang
Shuming Shi
Bing Qin
DiffM
40
14
0
26 May 2023
Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models
Shihao Zhao
Dongdong Chen
Yen-Chun Chen
Jianmin Bao
Shaozhe Hao
Lu Yuan
Kwan-Yee K. Wong
29
238
0
25 May 2023
Image as First-Order Norm+Linear Autoregression: Unveiling Mathematical Invariance
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Lu Yuan
Zicheng Liu
Youzuo Lin
33
2
0
25 May 2023
Break-A-Scene: Extracting Multiple Concepts from a Single Image
Omri Avrahami
Kfir Aberman
Ohad Fried
Daniel Cohen-Or
Dani Lischinski
VLM
DiffM
35
165
0
25 May 2023
Prompt-Free Diffusion: Taking "Text" out of Text-to-Image Diffusion Models
Xingqian Xu
Jiayi Guo
Zhangyang Wang
Gao Huang
Irfan Essa
Humphrey Shi
VLM
DiffM
40
57
0
25 May 2023
GenerateCT: Text-Conditional Generation of 3D Chest CT Volumes
Ibrahim Ethem Hamamci
Sezgin Er
Anjany Sekuboyina
Enis Simsar
A. Tezcan
...
Hadrien Reynaud
Sarthak Pati
Christian Bluethgen
M. K. Özdemir
Bjoern H. Menze
DiffM
MedIm
50
16
0
25 May 2023
A Neural Space-Time Representation for Text-to-Image Personalization
Yuval Alaluf
Elad Richardson
G. Metzer
Daniel Cohen-Or
DiffM
35
94
0
24 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
38
50
0
24 May 2023
I Spy a Metaphor: Large Language Models and Diffusion Models Co-Create Visual Metaphors
Tuhin Chakrabarty
Arkadiy Saakyan
Olivia Winn
Artemis Panagopoulou
Yue Yang
Marianna Apidianaki
Smaranda Muresan
DiffM
33
41
0
24 May 2023
Vision + Language Applications: A Survey
Yutong Zhou
N. Shimada
VLM
35
6
0
24 May 2023
Diffusion Hyperfeatures: Searching Through Time and Space for Semantic Correspondence
Grace Luo
Lisa Dunlap
Dong Huk Park
Aleksander Holynski
Trevor Darrell
42
119
0
23 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
26
2
0
23 May 2023
Training Priors Predict Text-To-Image Model Performance
Charles Lovering
Ellie Pavlick
CoGe
43
3
0
23 May 2023
Enhancing Detail Preservation for Customized Text-to-Image Generation: A Regularization-Free Approach
Yufan Zhou
Ruiyi Zhang
Tongfei Sun
Jinhui Xu
DiffM
109
38
0
23 May 2023
If at First You Don't Succeed, Try, Try Again: Faithful Diffusion-based Text-to-Image Generation by Selection
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
36
21
0
22 May 2023
ControlVideo: Training-free Controllable Text-to-Video Generation
Yabo Zhang
Yuxiang Wei
Dongsheng Jiang
Xiaopeng Zhang
W. Zuo
Qi Tian
VGen
DiffM
48
237
0
22 May 2023
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
44
54
0
22 May 2023
The Waymo Open Sim Agents Challenge
Nico Montali
John Lambert
Paul Mougin
Alex Kuefler
Nick Rhinehart
...
Tristan Emrich
Zoey Yang
Shimon Whiteson
Brandyn White
Drago Anguelov
LLMAG
43
46
0
19 May 2023
AI's Regimes of Representation: A Community-centered Study of Text-to-Image Models in South Asia
Rida Qadri
Renee Shelby
Cynthia L. Bennett
Emily Denton
31
67
0
19 May 2023
Towards Accurate Image Coding: Improved Autoregressive Image Generation with Dynamic Vector Quantization
Mengqi Huang
Zhendong Mao
Zhuowei Chen
Yongdong Zhang
MQ
38
36
0
19 May 2023
Efficient Cross-Lingual Transfer for Chinese Stable Diffusion with Images as Pivots
Jinyi Hu
Xu Han
Xiaoyuan Yi
Yutong Chen
Wenhao Li
Zhiyuan Liu
Maosong Sun
DiffM
33
4
0
19 May 2023
A Survey of Safety and Trustworthiness of Large Language Models through the Lens of Verification and Validation
Xiaowei Huang
Wenjie Ruan
Wei Huang
Gao Jin
Yizhen Dong
...
Sihao Wu
Peipei Xu
Dengyu Wu
André Freitas
Mustafa A. Mustafa
ALM
49
83
0
19 May 2023
Inspecting the Geographical Representativeness of Images from Text-to-Image Models
Aparna Basu
R. Venkatesh Babu
Danish Pruthi
DiffM
31
39
0
18 May 2023
X-IQE: eXplainable Image Quality Evaluation for Text-to-Image Generation with Visual Large Language Models
Yixiong Chen
Li Liu
C. Ding
34
21
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
59
74
0
17 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
27
1
0
16 May 2023
DATED: Guidelines for Creating Synthetic Datasets for Engineering Design Applications
Cyril Picard
Jürg Schiffmann
Faez Ahmed
45
8
0
15 May 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Yuyang Zhao
Enze Xie
Lanqing Hong
Zhenguo Li
G. Lee
DiffM
VGen
38
33
0
15 May 2023
MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
Jinsheng Zheng
Daqing Liu
Chaoyue Wang
Minghui Hu
Zuopeng Yang
Changxing Ding
Dacheng Tao
34
1
0
10 May 2023
Recommender Systems with Generative Retrieval
Shashank Rajput
Nikhil Mehta
Anima Singh
Raghunandan H. Keshavan
T. Vu
...
Vinh Q. Tran
Jonah Samost
Maciej Kula
Ed H. Chi
M. Sathiamoorthy
RALM
3DV
31
77
0
08 May 2023
ReGeneration Learning of Diffusion Models with Rich Prompts for Zero-Shot Image Translation
Yupei Lin
Senyang Zhang
Xiaojun Yang
Tianlin Li
Yukai Shi
DiffM
38
5
0
08 May 2023
Towards Prompt-robust Face Privacy Protection via Adversarial Decoupling Augmentation Framework
Ruijia Wu
Yuhang Wang
Huafeng Shi
Zhipeng Yu
Yichao Wu
Ding Liang
DiffM
29
9
0
06 May 2023
Controllable Visual-Tactile Synthesis
Ruihan Gao
Wenzhen Yuan
Jun-Yan Zhu
DiffM
22
6
0
04 May 2023
Shap-E: Generating Conditional 3D Implicit Functions
Heewoo Jun
Alex Nichol
DiffM
203
311
0
03 May 2023
Nonparametric Generative Modeling with Conditional Sliced-Wasserstein Flows
Chao Du
Tianbo Li
Tianyu Pang
Shuicheng Yan
Min Lin
DiffM
BDL
47
12
0
03 May 2023
DreamPaint: Few-Shot Inpainting of E-Commerce Items for Virtual Try-On without 3D Modeling
M. S. Seyfioglu
Karim Bouyarmane
Suren Kumar
A. Tavanaei
Ismail B. Tutar
DiffM
53
4
0
02 May 2023
Let the Chart Spark: Embedding Semantic Context into Chart with Text-to-Image Generative Model
Shishi Xiao
Suizi Huang
Yue Lin
Yilin Ye
Weizhen Zeng
41
31
0
28 Apr 2023
IconShop: Text-Guided Vector Icon Synthesis with Autoregressive Transformers
Rong Wu
Wanchao Su
Kede Ma
Jing Liao
35
34
0
27 Apr 2023
Energy-based Models are Zero-Shot Planners for Compositional Scene Rearrangement
N. Gkanatsios
Ayush Jain
Zhou Xian
Yunchu Zhang
C. Atkeson
Katerina Fragkiadaki
LM&Ro
98
31
0
27 Apr 2023
TR0N: Translator Networks for 0-Shot Plug-and-Play Conditional Generation
Zhaoyan Liu
Noël Vouitsis
S. Gorti
Jimmy Ba
G. Loaiza-Ganem
ViT
33
1
0
26 Apr 2023
Seeing is not always believing: Benchmarking Human and Model Perception of AI-Generated Images
Zeyu Lu
Di Huang
Lei Bai
Jingjing Qu
Chengzhi Wu
Xihui Liu
Wanli Ouyang
26
53
0
25 Apr 2023
TextMesh: Generation of Realistic 3D Meshes From Text Prompts
Christina Tsalicoglou
Fabian Manhardt
A. Tonioni
Michael Niemeyer
F. Tombari
DiffM
22
130
0
24 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
50
275
0
24 Apr 2023
Evolving Three Dimension (3D) Abstract Art: Fitting Concepts by Language
Yingtao Tian
24
1
0
24 Apr 2023
Align your Latents: High-Resolution Video Synthesis with Latent Diffusion Models
A. Blattmann
Robin Rombach
Huan Ling
Tim Dockhorn
Seung Wook Kim
Sanja Fidler
Karsten Kreis
3DGS
VGen
133
1,019
0
18 Apr 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
151
4,325
0
17 Apr 2023
Latent-Shift: Latent Diffusion with Temporal Shift for Efficient Text-to-Video Generation
Jie An
Songyang Zhang
Harry Yang
Sonal Gupta
Jia-Bin Huang
Jiebo Luo
Xiaoyue Yin
DiffM
VGen
38
107
0
17 Apr 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
Ming Cao
Xintao Wang
Zhongang Qi
Ying Shan
Xiaohu Qie
Yinqiang Zheng
DiffM
42
430
0
17 Apr 2023
AutoSplice: A Text-prompt Manipulated Image Dataset for Media Forensics
Shan Jia
Mingzhen Huang
Zhou Zhou
Yan Ju
Jialing Cai
Siwei Lyu
DiffM
29
29
0
14 Apr 2023