Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

22 June 2022
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
Zirui Wang
Vijay Vasudevan
Alexander Ku
Yinfei Yang
Burcu Karagol Ayan
Ben Hutchinson
Wei Han
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
    EGVM

Papers citing "Scaling Autoregressive Models for Content-Rich Text-to-Image Generation"

50 / 899 papers shown
Confidence-aware Reward Optimization for Fine-tuning Text-to-Image Models
Kyuyoung Kim
Jongheon Jeong
Minyong An
Mohammad Ghavamzadeh
Krishnamurthy Dvijotham
Jinwoo Shin
Kimin Lee
EGVM
81
6
0
02 Apr 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
Biao Jiang
Xin Chen
C. Zhang
Fukun Yin
Zhuoyuan Li
Gang Yu
Jiayuan Fan
VGen, LRM
96
11
0
02 Apr 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
88
13
0
01 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
150
171
0
01 Apr 2024
A Unified and Interpretable Emotion Representation and Expression Generation
Reni Paskaleva
Mykyta Holubakha
Andela Ilic
Saman Motamed
Luc Van Gool
D. Paudel
62
6
0
01 Apr 2024
Uncovering the Text Embedding in Text-to-Image Diffusion Models
Huikang Yu
Hao Luo
Fan Wang
Feng Zhao
79
10
0
01 Apr 2024
DiffAgent: Fast and Accurate Text-to-Image API Selection with Large Language Model
Lirui Zhao
Yue Yang
Kaipeng Zhang
Wenqi Shao
Yuxin Zhang
Yu Qiao
Ping Luo
Rongrong Ji
LM&Ro, LLMAG, VLM
70
3
0
31 Mar 2024
BAMM: Bidirectional Autoregressive Motion Model
Ekkasit Pinyoanuntapong
Muhammad Usama Saleem
Pu Wang
Minwoo Lee
Srijan Das
Chong Chen
VGen
61
25
0
28 Mar 2024
DreamSalon: A Staged Diffusion Framework for Preserving Identity-Context in Editable Face Generation
Haonan Lin
Mengmeng Wang
Yan Chen
Wenbin An
Yuzhe Yao
Guang Dai
Qianying Wang
Yong-Jin Liu
Jingdong Wang
DiffM
81
4
0
28 Mar 2024
Automated Black-box Prompt Engineering for Personalized Text-to-Image Generation
Yutong He
Alexander Robey
Naoki Murata
Yiding Jiang
J. Williams
George Pappas
Hamed Hassani
Yuki Mitsufuji
Ruslan Salakhutdinov
J. Zico Kolter
DiffM
151
5
0
28 Mar 2024
TextCraftor: Your Text Encoder Can be Image Quality Controller
Yanyu Li
Xian Liu
Anil Kag
Ju Hu
Yerlan Idelbayev
Dhritiman Sagar
Yanzhi Wang
Sergey Tulyakov
Jian Ren
93
18
0
27 Mar 2024
Attention Calibration for Disentangled Text-to-Image Personalization
Yanbing Zhang
Mengping Yang
Qin Zhou
Zhe Wang
113
17
0
27 Mar 2024
Improving Text-to-Image Consistency via Automatic Prompt Optimization
Oscar Manas
Pietro Astolfi
Melissa Hall
Candace Ross
Jack Urbanek
Adina Williams
Aishwarya Agrawal
Adriana Romero Soriano
M. Drozdzal
80
36
0
26 Mar 2024
Skews in the Phenomenon Space Hinder Generalization in Text-to-Image Generation
Yingshan Chang
Yasi Zhang
Zhiyuan Fang
Yingnian Wu
Yonatan Bisk
Feng Gao
EGVM
116
7
0
25 Mar 2024
Generative Active Learning for Image Synthesis Personalization
Xu-Lu Zhang
Wengyu Zhang
Xiao Wei
Jinlin Wu
Zhaoxiang Zhang
Zhen Lei
Qing Li
142
4
0
22 Mar 2024
CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model
S. Han
Joohee Kim
DiffM, CLIP
70
2
0
22 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM, LRM
119
47
0
19 Mar 2024
Can AI Outperform Human Experts in Creating Social Media Creatives?
Eunkyung Park
Raymond K. Wong
Junbum Kwon
62
0
0
19 Mar 2024
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation
Axel Sauer
Frederic Boesel
Tim Dockhorn
A. Blattmann
Patrick Esser
Robin Rombach
DiffM
111
135
0
18 Mar 2024
LayerDiff: Exploring Text-guided Multi-layered Composable Image Synthesis via Layer-Collaborative Diffusion Model
Runhu Huang
Kaixin Cai
Jianhua Han
Xiaodan Liang
Renjing Pei
Guansong Lu
Songcen Xu
Wei Zhang
Hang Xu
DiffM
80
5
0
18 Mar 2024
LogicalDefender: Discovering, Extracting, and Utilizing Common-Sense Knowledge
Yuhe Liu
Mengxue Kang
Zengchang Qin
Xiangxiang Chu
NAI, VLM
56
0
0
18 Mar 2024
Automated data processing and feature engineering for deep learning and big data applications: a survey
A. Mumuni
F. Mumuni
TPM
84
60
0
18 Mar 2024
Reward Guided Latent Consistency Distillation
Jiachen Li
Weixi Feng
Wenhu Chen
William Y. Wang
EGVM
82
15
0
16 Mar 2024
Desigen: A Pipeline for Controllable Design Template Generation
Haohan Weng
Danqing Huang
Yu Qiao
Zheng Hu
Chin-Yew Lin
Tong Zhang
Chong Chen
DiffM
67
17
0
14 Mar 2024
Follow-Your-Click: Open-domain Regional Image Animation via Short Prompts
Yue Ma
Yin-Yin He
Hongfa Wang
Andong Wang
Chenyang Qi
...
Xiu Li
Zhifeng Li
H. Shum
Wei Liu
Qifeng Chen
VGen, DiffM
161
43
0
13 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAG, VGen
81
8
0
12 Mar 2024
SELMA: Learning and Merging Skill-Specific Text-to-Image Experts with Auto-Generated Data
Jialu Li
Jaemin Cho
Yi-Lin Sung
Jaehong Yoon
Mohit Bansal
MoMe, DiffM
99
9
0
11 Mar 2024
DivCon: Divide and Conquer for Progressive Text-to-Image Generation
Yuhao Jia
Wenhan Tan
DiffM
104
1
0
11 Mar 2024
VideoElevator: Elevating Video Generation Quality with Versatile Text-to-Image Diffusion Models
Yabo Zhang
Yuxiang Wei
Xianhui Lin
Zheng Hui
Peiran Ren
Xuansong Xie
Xiangyang Ji
Wangmeng Zuo
VGen
87
7
0
08 Mar 2024
Towards Effective Usage of Human-Centric Priors in Diffusion Models for Text-based Human Image Generation
Junyan Wang
Zhenhong Sun
Zhiyu Tan
Xuanbai Chen
Weihua Chen
Hao Li
Cheng Zhang
Yang Song
96
12
0
08 Mar 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
124
103
0
08 Mar 2024
CogView3: Finer and Faster Text-to-Image Generation via Relay Diffusion
Wendi Zheng
Jiayan Teng
Zhuoyi Yang
Weihan Wang
Jidong Chen
Xiaotao Gu
Yuxiao Dong
Ming Ding
Jie Tang
DiffM
101
41
0
08 Mar 2024
StereoDiffusion: Training-Free Stereo Image Generation Using Latent Diffusion Models
Lezhong Wang
J. Frisvad
Mark Bo Jensen
Siavash Bigdeli
DiffM
72
12
0
08 Mar 2024
Pix2Gif: Motion-Guided Diffusion for GIF Generation
Hitesh Kandala
Jianfeng Gao
Jianwei Yang
VGen, DiffM
81
3
0
07 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander D'Amour
Xiao-Qi Zhai
84
21
0
07 Mar 2024
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu
Wenjie Wang
Chak Tou Leong
Hanwang Zhang
Liqiang Nie
Tat-Seng Chua
87
8
0
07 Mar 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
321
1,410
0
05 Mar 2024
OOTDiffusion: Outfitting Fusion based Latent Diffusion for Controllable Virtual Try-on
Yuhao Xu
Tao Gu
Weifeng Chen
Chengcai Chen
DiffM
91
66
0
04 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
166
211
0
29 Feb 2024
DiffuseKronA: A Parameter Efficient Fine-tuning Method for Personalized Diffusion Models
Shyam Marjit
Harshit Singh
Nityanand Mathur
Sayak Paul
Chia-Mu Yu
Pin-Yu Chen
DiffM
77
7
0
27 Feb 2024
Disentangled 3D Scene Generation with Layout Learning
Dave Epstein
Ben Poole
B. Mildenhall
Alexei A. Efros
Aleksander Holynski
CoGe, OCL, 3DV
79
21
0
26 Feb 2024
Contextualized Diffusion Models for Text-Guided Image and Video Generation
Ling Yang
Zhilong Zhang
Zhaochen Yu
Jingwei Liu
Minkai Xu
Stefano Ermon
Tengjiao Wang
68
4
0
26 Feb 2024
Generative AI in Vision: A Survey on Models, Metrics and Applications
Gaurav Raut
Apoorv Singh
VLM, MedIm
112
7
0
26 Feb 2024
Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition
Chun-Hsiao Yeh
Ta-Ying Cheng
He-Yen Hsieh
Chuan-En Lin
Yi Ma
Andrew Markham
Niki Trigoni
H. T. Kung
Yubei Chen
DiffM
30
4
0
23 Feb 2024
ProTIP: Probabilistic Robustness Verification on Text-to-Image Diffusion Models against Stochastic Perturbation
Yi Zhang
Yun Tang
Wenjie Ruan
Xiaowei Huang
Siddartha Khastgir
P. Jennings
Xingyu Zhao
AAML
70
4
0
23 Feb 2024
Snap Video: Scaled Spatiotemporal Transformers for Text-to-Video Synthesis
Willi Menapace
Aliaksandr Siarohin
Ivan Skorokhodov
Ekaterina Deyneka
Tsai-Shien Chen
...
Yuwei Fang
A. Stoliar
Elisa Ricci
Jian Ren
Sergey Tulyakov
VGen
134
62
0
22 Feb 2024
Contrastive Prompts Improve Disentanglement in Text-to-Image Diffusion Models
C. Wu
Fernando de la Torre
DiffM
65
2
0
21 Feb 2024
CoFRIDA: Self-Supervised Fine-Tuning for Human-Robot Co-Painting
Peter Schaldenbrand
Gaurav Parmar
Jun-Yan Zhu
James McCann
Jean Oh
63
14
0
21 Feb 2024
Self-Play Fine-Tuning of Diffusion Models for Text-to-Image Generation
Huizhuo Yuan
Zixiang Chen
Kaixuan Ji
Quanquan Gu
118
29
0
15 Feb 2024
L3GO: Language Agents with Chain-of-3D-Thoughts for Generating Unconventional Objects
Yutaro Yamada
Khyathi Chandu
Yuchen Lin
Jack Hessel
Ilker Yildirim
Yejin Choi
AI4CE
77
13
0
14 Feb 2024