Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2503.08250
Cited By
v1
v2
v3
v4 (latest)
Aligning Text to Image in Diffusion Models is Easier Than You Think
11 March 2025
J. Lee
Byunghee Cha
Jeongsol Kim
Jong Chul Ye
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Aligning Text to Image in Diffusion Models is Easier Than You Think"
37 / 37 papers shown
Title
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov
Matan Kleiner
Inbar Huberman-Spiegelglas
T. Michaeli
DiffM
103
17
0
11 Dec 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik
Huseyin Coskun
Zeynep Akata
Sergey Tulyakov
J. Ren
Anil Kag
EGVM
87
9
0
23 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
157
101
0
09 Oct 2024
Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features
Benyuan Meng
Qianqian Xu
Zitai Wang
Xiaochun Cao
Qingming Huang
59
6
0
04 Oct 2024
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Hyungjin Chung
Jeongsol Kim
Geon Yeong Park
Hyelin Nam
Jong Chul Ye
DiffM
75
35
0
12 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
193
333
0
16 May 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
291
1,388
0
05 Mar 2024
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
131
287
0
21 Nov 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
93
203
0
17 Oct 2023
Interpretable Diffusion via Information Decomposition
Xianghao Kong
Ollie Liu
Han Li
Dani Yogatama
Greg Ver Steeg
71
22
0
12 Oct 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
255
2,447
0
04 Jul 2023
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu
Yiming Hao
Keqiang Sun
Yixiong Chen
Feng Zhu
Rui Zhao
Hongsheng Li
106
314
0
15 Jun 2023
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
142
376
0
22 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
219
417
0
02 May 2023
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDa
VLM
MLLM
569
4,910
0
17 Apr 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
Ming Cao
Xintao Wang
Zhongang Qi
Ying Shan
Xiaohu Qie
Yinqiang Zheng
DiffM
95
466
0
17 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
139
408
0
12 Apr 2023
End-to-End Diffusion Latent Optimization Improves Classifier Guidance
Bram Wallace
Akash Gokul
Stefano Ermon
Nikhil Naik
172
80
0
23 Mar 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
114
515
0
31 Jan 2023
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
118
2,434
0
19 Dec 2022
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Narek Tumanyan
Michal Geyer
Shai Bagon
Tali Dekel
128
685
0
22 Nov 2022
Flow Matching for Generative Modeling
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
213
1,389
0
06 Oct 2022
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras
M. Aittala
Timo Aila
S. Laine
DiffM
215
2,022
0
01 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
418
3,607
0
29 Apr 2022
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLM
VPVLM
155
1,645
0
23 Mar 2022
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLM
CLIP
VPVLM
141
1,356
0
10 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLM
BDL
VLM
CLIP
555
4,413
0
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
496
15,768
0
20 Dec 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
507
2,413
0
02 Sep 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
221
1,975
0
16 Jul 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
978
29,871
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
459
3,901
0
11 Feb 2021
Maximum Likelihood Training of Score-Based Diffusion Models
Yang Song
Conor Durkan
Iain Murray
Stefano Ermon
DiffM
171
673
0
22 Jan 2021
Denoising Diffusion Implicit Models
Jiaming Song
Chenlin Meng
Stefano Ermon
VLM
DiffM
292
7,492
0
06 Oct 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
387
18,866
0
13 Feb 2020
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
384
11,920
0
11 Jan 2018
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,832
0
01 May 2014
1