ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2503.08250
  4. Cited By
Aligning Text to Image in Diffusion Models is Easier Than You Think
v1v2v3v4 (latest)

Aligning Text to Image in Diffusion Models is Easier Than You Think

11 March 2025
J. Lee
Byunghee Cha
Jeongsol Kim
Jong Chul Ye
ArXiv (abs)PDFHTML

Papers citing "Aligning Text to Image in Diffusion Models is Easier Than You Think"

37 / 37 papers shown
Title
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow
  Models
FlowEdit: Inversion-Free Text-Based Editing Using Pre-Trained Flow Models
Vladimir Kulikov
Matan Kleiner
Inbar Huberman-Spiegelglas
T. Michaeli
DiffM
103
17
0
11 Dec 2024
Scalable Ranked Preference Optimization for Text-to-Image Generation
Scalable Ranked Preference Optimization for Text-to-Image Generation
Shyamgopal Karthik
Huseyin Coskun
Zeynep Akata
Sergey Tulyakov
J. Ren
Anil Kag
EGVM
87
9
0
23 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
157
101
0
09 Oct 2024
Not All Diffusion Model Activations Have Been Evaluated as
  Discriminative Features
Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features
Benyuan Meng
Qianqian Xu
Zitai Wang
Xiaochun Cao
Qingming Huang
59
6
0
04 Oct 2024
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion
  Models
CFG++: Manifold-constrained Classifier Free Guidance for Diffusion Models
Hyungjin Chung
Jeongsol Kim
Geon Yeong Park
Hyelin Nam
Jong Chul Ye
DiffM
75
35
0
12 Jun 2024
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon: Mixed-Modal Early-Fusion Foundation Models
Chameleon Team
MLLM
193
333
0
16 May 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
291
1,388
0
05 Mar 2024
Diffusion Model Alignment Using Direct Preference Optimization
Diffusion Model Alignment Using Direct Preference Optimization
Bram Wallace
Meihua Dang
Rafael Rafailov
Linqi Zhou
Aaron Lou
Senthil Purushwalkam
Stefano Ermon
Caiming Xiong
Shafiq Joty
Nikhil Naik
EGVM
131
287
0
21 Nov 2023
GenEval: An Object-Focused Framework for Evaluating Text-to-Image
  Alignment
GenEval: An Object-Focused Framework for Evaluating Text-to-Image Alignment
Dhruba Ghosh
Hanna Hajishirzi
Ludwig Schmidt
93
203
0
17 Oct 2023
Interpretable Diffusion via Information Decomposition
Interpretable Diffusion via Information Decomposition
Xianghao Kong
Ollie Liu
Han Li
Dani Yogatama
Greg Ver Steeg
71
22
0
12 Oct 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image
  Synthesis
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
255
2,447
0
04 Jul 2023
Human Preference Score v2: A Solid Benchmark for Evaluating Human
  Preferences of Text-to-Image Synthesis
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu
Yiming Hao
Keqiang Sun
Yixiong Chen
Feng Zhu
Rui Zhao
Hongsheng Li
106
314
0
15 Jun 2023
Training Diffusion Models with Reinforcement Learning
Training Diffusion Models with Reinforcement Learning
Kevin Black
Michael Janner
Yilun Du
Ilya Kostrikov
Sergey Levine
EGVM
142
376
0
22 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image
  Generation
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
219
417
0
02 May 2023
Visual Instruction Tuning
Visual Instruction Tuning
Haotian Liu
Chunyuan Li
Qingyang Wu
Yong Jae Lee
SyDaVLMMLLM
569
4,910
0
17 Apr 2023
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image
  Synthesis and Editing
MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing
Ming Cao
Xintao Wang
Zhongang Qi
Ying Shan
Xiaohu Qie
Yinqiang Zheng
DiffM
95
466
0
17 Apr 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image
  Generation
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
139
408
0
12 Apr 2023
End-to-End Diffusion Latent Optimization Improves Classifier Guidance
End-to-End Diffusion Latent Optimization Improves Classifier Guidance
Bram Wallace
Akash Gokul
Stefano Ermon
Nikhil Naik
172
80
0
23 Mar 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image
  Diffusion Models
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
114
515
0
31 Jan 2023
Scalable Diffusion Models with Transformers
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
118
2,434
0
19 Dec 2022
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image
  Translation
Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation
Narek Tumanyan
Michal Geyer
Shai Bagon
Tali Dekel
128
685
0
22 Nov 2022
Flow Matching for Generative Modeling
Flow Matching for Generative Modeling
Y. Lipman
Ricky T. Q. Chen
Heli Ben-Hamu
Maximilian Nickel
Matt Le
OOD
213
1,389
0
06 Oct 2022
Elucidating the Design Space of Diffusion-Based Generative Models
Elucidating the Design Space of Diffusion-Based Generative Models
Tero Karras
M. Aittala
Timo Aila
S. Laine
DiffM
215
2,022
0
01 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
418
3,607
0
29 Apr 2022
Visual Prompt Tuning
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLMVPVLM
155
1,645
0
23 Mar 2022
Conditional Prompt Learning for Vision-Language Models
Conditional Prompt Learning for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VLMCLIPVPVLM
141
1,356
0
10 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified
  Vision-Language Understanding and Generation
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
Guosheng Lin
MLLMBDLVLMCLIP
555
4,413
0
28 Jan 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
496
15,768
0
20 Dec 2021
Learning to Prompt for Vision-Language Models
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLMCLIPVLM
507
2,413
0
02 Sep 2021
Align before Fuse: Vision and Language Representation Learning with
  Momentum Distillation
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
221
1,975
0
16 Jul 2021
Learning Transferable Visual Models From Natural Language Supervision
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIPVLM
978
29,871
0
26 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLMCLIP
459
3,901
0
11 Feb 2021
Maximum Likelihood Training of Score-Based Diffusion Models
Maximum Likelihood Training of Score-Based Diffusion Models
Yang Song
Conor Durkan
Iain Murray
Stefano Ermon
DiffM
171
673
0
22 Jan 2021
Denoising Diffusion Implicit Models
Denoising Diffusion Implicit Models
Jiaming Song
Chenlin Meng
Stefano Ermon
VLMDiffM
292
7,492
0
06 Oct 2020
A Simple Framework for Contrastive Learning of Visual Representations
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
387
18,866
0
13 Feb 2020
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
The Unreasonable Effectiveness of Deep Features as a Perceptual Metric
Richard Y. Zhang
Phillip Isola
Alexei A. Efros
Eli Shechtman
Oliver Wang
EGVM
384
11,920
0
11 Jan 2018
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
434
43,832
0
01 May 2014
1