ResearchTrend.AI
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

8 March 2024
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
    VLM
arXiv: 2403.05135

Papers citing "ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment"

50 / 74 papers shown
Show-o2: Improved Native Unified Multimodal Models
Jinheng Xie
Zhenheng Yang
Mike Zheng Shou
VGen
44
0
0
18 Jun 2025
Prompt-Guided Latent Diffusion with Predictive Class Conditioning for 3D Prostate MRI Generation
Emerson P. Grabke
Masoom A. Haider
Babak Taati
MedIm
48
0
0
11 Jun 2025
OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
Jingjing Chang
Yixiao Fang
Peng Xing
Shuhan Wu
Wei Cheng
Rui Wang
Xianfang Zeng
Gang Yu
H. Chen
EGVM, VLM
30
0
0
09 Jun 2025
A Comprehensive Study of Decoder-Only LLMs for Text-to-Image Generation
Andrew Z. Wang
Songwei Ge
Tero Karras
Ming-Yu Liu
Yogesh Balaji
32
0
0
09 Jun 2025
FocusDiff: Advancing Fine-Grained Text-Image Alignment for Autoregressive Visual Generation through RL
Kaihang Pan
Wendong Bu
Y. Wu
Yang Wu
Kai Shen
Yunfei Li
Hang Zhao
Juncheng Billy Li
Siliang Tang
Yueting Zhuang
37
0
0
05 Jun 2025
ANT: Adaptive Neural Temporal-Aware Text-to-Motion Model
Wenshuo Chen
Kuimou Yu
Haozhe Jia
Kaishen Yuan
Bowen Tian
Songning Lai
Hongru Xiao
Erhang Zhang
Lei Wang
Yutao Yue
DiffM, VGen
73
0
0
03 Jun 2025
ComposeAnything: Composite Object Priors for Text-to-Image Generation
Zeeshan Khan
Shizhe Chen
Cordelia Schmid
DiffM, CoGe
52
0
0
30 May 2025
Draw ALL Your Imagine: A Holistic Benchmark and Agent Framework for Complex Instruction-based Image Generation
Yucheng Zhou
Jiahao Yuan
Qianning Wang
EGVM
30
0
0
30 May 2025
R2I-Bench: Benchmarking Reasoning-Driven Text-to-Image Generation
Kaijie Chen
Zihao Lin
Zhiyang Xu
Ying Shen
Yuguang Yao
Joy Rimchala
Jiaxin Zhang
Lifu Huang
EGVM, LRM
67
0
0
29 May 2025
VCapsBench: A Large-scale Fine-grained Benchmark for Video Caption Quality Evaluation
Shi-Xue Zhang
Hongfa Wang
Duojun Huang
Xin Li
Xiaobin Zhu
Xu-Cheng Yin
CoGe
63
0
0
29 May 2025
Diffusion Sampling Path Tells More: An Efficient Plug-and-Play Strategy for Sample Filtering
Sixian Wang
Zhiwei Tang
Tsung-Hui Chang
DiffM
22
0
0
29 May 2025
HiDream-I1: A High-Efficient Image Generative Foundation Model with Sparse Diffusion Transformer
Qi Cai
Jingwen Chen
Yang Chen
Yehao Li
Fuchen Long
...
Rui Tian
Siyu Wang
Bo Zhao
Ting Yao
Tao Mei
VLM
27
0
0
28 May 2025
RICO: Improving Accuracy and Completeness in Image Recaptioning via Visual Reconstruction
Yuchi Wang
Yishuo Cai
Shuhuai Ren
Sihan Yang
Linli Yao
Yuanxin Liu
Y. Zhang
Pengfei Wan
Xu Sun
VLM
62
0
0
28 May 2025
Thinking with Generated Images
Ethan Chern
Zhulin Hu
Steffi Chern
Siqi Kou
Jiadi Su
Yan Ma
Zhijie Deng
Pengfei Liu
LRM
63
1
0
28 May 2025
Memory-Efficient Visual Autoregressive Modeling with Scale-Aware KV Cache Compression
Kunjun Li
Zigeng Chen
Cheng-Yen Yang
Jenq-Neng Hwang
91
0
0
26 May 2025
FUDOKI: Discrete Flow-based Unified Understanding and Generation via Kinetic-Optimal Velocities
Jin Wang
Yao Lai
Aoxue Li
Shifeng Zhang
Jiacheng Sun
Ning Kang
Chengyue Wu
Zhenguo Li
Ping Luo
72
2
0
26 May 2025
MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models
Hang Hua
Ziyun Zeng
Yizhi Song
Yunlong Tang
Liu He
Daniel G. Aliaga
Wei Xiong
Jiebo Luo
EGVM
88
0
0
26 May 2025
OmniGenBench: A Benchmark for Omnipotent Multimodal Generation across 50+ Tasks
Jiayu Wang
Yang Jiao
Yue Yu
Tianwen Qian
Shaoxiang Chen
Jingjing Chen
Yu Jiang
MLLM, LM&MA, ELM
110
0
0
24 May 2025
Co-Reinforcement Learning for Unified Multimodal Understanding and Generation
Jingjing Jiang
Chongjie Si
Jun Luo
Hanwang Zhang
Chao Ma
186
0
0
23 May 2025
DetailMaster: Can Your Text-to-Image Model Handle Long Prompts?
Qirui Jiao
Daoyuan Chen
Yilun Huang
Xika Lin
Ying Shen
Yaliang Li
VLM
66
0
0
22 May 2025
MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Yicheng Xiao
Lin Song
Yukang Chen
Yingmin Luo
Yuxin Chen
Yukang Gan
Wei Huang
Xiu Li
Xiaojuan Qi
Ying Shan
LRM
107
5
0
19 May 2025
Towards Self-Improvement of Diffusion Models via Group Preference Optimization
Renjie Chen
Wenfeng Lin
Yichen Zhang
Jiangchuan Wei
Boyuan Liu
Chao Feng
Jiao Ran
Mingyu Guo
64
0
0
16 May 2025
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
78
0
0
15 May 2025
Controllable Image Colorization with Instance-aware Texts and Masks
Yanru An
Ling Gui
Qiang Hu
Chunlei Cai
Tianxiao Ye
Xiaoyun Zhang
Yanfeng Wang
DiffM
54
0
0
13 May 2025
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Dingkang Yang
Ziyun Qian
Jiawei Chen
Jinjie Wei
Yiheng Jiang
Qingyao Xu
Li Zhang
DiffM
488
0
0
05 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
Chong Chen
Sijie Zhu
DiffM
263
2
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
303
1
0
05 May 2025
CoherenDream: Boosting Holistic Text Coherence in 3D Generation via Multimodal Large Language Models Feedback
Chenhan Jiang
Yihan Zeng
Hang Xu
Dit-Yan Yeung
82
0
0
28 Apr 2025
ESPLoRA: Enhanced Spatial Precision with Low-Rank Adaption in Text-to-Image Diffusion Models for High-Definition Synthesis
Andrea Rigo
Luca Stornaiuolo
Mauro Martino
Bruno Lepri
N. Sebe
85
0
0
18 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
165
17
0
15 Apr 2025
Taming Consistency Distillation for Accelerated Human Image Animation
Xinyu Wang
Shiwei Zhang
Hangjie Yuan
Yujie Wei
Yuanxing Zhang
Changxin Gao
Yuehuan Wang
Nong Sang
VGen
81
1
0
15 Apr 2025
Head-Aware KV Cache Compression for Efficient Visual Autoregressive Modeling
Ziran Qin
Youru Lv
Mingbao Lin
Zeren Zhang
Danping Zou
Weiyao Lin
VLM
88
1
0
12 Apr 2025
PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen
Chongjian Ge
Shilong Zhang
Peize Sun
Ping Luo
VLM, DRL
71
0
0
10 Apr 2025
Have we unified image generation and understanding yet? An empirical study of GPT-4o's image generation ability
Ning Li
Jingran Zhang
Justin Cui
MLLM
180
3
0
09 Apr 2025
Transfer between Modalities with MetaQueries
Xichen Pan
Satya Narayan Shukla
Aashu Singh
Zhuokai Zhao
Shlok Kumar Mishra
...
Jiuhai Chen
Kunpeng Li
F. Xu
Ji Hou
Saining Xie
DiffM
102
21
0
08 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
157
5
0
03 Apr 2025
Training-free Dense-Aligned Diffusion Guidance for Modular Conditional Image Synthesis
Zixuan Wang
Duo Peng
Feng Chen
Yue Yang
Yinjie Lei
DiffM
140
0
0
02 Apr 2025
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
Huayang Huang
Xiangye Jin
Jiaxu Miao
Yu Wu
88
0
0
02 Apr 2025
Geometrical Properties of Text Token Embeddings for Strong Semantic Binding in Text-to-Image Generation
H. Seo
Junseo Bang
Haechang Lee
Joohoon Lee
Byung Hyun Lee
Se Young Chun
119
0
0
29 Mar 2025
Spatial Transport Optimization by Repositioning Attention Map for Training-Free Text-to-Image Synthesis
Woojung Han
Yeonkyung Lee
Chanyoung Kim
Kwanghyun Park
Seong Jae Hwang
DiffM
94
0
0
28 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Zhiqiang Zhang
Jia-Nan Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
140
2
0
25 Mar 2025
Towards Transformer-Based Aligned Generation with Self-Coherence Guidance
Shulei Wang
Wang Lin
Hai Huang
Hanting Wang
Sihang Cai
...
Tao Jin
Jingyuan Chen
Jiacheng Sun
Jieming Zhu
Zhou Zhao
DiffM
128
3
0
22 Mar 2025
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen
Xiaojie Xu
Wenbo Li
Jingjing Ren
Tian Ye
Songhua Liu
Ying Chen
Lei Zhu
Xinchao Wang
DiffM
107
7
0
19 Mar 2025
Reflect-DiT: Inference-Time Scaling for Text-to-Image Diffusion Transformers via In-Context Reflection
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Arsh Koneru
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VLM
138
5
0
15 Mar 2025
NAMI: Efficient Image Generation via Progressive Rectified Flow Transformers
Yuhang Ma
Bo Cheng
Shanyuan Liu
Ao Ma
Xiaoyu Wu
Liebucha Wu
Dawei Leng
Yuhui Yin
110
0
0
12 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
147
26
0
10 Mar 2025
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
Pengzhi Li
Pengfei Yu
Zide Liu
Wei He
Xuhao Pan
Xudong Rao
Tao Wei
Wei Chen
VLM
157
0
0
25 Feb 2025
CHATS: Combining Human-Aligned Optimization and Test-Time Sampling for Text-to-Image Generation
Minghao Fu
Guo-Hua Wang
Liangfu Cao
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
79
0
0
18 Feb 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Tengjiao Wang
147
4
0
17 Feb 2025
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park
Sebin Kim
Taehong Moon
Minkyu Kim
Kangwook Lee
Jaewoong Cho
DiffM, CoGe
117
5
0
08 Jan 2025