ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2401.11708
  4. Cited By
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and
  Generating with Multimodal LLMs

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

22 January 2024
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Bin Cui
    CoGe
    DiffM
ArXivPDFHTML

Papers citing "Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs"

50 / 97 papers shown
Title
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Exploring the Deep Fusion of Large Language Models and Diffusion Transformers for Text-to-Image Synthesis
Bingda Tang
Boyang Zheng
Xichen Pan
Sayak Paul
Saining Xie
29
0
0
15 May 2025
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
MCCD: Multi-Agent Collaboration-based Compositional Diffusion for Complex Text-to-Image Generation
Mingcheng Li
Xiaolu Hou
Ziyang Liu
Dingkang Yang
Ziyun Qian
Jiawei Chen
Jinjie Wei
Y. Jiang
Qingyao Xu
Li Zhang
DiffM
156
0
0
05 May 2025
Step1X-Edit: A Practical Framework for General Image Editing
Step1X-Edit: A Practical Framework for General Image Editing
S. Liu
Yucheng Han
Peng Xing
Fukun Yin
Rui Wang
...
Yibo Zhu
Binxing Jiao
Xuzhi Zhang
Gang Yu
Daxin Jiang
DiffM
108
3
0
24 Apr 2025
DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation
DyST-XL: Dynamic Layout Planning and Content Control for Compositional Text-to-Video Generation
Weijie He
Mushui Liu
Yunlong Yu
Zhao Wang
Chao Wu
DiffM
VGen
64
0
0
21 Apr 2025
Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers
Hierarchical and Step-Layer-Wise Tuning of Attention Specialty for Multi-Instance Synthesis in Diffusion Transformers
Chunyang Zhang
Zhenhong Sun
Zhicheng Zhang
Junyan Wang
Yu Zhang
Dong Gong
H. Mo
Daoyi Dong
45
0
0
14 Apr 2025
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Training-free Guidance in Text-to-Video Generation via Multimodal Planning and Structured Noise Initialization
Jialu Li
Shoubin Yu
Han Lin
Jaemin Cho
Jaehong Yoon
Joey Tianyi Zhou
DiffM
VGen
50
0
0
11 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
H. Wang
Jie Cao
Huaibo Huang
Ran He
DiffM
73
0
0
10 Apr 2025
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
RAGME: Retrieval Augmented Video Generation for Enhanced Motion Realism
E. Peruzzo
Dejia Xu
Xingqian Xu
Humphrey Shi
N. Sebe
DiffM
VGen
59
0
0
09 Apr 2025
Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling
Storybooth: Training-free Multi-Subject Consistency for Improved Visual Storytelling
Jaskirat Singh
Junshen Kevin Chen
Jonas Kohler
Michael Cohen
DiffM
VGen
43
0
0
08 Apr 2025
Your Image Generator Is Your New Private Dataset
Your Image Generator Is Your New Private Dataset
Nicolo Resmini
Eugenio Lomurno
Cristian Sbrolli
Matteo Matteucci
28
0
0
06 Apr 2025
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
TextCrafter: Accurately Rendering Multiple Texts in Complex Visual Scenes
Nikai Du
Zhennan Chen
Z. Chen
Shan Gao
Xi Chen
Zhengkai Jiang
Jian Yang
Ying Tai
DiffM
43
0
0
30 Mar 2025
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Efficient Multi-Instance Generation with Janus-Pro-Dirven Prompt Parsing
Fan Qi
Yu Duan
Changsheng Xu
DiffM
57
0
0
27 Mar 2025
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
BizGen: Advancing Article-level Visual Text Rendering for Infographics Generation
Yuyang Peng
Shishi Xiao
Keming Wu
Qisheng Liao
Bohan Chen
Kevin Lin
Danqing Huang
Ji Li
Yuhui Yuan
DiffM
79
1
0
26 Mar 2025
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
LayerCraft: Enhancing Text-to-Image Generation with CoT Reasoning and Layered Object Integration
Yuyao Zhang
Jinghao Li
Yu-Wing Tai
DiffM
64
0
0
25 Mar 2025
IPGO: Indirect Prompt Gradient Optimization for Parameter-Efficient Prompt-level Fine-Tuning on Text-to-Image Models
IPGO: Indirect Prompt Gradient Optimization for Parameter-Efficient Prompt-level Fine-Tuning on Text-to-Image Models
Jianping Ye
Michel Wedel
Kunpeng Zhang
39
0
0
25 Mar 2025
Training-free Diffusion Acceleration with Bottleneck Sampling
Training-free Diffusion Acceleration with Bottleneck Sampling
Ye Tian
Xin Xia
Yuxi Ren
Shanchuan Lin
Xing Wang
Xuefeng Xiao
Yunhai Tong
L. Yang
Bin Cui
60
0
0
24 Mar 2025
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images
Leyang Wang
Joice Lin
DiffM
63
0
0
20 Mar 2025
POSTA: A Go-to Framework for Customized Artistic Poster Generation
POSTA: A Go-to Framework for Customized Artistic Poster Generation
Haoyu Chen
Xiaojie Xu
Wenbo Li
Jingjing Ren
Tian Ye
Songhua Liu
Ying Chen
Lei Zhu
Xinchao Wang
DiffM
57
1
0
19 Mar 2025
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Multimodal Chain-of-Thought Reasoning: A Comprehensive Survey
Yixuan Wang
Shengqiong Wu
Yuyao Zhang
William Yang Wang
Ziwei Liu
Jiebo Luo
Hao Fei
LRM
92
8
0
16 Mar 2025
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Rongyao Fang
Chengqi Duan
Kun Wang
Linjiang Huang
Hao Li
...
Xingyu Zeng
R. Zhao
Jifeng Dai
Xihui Liu
Hongsheng Li
MLLM
ReLM
LRM
112
5
0
13 Mar 2025
Investigating and Improving Counter-Stereotypical Action Relation in Text-to-Image Diffusion Models
Sina Malakouti
Adriana Kovashka
EGVM
69
0
0
13 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
52
0
0
12 Mar 2025
Fine-Grained Alignment and Noise Refinement for Compositional Text-to-Image Generation
Amir Mohammad Izadi
Seyed Mohsen Hosseini
Soroush Vafaie Tabar
Ali Abdollahi
Armin Saghafian
M. Baghshah
EGVM
45
0
0
09 Mar 2025
DesignDiffusion: High-Quality Text-to-Design Image Generation with Diffusion Models
Zhendong Wang
Jianmin Bao
Shuyang Gu
Dong Chen
Wengang Zhou
Hao Li
DiffM
53
0
0
03 Mar 2025
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu
Yiming Zhao
Zhicong Tang
Ruihong Yin
Haoxing Ye
...
Ji Li
Xiu Li
Zheng Lian
Gao Huang
Baining Guo
DiffM
62
2
0
25 Feb 2025
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
HermesFlow: Seamlessly Closing the Gap in Multimodal Understanding and Generation
L. Yang
Xinchen Zhang
Ye Tian
Chenming Shang
Minghao Xu
Wentao Zhang
Bin Cui
102
1
0
17 Feb 2025
SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches
SketchFlex: Facilitating Spatial-Semantic Coherence in Text-to-Image Generation with Region-Based Sketches
Haichuan Lin
Yilin Ye
Jiazhi Xia
Wei Zeng
DiffM
70
0
0
11 Feb 2025
Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models
Weak Supervision Dynamic KL-Weighted Diffusion Models Guided by Large Language Models
Julian Perry
Frank Sanders
Carter Scott
58
0
0
02 Feb 2025
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park
Sebin Kim
Taehong Moon
Minkyu Kim
Kangwook Lee
Jaewoong Cho
DiffM
CoGe
64
2
0
08 Jan 2025
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion
  Models
MoTrans: Customized Motion Transfer with Text-driven Video Diffusion Models
Xiaomin Li
Xu Jia
Qinghe Wang
Haiwen Diao
Mengmeng Ge
Pengxiang Li
You He
Huchuan Lu
VGen
DiffM
68
3
0
02 Dec 2024
Beyond Pixels: Text Enhances Generalization in Real-World Image
  Restoration
Beyond Pixels: Text Enhances Generalization in Real-World Image Restoration
Haoze Sun
W. J. Li
Jiaheng Liu
Kaiwen Zhou
Yongqiang Chen
Yong Guo
Y. Li
Renjing Pei
Long Peng
Yuqing Yang
DiffM
76
1
0
01 Dec 2024
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
PhyT2V: LLM-Guided Iterative Self-Refinement for Physics-Grounded Text-to-Video Generation
Qiyao Xue
Xiangyu Yin
Boyuan Yang
Wei Gao
DiffM
VGen
80
9
0
30 Nov 2024
Fleximo: Towards Flexible Text-to-Human Motion Video Generation
Fleximo: Towards Flexible Text-to-Human Motion Video Generation
Yuhang Zhang
Yuan Zhou
Zeyu Liu
Yuxuan Cai
Qiuyue Wang
Aidong Men
Huan Yang
VGen
DiffM
84
0
0
29 Nov 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General
  Video Generation and Editing
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
94
3
0
28 Nov 2024
Type-R: Automatically Retouching Typos for Text-to-Image Generation
Type-R: Automatically Retouching Typos for Text-to-Image Generation
Wataru Shimoda
Naoto Inoue
Daichi Haraguchi
Hayato Mitani
S. Uchida
Kota Yamaguchi
DiffM
99
0
0
27 Nov 2024
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
AnyEdit: Mastering Unified High-Quality Image Editing for Any Idea
Qifan Yu
Wei Chow
Zhongqi Yue
Kaihang Pan
Yang Wu
Xiaoyang Wan
Juncheng Billy Li
Siliang Tang
H. Zhang
Yueting Zhuang
DiffM
106
15
0
24 Nov 2024
Token Merging for Training-Free Semantic Binding in Text-to-Image
  Synthesis
Token Merging for Training-Free Semantic Binding in Text-to-Image Synthesis
Taihang Hu
Linxuan Li
Joost van de Weijer
Hongcheng Gao
Fahad Shahbaz Khan
Jian Yang
Ming-Ming Cheng
Kai Wang
Yaxing Wang
DiffM
57
4
0
11 Nov 2024
Region-Aware Text-to-Image Generation via Hard Binding and Soft
  Refinement
Region-Aware Text-to-Image Generation via Hard Binding and Soft Refinement
Zhennan Chen
Yajie Li
Haofan Wang
Z. Chen
Zhengkai Jiang
Jun Yu Li
Qian Wang
Jian Yang
Ying Tai
DiffM
52
8
0
10 Nov 2024
Training-free Regional Prompting for Diffusion Transformers
Training-free Regional Prompting for Diffusion Transformers
Anthony Chen
Jianjin Xu
Wenzhao Zheng
Gaole Dai
Yishuo Wang
Renrui Zhang
Haofan Wang
Shanghang Zhang
VLM
40
2
0
04 Nov 2024
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via
  Dynamically Optimizing 3D Gaussians
CompGS: Unleashing 2D Compositionality for Compositional Text-to-3D via Dynamically Optimizing 3D Gaussians
Chongjian Ge
Chenfeng Xu
Yuanfeng Ji
C-T.John Peng
M. Tomizuka
Ping Luo
Mingyu Ding
Varun Jampani
W. Zhan
3DGS
34
4
0
28 Oct 2024
GrounDiT: Grounding Diffusion Transformers via Noisy Patch
  Transplantation
GrounDiT: Grounding Diffusion Transformers via Noisy Patch Transplantation
Phillip Y. Lee
Taehoon Yoon
Minhyuk Sung
46
4
1
27 Oct 2024
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient
  Multimodal Learning
CtrlSynth: Controllable Image Text Synthesis for Data-Efficient Multimodal Learning
Qingqing Cao
Mahyar Najibi
Sachin Mehta
CLIP
DiffM
35
1
0
15 Oct 2024
SGEdit: Bridging LLM with Text2Image Generative Model for Scene
  Graph-based Image Editing
SGEdit: Bridging LLM with Text2Image Generative Model for Scene Graph-based Image Editing
Zhiyuan Zhang
Dongdong Chen
J. Liao
DiffM
26
3
0
15 Oct 2024
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Improving Long-Text Alignment for Text-to-Image Diffusion Models
Luping Liu
Chao Du
Tianyu Pang
Zehan Wang
Chongxuan Li
Dong Xu
VLM
53
4
0
15 Oct 2024
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Evaluating Semantic Variation in Text-to-Image Synthesis: A Causal Perspective
Xiangru Zhu
Penglei Sun
Yaoxian Song
Yanghua Xiao
Zhixu Li
Chengyu Wang
Jun Huang
Bei Yang
Xiaoxiao Xu
EGVM
185
1
0
14 Oct 2024
Semantic Score Distillation Sampling for Compositional Text-to-3D
  Generation
Semantic Score Distillation Sampling for Compositional Text-to-3D Generation
L. Yang
Zixiang Zhang
Junlin Han
Bohan Zeng
Runjia Li
Philip Torr
Wentao Zhang
38
2
0
11 Oct 2024
Trans4D: Realistic Geometry-Aware Transition for Compositional
  Text-to-4D Synthesis
Trans4D: Realistic Geometry-Aware Transition for Compositional Text-to-4D Synthesis
Bohan Zeng
Ling Yang
Siyu Li
Jiaming Liu
Zixiang Zhang
...
Yongzhen Guo
Fu-Yun Wang
Minkai Xu
Stefano Ermon
Wentao Zhang
VGen
AI4CE
28
7
0
09 Oct 2024
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
IterComp: Iterative Composition-Aware Feedback Learning from Model Gallery for Text-to-Image Generation
Xinchen Zhang
Ling Yang
Bernard Ghanem
Yaqi Cai
Jiake Xie
Yong Tang
Yujiu Yang
Mengdi Wang
Bin Cui
EGVM
CoGe
44
5
0
09 Oct 2024
VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch
  Diffusion Models
VitaGlyph: Vitalizing Artistic Typography with Flexible Dual-branch Diffusion Models
Kailai Feng
Yabo Zhang
Haodong Yu
Zhilong Ji
Jinfeng Bai
Hongzhi Zhang
W. Zuo
DiffM
32
0
0
02 Oct 2024
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
Pouyan Navard
Amin Karimi Monsefi
Mengxi Zhou
Wei-Lun Chao
Alper Yilmaz
R. Ramnath
DiffM
48
2
0
02 Oct 2024
12
Next