ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2401.11708

Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs

22 January 2024
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Bin Cui
    CoGe
    DiffM

Papers citing "Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs"

47 / 97 papers shown
Resolving Multi-Condition Confusion for Finetuning-Free Personalized Image Generation
Qihan Huang
Siming Fu
Jinlong Liu
Hao Jiang
Yipeng Yu
Jie Song
33
5
0
26 Sep 2024
ABHINAW: A method for Automatic Evaluation of Typography within AI-Generated Images
Abhinaw Jagtap
Nachiket Tapas
R. G. Brajesh
EGVM
28
0
0
18 Sep 2024
MotionCom: Automatic and Motion-Aware Image Composition with LLM and Video Diffusion Prior
Weijing Tao
Xiaofeng Yang
Miaomiao Cui
Guosheng Lin
DiffM
26
1
0
16 Sep 2024
Alignment of Diffusion Models: Fundamentals, Challenges, and Future
Buhua Liu
Shitong Shao
Bao Li
Lichen Bai
Zhiqiang Xu
Haoyi Xiong
James Kwok
Sumi Helal
Zeke Xie
42
12
0
11 Sep 2024
Draw Like an Artist: Complex Scene Generation with Diffusion Model via Composition, Painting, and Retouching
Minghao Liu
Le Zhang
Yingjie Tian
Xiaochao Qu
Luoqi Liu
Ting Liu
DiffM
CoGe
37
2
0
25 Aug 2024
Diffusion-Based Visual Art Creation: A Survey and New Perspectives
Bingyuan Wang
Qifeng Chen
Zeyu Wang
49
7
0
22 Aug 2024
MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration
Yanbo Ding
Shaobin Zhuang
Kunchang Li
Zhengrong Yue
Yu Qiao
Yali Wang
VGen
32
2
0
20 Aug 2024
Diffusion Model for Planning: A Systematic Literature Review
Toshihide Ubukata
Jialong Li
Kenji Tei
DiffM
MedIm
53
6
0
16 Aug 2024
The Fabrication of Reality and Fantasy: Scene Generation with LLM-Assisted Prompt Interpretation
Yi Yao
Chan-Feng Hsu
Jhe-Hao Lin
Hongxia Xie
Terence Lin
Yi-Ning Huang
Hong-Han Shuai
Wen-Huang Cheng
DiffM
31
4
0
17 Jul 2024
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He
Huan Yang
Zixi Tuo
Yuan Zhou
Qiuyue Wang
Yuhang Zhang
Zeyu Liu
Wenhao Huang
Hongyang Chao
Jian Yin
DiffM
VGen
62
12
0
17 Jul 2024
Exploring the Potentials and Challenges of Deep Generative Models in Product Design Conception
Phillip Mueller
Lars Mikelsons
AI4CE
41
1
0
15 Jul 2024
A Text-to-Game Engine for UGC-Based Role-Playing Games
Lei Zhang
Xuezheng Peng
Shuyi Yang
Feiyang Wang
37
1
0
11 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
43
25
0
08 Jul 2024
Consistency Flow Matching: Defining Straight Flows with Velocity Consistency
Ling Yang
Zixiang Zhang
Zhilong Zhang
Xingchao Liu
Minkai Xu
Wentao Zhang
Chenlin Meng
Stefano Ermon
Bin Cui
44
18
0
02 Jul 2024
LLM4GEN: Leveraging Semantic Representation of LLMs for Text-to-Image Generation
Mushui Liu
Yuhang Ma
Yang Zhen
Jun Dan
Yunlong Yu
Zeng Zhao
Zhipeng Hu
Bai Liu
Changjie Fan
VLM
DiffM
63
13
0
30 Jun 2024
Auto Cherry-Picker: Learning from High-quality Generative Data Driven by Language
Yicheng Chen
Xiangtai Li
Yining Li
Yanhong Zeng
Jianzong Wu
Xiangyu Zhao
Kai Chen
VLM
DiffM
56
3
0
28 Jun 2024
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
Fanqing Meng
Wenqi Shao
Lixin Luo
Yahong Wang
Yiran Chen
...
Yue Yang
Tianshuo Yang
Kaipeng Zhang
Yu Qiao
Ping Luo
EGVM
41
8
0
17 Jun 2024
VideoTetris: Towards Compositional Text-to-Video Generation
Ye Tian
Ling Yang
Haotian Yang
Yuan Gao
Yufan Deng
...
Zhaochen Yu
Xin Tao
Pengfei Wan
Di Zhang
Bin Cui
DiffM
VGen
84
15
0
06 Jun 2024
Evaluating Durability: Benchmark Insights into Multimodal Watermarking
Jielin Qiu
William Jongwon Han
Xuandong Zhao
Shangbang Long
Christos Faloutsos
Lei Li
65
1
0
06 Jun 2024
AutoStudio: Crafting Consistent Subjects in Multi-turn Interactive Image Generation
Junhao Cheng
Xi Lu
Hanhui Li
Khun Loun Zai
Baiqiao Yin
Yuhao Cheng
Yiqiang Yan
Xiaodan Liang
DiffM
VGen
37
10
0
03 Jun 2024
StyleMaster: Towards Flexible Stylized Image Generation with Diffusion Models
Chengming Xu
Kai Hu
Donghao Luo
Jiangning Zhang
Wei Li
Yanhao Ge
Chengjie Wang
DiffM
37
0
0
24 May 2024
EditWorld: Simulating World Dynamics for Instruction-Following Image Editing
Ling Yang
Bo-Wen Zeng
Jiaming Liu
Hong Li
Minghao Xu
Wentao Zhang
Shuicheng Yan
DiffM
39
9
0
23 May 2024
Good Seed Makes a Good Crop: Discovering Secret Seeds in Text-to-Image Diffusion Models
Katherine Xu
Lingzhi Zhang
Jianbo Shi
43
12
0
23 May 2024
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
Zhimin Li
Jianwei Zhang
Qin Lin
Jiangfeng Xiong
Yanxin Long
...
Wei Liu
Dingyong Wang
Yong Yang
Jie Jiang
Qinglin Lu
ViT
48
91
0
14 May 2024
Lumina-T2X: Transforming Text into Any Modality, Resolution, and Duration via Flow-based Large Diffusion Transformers
Peng Gao
Le Zhuo
Ziyi Lin
Ruoyi Du
Xu Luo
...
Weicai Ye
He Tong
Jingwen He
Yu Qiao
Hongsheng Li
VGen
37
83
0
09 May 2024
Fake Artificial Intelligence Generated Contents (FAIGC): A Survey of Theories, Detection Methods, and Opportunities
Xiaomin Yu
Yezhaohui Wang
Yanfang Chen
Zhen Tao
Dinghao Xi
Shichao Song
Simin Niu
Zhiyu Li
67
8
0
25 Apr 2024
BeyondScene: Higher-Resolution Human-Centric Scene Generation With Pretrained Diffusion
Gwanghyun Kim
Hayeon Kim
H. Seo
Dong un Kang
Se Young Chun
43
4
0
06 Apr 2024
CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
Dongzhi Jiang
Guanglu Song
Xiaoshi Wu
Renrui Zhang
Dazhong Shen
Zhuofan Zong
Yu Liu
Hongsheng Li
VLM
30
20
0
04 Apr 2024
Ground-A-Score: Scaling Up the Score Distillation for Multi-Attribute Editing
Hangeol Chang
Jinho Chang
Jong Chul Ye
DiffM
37
3
0
20 Mar 2024
VSTAR: Generative Temporal Nursing for Longer Dynamic Video Synthesis
Yumeng Li
William H. Beluch
M. Keuper
Dan Zhang
Anna Khoreva
DiffM
VGen
81
5
0
20 Mar 2024
DialogGen: Multi-modal Interactive Dialogue System for Multi-turn Text-to-Image Generation
Minbin Huang
Yanxin Long
Xinchi Deng
Ruihang Chu
Jiangfeng Xiong
Xiaodan Liang
Hong Cheng
Qinglin Lu
Wei Liu
MLLM
EGVM
65
8
0
13 Mar 2024
Distribution-Aware Data Expansion with Diffusion Models
Haowei Zhu
Ling Yang
Jun-Hai Yong
Hongzhi Yin
Jiawei Jiang
Meng Xiao
Wentao Zhang
Bin Wang
35
3
0
11 Mar 2024
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
57
70
0
08 Mar 2024
SceneCraft: An LLM Agent for Synthesizing 3D Scene as Blender Code
Ziniu Hu
Ahmet Iscen
Aashi Jain
Thomas Kipf
Yisong Yue
David A. Ross
Cordelia Schmid
Alireza Fathi
LLMAG
42
24
0
02 Mar 2024
Retrieval-Augmented Generation for AI-Generated Content: A Survey
Penghao Zhao
Hailin Zhang
Qinhan Yu
Zhengren Wang
Yunteng Geng
Fangcheng Fu
Ling Yang
Wentao Zhang
Jie Jiang
Bin Cui
3DV
115
228
0
29 Feb 2024
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang
Haotian Qian
Zhilong Zhang
Jingwei Liu
Bin Cui
25
10
0
27 Feb 2024
Contextualized Diffusion Models for Text-Guided Image and Video Generation
Ling Yang
Zhilong Zhang
Zhaochen Yu
Jingwei Liu
Minkai Xu
Stefano Ermon
Bin Cui
41
4
0
26 Feb 2024
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
Xinchen Zhang
Ling Yang
Yaqi Cai
Zhaochen Yu
Kai-Ni Wang
...
Ye Tian
Minkai Xu
Yong Tang
Yujiu Yang
Bin Cui
DiffM
34
5
0
20 Feb 2024
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion
Sen Li
Ruochen Wang
Cho-Jui Hsieh
Minhao Cheng
Tianyi Zhou
MLLM
LM&Ro
40
3
0
20 Feb 2024
MM-LLMs: Recent Advances in MultiModal Large Language Models
Duzhen Zhang
Yahan Yu
Jiahua Dong
Chenxing Li
Dan Su
Chenhui Chu
Dong Yu
OffRL
LRM
52
179
0
24 Jan 2024
KOALA: Empirical Lessons Toward Memory-Efficient and Fast Diffusion Models for Text-to-Image Synthesis
Youngwan Lee
Kwanyong Park
Yoorhim Cho
Yong-Ju Lee
Sung Ju Hwang
VLM
27
3
0
07 Dec 2023
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu
Long Lian
Joseph E. Gonzalez
Boyi Li
Trevor Darrell
62
53
0
27 Nov 2023
IPDreamer: Appearance-Controllable 3D Object Generation with Complex Image Prompts
Bo-Wen Zeng
Shanglin Li
Yutang Feng
Ling Yang
Hong Li
...
Conghui He
Wentao Zhang
Jianzhuang Liu
Baochang Zhang
Shuicheng Yan
DiffM
32
1
0
09 Oct 2023
Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
135
222
0
06 Apr 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
250
1,073
0
05 Oct 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
224
1,304
0
02 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022