LayoutGPT: Compositional Visual Planning and Generation with Large Language Models

24 May 2023
Weixi Feng
Wanrong Zhu
Tsu-jui Fu
Varun Jampani
Arjun Reddy Akula
Xuehai He
Sugato Basu
Xin Eric Wang
William Yang Wang
    MLLM
arXiv: 2305.15393

Papers citing "LayoutGPT: Compositional Visual Planning and Generation with Large Language Models"

46 / 146 papers shown
Title
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment
Xiwei Hu
Rui Wang
Yixiao Fang
Bin-Bin Fu
Pei Cheng
Gang Yu
VLM
59
72
0
08 Mar 2024
Discriminative Probing and Tuning for Text-to-Image Generation
Leigang Qu
Wenjie Wang
Yongqi Li
Hanwang Zhang
Liqiang Nie
Tat-Seng Chua
44
7
0
07 Mar 2024
RealCompo: Balancing Realism and Compositionality Improves Text-to-Image Diffusion Models
Xinchen Zhang
Ling Yang
Yaqi Cai
Zhaochen Yu
Kai-Ni Wang
...
Ye Tian
Minkai Xu
Yong Tang
Yujiu Yang
Bin Cui
DiffM
34
5
0
20 Feb 2024
MuLan: Multimodal-LLM Agent for Progressive and Interactive Multi-Object Diffusion
Sen Li
Ruochen Wang
Cho-Jui Hsieh
Minhao Cheng
Tianyi Zhou
MLLM
LM&Ro
48
3
0
20 Feb 2024
GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting
Xiaoyu Zhou
Xingjian Ran
Yajiao Xiong
Jinlin He
Zhiwei Lin
Yongtao Wang
Deqing Sun
Ming-Hsuan Yang
3DGS
27
54
0
11 Feb 2024
MIGC: Multi-Instance Generation Controller for Text-to-Image Synthesis
Dewei Zhou
You Li
Fan Ma
Zongxin Yang
Yi Yang
DiffM
25
57
0
08 Feb 2024
InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior
Chenguo Lin
Yadong Mu
3DV
22
32
0
07 Feb 2024
InstanceDiffusion: Instance-level Control for Image Generation
Xudong Wang
Trevor Darrell
Sai Saketh Rambhatla
Rohit Girdhar
Ishan Misra
VLM
DiffM
34
84
0
05 Feb 2024
Video-LaVIT: Unified Video-Language Pre-training with Decoupled Visual-Motional Tokenization
Yang Jin
Zhicheng Sun
Kun Xu
Kun Xu
Liwei Chen
...
Yuliang Liu
Di Zhang
Yang Song
Kun Gai
Yadong Mu
VGen
55
42
0
05 Feb 2024
Holodeck: Language Guided Generation of 3D Embodied AI Environments
Yue Yang
Fan-Yun Sun
Luca Weihs
Eli VanderBilt
Alvaro Herrasti
...
Lingjie Liu
Chris Callison-Burch
Mark Yatskar
Aniruddha Kembhavi
Christopher Clark
LM&Ro
39
78
0
14 Dec 2023
AnyHome: Open-Vocabulary Generation of Structured and Textured 3D Homes
Rao Fu
Zehao Wen
Zichen Liu
Srinath Sridhar
37
30
0
11 Dec 2023
SmartMask: Context Aware High-Fidelity Mask Generation for Fine-grained Object Insertion and Layout Control
Jaskirat Singh
Jianming Zhang
Qing Liu
Cameron Smith
Zhe-nan Lin
Liang Zheng
DiffM
34
11
0
08 Dec 2023
AVA: Towards Autonomous Visualization Agents through Visual Perception-Driven Decision-Making
Shusen Liu
Haichao Miao
Zhimin Li
M. Olson
Valerio Pascucci
P. Bremer
30
9
0
07 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM
LRM
MLLM
52
29
0
05 Dec 2023
Detailed Human-Centric Text Description-Driven Large Scene Synthesis
Gwanghyun Kim
Dong un Kang
H. Seo
Hayeon Kim
Se Young Chun
3DV
DiffM
29
2
0
30 Nov 2023
Ranni: Taming Text-to-Image Diffusion for Accurate Instruction Following
Yutong Feng
Biao Gong
Di Chen
Yujun Shen
Yu Liu
Jingren Zhou
DiffM
34
43
0
28 Nov 2023
MotionZero: Exploiting Motion Priors for Zero-shot Text-to-Video Generation
Sitong Su
Litao Guo
Lianli Gao
Hengtao Shen
Jingkuan Song
VGen
31
4
0
28 Nov 2023
TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering
Jingye Chen
Yupan Huang
Tengchao Lv
Lei Cui
Qifeng Chen
Furu Wei
DiffM
27
61
0
28 Nov 2023
Self-correcting LLM-controlled Diffusion Models
Tsung-Han Wu
Long Lian
Joseph E. Gonzalez
Boyi Li
Trevor Darrell
70
53
0
27 Nov 2023
FlowZero: Zero-Shot Text-to-Video Synthesis with LLM-Driven Dynamic Scene Syntax
Yu Lu
Linchao Zhu
Hehe Fan
Yi Yang
VGen
DiffM
33
13
0
27 Nov 2023
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu
Zhuang Li
Yefei He
Mike Zheng Shou
Chunhua Shen
Lele Cheng
Yan Li
Tingting Gao
Di Zhang
VLM
141
24
0
24 Nov 2023
Retrieval-Augmented Layout Transformer for Content-Aware Layout Generation
Daichi Horita
Naoto Inoue
Kotaro Kikuchi
Kota Yamaguchi
Kiyoharu Aizawa
3DV
14
12
0
22 Nov 2023
GPT4Motion: Scripting Physical Motions in Text-to-Video Generation via Blender-Oriented GPT Planning
Jiaxi Lv
Yi Huang
Mingfu Yan
Jiancheng Huang
Jianzhuang Liu
Yifan Liu
Yafei Wen
Xiaoxin Chen
Shifeng Chen
VGen
DiffM
32
23
0
21 Nov 2023
UI Layout Generation with LLMs Guided by UI Grammar
Yuwen Lu
Ziang Tong
Qinyi Zhao
Chengzhi Zhang
Toby Jia-Jun Li
33
11
0
24 Oct 2023
DiagrammerGPT: Generating Open-Domain, Open-Platform Diagrams via LLM Planning
Abhaysinh Zala
Han Lin
Jaemin Cho
Mohit Bansal
43
12
0
18 Oct 2023
LLM Blueprint: Enabling Text-to-Image Generation with Complex and Detailed Prompts
Hanan Gani
Shariq Farooq Bhat
Muzammal Naseer
Salman Khan
Peter Wonka
DiffM
44
38
0
16 Oct 2023
Co-NavGPT: Multi-Robot Cooperative Visual Semantic Navigation Using Vision Language Models
Bangguo Yu
Qihao Yuan
Kailai Li
H. Kasaei
Ming Cao
LM&Ro
48
0
0
11 Oct 2023
LLM for SoC Security: A Paradigm Shift
Dipayan Saha
Shams Tarek
Katayoon Yahyaei
S. Saha
Jingbo Zhou
M. Tehranipoor
Farimah Farahmandi
63
46
0
09 Oct 2023
LLM-grounded Video Diffusion Models
Long Lian
Baifeng Shi
Semih Yavuz
Ye Liu
Boyi Li
DiffM
25
54
0
29 Sep 2023
Guiding Instruction-based Image Editing via Multimodal Large Language Models
Tsu-Jui Fu
Wenze Hu
Xianzhi Du
William Yang Wang
Yinfei Yang
Zhe Gan
40
89
0
29 Sep 2023
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Han Lin
Abhaysinh Zala
Jaemin Cho
Mohit Bansal
LM&Ro
VGen
DiffM
51
74
0
26 Sep 2023
LayoutNUWA: Revealing the Hidden Layout Expertise of Large Language Models
Zecheng Tang
Chenfei Wu
Juntao Li
Nan Duan
3DV
28
9
0
18 Sep 2023
NExT-GPT: Any-to-Any Multimodal LLM
Shengqiong Wu
Hao Fei
Leigang Qu
Wei Ji
Tat-Seng Chua
MLLM
46
458
0
11 Sep 2023
BoxDiff: Text-to-Image Synthesis with Training-Free Box-Constrained Diffusion
Jinheng Xie
Yuexiang Li
Yawen Huang
Haozhe Liu
Wentian Zhang
Yefeng Zheng
Mike Zheng Shou
DiffM
51
193
0
20 Jul 2023
Grounded Text-to-Image Synthesis with Attention Refocusing
Quynh Phung
Songwei Ge
Jia-Bin Huang
DiffM
36
104
0
08 Jun 2023
VisorGPT: Learning Visual Prior via Generative Pre-Training
Jinheng Xie
Kai Ye
Yudong Li
Yuexiang Li
Kevin Qinghong Lin
Yefeng Zheng
Linlin Shen
Mike Zheng Shou
ViT
119
8
0
23 May 2023
LLM-grounded Diffusion: Enhancing Prompt Understanding of Text-to-Image Diffusion Models with Large Language Models
Long Lian
Boyi Li
Adam Yala
Trevor Darrell
43
152
0
23 May 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu
Xianjun Yang
Xiujun Li
Xin Eric Wang
William Yang Wang
EGVM
52
73
0
18 May 2023
Training-Free Layout Control with Cross-Attention Guidance
Minghao Chen
Iro Laina
Andrea Vedaldi
DiffM
135
221
0
06 Apr 2023
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu
Chia-Chih Chen
Zeyuan Chen
Rui Meng
Ganglu Wu
P. Josel
Juan Carlos Niebles
Caiming Xiong
Ran Xu
ViT
DiffM
24
7
0
19 Dec 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
345
12,003
0
04 Mar 2022
DALL-Eval: Probing the Reasoning Skills and Social Biases of Text-to-Image Generation Models
Jaemin Cho
Abhaysinh Zala
Mohit Bansal
ViT
145
170
0
08 Feb 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
215
1,661
0
15 Oct 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
180
402
0
10 Sep 2021
End-to-End Optimization of Scene Layout
Andrew F. Luo
Zhoutong Zhang
Jiajun Wu
J. Tenenbaum
3DV
59
64
0
23 Jul 2020