ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2404.01197
  4. Cited By
Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

1 April 2024
Agneet Chatterjee
Gabriela Ben-Melech Stan
Estelle Aflalo
Sayak Paul
Dhruba Ghosh
Tejas Gokhale
Ludwig Schmidt
Hanna Hajishirzi
Vasudev Lal
Chitta Baral
Yezhou Yang
    EGVM
    VLM
ArXivPDFHTML

Papers citing "Getting it Right: Improving Spatial Consistency in Text-to-Image Models"

27 / 27 papers shown
Title
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Marioriyad
Parham Rezaei
M. Baghshah
M. Rohban
CoGe
392
0
0
30 Oct 2024
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models!
Attention Overlap Is Responsible for The Entity Missing Problem in Text-to-image Diffusion Models!
Arash Marioriyad
Mohammadali Banayeeanzade
Reza Abbasi
M. Rohban
M. Baghshah
DiffM
107
2
0
28 Oct 2024
Information Theoretic Text-to-Image Alignment
Information Theoretic Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Massimo Gallo
Pietro Michiardi
108
0
0
31 May 2024
Multi-LoRA Composition for Image Generation
Multi-LoRA Composition for Image Generation
Ming Zhong
Yelong Shen
Shuohang Wang
Yadong Lu
Yizhu Jiao
Siru Ouyang
Donghan Yu
Jiawei Han
Weizhu Chen
MoMe
57
41
0
26 Feb 2024
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu
Zhuang Li
Yefei He
Mike Zheng Shou
Chunhua Shen
Lele Cheng
Yan Li
Yan Li
Di Zhang
VLM
172
25
0
24 Nov 2023
Controllable Text-to-Image Generation with GPT-4
Controllable Text-to-Image Generation with GPT-4
Tianjun Zhang
Yi Zhang
Vibhav Vineet
Neel Joshi
Xin Eric Wang
DiffM
111
44
0
29 May 2023
LayoutGPT: Compositional Visual Planning and Generation with Large
  Language Models
LayoutGPT: Compositional Visual Planning and Generation with Large Language Models
Weixi Feng
Wanrong Zhu
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
Xuehai He
Sugato Basu
Xinze Wang
William Yang Wang
MLLM
77
173
0
24 May 2023
Unleashing Text-to-Image Diffusion Models for Visual Perception
Unleashing Text-to-Image Diffusion Models for Visual Perception
Wenliang Zhao
Yongming Rao
Zuyan Liu
Benlin Liu
Jie Zhou
Jiwen Lu
ObjD
VLM
MDE
225
229
0
03 Mar 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
133
4,106
1
10 Feb 2023
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image
  Diffusion Models
Attend-and-Excite: Attention-Based Semantic Guidance for Text-to-Image Diffusion Models
Hila Chefer
Yuval Alaluf
Yael Vinker
Lior Wolf
Daniel Cohen-Or
DiffM
102
510
0
31 Jan 2023
Benchmarking Spatial Relationships in Text-to-Image Generation
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
72
71
0
20 Dec 2022
Training-Free Structured Diffusion Guidance for Compositional
  Text-to-Image Synthesis
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Weixi Feng
Xuehai He
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
Xinze Wang
William Yang Wang
CoGe
97
315
0
09 Dec 2022
ReCo: Region-Controlled Text-to-Image Generation
ReCo: Region-Controlled Text-to-Image Generation
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Linjie Li
Kevin Qinghong Lin
...
Nan Duan
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
DiffM
82
149
0
23 Nov 2022
LAION-5B: An open large-scale dataset for training next generation
  image-text models
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
170
3,444
0
16 Oct 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
83
720
0
14 Sep 2022
Prompt-to-Prompt Image Editing with Cross Attention Control
Prompt-to-Prompt Image Editing with Cross Attention Control
Amir Hertz
Ron Mokady
J. Tenenbaum
Kfir Aberman
Yael Pritch
Daniel Cohen-Or
DiffM
170
1,768
0
02 Aug 2022
Compositional Visual Generation with Composable Diffusion Models
Compositional Visual Generation with Composable Diffusion Models
Nan Liu
Shuang Li
Yilun Du
Antonio Torralba
J. Tenenbaum
DiffM
CoGe
167
519
0
03 Jun 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical
  Transformers
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
94
331
0
28 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
377
6,859
0
13 Apr 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
410
15,486
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with
  Text-Guided Diffusion Models
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
331
3,600
0
20 Dec 2021
Zero-Shot Text-to-Image Generation
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
391
4,941
0
24 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
429
1,127
0
17 Feb 2021
VQA-LOL: Visual Question Answering under the Lens of Logic
VQA-LOL: Visual Question Answering under the Lens of Logic
Tejas Gokhale
Pratyay Banerjee
Chitta Baral
Yezhou Yang
CoGe
49
75
0
19 Feb 2020
Effectively Unbiased FID and Inception Score and where to find them
Effectively Unbiased FID and Inception Score and where to find them
Min Jin Chong
David A. Forsyth
EGVM
68
203
0
16 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
419
20,127
0
23 Oct 2019
Microsoft COCO: Common Objects in Context
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
413
43,638
0
01 May 2014
1