Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation


24 May 2025
Wenchao Zhang
Jiahe Tian
Runze He
Jizhong Han
Jiao Dai
Miaomiao Feng
Wei Mi
Xiaodan Zhang

Papers citing "Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation"

44 / 44 papers shown
Science-T2I: Addressing Scientific Illusions in Image Synthesis
Jialuo Li
Wenhao Chai
Xingyu Fu
Haiyang Xu
Saining Xie
MedIm
62
1
0
17 Apr 2025
ConceptMix: A Compositional Image Generation Benchmark with Controllable Difficulty
Xindi Wu
Dingli Yu
Yangsibo Huang
Olga Russakovsky
Sanjeev Arora
CoGe
EGVM
62
18
0
26 Aug 2024
Evaluating Numerical Reasoning in Text-to-Image Models
Ivana Kajić
Olivia Wiles
Isabela Albuquerque
Matthias Bauer
Su Wang
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
ReLM
107
2
0
20 Jun 2024
GenAI-Bench: Evaluating and Improving Compositional Text-to-Visual Generation
Baiqi Li
Zhiqiu Lin
Deepak Pathak
Jiayao Li
Yixin Fei
...
Tiffany Ling
Xide Xia
Pengchuan Zhang
Graham Neubig
Deva Ramanan
EGVM
72
31
0
19 Jun 2024
PhyBench: A Physical Commonsense Benchmark for Evaluating Text-to-Image Models
Fanqing Meng
Wenqi Shao
Lixin Luo
Yahong Wang
Yiran Chen
...
Yue Yang
Tianshuo Yang
Kaipeng Zhang
Yu Qiao
Ping Luo
EGVM
76
10
0
17 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM
LRM
45
19
0
11 Jun 2024
Revisiting Text-to-Image Evaluation with Gecko: On Metrics, Prompts, and Human Ratings
Olivia Wiles
Chuhan Zhang
Isabela Albuquerque
Ivana Kajić
Su Wang
...
Jordi Pont-Tuset
Aida Nematzadeh
Anant Nawalgaria
Jordi Pont-Tuset
Aida Nematzadeh
EGVM
173
18
0
25 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
81
143
0
01 Apr 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
212
1,244
0
05 Mar 2024
VIEScore: Towards Explainable Metrics for Conditional Image Synthesis Evaluation
Max Ku
Dongfu Jiang
Cong Wei
Xiang Yue
Wenhu Chen
53
57
0
22 Dec 2023
Rich Human Feedback for Text-to-Image Generation
Youwei Liang
Junfeng He
Gang Li
Peizhao Li
Arseniy Klimovskiy
...
Yiwen Luo
Yang Li
Kai Kohlhoff
Deepak Ramachandran
Vidhya Navalpakkam
EGVM
47
76
0
15 Dec 2023
A Contrastive Compositional Benchmark for Text-to-Image Synthesis: A Study with Unified Text-to-Image Fidelity Metrics
Xiangru Zhu
Penglei Sun
Chengyu Wang
Jingping Liu
Zhixu Li
Yanghua Xiao
Jun Huang
CoGe
160
6
0
04 Dec 2023
Holistic Evaluation of Text-To-Image Models
Tony Lee
Michihiro Yasunaga
Chenlin Meng
Yifan Mai
Joon Sung Park
...
Jun-Yan Zhu
Fei-Fei Li
Jiajun Wu
Stefano Ermon
Percy Liang
181
132
0
07 Nov 2023
GPT-4V(ision) as a Generalist Evaluator for Vision-Language Tasks
Xinlu Zhang
Yujie Lu
Weizhi Wang
An Yan
Jun Yan
Lianke Qin
Heng Wang
Xifeng Yan
William Y. Wang
Linda R. Petzold
LM&MA
MLLM
ELM
48
83
0
02 Nov 2023
Davidsonian Scene Graph: Improving Reliability in Fine-grained Evaluation for Text-to-Image Generation
Jaemin Cho
Yushi Hu
Roopal Garg
Peter Anderson
Ranjay Krishna
Jason Baldridge
Mohit Bansal
Jordi Pont-Tuset
Su Wang
EGVM
46
74
0
27 Oct 2023
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell
Zion English
Kyle Lacey
A. Blattmann
Tim Dockhorn
Jonas Muller
Joe Penna
Robin Rombach
171
2,242
0
04 Jul 2023
Human Preference Score v2: A Solid Benchmark for Evaluating Human Preferences of Text-to-Image Synthesis
Xiaoshi Wu
Yiming Hao
Keqiang Sun
Yixiong Chen
Feng Zhu
Rui Zhao
Hongsheng Li
73
274
0
15 Jun 2023
BLIP-Diffusion: Pre-trained Subject Representation for Controllable Text-to-Image Generation and Editing
Dongxu Li
Junnan Li
Steven C. H. Hoi
59
319
0
24 May 2023
LLMScore: Unveiling the Power of Large Language Models in Text-to-Image Synthesis Evaluation
Yujie Lu
Xianjun Yang
Xiujun Li
Xinze Wang
William Yang Wang
EGVM
89
75
0
18 May 2023
What You See is What You Read? Improving Text-Image Alignment Evaluation
Michal Yarom
Yonatan Bitton
Soravit Changpinyo
Roee Aharoni
Jonathan Herzig
Oran Lang
E. Ofek
Idan Szpektor
EGVM
76
80
0
17 May 2023
Measuring Progress in Fine-grained Vision-and-Language Understanding
Emanuele Bugliarello
Laurent Sartran
Aishwarya Agrawal
Lisa Anne Hendricks
Aida Nematzadeh
VLM
53
23
0
12 May 2023
Pick-a-Pic: An Open Dataset of User Preferences for Text-to-Image Generation
Yuval Kirstain
Adam Polyak
Uriel Singer
Shahbuland Matiana
Joe Penna
Omer Levy
EGVM
187
375
0
02 May 2023
ImageReward: Learning and Evaluating Human Preferences for Text-to-Image Generation
Jiazheng Xu
Xiao Liu
Yuchen Wu
Yuxuan Tong
Qinkai Li
Ming Ding
Jie Tang
Yuxiao Dong
97
360
0
12 Apr 2023
Toward Verifiable and Reproducible Human Evaluation for Text-to-Image Generation
Mayu Otani
Riku Togashi
Yu Sawai
Ryosuke Ishigami
Yuta Nakashima
Esa Rahtu
J. Heikkilä
Shin'ichi Satoh
65
64
0
04 Apr 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
96
1,076
0
27 Mar 2023
TIFA: Accurate and Interpretable Text-to-Image Faithfulness Evaluation with Question Answering
Yushi Hu
Benlin Liu
Jungo Kasai
Yizhong Wang
Mari Ostendorf
Ranjay Krishna
Noah A. Smith
EGVM
60
225
0
21 Mar 2023
Teaching CLIP to Count to Ten
Roni Paiss
Ariel Ephrat
Omer Tov
Shiran Zada
Inbar Mosseri
Michal Irani
Tali Dekel
VLM
CLIP
73
100
0
23 Feb 2023
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
93
708
0
22 Dec 2022
Benchmarking Spatial Relationships in Text-to-Image Generation
Tejas Gokhale
Hamid Palangi
Besmira Nushi
Vibhav Vineet
Eric Horvitz
Ece Kamar
Chitta Baral
Yezhou Yang
EGVM
70
69
0
20 Dec 2022
Multi-Concept Customization of Text-to-Image Diffusion
Nupur Kumari
Bin Zhang
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
120
855
0
08 Dec 2022
LAION-5B: An open large-scale dataset for training next generation image-text models
Christoph Schuhmann
Romain Beaumont
Richard Vencu
Cade Gordon
Ross Wightman
...
Srivatsa Kundurthy
Katherine Crowson
Ludwig Schmidt
R. Kaczmarczyk
J. Jitsev
VLM
MLLM
CLIP
129
3,355
0
16 Oct 2022
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation
Nataniel Ruiz
Yuanzhen Li
Varun Jampani
Yael Pritch
Michael Rubinstein
Kfir Aberman
218
2,789
0
25 Aug 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
167
1,089
0
22 Jun 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
304
5,904
0
23 May 2022
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
194
30,069
0
01 Mar 2022
Diffusion bridges vector quantized Variational AutoEncoders
Max H. Cohen
Guillaume Quispe
Sylvain Le Corff
Charles Ollion
Eric Moulines
DiffM
41
15
0
10 Feb 2022
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
272
15,081
0
20 Dec 2021
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim
Taesung Kwon
Jong Chul Ye
DiffM
142
634
0
06 Oct 2021
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT
VLM
88
773
0
26 May 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
107
1,512
0
18 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
334
4,873
0
24 Feb 2021
TallyQA: Answering Complex Counting Questions
Manoj Acharya
Kushal Kafle
Christopher Kanan
45
117
0
29 Oct 2018
ConceptNet 5.5: An Open Multilingual Graph of General Knowledge
R. Speer
Joshua Chin
Catherine Havasi
140
2,882
0
12 Dec 2016
Improved Techniques for Training GANs
Tim Salimans
Ian Goodfellow
Wojciech Zaremba
Vicki Cheung
Alec Radford
Xi Chen
GAN
429
8,999
0
10 Jun 2016