Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2102.12092
Cited By
Zero-Shot Text-to-Image Generation
24 February 2021
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Zero-Shot Text-to-Image Generation"
50 / 212 papers shown
Title
Knowledge Bridger: Towards Training-free Missing Modality Completion
Guanzhou Ke
Shengfeng He
Xinyu Wang
Bo Wang
Guoqing Chao
Yize Zhang
Yi Xie
HeXing Su
150
1
0
18 Jun 2025
Align Beyond Prompts: Evaluating World Knowledge Alignment in Text-to-Image Generation
Wenchao Zhang
Jiahe Tian
Runze He
Jizhong Han
Jiao Dai
Miaomiao Feng
Wei Mi
Xiaodan Zhang
82
0
0
24 May 2025
Segment Anyword: Mask Prompt Inversion for Open-Set Grounded Segmentation
Zhihua Liu
Amrutha Saseendran
Lei Tong
Xilin He
Fariba Yousefi
...
Dino Oglic
Tom Diethe
Philip Teare
Huiyu Zhou
Chen Jin
VLM
353
0
0
23 May 2025
DailyQA: A Benchmark to Evaluate Web Retrieval Augmented LLMs Based on Capturing Real-World Changes
Jiehan Cheng
Zhicheng Dou
RALM
165
0
0
22 May 2025
CompBench: Benchmarking Complex Instruction-guided Image Editing
Bohan Jia
Wenxuan Huang
Yuntian Tang
Junbo Qiao
Jincheng Liao
...
Lin Chen
Fei Zhao
Zihan Wang
Yuan Xie
Shaohui Lin
CoGe
125
1
0
18 May 2025
Guiding Diffusion with Deep Geometric Moments: Balancing Fidelity and Variation
Sangmin Jung
Utkarsh Nath
Yezhou Yang
Giulia Pedrielli
Joydeep Biswas
Amy Zhang
Hassan Ghasemzadeh
Pavan Turaga
DiffM
103
0
0
18 May 2025
One Image is Worth a Thousand Words: A Usability Preservable Text-Image Collaborative Erasing Framework
Feiran Li
Qianqian Xu
Shilong Bao
Zhiyong Yang
Xiaochun Cao
Qingming Huang
DiffM
98
0
0
16 May 2025
SuperEdit: Rectifying and Facilitating Supervision for Instruction-Based Image Editing
Ming Li
Xin Gu
Fan Chen
X. Xing
Longyin Wen
Chong Chen
Sijie Zhu
DiffM
222
1
0
05 May 2025
Adaptive Token Boundaries: Integrating Human Chunking Mechanisms into Multimodal LLMs
Dongxing Yu
104
0
0
03 May 2025
WILD: a new in-the-Wild Image Linkage Dataset for synthetic image attribution
Pietro Bongini
S. Mandelli
Andrea Montibeller
Mirko Casu
Orazio Pontorno
...
Paolo Bestagini
Irene Amerini
F. D. De Natale
Sebastiano Battiato
Mauro Barni
VLM
237
0
0
28 Apr 2025
MLEP: Multi-granularity Local Entropy Patterns for Universal AI-generated Image Detection
Lin Yuan
Xuelong Li
Yan Zhang
Jiawei Zhang
Hongbo Li
Xinbo Gao
94
0
0
18 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
120
0
0
12 Apr 2025
Could AI Trace and Explain the Origins of AI-Generated Images and Text?
Hongchao Fang
Yixin Liu
Ran Xu
Can Qin
Yang Liu
Feng Liu
Lichao Sun
Dongwon Lee
Lifu Huang
Wenpeng Yin
DeLMO
116
0
0
05 Apr 2025
Spingarn's Method and Progressive Decoupling Beyond Elicitable Monotonicity
B. Evens
P. Latafat
Panagiotis Patrinos
213
1
0
01 Apr 2025
ShieldGemma 2: Robust and Tractable Image Content Moderation
Wenjun Zeng
D. Kurniawan
Ryan Mullins
Yuchi Liu
Tamoghna Saha
...
Mani Malek
Hamid Palangi
Joon Baek
Rick Pereira
Karthik Narasimhan
AI4MH
106
0
0
01 Apr 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Wentao Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
419
5
0
27 Mar 2025
TDRI: Two-Phase Dialogue Refinement and Co-Adaptation for Interactive Image Generation
Yuheng Feng
Jianhui Wang
Kun Li
Sida Li
Tianyu Shi
Haoyue Han
Miao Zhang
Xueqian Wang
DiffM
445
0
0
22 Mar 2025
Generating Multimodal Driving Scenes via Next-Scene Prediction
Yanhao Wu
Haoyang Zhang
Tianwei Lin
Lichao Huang
Shujie Luo
Rui Wu
Congpei Qiu
Wei Ke
Tong Zhang
VGen
98
1
0
19 Mar 2025
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho
Yan-Ying Chen
M. Klenk
David I. Inouye
Yanxia Zhang
DiffM
461
0
0
15 Mar 2025
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu
Fengda Zhang
Long Chen
Kun Kuang
Jiahui Li
Kaifeng Gao
Jun Xiao
X. Wang
Wenwu Zhu
EGVM
205
4
0
14 Mar 2025
TikZero: Zero-Shot Text-Guided Graphics Program Synthesis
Jonas Belouadi
Eddy Ilg
Margret Keuper
Hideki Tanaka
Masao Utiyama
Raj Dabre
Steffen Eger
Simone Paolo Ponzetto
124
0
0
14 Mar 2025
Direction-Aware Diagonal Autoregressive Image Generation
Yijia Xu
Jianzhong Ju
Jian Luan
J. Cui
142
0
0
14 Mar 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Yong Liu
...
Wanli Ouyang
Zhiwei Xiong
Peng Gao
Qibin Hou
Ming-Ming Cheng
196
3
0
13 Mar 2025
PromptMap: An Alternative Interaction Style for AI-Based Image Generation
Krzysztof Adamkiewicz
Paweł W. Woźniak
Julia Dominiak
Andrzej Romanowski
Jakob Karolus
Stanislav Frolov
129
1
0
12 Mar 2025
FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset
Shuhe Wang
Xiaoya Li
Jiwei Li
G. Wang
Xiaofei Sun
...
Han Qiu
Mo Yu
Shengjie Shen
Tianwei Zhang
Eduard H. Hovy
VLM
108
1
0
10 Mar 2025
Unleashing the Potential of Large Language Models for Text-to-Image Generation through Autoregressive Representation Alignment
Xing Xie
Jiawei Liu
Ziyue Lin
Huijie Fan
Zhi Han
Yandong Tang
Liangqiong Qu
99
0
0
10 Mar 2025
Consistent Image Layout Editing with Diffusion Models
Tao Xia
Yudi Zhang
Ting Liu Lei Zhang
DiffM
111
1
0
09 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei Zhang
Bo Yang
Hua Chen
148
1
0
05 Mar 2025
Direct Discriminative Optimization: Your Likelihood-Based Visual Generative Model is Secretly a GAN Discriminator
Kaiwen Zheng
Yongxin Chen
Huayu Chen
Guande He
Xuan Li
Jun Zhu
Qinsheng Zhang
DiffM
124
3
0
03 Mar 2025
MedUnifier: Unifying Vision-and-Language Pre-training on Medical Data with Vision Generation Task using Discrete Visual Representations
Ziyang Zhang
Yang Yu
Yucheng Chen
Xulei Yang
S. Yeo
MedIm
128
1
0
02 Mar 2025
A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images
Zineb Sordo
Eric Chagnon
Daniela Ushizima
EGVM
MedIm
135
1
0
28 Feb 2025
SYNTHIA: Novel Concept Design with Affordance Composition
Xiaomeng Jin
Xiaomeng Jin
Jeonghwan Kim
Qingbin Liu
Zhenhailong Wang
Khanh Duy Nguyen
Ansel Blume
Nanyun Peng
Kai-Wei Chang
Heng Ji
DiffM
463
1
0
25 Feb 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
Jan-Jakob Sonke
E. Gavves
143
1
0
24 Feb 2025
Towards Foundation Models for Mixed Integer Linear Programming
Sirui Li
Janardhan Kulkarni
Ishai Menache
Cathy Wu
Beibin Li
106
7
0
24 Feb 2025
Intelligent Anomaly Detection for Lane Rendering Using Transformer with Self-Supervised Pre-Training and Customized Fine-Tuning
Yongqi Dong
Xingmin Lu
Ruohan Li
Wei Song
B. Arem
Haneen Farah
ViT
167
1
0
21 Feb 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
156
4
0
21 Feb 2025
SMITE: Segment Me In TimE
Amirhossein Alimohammadi
Sauradip Nag
Saeid Asgari Taghanaki
Andrea Tagliasacchi
Ghassan Hamarneh
Ali Mahdavi-Amiri
VLM
VOS
503
3
0
20 Feb 2025
A Transfer Attack to Image Watermarks
Yuepeng Hu
Zhengyuan Jiang
Moyang Guo
Neil Zhenqiang Gong
131
13
0
20 Feb 2025
Scalable Model Merging with Progressive Layer-wise Distillation
Jing Xu
Jiazheng Li
J.N. Zhang
MoMe
FedML
300
2
0
18 Feb 2025
TechSinger: Technique Controllable Multilingual Singing Voice Synthesis via Flow Matching
Wenxiang Guo
Yu Zhang
Changhao Pan
Rongjie Huang
Li Tang
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Zhou Zhao
207
4
0
18 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
208
0
0
17 Feb 2025
CoDiff: Conditional Diffusion Model for Collaborative 3D Object Detection
zhe Huang
Shuo Wang
Yanjie Wang
Lei Wang
DiffM
393
0
0
17 Feb 2025
FreeBlend: Advancing Concept Blending with Staged Feedback-Driven Interpolation Diffusion
Yufan Zhou
Haoyu Shen
Huan Wang
DiffM
201
1
0
17 Feb 2025
ControlText: Unlocking Controllable Fonts in Multilingual Text Rendering without Font Annotations
Bowen Jiang
Yuan Yuan
Xinyi Bai
Zhuoqun Hao
Alyson Yin
Yaojie Hu
Wenyu Liao
Lyle Ungar
Camillo J Taylor
DiffM
101
2
0
16 Feb 2025
Transfer Your Perspective: Controllable 3D Generation from Any Viewpoint in a Driving Scene
Tai-Yu Pan
Sooyoung Jeon
Mengdi Fan
Jinsu Yoo
Zhenyang Feng
Mark E. Campbell
Kilian Q. Weinberger
Bharath Hariharan
Wei-Lun Chao
197
0
0
10 Feb 2025
From Pixels to Components: Eigenvector Masking for Visual Representation Learning
Alice Bizeul
Thomas M. Sutter
Alain Ryser
Bernhard Schölkopf
Julius von Kügelgen
Julia E. Vogt
172
2
0
10 Feb 2025
MoFM: A Large-Scale Human Motion Foundation Model
Mohammadreza Baharani
Ghazal Alinezhad Noghre
Armin Danesh Pazho
Gabriel Maldonado
Hamed Tabkhi
AI4CE
423
1
0
08 Feb 2025
PatentLMM: Large Multimodal Model for Generating Descriptions for Patent Figures
Shivalika Singh
Nakul Sharma
Manish Gupta
Anand Mishra
115
1
0
28 Jan 2025
Turn That Frown Upside Down: FaceID Customization via Cross-Training Data
Shuhe Wang
Xiaoya Li
Xiaofei Sun
G. Wang
Tianwei Zhang
Jiwei Li
Eduard H. Hovy
97
1
0
28 Jan 2025
Visual Generation Without Guidance
Huayu Chen
Kai Jiang
Kaiwen Zheng
Jianfei Chen
Hang Su
Jun Zhu
142
2
0
28 Jan 2025
1
2
3
4
5
Next