arXiv:2205.11487
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
23 May 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
Emily L. Denton
Seyed Kamyar Seyed Ghasemipour
Burcu Karagol Ayan
S. S. Mahdavi
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
Papers citing "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"
50 / 1,364 papers shown
Title
Diffusion on Graph: Augmentation of Graph Structure for Node Classification
Yancheng Wang
Changyu Liu
Yingzhen Yang
DiffM
GNN
250
0
0
16 Mar 2025
Personalize Anything for Free with Diffusion Transformer
Haoran Feng
Zehuan Huang
Lin Li
Hairong Lv
Lu Sheng
DiffM
152
5
0
16 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
104
1
0
15 Mar 2025
DiffAD: A Unified Diffusion Modeling Approach for Autonomous Driving
Tao Wang
Cong Zhang
Xingguang Qu
Kun Li
Wen Liu
Chenyu Huang
117
1
0
15 Mar 2025
Att-Adapter: A Robust and Precise Domain-Specific Multi-Attributes T2I Diffusion Adapter via Conditional Variational Autoencoder
Wonwoong Cho
Yan-Ying Chen
M. Klenk
David I. Inouye
Yanxia Zhang
DiffM
491
0
0
15 Mar 2025
Towards Better Alignment: Training Diffusion Models with Reinforcement Learning Against Sparse Rewards
Zijing Hu
Fengda Zhang
Long Chen
Kun Kuang
Jiahui Li
Kaifeng Gao
Jun Xiao
X. Wang
Wenwu Zhu
EGVM
235
5
0
14 Mar 2025
Flow to the Mode: Mode-Seeking Diffusion Autoencoders for State-of-the-Art Image Tokenization
Kyle Sargent
Kyle Hsu
Justin Johnson
L. Fei-Fei
Jiajun Wu
DiffM
MU
149
8
0
14 Mar 2025
Streaming Generation of Co-Speech Gestures via Accelerated Rolling Diffusion
Evgeniia Vu
Andrei Boiarov
Dmitry Vetrov
VGen
120
0
0
13 Mar 2025
PlanGen: Towards Unified Layout Planning and Image Generation in Auto-Regressive Vision Language Models
Runze He
Bo Cheng
Yuhang Ma
Qingxiang Jia
Shanyuan Liu
Ao Ma
Xiaoyu Wu
Liebucha Wu
Dawei Leng
Yuhui Yin
DiffM
VLM
183
0
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
150
1
0
13 Mar 2025
On the Generalization Properties of Diffusion Models
Puheng Li
Zhong Li
Huishuai Zhang
Jiang Bian
240
39
0
13 Mar 2025
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu
Rui Zhu
Shuhuai Ren
Jiacong Wang
Haoyuan Guo
Xu Sun
Lu Jiang
377
1
0
13 Mar 2025
TAR3D: Creating High-Quality 3D Assets via Next-Part Prediction
Xuying Zhang
Yutong Liu
Yangguang Li
Renrui Zhang
Yong Liu
...
Wanli Ouyang
Zhiwei Xiong
Peng Gao
Qibin Hou
Ming-Ming Cheng
239
3
0
13 Mar 2025
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing
Rongyao Fang
Chengqi Duan
Kun Wang
Linjiang Huang
Hao Li
...
Xingyu Zeng
R. Zhao
Jifeng Dai
Xihui Liu
Hongsheng Li
MLLM
ReLM
LRM
165
23
0
13 Mar 2025
Efficient Fine-Tuning and Concept Suppression for Pruned Diffusion Models
Reza Shirkavand
Peiran Yu
Shangqian Gao
Gowthami Somepalli
Tom Goldstein
Heng-Chiao Huang
193
2
0
13 Mar 2025
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models
Zhihua Tian
Sirun Nan
Ming Xu
Shengfang Zhai
Wenjie Qu
Enchao Gong
Kui Ren
Ruoxi Jia
Jiaheng Zhang
DiffM
141
2
0
12 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
86
0
0
12 Mar 2025
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong
Qi Jiang
Jingyi Yu
Yuexin Ma
186
4
0
11 Mar 2025
WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation
Yuwei Niu
Munan Ning
Mengren Zheng
Weiyang Jin
Bin Lin
...
Jiaqi Liao
Chaoran Feng
Kunpeng Ning
Bin Zhu
Li Yuan
EGVM
147
26
0
10 Mar 2025
PLADIS: Pushing the Limits of Attention in Diffusion Models at Inference Time by Leveraging Sparsity
Kwanyoung Kim
Byeongsu Sim
DiffM
VLM
151
0
0
10 Mar 2025
From Reusing to Forecasting: Accelerating Diffusion Models with TaylorSeers
Jiacheng Liu
Chang Zou
Yuanhuiyi Lyu
Junjie Chen
Linfeng Zhang
DiffM
150
5
0
10 Mar 2025
Boosting Diffusion-Based Text Image Super-Resolution Model Towards Generalized Real-World Scenarios
Chenglu Pan
Xiaogang Xu
Ganggui Ding
Yunke Zhang
Wenbo Li
Jiarong Xu
Qingbiao Wu
142
0
0
10 Mar 2025
FaceID-6M: A Large-Scale, Open-Source FaceID Customization Dataset
Shuhe Wang
Xiaoya Li
Jiwei Li
G. Wang
Xiaofei Sun
...
Han Qiu
Mo Yu
Shengjie Shen
Tianwei Zhang
Eduard H. Hovy
VLM
126
1
0
10 Mar 2025
Denoising Score Distillation: From Noisy Diffusion Pretraining to One-Step High-Quality Generation
Tianyu Chen
Yasi Zhang
Ziyi Wang
Ying Nian Wu
Oscar Leong
Mingyuan Zhou
DiffM
155
2
0
10 Mar 2025
TRCE: Towards Reliable Malicious Concept Erasure in Text-to-Image Diffusion Models
Ruidong Chen
Honglin Guo
Lanjun Wang
Chenyu Zhang
Weizhi Nie
An-an Liu
DiffM
109
2
0
10 Mar 2025
PTDiffusion: Free Lunch for Generating Optical Illusion Hidden Pictures with Phase-Transferred Diffusion Model
Xiang Gao
Shuai Yang
Jiaying Liu
DiffM
142
0
0
08 Mar 2025
X2I: Seamless Integration of Multimodal Understanding into Diffusion Transformer via Attention Distillation
Jian Ma
Qirong Peng
Xu Guo
Chen Chen
H. Lu
Zhenyu Yang
VLM
156
1
0
08 Mar 2025
Escaping Plato's Cave: Towards the Alignment of 3D and Text Latent Spaces
Souhail Hadgi
Luca Moschella
Andrea Santilli
Diego Gomez
Qixing Huang
Emanuele Rodolà
Simone Melzi
M. Ovsjanikov
91
0
0
07 Mar 2025
Fine-Tuning Florence2 for Enhanced Object Detection in Un-constructed Environments: Vision-Language Model Approach
Soumyadeep Ro
Sanapala Satwika
Pamarthi Yasoda Gayathri
Mohmmad Ghaith Balsha
Aysegul Ucar
VLM
ObjD
148
0
0
06 Mar 2025
ProReflow: Progressive Reflow with Decomposed Velocity
Lei Ke
Haohang Xu
Xuefei Ning
Yongqian Li
Jiajun Li
Haoling Li
Yuxuan Lin
Dongsheng Jiang
Yue Yang
Linfeng Zhang
DiffM
97
1
0
05 Mar 2025
Heuristics for AI-driven Graphical Asset Generation Tools in Game Design and Development Pipelines: A User-Centred Approach
Kaisei Fukaya
Damon Daylamani-Zad
Harry Agius
93
0
0
04 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
188
1
0
03 Mar 2025
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization
Jamie Wynn
Z. Qureshi
Jakub Powierza
Jamie Watson
Mohamed Sayed
3DGS
DiffM
177
1
0
03 Mar 2025
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin
Xin Yang
Meixi Chen
Yingjie Xu
D. Yan
Leyi Wu
Xinli Xu
Lie Xu
Shunsi Zhang
Ying-Cong Chen
127
2
0
03 Mar 2025
One-shot In-context Part Segmentation
Zhenqi Dai
Ting Liu
Xinyu Zhang
Y. X. Wei
Yanning Zhang
VLM
176
1
0
03 Mar 2025
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
Boyong He
Yuxiang Ji
Qianwen Ye
Zhuoyue Tan
Liaoni Wu
DiffM
160
0
0
03 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
123
3
0
03 Mar 2025
Zero-Shot Head Swapping in Real-World Scenarios
S. Jeong
Taewoong Kang
Hyojin Jang
Jaegul Choo
94
0
0
02 Mar 2025
Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos
Zhiyu Tan
Junyan Wang
Hao Yang
Luozheng Qin
Hesen Chen
Qiang-feng Zhou
Hao Li
VGen
127
1
0
28 Feb 2025
Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA
Ojonugwa Oluwafemi Ejiga Peter
Md Mahmudur Rahman
Fahmi Khalifa
DiffM
MedIm
92
1
0
28 Feb 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Ju Liu
Xuelong Li
Chi Zhang
Xuelong Li
DiffM
MDE
110
1
0
26 Feb 2025
Steganography Beyond Space-Time with Chain of Multimodal AI
Ching-Chun Chang
Isao Echizen
165
0
0
25 Feb 2025
Towards Hierarchical Rectified Flow
Yichi Zhang
Yici Yan
Alex Schwing
Zhizhen Zhao
121
2
0
24 Feb 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
Jan-Jakob Sonke
E. Gavves
177
1
0
24 Feb 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
176
2
0
24 Feb 2025
A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf
Trupti Bavalatti
Osama Ahmed
Dhaval Potdar
Faraz Jawed
EGVM
128
2
0
23 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
Jing Liu
89
1
0
23 Feb 2025
Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control
Jinbo Yan
Alan Zhao
Yixin Hu
3DGS
484
0
0
23 Feb 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
148
0
0
23 Feb 2025
DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation
Yuxuan Xiong
Yue Shi
Yishun Dou
Bingbing Ni
DiffM
69
0
0
22 Feb 2025