arXiv: 2205.11487
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
23 May 2022
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
Emily L. Denton
Seyed Kamyar Seyed Ghasemipour
Burcu Karagol Ayan
S. S. Mahdavi
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
Papers citing "Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding"
50 / 4,337 papers shown
ToLo: A Two-Stage, Training-Free Layout-To-Image Generation Framework For High-Overlap Layouts
Linhao Huang
Jing Yu
DiffM
49
0
0
03 Mar 2025
Kiss3DGen: Repurposing Image Diffusion Models for 3D Asset Generation
Jiantao Lin
Xin Yang
Meixi Chen
Yingjie Xu
D. Yan
Leyi Wu
Xinli Xu
Lie Xu
Shunsi Zhang
Ying-Cong Chen
62
1
0
03 Mar 2025
Interactive Gadolinium-Free MRI Synthesis: A Transformer with Localization Prompt Learning
Linhao Li
Changhui Su
Yu Guo
Huimao Zhang
Dong Liang
K. Shang
MedIm
56
0
0
03 Mar 2025
Enhancing Vision-Language Compositional Understanding with Multimodal Synthetic Data
Haoxin Li
Boyang Li
CoGe
73
0
0
03 Mar 2025
One-shot In-context Part Segmentation
Zhenqi Dai
Ting Liu
X. Zhang
Y. X. Wei
Yanning Zhang
VLM
85
1
0
03 Mar 2025
Non-convergence to the optimal risk for Adam and stochastic gradient descent optimization in the training of deep neural networks
Thang Do
Arnulf Jentzen
Adrian Riekert
58
1
0
03 Mar 2025
CacheQuant: Comprehensively Accelerated Diffusion Models
Xuewen Liu
Zhikai Li
Qingyi Gu
DiffM
40
0
0
03 Mar 2025
Generalized Diffusion Detector: Mining Robust Features from Diffusion Models for Domain-Generalized Detection
Boyong He
Yuxiang Ji
Qianwen Ye
Zhuoyue Tan
Liaoni Wu
DiffM
80
0
0
03 Mar 2025
EasyCraft: A Robust and Efficient Framework for Automatic Avatar Crafting
Suzhen Wang
Weijie Chen
Wei Zhang
Minda Zhao
Lincheng Li
Rongsheng Zhang
Zhibo Hu
Xin Yu
63
1
0
03 Mar 2025
WeGen: A Unified Model for Interactive Multimodal Generation as We Chat
Zhipeng Huang
Shaobin Zhuang
Canmiao Fu
Binxin Yang
Ying Zhang
Chong Sun
Zhizheng Zhang
Yali Wang
Chen Li
Zheng-Jun Zha
DiffM
69
2
0
03 Mar 2025
Morpheus: Text-Driven 3D Gaussian Splat Shape and Color Stylization
Jamie Wynn
Z. Qureshi
Jakub Powierza
Jamie Watson
Mohamed Sayed
3DGS
DiffM
76
0
0
03 Mar 2025
Zero-Shot Head Swapping in Real-World Scenarios
S. Jeong
Taewoong Kang
Hyojin Jang
Jaegul Choo
39
0
0
02 Mar 2025
Evaluating and Predicting Distorted Human Body Parts for Generated Images
Lu Ma
Kaibo Cao
Hao Liang
Jiaxin Lin
Ziyu Li
Yuhong Liu
Jihong Zhang
Wentao Zhang
Tengjiao Wang
MedIm
44
0
0
02 Mar 2025
Extrapolating and Decoupling Image-to-Video Generation Models: Motion Modeling is Easier Than You Think
Jie Tian
Xiaoye Qu
Zhenyi Lu
Wei Wei
Sichen Liu
Yu-Xi Cheng
DiffM
VGen
44
0
0
02 Mar 2025
FaceShot: Bring Any Character into Life
Junyao Gao
Yanan Sun
Fei Shen
Xin Jiang
Zhening Xing
Kai-xiang Chen
Cairong Zhao
CVBM
3DH
50
1
0
02 Mar 2025
Periodic Materials Generation using Text-Guided Joint Diffusion Model
Kishalay Das
Subhojyoti Khastagir
Pawan Goyal
Seung-Cheol Lee
S. Bhattacharjee
Niloy Ganguly
DiffM
34
0
0
01 Mar 2025
Advancing AI-Powered Medical Image Synthesis: Insights from MedVQA-GI Challenge Using CLIP, Fine-Tuned Stable Diffusion, and Dream-Booth + LoRA
Ojonugwa Oluwafemi Ejiga Peter
Md Mahmudur Rahman
Fahmi Khalifa
DiffM
MedIm
41
1
0
28 Feb 2025
SafeText: Safe Text-to-image Models via Aligning the Text Encoder
Yuepeng Hu
Zhengyuan Jiang
Neil Zhenqiang Gong
69
1
0
28 Feb 2025
Raccoon: Multi-stage Diffusion Training with Coarse-to-Fine Curating Videos
Zhiyu Tan
Junyan Wang
Hao Yang
Luozheng Qin
Hesen Chen
Qiang-feng Zhou
Hao Li
VGen
69
0
0
28 Feb 2025
FlexiDiT: Your Diffusion Transformer Can Easily Generate High-Quality Samples with Less Compute
Sotiris Anagnostidis
Gregor Bachmann
Yeongmin Kim
Jonas Kohler
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Albert Pumarola
Ali K. Thabet
Edgar Schönfeld
92
0
0
27 Feb 2025
Point Policy: Unifying Observations and Actions with Key Points for Robot Manipulation
Siddhant Haldar
Lerrel Pinto
3DPC
66
2
0
27 Feb 2025
MFSR: Multi-fractal Feature for Super-resolution Reconstruction with Fine Details Recovery
Lianping Yang
Peng Jiao
Jinshan Pan
Hegui Zhu
Su Guo
43
0
0
27 Feb 2025
Intent Tagging: Exploring Micro-Prompting Interactions for Supporting Granular Human-GenAI Co-Creation Workflows
Frederic Gmeiner
Nicolai Marquardt
Michael Bentley
Hugo Romat
M. Pahud
...
Asta Roseway
Nikolas Martelaro
Kenneth Holstein
K. Hinckley
N. Riche
55
0
0
26 Feb 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Xiao-Lei Zhang
Xuelong Li
DiffM
MDE
71
1
0
26 Feb 2025
Multi-Perspective Data Augmentation for Few-shot Object Detection
Anh-Khoa Nguyen Vu
Quoc-Truong Truong
Vinh-Tiep Nguyen
T. Ngo
Thanh-Toan Do
Tam V. Nguyen
77
1
0
25 Feb 2025
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation
Yifan Pu
Yiming Zhao
Zhicong Tang
Ruihong Yin
Haoxing Ye
...
Ji Li
Xiu Li
Zheng Lian
Gao Huang
Baining Guo
DiffM
64
2
0
25 Feb 2025
Synthesizing Consistent Novel Views via 3D Epipolar Attention without Re-Training
Botao Ye
Sifei Liu
Xueting Li
Marc Pollefeys
Ming Yang
69
0
0
25 Feb 2025
FairGen: Controlling Sensitive Attributes for Fair Generations in Diffusion Models via Adaptive Latent Guidance
Mintong Kang
Vinayshekhar Bannihatti Kumar
Shamik Roy
Abhishek Kumar
Sopan Khosla
Balakrishnan Narayanaswamy
Rashmi Gangadharaiah
50
0
0
25 Feb 2025
HRR: Hierarchical Retrospection Refinement for Generated Image Detection
Peipei Yuan
Zijing Xie
Shuo Ye
Hong Chen
Yulong Wang
DiffM
154
1
0
25 Feb 2025
Steganography Beyond Space-Time with Chain of Multimodal AI
Ching-Chun Chang
Isao Echizen
74
0
0
25 Feb 2025
GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music
Xinran Liu
Xu Dong
Diptesh Kanojia
Wenwu Wang
Zhenhua Feng
DiffM
62
0
0
25 Feb 2025
LDGen: Enhancing Text-to-Image Synthesis via Large Language Model-Driven Language Representation
Pengzhi Li
Pengfei Yu
Zide Liu
Wei He
Xuhao Pan
Xudong Rao
Tao Wei
Wei Chen
VLM
60
0
0
25 Feb 2025
Distributional Vision-Language Alignment by Cauchy-Schwarz Divergence
Wenzhe Yin
Zehao Xiao
Pan Zhou
Shujian Yu
Jiayi Shen
J. Sonke
E. Gavves
42
0
0
24 Feb 2025
Mitigating Hallucinations in Diffusion Models through Adaptive Attention Modulation
Trevine Oorloff
Yaser Yacoob
Abhinav Shrivastava
51
0
0
24 Feb 2025
Model Lakes
Koyena Pal
David Bau
Renée J. Miller
67
0
0
24 Feb 2025
Improved Diffusion-based Generative Model with Better Adversarial Robustness
Zekun Wang
Mingyang Yi
Shuchen Xue
Ziyu Li
Ming Liu
Bing Qin
Zhi-Ming Ma
DiffM
42
0
0
24 Feb 2025
HumanGif: Single-View Human Diffusion with Generative Prior
Shoukang Hu
Takuya Narihira
Kazumi Fukuda
Ryosuke Sawata
Takashi Shibuya
Yuki Mitsufuji
98
1
0
24 Feb 2025
Towards Hierarchical Rectified Flow
Yichi Zhang
Yici Yan
A. Schwing
Zhizhen Zhao
55
1
0
24 Feb 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
58
0
0
23 Feb 2025
Unified Prompt Attack Against Text-to-Image Generation Models
Duo Peng
Qiuhong Ke
Mark He Huang
Ping Hu
Jun Liu
50
0
0
23 Feb 2025
Dragen3D: Multiview Geometry Consistent 3D Gaussian Generation with Drag-Based Control
Jinbo Yan
Alan Zhao
Yixin Hu
3DGS
216
0
0
23 Feb 2025
A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf
Trupti Bavalatti
Osama Ahmed
Dhaval Potdar
Faraz Jawed
EGVM
72
1
0
23 Feb 2025
PersGuard: Preventing Malicious Personalization via Backdoor Attacks on Pre-trained Text-to-Image Diffusion Models
Xinwei Liu
Xiaojun Jia
Yuan Xun
Hua Zhang
Xiaochun Cao
DiffM
AAML
49
0
0
22 Feb 2025
DualNeRF: Text-Driven 3D Scene Editing via Dual-Field Representation
Yuxuan Xiong
Yue Shi
Yishun Dou
Bingbing Ni
DiffM
44
0
0
22 Feb 2025
Concept Corrector: Erase concepts on the fly for text-to-image diffusion models
Zheling Meng
Bo Peng
Xiaochuan Jin
Yueming Lyu
Wei Wang
Jing Dong
DiffM
48
2
0
22 Feb 2025
Dynamic Concepts Personalization from Single Videos
Rameen Abdal
Or Patashnik
Ivan Skorokhodov
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
Daniel Cohen-Or
Kfir Aberman
DiffM
VGen
57
0
0
21 Feb 2025
CrossVTON: Mimicking the Logic Reasoning on Cross-category Virtual Try-on guided by Tri-zone Priors
Donghao Luo
Yujie Liang
Xu Peng
Xiaobin Hu
Boyuan Jiang
C. Xu
Taisong Jin
Chengjie Wang
Yanwei Fu
59
0
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
128
2
0
21 Feb 2025
T2ISafety: Benchmark for Assessing Fairness, Toxicity, and Privacy in Image Generation
Lijun Li
Zhelun Shi
Xuhao Hu
Bowen Dong
Yiran Qin
Xihui Liu
Lu Sheng
Jing Shao
114
1
0
21 Feb 2025
Text-to-Image Rectified Flow as Plug-and-Play Priors
Xiaofeng Yang
Cheng Chen
Xulei Yang
Fayao Liu
Guosheng Lin
DiffM
73
7
0
21 Feb 2025