arXiv 2204.06125
Hierarchical Text-Conditional Image Generation with CLIP Latents
13 April 2022
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
Papers citing
"Hierarchical Text-Conditional Image Generation with CLIP Latents"
50 / 4,897 papers shown
Token-Level Constraint Boundary Search for Jailbreaking Text-to-Image Models
Qingbin Liu
Zhaoxin Wang
Handing Wang
Cong Tian
Yaochu Jin
54
1
0
15 Apr 2025
Aligning Generative Denoising with Discriminative Objectives Unleashes Diffusion for Visual Perception
Ziqi Pang
Xin Xu
Yu-Xiong Wang
DiffM
193
0
0
15 Apr 2025
Separate to Collaborate: Dual-Stream Diffusion Model for Coordinated Piano Hand Motion Synthesis
Zihao Liu
Mingwen Ou
Zunnan Xu
Jiaqi Huang
Haonan Han
Ronghui Li
Xiaochen Li
DiffM
108
0
0
14 Apr 2025
InstructEngine: Instruction-driven Text-to-Image Alignment
Xingyu Lu
Yihan Hu
Yuanxing Zhang
Kaiyu Jiang
Changyi Liu
...
Bin Wen
C. Yuan
Fan Yang
Yan Li
Di Zhang
127
0
0
14 Apr 2025
GeoUni: A Unified Model for Generating Geometry Diagrams, Problems and Problem Solutions
Jo-Ku Cheng
Zeren Zhang
Ran Chen
Jingyang Deng
Ziran Qin
Jinwen Ma
100
1
0
14 Apr 2025
An Image is Worth K Topics: A Visual Structural Topic Model with Pretrained Image Embeddings
Matías Piqueras
Alexandra Segerberg
Matteo Magnani
Måns Magnusson
Nataša Sladoje
101
0
0
14 Apr 2025
Efficient Generative Model Training via Embedded Representation Warmup
Deyuan Liu
Peng Sun
Xufeng Li
Tao Lin
72
0
0
14 Apr 2025
Omni-Dish: Photorealistic and Faithful Image Generation and Editing for Arbitrary Chinese Dishes
Huijie Liu
Bingcan Wang
Jie Hu
Xiaoming Wei
Guoliang Kang
134
0
0
14 Apr 2025
Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention
Vasilii Korolkov
Andrey Yanchenko
VLM
76
1
0
13 Apr 2025
D²iT: Dynamic Diffusion Transformer for Accurate Image Generation
Weinan Jia
Mengqi Huang
Nan Chen
Lei Zhang
Zhendong Mao
86
0
0
13 Apr 2025
SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
Xiang Hu
Pingping Zhang
Yuhao Wang
Bin Yan
Huchuan Lu
58
0
0
13 Apr 2025
Scalable Motion In-betweening via Diffusion and Physics-Based Character Adaptation
Jia Qin
DiffM
VGen
68
0
0
13 Apr 2025
UniFlowRestore: A General Video Restoration Framework via Flow Matching and Prompt Guidance
Siyang Song
Yu Zhang
Chen Wu
Dianjie Lu
Guijuan Zhan
Yang Weng
Zhuoran Zheng
DiffM
VGen
61
0
0
12 Apr 2025
Towards Explainable Partial-AIGC Image Quality Assessment
Jiaying Qian
Ziheng Jia
Zicheng Zhang
Zeyu Zhang
Guangtao Zhai
Xiongkuo Min
71
0
0
12 Apr 2025
Generating Fine Details of Entity Interactions
Xinyi Gu
Jiayuan Mao
150
0
0
11 Apr 2025
AGENT: An Aerial Vehicle Generation and Design Tool Using Large Language Models
Colin Samplawski
Adam Cobb
Susmit Jha
LLMAG
AI4CE
110
0
0
11 Apr 2025
PixelFlow: Pixel-Space Generative Models with Flow
Shoufa Chen
Chongjian Ge
Shilong Zhang
Peize Sun
Ping Luo
VLM
DRL
65
0
0
10 Apr 2025
Teaching Humans Subtle Differences with DIFFusion
Mia Chiquier
Orr Avrech
Yossi Gandelsman
Berthy Feng
Katherine Bouman
Carl Vondrick
DiffM
128
0
0
10 Apr 2025
POEM: Precise Object-level Editing via MLLM control
Marco Schouten
Mehmet Onurcan Kaya
Serge Belongie
Dim P. Papadopoulos
DiffM
101
0
0
10 Apr 2025
Marmot: Multi-Agent Reasoning for Multi-Object Self-Correcting in Improving Image-Text Alignment
Jiayang Sun
Hongru Wang
Jie Cao
Huaibo Huang
Ran He
DiffM
110
0
0
10 Apr 2025
Compass Control: Multi Object Orientation Control for Text-to-Image Generation
Rishubh Parihar
Vaibhav Agrawal
Sachidanand VS
R. V. Babu
DiffM
118
0
0
09 Apr 2025
A Meaningful Perturbation Metric for Evaluating Explainability Methods
Danielle Cohen
Hila Chefer
Lior Wolf
AAML
59
0
0
09 Apr 2025
A Unified Agentic Framework for Evaluating Conditional Image Generation
Jifang Wang
Xue Yang
Longyue Wang
Zhenran Xu
Yansen Wang
Yaowei Wang
Weihua Luo
Kaifu Zhang
Baotian Hu
Min Zhang
EGVM
DiffM
129
2
0
09 Apr 2025
IGG: Image Generation Informed by Geodesic Dynamics in Deformation Spaces
Nian Wu
Nivetha Jayakumar
Jiarui Xing
Miaomiao Zhang
100
0
0
09 Apr 2025
CDM-QTA: Quantized Training Acceleration for Efficient LoRA Fine-Tuning of Diffusion Model
Jinming Lu
Minghao She
Wendong Mao
Zhongfeng Wang
MQ
47
0
0
08 Apr 2025
Tuning-Free Image Editing with Fidelity and Editability via Unified Latent Diffusion Model
Qi Mao
Lawrence Yunliang Chen
Yuchao Gu
Mike Zheng Shou
Ming-Hsuan Yang
DiffM
80
0
0
08 Apr 2025
Reinforced Multi-teacher Knowledge Distillation for Efficient General Image Forgery Detection and Localization
Zeqin Yu
Jiangqun Ni
Jian Zhang
Haoyi Deng
Yuzhen Lin
72
0
0
07 Apr 2025
SCAM: A Real-World Typographic Robustness Evaluation for Multimodal Foundation Models
Justus Westerhoff
Erblina Purellku
Jakob Hackstein
Jonas Loos
Leo Pinetzki
Lorenz Hufe
AAML
139
0
0
07 Apr 2025
Enhancing Compositional Reasoning in Vision-Language Models with Synthetic Preference Data
Samarth Mishra
Kate Saenko
Venkatesh Saligrama
CoGe
LRM
69
0
0
07 Apr 2025
Dimension-Free Convergence of Diffusion Models for Approximate Gaussian Mixtures
Gen Li
Changxiao Cai
Yuting Wei
DiffM
75
1
0
07 Apr 2025
CADCrafter: Generating Computer-Aided Design Models from Unconstrained Images
Cheng Chen
Jiacheng Wei
Tianrun Chen
Chi Zhang
Xiaofeng Yang
...
Bingchen Yang
Chuan-Sheng Foo
Guosheng Lin
Qixing Huang
Fayao Liu
88
4
0
07 Apr 2025
TestDG: Test-time Domain Generalization for Continual Test-time Adaptation
Sohyun Lee
N. Kim
Juwon Kang
Seong Joon Oh
Suha Kwak
OOD
TTA
191
0
0
07 Apr 2025
PartStickers: Generating Parts of Objects for Rapid Prototyping
Mo Zhou
Josh Myers-Dean
Danna Gurari
97
0
0
07 Apr 2025
UniToken: Harmonizing Multimodal Understanding and Generation through Unified Visual Encoding
Yang Jiao
Haibo Qiu
Zequn Jie
Tian Jin
Jingjing Chen
Lin Ma
Yu Jiang
106
10
0
06 Apr 2025
Can You Count to Nine? A Human Evaluation Benchmark for Counting Limits in Modern Text-to-Video Models
Xuyang Guo
Zekai Huang
Jiayan Huo
Yingyu Liang
Zhenmei Shi
Zhao Song
Jiahao Zhang
ALM
VGen
198
6
0
05 Apr 2025
Multi-identity Human Image Animation with Structural Video Diffusion
Zhenzhi Wang
Yongqian Li
Yanhong Zeng
Yuwei Guo
Dahua Lin
Tianfan Xue
Bo Dai
VGen
75
2
0
05 Apr 2025
Structured Knowledge Accumulation: The Principle of Entropic Least Action in Forward-Only Neural Learning
Bouarfa Mahi Quantiota
87
0
0
04 Apr 2025
MD-ProjTex: Texturing 3D Shapes with Multi-Diffusion Projection
Ahmet Burak Yildirim
Mustafa Utku Aydogdu
Duygu Ceylan
Aysegül Dündar
DiffM
133
1
0
03 Apr 2025
Fine-Tuning Visual Autoregressive Models for Subject-Driven Generation
Jiwoo Chung
Sangeek Hyun
Hyunjun Kim
Eunseo Koh
MinKyu Lee
Jae-Pil Heo
74
0
0
03 Apr 2025
MultiNeRF: Multiple Watermark Embedding for Neural Radiance Fields
Yash Kulthe
Andrew Gilbert
John Collomosse
128
0
0
03 Apr 2025
VARGPT-v1.1: Improve Visual Autoregressive Large Unified Model via Iterative Instruction Tuning and Reinforcement Learning
Xianwei Zhuang
Yuxin Xie
Yufan Deng
Dongchao Yang
Liming Liang
Jinghan Ru
Yuguo Yin
Yuexian Zou
155
5
0
03 Apr 2025
Scene Splatter: Momentum 3D Scene Generation from Single Image with Video Diffusion Model
Shengjun Zhang
Jinzhao Li
Xin Fei
Hao Liu
Yueqi Duan
DiffM
3DGS
VGen
110
1
0
03 Apr 2025
GPT-ImgEval: A Comprehensive Benchmark for Diagnosing GPT4o in Image Generation
Zhiyuan Yan
Junyan Ye
Weijia Li
Zilong Huang
Shenghai Yuan
Xiangyang He
Kaiqing Lin
Jun-Jian He
Conghui He
Li Yuan
MLLM
EGVM
191
24
0
03 Apr 2025
Pro-DG: Procedural Diffusion Guidance for Architectural Facade Generation
Aleksander Plocharski
Jan Swidzinski
Przemyslaw Musialski
DiffM
50
0
0
02 Apr 2025
Multi-party Collaborative Attention Control for Image Customization
Han Yang
Chuanguang Yang
Qiuli Wang
Zhulin An
Weilun Feng
Libo Huang
Yongjun Xu
DiffM
108
1
0
02 Apr 2025
FlowMotion: Target-Predictive Conditional Flow Matching for Jitter-Reduced Text-Driven Human Motion Generation
Manolo Canales Cuba
Vinícius do Carmo Melício
João Paulo Gois
3DH
119
0
0
02 Apr 2025
Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
Dohyun Kim
S. Park
Geonhee Han
Seung Wook Kim
Paul Hongsuck Seo
DiffM
106
0
0
02 Apr 2025
Less-to-More Generalization: Unlocking More Controllability by In-Context Generation
Shaojin Wu
Mengqi Huang
Wenxu Wu
Yufeng Cheng
Fei Ding
Qian He
DiffM
122
12
0
02 Apr 2025
Implicit Bias Injection Attacks against Text-to-Image Diffusion Models
Huayang Huang
Xiangye Jin
Jiaxu Miao
Yu Wu
88
0
0
02 Apr 2025
Prompting Medical Vision-Language Models to Mitigate Diagnosis Bias by Generating Realistic Dermoscopic Images
Nusrat Munia
Abdullah-Al-Zubaer Imran
LM&MA
MedIm
62
0
0
02 Apr 2025