ResearchTrend.AI
Hierarchical Text-Conditional Image Generation with CLIP Latents

13 April 2022
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
    VLM
    DiffM

Papers citing "Hierarchical Text-Conditional Image Generation with CLIP Latents"

50 / 4,782 papers shown
Title
Batch-Instructed Gradient for Prompt Evolution: Systematic Prompt Optimization for Enhanced Text-to-Image Synthesis
Xinrui Yang
Zhuohan Wang
Anthony Hu
EGVM
64
0
0
13 Jun 2024
MMFakeBench: A Mixed-Source Multimodal Misinformation Detection Benchmark for LVLMs
Xuannan Liu
Zekun Li
Peipei Li
Shuhan Xia
Xing Cui
Linzhi Huang
Huaibo Huang
Weihong Deng
Zhaofeng He
66
19
0
13 Jun 2024
FakeInversion: Learning to Detect Images from Unseen Text-to-Image Models by Inverting Stable Diffusion
George Cazenavette
Avneesh Sud
Thomas Leung
Ben Usman
DiffM
39
14
0
12 Jun 2024
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Zhihang Yuan
Pu Lu
Hanling Zhang
Xuefei Ning
Linfeng Zhang
Tianchen Zhao
Shengen Yan
Guohao Dai
Yu Wang
55
24
0
12 Jun 2024
What If We Recaption Billions of Web Images with LLaMA-3?
Xianhang Li
Haoqin Tu
Mude Hui
Zeyu Wang
Bingchen Zhao
...
Jieru Mei
Qing Liu
Huangjie Zheng
Yuyin Zhou
Cihang Xie
VLM
MLLM
51
36
0
12 Jun 2024
PAL: Pluralistic Alignment Framework for Learning from Heterogeneous Preferences
Daiwei Chen
Yi Chen
Aniket Rege
Ramya Korlakai Vinayak
61
18
0
12 Jun 2024
From a Social Cognitive Perspective: Context-aware Visual Social Relationship Recognition
Shiwei Wu
Chao Zhang
Joya Chen
Tong Xu
Likang Wu
Yao Hu
Enhong Chen
37
0
0
12 Jun 2024
Dataset Enhancement with Instance-Level Augmentations
Orest Kupyn
Christian Rupprecht
71
9
0
12 Jun 2024
Ablation Based Counterfactuals
Zheng Dai
David K Gifford
41
0
0
12 Jun 2024
DiffPop: Plausibility-Guided Object Placement Diffusion for Image Composition
Jiacheng Liu
Hang Zhou
Shida Wei
Rui Ma
69
3
0
12 Jun 2024
Hierarchical Patch Diffusion Models for High-Resolution Video Generation
Ivan Skorokhodov
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
VGen
53
10
0
12 Jun 2024
CUPID: Contextual Understanding of Prompt-conditioned Image Distributions
Yayan Zhao
Mingwei Li
Matthew Berger
DiffM
41
2
0
11 Jun 2024
AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation
Kai Wang
Shijian Deng
Jing Shi
Dimitrios Hatzinakos
Yapeng Tian
VGen
80
10
0
11 Jun 2024
Treeffuser: Probabilistic Predictions via Conditional Diffusions with Gradient-Boosted Trees
Nicolas Beltran-Velez
Alessandro Antonio Grande
Achille Nazaret
A. Kucukelbir
David M. Blei
58
3
0
11 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM
LRM
36
17
0
11 Jun 2024
Understanding Visual Concepts Across Models
Brandon Trabucco
Max Gurinas
Kyle Doherty
Ruslan Salakhutdinov
VLM
53
0
0
11 Jun 2024
Image Textualization: An Automatic Framework for Creating Accurate and Detailed Image Descriptions
Renjie Pi
Jianshu Zhang
Jipeng Zhang
Rui Pan
Zhekai Chen
Tong Zhang
3DV
52
20
0
11 Jun 2024
Image Neural Field Diffusion Models
Yinbo Chen
Oliver Wang
Richard Zhang
Eli Shechtman
Xiaolong Wang
Michael Gharbi
DiffM
65
3
0
11 Jun 2024
Haptic Repurposing with GenAI
Haoyu Wang
57
0
0
11 Jun 2024
Open-World Human-Object Interaction Detection via Multi-modal Prompts
Jie Yang
Bingliang Li
Ailing Zeng
L. Zhang
Ruimao Zhang
VLM
55
8
0
11 Jun 2024
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Peize Sun
Yi Jiang
Shoufa Chen
Shilong Zhang
Bingyue Peng
Ping Luo
Zehuan Yuan
VLM
73
247
0
10 Jun 2024
NaRCan: Natural Refined Canonical Image with Integration of Diffusion Prior for Video Editing
Ting-Hsuan Chen
Jiewen Chan
Hau-Shiang Shiu
Shih-Han Yen
Chang-Han Yeh
Yu-Lun Liu
VGen
DiffM
59
3
0
10 Jun 2024
AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction
Zhen Xing
Qi Dai
Zejia Weng
Zuxuan Wu
Yu-Gang Jiang
VGen
71
14
0
10 Jun 2024
Margin-aware Preference Optimization for Aligning Diffusion Models without Reference
Jiwoo Hong
Sayak Paul
Noah Lee
Kashif Rasul
James Thorne
Jongheon Jeong
51
14
0
10 Jun 2024
Diffusion-RPO: Aligning Diffusion Models through Relative Preference Optimization
Yi Gu
Zhendong Wang
Yueqin Yin
Yujia Xie
Mingyuan Zhou
43
15
0
10 Jun 2024
MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing
Yu-Fen Huang
Nikki Moran
Simon Coleman
Jon Kelly
Shun-Hwa Wei
...
Chih-Hsuan Li
Da-Yu Huang
Hsuan-Kai Kao
Ting-Wei Lin
Li Su
46
1
0
10 Jun 2024
Tuning-Free Visual Customization via View Iterative Self-Attention Control
Xiaojie Li
Chenghao Gu
Shuzhao Xie
Yunpeng Bai
Weixiang Zhang
Zhi Wang
58
0
0
10 Jun 2024
ProcessPainter: Learn Painting Process from Sequence Data
Yiren Song
Shijie Huang
Chen Yao
Xiaojun Ye
Hai Ci
Jiaming Liu
Yuxuan Zhang
Mike Zheng Shou
DiffM
47
7
0
10 Jun 2024
Synthesizing Efficient Data with Diffusion Models for Person Re-Identification Pre-Training
Ke Niu
Haiyang Yu
X. Qian
Teng Fu
Bin Li
Xiangyang Xue
58
2
0
10 Jun 2024
FRAG: Frequency Adapting Group for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Geonwoo Kim
Chang D. Yoo
DiffM
51
5
0
10 Jun 2024
OmniControlNet: Dual-stage Integration for Conditional Image Generation
Yilin Wang
Haiyang Xu
Xiang Zhang
Zeyuan Chen
Zhizhou Sha
Zirui Wang
Zhuowen Tu
VLM
42
15
0
09 Jun 2024
SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention
Muhammad Nawfal Meeran
Gokul Adethya T
Bhanu Pratyush Mantha
45
3
0
09 Jun 2024
A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions
Daizong Liu
Yang Liu
Wencan Huang
Wei Hu
LM&Ro
56
9
0
09 Jun 2024
PaRa: Personalizing Text-to-Image Diffusion via Parameter Rank Reduction
Shangyu Chen
Zizheng Pan
Jianfei Cai
Dinh Q. Phung
53
1
0
09 Jun 2024
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Shiji Song
Yuan Yao
Gao Huang
53
16
0
08 Jun 2024
DiffusionPID: Interpreting Diffusion via Partial Information Decomposition
Shaurya Dewan
Rushikesh Zawar
Prakanshul Saxena
Yingshan Chang
Andrew F. Luo
Yonatan Bisk
DiffM
66
4
0
07 Jun 2024
AttnDreamBooth: Towards Text-Aligned Personalized Text-to-Image Generation
Lianyu Pang
Jian Yin
Baoquan Zhao
Feize Wu
Fu Lee Wang
Qing Li
Xudong Mao
DiffM
62
1
0
07 Jun 2024
Faster Than Lies: Real-time Deepfake Detection using Binary Neural Networks
Lanzino Romeo
Fontana Federico
Diko Anxhelo
Marini Marco Raoul
Cinque Luigi
54
18
0
07 Jun 2024
Seeing the Unseen: Visual Metaphor Captioning for Videos
Abisek Rajakumar Kalarani
Pushpak Bhattacharyya
Sumit Shekhar
VLM
42
1
0
07 Jun 2024
Variational Flow Matching for Graph Generation
Floor Eijkelboom
Grigory Bartosh
C. A. Naesseth
Max Welling
Jan-Willem van de Meent
54
11
0
07 Jun 2024
STAR: Skeleton-aware Text-based 4D Avatar Generation with In-Network Motion Retargeting
Zenghao Chai
Chen Tang
Yongkang Wong
Mohan Kankanhalli
DiffM
63
9
0
07 Jun 2024
CTSyn: A Foundational Model for Cross Tabular Data Generation
Xiaofeng Lin
Chenheng Xu
Matthew Yang
Guang Cheng
48
3
0
07 Jun 2024
Towards Semantic Equivalence of Tokenization in Multimodal LLM
Shengqiong Wu
Hao Fei
Xiangtai Li
Jiayi Ji
Hanwang Zhang
Tat-Seng Chua
Shuicheng Yan
MLLM
67
33
0
07 Jun 2024
Improving Geo-diversity of Generated Images with Contextualized Vendi Score Guidance
Reyhane Askari Hemmat
Melissa Hall
Alicia Sun
Candace Ross
M. Drozdzal
Adriana Romero Soriano
63
5
0
06 Jun 2024
M&M VTO: Multi-Garment Virtual Try-On and Editing
Luyang Zhu
Yingwei Li
Nan Liu
Hao Peng
Dawei Yang
Ira Kemelmacher-Shlizerman
DiffM
60
7
0
06 Jun 2024
GenAI Arena: An Open Evaluation Platform for Generative Models
Dongfu Jiang
Max Ku
Tianle Li
Yuansheng Ni
Shizhuo Sun
Rongqi Fan
Wenhu Chen
EGVM
46
20
0
06 Jun 2024
Coherent Zero-Shot Visual Instruction Generation
Quynh Phung
Songwei Ge
Jia-Bin Huang
57
2
0
06 Jun 2024
BitsFusion: 1.99 bits Weight Quantization of Diffusion Model
Yang Sui
Yanyu Li
Anil Kag
Yerlan Idelbayev
Junli Cao
Ju Hu
Dhritiman Sagar
Bo Yuan
Sergey Tulyakov
Jian Ren
MQ
52
19
0
06 Jun 2024
Zero-Painter: Training-Free Layout Control for Text-to-Image Synthesis
Marianna Ohanyan
Hayk Manukyan
Zhangyang Wang
Shant Navasardyan
Humphrey Shi
DiffM
75
1
0
06 Jun 2024
Evaluating Durability: Benchmark Insights into Multimodal Watermarking
Jielin Qiu
William Jongwon Han
Xuandong Zhao
Shangbang Long
Christos Faloutsos
Lei Li
80
1
0
06 Jun 2024