Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Github (29177★)
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 1,722 papers shown
Title
Nearly Zero-Cost Protection Against Mimicry by Personalized Diffusion Models
Namhyuk Ahn
Kiyoon Yoo
Wonhyuk Ahn
Daesik Kim
Seung-Hun Nam
AAML
WIGM
DiffM
160
0
0
16 Dec 2024
ColorFlow: Retrieval-Augmented Image Sequence Colorization
Junhao Zhuang
Xuan Ju
Zhe Zhang
Yong-Jin Liu
Shiyi Zhang
Chun Yuan
Ying Shan
DiffM
161
1
0
16 Dec 2024
InterDyn: Controllable Interactive Dynamics with Video Diffusion Models
Rick Akkerman
Haiwen Feng
M. Black
Dimitrios Tzionas
Victoria Fernandez-Abrevaya
VGen
AI4CE
180
3
0
16 Dec 2024
Can video generation replace cinematographers? Research on the cinematic language of generated video
Xuelong Li
Kai WU
Siyi Yang
YiZhan Qu
Guohua. Zhang
...
Mingliang Xiong
Hao Deng
Qingwen Liu
Gang Li
Bin He
VGen
DiffM
157
1
0
16 Dec 2024
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
175
3
0
16 Dec 2024
Transferable Adversarial Face Attack with Text Controlled Attribute
Wenyun Li
Zheng Zhang
X. Lan
D. Jiang
AAML
144
2
0
16 Dec 2024
CLIP-SR: Collaborative Linguistic and Image Processing for Super-Resolution
Bingwen Hu
Heng Liu
Zhedong Zheng
Ping Liu
SupR
223
0
0
16 Dec 2024
Wearable Accelerometer Foundation Models for Health via Knowledge Distillation
Salar Abbaspourazad
Anshuman Mishra
Joseph D. Futoma
Andrew C. Miller
Ian Shapiro
147
0
0
15 Dec 2024
Empowering LLMs to Understand and Generate Complex Vector Graphics
Ximing Xing
Juncheng Hu
Guotao Liang
Jing Zhang
Dong Xu
Qian Yu
158
12
0
15 Dec 2024
Adapter-Enhanced Semantic Prompting for Continual Learning
Baocai Yin
Ji Zhao
Huajie Jiang
Ningning Hou
Yongli Hu
Amin Beheshti
Ming-Hsuan Yang
Yuankai Qi
CLL
VLM
179
0
0
15 Dec 2024
Segment-Level Diffusion: A Framework for Controllable Long-Form Generation with Diffusion Language Models
Xiaochen Zhu
Georgi Karadzhov
Chenxi Whitehouse
Andreas Vlachos
DiffM
153
1
0
15 Dec 2024
Video Diffusion Transformers are In-Context Learners
Zhengcong Fei
Di Qiu
Changqian Yu
Debang Li
Mingyuan Fan
VGen
DiffM
380
3
0
14 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hong Chen
Zihan Wang
Xianrui Li
Xingwu Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
241
10
0
14 Dec 2024
SVGBuilder: Component-Based Colored SVG Generation with Text-Guided Autoregressive Transformers
Zehao Chen
Rong Pan
136
2
0
13 Dec 2024
Feature Selection for Latent Factor Models
Rittwika Kansabanik
Adrian Barbu
192
0
0
13 Dec 2024
Omni-ID: Holistic Identity Representation Designed for Generative Tasks
Guocheng Qian
Kuan-Chieh Wang
Or Patashnik
Negin Heravi
Daniil Ostashev
Sergey Tulyakov
Daniel Cohen-Or
Kfir Aberman
129
5
0
12 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
162
2
0
12 Dec 2024
Mojito: Motion Trajectory and Intensity Control for Video Generation
Xuehai He
Shuohang Wang
Jianwei Yang
Xiaoxia Wu
Yansen Wang
Kuan-Chieh Wang
Z. Zhan
Olatunji Ruwase
Yelong Shen
Xinze Wang
VGen
216
2
0
12 Dec 2024
SVGFusion: Scalable Text-to-SVG Generation via Vector Space Diffusion
Ximing Xing
Juncheng Hu
Jing Zhang
Dong Xu
Qian Yu
189
4
0
11 Dec 2024
jina-clip-v2: Multilingual Multimodal Embeddings for Text and Images
Andreas Koukounas
Georgios Mastrapas
Bo Wang
Mohammad Kalim Akram
Sedigheh Eslami
Michael Gunther
Isabelle Mohr
Saba Sturua
Scott Martens
Nan Wang
VLM
322
10
0
11 Dec 2024
VariFace: Fair and Diverse Synthetic Dataset Generation for Face Recognition
Michael Yeung
Toya Teramoto
Songtao Wu
Tatsuo Fujiwara
Kenji Suzuki
Tamaki Kojima
166
2
0
09 Dec 2024
AnyBimanual: Transferring Unimanual Policy for General Bimanual Manipulation
Guanxing Lu
Tengbo Yu
Haoyuan Deng
Season Si Chen
Yansong Tang
Ziwei Wang
140
3
0
09 Dec 2024
Black Swan: Abductive and Defeasible Video Reasoning in Unpredictable Events
Aditya Chinchure
Sahithya Ravi
R. Ng
Vered Shwartz
Boyang Albert Li
Leonid Sigal
ReLM
LRM
VLM
160
3
0
07 Dec 2024
PaintScene4D: Consistent 4D Scene Generation from Text Prompts
Vinayak Gupta
Yunze Man
Yu-Xiong Wang
VGen
140
0
0
05 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
137
0
0
05 Dec 2024
HumanEdit: A High-Quality Human-Rewarded Dataset for Instruction-based Image Editing
Jinbin Bai
Wei Chow
L. Yang
Hefei Ling
Juncheng Billy Li
Hao Zhang
Shuicheng Yan
169
10
0
05 Dec 2024
EgoPlan-Bench2: A Benchmark for Multimodal Large Language Model Planning in Real-World Scenarios
Lu Qiu
Yuying Ge
Yi Chen
Yixiao Ge
Ying Shan
Xihui Liu
LLMAG
LRM
183
8
0
05 Dec 2024
AdvDreamer Unveils: Are Vision-Language Models Truly Ready for Real-World 3D Variations?
Shouwei Ruan
Hanqin Liu
Yao Huang
Xiaoqi Wang
Caixin Kang
Hang Su
Yinpeng Dong
Xingxing Wei
VGen
186
0
0
04 Dec 2024
DynamicControl: Adaptive Condition Selection for Improved Text-to-Image Generation
Qu He
Jinlong Peng
P. Xu
Boyuan Jiang
Xiaobin Hu
...
Yang Liu
Yun Wang
Chengjie Wang
Xuelong Li
Jing Zhang
DiffM
187
1
0
04 Dec 2024
SGSST: Scaling Gaussian Splatting StyleTransfer
B. Galerne
Jianling Wang
Lara Raad
Jean-Michel Morel
3DGS
156
1
0
04 Dec 2024
Expanding Event Modality Applications through a Robust CLIP-Based Encoder
SungHeon Jeong
Hanning Chen
Sanggeon Yun
Suhyeon Cho
Wenjun Huang
Xiangjian Liu
Mohsen Imani
163
2
0
04 Dec 2024
Progress-Aware Video Frame Captioning
Zihui Xue
Joungbin An
Xitong Yang
Kristen Grauman
213
1
0
03 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
311
3
0
02 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
166
9
0
02 Dec 2024
EmojiDiff: Advanced Facial Expression Control with High Identity Preservation in Portrait Generation
Liangwei Jiang
Ruida Li
Zhifeng Zhang
Shuo Fang
Chenguang Ma
DiffM
151
1
0
02 Dec 2024
Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov
Denis Kuznedelev
Mikhail Khoroshikh
Valentin Khrulkov
Dmitry Baranchuk
219
4
0
02 Dec 2024
DiffPatch: Generating Customizable Adversarial Patches using Diffusion Models
Zhixiang Wang
Guangnan Ye
Xinyu Wang
Siheng Chen
Ziyi Wang
Xingjun Ma
Yu-Gang Jiang
AAML
DiffM
164
0
0
02 Dec 2024
Second FRCSyn-onGoing: Winning Solutions and Post-Challenge Analysis to Improve Face Recognition with Synthetic Data
Ivan Deandres-Tame
Ruben Tolosana
Pietro Melzi
R. Vera-Rodríguez
Minchul Kim
...
Bernardo Biesseck
Pedro Vidal
Luiz Coelho
Roger Granada
David Menotti
148
2
0
02 Dec 2024
See What You Seek: Semantic Contextual Integration for Cloth-Changing Person Re-Identification
Xiyu Han
Xian Zhong
Wenxin Huang
Xuemei Jia
Xiaohan Yu
Alex Chichung Kot
163
0
0
02 Dec 2024
SerialGen: Personalized Image Generation by First Standardization Then Personalization
Cong Xie
Han Zou
Ruiqi Yu
Yan Zhang
Zhenpeng Zhan
135
1
0
02 Dec 2024
Referring Video Object Segmentation via Language-aligned Track Selection
Seongchan Kim
Woojeong Jin
Sangbeom Lim
Heeji Yoon
Hyunwook Choi
Seungryong Kim
VOS
158
0
0
02 Dec 2024
SeqAfford: Sequential 3D Affordance Reasoning via Multimodal Large Language Model
Chunlin Yu
Hanqing Wang
Ye Shi
Haoyang Luo
Sibei Yang
Jingyi Yu
Jingya Wang
LRM
LM&Ro
170
3
0
02 Dec 2024
VideoSAVi: Self-Aligned Video Language Models without Human Supervision
Yogesh Kulkarni
Pooyan Fazli
VLM
207
2
0
01 Dec 2024
EDTformer: An Efficient Decoder Transformer for Visual Place Recognition
Tong Jin
Feng Lu
Shuyu Hu
Chun Yuan
Yunpeng Liu
ViT
130
0
0
01 Dec 2024
Adaptive Rank, Reduced Forgetting: Knowledge Retention in Continual Learning Vision-Language Models with Dynamic Rank-Selective LoRA
Haodong Lu
Chongyang Zhao
Jason Xue
Lina Yao
Kristen Moore
Dong Gong
CLL
KELM
VLM
183
7
0
01 Dec 2024
Spline-FRIDA: Towards Diverse, Humanlike Robot Painting Styles with a Sample-Efficient, Differentiable Brush Stroke Model
Lawrence Chen
Peter Schaldenbrand
Tanmay Shankar
Lia Coleman
Jean Oh
99
0
0
30 Nov 2024
DisCoRD: Discrete Tokens to Continuous Motion via Rectified Flow Decoding
Jungbin Cho
Junwan Kim
Jisoo Kim
Minseo Kim
Mingu Kang
S. Hong
Tae-Hyun Oh
Youngjae Yu
VGen
154
1
0
29 Nov 2024
T-3DGS: Removing Transient Objects for 3D Scene Reconstruction
Vadim Pryadilshchikov
Alexander Markin
Artem Komarichev
Ruslan Rakhimov
Peter Wonka
Evgeny Burnaev
3DGS
185
4
0
29 Nov 2024
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding
Weinan Zhang
Lu Zhang
Ping Hu
Liqian Ma
Yunzhi Zhuge
Huchuan Lu
3DGS
116
2
0
29 Nov 2024
ROSE: Revolutionizing Open-Set Dense Segmentation with Patch-Wise Perceptual Large Multimodal Model
Kunyang Han
Yibo Hu
Mengxue Qu
Hailin Shi
Yao Zhao
Y. X. Wei
MLLM
VLM
3DV
231
1
0
29 Nov 2024
Previous
1
2
3
...
16
17
18
...
33
34
35
Next