Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,282 papers shown
Title
A CLIP-Enhanced Method for Video-Language Understanding
Guohao Li
Feng He
Zhifan Feng
CLIP
31
12
0
14 Oct 2021
Subspace Regularizers for Few-Shot Class Incremental Learning
Afra Feyza Akyürek
Ekin Akyürek
Derry Wijaya
Jacob Andreas
CLL
21
59
0
13 Oct 2021
Decoupled Contrastive Learning
Chun-Hsiao Yeh
Cheng-Yao Hong
Yen-Chi Hsu
Tyng-Luh Liu
Yubei Chen
Yann LeCun
183
183
0
13 Oct 2021
CLIP4Caption: CLIP for Video Caption
Mingkang Tang
Zhanyu Wang
Zhenhua Liu
Fengyun Rao
Dian Li
Xiu Li
CLIP
VLM
35
150
0
13 Oct 2021
Detecting Corrupted Labels Without Training a Model to Predict
Zhaowei Zhu
Zihao Dong
Yang Liu
NoLa
149
62
0
12 Oct 2021
Multi-modal Self-supervised Pre-training for Regulatory Genome Across Cell Types
Shentong Mo
Xiao Fu
Chenyang Hong
Yizhen Chen
Yuxuan Zheng
Xiangru Tang
Zhiqiang Shen
Eric Xing
Yanyan Lan
AI4CE
31
19
0
11 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
Yangguang Li
Feng Liang
Lichen Zhao
Yufeng Cui
Wanli Ouyang
Jing Shao
F. Yu
Junjie Yan
VLM
CLIP
50
448
0
11 Oct 2021
Rethinking Person Re-Identification via Semantic-Based Pretraining
Suncheng Xiang
Jingsheng Gao
Zi-Yu Zhang
Mengyuan Guan
Binghai Yan
Ting Liu
Xiaobo Li
Yuzhuo Fu
VLM
32
11
0
11 Oct 2021
Embed Everything: A Method for Efficiently Co-Embedding Multi-Modal Spaces
Sarah Di
Robin Yu
Amol Kapoor
16
0
0
09 Oct 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
104
992
0
09 Oct 2021
Temperature as Uncertainty in Contrastive Learning
Oliver Zhang
Mike Wu
Jasmine Bayrooti
Noah D. Goodman
UQCV
22
30
0
08 Oct 2021
Inferring Offensiveness In Images From Natural Language Supervision
P. Schramowski
Kristian Kersting
32
2
0
08 Oct 2021
Adversarial Retriever-Ranker for dense text retrieval
Hang Zhang
Yeyun Gong
Yelong Shen
Jiancheng Lv
Nan Duan
Weizhu Chen
VLM
RALM
48
115
0
07 Oct 2021
Human in the Loop for Machine Creativity
N. C. Chung
39
15
0
07 Oct 2021
Cut the CARP: Fishing for zero-shot story evaluation
Shahbuland Matiana
J. Smith
Ryan Teehan
Louis Castricato
Stella Biderman
Leo Gao
Spencer Frazier
49
16
0
06 Oct 2021
DiffusionCLIP: Text-Guided Diffusion Models for Robust Image Manipulation
Gwanghyun Kim
Taesung Kwon
Jong Chul Ye
DiffM
72
625
0
06 Oct 2021
Objects in Semantic Topology
Shuo Yang
Pei Sun
Yi-Xin Jiang
Xiaobo Xia
Ruiheng Zhang
Zehuan Yuan
Changhu Wang
Ping Luo
Min Xu
ObjD
89
29
0
06 Oct 2021
CLIP-Forge: Towards Zero-Shot Text-to-Shape Generation
Aditya Sanghi
Hang Chu
Joseph G. Lambourne
Ye Wang
Chin-Yi Cheng
Marco Fumero
Kamal Rahimi Malekshan
CLIP
60
289
0
06 Oct 2021
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
60
114
0
05 Oct 2021
CrossCLR: Cross-modal Contrastive Learning For Multi-modal Video Representations
Mohammadreza Zolfaghari
Yi Zhu
Peter V. Gehler
Thomas Brox
137
127
0
30 Sep 2021
VideoCLIP: Contrastive Pre-training for Zero-shot Video-Text Understanding
Hu Xu
Gargi Ghosh
Po-Yao (Bernie) Huang
Dmytro Okhonko
Armen Aghajanyan
Florian Metze
Luke Zettlemoyer
Florian Metze Luke Zettlemoyer Christoph Feichtenhofer
CLIP
VLM
259
561
0
28 Sep 2021
ClipMatrix: Text-controlled Creation of 3D Textured Meshes
Nikolay Jetchev
CLIP
11
40
0
27 Sep 2021
An animated picture says at least a thousand words: Selecting Gif-based Replies in Multimodal Dialog
Xingyao Wang
David Jurgens
24
5
0
24 Sep 2021
CLIPort: What and Where Pathways for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
Dieter Fox
LM&Ro
65
633
0
24 Sep 2021
CPT: Colorful Prompt Tuning for Pre-trained Vision-Language Models
Yuan Yao
Ao Zhang
Zhengyan Zhang
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
MLLM
VPVLM
VLM
211
221
0
24 Sep 2021
Dense Contrastive Visual-Linguistic Pretraining
Lei Shi
Kai Shuang
Shijie Geng
Peng Gao
Zuohui Fu
Gerard de Melo
Yunpeng Chen
Sen Su
VLM
SSL
57
10
0
24 Sep 2021
Recent Advances of Continual Learning in Computer Vision: An Overview
Haoxuan Qu
Hossein Rahmani
Li Xu
Bryan M. Williams
Jun Liu
VLM
CLL
30
73
0
23 Sep 2021
Does Vision-and-Language Pretraining Improve Lexical Grounding?
Tian Yun
Chen Sun
Ellie Pavlick
VLM
CoGe
40
32
0
21 Sep 2021
Chemical-Reaction-Aware Molecule Representation Learning
Hongwei Wang
Weijian Li
Xiaomeng Jin
Kyunghyun Cho
Heng Ji
Jiawei Han
Martin D. Burke
107
57
0
21 Sep 2021
Modern Evolution Strategies for Creativity: Fitting Concrete Images and Abstract Concepts
Yingtao Tian
David R Ha
65
43
0
18 Sep 2021
ActionCLIP: A New Paradigm for Video Action Recognition
Mengmeng Wang
Jiazheng Xing
Yong Liu
VLM
152
362
0
17 Sep 2021
xGQA: Cross-Lingual Visual Question Answering
Jonas Pfeiffer
Gregor Geigle
Aishwarya Kamath
Jan-Martin O. Steitz
Stefan Roth
Ivan Vulić
Iryna Gurevych
42
56
0
13 Sep 2021
Are Gender-Neutral Queries Really Gender-Neutral? Mitigating Gender Bias in Image Search
Jialu Wang
Yang Liu
Junfeng Fang
FaML
157
95
0
12 Sep 2021
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
180
403
0
10 Sep 2021
EfficientCLIP: Efficient Cross-Modal Pre-training by Ensemble Confident Learning and Language Modeling
Jue Wang
Haofan Wang
Jincan Deng
Weijia Wu
Debing Zhang
VLM
CLIP
67
18
0
10 Sep 2021
EVOQUER: Enhancing Temporal Grounding with Video-Pivoted BackQuery Generation
Yanjun Gao
Lulu Liu
Jason Wang
Xin Chen
Huayan Wang
Rui Zhang
31
1
0
10 Sep 2021
M5Product: Self-harmonized Contrastive Learning for E-commercial Multi-modal Pretraining
Xiao Dong
Xunlin Zhan
Yangxin Wu
Yunchao Wei
Michael C. Kampffmeyer
Xiaoyong Wei
Minlong Lu
Yaowei Wang
Xiaodan Liang
35
37
0
09 Sep 2021
Learning cortical representations through perturbed and adversarial dreaming
Nicolas Deperrois
Mihai A. Petrovici
Walter Senn
Jakob Jordan
GAN
CLL
60
21
0
09 Sep 2021
Retrieve, Caption, Generate: Visual Grounding for Enhancing Commonsense in Text Generation Models
Steven Y. Feng
Kevin Lu
Zhuofu Tao
Malihe Alikhani
Teruko Mitamura
Eduard H. Hovy
Varun Gangal
LRM
40
13
0
08 Sep 2021
Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong
Jing Shi
Jianwei Yang
Chenliang Xu
Yin Li
SSL
42
77
0
06 Sep 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
71
697
0
04 Sep 2021
Learning to Prompt for Vision-Language Models
Kaiyang Zhou
Jingkang Yang
Chen Change Loy
Ziwei Liu
VPVLM
CLIP
VLM
350
2,286
0
02 Sep 2021
Aligning Cross-lingual Sentence Representations with Dual Momentum Contrast
Liang Wang
Wei Zhao
Jingming Liu
40
14
0
01 Sep 2021
Fine-Grained Chemical Entity Typing with Multimodal Knowledge Representation
Chenkai Sun
Weijian Li
Jinfeng Xiao
Nikolaus Nova Parulian
ChengXiang Zhai
Heng Ji
44
4
0
29 Aug 2021
LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
Zhijian Liu
Simon Stent
Jie Li
John Gideon
Song Han
VLM
25
10
0
26 Aug 2021
EncoderMI: Membership Inference against Pre-trained Encoders in Contrastive Learning
Hongbin Liu
Jinyuan Jia
Wenjie Qu
Neil Zhenqiang Gong
6
94
0
25 Aug 2021
SimVLM: Simple Visual Language Model Pretraining with Weak Supervision
Zirui Wang
Jiahui Yu
Adams Wei Yu
Zihang Dai
Yulia Tsvetkov
Yuan Cao
VLM
MLLM
51
782
0
24 Aug 2021
Supervised Compression for Resource-Constrained Edge Computing Systems
Yoshitomo Matsubara
Ruihan Yang
Marco Levorato
Stephan Mandt
19
58
0
21 Aug 2021
MM-ViT: Multi-Modal Video Transformer for Compressed Video Action Recognition
Jiawei Chen
C. Ho
ViT
26
77
0
20 Aug 2021
ImageBART: Bidirectional Context with Multinomial Diffusion for Autoregressive Image Synthesis
Patrick Esser
Robin Rombach
A. Blattmann
Bjorn Ommer
DiffM
38
158
0
19 Aug 2021
Previous
1
2
3
...
202
203
204
205
206
Next