2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,932 papers shown
Title
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
3DGS
48
40
0
20 Jan 2022
CM3: A Causal Masked Multimodal Model of the Internet
Armen Aghajanyan
Po-Yao (Bernie) Huang
Candace Ross
Vladimir Karpukhin
Hu Xu
...
Dmytro Okhonko
Mandar Joshi
Gargi Ghosh
M. Lewis
Luke Zettlemoyer
30
155
0
19 Jan 2022
TriCoLo: Trimodal Contrastive Loss for Text to Shape Retrieval
Yue Ruan
Han-Hung Lee
Yiming Zhang
Ke Zhang
Angel X. Chang
37
22
0
19 Jan 2022
ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues
Hengcan Shi
Munawar Hayat
Yicheng Wu
Jianfei Cai
VLM
35
61
0
18 Jan 2022
Unpaired Referring Expression Grounding via Bidirectional Cross-Modal Matching
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
25
10
0
18 Jan 2022
The CLEAR Benchmark: Continual LEArning on Real-World Imagery
Zhiqiu Lin
Jia Shi
Deepak Pathak
Deva Ramanan
CLL
VLM
151
92
0
17 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
34
103
0
16 Jan 2022
StolenEncoder: Stealing Pre-trained Encoders in Self-supervised Learning
Yupei Liu
Jinyuan Jia
Hongbin Liu
Neil Zhenqiang Gong
MIACV
33
24
0
15 Jan 2022
Transferability in Deep Learning: A Survey
Junguang Jiang
Yang Shu
Jianmin Wang
Mingsheng Long
OOD
52
102
0
15 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Jianwei Yang
Xiyang Dai
Bin Xiao
Haoxuan You
Shih-Fu Chang
Lu Yuan
CLIP
VLM
27
39
0
15 Jan 2022
When less is more: Simplifying inputs aids neural network understanding
R. Schirrmeister
Rosanne Liu
Sara Hooker
T. Ball
58
5
0
14 Jan 2022
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
Nenad Tomašev
Ioana Bica
Brian McWilliams
Lars Buesing
Razvan Pascanu
Charles Blundell
Jovana Mitrović
SSL
103
81
0
13 Jan 2022
CLIP-Event: Connecting Text and Images with Event Structures
Manling Li
Ruochen Xu
Shuohang Wang
Luowei Zhou
Xudong Lin
Chenguang Zhu
Michael Zeng
Heng Ji
Shih-Fu Chang
VLM
CLIP
32
124
0
13 Jan 2022
Bridging Video-text Retrieval with Multiple Choice Questions
Yuying Ge
Yixiao Ge
Xihui Liu
Dian Li
Ying Shan
Xiaohu Qie
Ping Luo
BDL
46
108
0
13 Jan 2022
Learning to Denoise Raw Mobile UI Layouts for Improving Datasets at Scale
Gang Li
Gilles Baechler
Manuel Tragut
Yang Li
29
49
0
11 Jan 2022
Music2Video: Automatic Generation of Music Video with fusion of audio and text
Yoonjeon Kim
Joel Jang
Sumin Shin
DiffM
VGen
46
7
0
11 Jan 2022
Language-driven Semantic Segmentation
Boyi Li
Kilian Q. Weinberger
Serge Belongie
V. Koltun
René Ranftl
VLM
80
606
0
10 Jan 2022
GUDN: A novel guide network with label reinforcement strategy for extreme multi-label text classification
Qing Wang
Jia Zhu
Hongji Shu
K. Asamoah
J. Shi
Cong Zhou
41
5
0
10 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
55
208
0
07 Jan 2022
Self-Supervised Approach to Addressing Zero-Shot Learning Problem
Ademola Okerinde
Sam Hoggatt
Divya Vani Lakkireddy
Nolan Brubaker
William H. Hsu
L. Shamir
Brian Spiesman
SSL
VLM
22
2
0
05 Jan 2022
Discrete and continuous representations and processing in deep learning: Looking forward
Ruben Cartuyvels
Graham Spinks
Marie-Francine Moens
OCL
59
20
0
04 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
43
6
0
04 Jan 2022
Optimal Representations for Covariate Shift
Yangjun Ruan
Yann Dubois
Chris J. Maddison
OOD
50
68
0
31 Dec 2021
Does CLIP Benefit Visual Question Answering in the Medical Domain as Much as it Does in the General Domain?
Sedigheh Eslami
Gerard de Melo
Christoph Meinel
CLIP
MedIm
24
116
0
27 Dec 2021
A Fistful of Words: Learning Transferable Visual Models from Bag-of-Words Supervision
Ajinkya Tejankar
Maziar Sanjabi
Bichen Wu
Saining Xie
Madian Khabsa
Hamed Pirsiavash
Hamed Firooz
VLM
42
17
0
27 Dec 2021
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
36
49
0
27 Dec 2021
Domain-Aware Continual Zero-Shot Learning
Kai Yi
Paul Janson
Wenxuan Zhang
Mohamed Elhoseiny
59
4
0
24 Dec 2021
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
35
85
0
23 Dec 2021
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
63
483
0
23 Dec 2021
Scaling Open-Vocabulary Image Segmentation with Image-Level Labels
Golnaz Ghiasi
Xiuye Gu
Huayu Chen
Nayeon Lee
VLM
74
373
0
22 Dec 2021
Looking Beyond Corners: Contrastive Learning of Visual Representations for Keypoint Detection and Description Extraction
Henrique Siqueira
Patrick Ruhkamp
Ibrahim Halfaoui
Markus Karmann
O. Urfalioglu
SSL
25
1
0
22 Dec 2021
JoJoGAN: One Shot Face Stylization
Min Jin Chong
David A. Forsyth
CVBM
GAN
62
70
0
22 Dec 2021
Contrastive Object Detection Using Knowledge Graph Embeddings
Christopher Lang
Alexander Braun
Abhinav Valada
28
8
0
21 Dec 2021
Extending CLIP for Category-to-image Retrieval in E-commerce
Mariya Hendriksen
Maurits J. R. Bleeker
Svitlana Vakulenko
Nanne van Noord
E. Kuiper
Maarten de Rijke
VLM
16
30
0
21 Dec 2021
Provable Hierarchical Lifelong Learning with a Sketch-based Modular Architecture
Zihao Deng
Zee Fryer
Brendan Juba
Rina Panigrahy
Xin Wang
39
2
0
21 Dec 2021
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
157
14,962
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
112
3,514
0
20 Dec 2021
Mind-proofing Your Phone: Navigating the Digital Minefield with GreaseTerminator
Siddhartha Datta
Konrad Kollnig
N. Shadbolt
48
10
0
20 Dec 2021
Learning with Label Noise for Image Retrieval by Selecting Interactions
Sarah Ibrahimi
Arnaud Sors
Rafael Sampaio de Rezende
Stéphane Clinchant
NoLa
VLM
32
16
0
20 Dec 2021
Image Segmentation Using Text and Image Prompts
Timo Lüddecke
Alexander S. Ecker
CLIP
VLM
36
457
0
18 Dec 2021
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
35
16
0
17 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
Guosheng Lin
46
191
0
17 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
24
33
0
17 Dec 2021
Ensembling Off-the-shelf Models for GAN Training
Nupur Kumari
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
57
86
0
16 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
45
565
0
16 Dec 2021
CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking
George Zerveas
Navid Rekabsaz
Daniel Cohen
Carsten Eickhoff
44
8
0
16 Dec 2021
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler
J. Allen
M. Miller
Taylor Berg-Kirkpatrick
37
5
0
16 Dec 2021
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen
Zi-Quan Hong
Wenjin Hou
Guosen Xie
Yibing Song
Jian-jun Zhao
Xinge You
Shuicheng Yan
Ling Shao
ViT
32
44
0
16 Dec 2021
Decoupling Zero-Shot Semantic Segmentation
Jian Ding
Nan Xue
Guisong Xia
Dengxin Dai
VLM
61
190
0
15 Dec 2021
Out-of-Distribution Detection Without Class Labels
Niv Cohen
Ron Abutbul
Yedid Hoshen
OODD
32
11
0
14 Dec 2021