ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,282 papers shown
Title
Scaling Vision Transformers
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
85
1,064
0
08 Jun 2021
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained
  Models
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
Chenfeng Xu
Shijia Yang
Tomer Galanti
Bichen Wu
Xiangyu Yue
Bohan Zhai
Wei Zhan
Peter Vajda
Kurt Keutzer
Masayoshi Tomizuka
3DPC
39
53
0
08 Jun 2021
Differentiable Quality Diversity
Differentiable Quality Diversity
Matthew C. Fontaine
Stefanos Nikolaidis
51
89
0
07 Jun 2021
On the Expressive Power of Self-Attention Matrices
On the Expressive Power of Self-Attention Matrices
Valerii Likhosherstov
K. Choromanski
Adrian Weller
37
34
0
07 Jun 2021
MERLOT: Multimodal Neural Script Knowledge Models
MERLOT: Multimodal Neural Script Knowledge Models
Rowan Zellers
Ximing Lu
Jack Hessel
Youngjae Yu
J. S. Park
Jize Cao
Ali Farhadi
Yejin Choi
VLM
LRM
33
374
0
04 Jun 2021
A Little Robustness Goes a Long Way: Leveraging Robust Features for
  Targeted Transfer Attacks
A Little Robustness Goes a Long Way: Leveraging Robust Features for Targeted Transfer Attacks
Jacob Mitchell Springer
Melanie Mitchell
Garrett Kenyon
AAML
31
43
0
03 Jun 2021
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and
  Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images
Effect of Pre-Training Scale on Intra- and Inter-Domain Full and Few-Shot Transfer Learning for Natural and Medical X-Ray Chest Images
Mehdi Cherti
J. Jitsev
LM&MA
24
23
0
31 May 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with
  Transformers
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie
Wenhai Wang
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
ViT
50
4,855
0
31 May 2021
Connecting Language and Vision for Natural Language-Based Vehicle
  Retrieval
Connecting Language and Vision for Natural Language-Based Vehicle Retrieval
Shuai Bai
Zhedong Zheng
Xiaohan Wang
Junyang Lin
Zhu Zhang
Chang Zhou
Yi Yang
Hongxia Yang
24
27
0
31 May 2021
LIIR at SemEval-2021 task 6: Detection of Persuasion Techniques In Texts
  and Images using CLIP features
LIIR at SemEval-2021 task 6: Detection of Persuasion Techniques In Texts and Images using CLIP features
Erfan Ghadery
Damien Sileo
Marie-Francine Moens
VLM
24
2
0
31 May 2021
Contrastive Fine-tuning Improves Robustness for Neural Rankers
Contrastive Fine-tuning Improves Robustness for Neural Rankers
Xiaofei Ma
Cicero Nogueira dos Santos
Andrew O. Arnold
21
20
0
27 May 2021
CogView: Mastering Text-to-Image Generation via Transformers
CogView: Mastering Text-to-Image Generation via Transformers
Ming Ding
Zhuoyi Yang
Wenyi Hong
Wendi Zheng
Chang Zhou
...
Junyang Lin
Xu Zou
Zhou Shao
Hongxia Yang
Jie Tang
ViT
VLM
54
766
0
26 May 2021
True Few-Shot Learning with Language Models
True Few-Shot Learning with Language Models
Ethan Perez
Douwe Kiela
Kyunghyun Cho
21
428
0
24 May 2021
Improved OOD Generalization via Adversarial Training and Pre-training
Improved OOD Generalization via Adversarial Training and Pre-training
Mingyang Yi
Lu Hou
Jiacheng Sun
Lifeng Shang
Xin Jiang
Qun Liu
Zhi-Ming Ma
VLM
33
83
0
24 May 2021
Backdoor Attacks on Self-Supervised Learning
Backdoor Attacks on Self-Supervised Learning
Aniruddha Saha
Ajinkya Tejankar
Soroush Abbasi Koohpayegani
Hamed Pirsiavash
SSL
AAML
27
101
0
21 May 2021
A Review on Explainability in Multimodal Deep Neural Nets
A Review on Explainability in Multimodal Deep Neural Nets
Gargi Joshi
Rahee Walambe
K. Kotecha
29
140
0
17 May 2021
Vision Transformers are Robust Learners
Vision Transformers are Robust Learners
Sayak Paul
Pin-Yu Chen
ViT
28
309
0
17 May 2021
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers
  Solutions with Superior OOD Generalization
Evading the Simplicity Bias: Training a Diverse Set of Models Discovers Solutions with Superior OOD Generalization
Damien Teney
Ehsan Abbasnejad
Simon Lucey
Anton Van Den Hengel
51
87
0
12 May 2021
Diffusion Models Beat GANs on Image Synthesis
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
85
7,480
0
11 May 2021
Contrastive Attraction and Contrastive Repulsion for Representation
  Learning
Contrastive Attraction and Contrastive Repulsion for Representation Learning
Huangjie Zheng
Xu Chen
Jiangchao Yao
Hongxia Yang
Chunyuan Li
Ya Zhang
Hao Zhang
Ivor Tsang
Jingren Zhou
Mingyuan Zhou
SSL
42
12
0
08 May 2021
Open-vocabulary Object Detection via Vision and Language Knowledge
  Distillation
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
225
899
0
28 Apr 2021
If your data distribution shifts, use self-learning
If your data distribution shifts, use self-learning
E. Rusak
Steffen Schneider
George Pachitariu
L. Eck
Peter V. Gehler
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
OOD
TTA
81
30
0
27 Apr 2021
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
MDETR -- Modulated Detection for End-to-End Multi-Modal Understanding
Aishwarya Kamath
Mannat Singh
Yann LeCun
Gabriel Synnaeve
Ishan Misra
Nicolas Carion
ObjD
VLM
90
864
0
26 Apr 2021
PanGu-$α$: Large-scale Autoregressive Pretrained Chinese Language
  Models with Auto-parallel Computation
PanGu-ααα: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
...
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
ALM
MoE
AI4CE
35
212
0
26 Apr 2021
Playing Lottery Tickets with Vision and Language
Playing Lottery Tickets with Vision and Language
Zhe Gan
Yen-Chun Chen
Linjie Li
Tianlong Chen
Yu Cheng
Shuohang Wang
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
109
54
0
23 Apr 2021
Multiscale Vision Transformers
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,226
0
22 Apr 2021
Pri3D: Can 3D Priors Help 2D Representation Learning?
Pri3D: Can 3D Priors Help 2D Representation Learning?
Ji Hou
Saining Xie
Benjamin Graham
Angela Dai
Matthias Nießner
SSL
3DPC
MDE
85
80
0
22 Apr 2021
Understanding Chinese Video and Language via Contrastive Multimodal
  Pre-Training
Understanding Chinese Video and Language via Contrastive Multimodal Pre-Training
Chenyi Lei
Shixian Luo
Yong-jin Liu
Wanggui He
Jiamang Wang
Guoxin Wang
Haihong Tang
Chunyan Miao
Houqiang Li
30
41
0
19 Apr 2021
Data-Efficient Language-Supervised Zero-Shot Learning with
  Self-Distillation
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Rui Cheng
Bichen Wu
Peizhao Zhang
Peter Vajda
Joseph E. Gonzalez
CLIP
VLM
21
31
0
18 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip
  Retrieval
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
329
782
0
18 Apr 2021
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
CLIPScore: A Reference-free Evaluation Metric for Image Captioning
Jack Hessel
Ari Holtzman
Maxwell Forbes
Ronan Le Bras
Yejin Choi
CLIP
17
1,454
0
18 Apr 2021
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Cross-Modal Retrieval Augmentation for Multi-Modal Classification
Shir Gur
Natalia Neverova
C. Stauffer
Ser-Nam Lim
Douwe Kiela
A. Reiter
22
26
0
16 Apr 2021
Exploring Visual Engagement Signals for Representation Learning
Exploring Visual Engagement Signals for Representation Learning
Menglin Jia
Zuxuan Wu
A. Reiter
Claire Cardie
Serge Belongie
Ser-Nam Lim
21
13
0
15 Apr 2021
Self-supervised Video Object Segmentation by Motion Grouping
Self-supervised Video Object Segmentation by Motion Grouping
Charig Yang
Hala Lamdouar
Erika Lu
Andrew Zisserman
Weidi Xie
VOS
OCL
30
157
0
15 Apr 2021
Compressing Visual-linguistic Model via Knowledge Distillation
Compressing Visual-linguistic Model via Knowledge Distillation
Zhiyuan Fang
Jianfeng Wang
Xiaowei Hu
Lijuan Wang
Yezhou Yang
Zicheng Liu
VLM
39
97
0
05 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
75
1,819
0
05 Apr 2021
Towards General Purpose Vision Systems
Towards General Purpose Vision Systems
Tanmay Gupta
Amita Kamath
Aniruddha Kembhavi
Derek Hoiem
11
50
0
01 Apr 2021
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Frozen in Time: A Joint Video and Image Encoder for End-to-End Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
VGen
57
1,134
0
01 Apr 2021
Composable Augmentation Encoding for Video Representation Learning
Composable Augmentation Encoding for Video Representation Learning
Chen Sun
Arsha Nagrani
Yonglong Tian
Cordelia Schmid
SSL
AI4TS
37
17
0
01 Apr 2021
Diagnosing Vision-and-Language Navigation: What Really Matters
Diagnosing Vision-and-Language Navigation: What Really Matters
Wanrong Zhu
Yuankai Qi
P. Narayana
Kazoo Sone
Sugato Basu
Junfeng Fang
Qi Wu
Miguel P. Eckstein
Wenjie Wang
LM&Ro
27
50
0
30 Mar 2021
LatentKeypointGAN: Controlling GANs via Latent Keypoints
LatentKeypointGAN: Controlling GANs via Latent Keypoints
Xingzhe He
Bastian Wandt
Helge Rhodin
GAN
30
6
0
29 Mar 2021
Generic Attention-model Explainability for Interpreting Bi-Modal and
  Encoder-Decoder Transformers
Generic Attention-model Explainability for Interpreting Bi-Modal and Encoder-Decoder Transformers
Hila Chefer
Shir Gur
Lior Wolf
ViT
31
306
0
29 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng-Wei Zhang
Stephen Lin
B. Guo
ViT
151
20,812
0
25 Mar 2021
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for
  Improved Cross-Modal Retrieval
Retrieve Fast, Rerank Smart: Cooperative and Joint Approaches for Improved Cross-Modal Retrieval
Gregor Geigle
Jonas Pfeiffer
Nils Reimers
Ivan Vulić
Iryna Gurevych
40
60
0
22 Mar 2021
Paint by Word
Paint by Word
A. Andonian
David Bau
Audrey Cui
YeonHwan Park
Ali Jahanian
Antonio Torralba
A. Oliva
DiffM
20
125
0
19 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual
  Descriptions
Large-Scale Zero-Shot Image Classification from Rich and Diverse Textual Descriptions
Sebastian Bujwid
Josephine Sullivan
VLM
23
28
0
17 Mar 2021
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time
  Image-Text Retrieval
LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval
Siqi Sun
Yen-Chun Chen
Linjie Li
Shuohang Wang
Yuwei Fang
Jingjing Liu
VLM
38
82
0
16 Mar 2021
What is Multimodality?
What is Multimodality?
Letitia Parcalabescu
Nils Trost
Anette Frank
21
0
0
10 Mar 2021
Pretrained Transformers as Universal Computation Engines
Pretrained Transformers as Universal Computation Engines
Kevin Lu
Aditya Grover
Pieter Abbeel
Igor Mordatch
28
218
0
09 Mar 2021
Previous
123...204205206
Next