ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,939 papers shown
BadEncoder: Backdoor Attacks to Pre-trained Encoders in Self-Supervised Learning
Jinyuan Jia
Yupei Liu
Neil Zhenqiang Gong
SILM
SSL
65
152
0
01 Aug 2021
Product1M: Towards Weakly Supervised Instance-Level Product Retrieval via Cross-modal Pretraining
Xunlin Zhan
Yangxin Wu
Xiao Dong
Yunchao Wei
Minlong Lu
Yichi Zhang
Hang Xu
Xiaodan Liang
ViT
34
64
0
30 Jul 2021
Is Object Detection Necessary for Human-Object Interaction Recognition?
Ying Jin
Yinpeng Chen
Lijuan Wang
Jianfeng Wang
Pei Yu
Zicheng Liu
Lei Li
38
7
0
27 Jul 2021
Pointer Value Retrieval: A new benchmark for understanding the limits of neural network generalization
Chiyuan Zhang
M. Raghu
Jon M. Kleinberg
Samy Bengio
OOD
39
30
0
27 Jul 2021
Segmentation in Style: Unsupervised Semantic Image Segmentation with Stylegan and CLIP
D. Pakhomov
Sanchit Hira
Narayani Wagle
K. Green
Nassir Navab
VLM
37
31
0
26 Jul 2021
Language Grounding with 3D Objects
Jesse Thomason
Mohit Shridhar
Yonatan Bisk
Chris Paxton
Luke Zettlemoyer
LM&Ro
36
53
0
26 Jul 2021
Boosting Entity-aware Image Captioning with Multi-modal Knowledge Graph
Wentian Zhao
Yao Hu
Heda Wang
Xinxiao Wu
Jiebo Luo
28
47
0
26 Jul 2021
LARGE: Latent-Based Regression through GAN Semantics
Yotam Nitzan
Rinon Gal
Ofir Brenner
Daniel Cohen-Or
GAN
34
26
0
22 Jul 2021
Theoretical foundations and limits of word embeddings: what types of meaning can they capture?
Alina Arseniev-Koehler
46
21
0
22 Jul 2021
CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen
Enze Xie
Chongjian Ge
Runjian Chen
Ding Liang
Ping Luo
63
231
0
21 Jul 2021
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
Jie Lei
Tamara L. Berg
Mohit Bansal
ViT
46
63
0
20 Jul 2021
Exploiting generative self-supervised learning for the assessment of biological images with lack of annotations: a COVID-19 case-study
Alessio Mascolini
Dario Cardamone
Francesco Ponzio
S. D. Cataldo
E. Ficarra
MedIm
32
15
0
16 Jul 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Steven C. H. Hoi
FaML
117
1,907
0
16 Jul 2021
From Show to Tell: A Survey on Deep Learning-based Image Captioning
Matteo Stefanini
Marcella Cornia
Lorenzo Baraldi
S. Cascianelli
G. Fiameni
Rita Cucchiara
3DV
VLM
MLLM
71
259
0
14 Jul 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Mohit Bansal
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
215
407
0
13 Jul 2021
eProduct: A Million-Scale Visual Search Benchmark to Address Product Recognition Challenges
Jiangbo Yuan
An-Ti Chiang
Wen Tang
A. Haro
VLM
27
6
0
13 Jul 2021
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
Hannah Rose Kirk
Yennie Jun
Paulius Rauba
Gal Wachtel
Ruining Li
Xingjian Bai
Noah Broestl
Martin Doff-Sotta
Aleksandar Shtedritski
Yuki M. Asano
40
25
0
09 Jul 2021
LanguageRefer: Spatial-Language Model for 3D Visual Grounding
Junha Roh
Karthik Desingh
Ali Farhadi
Dieter Fox
27
95
0
07 Jul 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
93
5,256
0
07 Jul 2021
Predicting with Confidence on Unseen Distributions
Devin Guillory
Vaishaal Shankar
Sayna Ebrahimi
Trevor Darrell
Ludwig Schmidt
UQCV
OOD
30
117
0
07 Jul 2021
CLIP-It! Language-Guided Video Summarization
Medhini Narasimhan
Anna Rohrbach
Trevor Darrell
CLIP
37
114
0
01 Jul 2021
Applications of the Free Energy Principle to Machine Learning and Neuroscience
Beren Millidge
DRL
62
7
0
30 Jun 2021
The Evolution of Out-of-Distribution Robustness Throughout Fine-Tuning
Anders Andreassen
Yasaman Bahri
Behnam Neyshabur
Rebecca Roelofs
OOD
OODD
43
79
0
30 Jun 2021
Data Poisoning Won't Save You From Facial Recognition
Evani Radiya-Dixit
Sanghyun Hong
Nicholas Carlini
Florian Tramèr
AAML
PICV
29
57
0
28 Jun 2021
CLIPDraw: Exploring Text-to-Drawing Synthesis through Language-Image Encoders
Kevin Frans
Lisa Soros
Olaf Witkowski
CLIP
40
207
0
28 Jun 2021
Visual Conceptual Blending with Large-scale Language and Vision Models
Songwei Ge
Devi Parikh
VLM
DiffM
30
14
0
27 Jun 2021
Core Challenges in Embodied Vision-Language Planning
Jonathan M Francis
Nariaki Kitamura
Felix Labelle
Xiaopeng Lu
Ingrid Navarro
Jean Oh
LM&Ro
56
45
0
26 Jun 2021
Multimodal Few-Shot Learning with Frozen Language Models
Maria Tsimpoukelli
Jacob Menick
Serkan Cabi
S. M. Ali Eslami
Oriol Vinyals
Felix Hill
MLLM
96
761
0
25 Jun 2021
Fairness for Image Generation with Uncertain Sensitive Attributes
A. Jalal
Sushrut Karmalkar
Jessica Hoffmann
A. Dimakis
Eric Price
DiffM
40
39
0
23 Jun 2021
DocFormer: End-to-End Transformer for Document Understanding
Srikar Appalaraju
Bhavan A. Jasani
Bhargava Urala Kota
Yusheng Xie
R. Manmatha
ViT
46
274
0
22 Jun 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP
Han Fang
Pengfei Xiong
Luhui Xu
Yu Chen
CLIP
VLM
63
292
0
21 Jun 2021
Efficient Self-supervised Vision Transformers for Representation Learning
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
47
212
0
17 Jun 2021
Poisoning and Backdooring Contrastive Learning
Nicholas Carlini
Andreas Terzis
46
160
0
17 Jun 2021
A Simple Fix to Mahalanobis Distance for Improving Near-OOD Detection
Jie Jessie Ren
Stanislav Fort
J. Liu
Abhijit Guha Roy
Shreyas Padhy
Balaji Lakshminarayanan
UQCV
33
220
0
16 Jun 2021
Revisiting the Calibration of Modern Neural Networks
Matthias Minderer
Josip Djolonga
Rob Romijnders
F. Hubis
Xiaohua Zhai
N. Houlsby
Dustin Tran
Mario Lucic
UQCV
59
362
0
15 Jun 2021
Communicating Natural Programs to Humans and Machines
Samuel Acquaviva
Yewen Pu
Marta Kryven
Theo Sechopoulos
Catherine Wong
Gabrielle Ecanow
Maxwell Nye
Michael Henry Tessler
J. Tenenbaum
42
41
0
15 Jun 2021
Improved Transformer for High-Resolution GANs
Long Zhao
Zizhao Zhang
Ting Chen
Dimitris N. Metaxas
Han Zhang
ViT
42
95
0
14 Jun 2021
Partial success in closing the gap between human and machine vision
Robert Geirhos
Kantharaju Narayanappa
Benjamin Mitzkus
Tizian Thieringer
Matthias Bethge
Felix Wichmann
Wieland Brendel
VLM
AAML
58
223
0
14 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
74
825
0
14 Jun 2021
D2C: Diffusion-Denoising Models for Few-shot Conditional Generation
Abhishek Sinha
Jiaming Song
Chenlin Meng
Stefano Ermon
VLM
DiffM
45
118
0
12 Jun 2021
Assessing Multilingual Fairness in Pre-trained Multimodal Representations
Jialu Wang
Yang Liu
Xinze Wang
EGVM
33
36
0
12 Jun 2021
Neural Symbolic Regression that Scales
Luca Biggio
Tommaso Bendinelli
Alexander Neitz
Aurelien Lucchi
Giambattista Parascandolo
59
174
0
11 Jun 2021
What Can Knowledge Bring to Machine Learning? -- A Survey of Low-shot Learning for Structured Data
Yang Hu
Adriane P. Chapman
Guihua Wen
Dame Wendy Hall
64
24
0
11 Jun 2021
Learning to See by Looking at Noise
Manel Baradad
Jonas Wulff
Tongzhou Wang
Phillip Isola
Antonio Torralba
49
90
0
10 Jun 2021
Pivotal Tuning for Latent-based Editing of Real Images
Daniel Roich
Ron Mokady
Amit H. Bermano
Daniel Cohen-Or
DiffM
53
524
0
10 Jun 2021
Taxonomy of Machine Learning Safety: A Survey and Primer
Sina Mohseni
Haotao Wang
Zhiding Yu
Chaowei Xiao
Zhangyang Wang
J. Yadawa
31
32
0
09 Jun 2021
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
85
1,070
0
08 Jun 2021
What Makes Multi-modal Learning Better than Single (Provably)
Yu Huang
Chenzhuang Du
Zihui Xue
Xuanyao Chen
Hang Zhao
Longbo Huang
56
254
0
08 Jun 2021
Image2Point: 3D Point-Cloud Understanding with 2D Image Pretrained Models
Chenfeng Xu
Shijia Yang
Tomer Galanti
Bichen Wu
Xiangyu Yue
Bohan Zhai
Wei Zhan
Peter Vajda
Kurt Keutzer
Masayoshi Tomizuka
3DPC
39
53
0
08 Jun 2021
Differentiable Quality Diversity
Matthew C. Fontaine
Stefanos Nikolaidis
60
89
0
07 Jun 2021