ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,975 papers shown
VLUE: A Multi-Task Benchmark for Evaluating Vision-Language Models
Wangchunshu Zhou
Yan Zeng
Shizhe Diao
Xinsong Zhang
CoGe
VLM
32
13
0
30 May 2022
Prompt-aligned Gradient for Prompt Tuning
Beier Zhu
Yulei Niu
Yucheng Han
Yuehua Wu
Hanwang Zhang
VLM
189
272
0
30 May 2022
SupMAE: Supervised Masked Autoencoders Are Efficient Vision Learners
Feng Liang
Yangguang Li
Diana Marculescu
SSL
TPM
ViT
51
22
0
28 May 2022
Parameter-Efficient and Student-Friendly Knowledge Distillation
Jun Rao
Xv Meng
Liang Ding
Shuhan Qi
Dacheng Tao
37
46
0
28 May 2022
CyCLIP: Cyclic Contrastive Language-Image Pretraining
Shashank Goel
Hritik Bansal
S. Bhatia
Ryan A. Rossi
Vishwa Vinay
Aditya Grover
CLIP
VLM
184
133
0
28 May 2022
Point-M2AE: Multi-scale Masked Autoencoders for Hierarchical Point Cloud Pre-training
Renrui Zhang
Ziyu Guo
Rongyao Fang
Bingyan Zhao
Dong Wang
Yu Qiao
Hongsheng Li
Peng Gao
3DPC
184
245
0
28 May 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
88
124
0
27 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
59
529
0
27 May 2022
Video2StyleGAN: Disentangling Local and Global Variations in a Video
Rameen Abdal
Peihao Zhu
Niloy J. Mitra
Peter Wonka
VGen
30
7
0
27 May 2022
A Survey on Long-Tailed Visual Recognition
Lu Yang
He Jiang
Q. Song
Jun Guo
22
123
0
27 May 2022
Can Foundation Models Help Us Achieve Perfect Secrecy?
Simran Arora
Christopher Ré
FedML
24
6
0
27 May 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chia-Ju Chen
VLM
25
31
0
26 May 2022
DisinfoMeme: A Multimodal Dataset for Detecting Meme Intentionally Spreading Out Disinformation
Jingnong Qu
Liunian Harold Li
Jieyu Zhao
Sunipa Dev
Kai-Wei Chang
21
12
0
25 May 2022
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
36
32
0
25 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
32
10
0
25 May 2022
End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models
Barry Menglong Yao
Aditya Shah
Lichao Sun
Jin-Hee Cho
Lifu Huang
MLLM
LRM
46
78
0
25 May 2022
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
Akari Asai
Mohammadreza Salehi
Matthew E. Peters
Hannaneh Hajishirzi
130
100
0
24 May 2022
M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing
Zhikang Li
Huiling Zhou
Shuai Bai
Peike Li
Chang Zhou
Hongxia Yang
37
4
0
24 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
25
4
0
24 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
28
0
0
23 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
81
5,797
0
23 May 2022
Decoder Denoising Pretraining for Semantic Segmentation
Emmanuel B. Asiedu
Simon Kornblith
Ting Chen
Niki Parmar
Matthias Minderer
Mohammad Norouzi
AI4CE
199
26
0
23 May 2022
Markedness in Visual Semantic AI
Robert Wolfe
Aylin Caliskan
VLM
30
35
0
23 May 2022
GR-GAN: Gradual Refinement Text-to-image Generation
Bo Yang
Fangxiang Feng
Xiaojie Wang
EGVM
16
7
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
26
38
0
23 May 2022
Covariance Matrix Adaptation MAP-Annealing
Matthew C. Fontaine
Stefanos Nikolaidis
48
25
0
22 May 2022
Visually-Augmented Language Modeling
Weizhi Wang
Li Dong
Hao Cheng
Haoyu Song
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
VLM
36
18
0
20 May 2022
The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks
Lukas Huber
Robert Geirhos
Felix Wichmann
56
16
0
20 May 2022
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
24
68
0
19 May 2022
Voxel-informed Language Grounding
Rodolfo Corona
Shizhan Zhu
Dan Klein
Trevor Darrell
141
11
0
19 May 2022
TransTab: Learning Transferable Tabular Transformers Across Tables
Zifeng Wang
Jimeng Sun
LMTD
36
137
0
19 May 2022
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
Fangzhou Hong
Mingyuan Zhang
Liang Pan
Zhongang Cai
Lei Yang
Ziwei Liu
CLIP
98
79
0
17 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
129
62
0
17 May 2022
Sparse Visual Counterfactual Explanations in Image Space
Valentyn Boreiko
Maximilian Augustin
Francesco Croce
Philipp Berens
Matthias Hein
BDL
CML
30
26
0
16 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
30
160
0
16 May 2022
On the Difficulty of Defending Self-Supervised Learning against Model Extraction
Adam Dziedzic
Nikita Dhawan
Muhammad Ahmad Kaleem
Jonas Guan
Nicolas Papernot
MIACV
54
22
0
16 May 2022
Aligning Robot Representations with Humans
Andreea Bobu
Andi Peng
27
0
0
15 May 2022
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training
C. Seibold
Simon Reiß
M. Sarfraz
Rainer Stiefelhagen
Jens Kleesiek
21
31
0
14 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
38
29
0
13 May 2022
A Comprehensive Survey of Few-shot Learning: Evolution, Applications, Challenges, and Opportunities
Yisheng Song
Ting-Yuan Wang
S. Mondal
J. P. Sahoo
SLR
50
344
0
13 May 2022
The Creativity of Text-to-Image Generation
J. Oppenlaender
25
192
0
13 May 2022
PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning
Hongbin Liu
Jinyuan Jia
Neil Zhenqiang Gong
25
34
0
13 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
28
6
0
12 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
34
307
0
12 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen
Yuanzhi Li
SSL
27
34
0
12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
29
37
0
12 May 2022
Deep Learning and Synthetic Media
Raphaël Millière
23
18
0
11 May 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
32
16
0
11 May 2022
DISARM: Detecting the Victims Targeted by Harmful Memes
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
16
29
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
37
33
0
10 May 2022