Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 9,984 papers shown
Title
DualCoOp: Fast Adaptation to Multi-Label Recognition with Limited Annotations
Ximeng Sun
Ping Hu
Kate Saenko
VLM
36
120
0
20 Jun 2022
GaLeNet: Multimodal Learning for Disaster Prediction, Management and Relief
Rohit Saha
Meng Fang
Angeline Yasodhara
Kyryl Truskovskyi
Azin Asgarian
D. Homola
Raahil Shah
Frederik Dieleman
Jack Weatheritt
Thomas Rogers
23
3
0
18 Jun 2022
Score-Guided Intermediate Layer Optimization: Fast Langevin Mixing for Inverse Problems
Giannis Daras
Y. Dagan
A. Dimakis
C. Daskalakis
BDL
31
15
0
18 Jun 2022
Self-Supervised Learning for Videos: A Survey
Madeline Chantry Schiappa
Yogesh S Rawat
M. Shah
SSL
36
131
0
18 Jun 2022
Landscape Learning for Neural Network Inversion
Ruoshi Liu
Chen-Guang Mao
Purva Tendulkar
Hongya Wang
Carl Vondrick
38
8
0
17 Jun 2022
VLMixer: Unpaired Vision-Language Pre-training via Cross-Modal CutMix
Teng Wang
Wenhao Jiang
Zhichao Lu
Feng Zheng
Ran Cheng
Chengguo Yin
Ping Luo
VLM
34
42
0
17 Jun 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
74
393
0
17 Jun 2022
MineDojo: Building Open-Ended Embodied Agents with Internet-Scale Knowledge
Linxi Fan
Guanzhi Wang
Yunfan Jiang
Ajay Mandlekar
Yuncong Yang
Haoyi Zhu
Andrew Tang
De-An Huang
Yuke Zhu
Anima Anandkumar
LM&Ro
51
352
0
17 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
27
11
0
17 Jun 2022
Rectify ViT Shortcut Learning by Visual Saliency
Chong Ma
Lin Zhao
Yuzhong Chen
David Liu
Xi Jiang
Tuo Zhang
Xintao Hu
Dinggang Shen
Dajiang Zhu
Tianming Liu
ViT
36
20
0
17 Jun 2022
Rarity Score : A New Metric to Evaluate the Uncommonness of Synthesized Images
Jiyeon Han
Hwanil Choi
Yunjey Choi
Jae Hyun Kim
Jung-Woo Ha
Jaesik Choi
EGVM
20
31
0
17 Jun 2022
VLMbench: A Compositional Benchmark for Vision-and-Language Manipulation
Kaizhi Zheng
Xiaotong Chen
Odest Chadwicke Jenkins
Xin Eric Wang
LM&Ro
CoGe
24
54
0
17 Jun 2022
Zero-Shot Video Question Answering via Frozen Bidirectional Language Models
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
41
228
0
16 Jun 2022
Patch-level Representation Learning for Self-supervised Vision Transformers
Sukmin Yun
Hankook Lee
Jaehyung Kim
Jinwoo Shin
ViT
22
64
0
16 Jun 2022
Disentangling visual and written concepts in CLIP
Joanna Materzyńska
Antonio Torralba
David Bau
CoGe
23
47
0
15 Jun 2022
Coarse-to-Fine Vision-Language Pre-training with Fusion in the Backbone
Zi-Yi Dou
Aishwarya Kamath
Zhe Gan
Pengchuan Zhang
Jianfeng Wang
...
Ce Liu
Yann LeCun
Nanyun Peng
Jianfeng Gao
Lijuan Wang
VLM
ObjD
30
124
0
15 Jun 2022
A Meta-Analysis of Distributionally-Robust Models
Ben Feuer
Ameya Joshi
C. Hegde
OOD
VLM
40
3
0
15 Jun 2022
Forecasting of depth and ego-motion with transformers and self-supervision
Houssem-eddine Boulahbal
A. Voicila
Andrew I. Comport
ViT
MDE
27
3
0
15 Jun 2022
Zero-shot object goal visual navigation
Qianfan Zhao
Lu Zhang
Bin He
Hong Qiao
Zhi-yong Liu
36
37
0
15 Jun 2022
Differentiable Top-k Classification Learning
Felix Petersen
Hilde Kuehne
Christian Borgelt
Oliver Deussen
59
28
0
15 Jun 2022
Beyond Grounding: Extracting Fine-Grained Event Hierarchies Across Modalities
Hammad A. Ayyubi
Christopher Thomas
Lovish Chum
R. Lokesh
Long Chen
...
Xudong Lin
Xuande Feng
Jaywon Koo
Sounak Ray
Shih-Fu Chang
AI4TS
31
0
0
14 Jun 2022
LAVENDER: Unifying Video-Language Understanding as Masked Language Modeling
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Chung-Ching Lin
Zicheng Liu
Ce Liu
Lijuan Wang
MLLM
VLM
20
81
0
14 Jun 2022
Self-Supervision on Images and Text Reduces Reliance on Visual Shortcut Features
Anil Palepu
Andrew L. Beam
OOD
VLM
29
5
0
14 Jun 2022
Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt
Sören Mindermann
J. Brauner
Muhammed Razzak
Mrinank Sharma
Andreas Kirsch
...
Benedikt Höltgen
Aidan Gomez
Adrien Morisot
Sebastian Farquhar
Y. Gal
62
149
0
14 Jun 2022
ProcTHOR: Large-Scale Embodied AI Using Procedural Generation
Matt Deitke
Eli VanderBilt
Alvaro Herrasti
Luca Weihs
Jordi Salvador
...
Winson Han
Eric Kolve
Ali Farhadi
Aniruddha Kembhavi
Roozbeh Mottaghi
LM&Ro
44
237
0
14 Jun 2022
Comprehending and Ordering Semantics for Image Captioning
Yehao Li
Yingwei Pan
Ting Yao
Tao Mei
26
88
0
14 Jun 2022
LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
Yi-Lin Sung
Jaemin Cho
Mohit Bansal
VLM
21
237
0
13 Jun 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
72
528
0
13 Jun 2022
Transductive CLIP with Class-Conditional Contrastive Learning
Junchu Huang
Weijie Chen
Shicai Yang
Di Xie
Shiliang Pu
Yueting Zhuang
VLM
BDL
NoLa
27
6
0
13 Jun 2022
INDIGO: Intrinsic Multimodality for Domain Generalization
Puneet Mangla
Shivam Chandhok
Milan Aggarwal
V. Balasubramanian
Balaji Krishnamurthy
VLM
41
2
0
13 Jun 2022
Bootstrapping Multi-view Representations for Fake News Detection
Qichao Ying
Xiaoxiao Hu
Yangming Zhou
Zhenxing Qian
Dan Zeng
Shiming Ge
24
45
0
12 Jun 2022
Referring Image Matting
Jizhizi Li
Jing Zhang
Dacheng Tao
ObjD
VLM
26
22
0
10 Jun 2022
Seeing the forest and the tree: Building representations of both individual and collective dynamics with transformers
Ran Liu
Mehdi Azabou
M. Dabagia
Jingyun Xiao
Eva L. Dyer
AI4CE
32
19
0
10 Jun 2022
Neural Prompt Search
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
VPVLM
VLM
46
144
0
09 Jun 2022
Extreme Masking for Learning Instance and Distributed Visual Representations
Zhirong Wu
Zihang Lai
Xiao Sun
Stephen Lin
35
22
0
09 Jun 2022
DORA: Exploring Outlier Representations in Deep Neural Networks
Kirill Bykov
Mayukh Deb
Dennis Grinwald
Klaus-Robert Müller
Marina M.-C. Höhne
27
12
0
09 Jun 2022
The Missing Link: Finding label relations across datasets
J. Uijlings
Thomas Mensink
V. Ferrari
VLM
29
10
0
09 Jun 2022
FOAM: A Follower-aware Speaker Model For Vision-and-Language Navigation
Zi-Yi Dou
Nanyun Peng
26
22
0
09 Jun 2022
Decentralized, not Dehumanized in the Metaverse: Bringing Utility to NFTs through Multimodal Interaction
Anqi Wang
Ze-Feng Gao
Lik-Hang Lee
Tristan Braud
Pan Hui
31
19
0
08 Jun 2022
Revealing Single Frame Bias for Video-and-Language Learning
Jie Lei
Tamara L. Berg
Mohit Bansal
24
111
0
07 Jun 2022
Intra-agent speech permits zero-shot task acquisition
Chen Yan
Federico Carnevale
Petko Georgiev
Adam Santoro
Aurelia Guy
Alistair Muldal
Chia-Chun Hung
Josh Abramson
Timothy Lillicrap
Greg Wayne
LM&Ro
36
9
0
07 Jun 2022
Training Subset Selection for Weak Supervision
Hunter Lang
Aravindan Vijayaraghavan
David Sontag
NoLa
16
21
0
06 Jun 2022
Blended Latent Diffusion
Omri Avrahami
Ohad Fried
Dani Lischinski
DiffM
77
373
0
06 Jun 2022
Volumetric Disentanglement for 3D Scene Manipulation
Sagie Benaim
Frederik Warburg
Peter Ebert Christensen
Serge Belongie
31
15
0
06 Jun 2022
APES: Articulated Part Extraction from Sprite Sheets
Zhan Xu
Matthew Fisher
Yang Zhou
Deepali Aneja
Rushikesh Dudhat
Li Yi
E. Kalogerakis
41
2
0
04 Jun 2022
Delving into the Openness of CLIP
Shuhuai Ren
Lei Li
Xuancheng Ren
Guangxiang Zhao
Xu Sun
VLM
28
13
0
04 Jun 2022
Revisiting the "Video" in Video-Language Understanding
S. Buch
Cristobal Eyzaguirre
Adrien Gaidon
Jiajun Wu
L. Fei-Fei
Juan Carlos Niebles
41
158
0
03 Jun 2022
A-OKVQA: A Benchmark for Visual Question Answering using World Knowledge
Dustin Schwenk
Apoorv Khandelwal
Christopher Clark
Kenneth Marino
Roozbeh Mottaghi
16
506
0
03 Jun 2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Jue Wang
Binhang Yuan
Luka Rimanic
Yongjun He
Tri Dao
Beidi Chen
Christopher Ré
Ce Zhang
AI4CE
24
11
0
02 Jun 2022
Decentralized Training of Foundation Models in Heterogeneous Environments
Binhang Yuan
Yongjun He
Jared Davis
Tianyi Zhang
Tri Dao
Beidi Chen
Percy Liang
Christopher Ré
Ce Zhang
33
90
0
02 Jun 2022