ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIPVLM
ArXiv (abs)PDFHTMLGithub (29177★)

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 1,722 papers shown
Title
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Diversify, Don't Fine-Tune: Scaling Up Visual Recognition Training with Synthetic Images
Zhuoran Yu
Chenchen Zhu
Sean Culatana
Raghuraman Krishnamoorthi
Fanyi Xiao
Yong Jae Lee
171
15
0
04 Dec 2023
StoryGPT-V: Large Language Models as Consistent Story Visualizers
StoryGPT-V: Large Language Models as Consistent Story Visualizers
Xiaoqian Shen
Mohamed Elhoseiny
VLM
185
11
0
04 Dec 2023
Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
Visual Encoders for Data-Efficient Imitation Learning in Modern Video Games
Lukas Schäfer
Logan Jones
Anssi Kanervisto
Yuhan Cao
Tabish Rashid
Raluca Georgescu
David Bignell
Siddhartha Sen
Andrea Trevino Gavito
Sam Devlin
162
3
0
04 Dec 2023
Meta ControlNet: Enhancing Task Adaptation via Meta Learning
Meta ControlNet: Enhancing Task Adaptation via Meta Learning
Junjie Yang
Jinze Zhao
Peihao Wang
Zhangyang Wang
Yingbin Liang
105
3
0
03 Dec 2023
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
FreeZe: Training-free zero-shot 6D pose estimation with geometric and vision foundation models
Andrea Caraffa
Davide Boscaini
Amir Hamza
Fabio Poiesi
108
18
0
01 Dec 2023
Segment Any 3D Gaussians
Segment Any 3D Gaussians
Jiazhong Cen
Jiemin Fang
Chen Yang
Lingxi Xie
Xiaopeng Zhang
Wei Shen
Qi Tian
3DGS
146
76
0
01 Dec 2023
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Spacewalk-18: A Benchmark for Multimodal and Long-form Procedural Video Understanding in Novel Domains
Rohan Myer Krishnan
Zitian Tang
Zhiqiu Yu
Chen Sun
128
2
0
30 Nov 2023
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
ElasticDiffusion: Training-free Arbitrary Size Image Generation through Global-Local Content Separation
Moayed Haji-Ali
Guha Balakrishnan
Vicente Ordonez
165
27
0
30 Nov 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
150
2
0
29 Nov 2023
Meta Co-Training: Two Views are Better than One
Meta Co-Training: Two Views are Better than One
Jay C. Rothenberger
Dimitrios I. Diochnos
VLM
160
3
0
29 Nov 2023
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai
Yuhang Liu
Zhen Zhang
Javen Qinfeng Shi
CLIPVLM
126
8
0
28 Nov 2023
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
FALCON: Fairness Learning via Contrastive Attention Approach to Continual Semantic Scene Understanding
Thanh-Dat Truong
Utsav Prabhu
Bhiksha Raj
Jackson Cothren
Khoa Luu
CLL
161
3
0
27 Nov 2023
Regularization by Texts for Latent Diffusion Inverse Solvers
Regularization by Texts for Latent Diffusion Inverse Solvers
Jeongsol Kim
Geon Yeong Park
Hyungjin Chung
Jong Chul Ye
AI4CE
140
16
0
27 Nov 2023
Image Super-Resolution with Text Prompt Diffusion
Image Super-Resolution with Text Prompt Diffusion
Zheng Chen
Yulun Zhang
Jinjin Gu
Xin Yuan
Linghe Kong
Guihai Chen
Xiaokang Yang
DiffM
142
21
0
24 Nov 2023
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Paragraph-to-Image Generation with Information-Enriched Diffusion Model
Weijia Wu
Zhuang Li
Yefei He
Mike Zheng Shou
Chunhua Shen
Lele Cheng
Yan Li
Yan Li
Di Zhang
VLM
220
25
0
24 Nov 2023
HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
HOPE: A Memory-Based and Composition-Aware Framework for Zero-Shot Learning with Hopfield Network and Soft Mixture of Experts
Do Huu Dat
Po Yuan Mao
Tien Hoang Nguyen
Wray Buntine
Bennamoun
104
1
0
23 Nov 2023
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
Yuxin Du
Fan Bai
Tiejun Huang
Bo Zhao
VLM
107
43
0
22 Nov 2023
Nepotistically Trained Generative-AI Models Collapse
Nepotistically Trained Generative-AI Models Collapse
Matyáš Boháček
Hany Farid
105
19
0
20 Nov 2023
MultiDelete for Multimodal Machine Unlearning
MultiDelete for Multimodal Machine Unlearning
Jiali Cheng
Hadi Amiri
MU
107
9
0
18 Nov 2023
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Pretrain like Your Inference: Masked Tuning Improves Zero-Shot Composed Image Retrieval
Junyang Chen
Hanjiang Lai
VLM
115
15
0
13 Nov 2023
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
FigStep: Jailbreaking Large Vision-Language Models via Typographic Visual Prompts
Yichen Gong
Delong Ran
Jinyuan Liu
Conglei Wang
Tianshuo Cong
Anyu Wang
Sisi Duan
Xiaoyun Wang
MLLM
224
160
0
09 Nov 2023
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models
Hanlin Zhang
Benjamin L. Edelman
Danilo Francati
Daniele Venturi
G. Ateniese
Boaz Barak
WaLM
240
64
0
07 Nov 2023
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
CLIP-Motion: Learning Reward Functions for Robotic Actions Using Consecutive Observations
Xuzhe Dang
Stefan Edelkamp
164
4
0
06 Nov 2023
Advances in Embodied Navigation Using Large Language Models: A Survey
Advances in Embodied Navigation Using Large Language Models: A Survey
Jinzhou Lin
Han Gao
Xuxiang Feng
Rongtao Xu
Changwei Wang
Man Zhang
Li Guo
Shibiao Xu
LM&RoLLMAG
154
10
0
01 Nov 2023
Audio-Visual Instance Segmentation
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLMVOS
107
2
0
28 Oct 2023
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
Grid Jigsaw Representation with CLIP: A New Perspective on Image Clustering
Zijie Song
Zhenzhen Hu
Richang Hong
SSL
89
0
0
27 Oct 2023
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
On the Proactive Generation of Unsafe Images From Text-To-Image Models Using Benign Prompts
Yixin Wu
Ning Yu
Michael Backes
Yun Shen
Yang Zhang
DiffM
137
8
0
25 Oct 2023
FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
FLTrojan: Privacy Leakage Attacks against Federated Language Models Through Selective Weight Tampering
Md Rafi Ur Rashid
Vishnu Asutosh Dasu
Kang Gu
Najrin Sultana
Shagufta Mehnaz
AAMLFedML
162
12
0
24 Oct 2023
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
PonderV2: Pave the Way for 3D Foundation Model with A Universal Pre-training Paradigm
Haoyi Zhu
Honghui Yang
Xiaoyang Wu
Di Huang
Sha Zhang
...
Hengshuang Zhao
Chunhua Shen
Yu Qiao
Tong He
Wanli Ouyang
SSL
176
47
0
12 Oct 2023
SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
SpikeCLIP: A Contrastive Language-Image Pretrained Spiking Neural Network
Changze Lv
Tianlong Li
Changze Lv
Yufei Gu
Jianhan Xu
Cenyuan Zhang
Muling Wu
Xiaoqing Zheng
Xuanjing Huang
CLIPVLM
132
3
0
10 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Chengyang Zhao
Songlin Yang
Zhenfang Chen
Mingyu Ding
Chuang Gan
149
17
0
10 Oct 2023
URLOST: Unsupervised Representation Learning without Stationarity or Topology
URLOST: Unsupervised Representation Learning without Stationarity or Topology
Zeyu Yun
Juexiao Zhang
Bruno A. Olshausen
Yann LeCun
189
1
0
06 Oct 2023
PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification
PrototypeFormer: Learning to Explore Prototype Relationships for Few-shot Image Classification
Feihong He
Gang Li
Hui Xiong
VLMViT
118
2
0
05 Oct 2023
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
CLEVRER-Humans: Describing Physical and Causal Events the Human Way
Jiayuan Mao
Xuelin Yang
Xikun Zhang
Noah D. Goodman
Jiajun Wu
NAI
94
22
0
05 Oct 2023
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints
Chuan Fang
Yuan Dong
Kunming Luo
Xiaotao Hu
Rakesh Shrestha
Ping Tan
DiffM
133
37
0
05 Oct 2023
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Sweeping Heterogeneity with Smart MoPs: Mixture of Prompts for LLM Task Adaptation
Chen Dun
Mirian Hipolito Garcia
Guoqing Zheng
Ahmed Hassan Awadallah
Anastasios Kyrillidis
Robert Sim
192
6
0
04 Oct 2023
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
LanguageMPC: Large Language Models as Decision Makers for Autonomous Driving
Hao Sha
Yao Mu
Yuxuan Jiang
Li Chen
Chenfeng Xu
Ping Luo
Shengbo Eben Li
Masayoshi Tomizuka
Wei Zhan
Mingyu Ding
256
179
0
04 Oct 2023
Subjective Face Transform using Human First Impressions
Subjective Face Transform using Human First Impressions
Chaitanya Roygaga
Joshua Krinsky
Kai Zhang
Kenny Kwok
Aparna Bharati
CVBM
143
0
0
27 Sep 2023
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
STUPD: A Synthetic Dataset for Spatial and Temporal Relation Reasoning
Palaash Agrawal
Haidi Azaman
Cheston Tan
132
3
0
13 Sep 2023
PILOT: A Pre-Trained Model-Based Continual Learning Toolbox
PILOT: A Pre-Trained Model-Based Continual Learning Toolbox
Hai-Long Sun
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
CLL
192
34
0
13 Sep 2023
Language Prompt for Autonomous Driving
Language Prompt for Autonomous Driving
Dongming Wu
Wencheng Han
Tiancai Wang
Yingfei Liu
Cheng-zhong Xu
Jianbing Shen
Jianbing Shen
VLM
112
87
0
08 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Jing Liu
240
31
0
27 Aug 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
161
19
0
23 Aug 2023
A Foundation Language-Image Model of the Retina (FLAIR): Encoding Expert Knowledge in Text Supervision
A Foundation Language-Image Model of the Retina (FLAIR): Encoding Expert Knowledge in Text Supervision
Julio Silva-Rodríguez
H. Chakor
Riadh Kobbi
Jose Dolz
Ismail Ben Ayed
VLMMedIm
231
44
0
15 Aug 2023
Exploring Part-Informed Visual-Language Learning for Person Re-Identification
Exploring Part-Informed Visual-Language Learning for Person Re-Identification
Y. Lin
Cong Liu
Yehansen Chen
Jinshui Hu
Bing Yin
Baocai Yin
Zengfu Wang
157
7
0
04 Aug 2023
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Learning Multi-modal Representations by Watching Hundreds of Surgical Video Lectures
Kun Yuan
V. Srivastav
Tong Yu
Joël L. Lavanchy
J. Marescaux
Pietro Mascagni
Nassir Navab
N. Padoy
150
23
0
27 Jul 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
169
3
0
17 Jul 2023
Linear Alignment of Vision-language Models for Image Captioning
Linear Alignment of Vision-language Models for Image Captioning
Fabian Paischer
M. Hofmarcher
Sepp Hochreiter
Thomas Adler
CLIPVLM
153
0
0
10 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLMVLM
154
238
0
07 Jul 2023
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
196
16
0
07 Jul 2023
Previous
123...303132333435
Next