ResearchTrend.AI
Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,418 papers shown
NoisyTwins: Class-Consistent and Diverse Image Generation through StyleGANs
Harsh Rangwani
Lavish Bansal
Kartik Sharma
Tejan Karmali
Varun Jampani
R. Venkatesh Babu
31
13
0
12 Apr 2023
Gradient-Free Textual Inversion
Zhengcong Fei
Mingyuan Fan
Junshi Huang
DiffM
44
31
0
12 Apr 2023
ALADIN-NST: Self-supervised disentangled representation learning of artistic style through Neural Style Transfer
Dan Ruta
Gemma Canet Tarrés
Alexander Black
Andrew Gilbert
John Collomosse
DRL
OOD
32
3
0
12 Apr 2023
SketchANIMAR: Sketch-based 3D Animal Fine-Grained Retrieval
Trung-Nghia Le
Tam V. Nguyen
Minh-Quan Le
Trong-Thuan Nguyen
Viet-Tham Huynh
...
Hoai-Danh Vo
Minh H. Doan
Hai-Dang Nguyen
Akihiro Sugimoto
M. Tran
3DV
65
4
0
12 Apr 2023
InterGen: Diffusion-based Multi-human Motion Generation under Complex Interactions
Hanming Liang
Wenqian Zhang
Wenxu Li
Jingyi Yu
Lan Xu
DiffM
VGen
31
102
0
12 Apr 2023
Open-TransMind: A New Baseline and Benchmark for 1st Foundation Model Challenge of Intelligent Transportation
Yifeng Shi
Feng Lv
Xinliang Wang
Chunlong Xia
Shaojie Li
Shu-Zhen Yang
Teng Xi
Gang Zhang
VLM
51
13
0
12 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
43
2
0
12 Apr 2023
Learning Transferable Pedestrian Representation from Multimodal Information Supervision
Li-Na Bao
Longhui Wei
Xiaoyu Qiu
Wen-gang Zhou
Houqiang Li
Qi Tian
SSL
47
5
0
12 Apr 2023
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
Moayed Haji-Ali
Andrew Bond
Tolga Birdal
Duygu Ceylan
Levent Karacan
Erkut Erdem
Aykut Erdem
VGen
DiffM
136
2
0
12 Apr 2023
CamDiff: Camouflage Image Augmentation via Diffusion Model
Xuejiao Luo
Shuo Wang
Zongwei Wu
Daniel Gehrig
Yun Cheng
Deng-Ping Fan
Luc Van Gool
DiffM
41
19
0
11 Apr 2023
HRS-Bench: Holistic, Reliable and Scalable Benchmark for Text-to-Image Models
Eslam Mohamed Bakr
Pengzhan Sun
Xiaoqian Shen
Faizan Farooq Khan
Li Erran Li
Mohamed Elhoseiny
VLM
38
77
0
11 Apr 2023
ELVIS: Empowering Locality of Vision Language Pre-training with Intra-modal Similarity
Sumin Seo
Jaewoong Shin
Jaewoo Kang
Tae Soo Kim
Thijs Kooi
37
1
0
11 Apr 2023
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
VLM
3DV
49
25
0
11 Apr 2023
Generating Features with Increased Crop-related Diversity for Few-Shot Object Detection
Jingyi Xu
Hieu M. Le
Dimitris Samaras
ObjD
40
26
0
11 Apr 2023
A Comprehensive Survey on Deep Graph Representation Learning
Wei Ju
Zheng Fang
Yiyang Gu
Zequn Liu
Qingqing Long
...
Jingyang Yuan
Yusheng Zhao
Yifan Wang
Xiao Luo
Ming Zhang
GNN
AI4TS
75
142
0
11 Apr 2023
Improving Vision-and-Language Navigation by Generating Future-View Image Semantics
Jialu Li
Joey Tianyi Zhou
34
34
0
11 Apr 2023
Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection
Lv Tang
Haoke Xiao
Bo Li
45
113
0
10 Apr 2023
Prompt Pre-Training with Twenty-Thousand Classes for Open-Vocabulary Visual Recognition
Shuhuai Ren
Aston Zhang
Yi Zhu
Shuai Zhang
Shuai Zheng
Mu Li
Alexander J. Smola
Xu Sun
VPVLM
VLM
27
28
0
10 Apr 2023
EKILA: Synthetic Media Provenance and Attribution for Generative Art
Kar Balan
S. Agarwal
Simon Jenni
Andy Parsons
Andrew Gilbert
John Collomosse
30
12
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
61
74
0
10 Apr 2023
Defense-Prefix for Preventing Typographic Attacks on CLIP
Hiroki Azuma
Yusuke Matsui
VLM
AAML
22
18
0
10 Apr 2023
CAVL: Learning Contrastive and Adaptive Representations of Vision and Language
Shentong Mo
Jingfei Xia
Ihor Markevych
CLIP
VLM
35
1
0
10 Apr 2023
Towards Real-time Text-driven Image Manipulation with Unconditional Diffusion Models
Nikita Starodubcev
Dmitry Baranchuk
Valentin Khrulkov
Artem Babenko
DiffM
54
4
0
10 Apr 2023
RGB-T Tracking Based on Mixed Attention
Yang Luo
Xiqing Guo
Ming Dong
Jin-xia Yu
46
15
0
09 Apr 2023
CrowdCLIP: Unsupervised Crowd Counting via Vision-Language Model
Dingkang Liang
Jiahao Xie
Zhikang Zou
Xiaoqing Ye
Wei Xu
Xiang Bai
SSL
CLIP
VLM
39
54
0
09 Apr 2023
Similarity-Aware Multimodal Prompt Learning for Fake News Detection
Ye Jiang
Xiaomin Yu
Yimin Wang
Xiaoman Xu
Xingyi Song
Diana Maynard
34
20
0
09 Apr 2023
Token Boosting for Robust Self-Supervised Visual Transformer Pre-training
Tianjiao Li
Lin Geng Foo
Ping Hu
Xindi Shang
Hossein Rahmani
Zehuan Yuan
Jing Liu
57
7
0
09 Apr 2023
Progressive Volume Distillation with Active Learning for Efficient NeRF Architecture Conversion
Shuangkang Fang
Yufeng Wang
Yezhou Yang
Weixin Xu
He-Xuan Wang
Wenrui Ding
Shuchang Zhou
44
7
0
08 Apr 2023
Mitigating Spurious Correlations in Multi-modal Models during Fine-tuning
Yu Yang
Besmira Nushi
Hamid Palangi
Baharan Mirzasoleiman
44
36
0
08 Apr 2023
Harnessing the Spatial-Temporal Attention of Diffusion Models for High-Fidelity Text-to-Image Synthesis
Qiucheng Wu
Yujian Liu
Handong Zhao
T. Bui
Zhe Lin
Yang Zhang
Shiyu Chang
DiffM
47
45
0
07 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Bin Wang
Conghui He
Dahua Lin
VLM
ObjD
40
52
0
07 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
37
41
0
07 Apr 2023
Language-aware Multiple Datasets Detection Pretraining for DETRs
Jing Hao
Song Chen
Xiaodi Wang
Shumin Han
ObjD
36
3
0
07 Apr 2023
AI Model Disgorgement: Methods and Choices
Alessandro Achille
Michael Kearns
Carson Klingenberg
Stefano Soatto
MU
45
11
0
07 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
41
75
0
06 Apr 2023
Diffusion Models as Masked Autoencoders
Chen Wei
K. Mangalam
Po-Yao (Bernie) Huang
Yanghao Li
Haoqi Fan
Hu Xu
Huiyu Wang
Cihang Xie
Alan Yuille
Christoph Feichtenhofer
DiffM
SyDa
36
49
0
06 Apr 2023
Synthesizing Anyone, Anywhere, in Any Pose
Håkon Hukkelås
Frank Lindseth
34
4
0
06 Apr 2023
MemeFier: Dual-stage Modality Fusion for Image Meme Classification
C. Koutlis
Manos Schinas
Symeon Papadopoulos
21
12
0
06 Apr 2023
Learning Neural Eigenfunctions for Unsupervised Semantic Segmentation
Zhijie Deng
Yucen Luo
31
6
0
06 Apr 2023
Uncurated Image-Text Datasets: Shedding Light on Demographic Bias
Noa Garcia
Yusuke Hirota
Yankun Wu
Yuta Nakashima
EGVM
51
52
0
06 Apr 2023
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
155
6,901
0
05 Apr 2023
VicTR: Video-conditioned Text Representations for Activity Recognition
Kumara Kahatapitiya
Anurag Arnab
Arsha Nagrani
Michael S. Ryoo
47
20
0
05 Apr 2023
Detecting and Grounding Multi-Modal Media Manipulation
Rui Shao
Tianxing Wu
Ziwei Liu
49
60
0
05 Apr 2023
ERRA: An Embodied Representation and Reasoning Architecture for Long-horizon Language-conditioned Manipulation Tasks
Chao Zhao
Shuai Yuan
Chunli Jiang
Junhao Cai
Hongyu Yu
M. Y. Wang
Qifeng Chen
LM&Ro
37
14
0
05 Apr 2023
Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
Osman Tursun
Simon Denman
Sridha Sridharan
Clinton Fookes
ViT
VLM
34
6
0
05 Apr 2023
A Diffusion-based Method for Multi-turn Compositional Image Generation
Chao Wang
DiffM
38
3
0
05 Apr 2023
I2I: Initializing Adapters with Improvised Knowledge
Tejas Srinivasan
Furong Jia
Mohammad Rostami
Jesse Thomason
CLL
37
6
0
04 Apr 2023
Revisiting the Evaluation of Image Synthesis with GANs
Mengping Yang
Ceyuan Yang
Yichi Zhang
Qingyan Bai
Yujun Shen
Bo Dai
EGVM
46
7
0
04 Apr 2023
ERM++: An Improved Baseline for Domain Generalization
Piotr Teterwak
Kuniaki Saito
Theodoros Tsiligkaridis
Kate Saenko
Bryan A. Plummer
OOD
51
9
0
04 Apr 2023
PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion
Gwanghyun Kim
Jinhyun Jang
S. Chun
DiffM
39
13
0
04 Apr 2023