ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2103.00020
  4. Cited By
Learning Transferable Visual Models From Natural Language Supervision

Learning Transferable Visual Models From Natural Language Supervision

26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Learning Transferable Visual Models From Natural Language Supervision"

50 / 10,312 papers shown
Title
Soundify: Matching Sound Effects to Video
Soundify: Matching Sound Effects to Video
David Chuan-En Lin
Anastasis Germanidis
Cristobal Valenzuela
Yining Shi
Nikolas Martelaro
30
16
0
17 Dec 2021
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Align and Prompt: Video-and-Language Pre-training with Entity Prompts
Dongxu Li
Junnan Li
Hongdong Li
Juan Carlos Niebles
Guosheng Lin
28
191
0
17 Dec 2021
Contrastive Vision-Language Pre-training with Limited Resources
Contrastive Vision-Language Pre-training with Limited Resources
Quan Cui
Boyan Zhou
Yu Guo
Weidong Yin
Hao Wu
Osamu Yoshie
Yubo Chen
VLM
CLIP
19
33
0
17 Dec 2021
Ensembling Off-the-shelf Models for GAN Training
Ensembling Off-the-shelf Models for GAN Training
Nupur Kumari
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
39
86
0
16 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
40
557
0
16 Dec 2021
CODER: An efficient framework for improving retrieval through COntextual
  Document Embedding Reranking
CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking
George Zerveas
Navid Rekabsaz
Daniel Cohen
Carsten Eickhoff
31
8
0
16 Dec 2021
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource
  Historical Document Transcription
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler
J. Allen
M. Miller
Taylor Berg-Kirkpatrick
32
5
0
16 Dec 2021
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen
Zi-Quan Hong
Wenjin Hou
Guosen Xie
Yibing Song
Jian-jun Zhao
Xinge You
Shuicheng Yan
Ling Shao
ViT
19
44
0
16 Dec 2021
Decoupling Zero-Shot Semantic Segmentation
Decoupling Zero-Shot Semantic Segmentation
Jian Ding
Nan Xue
Guisong Xia
Dengxin Dai
VLM
56
190
0
15 Dec 2021
Out-of-Distribution Detection Without Class Labels
Out-of-Distribution Detection Without Class Labels
Niv Cohen
Ron Abutbul
Yedid Hoshen
OODD
29
10
0
14 Dec 2021
CLIP-Lite: Information Efficient Visual Representation Learning with
  Language Supervision
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision
A. Shrivastava
Ramprasaath R. Selvaraju
Nikhil Naik
Vicente Ordonez
VLM
CLIP
30
6
0
14 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for
  Vision-and-Language Tasks
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
VPVLM
37
344
0
13 Dec 2021
SAC-GAN: Structure-Aware Image Composition
SAC-GAN: Structure-Aware Image Composition
Hang Zhou
Rui Ma
Ling-Xiao Zhang
Lina Gao
Ali Mahdavi-Amiri
Haotong Zhang
GAN
40
7
0
13 Dec 2021
Shaping Visual Representations with Attributes for Few-Shot Recognition
Shaping Visual Representations with Attributes for Few-Shot Recognition
Haoxing Chen
Huaxiong Li
Yaohui Li
Chunlin Chen
34
7
0
13 Dec 2021
PartGlot: Learning Shape Part Segmentation from Language Reference Games
PartGlot: Learning Shape Part Segmentation from Language Reference Games
Juil Koo
Ian Huang
Panos Achlioptas
Leonidas J. Guibas
Minhyuk Sung
3DPC
40
28
0
13 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
48
2
0
11 Dec 2021
More Control for Free! Image Synthesis with Semantic Diffusion Guidance
More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Xihui Liu
Dong Huk Park
S. Azadi
Gong Zhang
Arman Chopikyan
Yuxiao Hu
Humphrey Shi
Anna Rohrbach
Trevor Darrell
DiffM
48
252
0
10 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for
  Vision-Language Understanding and Generation
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLM
MLLM
32
10
0
10 Dec 2021
Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0
Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0
Joosung Lee
Kijong Han
45
6
0
10 Dec 2021
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
Rameen Abdal
Peihao Zhu
John C. Femiani
Niloy J. Mitra
Peter Wonka
CLIP
39
103
0
09 Dec 2021
HairCLIP: Design Your Hair by Text and Reference Image
HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei
Dongdong Chen
Wenbo Zhou
Jing Liao
Zhentao Tan
Lu Yuan
Weiming Zhang
Nenghai Yu
CLIP
35
108
0
09 Dec 2021
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang
Menglei Chai
Mingming He
Dongdong Chen
Jing Liao
CLIP
51
379
0
09 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
40
691
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
34
129
0
08 Dec 2021
Grounded Language-Image Pre-training
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
33
1,025
0
07 Dec 2021
A Generic Approach for Enhancing GANs by Regularized Latent Optimization
A Generic Approach for Enhancing GANs by Regularized Latent Optimization
Yufan Zhou
Chunyuan Li
Changyou Chen
Jinhui Xu
27
0
0
07 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
226
353
0
06 Dec 2021
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation
  Examples
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
Nir Zabari
Yedid Hoshen
VLM
35
26
0
06 Dec 2021
Embedding Arithmetic of Multimodal Queries for Image Retrieval
Embedding Arithmetic of Multimodal Queries for Image Retrieval
Guillaume Couairon
Matthieu Cord
Matthijs Douze
Holger Schwenk
40
23
0
06 Dec 2021
General Facial Representation Learning in a Visual-Linguistic Manner
General Facial Representation Learning in a Visual-Linguistic Manner
Yinglin Zheng
Hao Yang
Ting Zhang
Jianmin Bao
Dongdong Chen
Yangyu Huang
Lu Yuan
Dong Chen
Ming Zeng
Fang Wen
CVBM
146
164
0
06 Dec 2021
Joint Learning of Localized Representations from Medical Images and
  Reports
Joint Learning of Localized Representations from Medical Images and Reports
Philipp Muller
Georgios Kaissis
Cong Zou
Daniel Munich
140
81
0
06 Dec 2021
Forward Compatible Training for Large-Scale Embedding Retrieval Systems
Forward Compatible Training for Large-Scale Embedding Retrieval Systems
Vivek Ramanujan
Pavan Kumar Anasosalu Vasu
Ali Farhadi
Oncel Tuzel
Hadi Pouransari
VLM
32
16
0
06 Dec 2021
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot
  Sentiment Classification
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification
Zhenhailong Wang
Heng Ji
92
73
0
05 Dec 2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
VLM
CLIP
30
45
0
04 Dec 2021
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
148
310
0
04 Dec 2021
SemanticStyleGAN: Learning Compositional Generative Priors for
  Controllable Image Synthesis and Editing
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Yichun Shi
Xiao Yang
Yangyue Wan
Xiaohui Shen
GAN
151
83
0
04 Dec 2021
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN
  Space Optimization
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Xingchao Liu
Chengyue Gong
Lemeng Wu
Shujian Zhang
Haoran Su
Qiang Liu
CLIP
35
89
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception
  for Zero-shot and Few-shot Tasks
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
56
129
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
94
553
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
33
23
0
02 Dec 2021
Extract Free Dense Labels from CLIP
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
61
457
0
02 Dec 2021
Editing a classifier by rewriting its prediction rules
Editing a classifier by rewriting its prediction rules
Shibani Santurkar
Dimitris Tsipras
Mahalaxmi Elango
David Bau
Antonio Torralba
Aleksander Madry
KELM
186
89
0
02 Dec 2021
Object-Centric Unsupervised Image Captioning
Object-Centric Unsupervised Image Captioning
Zihang Meng
David Yang
Xuefei Cao
Ashish Shah
Ser-Nam Lim
OCL
VLM
27
11
0
02 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from
  a Single Image
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
50
7
0
01 Dec 2021
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from
  Sparse Inputs
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Michael Niemeyer
Jonathan T. Barron
B. Mildenhall
Mehdi S. M. Sajjadi
Andreas Geiger
Noha Radwan
51
581
0
01 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
25
79
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie
  Audio Descriptions
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Guohao Li
VGen
64
96
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
44
240
0
01 Dec 2021
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context
  Images via Online Resources
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi
Rakibul Hasan
Mario Fritz
26
74
0
30 Nov 2021
Task2Sim : Towards Effective Pre-training and Transfer from Synthetic
  Data
Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data
Samarth Mishra
Yikang Shen
Cheng Perng Phoo
Chun-Fu Chen
Leonid Karlinsky
Kate Saenko
Venkatesh Saligrama
Rogerio Feris
26
35
0
30 Nov 2021
Previous
123...200201202...205206207
Next