2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,939 papers shown
Ensembling Off-the-shelf Models for GAN Training
Nupur Kumari
Richard Y. Zhang
Eli Shechtman
Jun-Yan Zhu
57
86
0
16 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
57
565
0
16 Dec 2021
CODER: An efficient framework for improving retrieval through COntextual Document Embedding Reranking
George Zerveas
Navid Rekabsaz
Daniel Cohen
Carsten Eickhoff
44
8
0
16 Dec 2021
Lacuna Reconstruction: Self-supervised Pre-training for Low-Resource Historical Document Transcription
Nikolai Vogler
J. Allen
M. Miller
Taylor Berg-Kirkpatrick
37
5
0
16 Dec 2021
TransZero++: Cross Attribute-Guided Transformer for Zero-Shot Learning
Shiming Chen
Zi-Quan Hong
Wenjin Hou
Guosen Xie
Yibing Song
Jian-jun Zhao
Xinge You
Shuicheng Yan
Ling Shao
ViT
35
44
0
16 Dec 2021
Decoupling Zero-Shot Semantic Segmentation
Jian Ding
Nan Xue
Guisong Xia
Dengxin Dai
VLM
61
190
0
15 Dec 2021
Out-of-Distribution Detection Without Class Labels
Niv Cohen
Ron Abutbul
Yedid Hoshen
OODD
32
11
0
14 Dec 2021
CLIP-Lite: Information Efficient Visual Representation Learning with Language Supervision
A. Shrivastava
Ramprasaath R. Selvaraju
Nikhil Naik
Vicente Ordonez
VLM
CLIP
35
6
0
14 Dec 2021
VL-Adapter: Parameter-Efficient Transfer Learning for Vision-and-Language Tasks
Yi-Lin Sung
Jaemin Cho
Joey Tianyi Zhou
VLM
VPVLM
54
345
0
13 Dec 2021
SAC-GAN: Structure-Aware Image Composition
Hang Zhou
Rui Ma
Ling-Xiao Zhang
Lina Gao
Ali Mahdavi-Amiri
Haotong Zhang
GAN
40
7
0
13 Dec 2021
Shaping Visual Representations with Attributes for Few-Shot Recognition
Haoxing Chen
Huaxiong Li
Yaohui Li
Chunlin Chen
50
7
0
13 Dec 2021
PartGlot: Learning Shape Part Segmentation from Language Reference Games
Juil Koo
Ian Huang
Panos Achlioptas
Leonidas Guibas
Minhyuk Sung
3DPC
57
30
0
13 Dec 2021
Show, Write, and Retrieve: Entity-aware Article Generation and Retrieval
Zhongping Zhang
Yiwen Gu
Bryan A. Plummer
53
2
0
11 Dec 2021
More Control for Free! Image Synthesis with Semantic Diffusion Guidance
Xihui Liu
Dong Huk Park
S. Azadi
Gong Zhang
Arman Chopikyan
Yuxiao Hu
Humphrey Shi
Anna Rohrbach
Trevor Darrell
DiffM
53
253
0
10 Dec 2021
Unified Multimodal Pre-training and Prompt-based Tuning for Vision-Language Understanding and Generation
Tianyi Liu
Zuxuan Wu
Wenhan Xiong
Jingjing Chen
Yu-Gang Jiang
VLM
MLLM
32
10
0
10 Dec 2021
Multimodal Interactions Using Pretrained Unimodal Models for SIMMC 2.0
Joosung Lee
Kijong Han
55
6
0
10 Dec 2021
CLIP2StyleGAN: Unsupervised Extraction of StyleGAN Edit Directions
Rameen Abdal
Peihao Zhu
John C. Femiani
Niloy J. Mitra
Peter Wonka
CLIP
44
104
0
09 Dec 2021
HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei
Dongdong Chen
Wenbo Zhou
Jing Liao
Zhentao Tan
Lu Yuan
Weiming Zhang
Nenghai Yu
CLIP
40
108
0
09 Dec 2021
CLIP-NeRF: Text-and-Image Driven Manipulation of Neural Radiance Fields
Can Wang
Menglei Chai
Mingming He
Dongdong Chen
Jing Liao
CLIP
86
381
0
09 Dec 2021
Extending the WILDS Benchmark for Unsupervised Adaptation
Shiori Sagawa
Pang Wei Koh
Tony Lee
Irena Gao
Sang Michael Xie
...
Kate Saenko
Tatsunori Hashimoto
Sergey Levine
Chelsea Finn
Percy Liang
OOD
28
98
0
09 Dec 2021
FLAVA: A Foundational Language And Vision Alignment Model
Amanpreet Singh
Ronghang Hu
Vedanuj Goswami
Guillaume Couairon
Wojciech Galuba
Marcus Rohrbach
Douwe Kiela
CLIP
VLM
42
695
0
08 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
51
129
0
08 Dec 2021
Grounded Language-Image Pre-training
Liunian Harold Li
Pengchuan Zhang
Haotian Zhang
Jianwei Yang
Chunyuan Li
...
Lu Yuan
Lei Zhang
Lei Li
Kai-Wei Chang
Jianfeng Gao
ObjD
VLM
36
1,036
0
07 Dec 2021
A Generic Approach for Enhancing GANs by Regularized Latent Optimization
Yufan Zhou
Chunyuan Li
Changyou Chen
Jinhui Xu
35
0
0
07 Dec 2021
CALVIN: A Benchmark for Language-Conditioned Policy Learning for Long-Horizon Robot Manipulation Tasks
Oier Mees
Lukás Hermann
Erick Rosete-Beas
Wolfram Burgard
LM&Ro
41
245
0
06 Dec 2021
Text2Mesh: Text-Driven Neural Stylization for Meshes
O. Michel
Roi Bar-On
Richard Liu
Sagie Benaim
Rana Hanocka
CLIP
AI4CE
228
356
0
06 Dec 2021
Semantic Segmentation In-the-Wild Without Seeing Any Segmentation Examples
Nir Zabari
Yedid Hoshen
VLM
38
26
0
06 Dec 2021
Embedding Arithmetic of Multimodal Queries for Image Retrieval
Guillaume Couairon
Matthieu Cord
Matthijs Douze
Holger Schwenk
43
24
0
06 Dec 2021
General Facial Representation Learning in a Visual-Linguistic Manner
Yinglin Zheng
Hao Yang
Ting Zhang
Jianmin Bao
Dongdong Chen
Yangyu Huang
Lu Yuan
Dong Chen
Ming Zeng
Fang Wen
CVBM
148
166
0
06 Dec 2021
Joint Learning of Localized Representations from Medical Images and Reports
Philipp Muller
Georgios Kaissis
Cong Zou
Daniel Munich
140
81
0
06 Dec 2021
Forward Compatible Training for Large-Scale Embedding Retrieval Systems
Vivek Ramanujan
Pavan Kumar Anasosalu Vasu
Ali Farhadi
Oncel Tuzel
Hadi Pouransari
VLM
41
17
0
06 Dec 2021
Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification
Zhenhailong Wang
Heng Ji
92
74
0
05 Dec 2021
VT-CLIP: Enhancing Vision-Language Models with Visual-guided Texts
Longtian Qiu
Renrui Zhang
Ziyu Guo
Wei Zhang
Zilu Guo
Ziyao Zeng
Guangnan Zhang
VLM
CLIP
36
45
0
04 Dec 2021
LAVT: Language-Aware Vision Transformer for Referring Image Segmentation
Zhao Yang
Jiaqi Wang
Yansong Tang
Kai-xiang Chen
Hengshuang Zhao
Philip Torr
157
315
0
04 Dec 2021
SemanticStyleGAN: Learning Compositional Generative Priors for Controllable Image Synthesis and Editing
Yichun Shi
Xiao Yang
Yangyue Wan
Xiaohui Shen
GAN
162
85
0
04 Dec 2021
FuseDream: Training-Free Text-to-Image Generation with Improved CLIP+GAN Space Optimization
Xingchao Liu
Chengyue Gong
Lemeng Wu
Shujian Zhang
Haoran Su
Qiang Liu
CLIP
40
90
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
56
129
0
02 Dec 2021
DenseCLIP: Language-Guided Dense Prediction with Context-Aware Prompting
Yongming Rao
Wenliang Zhao
Guangyi Chen
Yansong Tang
Zheng Zhu
Guan Huang
Jie Zhou
Jiwen Lu
VLM
CLIP
119
559
0
02 Dec 2021
Video-Text Pre-training with Learned Regions
Rui Yan
Mike Zheng Shou
Yixiao Ge
Alex Jinpeng Wang
Xudong Lin
Guanyu Cai
Jinhui Tang
49
23
0
02 Dec 2021
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
90
463
0
02 Dec 2021
Editing a classifier by rewriting its prediction rules
Shibani Santurkar
Dimitris Tsipras
Mahalaxmi Elango
David Bau
Antonio Torralba
Aleksander Madry
KELM
199
89
0
02 Dec 2021
Object-Centric Unsupervised Image Captioning
Zihang Meng
David Yang
Xuefei Cao
Ashish Shah
Ser-Nam Lim
OCL
VLM
32
12
0
02 Dec 2021
The Augmented Image Prior: Distilling 1000 Classes by Extrapolating from a Single Image
Yuki M. Asano
Aaqib Saeed
50
7
0
01 Dec 2021
RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs
Michael Niemeyer
Jonathan T. Barron
B. Mildenhall
Mehdi S. M. Sajjadi
Andreas Geiger
Noha Radwan
53
587
0
01 Dec 2021
Object-aware Video-language Pre-training for Retrieval
Alex Jinpeng Wang
Yixiao Ge
Guanyu Cai
Rui Yan
Xudong Lin
Ying Shan
Xiaohu Qie
Mike Zheng Shou
ViT
VLM
39
79
0
01 Dec 2021
MAD: A Scalable Dataset for Language Grounding in Videos from Movie Audio Descriptions
Mattia Soldan
Alejandro Pardo
Juan Carlos León Alcázar
Fabian Caba Heilbron
Chen Zhao
Silvio Giancola
Guohao Li
VGen
67
98
0
01 Dec 2021
CLIPstyler: Image Style Transfer with a Single Text Condition
Gihyun Kwon
Jong Chul Ye
VLM
CLIP
49
241
0
01 Dec 2021
Open-Domain, Content-based, Multi-modal Fact-checking of Out-of-Context Images via Online Resources
Sahar Abdelnabi
Rakibul Hasan
Mario Fritz
40
76
0
30 Nov 2021
Task2Sim : Towards Effective Pre-training and Transfer from Synthetic Data
Samarth Mishra
Yikang Shen
Cheng Perng Phoo
Chun-Fu Chen
Leonid Karlinsky
Kate Saenko
Venkatesh Saligrama
Rogerio Feris
31
37
0
30 Nov 2021
HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
Yuval Alaluf
Omer Tov
Ron Mokady
Rinon Gal
Amit H. Bermano
62
267
0
30 Nov 2021