Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 9,974 papers shown
Title
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
22
21
0
10 May 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
Vijay Vasudevan
Benjamin Caine
Raphael Gontijo-Lopes
Sara Fridovich-Keil
Rebecca Roelofs
VLM
UQCV
46
57
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
Generating Representative Samples for Few-Shot Classification
Jingyi Xu
Hieu M. Le
VLM
21
61
0
05 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
46
97
0
05 May 2022
Relational Representation Learning in Visually-Rich Documents
Xin Li
Yan Zheng
Yiqing Hu
H. Cao
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Bo Ren
18
12
0
05 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
31
45
0
04 May 2022
Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI
Sami Ede
Serop Baghdadlian
Leander Weber
A. Nguyen
Dario Zanca
Wojciech Samek
Sebastian Lapuschkin
CLL
27
6
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
79
1,262
0
04 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
32
70
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Comparison of CoModGANs, LaMa and GLIDE for Art Inpainting- Completing M.C Escher's Print Gallery
Lucia Cipolina-Kun
Simone Caenazzo
Gaston Mazzei
19
2
0
03 May 2022
Data Determines Distributional Robustness in Contrastive Language Image Pre-training (CLIP)
Alex Fang
Gabriel Ilharco
Mitchell Wortsman
Yu Wan
Vaishaal Shankar
Achal Dave
Ludwig Schmidt
VLM
OOD
33
139
0
03 May 2022
Answer-Me: Multi-Task Open-Vocabulary Visual Question Answering
A. Piergiovanni
Wei Li
Weicheng Kuo
M. Saffar
Fred Bertsch
A. Angelova
17
16
0
02 May 2022
Seeding Diversity into AI Art
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
33
4
0
02 May 2022
Visual Spatial Reasoning
Fangyu Liu
Guy Edward Toh Emerson
Nigel Collier
ReLM
42
159
0
30 Apr 2022
CLIP-Art: Contrastive Pre-training for Fine-Grained Art Classification
Marcos V. Conde
Kerem Turgutlu
CLIP
VLM
33
94
0
29 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,349
0
29 Apr 2022
PyramidCLIP: Hierarchical Feature Alignment for Vision-language Model Pretraining
Yuting Gao
Jinfeng Liu
Zihan Xu
Jinchao Zhang
Ke Li
Rongrong Ji
Chunhua Shen
VLM
CLIP
29
100
0
29 Apr 2022
Leaner and Faster: Two-Stage Model Compression for Lightweight Text-Image Retrieval
Siyu Ren
Kenny Q. Zhu
VLM
30
7
0
29 Apr 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song
Jianqiang Wan
Zhibo Yang
Jun Tang
Wenqing Cheng
Xiang Bai
Cong Yao
VLM
44
24
0
29 Apr 2022
CogView2: Faster and Better Text-to-Image Generation via Hierarchical Transformers
Ming Ding
Wendi Zheng
Wenyi Hong
Jie Tang
VLM
41
322
0
28 Apr 2022
MILES: Visual BERT Pre-training with Injected Language Semantics for Video-text Retrieval
Yuying Ge
Yixiao Ge
Xihui Liu
Alex Jinpeng Wang
Jianping Wu
Ying Shan
Xiaohu Qie
Ping Luo
VLM
18
44
0
26 Apr 2022
Causal Transportability for Visual Recognition
Chengzhi Mao
K. Xia
James Wang
Hongya Wang
Junfeng Yang
Elias Bareinboim
Carl Vondrick
CML
OOD
BDL
30
35
0
26 Apr 2022
Contrastive Language-Action Pre-training for Temporal Localization
Mengmeng Xu
Erhan Gundogdu
⋆⋆ Maksim
Guohao Li
M. Donoser
Loris Bazzani
38
27
0
26 Apr 2022
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
40
373
0
25 Apr 2022
Progressive Learning for Image Retrieval with Hybrid-Modality Queries
Yida Zhao
Yuqing Song
Qin Jin
8
29
0
24 Apr 2022
Can Foundation Models Perform Zero-Shot Task Specification For Robot Manipulation?
Yuchen Cui
S. Niekum
Abhi Gupta
Vikash Kumar
Aravind Rajeswaran
LM&Ro
30
74
0
23 Apr 2022
Training and challenging models for text-guided fashion image retrieval
Eric Dodds
Jack Culpepper
Gaurav Srivastava
18
8
0
23 Apr 2022
MCSE: Multimodal Contrastive Learning of Sentence Embeddings
Miaoran Zhang
Marius Mosbach
David Ifeoluwa Adelani
Michael A. Hedderich
Dietrich Klakow
33
34
0
22 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang
Noel Codella
Yen-Chun Chen
Luowei Zhou
Xiyang Dai
...
Jianwei Yang
Haoxuan You
Kai-Wei Chang
Shih-Fu Chang
Lu Yuan
VLM
OffRL
31
22
0
22 Apr 2022
A Taxonomy of Prompt Modifiers for Text-To-Image Generation
J. Oppenlaender
28
102
0
20 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
32
145
0
19 Apr 2022
VQGAN-CLIP: Open Domain Image Generation and Editing with Natural Language Guidance
Katherine Crowson
Stella Biderman
Daniel Kornis
Dashiell Stander
Eric Hallahan
Louis Castricato
Edward Raff
CLIP
74
369
0
18 Apr 2022
Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey
Kento Nozawa
Issei Sato
AI4TS
21
4
0
18 Apr 2022
Simultaneous Multiple-Prompt Guided Generation Using Differentiable Optimal Transport
Yingtao Tian
Marco Cuturi
David R Ha
DiffM
OT
40
1
0
18 Apr 2022
StyleT2F: Generating Human Faces from Textual Description Using StyleGAN2
Mohamed Shawky Sabae
Mohamed Ahmed Dardir
Remonda Talaat Eskarous
M. Ebbed
CVBM
35
2
0
17 Apr 2022
Language-Grounded Indoor 3D Semantic Segmentation in the Wild
Dávid Rozenberszki
Or Litany
Angela Dai
3DV
VLM
23
183
0
16 Apr 2022
COTS: Collaborative Two-Stream Vision-Language Pre-Training Model for Cross-Modal Retrieval
Haoyu Lu
Nanyi Fei
Yuqi Huo
Yizhao Gao
Zhiwu Lu
Jiaxin Wen
CLIP
VLM
27
55
0
15 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
36
53
0
15 Apr 2022
Vision-and-Language Pretrained Models: A Survey
Siqu Long
Feiqi Cao
S. Han
Haiqing Yang
VLM
33
63
0
15 Apr 2022
WikiDiverse: A Multimodal Entity Linking Dataset with Diversified Contextual Topics and Entity Types
Xuwu Wang
Junfeng Tian
Min Gui
Zhixu Li
Rui-cang Wang
Ming Yan
Lihan Chen
Yanghua Xiao
VGen
24
48
0
13 Apr 2022
What Matters in Language Conditioned Robotic Imitation Learning over Unstructured Data
Oier Mees
Lukás Hermann
Wolfram Burgard
LM&Ro
30
149
0
13 Apr 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
101
6,660
0
13 Apr 2022
ReCLIP: A Strong Zero-Shot Baseline for Referring Expression Comprehension
Sanjay Subramanian
William Merrill
Trevor Darrell
Matt Gardner
Sameer Singh
Anna Rohrbach
ObjD
33
125
0
12 Apr 2022
MuCoT: Multilingual Contrastive Training for Question-Answering in Low-resource Languages
Gokul Karthik Kumar
Abhishek Singh Gehlot
Sahal Shaji Mullappilly
Karthik Nandakumar
34
13
0
12 Apr 2022
Text-Driven Separation of Arbitrary Sounds
Kevin Kilgour
Beat Gfeller
Qingqing Huang
A. Jansen
Scott Wisdom
Marco Tagliasacchi
30
30
0
12 Apr 2022
CLMLF:A Contrastive Learning and Multi-Layer Fusion Method for Multimodal Sentiment Detection
Zhen Li
Bing Xu
Conghui Zhu
T. Zhao
40
71
0
12 Apr 2022
Are Multimodal Transformers Robust to Missing Modality?
Mengmeng Ma
Jian Ren
Long Zhao
Davide Testuggine
Xi Peng
ViT
33
148
0
12 Apr 2022
XMP-Font: Self-Supervised Cross-Modality Pre-training for Few-Shot Font Generation
Wei Liu
Fangyue Liu
Fei Din
Qian He
Zili Yi
VLM
21
36
0
11 Apr 2022
Previous
1
2
3
...
189
190
191
...
198
199
200
Next