Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 9,787 papers shown
Title
Mutual Information Divergence: A Unified Metric for Multimodal Generative Models
Jin-Hwa Kim
Yunji Kim
Jiyoung Lee
Kang Min Yoo
Sang-Woo Lee
EGVM
36
32
0
25 May 2022
The Dialog Must Go On: Improving Visual Dialog via Generative Self-Training
Gi-Cheon Kang
Sungdong Kim
Jin-Hwa Kim
Donghyun Kwak
Byoung-Tak Zhang
32
10
0
25 May 2022
End-to-End Multimodal Fact-Checking and Explanation Generation: A Challenging Dataset and Models
Barry Menglong Yao
Aditya Shah
Lichao Sun
Jin-Hee Cho
Lifu Huang
MLLM
LRM
46
78
0
25 May 2022
ATTEMPT: Parameter-Efficient Multi-task Tuning via Attentional Mixtures of Soft Prompts
Akari Asai
Mohammadreza Salehi
Matthew E. Peters
Hannaneh Hajishirzi
130
100
0
24 May 2022
M6-Fashion: High-Fidelity Multi-modal Image Generation and Editing
Zhikang Li
Huiling Zhou
Shuai Bai
Peike Li
Chang Zhou
Hongxia Yang
34
4
0
24 May 2022
On Advances in Text Generation from Images Beyond Captioning: A Case Study in Self-Rationalization
Shruti Palaskar
Akshita Bhagia
Yonatan Bisk
Florian Metze
A. Black
Ana Marasović
18
4
0
24 May 2022
Utilizing Language-Image Pretraining for Efficient and Robust Bilingual Word Alignment
Tuan Dinh
Jy-yong Sohn
Shashank Rajput
Timothy Ossowski
Yifei Ming
Junjie Hu
Dimitris Papailiopoulos
Kangwook Lee
28
0
0
23 May 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
66
5,778
0
23 May 2022
Decoder Denoising Pretraining for Semantic Segmentation
Emmanuel B. Asiedu
Simon Kornblith
Ting Chen
Niki Parmar
Matthias Minderer
Mohammad Norouzi
AI4CE
196
26
0
23 May 2022
Markedness in Visual Semantic AI
Robert Wolfe
Aylin Caliskan
VLM
27
35
0
23 May 2022
GR-GAN: Gradual Refinement Text-to-image Generation
Bo Yang
Fangxiang Feng
Xiaojie Wang
EGVM
13
7
0
23 May 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
26
38
0
23 May 2022
Covariance Matrix Adaptation MAP-Annealing
Matthew C. Fontaine
Stefanos Nikolaidis
45
25
0
22 May 2022
Visually-Augmented Language Modeling
Weizhi Wang
Li Dong
Hao Cheng
Haoyu Song
Xiaodong Liu
Xifeng Yan
Jianfeng Gao
Furu Wei
VLM
36
18
0
20 May 2022
The developmental trajectory of object recognition robustness: children are like small adults but unlike big deep neural networks
Lukas Huber
Robert Geirhos
Felix Wichmann
54
16
0
20 May 2022
RankGen: Improving Text Generation with Large Ranking Models
Kalpesh Krishna
Yapei Chang
John Wieting
Mohit Iyyer
AIMat
24
68
0
19 May 2022
Voxel-informed Language Grounding
Rodolfo Corona
Shizhan Zhu
Dan Klein
Trevor Darrell
138
11
0
19 May 2022
TransTab: Learning Transferable Tabular Transformers Across Tables
Zifeng Wang
Jimeng Sun
LMTD
36
135
0
19 May 2022
AvatarCLIP: Zero-Shot Text-Driven Generation and Animation of 3D Avatars
Fangzhou Hong
Mingyuan Zhang
Liang Pan
Zhongang Cai
Lei Yang
Ziwei Liu
CLIP
98
79
0
17 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
129
62
0
17 May 2022
Sparse Visual Counterfactual Explanations in Image Space
Valentyn Boreiko
Maximilian Augustin
Francesco Croce
Philipp Berens
Matthias Hein
BDL
CML
30
26
0
16 May 2022
Deep Spectral Methods: A Surprisingly Strong Baseline for Unsupervised Semantic Segmentation and Localization
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
Andrea Vedaldi
30
159
0
16 May 2022
On the Difficulty of Defending Self-Supervised Learning against Model Extraction
Adam Dziedzic
Nikita Dhawan
Muhammad Ahmad Kaleem
Jonas Guan
Nicolas Papernot
MIACV
54
22
0
16 May 2022
Aligning Robot Representations with Humans
Andreea Bobu
Andi Peng
27
0
0
15 May 2022
Breaking with Fixed Set Pathology Recognition through Report-Guided Contrastive Training
C. Seibold
Simon Reiß
M. Sarfraz
Rainer Stiefelhagen
Jens Kleesiek
18
31
0
14 May 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
38
29
0
13 May 2022
A Comprehensive Survey of Few-shot Learning: Evolution, Applications, Challenges, and Opportunities
Yisheng Song
Ting-Yuan Wang
S. Mondal
J. P. Sahoo
SLR
50
344
0
13 May 2022
The Creativity of Text-to-Image Generation
J. Oppenlaender
25
192
0
13 May 2022
PoisonedEncoder: Poisoning the Unlabeled Pre-training Data in Contrastive Learning
Hongbin Liu
Jinyuan Jia
Neil Zhenqiang Gong
25
34
0
13 May 2022
What's in a Caption? Dataset-Specific Linguistic Diversity and Its Effect on Visual Description Models and Metrics
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
Bryan Seybold
John F. Canny
28
6
0
12 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
34
307
0
12 May 2022
The Mechanism of Prediction Head in Non-contrastive Self-supervised Learning
Zixin Wen
Yuanzhi Li
SSL
27
34
0
12 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
29
37
0
12 May 2022
Deep Learning and Synthetic Media
Raphaël Millière
23
18
0
11 May 2022
Learning to Retrieve Videos by Asking Questions
Avinash Madasu
Junier Oliva
Gedas Bertasius
VGen
32
16
0
11 May 2022
DISARM: Detecting the Victims Targeted by Harmful Memes
Shivam Sharma
Md. Shad Akhtar
Preslav Nakov
Tanmoy Chakraborty
13
29
0
11 May 2022
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
34
33
0
10 May 2022
Transformer-based Cross-Modal Recipe Embeddings with Large Batch Training
Jing Yang
Junwen Chen
Keiji Yanai
ViT
21
5
0
10 May 2022
Weakly-supervised segmentation of referring expressions
Robin Strudel
Ivan Laptev
Cordelia Schmid
22
21
0
10 May 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
Vijay Vasudevan
Benjamin Caine
Raphael Gontijo-Lopes
Sara Fridovich-Keil
Rebecca Roelofs
VLM
UQCV
46
57
0
09 May 2022
ConvMAE: Masked Convolution Meets Masked Autoencoders
Peng Gao
Teli Ma
Hongsheng Li
Ziyi Lin
Jifeng Dai
Yu Qiao
ViT
19
121
0
08 May 2022
Generating Representative Samples for Few-Shot Classification
Jingyi Xu
Hieu M. Le
VLM
21
61
0
05 May 2022
Language Models Can See: Plugging Visual Controls in Text Generation
Yixuan Su
Tian Lan
Yahui Liu
Fangyu Liu
Dani Yogatama
Yan Wang
Lingpeng Kong
Nigel Collier
VLM
MLLM
46
97
0
05 May 2022
Relational Representation Learning in Visually-Rich Documents
Xin Li
Yan Zheng
Yiqing Hu
H. Cao
Yunfei Wu
Deqiang Jiang
Yinsong Liu
Bo Ren
18
12
0
05 May 2022
P3IV: Probabilistic Procedure Planning from Instructional Videos with Weak Supervision
Henghui Zhao
Isma Hadji
Nikita Dvornik
Konstantinos G. Derpanis
Richard P. Wildes
Allan D. Jepson
31
45
0
04 May 2022
Explain to Not Forget: Defending Against Catastrophic Forgetting with XAI
Sami Ede
Serop Baghdadlian
Leander Weber
A. Nguyen
Dario Zanca
Wojciech Samek
Sebastian Lapuschkin
CLL
27
6
0
04 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
79
1,256
0
04 May 2022
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
32
70
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
Dongdong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
Comparison of CoModGANs, LaMa and GLIDE for Art Inpainting- Completing M.C Escher's Print Gallery
Lucia Cipolina-Kun
Simone Caenazzo
Gaston Mazzei
19
2
0
03 May 2022
Previous
1
2
3
...
185
186
187
...
194
195
196
Next