2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Github (29177★)
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 1,722 papers shown
Title
UniFine: A Unified and Fine-grained Approach for Zero-shot Vision-Language Understanding
Rui Sun
Zhecan Wang
Haoxuan You
Noel Codella
Kai-Wei Chang
Shih-Fu Chang
CLIP
100
4
0
03 Jul 2023
Counting Guidance for High Fidelity Text-to-Image Synthesis
Wonjune Kang
Kevin Galim
H. Koo
Nam Ik Cho
DiffM
114
10
0
30 Jun 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
AIFin
AI4CE
195
98
0
27 Jun 2023
Quilt-1M: One Million Image-Text Pairs for Histopathology
Wisdom O. Ikezogwo
M. S. Seyfioglu
Fatemeh Ghezloo
Dylan Stefan Chan Geva
Fatwir Sheikh Mohammed
Pavan Kumar Anand
Ranjay Krishna
Linda G. Shapiro
CLIP
VLM
307
125
0
20 Jun 2023
RedMotion: Motion Prediction via Redundancy Reduction
Royden Wagner
Omer Sahin Tas
Marvin Klemp
Carlos Fernandez Lopez
Christoph Stiller
154
8
0
19 Jun 2023
Investigating Prompting Techniques for Zero- and Few-Shot Visual Question Answering
Rabiul Awal
Le Zhang
Aishwarya Agrawal
LRM
108
13
0
16 Jun 2023
Variational Positive-incentive Noise: How Noise Benefits Models
Hongyuan Zhang
Si-Ying Huang
Yubin Guo
Xuelong Li
68
9
0
13 Jun 2023
Referring Camouflaged Object Detection
Xuying Zhang
Bo Yin
Zheng Lin
Qibin Hou
Deng-Ping Fan
Ming-Ming Cheng
134
18
0
13 Jun 2023
Instruct-ReID: A Multi-purpose Person Re-identification Task with Instructions
Weizhen He
Yihe Deng
Shixiang Tang
Qihao Chen
Qingsong Xie
...
Feng Zhu
Rui Zhao
Wanli Ouyang
Donglian Qi
Yunfeng Yan
117
24
0
13 Jun 2023
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi
Roberto Amoroso
Marcella Cornia
Lorenzo Baraldi
Andrea Pilzer
Rita Cucchiara
137
2
0
12 Jun 2023
Valley: Video Assistant with Large Language model Enhanced abilitY
Ruipu Luo
Ziwang Zhao
Min Yang
Junwei Dong
Da Li
Pengcheng Lu
Tao Wang
Linmei Hu
Ming-Hui Qiu
MLLM
123
209
0
12 Jun 2023
Transferring Foundation Models for Generalizable Robotic Manipulation
Jiange Yang
Wenhui Tan
Chuhao Jin
Keling Yao
Bei Liu
Jianlong Fu
Ruihua Song
Gangshan Wu
Limin Wang
LM&Ro
132
9
0
09 Jun 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
De-Chuan Zhan
Ziwei Liu
VLM
CLL
134
44
0
30 May 2023
On the Importance of Backbone to the Adversarial Robustness of Object Detectors
Xiao-Li Li
Hang Chen
Xiaolin Hu
AAML
120
4
0
27 May 2023
Differentially Private Synthetic Data via Foundation Model APIs 1: Images
Zinan Lin
Sivakanth Gopi
Janardhan Kulkarni
Harsha Nori
Sergey Yekhanin
139
44
0
24 May 2023
Generalizable Synthetic Image Detection via Language-guided Contrastive Learning
Haiwei Wu
Jiantao Zhou
Shile Zhang
198
30
0
23 May 2023
Tomography of Quantum States from Structured Measurements via quantum-aware transformer
Hailan Ma
Zhenhong Sun
Daoyi Dong
Chunlin Chen
H. Rabitz
81
4
0
09 May 2023
LASER: A Neuro-Symbolic Framework for Learning Spatial-Temporal Scene Graphs with Weak Supervision
Jiani Huang
Ziyang Li
Mayur Naik
Ser-Nam Lim
153
5
0
15 Apr 2023
Expressive Text-to-Image Generation with Rich Text
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
137
82
0
13 Apr 2023
VidStyleODE: Disentangled Video Editing via StyleGAN and NeuralODEs
Moayed Haji-Ali
Andrew Bond
Tolga Birdal
Duygu Ceylan
Levent Karacan
Erkut Erdem
Aykut Erdem
VGen
DiffM
207
2
0
12 Apr 2023
Discriminative Class Tokens for Text-to-Image Diffusion Models
Idan Schwartz
Vésteinn Snaebjarnarson
Hila Chefer
Ryan Cotterell
Serge Belongie
Lior Wolf
Sagie Benaim
81
10
0
30 Mar 2023
Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation
Julio Silva-Rodríguez
Jose Dolz
Ismail Ben Ayed
191
14
0
29 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
165
31
0
26 Mar 2023
Location-Free Scene Graph Generation
Ege Özsoy
Felix Holm
Tobias Czempiel
Benjamin Busam
Nassir Navab
114
4
0
20 Mar 2023
Deep Learning for Cross-Domain Few-Shot Visual Recognition: A Survey
Huali Xu
Shuaifeng Zhi
Shuzhou Sun
Vishal M. Patel
Li Liu
128
14
0
15 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
176
11
0
03 Mar 2023
A Text-guided Protein Design Framework
Shengchao Liu
Yanjing Li
Zhuoxinran Li
A. Gitter
Yutao Zhu
...
Arvind Ramanathan
Chaowei Xiao
Jian Tang
Hongyu Guo
Anima Anandkumar
128
70
0
09 Feb 2023
DDS: Decoupled Dynamic Scene-Graph Generation Network
A S M Iftekhar
Raphael Ruschel
Satish Kumar
Suya You
B. S. Manjunath
107
2
0
18 Jan 2023
MVTN: Learning Multi-View Transformations for 3D Understanding
Abdullah Hamdi
Faisal AlZahrani
Silvio Giancola
Guohao Li
3DV
3DPC
132
6
0
27 Dec 2022
SINE: SINgle Image Editing with Text-to-Image Diffusion Models
Zhixing Zhang
Ligong Han
Arna Ghosh
Dimitris N. Metaxas
Jian Ren
DiffM
158
160
0
08 Dec 2022
Ham2Pose: Animating Sign Language Notation into Pose Sequences
Rotem Shalev-Arkushin
Amit Moryossef
Ohad Fried
SLR
83
19
0
24 Nov 2022
PART: Pre-trained Authorship Representation Transformer
Javier Huertas-Tato
Álvaro Huertas-García
Alejandro Martín
123
9
0
30 Sep 2022
MOVE: Effective and Harmless Ownership Verification via Embedded External Features
Yiming Li
Linghui Zhu
Xiaojun Jia
Yang Bai
Yong Jiang
Shutao Xia
Xiaochun Cao
Kui Ren
AAML
91
14
0
04 Aug 2022
Efficient Self-supervised Vision Pretraining with Local Masked Reconstruction
Jun Chen
Ming Hu
Boyang Albert Li
Mohamed Elhoseiny
142
37
0
01 Jun 2022
Vision-Language Pre-Training with Triple Contrastive Learning
Jinyu Yang
Jiali Duan
Son N. Tran
Yi Xu
Sampath Chanda
Liqun Chen
Belinda Zeng
Trishul Chilimbi
Junzhou Huang
VLM
110
297
0
21 Feb 2022
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
335
1,053
0
09 Oct 2021
An Information Theory-inspired Strategy for Automatic Network Pruning
Xiawu Zheng
Yuexiao Ma
Teng Xi
Gang Zhang
Errui Ding
Yuchao Li
Jie Chen
Yonghong Tian
Rongrong Ji
181
13
0
19 Aug 2021
Making Pre-trained Language Models Better Few-shot Learners
Tianyu Gao
Adam Fisch
Danqi Chen
404
1,981
0
31 Dec 2020
A Multimodal Framework for the Detection of Hateful Memes
Phillip Lippe
Nithin Holla
Shantanu Chandra
S. Rajamanickam
Georgios Antoniou
Ekaterina Shutova
H. Yannakoudakis
57
74
0
23 Dec 2020
TAP: Text-Aware Pre-training for Text-VQA and Text-Caption
Zhengyuan Yang
Yijuan Lu
Jianfeng Wang
Xi Yin
D. Florêncio
Lijuan Wang
Cha Zhang
Lei Zhang
Jiebo Luo
VLM
99
144
0
08 Dec 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour
Katherine A. Heller
D. Moldovan
Ben Adlam
B. Alipanahi
...
Kellie Webster
Steve Yadlowsky
T. Yun
Xiaohua Zhai
D. Sculley
OffRL
143
688
0
06 Nov 2020
A Sober Look at the Unsupervised Learning of Disentangled Representations and their Evaluation
Francesco Locatello
Stefan Bauer
Mario Lucic
Gunnar Rätsch
Sylvain Gelly
Bernhard Schölkopf
Olivier Bachem
OOD
75
70
0
27 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
684
41,563
0
22 Oct 2020
Contrastive Learning of Medical Visual Representations from Paired Images and Text
Yuhao Zhang
Hang Jiang
Yasuhide Miura
Christopher D. Manning
C. Langlotz
MedIm
150
768
0
02 Oct 2020
ALICE: Active Learning with Contrastive Natural Language Explanations
Weixin Liang
James Zou
Zhou Yu
VLM
91
51
0
22 Sep 2020
Learning Visual Representations with Caption Annotations
Mert Bulent Sariyildiz
J. Perez
Diane Larlus
VLM
SSL
105
161
0
04 Aug 2020
Late Temporal Modeling in 3D CNN Architectures with BERT for Action Recognition
M. E. Kalfaoglu
Sinan Kalkan
A. Aydin Alatan
3DPC
73
142
0
03 Aug 2020
RareAct: A video dataset of unusual interactions
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
VLM
58
25
0
03 Aug 2020
Learning Video Representations from Textual Web Supervision
Jonathan C. Stroud
Zhichao Lu
Chen Sun
Jia Deng
Rahul Sukthankar
Cordelia Schmid
David A. Ross
SSL
91
48
0
29 Jul 2020
Measuring Robustness to Natural Distribution Shifts in Image Classification
Rohan Taori
Achal Dave
Vaishaal Shankar
Nicholas Carlini
Benjamin Recht
Ludwig Schmidt
OOD
126
548
0
01 Jul 2020