Learning Transferable Visual Models From Natural Language Supervision
arXiv:2103.00020
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,312 papers shown
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
33
25
0
05 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng-Wei Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
56
44
0
05 Jan 2023
ANNA: Abstractive Text-to-Image Synthesis with Filtered News Captions
Aashish Anantha Ramakrishnan
Sharon X. Huang
Dongwon Lee
29
5
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
92
36
0
05 Jan 2023
Learning by Sorting: Self-supervised Learning with Group Ordering Constraints
Nina Shvetsova
Felix Petersen
Anna Kukleva
Bernt Schiele
Hilde Kuehne
SSL
49
13
0
05 Jan 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
41
7
0
05 Jan 2023
SPRING: Situated Conversation Agent Pretrained with Multimodal Questions from Incremental Layout Graph
Yuxing Long
Binyuan Hui
Fulong Ye
Yanyang Li
Zhuoxin Han
Caixia Yuan
Yongbin Li
Xiaojie Wang
LLMAG
38
7
0
05 Jan 2023
Semi-MAE: Masked Autoencoders for Semi-supervised Vision Transformers
Haojie Yu
Kangnian Zhao
Xiaoming Xu
ViT
33
1
0
04 Jan 2023
Explainability and Robustness of Deep Visual Classification Models
Jindong Gu
AAML
52
2
0
03 Jan 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Sucheng Ren
Fangyun Wei
Zheng-Wei Zhang
Han Hu
44
35
0
03 Jan 2023
Understanding Imbalanced Semantic Segmentation Through Neural Collapse
Zhisheng Zhong
Jiequan Cui
Yibo Yang
Xiaoyang Wu
Xiaojuan Qi
Xinming Zhang
Jiaya Jia
143
46
0
03 Jan 2023
PanopticPartFormer++: A Unified and Decoupled View for Panoptic Part Segmentation
Xiangtai Li
Shilin Xu
Yibo Yang
Haobo Yuan
Guangliang Cheng
Yu Tong
Zhouchen Lin
Ming-Hsuan Yang
Dacheng Tao
ViT
54
21
0
03 Jan 2023
Class-Continuous Conditional Generative Neural Radiance Field
Jiwook Kim
Minhyeok Lee
35
4
0
03 Jan 2023
Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation
Jianzong Wu
Xiangtai Li
Henghui Ding
Xia Li
Guangliang Cheng
Yu Tong
Chen Change Loy
VLM
100
31
0
02 Jan 2023
NaQ: Leveraging Narrations as Queries to Supervise Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
119
39
0
02 Jan 2023
P3DC-Shot: Prior-Driven Discrete Data Calibration for Nearest-Neighbor Few-Shot Classification
Shuang Wang
Rui Ma
Tieru Wu
Yang Cao
31
5
0
02 Jan 2023
Muse: Text-To-Image Generation via Masked Generative Transformers
Huiwen Chang
Han Zhang
Jarred Barber
AJ Maschinot
José Lezama
...
Kevin Patrick Murphy
William T. Freeman
Michael Rubinstein
Yuanzhen Li
Dilip Krishnan
DiffM
197
526
0
02 Jan 2023
Deep Learning Technique for Human Parsing: A Survey and Outlook
Lu Yang
Wenhe Jia
Shane Li
Q. Song
ViT
61
17
0
01 Jan 2023
DiRaC-I: Identifying Diverse and Rare Training Classes for Zero-Shot Learning
Sandipan Sarma
Arijit Sur
VLM
29
1
0
31 Dec 2022
Unlearnable Clusters: Towards Label-agnostic Unlearnable Examples
Jiaming Zhang
Xingjun Ma
Qiaomin Yi
Jitao Sang
Yugang Jiang
Yaowei Wang
Changsheng Xu
21
24
0
31 Dec 2022
Stroke-based Rendering: From Heuristics to Deep Learning
Florian Nolte
Andrew Melnik
Helge J. Ritter
GAN
43
5
0
30 Dec 2022
HiTeA: Hierarchical Temporal-Aware Video-Language Pre-training
Qinghao Ye
Guohai Xu
Ming Yan
Haiyang Xu
Qi Qian
Ji Zhang
Fei Huang
VLM
AI4TS
188
70
0
30 Dec 2022
Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models
Jiale Xu
Xintao Wang
Weihao Cheng
Yan-Pei Cao
Ying Shan
Xiaohu Qie
Shenghua Gao
188
161
0
28 Dec 2022
Swin MAE: Masked Autoencoders for Small Datasets
Zi'an Xu
Yin Dai
Fayu Liu
Weibin Chen
Yue Liu
Li-Li Shi
Sheng Liu
Yuhang Zhou
SyDa
MedIm
ViT
36
28
0
28 Dec 2022
Exploring Vision Transformers as Diffusion Learners
He Cao
Jianan Wang
Tianhe Ren
Xianbiao Qi
Yihao Chen
Yuan Yao
Lefei Zhang
44
10
0
28 Dec 2022
MVTN: Learning Multi-View Transformations for 3D Understanding
Abdullah Hamdi
Faisal AlZahrani
Silvio Giancola
Guohao Li
3DPC
3DV
36
6
0
27 Dec 2022
DiffFace: Diffusion-based Face Swapping with Facial Guidance
Kihong Kim
Yunho Kim
Seokju Cho
Junyoung Seo
Jisu Nam
Kychul Lee
Seung Wook Kim
Kwanghee Lee
DiffM
32
53
0
27 Dec 2022
PaletteNeRF: Palette-based Color Editing for NeRFs
Qiling Wu
Jianchao Tan
Kun Xu
37
18
0
25 Dec 2022
On Realization of Intelligent Decision-Making in the Real World: A Foundation Decision Model Perspective
Ying Wen
Bo Liu
M. Zhou
Shufang Hou
Zhe Cao
Chenyang Le
Jingxiao Chen
Zheng Tian
Weinan Zhang
Jun Wang
AI4CE
26
10
0
24 Dec 2022
Principled and Efficient Transfer Learning of Deep Models via Neural Collapse
Xiao Li
Sheng Liu
Jin-li Zhou
Xin Lu
C. Fernandez‐Granda
Zhihui Zhu
Q. Qu
AAML
30
19
0
23 Dec 2022
GOOD: Exploring Geometric Cues for Detecting Objects in an Open World
Haiwen Huang
Andreas Geiger
Dan Zhang
VLM
ObjD
21
11
0
22 Dec 2022
Robust Meta-Representation Learning via Global Label Inference and Classification
Ruohan Wang
Isak Falk
Massimiliano Pontil
C. Ciliberto
43
3
0
22 Dec 2022
Reversible Column Networks
Yuxuan Cai
Yi Zhou
Qi Han
Jianjian Sun
Xiangwen Kong
Jun Yu Li
Xiangyu Zhang
VLM
36
53
0
22 Dec 2022
Tune-A-Video: One-Shot Tuning of Image Diffusion Models for Text-to-Video Generation
Jay Zhangjie Wu
Yixiao Ge
Xintao Wang
Weixian Lei
Yuchao Gu
Yufei Shi
Wynne Hsu
Ying Shan
Xiaohu Qie
Mike Zheng Shou
VGen
64
694
0
22 Dec 2022
Multi-modal Molecule Structure-text Model for Text-based Retrieval and Editing
Shengchao Liu
Weili Nie
Chengpeng Wang
Jiarui Lu
Zhuoran Qiao
Ling Liu
Jian Tang
Chaowei Xiao
Anima Anandkumar
48
155
0
21 Dec 2022
Unleashing the Power of Visual Prompting At the Pixel Level
Junyang Wu
Xianhang Li
Chen Wei
Huiyu Wang
Alan Yuille
Yuyin Zhou
Cihang Xie
VPVLM
VLM
34
31
0
20 Dec 2022
A Length-Extrapolatable Transformer
Yutao Sun
Li Dong
Barun Patra
Shuming Ma
Shaohan Huang
Alon Benhaim
Vishrav Chaudhary
Xia Song
Furu Wei
35
116
0
20 Dec 2022
RangeAugment: Efficient Online Augmentation with Range Learning
Sachin Mehta
Saeid Naderiparizi
Fartash Faghri
Maxwell Horton
Lailin Chen
Ali Farhadi
Oncel Tuzel
Mohammad Rastegari
26
6
0
20 Dec 2022
Cross-modal Attention Congruence Regularization for Vision-Language Relation Alignment
Rohan Pandey
Rulin Shao
Paul Pu Liang
Ruslan Salakhutdinov
Louis-Philippe Morency
34
12
0
20 Dec 2022
Does CLIP Bind Concepts? Probing Compositionality in Large Image Models
Martha Lewis
Nihal V. Nayak
Peilin Yu
Qinan Yu
Jack Merullo
Stephen H. Bach
Ellie Pavlick
VLM
OCL
CoGe
34
59
0
20 Dec 2022
DePlot: One-shot visual language reasoning by plot-to-table translation
Fangyu Liu
Julian Martin Eisenschlos
Francesco Piccinno
Syrine Krichene
Chenxi Pang
Kenton Lee
Mandar Joshi
Wenhu Chen
Nigel Collier
Yasemin Altun
VLM
ReLM
LRM
35
89
0
20 Dec 2022
Can Current Task-oriented Dialogue Models Automate Real-world Scenarios in the Wild?
Sang-Woo Lee
Sungdong Kim
Donghyeon Ko
Dong-hyun Ham
Youngki Hong
...
Wangkyo Jung
Kyunghyun Cho
Donghyun Kwak
H. Noh
W. Park
56
1
0
20 Dec 2022
Towards Unsupervised Visual Reasoning: Do Off-The-Shelf Features Know How to Reason?
Monika Wysoczańska
Tom Monnier
Tomasz Trzciński
David Picard
ReLM
OCL
45
1
0
20 Dec 2022
Position-guided Text Prompt for Vision-Language Pre-training
Alex Jinpeng Wang
Pan Zhou
Mike Zheng Shou
Shuicheng Yan
VLM
24
37
0
19 Dec 2022
Universal Object Detection with Large Vision Model
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM
ObjD
39
8
0
19 Dec 2022
AI Art in Architecture
J. Ploennigs
Markus Berger
DiffM
45
66
0
19 Dec 2022
SrTR: Self-reasoning Transformer with Visual-linguistic Knowledge for Scene Graph Generation
Yuxiang Zhang
Zhenbo Liu
Shuai Wang
ReLM
LRM
39
1
0
19 Dec 2022
Diffusing Surrogate Dreams of Video Scenes to Predict Video Memorability
Lorin Sweeney
Graham Healy
Alan F. Smeaton
DiffM
22
2
0
19 Dec 2022
Transferring General Multimodal Pretrained Models to Text Recognition
Junyang Lin
Xuancheng Ren
Yichang Zhang
Gao Liu
Peng Wang
An Yang
Chang Zhou
34
4
0
19 Dec 2022
Face Generation and Editing with StyleGAN: A Survey
Andrew Melnik
Maksim Miasayedzenkau
Dzianis Makaravets
Dzianis Pirshtuk
Eren Akbulut
Dennis Holzmann
Tarek Renusch
Gustav Reichert
Helge J. Ritter
CVBM
34
40
0
18 Dec 2022