Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2103.00020
Cited By
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,312 papers shown
Title
HyperStyle: StyleGAN Inversion with HyperNetworks for Real Image Editing
Yuval Alaluf
Omer Tov
Ron Mokady
Rinon Gal
Amit H. Bermano
54
266
0
30 Nov 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
37
43
0
30 Nov 2021
CRIS: CLIP-Driven Referring Image Segmentation
Zhaoqing Wang
Yu Lu
Qiang Li
Xunqiang Tao
Yan Guo
Ming Gong
Tongliang Liu
VLM
63
361
0
30 Nov 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
25
20
0
30 Nov 2021
Vector Quantized Diffusion Model for Text-to-Image Synthesis
Shuyang Gu
Dong Chen
Jianmin Bao
Fang Wen
Bo Zhang
Dongdong Chen
Lu Yuan
B. Guo
DiffM
76
766
0
29 Nov 2021
Point-BERT: Pre-training 3D Point Cloud Transformers with Masked Point Modeling
Xumin Yu
Lulu Tang
Yongming Rao
Tiejun Huang
Jie Zhou
Jiwen Lu
3DPC
53
655
0
29 Nov 2021
Blended Diffusion for Text-driven Editing of Natural Images
Omri Avrahami
Dani Lischinski
Ohad Fried
DiffM
55
922
0
29 Nov 2021
Classification-Regression for Chart Comprehension
Matan Levy
Rami Ben-Ari
Dani Lischinski
31
15
0
29 Nov 2021
ZeroCap: Zero-Shot Image-to-Text Generation for Visual-Semantic Arithmetic
Yoad Tewel
Yoav Shalev
Idan Schwartz
Lior Wolf
VLM
34
192
0
29 Nov 2021
Collective Intelligence for Deep Learning: A Survey of Recent Developments
David R Ha
Yu Tang
AI4CE
33
69
0
29 Nov 2021
LAFITE: Towards Language-Free Training for Text-to-Image Generation
Yufan Zhou
Ruiyi Zhang
Changyou Chen
Chunyuan Li
Chris Tensmeyer
Tong Yu
Jiuxiang Gu
Jinhui Xu
Tong Sun
VLM
35
163
0
27 Nov 2021
VL-LTR: Learning Class-wise Visual-Linguistic Representation for Long-Tailed Visual Recognition
Changyao Tian
Wenhai Wang
Xizhou Zhu
Jifeng Dai
Yu Qiao
VLM
32
70
0
26 Nov 2021
Predict, Prevent, and Evaluate: Disentangled Text-Driven Image Manipulation Empowered by Pre-Trained Vision-Language Model
Zipeng Xu
Tianwei Lin
Hao Tang
Fu Li
Dongliang He
N. Sebe
Radu Timofte
Luc Van Gool
Errui Ding
EGVM
38
41
0
26 Nov 2021
Domain Prompt Learning for Efficiently Adapting CLIP to Unseen Domains
X. Zhang
S. Gu
Yutaka Matsuo
Yusuke Iwasawa
VLM
55
37
0
25 Nov 2021
Generating More Pertinent Captions by Leveraging Semantics and Style on Multi-Source Datasets
Marcella Cornia
Lorenzo Baraldi
G. Fiameni
Rita Cucchiara
20
12
0
24 Nov 2021
Unleashing Transformers: Parallel Token Prediction with Discrete Absorbing Diffusion for Fast High-Resolution Image Generation from Vector-Quantized Codes
Sam Bond-Taylor
P. Hessey
Hiroshi Sasaki
T. Breckon
Chris G. Willcocks
DiffM
27
71
0
24 Nov 2021
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Dat T. Huynh
Jason Kuen
Zhe Lin
Jiuxiang Gu
Ehsan Elhamifar
ISeg
VLM
35
84
0
24 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
53
218
0
24 Nov 2021
NÜWA: Visual Synthesis Pre-training for Neural visUal World creAtion
Chenfei Wu
Jian Liang
Lei Ji
Fan Yang
Yuejian Fang
Daxin Jiang
Nan Duan
ViT
VGen
18
293
0
24 Nov 2021
SPCL: A New Framework for Domain Adaptive Semantic Segmentation via Semantic Prototype-based Contrastive Learning
Binhui Xie
Kejia Yin
Shuang Li
33
11
0
24 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
34
246
0
24 Nov 2021
Multi-label Iterated Learning for Image Classification with Label Ambiguity
Sai Rajeswar
Pau Rodríguez López
Soumye Singhal
David Vazquez
Rameswar Panda
VLM
28
30
0
23 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
75
882
0
22 Nov 2021
Class-agnostic Object Detection with Multi-modal Transformer
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
Rao Muhammad Anwer
Ming-Hsuan Yang
23
91
0
22 Nov 2021
L-Verse: Bidirectional Generation Between Image and Text
Taehoon Kim
Gwangmo Song
Sihaeng Lee
Sangyun Kim
Yewon Seo
Soonyoung Lee
S. Kim
Honglak Lee
Kyunghoon Bae
31
25
0
22 Nov 2021
Many Heads but One Brain: Fusion Brain -- a Competition and a Single Multimodal Multitask Architecture
Daria Bakshandaeva
Denis Dimitrov
V.Ya. Arkhipkin
Alex Shonenkov
M. Potanin
...
Mikhail Martynov
Anton Voronov
Vera Davydova
E. Tutubalina
Aleksandr Petiushko
45
0
0
22 Nov 2021
Self-supervised Semi-supervised Learning for Data Labeling and Quality Evaluation
Haoping Bai
Mengyao Cao
Ping Huang
Jiulong Shan
SSL
31
10
0
22 Nov 2021
MaIL: A Unified Mask-Image-Language Trimodal Network for Referring Image Segmentation
Zizhang Li
Mengmeng Wang
Jianbiao Mei
Yong Liu
20
18
0
21 Nov 2021
UFO: A UniFied TransfOrmer for Vision-Language Representation Learning
Jianfeng Wang
Xiaowei Hu
Zhe Gan
Zhengyuan Yang
Xiyang Dai
Zicheng Liu
Yumao Lu
Lijuan Wang
ViT
29
57
0
19 Nov 2021
Beyond NDCG: behavioral testing of recommender systems with RecList
P. Chia
Jacopo Tagliabue
Federico Bianchi
Chloe He
Brian Ko
38
27
0
18 Nov 2021
Simple but Effective: CLIP Embeddings for Embodied AI
Apoorv Khandelwal
Luca Weihs
Roozbeh Mottaghi
Aniruddha Kembhavi
VLM
LM&Ro
47
219
0
18 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
100
1,761
0
18 Nov 2021
One-Shot Generative Domain Adaptation
Ceyuan Yang
Yujun Shen
Zhiyi Zhang
Yinghao Xu
Jiapeng Zhu
Zhirong Wu
Bolei Zhou
31
49
0
18 Nov 2021
Self-Supervised Class Incremental Learning
Zixuan Ni
Siliang Tang
Yueting Zhuang
CLL
36
6
0
18 Nov 2021
CSI: Contrastive Data Stratification for Interaction Prediction and its Application to Compound-Protein Interaction Prediction
A. Kalia
Dilip Krishnan
Soha Hassoun Tufts University
23
2
0
18 Nov 2021
Learning to Compose Visual Relations
Nan Liu
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
CoGe
OCL
32
77
0
17 Nov 2021
Transparent Human Evaluation for Image Captioning
Jungo Kasai
Keisuke Sakaguchi
Lavinia Dunagan
Jacob Morrison
Ronan Le Bras
Yejin Choi
Noah A. Smith
33
47
0
17 Nov 2021
EMScore: Evaluating Video Captioning via Coarse-Grained and Fine-Grained Embedding Matching
Yaya Shi
Xu Yang
Haiyang Xu
Chunfen Yuan
Bing Li
Weiming Hu
Zhengjun Zha
39
33
0
17 Nov 2021
Achieving Human Parity on Visual Question Answering
Ming Yan
Haiyang Xu
Chenliang Li
Junfeng Tian
Bin Bi
...
Ji Zhang
Songfang Huang
Fei Huang
Luo Si
Rong Jin
32
12
0
17 Nov 2021
LiT: Zero-Shot Transfer with Locked-image text Tuning
Xiaohua Zhai
Tianlin Li
Basil Mustafa
Andreas Steiner
Daniel Keysers
Alexander Kolesnikov
Lucas Beyer
VLM
48
544
0
15 Nov 2021
Scaling Law for Recommendation Models: Towards General-purpose User Representations
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Max Nihlén Ramström
Jisu Jeong
Jung-Woo Ha
S. Kim
ELM
36
38
0
15 Nov 2021
Explainable Semantic Space by Grounding Language to Vision with Cross-Modal Contrastive Learning
Yizhen Zhang
Minkyu Choi
Kuan Han
Zhongming Liu
VLM
23
15
0
13 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
79
332
0
11 Nov 2021
Advances in Neural Rendering
A. Tewari
Justus Thies
B. Mildenhall
P. Srinivasan
E. Tretschk
...
S. Fanello
Jun Zhu
Gordon Wetzstein
Michael Zollhoefer
D. B. Goldman
3DH
AI4CE
73
446
0
10 Nov 2021
FILIP: Fine-grained Interactive Language-Image Pre-Training
Lewei Yao
Runhu Huang
Lu Hou
Guansong Lu
Minzhe Niu
Hang Xu
Xiaodan Liang
Zhenguo Li
Xin Jiang
Chunjing Xu
VLM
CLIP
32
619
0
09 Nov 2021
Evolving Evocative 2D Views of Generated 3D Objects
Eric Chu
27
4
0
08 Nov 2021
Tip-Adapter: Training-free CLIP-Adapter for Better Vision-Language Modeling
Renrui Zhang
Rongyao Fang
Wei Zhang
Peng Gao
Kunchang Li
Jifeng Dai
Yu Qiao
Hongsheng Li
VLM
196
387
0
06 Nov 2021
Visualizing the Emergence of Intermediate Visual Patterns in DNNs
Mingjie Li
Shaobo Wang
Quanshi Zhang
51
11
0
05 Nov 2021
EditGAN: High-Precision Semantic Image Editing
Huan Ling
Karsten Kreis
Daiqing Li
Seung Wook Kim
Antonio Torralba
Sanja Fidler
DiffM
GAN
18
201
0
04 Nov 2021
Benchmarking Multimodal AutoML for Tabular Data with Text Fields
Xingjian Shi
Jonas W. Mueller
Nick Erickson
Mu Li
Alexander J. Smola
LMTD
48
29
0
04 Nov 2021
Previous
1
2
3
...
201
202
203
...
205
206
207
Next