arXiv:2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing "Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,312 papers shown
3D Point Cloud Pre-training with Knowledge Distillation from 2D Images
Yuan Yao
Yuanhan Zhang
Zhen-fei Yin
Jiebo Luo
Wanli Ouyang
Xiaoshui Huang
3DPC
29
10
0
17 Dec 2022
Foundation models in brief: A historical, socio-technical focus
Johannes Schneider
VLM
34
9
0
17 Dec 2022
Hyperbolic Hierarchical Contrastive Hashing
Rukai Wei
Yu Liu
Jingkuan Song
Yanzhao Xie
Ke Zhou
37
7
0
17 Dec 2022
Point-E: A System for Generating 3D Point Clouds from Complex Prompts
Alex Nichol
Heewoo Jun
Prafulla Dhariwal
Pamela Mishkin
Mark Chen
DiffM
58
587
0
16 Dec 2022
Uncovering the Disentanglement Capability in Text-to-Image Diffusion Models
Qiucheng Wu
Yujian Liu
Handong Zhao
Ajinkya Kale
T. Bui
Tong Yu
Zhe Lin
Yang Zhang
Shiyu Chang
DiffM
CoGe
30
98
0
16 Dec 2022
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
42
27
0
16 Dec 2022
Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation
Qian Yang
Qian Chen
Wen Wang
Baotian Hu
Min Zhang
44
24
0
16 Dec 2022
Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
Runpei Dong
Zekun Qi
Linfeng Zhang
Junbo Zhang
Jian‐Yuan Sun
Zheng Ge
Li Yi
Kaisheng Ma
ViT
3DPC
29
84
0
16 Dec 2022
Can We Find Strong Lottery Tickets in Generative Models?
Sangyeop Yeo
Yoojin Jang
Jy-yong Sohn
Dongyoon Han
Jaejun Yoo
20
6
0
16 Dec 2022
CLIP is Also an Efficient Segmenter: A Text-Driven Approach for Weakly Supervised Semantic Segmentation
Yuqi Lin
Minghao Chen
Wenxiao Wang
Boxi Wu
Ke Li
Binbin Lin
Haifeng Liu
Xiaofei He
VLM
CLIP
22
119
0
16 Dec 2022
MM-SHAP: A Performance-agnostic Metric for Measuring Multimodal Contributions in Vision and Language Models & Tasks
Letitia Parcalabescu
Anette Frank
40
22
0
15 Dec 2022
NeRF-Art: Text-Driven Neural Radiance Fields Stylization
Can Wang
Ruixia Jiang
Menglei Chai
Mingming He
Dongdong Chen
Jing Liao
AI4CE
26
119
0
15 Dec 2022
Objaverse: A Universe of Annotated 3D Objects
Matt Deitke
Dustin Schwenk
Jordi Salvador
Luca Weihs
Oscar Michel
Eli VanderBilt
Ludwig Schmidt
Kiana Ehsani
Aniruddha Kembhavi
Ali Farhadi
31
894
0
15 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
55
90
0
15 Dec 2022
Integrating Multimodal Data for Joint Generative Modeling of Complex Dynamics
Manuela Brenner
Florian Hess
G. Koppe
Daniel Durstewitz
33
10
0
15 Dec 2022
TeTIm-Eval: a novel curated evaluation data set for comparing text-to-image models
Federico A. Galatolo
M. G. Cimino
E. Cogotti
32
4
0
15 Dec 2022
Improve Text Classification Accuracy with Intent Information
Yifeng Xie
VLM
32
0
0
15 Dec 2022
EM-Paste: EM-guided Cut-Paste with DALL-E Augmentation for Image-level Weakly Supervised Instance Segmentation
Yunhao Ge
Lyne Tchapmi
Brian Nlong Zhao
Laurent Itti
Vibhav Vineet
DiffM
39
5
0
15 Dec 2022
Text-Guided Mask-free Local Image Retouching
Zerun Liu
Fan Zhang
Jingxuan He
Jin Wang
Zhangye Wang
Lechao Cheng
DiffM
33
5
0
15 Dec 2022
IMos: Intent-Driven Full-Body Motion Synthesis for Human-Object Interactions
Anindita Ghosh
Rishabh Dabral
Vladislav Golyanik
Christian Theobalt
P. Slusallek
DiffM
46
85
0
14 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
39
92
0
14 Dec 2022
The Infinite Index: Information Retrieval on Generative Text-To-Image Models
Niklas Deckers
Maik Fröbe
Johannes Kiesel
G. Pandolfo
Christopher Schröder
Benno Stein
Martin Potthast
DiffM
44
16
0
14 Dec 2022
ContraFeat: Contrasting Deep Features for Semantic Discovery
Xinqi Zhu
Chang Xu
Dacheng Tao
DRL
28
2
0
14 Dec 2022
CLIPSep: Learning Text-queried Sound Separation with Noisy Unlabeled Videos
Hao-Wen Dong
Naoya Takahashi
Yuki Mitsufuji
Julian McAuley
Taylor Berg-Kirkpatrick
VLM
CLIP
31
25
0
14 Dec 2022
Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders
Jongseong Jang
Daeun Kyung
Seunghyeon Kim
Honglak Lee
Kyunghoon Bae
Edward Choi
LM&MA
MedIm
32
10
0
14 Dec 2022
Understanding Zero-Shot Adversarial Robustness for Large-Scale Models
Chengzhi Mao
Scott Geng
Junfeng Yang
Xin Eric Wang
Carl Vondrick
VLM
44
60
0
14 Dec 2022
Imagen Editor and EditBench: Advancing and Evaluating Text-Guided Image Inpainting
Su Wang
Chitwan Saharia
Ceslee Montgomery
Jordi Pont-Tuset
Shai Noy
...
Radu Soricut
Jason Baldridge
Mohammad Norouzi
Peter Anderson
William Chan
35
178
0
13 Dec 2022
CREPE: Can Vision-Language Foundation Models Reason Compositionally?
Zixian Ma
Jerry Hong
Mustafa Omer Gul
Mona Gandhi
Irena Gao
Ranjay Krishna
CoGe
37
125
0
13 Dec 2022
Foresight -- Generative Pretrained Transformer (GPT) for Modelling of Patient Timelines using EHRs
Z. Kraljevic
D. Bean
Anthony Shek
R. Bendayan
H. Hemingway
...
Alfie Baston
Jack Ross
Esther Idowu
J. Teo
Richard J. B. Dobson
AI4TS
24
20
0
13 Dec 2022
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle Åström
Lennart Svensson
DiffM
27
22
0
13 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
46
126
0
13 Dec 2022
What do Vision Transformers Learn? A Visual Exploration
Amin Ghiasi
Hamid Kazemi
Eitan Borgnia
Steven Reich
Manli Shu
Micah Goldblum
A. Wilson
Tom Goldstein
ViT
36
60
0
13 Dec 2022
TIER: Text-Image Entropy Regularization for CLIP-style models
Anil Palepu
Andrew L. Beam
MedIm
31
6
0
13 Dec 2022
Localized Latent Updates for Fine-Tuning Vision-Language Models
Moritz Ibing
I. Lim
Leif Kobbelt
VLM
26
1
0
13 Dec 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao
Yudong Li
Cheng-An Hou
Jing-xin Zhao
Rong Tian
...
Xingwu Sun
Zhanhui Kang
Xiaoyong Du
Linlin Shen
Kimmo Yan
VLM
41
23
0
13 Dec 2022
Structure-Guided Image Completion with Image-level and Object-level Semantic Discriminators
Haitian Zheng
Zhe Lin
Jingwan Lu
Scott D. Cohen
Eli Shechtman
...
Jianming Zhang
Qing Liu
Yuqian Zhou
Sohrab Amirghodsi
Jiebo Luo
DiffM
30
1
0
13 Dec 2022
Rodin: A Generative Model for Sculpting 3D Digital Avatars Using Diffusion
Tengfei Wang
Bo Zhang
Ting Zhang
Shuyang Gu
Jianmin Bao
...
Jingjing Shen
Dong Chen
Fang Wen
Qifeng Chen
B. Guo
40
280
0
12 Dec 2022
RGBD2: Generative Scene Synthesis via Incremental View Inpainting using RGBD Diffusion Models
Jiabao Lei
Jiapeng Tang
Kui Jia
DiffM
32
38
0
12 Dec 2022
A Survey on Natural Language Processing for Programming
Qingfu Zhu
Xianzhen Luo
Fang Liu
Cuiyun Gao
Wanxiang Che
25
2
0
12 Dec 2022
A Survey of Knowledge Graph Reasoning on Graph Types: Static, Dynamic, and Multimodal
K. Liang
Lingyuan Meng
Meng Liu
Yue Liu
Wenxuan Tu
Siwei Wang
Sihang Zhou
Xinwang Liu
Fu Sun
LRM
36
110
0
12 Dec 2022
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
Nicklas Hansen
Zhecheng Yuan
Yanjie Ze
Tongzhou Mu
Aravind Rajeswaran
H. Su
Huazhe Xu
Xiaolong Wang
37
65
0
12 Dec 2022
Multimodal and Explainable Internet Meme Classification
A. Thakur
Filip Ilievski
Hông-Ân Sandlin
Zhivar Sourati
Luca Luceri
Riccardo Tommasini
Alain Mermoud
30
6
0
11 Dec 2022
Using Multiple Instance Learning to Build Multimodal Representations
Peiqi Wang
W. Wells
Seth Berkowitz
Steven Horng
Polina Golland
SSL
26
6
0
11 Dec 2022
Cap2Aug: Caption guided Image to Image data Augmentation
Aniket Roy
Anshul B. Shah
Ketul Shah
Anirban Roy
Rama Chellappa
DiffM
36
0
0
11 Dec 2022
OpenD: A Benchmark for Language-Driven Door and Drawer Opening
Yizhou Zhao
Qiaozi Gao
Liang Qiu
Govind Thattai
Gaurav Sukhatme
49
5
0
10 Dec 2022
Uniform Masking Prevails in Vision-Language Pretraining
Siddharth Verma
Yuchen Lu
Rui Hou
Hanchao Yu
Nicolas Ballas
Madian Khabsa
Amjad Almahairi
VLM
21
0
0
10 Dec 2022
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
Zhiheng Li
Ivan Evtimov
Albert Gordo
C. Hazirbas
Tal Hassner
Cristian Canton Ferrer
Chenliang Xu
Mark Ibrahim
41
72
0
09 Dec 2022
SmartBrush: Text and Shape Guided Object Inpainting with Diffusion Model
Shaoan Xie
Zhifei Zhang
Zhe Lin
Tobias Hinz
Kun Zhang
DiffM
33
232
0
09 Dec 2022
Training-Free Structured Diffusion Guidance for Compositional Text-to-Image Synthesis
Weixi Feng
Xuehai He
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
Xinze Wang
William Yang Wang
CoGe
56
300
0
09 Dec 2022
LADIS: Language Disentanglement for 3D Shape Editing
Ian Huang
Panos Achlioptas
Tianyi Zhang
Sergey Tulyakov
Minhyuk Sung
Leonidas J. Guibas
34
10
0
09 Dec 2022