2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,282 papers shown
Title
Diffusion Video Autoencoders: Toward Temporally Consistent Face Video Editing via Disentangled Video Encoding
Gyeongman Kim
Hajin Shim
Hyunsung Kim
Yunjey Choi
Junho Kim
Eunho Yang
DiffM
VGen
39
31
0
06 Dec 2022
Spuriosity Rankings: Sorting Data to Measure and Mitigate Biases
Mazda Moayeri
Wenxiao Wang
Sahil Singla
S. Feizi
71
14
0
05 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Joey Tianyi Zhou
VLM
32
107
0
05 Dec 2022
One-shot Implicit Animatable Avatars with Model-based Priors
Yangyi Huang
Hongwei Yi
Weiyang Liu
Haofan Wang
Boxi Wu
Wenxiao Wang
Binbin Lin
Debing Zhang
Deng Cai
3DH
34
32
0
05 Dec 2022
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Muhammad Gul Zain Ali Khan
Yongqin Xian
Muhammad Zeshan Afzal
D. Stricker
Luc Van Gool
F. Tombari
VLM
35
52
0
05 Dec 2022
3D-LatentMapper: View Agnostic Single-View Reconstruction of 3D Shapes
Alara Dirik
Pinar Yanardag
3DV
19
1
0
05 Dec 2022
CLIPVG: Text-Guided Image Manipulation Using Differentiable Vector Graphics
Yiren Song
Xuning Shao
Kang Chen
Weidong Zhang
Minzhe Li
Zhongliang Jing
CLIP
VLM
29
22
0
05 Dec 2022
Med-Query: Steerable Parsing of 9-DoF Medical Anatomies with Query Embedding
Heng Guo
Jianfeng Zhang
K. Yan
Le Lu
Minfeng Xu
MedIm
24
2
0
05 Dec 2022
PointCaM: Cut-and-Mix for Open-Set Point Cloud Learning
Jie Hong
Shi Qiu
Weihong Li
Saeed Anwar
Mehrtash Harandi
Nick Barnes
L. Petersson
3DPC
41
6
0
05 Dec 2022
Controllable Image Captioning via Prompting
Ning Wang
Jiahao Xie
Jihao Wu
Mingbo Jia
Linlin Li
24
23
0
04 Dec 2022
Brain Tumor Synthetic Data Generation with Adaptive StyleGANs
Usama Tariq
Rizwan Qureshi
Ana Zafar
Danyal Aftab
Jia Wu
Tanvirul Alam
Zubair Shah
Hazrat Ali
MedIm
20
8
0
04 Dec 2022
CoupAlign: Coupling Word-Pixel with Sentence-Mask Alignments for Referring Image Segmentation
Zicheng Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
Wei Ke
36
29
0
04 Dec 2022
Improving Zero-shot Generalization and Robustness of Multi-modal Models
Yunhao Ge
Jie Jessie Ren
Andrew Gallagher
Yuxiao Wang
Ming Yang
Hartwig Adam
Laurent Itti
Balaji Lakshminarayanan
Jiaping Zhao
VLM
32
35
0
04 Dec 2022
VLG: General Video Recognition with Web Textual Knowledge
Jintao Lin
Zhaoyang Liu
Wenhai Wang
Wayne Wu
Limin Wang
39
0
0
03 Dec 2022
Generalizing Multiple Object Tracking to Unseen Domains by Introducing Natural Language Representation
En Yu
Songtao Liu
Zhuoling Li
Jinrong Yang
Zeming Li
Shoudong Han
Wenbing Tao
29
12
0
03 Dec 2022
PartSLIP: Low-Shot Part Segmentation for 3D Point Clouds via Pretrained Image-Language Models
Minghua Liu
Yinhao Zhu
H. Cai
Shizhong Han
Z. Ling
Fatih Porikli
Hao Su
3DPC
41
70
0
03 Dec 2022
Event knowledge in large language models: the gap between the impossible and the unlikely
Carina Kauf
Anna A. Ivanova
Giulia Rambelli
Emmanuele Chersoni
Jingyuan Selena She
Zawad Chowdhury
Evelina Fedorenko
Alessandro Lenci
37
67
0
02 Dec 2022
LatentSwap3D: Semantic Edits on 3D Image GANs
Enis Simsar
A. Tonioni
Evin Pınar Örnek
F. Tombari
50
8
0
02 Dec 2022
Embedding Synthetic Off-Policy Experience for Autonomous Driving via Zero-Shot Curricula
Eli Bronstein
S. Srinivasan
Supratik Paul
Aman Sinha
Matthew O'Kelly
Payam Nikdel
Shimon Whiteson
OffRL
8
18
0
02 Dec 2022
3D-TOGO: Towards Text-Guided Cross-Category 3D Object Generation
Zutao Jiang
Guangsong Lu
Xiaodan Liang
Jihua Zhu
Wei Zhang
Xiaojun Chang
Hang Xu
DiffM
21
8
0
02 Dec 2022
Sonus Texere! Automated Dense Soundtrack Construction for Books using Movie Adaptations
Jaidev Shriram
Makarand Tapaswi
Vinoo Alluri
19
2
0
02 Dec 2022
ObjectStitch: Generative Object Compositing
Yi-Zhe Song
Zhifei Zhang
Zhe-nan Lin
Scott D. Cohen
Brian L. Price
Jianming Zhang
Seunggeun Kim
Daniel G. Aliaga
DiffM
26
31
0
02 Dec 2022
Focus! Relevant and Sufficient Context Selection for News Image Captioning
Mingyang Zhou
Grace Luo
Anna Rohrbach
Zhou Yu
CLIP
30
13
0
01 Dec 2022
UniT3D: A Unified Transformer for 3D Dense Captioning and Visual Grounding
Dave Zhenyu Chen
Ronghang Hu
Xinlei Chen
Matthias Nießner
Angel X. Chang
29
52
0
01 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
45
318
0
01 Dec 2022
Improving Zero-Shot Models with Label Distribution Priors
Jonathan Kahana
Niv Cohen
Yedid Hoshen
VLM
17
14
0
01 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
32
33
0
01 Dec 2022
Score Jacobian Chaining: Lifting Pretrained 2D Diffusion Models for 3D Generation
Haochen Wang
Xiaodan Du
Jiahao Li
Raymond A. Yeh
Gregory Shakhnarovich
DiffM
69
528
0
01 Dec 2022
FakeOut: Leveraging Out-of-domain Self-supervision for Multi-modal Video Deepfake Detection
Gil Knafo
Ohad Fried
31
5
0
01 Dec 2022
What do you MEME? Generating Explanations for Visual Semantic Role Labelling in Memes
Shivam Sharma
Siddhant Agarwal
Tharun Suresh
Preslav Nakov
Md. Shad Akhtar
Tanmoy Chakraborty
VLM
33
18
0
01 Dec 2022
Hyperbolic Contrastive Learning for Visual Representations beyond Objects
Songwei Ge
Shlok Kumar Mishra
Simon Kornblith
Chun-Liang Li
David Jacobs
OCL
SSL
27
51
0
01 Dec 2022
One-shot recognition of any material anywhere using contrastive learning with physics-based rendering
Manuel S. Drehwald
S. Eppel
Jolina Li
Han Hao
Alán Aspuru-Guzik
38
6
0
01 Dec 2022
Finetune like you pretrain: Improved finetuning of zero-shot vision models
Sachin Goyal
Ananya Kumar
Sankalp Garg
Zico Kolter
Aditi Raghunathan
CLIP
VLM
58
138
0
01 Dec 2022
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
Zhuowan Li
Cihang Xie
Benjamin Van Durme
Alan Yuille
VLM
SSL
28
2
0
01 Dec 2022
Shape-Guided Diffusion with Inside-Outside Attention
Dong Huk Park
Grace Luo
C. Toste
S. Azadi
Xihui Liu
M. Karalashvili
Anna Rohrbach
Trevor Darrell
DiffM
40
44
0
01 Dec 2022
One Artist's Personal Reflections on Methods and Ethics of Creating Mixed Media Artificial Intelligence Art
J. Adams
26
1
0
30 Nov 2022
ObjCAViT: Improving Monocular Depth Estimation Using Natural Language Models And Image-Object Cross-Attention
Dylan Auty
K. Mikolajczyk
VLM
25
3
0
30 Nov 2022
High-Fidelity Guided Image Synthesis with Latent Diffusion Models
Jaskirat Singh
Stephen Gould
Liang Zheng
DiffM
41
40
0
30 Nov 2022
Exploiting Category Names for Few-Shot Classification with Vision-Language Models
Taihong Xiao
Zirui Wang
Liangliang Cao
Jiahui Yu
Shengyang Dai
Ming Yang
VLM
MLLM
36
5
0
29 Nov 2022
SinDDM: A Single Image Denoising Diffusion Model
Vladimir Kulikov
Shahar Yadin
Matan Kleiner
T. Michaeli
DiffM
20
77
0
29 Nov 2022
Abstract Visual Reasoning with Tangram Shapes
Anya Ji
Noriyuki Kojima
N. Rush
Alane Suhr
Wai Keen Vong
Robert D. Hawkins
Yoav Artzi
LRM
17
34
0
29 Nov 2022
NeuralLift-360: Lifting An In-the-wild 2D Photo to A 3D Object with 360° Views
Dejia Xu
Yi Ding
Peihao Wang
Zhiwen Fan
Yi Wang
Zhangyang Wang
DiffM
51
143
0
29 Nov 2022
DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model
Gwanghyun Kim
S. Chun
DiffM
33
39
0
29 Nov 2022
Context-Aware Robust Fine-Tuning
Xiaofeng Mao
YueFeng Chen
Xiaojun Jia
Rong Zhang
Hui Xue
Zhao Li
VLM
CLIP
40
25
0
29 Nov 2022
NoisyQuant: Noisy Bias-Enhanced Post-Training Activation Quantization for Vision Transformers
Yijiang Liu
Huanrui Yang
Zhen Dong
Kurt Keutzer
Li Du
Shanghang Zhang
MQ
33
47
0
29 Nov 2022
UDE: A Unified Driving Engine for Human Motion Generation
Zixiang Zhou
Baoyuan Wang
DiffM
36
60
0
29 Nov 2022
One is All: Bridging the Gap Between Neural Radiance Fields Architectures with Progressive Volume Distillation
Shuangkang Fang
Weixin Xu
Heng Wang
Yi Yang
Yu-feng Wang
Shuchang Zhou
37
15
0
29 Nov 2022
Survey on Self-Supervised Multimodal Representation Learning and Foundation Models
Sushil Thapa
AI4TS
SSL
20
1
0
29 Nov 2022
DQ-DETR: Dual Query Detection Transformer for Phrase Extraction and Grounding
Siyi Liu
Yaoyuan Liang
Feng Li
Shijia Huang
Hao Zhang
Hang Su
Jun Zhu
Lei Zhang
ObjD
50
26
0
28 Nov 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
40
5
0
28 Nov 2022