arXiv:2103.00020
Learning Transferable Visual Models From Natural Language Supervision
26 February 2021
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
Sandhini Agarwal
Girish Sastry
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
Papers citing
"Learning Transferable Visual Models From Natural Language Supervision"
50 / 10,312 papers shown
Title
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng
Revant Teotia
Purva Tendulkar
Sachit Menon
Carl Vondrick
VGen
34
19
0
26 Jan 2023
ITstyler: Image-optimized Text-based Style Transfer
Yun-Hao Bai
Jiayue Liu
Chao Dong
Chun Yuan
35
7
0
26 Jan 2023
Towards Arbitrary Text-driven Image Manipulation via Space Alignment
Yun-Hao Bai
Zi-Qi Zhong
Chao Dong
Weichen Zhang
Guowei Xu
Chun Yuan
40
0
0
25 Jan 2023
RangeViT: Towards Vision Transformers for 3D Semantic Segmentation in Autonomous Driving
Angelika Ando
Spyros Gidaris
Andrei Bursuc
Gilles Puy
Alexandre Boulch
Renaud Marlet
ViT
3DPC
28
71
0
24 Jan 2023
A Data-Efficient Visual-Audio Representation with Intuitive Fine-tuning for Voice-Controlled Robots
Peixin Chang
Shuijing Liu
Tianchen Ji
Neeloy Chakraborty
Kaiwen Hong
Katherine Driggs-Campbell
51
3
0
23 Jan 2023
LEGO-Net: Learning Regular Rearrangements of Objects in Rooms
Qiuhong Anna Wei
Sijie Ding
Jeong Joon Park
Rahul Sajnani
A. Poulenard
Srinath Sridhar
Leonidas J. Guibas
DiffM
32
61
0
23 Jan 2023
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis
Axel Sauer
Tero Karras
S. Laine
Andreas Geiger
Timo Aila
37
209
0
23 Jan 2023
OvarNet: Towards Open-vocabulary Object Attribute Recognition
Keyan Chen
Xiaolong Jiang
Yao Hu
Xu Tang
Yan Gao
Jianqi Chen
Weidi Xie
VLM
ObjD
40
40
0
23 Jan 2023
Fast Inference in Denoising Diffusion Models via MMD Finetuning
Emanuele Aiello
D. Valsesia
E. Magli
DiffM
24
4
0
19 Jan 2023
RecolorNeRF: Layer Decomposed Radiance Fields for Efficient Color Editing of 3D Scenes
Bingchen Gong
Yuehao Wang
Xiaoguang Han
Qingxu Dou
33
27
0
19 Jan 2023
DDS: Decoupled Dynamic Scene-Graph Generation Network
A S M Iftekhar
Raphael Ruschel
Satish Kumar
Suya You
B. S. Manjunath
47
2
0
18 Jan 2023
Joint Representation Learning for Text and 3D Point Cloud
Rui Huang
Xuran Pan
Henry Zheng
Haojun Jiang
Zhifeng Xie
S. Song
Gao Huang
41
19
0
18 Jan 2023
Temporal Perceiving Video-Language Pre-training
Fan Ma
Xiaojie Jin
Heng Wang
Jingjia Huang
Linchao Zhu
Jiashi Feng
Yi Yang
VLM
32
15
0
18 Jan 2023
Class Enhancement Losses with Pseudo Labels for Zero-shot Semantic Segmentation
S. D. Dao
Hengcan Shi
Dinh Q. Phung
Jianfei Cai
VLM
34
0
0
18 Jan 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
82
570
1
17 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
40
5
0
17 Jan 2023
Dataset Distillation: A Comprehensive Review
Ruonan Yu
Songhua Liu
Xinchao Wang
DD
60
121
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
43
11
0
17 Jan 2023
USER: Unified Semantic Enhancement with Momentum Contrast for Image-Text Retrieval
Yan Zhang
Zhong Ji
Dingrong Wang
Yanwei Pang
Xuelong Li
VLM
24
23
0
17 Jan 2023
A Large-Scale Outdoor Multi-modal Dataset and Benchmark for Novel View Synthesis and Implicit Scene Reconstruction
Chongshan Lu
Fukun Yin
Xin Chen
Tao Chen
YU Gang
Jiayuan Fan
30
31
0
17 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
28
46
0
16 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramanan
VLM
20
102
0
16 Jan 2023
AutoFraudNet: A Multimodal Network to Detect Fraud in the Auto Insurance Industry
Azin Asgarian
Rohit Saha
Daniel Jakubovitz
Julia Peyre
32
2
0
15 Jan 2023
T2M-GPT: Generating Human Motion from Textual Descriptions with Discrete Representations
Jianrong Zhang
Yangsong Zhang
Xiaodong Cun
Shaoli Huang
Yong Zhang
Hongwei Zhao
Hongtao Lu
Xiaodong Shen
55
333
0
15 Jan 2023
Diatom-inspired architected materials using language-based deep learning: Perception, transformation and manufacturing
Markus J. Buehler
AI4CE
16
5
0
14 Jan 2023
GH-Feat: Learning Versatile Generative Hierarchical Features from GANs
Yinghao Xu
Yujun Shen
Jiapeng Zhu
Ceyuan Yang
Bolei Zhou
31
2
0
12 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Songlin Yang
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
42
36
0
12 Jan 2023
Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning
Huan Wang
Can Qin
Yue Bai
Yun Fu
37
20
0
12 Jan 2023
Scene-centric vs. Object-centric Image-Text Cross-modal Retrieval: A Reproducibility Study
Mariya Hendriksen
Svitlana Vakulenko
E. Kuiper
Maarten de Rijke
53
5
0
12 Jan 2023
Causal Triplet: An Open Challenge for Intervention-centric Causal Representation Learning
Yuejiang Liu
Alexandre Alahi
Chris Russell
Max Horn
Dominik Zietlow
Bernhard Schölkopf
Francesco Locatello
CML
62
22
0
12 Jan 2023
Poses of People in Art: A Data Set for Human Pose Estimation in Digital Art History
Stefanie Schneider
Ricarda Vollmer
3DH
35
5
0
12 Jan 2023
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
37
17
0
12 Jan 2023
CLIP2Scene: Towards Label-efficient 3D Scene Understanding by CLIP
Runnan Chen
Youquan Liu
Lingdong Kong
Xinge Zhu
Yuexin Ma
Yikang Li
Yuenan Hou
Yu Qiao
Wenping Wang
CLIP
3DPC
33
140
0
12 Jan 2023
SlideVQA: A Dataset for Document Visual Question Answering on Multiple Images
Ryota Tanaka
Kyosuke Nishida
Kosuke Nishida
Taku Hasegawa
Itsumi Saito
Kuniko Saito
25
74
0
12 Jan 2023
Diffusion-based Data Augmentation for Skin Disease Classification: Impact Across Original Medical Datasets to Fully Synthetic Images
Mohamed Akrout
Bálint Gyepesi
P. Holló
A. Poór
Blága Kincso
...
J. Kawahara
Dekker Slade
Latif Abid
Máté Kovács
I. Fazekas
DiffM
MedIm
29
60
0
12 Jan 2023
Artificial Intelligence Generated Coins for Size Comparison
Gerald Artner
37
0
0
11 Jan 2023
HADA: A Graph-based Amalgamation Framework in Image-text Retrieval
Manh-Duy Nguyen
Binh T. Nguyen
C. Gurrin
VLM
28
4
0
11 Jan 2023
EXIF as Language: Learning Cross-Modal Associations Between Images and Camera Metadata
Chenhao Zheng
Ayush Shrivastava
Andrew Owens
VLM
36
11
0
11 Jan 2023
Does progress on ImageNet transfer to real-world datasets?
Alex Fang
Simon Kornblith
Ludwig Schmidt
VLM
39
34
0
11 Jan 2023
Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing
Shruthi Bannur
Stephanie L. Hyland
Qianchu Liu
Fernando Pérez-García
Maximilian Ilse
...
Maria T. A. Wetscherek
M. Lungren
A. Nori
Javier Alvarez-Valle
Ozan Oktay
36
115
0
11 Jan 2023
ChatGPT is not all you need. A State of the Art Review of large Generative AI models
Roberto Gozalo-Brizuela
E.C. Garrido-Merchán
27
261
0
11 Jan 2023
Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering
Paul Lerner
O. Ferret
C. Guinaudeau
29
9
0
11 Jan 2023
Pix2Map: Cross-modal Retrieval for Inferring Street Maps from Images
Xindi Wu
Kwun-fung Lau
Francesco Ferroni
Aljosa Osep
Deva Ramanan
34
7
0
10 Jan 2023
Transferring Pre-trained Multimodal Representations with Cross-modal Similarity Matching
Byoungjip Kim
Sun Choi
Dasol Hwang
Moontae Lee
Honglak Lee
33
10
0
07 Jan 2023
3DAvatarGAN: Bridging Domains for Personalized Editable Avatars
Rameen Abdal
Hsin-Ying Lee
Peihao Zhu
Menglei Chai
Aliaksandr Siarohin
Peter Wonka
Sergey Tulyakov
3DH
29
47
0
06 Jan 2023
TarViS: A Unified Approach for Target-based Video Segmentation
A. Athar
Alexander Hermans
Jonathon Luiten
Deva Ramanan
Bastian Leibe
VOS
31
29
0
06 Jan 2023
Does compressing activations help model parallel training?
S. Bian
Dacheng Li
Hongyi Wang
Eric P. Xing
Shivaram Venkataraman
40
5
0
06 Jan 2023
In Defense of Structural Symbolic Representation for Video Event-Relation Prediction
Andrew Lu
Xudong Lin
Yulei Niu
Shih-Fu Chang
32
2
0
06 Jan 2023
HierVL: Learning Hierarchical Video-Language Embeddings
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
VLM
AI4TS
33
53
0
05 Jan 2023
What You Say Is What You Show: Visual Narration Detection in Instructional Videos
Kumar Ashutosh
Rohit Girdhar
Lorenzo Torresani
Kristen Grauman
29
4
0
05 Jan 2023