arXiv: 2112.12750
SLIP: Self-supervision meets Language-Image Pre-training
23 December 2021
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
Papers citing "SLIP: Self-supervision meets Language-Image Pre-training"
50 / 339 papers shown
Title
Advancing Radiograph Representation Learning with Masked Record Modeling
Hong-Yu Zhou
Chenyu Lian
Lian-cheng Wang
Yizhou Yu
MedIm
38
55
0
30 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
40
53
0
17 Jan 2023
Vision Learners Meet Web Image-Text Pairs
Bingchen Zhao
Quan Cui
Hao Wu
Osamu Yoshie
Cheng Yang
Oisin Mac Aodha
VLM
27
5
0
17 Jan 2023
RILS: Masked Visual Reconstruction in Language Semantic Space
Shusheng Yang
Yixiao Ge
Kun Yi
Dian Li
Ying Shan
Xiaohu Qie
Xinggang Wang
CLIP
43
11
0
17 Jan 2023
Multimodality Helps Unimodality: Cross-Modal Few-Shot Learning with Multimodal Models
Zhiqiu Lin
Samuel Yu
Zhiyi Kuang
Deepak Pathak
Deva Ramanan
VLM
20
100
0
16 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
33
25
0
05 Jan 2023
FICE: Text-Conditioned Fashion Image Editing With Guided GAN Inversion
Martin Pernuš
Clinton Fookes
Vitomir Štruc
Simon Dobrišek
DiffM
21
27
0
05 Jan 2023
Contrastive Language-Vision AI Models Pretrained on Web-Scraped Multimodal Data Exhibit Sexual Objectification Bias
Robert Wolfe
Yiwei Yang
Billy Howe
Aylin Caliskan
DiffM
15
51
0
21 Dec 2022
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
42
27
0
16 Dec 2022
Retrieval-based Disentangled Representation Learning with Natural Language Supervision
Jiawei Zhou
Xiaoguang Li
Lifeng Shang
Xin Jiang
Qun Liu
L. Chen
DRL
35
6
0
15 Dec 2022
NLIP: Noise-robust Language-Image Pre-training
Runhu Huang
Yanxin Long
Jianhua Han
Hang Xu
Xiwen Liang
Chunjing Xu
Xiaodan Liang
VLM
41
30
0
14 Dec 2022
Significantly Improving Zero-Shot X-ray Pathology Classification via Fine-tuning Pre-trained Image-Text Encoders
Jongseong Jang
Daeun Kyung
Seunghyeon Kim
Honglak Lee
Kyunghoon Bae
Edward Choi
LM&MA
MedIm
32
10
0
14 Dec 2022
TIER: Text-Image Entropy Regularization for CLIP-style models
Anil Palepu
Andrew L. Beam
MedIm
26
6
0
13 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
27
1
0
08 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
27
318
0
01 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
27
11
0
29 Nov 2022
Context-Aware Robust Fine-Tuning
Xiaofeng Mao
YueFeng Chen
Xiaojun Jia
Rong Zhang
Hui Xue
Zhao Li
VLM
CLIP
35
25
0
29 Nov 2022
ComCLIP: Training-Free Compositional Image and Text Matching
Kenan Jiang
Xuehai He
Ruize Xu
Qing Guo
VLM
CLIP
CoGe
19
20
0
25 Nov 2022
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
25
19
0
21 Nov 2022
Task Residual for Tuning Vision-Language Models
Tao Yu
Zhihe Lu
Xin Jin
Zhibo Chen
Xinchao Wang
VLM
CLIP
24
82
0
18 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
42
41
0
17 Nov 2022
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM
CLIP
24
4
0
14 Nov 2022
Hear The Flow: Optical Flow-Based Self-Supervised Visual Sound Source Localization
Dennis Fedorishin
D. Mohan
Bhavin Jawade
S. Setlur
V. Govindaraju
VGen
19
10
0
06 Nov 2022
VTC: Improving Video-Text Retrieval with User Comments
Laura Hanu
James Thewlis
Yuki M. Asano
Christian Rupprecht
VGen
32
7
0
19 Oct 2022
A Unified View of Masked Image Modeling
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
VLM
56
35
0
19 Oct 2022
Non-Contrastive Learning Meets Language-Image Pre-Training
Jinghao Zhou
Li Dong
Zhe Gan
Lijuan Wang
Furu Wei
VLM
CLIP
25
26
0
17 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
32
21
0
09 Oct 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
VLM
27
19
0
30 Sep 2022
Data Poisoning Attacks Against Multimodal Encoders
Ziqing Yang
Xinlei He
Zheng Li
Michael Backes
Mathias Humbert
Pascal Berrang
Yang Zhang
AAML
116
45
0
30 Sep 2022
Understanding Pure CLIP Guidance for Voxel Grid NeRF Models
Han-Hung Lee
Angel X. Chang
24
63
0
30 Sep 2022
UniCLIP: Unified Framework for Contrastive Language-Image Pre-training
Janghyeon Lee
Jongsuk Kim
Hyounguk Shon
Bumsoo Kim
Seung Wook Kim
Honglak Lee
Junmo Kim
CLIP
VLM
54
53
0
27 Sep 2022
GAMA: Generative Adversarial Multi-Object Scene Attacks
Abhishek Aich
Calvin-Khang Ta
Akash Gupta
Chengyu Song
S. Krishnamurthy
M. Salman Asif
A. Roy-Chowdhury
AAML
51
17
0
20 Sep 2022
Non-Linguistic Supervision for Contrastive Learning of Sentence Embeddings
Yiren Jian
Chongyang Gao
Soroush Vosoughi
SSL
31
15
0
20 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
30
1
0
05 Sep 2022
Injecting Image Details into CLIP's Feature Space
Zilun Zhang
Cuifeng Shen
Yuan-Chung Shen
Huixin Xiong
Xinyu Zhou
VLM
CLIP
30
0
0
31 Aug 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
24
27
0
29 Aug 2022
Prompt Tuning with Soft Context Sharing for Vision-Language Models
Kun Ding
Ying Wang
Pengzhang Liu
Qiang Yu
Hao Zhang
Shiming Xiang
Chunhong Pan
VPVLM
VLM
29
14
0
29 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
54
158
0
25 Aug 2022
Contrastive Audio-Language Learning for Music
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
27
44
0
25 Aug 2022
Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features
Shichao Xu
Yikang Li
Jenhao Hsiao
C. Ho
Zhuang Qi
14
7
0
19 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
34
68
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
47
99
0
10 Aug 2022
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Chaoning Zhang
Chenshuang Zhang
Junha Song
John Seon Keun Yi
Kang Zhang
In So Kweon
SSL
57
71
0
30 Jul 2022
V²L: Leveraging Vision and Vision-language Models into Large-scale Product Retrieval
Wenhao Wang
Yifan Sun
Zongxin Yang
Yi Yang
VLM
24
3
0
26 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
27
47
0
26 Jul 2022
Is a Caption Worth a Thousand Images? A Controlled Study for Representation Learning
Shibani Santurkar
Yann Dubois
Rohan Taori
Percy Liang
Tatsunori Hashimoto
CLIP
VLM
19
41
0
15 Jul 2022
Contrastive Adapters for Foundation Model Group Robustness
Michael Zhang
Christopher Ré
VLM
18
62
0
14 Jul 2022
IDEA: Increasing Text Diversity via Online Multi-Label Recognition for Vision-Language Pre-training
Xinyu Huang
Youcai Zhang
Ying Cheng
Weiwei Tian
Ruiwei Zhao
Rui Feng
Yuejie Zhang
Yaqian Li
Yandong Guo
Xuanyang Zhang
VLM
21
14
0
12 Jul 2022
American == White in Multimodal Language-and-Image AI
Robert Wolfe
Aylin Caliskan
VLM
29
46
0
01 Jul 2022