Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.08815
Cited By
Efficient Vision-Language Pre-training by Cluster Masking
14 May 2024
Zihao Wei
Zixuan Pan
Andrew Owens
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Efficient Vision-Language Pre-training by Cluster Masking"
31 / 31 papers shown
Title
Progressive Local Alignment for Medical Multimodal Pre-training
Huimin Yan
Xian Yang
Liang Bai
Jiye Liang
87
0
0
25 Feb 2025
SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation
Yiqing Wang
Zihan Li
Jieru Mei
Zi-Ying Wei
Li Liu
Chen Wang
Shengtian Sang
Alan Yuille
Cihang Xie
Yuyin Zhou
ViT
MedIm
52
31
0
24 Jul 2023
When and why vision-language models behave like bags-of-words, and what to do about it?
Mert Yuksekgonul
Federico Bianchi
Pratyusha Kalluri
Dan Jurafsky
James Zou
VLM
CoGe
62
387
0
04 Oct 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
60
317
0
12 Aug 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
344
3,515
0
29 Apr 2022
Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen
Mingyu Ding
Xiaodi Wang
Ying Xin
Shentong Mo
Yunhao Wang
Shumin Han
Ping Luo
Gang Zeng
Jingdong Wang
SSL
83
395
0
07 Feb 2022
SLIP: Self-supervision meets Language-Image Pre-training
Norman Mu
Alexander Kirillov
David Wagner
Saining Xie
VLM
CLIP
129
489
0
23 Dec 2021
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
137
666
0
16 Dec 2021
PeCo: Perceptual Codebook for BERT Pre-training of Vision Transformers
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
Baining Guo
ViT
96
242
0
24 Nov 2021
VIOLET : End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
Wenjie Wang
Lijuan Wang
Zicheng Liu
VLM
85
220
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
115
904
0
22 Nov 2021
iBOT: Image BERT Pre-Training with Online Tokenizer
Jinghao Zhou
Chen Wei
Huiyu Wang
Wei Shen
Cihang Xie
Alan Yuille
Tao Kong
72
729
0
15 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
427
7,705
0
11 Nov 2021
Align before Fuse: Vision and Language Representation Learning with Momentum Distillation
Junnan Li
Ramprasaath R. Selvaraju
Akhilesh Deepak Gotmare
Shafiq Joty
Caiming Xiong
Guosheng Lin
FaML
167
1,943
0
16 Jul 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
229
2,812
0
15 Jun 2021
MST: Masked Self-Supervised Transformer for Visual Representation
Zhaowen Li
Zhiyang Chen
Fan Yang
Wei Li
Yousong Zhu
...
Rui Deng
Liwei Wu
Rui Zhao
Ming Tang
Jinqiao Wang
ViT
76
166
0
10 Jun 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
611
6,029
0
29 Apr 2021
An Empirical Study of Training Self-Supervised Vision Transformers
Xinlei Chen
Saining Xie
Kaiming He
ViT
146
1,857
0
05 Apr 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
426
1,120
0
17 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
419
3,826
0
11 Feb 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
353
6,731
0
23 Dec 2020
The Many Faces of Robustness: A Critical Analysis of Out-of-Distribution Generalization
Dan Hendrycks
Steven Basart
Norman Mu
Saurav Kadavath
Frank Wang
...
Samyak Parajuli
Mike Guo
D. Song
Jacob Steinhardt
Justin Gilmer
OOD
308
1,727
0
29 Jun 2020
A Simple Framework for Contrastive Learning of Visual Representations
Ting-Li Chen
Simon Kornblith
Mohammad Norouzi
Geoffrey E. Hinton
SSL
333
18,721
0
13 Feb 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
173
12,065
0
13 Nov 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
145
1,661
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
227
2,474
0
20 Aug 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
529
24,351
0
26 Jul 2019
Natural Adversarial Examples
Dan Hendrycks
Kevin Zhao
Steven Basart
Jacob Steinhardt
D. Song
OODD
193
1,465
0
16 Jul 2019
Learning Robust Global Representations by Penalizing Local Predictive Power
Haohan Wang
Songwei Ge
Eric Xing
Zachary Chase Lipton
OOD
109
957
0
29 May 2019
Neural Discrete Representation Learning
Aaron van den Oord
Oriol Vinyals
Koray Kavukcuoglu
BDL
SSL
OCL
208
4,989
0
02 Nov 2017
Microsoft COCO: Common Objects in Context
Nayeon Lee
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
379
43,524
0
01 May 2014
1