2212.00794
Scaling Language-Image Pre-training via Masking
1 December 2022
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP
VLM
Papers citing
"Scaling Language-Image Pre-training via Masking"
50 / 249 papers shown
Title
Aligning Medical Images with General Knowledge from Large Language Models
X. B. Fang
Yi Lin
Dong Zhang
Kwang-Ting Cheng
Hao Chen
LM&MA
VLM
38
4
0
31 Aug 2024
Symmetric masking strategy enhances the performance of Masked Image Modeling
Khanh-Binh Nguyen
Chae Jung Park
42
0
0
23 Aug 2024
CLIP-CID: Efficient CLIP Distillation via Cluster-Instance Discrimination
Kaicheng Yang
Tiancheng Gu
Xiang An
Haiqiang Jiang
Xiangzi Dai
Ziyong Feng
Weidong Cai
Jiankang Deng
VLM
54
7
0
18 Aug 2024
Towards Linguistic Neural Representation Learning and Sentence Retrieval from Electroencephalogram Recordings
Jinzhao Zhou
Yiqun Duan
Ziyi Zhao
Yu-Cheng Chang
Yu-Kai Wang
T. Do
Chin-Teng Lin
52
1
0
08 Aug 2024
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model
Yifan Chen
Xiaozhen Qiao
Zhe Sun
Xuelong Li
VLM
45
3
0
08 Aug 2024
Multistain Pretraining for Slide Representation Learning in Pathology
Guillaume Jaume
Anurag J. Vaidya
Andrew Zhang
Andrew H. Song
Richard J. Chen
S. Sahai
Dandan Mo
Emilio Madrigal
L. Le
Faisal Mahmood
36
12
0
05 Aug 2024
Unsupervised Domain Adaption Harnessing Vision-Language Pre-training
Wenlve Zhou
Zhiheng Zhou
VLM
38
33
0
05 Aug 2024
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
36
3
0
01 Aug 2024
UniProcessor: A Text-induced Unified Low-level Image Processor
Huiyu Duan
Xiongkuo Min
Sijing Wu
Wei Shen
Guangtao Zhai
DiffM
47
8
0
30 Jul 2024
ActivityCLIP: Enhancing Group Activity Recognition by Mining Complementary Information from Text to Supplement Image Modality
Guoliang Xu
Jianqin Yin
Feng Zhou
Yonghao Dang
VLM
41
0
0
29 Jul 2024
MMCLIP: Cross-modal Attention Masked Modelling for Medical Language-Image Pre-Training
Biao Wu
Yutong Xie
Zeyu Zhang
Minh Hieu Phan
Qi Chen
Ling-Hao Chen
Qi Wu
LM&MA
47
0
0
28 Jul 2024
Multi-label Cluster Discrimination for Visual Representation Learning
Xiang An
Kaicheng Yang
Xiangzi Dai
Ziyong Feng
Jiankang Deng
VLM
45
6
0
24 Jul 2024
XMeCap: Meme Caption Generation with Sub-Image Adaptability
Yuyan Chen
Songzhou Yan
Zhihong Zhu
Zhixu Li
Yanghua Xiao
VLM
49
10
0
24 Jul 2024
Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight
Ziyuan Huang
Kaixiang Ji
Biao Gong
Zhiwu Qing
Qinglong Zhang
Kecheng Zheng
Jian Wang
Jingdong Chen
Ming Yang
LRM
42
1
0
22 Jul 2024
Braille-to-Speech Generator: Audio Generation Based on Joint Fine-Tuning of CLIP and Fastspeech2
Chun Xu
En-Wei Sun
36
0
0
19 Jul 2024
Rethinking Visual Content Refinement in Low-Shot CLIP Adaptation
Jinda Lu
Shuo Wang
Yanbin Hao
Haifeng Liu
Xiang Wang
Meng Wang
30
2
0
19 Jul 2024
OmniBind: Large-scale Omni Multimodal Representation via Binding Spaces
Zehan Wang
Ziang Zhang
Hang Zhang
Luping Liu
Rongjie Huang
Xize Cheng
Hengshuang Zhao
Zhou Zhao
46
9
0
16 Jul 2024
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization
Jinlong Li
Zequn Jie
Elisa Ricci
Lin Ma
N. Sebe
VLM
39
0
0
11 Jul 2024
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions
Yu-Guan Hsieh
Cheng-Yu Hsieh
Shih-Ying Yeh
Louis Béthune
Hadi Pour Ansari
Pavan Kumar Anasosalu Vasu
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Marco Cuturi
66
4
0
09 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
40
0
0
06 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
46
3
0
03 Jul 2024
FastCLIP: A Suite of Optimization Techniques to Accelerate CLIP Training with Limited Resources
Xiyuan Wei
Fanjiang Ye
Ori Yonay
Xingyu Chen
Baixi Sun
Dingwen Tao
Tianbao Yang
VLM
CLIP
59
2
0
01 Jul 2024
Learning Robust 3D Representation from CLIP via Dual Denoising
Shuqing Luo
Bowen Qu
Wei-Nan Gao
51
1
0
01 Jul 2024
Dynamic Data Pruning for Automatic Speech Recognition
Q. Xiao
Pingchuan Ma
Adriana Fernandez-Lopez
Boqian Wu
Lu Yin
Stavros Petridis
Mykola Pechenizkiy
Maja Pantic
Decebal Constantin Mocanu
Shiwei Liu
36
1
0
26 Jun 2024
PianoBART: Symbolic Piano Music Generation and Understanding with Large-Scale Pre-Training
Xiao Liang
Zijian Zhao
Weichao Zeng
Yutong He
Fupeng He
Yiyi Wang
Chengying Gao
48
5
0
26 Jun 2024
GraphSnapShot: Caching Local Structure for Fast Graph Learning
Dong Liu
R. Waleffe
Meng Jiang
Shivaram Venkataraman
GNN
3DH
31
0
0
25 Jun 2024
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Thomas Stegmüller
Tim Lebailly
Nikola Dukic
Behzad Bozorgtabar
Tinne Tuytelaars
Jean-Philippe Thiran
VLM
39
1
0
23 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM
ViT
60
85
0
11 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM
CLIP
47
5
0
11 Jun 2024
Benchmarking Vision-Language Contrastive Methods for Medical Representation Learning
Shuvendu Roy
Yasaman Parhizkar
Franklin Ogidi
Vahid Reza Khazaie
Michael Colacci
Ali Etemad
Elham Dolatabadi
Arash Afkanpour
VLM
52
1
0
11 Jun 2024
Gentle-CLIP: Exploring Aligned Semantic In Low-Quality Multimodal Data With Soft Alignment
Zijia Song
Z. Zang
Yelin Wang
Guozheng Yang
Jiangbin Zheng
Kaicheng Yu
Wanyu Chen
Stan Z. Li
44
1
0
09 Jun 2024
OVMR: Open-Vocabulary Recognition with Multi-Modal References
Zehong Ma
Shiliang Zhang
Longhui Wei
Qi Tian
VLM
44
0
0
07 Jun 2024
VISTA: Visualized Text Embedding For Universal Multi-Modal Retrieval
Yueze Wang
Zheng Liu
Shitao Xiao
Bo Zhao
Yongping Xiong
51
22
0
06 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
44
1
0
05 Jun 2024
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang
Qi Zhang
Zixuan Gong
Yiwei Shi
Yepeng Liu
...
Ke Liu
Kun Yi
Wei Fan
Liang Hu
Changwei Wang
CLIP
VLM
64
3
0
03 Jun 2024
ED-SAM: An Efficient Diffusion Sampling Approach to Domain Generalization in Vision-Language Foundation Models
Thanh-Dat Truong
Xin Li
Bhiksha Raj
Jackson Cothren
Khoa Luu
DiffM
VLM
54
1
0
03 Jun 2024
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi Ma
Yaodong Yu
Cihang Xie
ViT
44
9
0
30 May 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
Jihao Liu
Jinliang Zheng
Boxiao Liu
Yu Liu
Hongsheng Li
CLIP
32
0
0
29 May 2024
OUS: Scene-Guided Dynamic Facial Expression Recognition
Xinji Mai
Haoran Wang
Zeng Tao
Junxiong Lin
Shaoqi Yan
...
Jing Liu
Jiawen Yu
Xuan Tong
Yating Li
Wenqiang Zhang
33
3
0
29 May 2024
OV-DQUO: Open-Vocabulary DETR with Denoising Text Query Training and Open-World Unknown Objects Supervision
Junjie Wang
Bin Chen
Bin Kang
Yulin Li
Yichi Chen
Weizhi Xian
Huifeng Chang
VLM
ObjD
36
7
0
28 May 2024
Diagnosing the Compositional Knowledge of Vision Language Models from a Game-Theoretic View
Jin Wang
Shichao Dong
Yapeng Zhu
Kelu Yao
Weidong Zhao
Chao Li
Ping Luo
CoGe
LRM
50
2
0
27 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan Yuille
Cihang Xie
AI4TS
VGen
SSL
59
1
0
24 May 2024
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
Bum Jun Kim
Sang Woo Kim
ViT
43
1
0
23 May 2024
Transcriptomics-guided Slide Representation Learning in Computational Pathology
Guillaume Jaume
Lukas Oldenburg
Anurag J. Vaidya
Richard J. Chen
Drew F. K. Williamson
Thomas Peeters
Andrew H. Song
Faisal Mahmood
47
25
0
19 May 2024
Enhancing Fine-Grained Image Classifications via Cascaded Vision Language Models
Canshi Wei
VLM
32
0
0
18 May 2024
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei
Zixuan Pan
Andrew Owens
VLM
29
8
0
14 May 2024
You Only Need Half: Boosting Data Augmentation by Using Partial Content
Juntao Hu
Yuan Wu
38
1
0
05 May 2024
Few Shot Class Incremental Learning using Vision-Language models
Anurag Kumar
Chinmay Bharti
Saikat Dutta
Srikrishna Karanam
Biplab Banerjee
VLM
CLL
38
0
0
02 May 2024
Modeling Caption Diversity in Contrastive Vision-Language Pretraining
Samuel Lavoie
Polina Kirichenko
Mark Ibrahim
Mahmoud Assran
Andrew Gordon Wilson
Aaron Courville
Nicolas Ballas
CLIP
VLM
69
22
0
30 Apr 2024
Semi-supervised Text-based Person Search
Daming Gao
Yang Bai
Min Cao
Hao Dou
Mang Ye
Min Zhang
41
1
0
28 Apr 2024