ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.00794
  4. Cited By
Scaling Language-Image Pre-training via Masking

Scaling Language-Image Pre-training via Masking

1 December 2022
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Scaling Language-Image Pre-training via Masking"

49 / 249 papers shown
Title
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Make A Long Image Short: Adaptive Token Length for Vision Transformers
Yuqin Zhu
Yichen Zhu
ViT
72
17
0
05 Jul 2023
Towards Open Vocabulary Learning: A Survey
Towards Open Vocabulary Learning: A Survey
Jianzong Wu
Xiangtai Li
Shilin Xu
Haobo Yuan
Henghui Ding
...
Jiangning Zhang
Yu Tong
Xudong Jiang
Guohao Li
Dacheng Tao
ObjD
VLM
45
136
0
28 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy
  within a \$10,000 Budget; An Extra \$4,000 Unlocks 81.8% Accuracy
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \10,000 Budget; An Extra \4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
56
19
0
27 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large
  Vision-Language Model for Remote Sensing
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
32
54
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
71
193
0
19 Jun 2023
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
Segment Any Point Cloud Sequences by Distilling Vision Foundation Models
You-Chen Liu
Lingdong Kong
Jun Cen
Runnan Chen
Wenwei Zhang
Liang Pan
Kai-xiang Chen
Ziwei Liu
37
83
0
15 Jun 2023
Fast Training of Diffusion Models with Masked Transformers
Fast Training of Diffusion Models with Masked Transformers
Hongkai Zheng
Weili Nie
Arash Vahdat
Anima Anandkumar
DiffM
45
68
0
15 Jun 2023
ViP: A Differentially Private Foundation Model for Computer Vision
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu
Maziar Sanjabi
Yi Ma
Kamalika Chaudhuri
Chuan Guo
24
12
0
15 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
26
7
0
13 Jun 2023
EventCLIP: Adapting CLIP for Event-based Object Recognition
EventCLIP: Adapting CLIP for Event-based Object Recognition
Ziyi Wu
Xudong Liu
Igor Gilitschenski
VLM
37
15
0
10 Jun 2023
Generative Text-Guided 3D Vision-Language Pretraining for Unified
  Medical Image Segmentation
Generative Text-Guided 3D Vision-Language Pretraining for Unified Medical Image Segmentation
Yinda Chen
Che Liu
Wei Huang
Sibo Cheng
Rossella Arcucci
Zhiwei Xiong
VLM
MedIm
34
48
0
07 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot
  Vision-Language Tasks
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
35
4
0
07 Jun 2023
Fine-Grained Visual Prompting
Fine-Grained Visual Prompting
Lingfeng Yang
Yueze Wang
Xiang Li
Xinlong Wang
Jian Yang
ObjD
VLM
37
61
0
07 Jun 2023
On the Generalization of Multi-modal Contrastive Learning
On the Generalization of Multi-modal Contrastive Learning
Qi Zhang
Yifei Wang
Yisen Wang
22
24
0
07 Jun 2023
Learning Probabilistic Symmetrization for Architecture Agnostic
  Equivariance
Learning Probabilistic Symmetrization for Architecture Agnostic Equivariance
Jinwoo Kim
Tien Dat Nguyen
Ayhan Suleymanzade
Hyeokjun An
Seunghoon Hong
52
23
0
05 Jun 2023
Recent Advances of Local Mechanisms in Computer Vision: A Survey and
  Outlook of Recent Work
Recent Advances of Local Mechanisms in Computer Vision: A Survey and Outlook of Recent Work
Qiangchang Wang
Yilong Yin
43
0
0
02 Jun 2023
Improving CLIP Training with Language Rewrites
Improving CLIP Training with Language Rewrites
Lijie Fan
Dilip Krishnan
Phillip Isola
Dina Katabi
Yonglong Tian
BDL
VLM
CLIP
33
157
0
31 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at
  Scale
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
24
9
0
23 May 2023
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist
  Captions
S-CLIP: Semi-supervised Vision-Language Learning using Few Specialist Captions
Sangwoo Mo
Minkyu Kim
Kyungmin Lee
Jinwoo Shin
VLM
CLIP
44
22
0
23 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner
  and Dense Captioner
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
15
1
0
19 May 2023
Improved baselines for vision-language pre-training
Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
SSL
CLIP
VLM
53
22
0
15 May 2023
An Inverse Scaling Law for CLIP Training
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM
CLIP
48
55
0
11 May 2023
Self-Chained Image-Language Model for Video Localization and Question
  Answering
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
56
130
0
11 May 2023
Less is More: Removing Text-regions Improves CLIP Training Efficiency
  and Robustness
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao
Bowen Zhang
Chen Chen
Yinfei Yang
Xianzhi Du
Wen‐Cheng Zhang
Zhiyun Lu
Yantao Zheng
CLIP
VLM
29
15
0
08 May 2023
Stable and low-precision training for large-scale vision-language models
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
24
39
0
25 Apr 2023
Transformer-Based Visual Segmentation: A Survey
Transformer-Based Visual Segmentation: A Survey
Xiangtai Li
Henghui Ding
Haobo Yuan
Wenwei Zhang
Jiangmiao Pang
Guangliang Cheng
Kai-xiang Chen
Ziwei Liu
Chen Change Loy
ViT
MedIm
42
132
0
19 Apr 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
36
13
0
12 Apr 2023
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior
  Refinement
Not All Features Matter: Enhancing Few-shot CLIP with Adaptive Prior Refinement
Xiang-yu Zhu
Renrui Zhang
Bowei He
A-Long Zhou
Dong Wang
Bingyan Zhao
Peng Gao
VLM
42
79
0
03 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
25
20
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
24
44
0
31 Mar 2023
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models
Task-Oriented Multi-Modal Mutual Leaning for Vision-Language Models
Sifan Long
Zhen Zhao
Junkun Yuan
Zichang Tan
Jiangjiang Liu
Luping Zhou
Sheng-sheng Wang
Jingdong Wang
VLM
31
2
0
30 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
81
470
0
27 Mar 2023
Sigmoid Loss for Language Image Pre-Training
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP
VLM
36
967
0
27 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
42
261
0
20 Mar 2023
A Categorical Framework of General Intelligence
A Categorical Framework of General Intelligence
Yang Yuan
28
1
0
08 Mar 2023
AV-data2vec: Self-supervised Learning of Audio-Visual Speech
  Representations with Contextualized Target Representations
AV-data2vec: Self-supervised Learning of Audio-Visual Speech Representations with Contextualized Target Representations
Jiachen Lian
Alexei Baevski
Wei-Ning Hsu
Michael Auli
SSL
40
34
0
10 Feb 2023
Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
Glaze: Protecting Artists from Style Mimicry by Text-to-Image Models
Shawn Shan
Jenna Cryan
Emily Wenger
Haitao Zheng
Rana Hanocka
Ben Y. Zhao
WIGM
17
177
0
08 Feb 2023
Contrast with Reconstruct: Contrastive 3D Representation Learning Guided
  by Generative Pretraining
Contrast with Reconstruct: Contrastive 3D Representation Learning Guided by Generative Pretraining
Zekun Qi
Runpei Dong
Guo Fan
Zheng Ge
Xiangyu Zhang
Kaisheng Ma
Li Yi
38
118
0
05 Feb 2023
A Survey on Efficient Training of Transformers
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
31
47
0
02 Feb 2023
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
SimMTM: A Simple Pre-Training Framework for Masked Time-Series Modeling
Jiaxiang Dong
Haixu Wu
Haoran Zhang
Li Zhang
Jianmin Wang
Mingsheng Long
AI4TS
40
83
0
02 Feb 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
UnICLAM:Contrastive Representation Learning with Adversarial Masking for
  Unified and Interpretable Medical Vision Question Answering
UnICLAM:Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
25
3
0
21 Dec 2022
Attentive Mask CLIP
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
42
27
0
16 Dec 2022
Perceptual Grouping in Contrastive Vision-Language Models
Perceptual Grouping in Contrastive Vision-Language Models
Kanchana Ranasinghe
Brandon McKinzie
S. S. Ravi
Yinfei Yang
Alexander Toshev
Jonathon Shlens
VLM
30
51
0
18 Oct 2022
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature
  Alignment
VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick
Li Jing
Sayan Nag
Jiachen Zhu
Hardik Shah
Yann LeCun
Ramalingam Chellappa
32
21
0
09 Oct 2022
Masked World Models for Visual Control
Masked World Models for Visual Control
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
93
147
0
28 Jun 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via
  Feature Distillation
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
88
124
0
27 May 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
322
7,481
0
11 Nov 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy
  Text Supervision
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
337
3,720
0
11 Feb 2021
Previous
12345