ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2212.00794
  4. Cited By
Scaling Language-Image Pre-training via Masking

Scaling Language-Image Pre-training via Masking

1 December 2022
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
    CLIP
    VLM
ArXivPDFHTML

Papers citing "Scaling Language-Image Pre-training via Masking"

50 / 249 papers shown
Title
Training-Free Unsupervised Prompt for Vision-Language Models
Training-Free Unsupervised Prompt for Vision-Language Models
Sifan Long
Linbin Wang
Zhen Zhao
Zichang Tan
Yiming Wu
Shengsheng Wang
Jingdong Wang
VLM
VPVLM
48
1
0
25 Apr 2024
MoDE: CLIP Data Experts via Clustering
MoDE: CLIP Data Experts via Clustering
Jiawei Ma
Po-Yao Huang
Saining Xie
Shang-Wen Li
Luke Zettlemoyer
Shih-Fu Chang
Wen-tau Yih
Hu Xu
MoE
CLIP
VLM
31
11
0
24 Apr 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster
  Pre-training on Web-scale Image-Text Data
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM
CLIP
44
6
0
24 Apr 2024
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for
  Few-shot Learning
The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning
Yaohui Li
Qifeng Zhou
Haoxing Chen
Jianbing Zhang
Xinyu Dai
Hao Zhou
VLM
53
0
0
15 Apr 2024
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced
  Pre-training
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Hyesong Choi
Hyejin Park
Kwang Moo Yi
Sungmin Cha
Dongbo Min
39
9
0
12 Apr 2024
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and
  Training Strategies
Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies
Zichao Li
Cihang Xie
E. D. Cubuk
CLIP
34
9
0
12 Apr 2024
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Two Effects, One Trigger: On the Modality Gap, Object Bias, and Information Imbalance in Contrastive Vision-Language Models
Simon Schrodi
David T. Hoffmann
Max Argus
Volker Fischer
Thomas Brox
VLM
58
1
0
11 Apr 2024
Anchor-based Robust Finetuning of Vision-Language Models
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han
Zhiwen Lin
Zhongyi Sun
Yingguo Gao
Ke Yan
Shouhong Ding
Yuan Gao
Gui-Song Xia
VLM
71
6
0
09 Apr 2024
Unified Multi-modal Diagnostic Framework with Reconstruction
  Pre-training and Heterogeneity-combat Tuning
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
31
1
0
09 Apr 2024
Foundation Model for Advancing Healthcare: Challenges, Opportunities,
  and Future Directions
Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions
Yuting He
Fuxiang Huang
Xinrui Jiang
Yuxiang Nie
Minghao Wang
Jiguang Wang
Hao Chen
LM&MA
AI4CE
84
28
0
04 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
3DV
VLM
47
25
0
02 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
LocCa: Visual Pretraining with Location-aware Captioners
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim M. Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiao-Qi Zhai
VLM
51
6
0
28 Mar 2024
DreamLIP: Language-Image Pre-training with Long Captions
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng
Yifei Zhang
Wei Wu
Fan Lu
Shuailei Ma
Xin Jin
Wei Chen
Yujun Shen
VLM
CLIP
47
26
0
25 Mar 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in
  Diffusion Transformer
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu
Yingwei Pan
Yehao Li
Ting Yao
Zhenglong Sun
Tao Mei
C. Chen
50
24
0
25 Mar 2024
Centered Masking for Language-Image Pre-Training
Centered Masking for Language-Image Pre-Training
Mingliang Liang
Martha Larson
VLM
CLIP
36
4
0
23 Mar 2024
Rethinking Multi-view Representation Learning via Distilled
  Disentangling
Rethinking Multi-view Representation Learning via Distilled Disentangling
Guanzhou Ke
Bo Wang
Xiaoli Wang
Shengfeng He
39
3
0
16 Mar 2024
Improving Medical Multi-modal Contrastive Learning with Expert
  Annotations
Improving Medical Multi-modal Contrastive Learning with Expert Annotations
Yogesh Kumar
Pekka Marttinen
MedIm
VLM
31
10
0
15 Mar 2024
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving
  Representation Learning
MIM4D: Masked Modeling with Multi-View Video for Autonomous Driving Representation Learning
Jialv Zou
Bencheng Liao
Qian Zhang
Wenyu Liu
Xinggang Wang
49
2
0
13 Mar 2024
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Language-Driven Visual Consensus for Zero-Shot Semantic Segmentation
Zicheng Zhang
Tong Zhang
Yi Zhu
Jian-zhuo Liu
Xiaodan Liang
QiXiang Ye
Wei Ke
VLM
52
2
0
13 Mar 2024
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with
  Module-wise Pruning Error Metric
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin
Haoli Bai
Zhili Liu
Lu Hou
Muyi Sun
Linqi Song
Ying Wei
Zhenan Sun
CLIP
VLM
63
14
0
12 Mar 2024
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized
  Visual Class Discovery
Textual Knowledge Matters: Cross-Modality Co-Teaching for Generalized Visual Class Discovery
Haiyang Zheng
Nan Pu
Wenjing Li
N. Sebe
Zhun Zhong
49
7
0
12 Mar 2024
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in
  Human-Centric Tasks
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks
Muhammad Gul Zain Ali Khan
Muhammad Ferjad Naeem
F. Tombari
Luc Van Gool
Didier Stricker
Muhammad Zeshan Afzal
VLM
CLIP
47
3
0
11 Mar 2024
MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training
  with Masked Autoencoder
MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
Lei Li
Tianfang Zhang
Xinglin Zhang
Jiaqi Liu
Bingqi Ma
Yan-chun Luo
Tao Chen
MedIm
45
0
0
07 Mar 2024
Differentially Private Representation Learning via Image Captioning
Differentially Private Representation Learning via Image Captioning
Tom Sander
Yaodong Yu
Maziar Sanjabi
Alain Durmus
Yi Ma
Kamalika Chaudhuri
Chuan Guo
71
3
0
04 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary
  Action Recognition
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
43
15
0
03 Mar 2024
Generalizable Whole Slide Image Classification with Fine-Grained
  Visual-Semantic Interaction
Generalizable Whole Slide Image Classification with Fine-Grained Visual-Semantic Interaction
Hao Li
Ying Chen
Yifei Chen
Wenxian Yang
Bowen Ding
Yuchen Han
Liansheng Wang
Rongshan Yu
36
15
0
29 Feb 2024
Parameter-efficient Prompt Learning for 3D Point Cloud Understanding
Parameter-efficient Prompt Learning for 3D Point Cloud Understanding
Hongyu Sun
Yongcai Wang
Wang Chen
Haoran Deng
Deying Li
VPVLM
53
5
0
24 Feb 2024
A Touch, Vision, and Language Dataset for Multimodal Alignment
A Touch, Vision, and Language Dataset for Multimodal Alignment
Letian Fu
Gaurav Datta
Huang Huang
Will Panitch
Jaimyn Drake
Joseph Ortiz
Mustafa Mukadam
Mike Lambeta
Roberto Calandra
Ken Goldberg
VLM
40
34
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
45
29
0
20 Feb 2024
Towards Privacy-Aware Sign Language Translation at Scale
Towards Privacy-Aware Sign Language Translation at Scale
Phillip Rust
Bowen Shi
Skyler Wang
Necati Cihan Camgöz
Jean Maillard
SLR
47
14
0
14 Feb 2024
Cacophony: An Improved Contrastive Audio-Text Model
Cacophony: An Improved Contrastive Audio-Text Model
Ge Zhu
Jordan Darefsky
Zhiyao Duan
AuLLM
46
11
0
10 Feb 2024
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters
Quan-Sen Sun
Jinsheng Wang
Qiying Yu
Yufeng Cui
Fan Zhang
Xiaosong Zhang
Xinlong Wang
VLM
CLIP
MLLM
97
42
0
06 Feb 2024
MOMENT: A Family of Open Time-series Foundation Models
MOMENT: A Family of Open Time-series Foundation Models
Mononito Goswami
Konrad Szafer
Arjun Choudhry
Yifu Cai
Shuo Li
Artur Dubrawski
AIFin
AI4TS
71
118
0
06 Feb 2024
Organic or Diffused: Can We Distinguish Human Art from AI-generated
  Images?
Organic or Diffused: Can We Distinguish Human Art from AI-generated Images?
Anna Yoo Jeong Ha
Josephine Passananti
Ronik Bhaskar
Shawn Shan
Reid Southen
Haitao Zheng
Ben Y. Zhao
AAML
29
22
0
05 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene
  Understanding: From Learning Paradigm Perspectives
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
51
13
0
05 Feb 2024
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
TimeSiam: A Pre-Training Framework for Siamese Time-Series Modeling
Jiaxiang Dong
Haixu Wu
Yuxuan Wang
Yunzhong Qiu
Li Zhang
Jianmin Wang
Mingsheng Long
AI4TS
23
13
0
04 Feb 2024
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data
A Survey on Self-Supervised Learning for Non-Sequential Tabular Data
Wei-Yao Wang
Wei-Wei Du
Derek Xu
Wei Wang
Wenjie Peng
LMTD
42
7
0
02 Feb 2024
Rethinking Patch Dependence for Masked Autoencoders
Rethinking Patch Dependence for Masked Autoencoders
Letian Fu
Long Lian
Renhao Wang
Baifeng Shi
Xudong Wang
Adam Yala
Trevor Darrell
Alexei A. Efros
Ken Goldberg
39
14
0
25 Jan 2024
Exploring Simple Open-Vocabulary Semantic Segmentation
Exploring Simple Open-Vocabulary Semantic Segmentation
Zihang Lai
VLM
26
0
0
22 Jan 2024
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model
  for Multimodal Processing
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
Xianghu Yue
Xiaohai Tian
Lu Lu
Malu Zhang
Zhizheng Wu
Haizhou Li
39
0
0
22 Jan 2024
MM-Interleaved: Interleaved Image-Text Generative Modeling via
  Multi-modal Feature Synchronizer
MM-Interleaved: Interleaved Image-Text Generative Modeling via Multi-modal Feature Synchronizer
Changyao Tian
Xizhou Zhu
Yuwen Xiong
Weiyun Wang
Zhe Chen
...
Tong Lu
Jie Zhou
Hongsheng Li
Yu Qiao
Jifeng Dai
AuLLM
85
42
0
18 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
40
7
0
12 Jan 2024
Revisiting Adversarial Training at Scale
Revisiting Adversarial Training at Scale
Zeyu Wang
Xianhang Li
Hongru Zhu
Cihang Xie
36
15
0
09 Jan 2024
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes
  Interactively
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively
Haobo Yuan
Xiangtai Li
Chong Zhou
Yining Li
Kai Chen
Chen Change Loy
VLM
29
51
0
05 Jan 2024
MLIP: Medical Language-Image Pre-training with Masked Local
  Representation Learning
MLIP: Medical Language-Image Pre-training with Masked Local Representation Learning
Jiarun Liu
Hong-Yu Zhou
Cheng Li
Weijian Huang
Hao Yang
Yong Liang
Shanshan Wang
VLM
53
4
0
03 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision
  and Beyond
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
42
14
0
31 Dec 2023
Masked Contrastive Reconstruction for Cross-modal Medical Image-Report
  Retrieval
Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval
Zeqiang Wei
Kai Jin
Xiuzhuang Zhou
MedIm
24
5
0
26 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
961
0
21 Dec 2023
ECAMP: Entity-centered Context-aware Medical Vision Language
  Pre-training
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
Rongsheng Wang
Qingsong Yao
Haoran Lai
Zhiyang He
Xiaodong Tao
Zihang Jiang
S.Kevin Zhou
VLM
MedIm
59
6
0
20 Dec 2023
Previous
12345
Next