ResearchTrend.AI
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining

25 August 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
Hao Yang
Ming Zeng
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
    CLIP
    VLM

Papers citing "MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining"

50 / 124 papers shown
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
35
0
0
01 Mar 2024
CARZero: Cross-Attention Alignment for Radiology Zero-Shot Classification
Haoran Lai
Qingsong Yao
Zihang Jiang
Rongsheng Wang
Zhiyang He
Xiaodong Tao
S. Kevin Zhou
MedIm
38
12
0
27 Feb 2024
Delving into Multi-modal Multi-task Foundation Models for Road Scene Understanding: From Learning Paradigm Perspectives
Sheng Luo
Wei Chen
Wanxin Tian
Rui Liu
Luanxuan Hou
...
Ling Shao
Yi Yang
Bojun Gao
Qun Li
Guobin Wu
51
13
0
05 Feb 2024
Overcoming the Pitfalls of Vision-Language Model Finetuning for OOD Generalization
Yuhang Zang
Hanlin Goh
Josh Susskind
Chen Huang
VLM
37
12
0
29 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
54
17
0
16 Jan 2024
UMG-CLIP: A Unified Multi-Granularity Vision Generalist for Open-World Understanding
Bowen Shi
Peisen Zhao
Zichen Wang
Yuhang Zhang
Yaoming Wang
...
Wenrui Dai
Junni Zou
Hongkai Xiong
Qi Tian
Xiaopeng Zhang
VLM
40
7
0
12 Jan 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma
Furong Xu
Jian Liu
Ming Yang
Qingpei Guo
VLM
42
3
0
04 Jan 2024
Masked Modeling for Self-supervised Representation Learning on Vision and Beyond
Siyuan Li
Luyuan Zhang
Zedong Wang
Di Wu
Lirong Wu
...
Jun-Xiong Xia
Cheng Tan
Yang Liu
Baigui Sun
Stan Z. Li
SSL
36
14
0
31 Dec 2023
Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval
Zeqiang Wei
Kai Jin
Xiuzhuang Zhou
MedIm
24
5
0
26 Dec 2023
TagAlign: Improving Vision-Language Alignment with Multi-Tag Classification
Qinying Liu
Wei Wu
Kecheng Zheng
Zhan Tong
Jiawei Liu
Yu Liu
Wei Chen
Zilei Wang
Yujun Shen
VLM
26
6
0
21 Dec 2023
Open Vocabulary Semantic Scene Sketch Understanding
Ahmed Bourouis
Judith E. Fan
Yulia Gryaditskaya
VLM
3DV
23
1
0
18 Dec 2023
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim
Subhankar Roy
S. Essid
Vicky Kalogeiton
Stéphane Lathuilière
25
12
0
15 Dec 2023
CSL: Class-Agnostic Structure-Constrained Learning for Segmentation Including the Unseen
Hao Zhang
Fang Li
Lu Qi
Ming-Hsuan Yang
Narendra Ahuja
44
11
0
09 Dec 2023
Auto-Vocabulary Semantic Segmentation
Osman Ülger
Maksymilian Kulicki
Yuki M. Asano
Martin R. Oswald
VLM
45
2
0
07 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
48
83
0
06 Dec 2023
Novel class discovery meets foundation models for 3D semantic segmentation
Luigi Riz
Cristiano Saltori
Yiming Wang
Elisa Ricci
Fabio Poiesi
3DPC
36
0
0
06 Dec 2023
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
40
39
0
01 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
26
18
0
01 Dec 2023
Unified Medical Image Pre-training in Language-Guided Common Semantic Space
Xiaoxuan He
Yifan Yang
Xinyang Jiang
Xufang Luo
Haoji Hu
Siyun Zhao
Dongsheng Li
Yuqing Yang
Lili Qiu
38
1
0
24 Nov 2023
Recent Advances in Multi-modal 3D Scene Understanding: A Comprehensive Survey and Evaluation
Yinjie Lei
Zixuan Wang
Feng Chen
Guoqing Wang
Peng Wang
Yang Yang
31
8
0
24 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
26
33
0
20 Oct 2023
HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
Tianyi Wei
Dongdong Chen
Wenbo Zhou
Jing Liao
Weiming Zhang
Gang Hua
Neng H. Yu
31
12
0
16 Oct 2023
Black-box Targeted Adversarial Attack on Segment Anything (SAM)
Sheng Zheng
Chaoning Zhang
Xinhong Hao
AAML
29
7
0
16 Oct 2023
Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images
Che Liu
Anand Shah
Wenjia Bai
Rossella Arcucci
MedIm
37
12
0
10 Oct 2023
Improving Compositional Text-to-image Generation with Large Vision-Language Models
Song Wen
Guian Fang
Renrui Zhang
Peng Gao
Hao Dong
Dimitris N. Metaxas
25
17
0
10 Oct 2023
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions
Chengyang Zhao
Yikang Shen
Zhenfang Chen
Mingyu Ding
Chuang Gan
51
15
0
10 Oct 2023
ALT-Pilot: Autonomous navigation with Language augmented Topometric maps
Mohammad Omama
Pranav Inani
Pranjal Paul
Sarat Chandra Yellapragada
Krishna Murthy Jatavallabhula
Sandeep P. Chinchali
Madhava Krishna
23
13
0
03 Oct 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
23
27
0
02 Sep 2023
Ref-Diff: Zero-shot Referring Image Segmentation with Generative Models
Minheng Ni
Yabo Zhang
Kailai Feng
Xiaoming Li
Yiwen Guo
W. Zuo
DiffM
20
24
0
31 Aug 2023
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
Qidong Huang
Xiaoyi Dong
Dongdong Chen
Yinpeng Chen
Lu Yuan
Gang Hua
Weiming Zhang
Neng H. Yu
AAML
30
9
0
20 Aug 2023
Language-enhanced RNR-Map: Querying Renderable Neural Radiance Field maps with natural language
Francesco Taioli
Federico Cunico
Federico Girella
Riccardo Bologna
Alessandro Farinelli
Marco Cristani
18
7
0
17 Aug 2023
MixReorg: Cross-Modal Mixed Patch Reorganization is a Good Mask Learner for Open-World Semantic Segmentation
Kaixin Cai
Pengzhen Ren
Yi Zhu
Hang Xu
Jian-zhuo Liu
Changlin Li
Guangrun Wang
Xiaodan Liang
VLM
29
14
0
09 Aug 2023
Unsupervised Camouflaged Object Segmentation as Domain Adaptation
Yi Zhang
Chengyi Wu
28
3
0
08 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
38
118
0
25 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
26
27
0
24 Jul 2023
Unified Open-Vocabulary Dense Visual Prediction
Hengcan Shi
Munawar Hayat
Jianfei Cai
ObjD
VLM
43
19
0
17 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Z. Tu
Haoran Su
VLM
31
23
0
06 Jul 2023
Learning-to-Rank Meets Language: Boosting Language-Driven Ordering Alignment for Ordinal Classification
Rui Wang
Peipei Li
Huaibo Huang
Chunshui Cao
Ran He
Zhaofeng He
12
13
0
24 Jun 2023
Retrieval-Enhanced Contrastive Vision-Text Models
Ahmet Iscen
Mathilde Caron
Alireza Fathi
Cordelia Schmid
CLIP
VLM
31
26
0
12 Jun 2023
COURIER: Contrastive User Intention Reconstruction for Large-Scale Visual Recommendation
Jia-Qi Yang
Chen Dai
OU Dan
Dongshuai Li
Ju Huang
De-Chuan Zhan
Xiaoyi Zeng
Yang Yang
20
1
0
08 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
38
11
0
01 Jun 2023
Album Storytelling with Iterative Story-aware Captioning and Large Language Models
Munan Ning
Yujia Xie
Dongdong Chen
Zeyin Song
Lu Yuan
Yonghong Tian
QiXiang Ye
Liuliang Yuan
19
8
0
22 May 2023
MALM: Mask Augmentation based Local Matching for Food-Recipe Retrieval
Bhanu Prakash Voutharoja
Peng Wang
Lei Wang
Vivienne Guan
22
6
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
42
114
0
18 May 2023
Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
SSL
CLIP
VLM
45
22
0
15 May 2023
CLIP-S$^4$: Language-Guided Self-Supervised Semantic Segmentation
Wenbin He
Suphanut Jamonnak
Liangke Gou
Liu Ren
VLM
40
31
0
01 May 2023
RECLIP: Resource-efficient CLIP by Training with Small Images
Runze Li
Dahun Kim
B. Bhanu
Weicheng Kuo
VLM
CLIP
28
13
0
12 Apr 2023
SATR: Zero-Shot Semantic Segmentation of 3D Shapes
Ahmed Abdelreheem
Ivan Skorokhodov
M. Ovsjanikov
Peter Wonka
3DPC
35
38
0
11 Apr 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
25
20
0
31 Mar 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
Lingdong Kong
You-Chen Liu
Xin Li
Runnan Chen
Wenwei Zhang
Jiawei Ren
Liang Pan
Kaili Chen
Ziwei Liu
50
85
0
30 Mar 2023