Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2208.12262
Cited By
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
25 August 2022
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
Hao Yang
Ming Zeng
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining"
50 / 124 papers shown
Title
Simple Semi-supervised Knowledge Distillation from Vision-Language Models via
D
\mathbf{\texttt{D}}
D
ual-
H
\mathbf{\texttt{H}}
H
ead
O
\mathbf{\texttt{O}}
O
ptimization
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Sung Ju Hwang
VLM
57
0
0
12 May 2025
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Jingyao Wang
Jianqi Zhang
Wenwen Qiang
Changwen Zheng
VLM
37
0
0
10 May 2025
CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
Jindong Li
Y. Li
Yali Fu
Jiahong Liu
Yixin Liu
Menglin Yang
Irwin King
VLM
38
0
0
19 Apr 2025
DSM: Building A Diverse Semantic Map for 3D Visual Grounding
Qinghongbing Xie
Zijian Liang
Long Zeng
29
0
0
11 Apr 2025
IAAO: Interactive Affordance Learning for Articulated Objects in 3D Environments
Can Zhang
G. Lee
32
0
0
09 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
34
0
0
08 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
52
0
0
03 Apr 2025
GOAL: Global-local Object Alignment Learning
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
127
0
0
22 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
53
1
0
21 Mar 2025
REF-VLM: Triplet-Based Referring Paradigm for Unified Visual Decoding
Yan Tai
Luhao Zhu
Zhiqiang Chen
Ynan Ding
Yiying Dong
Xiaohong Liu
Guodong Guo
MLLM
ObjD
54
0
0
10 Mar 2025
Vision-based 3D Semantic Scene Completion via Capture Dynamic Representations
Meng Wang
Fan Wu
Yunchuan Qin
Ruihui Li
Zhuo Tang
KenLi Li
3DPC
96
0
0
08 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
82
5
0
05 Mar 2025
ATLAS Navigator: Active Task-driven LAnguage-embedded Gaussian Splatting
Dexter Ong
Yuezhan Tao
Varun Murali
Igor Spasojevic
Vijay R. Kumar
Pratik Chaudhari
3DGS
61
0
0
27 Feb 2025
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering
Yanpeng Zhao
Yiwei Hao
Siyu Gao
Yunbo Wang
Xiaokang Yang
OCL
129
1
0
17 Feb 2025
Disentangling CLIP for Multi-Object Perception
Samyak Rawelekar
Yujun Cai
Yiwei Wang
Ming-Hsuan Yang
N. Ahuja
VLM
CoGe
72
0
0
05 Feb 2025
Efficient Redundancy Reduction for Open-Vocabulary Semantic Segmentation
Lin Chen
Qi Yang
Kun Ding
Z. Li
Gang Shen
Fei Li
Qiyuan Cao
Shiming Xiang
VLM
56
0
0
29 Jan 2025
sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
Jingyuan Chen
Yuan Yao
Mie Anderson
Natalie Hauglund
Celia Kjaerby
Verena Untiet
Maiken Nedergaard
Jiebo Luo
41
1
0
28 Jan 2025
A Comprehensive Survey of Foundation Models in Medicine
Wasif Khan
Seowung Leem
Kyle B. See
Joshua K. Wong
Shaoting Zhang
R. Fang
AI4CE
LM&MA
VLM
105
18
0
17 Jan 2025
GSemSplat: Generalizable Semantic 3D Gaussian Splatting from Uncalibrated Image Pairs
Xingrui Wang
Cuiling Lan
Hanxin Zhu
Zhibo Chen
Yan Lu
3DGS
91
1
0
22 Dec 2024
Incorporating Feature Pyramid Tokenization and Open Vocabulary Semantic Segmentation
J. Zhang
Li Zhang
Shijian Li
VLM
81
0
0
18 Dec 2024
GaussTR: Foundation Model-Aligned Gaussian Transformer for Self-Supervised 3D Spatial Understanding
Haoyi Jiang
Liu Liu
Tianheng Cheng
Xinjie Wang
Tianwei Lin
Zhizhong Su
W. Liu
X. Wang
3DGS
ViT
113
5
0
17 Dec 2024
FLAIR: VLM with Fine-grained Language-informed Image Representations
Rui Xiao
Sanghwan Kim
Mariana-Iuliana Georgescu
Zeynep Akata
Stephan Alaniz
VLM
CLIP
71
2
0
04 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
74
1
0
02 Dec 2024
Bootstraping Clustering of Gaussians for View-consistent 3D Scene Understanding
W. Zhang
Lu Zhang
Ping Hu
Liqian Ma
Yunzhi Zhuge
Huchuan Lu
3DGS
71
2
0
29 Nov 2024
Language-guided Hierarchical Fine-grained Image Forgery Detection and Localization
Xiao Guo
Xiaohong Liu
I. Masi
Xiaoming Liu
95
9
0
31 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
35
3
0
21 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Y. Zou
Tatsunori Hashimoto
VLM
67
3
0
14 Oct 2024
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
Benyuan Meng
Qianqian Xu
Zitai Wang
Zhiyong Yang
Xiaochun Cao
Qingming Huang
21
0
0
09 Oct 2024
Not All Diffusion Model Activations Have Been Evaluated as Discriminative Features
Benyuan Meng
Qianqian Xu
Zitai Wang
Xiaochun Cao
Qingming Huang
37
1
0
04 Oct 2024
Robotic-CLIP: Fine-tuning CLIP on Action Data for Robotic Applications
Nghia Nguyen
Minh Nhat Vu
Tung D. Ta
Baoru Huang
T. Vo
Ngan Le
Anh Nguyen
VLM
CLIP
41
4
0
26 Sep 2024
Global-Local Medical SAM Adaptor Based on Full Adaption
Meng Wang
Yarong Feng
Yongwei Tang
Tian Zhang
Yuxin Liang
Chao Lv
MedIm
40
0
0
26 Sep 2024
DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
Amin Karimi Monsefi
Kishore Prakash Sailaja
Ali Alilooee
Ser-Nam Lim
R. Ramnath
VLM
37
6
0
10 Sep 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
66
6
0
13 Aug 2024
ComKD-CLIP: Comprehensive Knowledge Distillation for Contrastive Language-Image Pre-traning Model
Yifan Chen
Xiaozhen Qiao
Zhe Sun
Xuelong Li
VLM
39
3
0
08 Aug 2024
GalLoP: Learning Global and Local Prompts for Vision-Language Models
Marc Lafon
Elias Ramzi
Clément Rambour
Nicolas Audebert
Nicolas Thome
VLM
39
8
0
01 Jul 2024
3D Feature Distillation with Object-Centric Priors
Georgios Tziafas
Yucheng Xu
Zhibin Li
H. Kasaei
31
1
0
26 Jun 2024
A Simple Framework for Open-Vocabulary Zero-Shot Segmentation
Thomas Stegmüller
Tim Lebailly
Nikola Dukic
Behzad Bozorgtabar
Tinne Tuytelaars
Jean-Philippe Thiran
VLM
36
1
0
23 Jun 2024
StableSemantics: A Synthetic Language-Vision Dataset of Semantic Representations in Naturalistic Images
Rushikesh Zawar
Shaurya Dewan
Andrew F. Luo
Margaret M. Henderson
Michael J. Tarr
Leila Wehbe
VGen
CoGe
41
1
0
19 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
44
1
0
05 Jun 2024
MLIP: Efficient Multi-Perspective Language-Image Pretraining with Exhaustive Data Utilization
Yu Zhang
Qi Zhang
Zixuan Gong
Yiwei Shi
Yepeng Liu
...
Ke Liu
Kun Yi
Wei Fan
Liang Hu
Changwei Wang
CLIP
VLM
56
3
0
03 Jun 2024
Cross-sensor self-supervised training and alignment for remote sensing
V. Marsocci
Nicolas Audebert
30
1
0
16 May 2024
Open-Vocabulary Object Detection via Neighboring Region Attention Alignment
Sunyuan Qiang
Xianfei Li
Yanyan Liang
Wenlong Liao
Tao He
Pai Peng
ObjD
27
0
0
14 May 2024
OpenDlign: Enhancing Open-World 3D Learning with Depth-Aligned Images
Ye Mao
Junpeng Jing
K. Mikolajczyk
VLM
32
2
0
25 Apr 2024
HiVG: Hierarchical Multimodal Fine-grained Modulation for Visual Grounding
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
34
8
0
20 Apr 2024
Anchor-based Robust Finetuning of Vision-Language Models
Jinwei Han
Zhiwen Lin
Zhongyi Sun
Yingguo Gao
Ke Yan
Shouhong Ding
Yuan Gao
Gui-Song Xia
VLM
62
6
0
09 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jienneg Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
36
24
0
02 Apr 2024
DreamLIP: Language-Image Pre-training with Long Captions
Kecheng Zheng
Yifei Zhang
Wei Wu
Fan Lu
Shuailei Ma
Xin Jin
Wei Chen
Yujun Shen
VLM
CLIP
39
26
0
25 Mar 2024
Centered Masking for Language-Image Pre-Training
Mingliang Liang
Martha Larson
VLM
CLIP
25
4
0
23 Mar 2024
GaussianGrasper: 3D Language Gaussian Splatting for Open-vocabulary Robotic Grasping
Yuhang Zheng
Xiangyu Chen
Yupeng Zheng
Songen Gu
Runyi Yang
...
Chao Yang
Dawei Wang
Zhen Chen
Xiaoxiao Long
Meiqing Wang
50
43
0
14 Mar 2024
FocusCLIP: Multimodal Subject-Level Guidance for Zero-Shot Transfer in Human-Centric Tasks
Muhammad Gul Zain Ali Khan
Muhammad Ferjad Naeem
F. Tombari
Luc Van Gool
Didier Stricker
Muhammad Zeshan Afzal
VLM
CLIP
35
3
0
11 Mar 2024
1
2
3
Next