2305.00729
What Do Self-Supervised Vision Transformers Learn?
1 May 2023
Namuk Park
Wonjae Kim
Byeongho Heo
Taekyung Kim
Sangdoo Yun
SSL
Papers citing "What Do Self-Supervised Vision Transformers Learn?"
50 / 62 papers shown
Title
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
31
0
0
17 Apr 2025
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Shumin Wang
Zhuoran Yang
Liwen Wang
ZhiPeng Tang
Heng Li
Lehan Pan
Sha Zhang
Jie Peng
Jianmin Ji
Y. Zhang
3DPC
46
0
0
17 Apr 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Yiming Lei
Chenkai Zhang
Zeming Liu
Qingjie Liu
Yansen Wang
49
0
0
28 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
59
1
0
14 Mar 2025
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu
Lei Fan
Maurice Pagnucco
Yang Song
55
0
0
13 Mar 2025
Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models
Xuanhan Wang
Huimin Deng
Lianli Gao
Jingkuan Song
VLM
59
0
0
11 Mar 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
S. Xiang
Chunhong Pan
74
1
0
04 Feb 2025
Keypoint Aware Masked Image Modelling
Madhava Krishna
Convin.AI
73
0
0
03 Jan 2025
Beyond [cls]: Exploring the true potential of Masked Image Modeling representations
Marcin Przewięźlikowski
Randall Balestriero
Wojciech Jasiński
Marek Śmieja
Bartosz Zieliński
74
0
0
04 Dec 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
38
0
0
18 Nov 2024
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Alexander C. Li
Yuandong Tian
Bin Chen
Deepak Pathak
Xinlei Chen
43
0
0
14 Nov 2024
Towards Real Zero-Shot Camouflaged Object Segmentation without Camouflaged Annotations
Cheng Lei
Jie Fan
Xinran Li
Tianzhu Xiang
Ao Li
Ce Zhu
Le Zhang
30
0
0
22 Oct 2024
Self-Supervised Learning for Real-World Object Detection: a Survey
Alina Ciocarlan
Sidonie Lefebvre
S. L. Hégarat-Mascle
Arnaud Woiselle
ObjD
41
1
0
09 Oct 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
45
1
0
30 Aug 2024
QPT V2: Masked Image Modeling Advances Visual Scoring
Qizhi Xie
Kun Yuan
Yunpeng Qu
Mingda Wu
Ming Sun
Chao Zhou
Jihong Zhu
42
3
0
23 Jul 2024
Improving Representation of High-frequency Components for Medical Visual Foundation Models
Yuetan Chu
Yilan Zhang
Zhongyi Han
Changchun Yang
Longxi Zhou
Gongning Luo
Chao Huang
Xin Gao
MedIm
49
1
0
19 Jul 2024
X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs
S. Swetha
Jinyu Yang
T. Neiman
Mamshad Nayeem Rizve
Son Tran
Benjamin Z. Yao
Trishul Chilimbi
Mubarak Shah
62
2
0
18 Jul 2024
Joint-Embedding Predictive Architecture for Self-Supervised Learning of Mask Classification Architecture
Donghee Kim
Sungduk Cho
Hyeonwoo Cho
Chanmin Park
Jinyoung Kim
Won Hwa Kim
52
0
0
15 Jul 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
44
8
0
25 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
52
2
0
22 May 2024
LangCell: Language-Cell Pre-training for Cell Identity Understanding
Suyuan Zhao
Jiahuan Zhang
Yushuai Wu
Yizhen Luo
Zaiqing Nie
VLM
41
6
0
09 May 2024
Self-Supervised Learning for Interventional Image Analytics: Towards Robust Device Trackers
Saahil Islam
Venkatesh N. Murthy
Dominik Neumann
Badhan Kumar Das
Puneet Sharma
Andreas Maier
Dorin Comaniciu
Florin-Cristian Ghesu
36
1
0
02 May 2024
X-MIC: Cross-Modal Instance Conditioning for Egocentric Action Generalization
Anna Kukleva
Fadime Sener
Edoardo Remelli
Bugra Tekin
Eric Sauser
Bernt Schiele
Shugao Ma
VLM
EgoV
45
1
0
28 Mar 2024
DODA: Adapting Object Detectors to Dynamic Agricultural Environments in Real-Time with Diffusion
Shuai Xiang
Pieter M. Blok
James Burridge
Haozhou Wang
Wei Guo
37
0
0
27 Mar 2024
Rotary Position Embedding for Vision Transformer
Byeongho Heo
Song Park
Dongyoon Han
Sangdoo Yun
43
36
0
20 Mar 2024
Rethinking cluster-conditioned diffusion models
Nikolas Adaloglou
Tim Kaiser
Félix D. P. Michels
M. Kollmann
VLM
37
3
0
01 Mar 2024
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim
Minje Jang
Wonjun Yoon
Jisoo Lee
Donghyun Na
Sanghyun Woo
AI4CE
39
19
0
29 Feb 2024
LocalGCL: Local-aware Contrastive Learning for Graphs
Haojun Jiang
Jiawei Sun
Jie Li
Chentao Wu
SSL
25
0
0
27 Feb 2024
LHRS-Bot: Empowering Remote Sensing with VGI-Enhanced Large Multimodal Language Model
Dilxat Muhtar
Zhenshi Li
Feng-Xue Gu
Xue-liang Zhang
Pengfeng Xiao
80
51
0
04 Feb 2024
Guiding Masked Representation Learning to Capture Spatio-Temporal Relationship of Electrocardiogram
Yeongyeon Na
Minje Park
Yunwon Tae
S. Joo
35
24
0
02 Feb 2024
Understanding Video Transformers via Universal Concept Discovery
M. Kowal
Achal Dave
Rares Andrei Ambrus
Adrien Gaidon
Konstantinos G. Derpanis
P. Tokmakov
ViT
37
8
0
19 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel C. F. Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA
MedIm
50
9
0
19 Jan 2024
Analyzing Local Representations of Self-supervised Vision Transformers
Ani Vanyan
Alvard Barseghyan
Hakob Tamazyan
Vahan Huroyan
Hrant Khachatrian
Martin Danelljan
50
3
0
31 Dec 2023
Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval
Zeqiang Wei
Kai Jin
Xiuzhuang Zhou
MedIm
24
5
0
26 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro
AI4CE
43
63
0
14 Dec 2023
Rescuing referral failures during automated diagnosis of domain-shifted medical images
Anuj Srivastava
Karm Patel
Pradeep Shenoy
D. Sridharan
OOD
26
0
0
28 Nov 2023
ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
Kirill Vishniakov
Zhiqiang Shen
Zhuang Liu
CLIP
42
16
0
15 Nov 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
A. Fuller
K. Millard
James R. Green
29
60
0
01 Nov 2023
Limited Data, Unlimited Potential: A Study on ViTs Augmented by Masked Autoencoders
Srijan Das
Tanmay Jain
Dominick Reilly
P. Balaji
Soumyajit Karmakar
Shyam Marjit
Xiang Li
Abhijit Das
Michael S. Ryoo
39
16
0
31 Oct 2023
SAM-CLIP: Merging Vision Foundation Models towards Semantic and Spatial Understanding
Haoxiang Wang
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Mehrdad Farajtabar
Sachin Mehta
Mohammad Rastegari
Oncel Tuzel
Hadi Pouransari
VLM
35
67
0
23 Oct 2023
Adaptive Multi-head Contrastive Learning
Lei Wang
Piotr Koniusz
Tom Gedeon
Liang Zheng
41
4
0
09 Oct 2023
Understanding Masked Autoencoders From a Local Contrastive Perspective
Xiaoyu Yue
Lei Bai
Meng Wei
Jiangmiao Pang
Xihui Liu
Luping Zhou
Wanli Ouyang
SSL
67
4
0
03 Oct 2023
Attention De-sparsification Matters: Inducing Diversity in Digital Pathology Representation Learning
S. Kapse
Srijan Das
Jingwei Zhang
Rajarsi R. Gupta
Joel H. Saltz
Dimitris Samaras
Prateek Prasanna
41
9
0
12 Sep 2023
AnyLoc: Towards Universal Visual Place Recognition
Nikhil Varma Keetha
Avneesh Mishra
Jay Karhade
Krishna Murthy Jatavallabhula
Sebastian Scherer
Madhava Krishna
Sourav Garg
35
117
0
01 Aug 2023
Does Visual Pretraining Help End-to-End Reasoning?
Chen Sun
Calvin Luo
Xingyi Zhou
Anurag Arnab
Cordelia Schmid
OCL
LRM
ViT
38
3
0
17 Jul 2023
Hybrid Distillation: Connecting Masked Autoencoders with Contrastive Learners
Bowen Shi
Xiaopeng Zhang
Yaoming Wang
Jin Li
Wenrui Dai
Junni Zou
H. Xiong
Qi Tian
51
4
0
28 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
71
193
0
19 Jun 2023
Reverse Engineering Self-Supervised Learning
Ido Ben-Shaul
Ravid Shwartz-Ziv
Tomer Galanti
S. Dekel
Yann LeCun
SSL
26
34
0
24 May 2023
Know Your Self-supervised Learning: A Survey on Image-based Generative and Discriminative Training
Utku Ozbulak
Hyun Jung Lee
Beril Boga
Esla Timothy Anzaku
Ho-min Park
Arnout Van Messem
W. D. Neve
J. Vankerschaver
DiffM
26
36
0
23 May 2023
Objectives Matter: Understanding the Impact of Self-Supervised Objectives on Vision Transformer Representations
Shashank Shekhar
Florian Bordes
Pascal Vincent
Ari S. Morcos
29
10
0
25 Apr 2023