2111.06377
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,611 papers shown
MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling
Robin Zbinden
Nina Van Tiel
Gencer Sumbul
Chiara Vanalli
B. Kellenberger
D. Tuia
39
0
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
48
0
0
17 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
107
0
0
17 Mar 2025
SAM2-ELNet: Label Enhancement and Automatic Annotation for Remote Sensing Segmentation
Jianhao Yang
Wenshuo Yu
Yuanchao Lv
Jiance Sun
Bokang Sun
Mingyang Liu
49
0
0
16 Mar 2025
Pathology Image Restoration via Mixture of Prompts
Jiangdong Cai
Yan Chen
Zhenrong Shen
Haotian Jiang
Honglin Xiong
Kai Xuan
Lichi Zhang
Qian Wang
MedIm
48
0
0
16 Mar 2025
Multi Activity Sequence Alignment via Implicit Clustering
Taein Kwon
Zador Pataki
Mahdi Rad
Marc Pollefeys
HAI
AI4TS
60
0
0
16 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV
VLM
54
1
0
14 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
H. Kalviainen
SSL
131
0
0
14 Mar 2025
SpaceSeg: A High-Precision Intelligent Perception Segmentation Method for Multi-Spacecraft On-Orbit Targets
Hao Liu
Pengyu Guo
Siyuan Yang
Zeqing Jiang
Qinglei Hu
Dongyu Li
43
0
0
14 Mar 2025
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang
Zhitong Xiong
Chenying Liu
Adam J. Stewart
Thomas Dujardin
...
Angelos Zavras
Franziska Gerken
Ioannis Papoutsis
Laura Leal-Taixé
Xiao Xiang Zhu
44
1
0
14 Mar 2025
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo
Seo Jin Lee
Seungwoo Lee
Seohyung Hong
Hyungseok Seo
Kyungsu Kim
43
0
0
14 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
Presented at ResearchTrend Connect | VLM on 14 Mar 2025
140
1
0
13 Mar 2025
Beyond Atoms: Enhancing Molecular Pretrained Representations with 3D Space Modeling
Shuqi Lu
Xiaohong Ji
Bohang Zhang
Lin Yao
Siyuan Liu
Zhifeng Gao
Linfeng Zhang
Guolin Ke
AI4CE
46
1
0
13 Mar 2025
Masked Sensory-Temporal Attention for Sensor Generalization in Quadruped Locomotion
Dikai Liu
Tianwei Zhang
Jianxiong Yin
Simon See
85
1
0
13 Mar 2025
Interactive Multimodal Fusion with Temporal Modeling
Jun-chen Yu
Yongqi Wang
Lei Wang
Yang Zheng
Shengfan Xu
67
1
0
13 Mar 2025
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang
H. Wang
Y. Wang
Di Wang
Mingshuo Chen
...
Yangang Sun
Shuo Wang
L. Lan
Wenjing Yang
Jing Zhang
Mamba
75
2
0
13 Mar 2025
Towards Graph Foundation Models: A Transferability Perspective
Y. Wang
Wenqi Fan
Suhang Wang
Yao Ma
41
1
0
13 Mar 2025
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang
Xin Li
Qiang Li
Zhiwei Wang
48
0
0
13 Mar 2025
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer
Yury Belousov
S. Voloshynovskiy
AAML
37
0
0
13 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM
VLM
72
0
0
13 Mar 2025
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
Leonard Waldmann
Ando Shah
Yi Wang
Nils Lehmann
Adam J. Stewart
Zhitong Xiong
Xiao Xiang Zhu
Stefan Bauer
John Chuang
41
1
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
Transformers without Normalization
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
ViT
OffRL
51
7
0
13 Mar 2025
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo
Zeyu Hu
Na Zhao
De Wen Soh
VGen
82
2
0
13 Mar 2025
Semantic Latent Motion for Portrait Video Generation
Qiyuan Zhang
Chenyu Wu
Wenzhang Sun
Huaize Liu
Donglin Di
Wei Chen
Changqing Zou
VGen
67
0
0
13 Mar 2025
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
Chuyu Zhang
Xueyang Yu
Peiyan Gu
Xuming He
CLL
78
0
0
12 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
44
0
0
12 Mar 2025
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang
Yifei Liu
Yingdong Shi
C. Li
Anqi Pang
Sibei Yang
Jingyi Yu
Kan Ren
ViT
69
0
0
12 Mar 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
Rui Yang
Lin Song
Yicheng Xiao
Runhui Huang
Yixiao Ge
Ying Shan
Hengshuang Zhao
MLLM
62
0
0
12 Mar 2025
Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
Wenyi Lian
Joakim Lindblad
Patrick Micke
Natasa Sladoje
57
0
0
12 Mar 2025
Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging
Minjae Chung
Jong Bum Won
Ganghyun Kim
Yujin Kim
Utku Ozbulak
MedIm
57
0
0
12 Mar 2025
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan
Maciej K. Wozniak
Marvin Klingner
Camille Maurice
B. R. Kiran
S. Yogamani
53
0
0
12 Mar 2025
Evaluation of state-of-the-art deep learning models in the segmentation of the heart ventricles in parasternal short-axis echocardiograms
Julian Rene Cuellar Buritica
Vu Dinh
Manjula Burri
Julie Roelandts
James Wendling
Jon D. Klingensmith
58
0
0
12 Mar 2025
Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models
Xuanhan Wang
Huimin Deng
Lianli Gao
Jingkuan Song
VLM
54
0
0
11 Mar 2025
Robust Latent Matters: Boosting Image Generation with Sampling Error Synthesis
Kai Qiu
X. Li
Jason Kuen
H. Chen
Xiaohao Xu
Jiuxiang Gu
Yinyi Luo
Bhiksha Raj
Zhe-nan Lin
Marios Savvides
57
0
0
11 Mar 2025
Seal Your Backdoor with Variational Defense
Ivan Sabolić
Matej Grcić
Sinisa Segvic
AAML
118
0
0
11 Mar 2025
"Principal Components" Enable A New Language of Images
Xin Wen
Bingchen Zhao
Ismail Elezi
Jiankang Deng
Xiaojuan Qi
61
0
0
11 Mar 2025
SARA: Structural and Adversarial Representation Alignment for Training-efficient Diffusion Models
Hesen Chen
Junyan Wang
Zhiyu Tan
Hao Li
53
0
0
11 Mar 2025
Pre-trained Models Succeed in Medical Imaging with Representation Similarity Degradation
Wenqiang Zu
Shenghao Xie
Hao Chen
Lei Ma
MedIm
42
0
0
11 Mar 2025
3D Medical Imaging Segmentation on Non-Contrast CT
Canxuan Gang
Yuhan Peng
55
0
0
11 Mar 2025
Seeing Beyond Haze: Generative Nighttime Image Dehazing
Beibei Lin
Stephen Lin
Robby T. Tan
DiffM
60
0
0
11 Mar 2025
Effective and Efficient Masked Image Generation Models
Zebin You
Jingyang Ou
Xiaolu Zhang
Jun Hu
Jun Zhou
Chongxuan Li
DiffM
VLM
54
1
0
10 Mar 2025
On the Generalization of Representation Uncertainty in Earth Observation
Spyros Kondylatos
N. Bountos
Dimitrios Michail
Xiao Xiang Zhu
Gustau Camps-Valls
Ioannis Papoutsis
66
1
0
10 Mar 2025
Iterative Prompt Relocation for Distribution-Adaptive Visual Prompt Tuning
Chikai Shang
Mengke Li
Yiqun Zhang
Zhen Chen
Jinlin Wu
Fangqing Gu
Yang Lu
Yiu-ming Cheung
VLM
69
0
0
10 Mar 2025
Denoising Hamiltonian Network for Physical Reasoning
Congyue Deng
Brandon Yushan Feng
Cecilia Garraffo
Alan Garbarz
Robin Walters
William T. Freeman
Leonidas J. Guibas
Kaiming He
AI4CE
63
0
0
10 Mar 2025
Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression
Thibaut Loiseau
Guillaume Bourmaud
Vincent Lepetit
62
0
0
10 Mar 2025
Keeping Representation Similarity in Finetuning for Medical Image Analysis
Wenqiang Zu
Shenghao Xie
Hao Chen
Yiming Liang
Lei Ma
MedIm
OOD
43
0
0
10 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
39
0
0
10 Mar 2025
OmniSAM: Omnidirectional Segment Anything Model for UDA in Panoramic Semantic Segmentation
Ding Zhong
Xu Zheng
Chenfei Liao
Yuanhuiyi Lyu
Jialei Chen
Shengyang Wu
Linfeng Zhang
Xuming Hu
VLM
53
4
0
10 Mar 2025
MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction
H. Q. Vo
Pengyu Yuan
Zheng Yin
Kelvin K. Wong
Chika F. Ezeana
S. Ly
Stephen T. C. Wong
H. Nguyen
39
0
0
10 Mar 2025