2111.06377
Cited By
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,611 papers shown
Title
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
51
0
0
25 Mar 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
53
0
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
42
0
0
25 Mar 2025
Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings
Chengan Che
Chao Wang
Tom Vercauteren
Sophia Tsoka
Luis C. García-Peraza-Herrera
MedIm
41
0
0
25 Mar 2025
Recover from Horcrux: A Spectrogram Augmentation Method for Cardiac Feature Monitoring from Radar Signal Components
Y. Zhang
Sijie Xiong
Rui Yang
EngGee Lim
Yutao Yue
46
0
0
25 Mar 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Stefan Stojanov
David Wendt
Seungwoo Kim
R. Venkatesh
Kevin T. Feigelis
Jiajun Wu
Daniel L. K. Yamins
SSL
66
0
0
25 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Y. Lu
Sifei Liu
...
Jan Kautz
Song Han
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
109
0
0
25 Mar 2025
An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
Laura Balzano
Tianjiao Ding
B. Haeffele
Soo Min Kwon
Qing Qu
Peng Wang
Z. Wang
Can Yaras
OffRL
AI4CE
60
0
0
25 Mar 2025
Out-of-distribution evaluations of channel agnostic masked autoencoders in fluorescence microscopy
Christian John Hurry
Jinjie Zhang
Olubukola Ishola
Emma Slade
Cuong Q. Nguyen
OOD
OODD
60
0
0
24 Mar 2025
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han
Yuan Tang
Jinfeng Xu
Xianzhi Li
51
0
0
24 Mar 2025
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian
Hanting Chen
Mengyu Zheng
Yuchen Liang
Chao Xu
Yunhe Wang
54
0
0
24 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zheng Liu
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
86
0
0
24 Mar 2025
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Boqi Chen
Cédric Vincent-Cuaz
Lydia A. Schoenpflug
Manuel Madeira
Lisa Fournier
...
D. Thanou
V. Koelzer
Pascal Frossard
Gabriele Campanella
Gunnar Rätsch
46
0
0
24 Mar 2025
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation
Qin Wang
Benjamin Bruns
Hanno Scharr
Kai Krajsek
53
0
0
24 Mar 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang
Chang-Shu Liu
Jin Wei
Xiaomeng Yang
Yu Zhou
Can Ma
Xiangyang Ji
60
2
0
24 Mar 2025
Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
Zequn Zeng
Yudi Su
Jianqiao Sun
Tiansheng Wen
Hao Zhang
Zhengjue Wang
Bo Chen
Hongwei Liu
Jiawei Ma
VLM
58
0
0
24 Mar 2025
HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
Mingzhen Huang
Fu-Jen Chu
Bugra Tekin
Kevin J Liang
Haoyu Ma
...
Hongfei Xue
Siwei Lyu
Kris M. Kitani
Matt Feiszli
Hao Tang
VLM
65
0
0
24 Mar 2025
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao
Jinlong Li
Shuang Wang
Mengyao Wu
Qi Zang
N. Sebe
Zhun Zhong
117
0
0
23 Mar 2025
Interpretable Feature Interaction via Statistical Self-supervised Learning on Tabular Data
Xiaochen Zhang
Haoyi Xiong
34
0
0
23 Mar 2025
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li
Qi Ma
Runyi Yang
Huapeng Li
Mengjiao Ma
...
E. Konukoglu
Theo Gevers
Luc Van Gool
Martin R. Oswald
Danda Pani Paudel
3DGS
VLM
71
0
0
23 Mar 2025
Leveraging Audio Representations for Vibration-Based Crowd Monitoring in Stadiums
Yen Cheng Chang
Jesse Codling
Yiwen Dong
J. Zhang
Jiasi Chen
Hae Young Noh
Pei Zhang
51
0
0
22 Mar 2025
EasyRobust: A Comprehensive and Easy-to-use Toolkit for Robust and Generalized Vision
Xiaofeng Mao
YueFeng Chen
Rong Zhang
Hui Xue
Zhao Li
Hang Su
AAML
VLM
41
0
0
21 Mar 2025
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Vishwesh Ramanathan
Tony Xu
Pushpak Pati
Faruk Ahmed
Maged Goubran
Anne L. Martel
43
0
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
47
1
0
21 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
51
1
0
21 Mar 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
42
1
0
21 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
57
0
0
21 Mar 2025
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
45
0
0
21 Mar 2025
Should we pre-train a decoder in contrastive learning for dense prediction tasks?
S. Quetin
Tapotosh Ghosh
Farhad Maleki
SSL
72
0
0
21 Mar 2025
MapGlue: Multimodal Remote Sensing Image Matching
Peihao Wu
Yongxiang Yao
Wenfei Zhang
Dong Wei
Y. Wan
Yansheng Li
Yongjun Zhang
44
0
0
20 Mar 2025
GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations
Z. Liu
Fan Zhang
Junfeng Jiao
Ni Lao
Gengchen Mai
47
1
0
20 Mar 2025
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim
Gwangtak Bae
E. Lee
Young Min Kim
3DPC
3DV
60
0
0
20 Mar 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang
Tingwei Gao
Jie Shao
Zuxuan Wu
64
0
0
20 Mar 2025
Tokenize Image as a Set
Zigang Geng
Mengde Xu
Han Hu
Shuyang Gu
DiffM
48
0
0
20 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
39
2
0
20 Mar 2025
Object-Centric Pretraining via Target Encoder Bootstrapping
Nikola Đukić
Tim Lebailly
Tinne Tuytelaars
OCL
66
0
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
P. Radeva
52
0
0
19 Mar 2025
Transport-Related Surface Detection with Machine Learning: Analyzing Temporal Trends in Madrid and Vienna
Miguel Ureña Pliego
Rubén Martínez Marín
Nianfang Shi
Takeru Shibayama
Ulrich Leth
Miguel Marchamalo Sacristán
53
0
0
19 Mar 2025
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Yang Liu
Wentao Feng
Zhuoyao Liu
Shudong Huang
Jiancheng Lv
DiffM
VLM
51
0
0
19 Mar 2025
CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
Masud Ahmed
Zahid Hasan
Syed Arefinul Haque
A. Faridee
S. Purushotham
Suya You
Nirmalya Roy
48
0
0
19 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM
CLIP
MLLM
95
3
0
19 Mar 2025
Representational Similarity via Interpretable Visual Concepts
Neehar Kondapaneni
Oisin Mac Aodha
Pietro Perona
DRL
130
0
0
19 Mar 2025
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta
Meng Zheng
Zhongpai Gao
Benjamin Planche
Anwesha Choudhuri
Terrence Chen
A. Roy-Chowdhury
Ziyan Wu
3DH
69
1
0
19 Mar 2025
FedSCA: Federated Tuning with Similarity-guided Collaborative Aggregation for Heterogeneous Medical Image Segmentation
Yumin Zhang
Yan Gao
Haoran Duan
Hanqing Guo
Tejal Shah
R. Ranjan
Bo Wei
FedML
68
0
0
19 Mar 2025
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
J. Wang
Weiwei Song
Hao Chen
J. Ren
Huimin Zhao
62
0
0
18 Mar 2025
Deeply Supervised Flow-Based Generative Models
Inkyu Shin
Chenglin Yang
Liang-Chieh Chen
58
0
0
18 Mar 2025
Utilization of Neighbor Information for Image Classification with Different Levels of Supervision
Gihan Jayatilaka
Abhinav Shrivastava
M. Gwilliam
59
0
0
18 Mar 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Haozhe Si
Yuxuan Wan
Minh Do
Deepak Vasisht
Han Zhao
Hendrik Hamann
41
0
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
48
0
0
17 Mar 2025
Graph Generative Models Evaluation with Masked Autoencoder
Chengen Wang
Murat Kantarcioglu
46
0
0
17 Mar 2025