Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT
    TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,611 papers shown
Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining
Weizhen He
Yunfeng Yan
Shixiang Tang
Yiheng Deng
Yangyang Zhong
Pengxin Luo
Donglian Qi
VLM
86
1
0
29 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
Jingfeng Guo
J. Chen
Weikai Chen
Zhenyu Sun
Lanjiong Li
Baozhu Zhao
Lingting Zhu
X. Wang
Qi Liu
3DH
80
0
0
29 Apr 2025
LR-IAD:Mask-Free Industrial Anomaly Detection with Logical Reasoning
Peijian Zeng
Feiyan Pang
Zhanbo Wang
Aimin Yang
71
0
0
28 Apr 2025
Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities
Xi Fu
Wei-Bang Jiang
Yi Ding
Cuntai Guan
41
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
84
0
0
28 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
S.
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
51
0
0
27 Apr 2025
HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease
Qiuhui Chen
Jintao Wang
Gang Wang
Yi Hong
47
0
0
27 Apr 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
113
0
0
27 Apr 2025
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data
Manuel Weber
Carly Beneke
ViT
61
0
0
26 Apr 2025
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng
Feishi Wang
Songlin Wei
Y. Li
Bangjun Wang
...
Hao Dong
Siyuan Huang
Yue Wang
Jitendra Malik
Pieter Abbeel
80
4
0
26 Apr 2025
What is the Added Value of UDA in the VFM Era?
B. B. Englert
Tommie Kerssies
Gijs Dubbelman
39
0
0
25 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
99
0
0
25 Apr 2025
A BERT-Style Self-Supervised Learning CNN for Disease Identification from Retinal Images
Xin Li
Wenhui Zhu
Peijie Qiu
Oana Dumitrascu
Amal Youssef
Y. Wang
SSL
MedIm
87
0
0
25 Apr 2025
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
Elena Plekhanova
Damien Robert
Johannes Dollinger
Emilia Arens
Philipp Brun
Jan Dirk Wegner
Niklaus Zimmermann
19
0
0
25 Apr 2025
CIVIL: Causal and Intuitive Visual Imitation Learning
Yinlong Dai
Robert Ramirez Sanchez
Ryan Jeronimus
Shahabedin Sagheb
Cara M. Nunez
Heramb Nemlekar
Dylan P. Losey
61
0
0
24 Apr 2025
Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images
Zebo Huang
Yinghui Wang
MDE
28
0
0
24 Apr 2025
Fine-tune Smarter, Not Harder: Parameter-Efficient Fine-Tuning for Geospatial Foundation Models
Francesc Marti Escofet
Benedikt Blumenstiel
L. Scheibenreif
P. Fraccaro
Konrad Schindler
41
0
0
24 Apr 2025
A Simple Review of EEG Foundation Models: Datasets, Advancements and Future Perspectives
Junhong Lai
Jiyu Wei
Lin Yao
Yueming Wang
38
0
0
24 Apr 2025
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
36
0
0
24 Apr 2025
Prompt-Tuning SAM: From Generalist to Specialist with only 2048 Parameters and 16 Training Images
Tristan Piater
Björn Barz
Alexander Freytag
VLM
MedIm
55
0
0
23 Apr 2025
Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
Max Kirchner
Alexander C. Jenke
S. Bodenstedt
F. Kolbinger
Oliver Saldanha
Jakob N. Kather
M. Wagner
Stefanie Speidel
FedML
MedIm
62
0
0
23 Apr 2025
MTSGL: Multi-Task Structure Guided Learning for Robust and Interpretable SAR Aircraft Recognition
Qishan He
Lingjun Zhao
Ru Luo
Siqian Zhang
Lin Lei
Kefeng Ji
Gangyao Kuang
22
0
0
23 Apr 2025
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro
VLM
31
10
0
22 Apr 2025
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures
Max Hartman
L. Varshney
22
0
0
22 Apr 2025
ForesightNav: Learning Scene Imagination for Efficient Exploration
Hardik Shah
Jiaxu Xing
Nico Messikommer
Boyang Sun
Marc Pollefeys
Davide Scaramuzza
67
0
0
22 Apr 2025
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang
Xiaolu Liu
Lingdong Kong
Jianyun Xu
Chunyong Hu
Gongfan Fang
Wentong Li
Jianke Zhu
Xinchao Wang
22
0
0
22 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
26
0
0
22 Apr 2025
OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning
Anirudhan Badrinath
Alex Yang
Kousik Rajesh
Prabhat Agarwal
Jaewon Yang
Haoyu Chen
Jiajing Xu
Charles R. Rosenberg
AI4TS
27
0
0
22 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro
LRM
29
0
0
22 Apr 2025
Can We Ignore Labels In Out of Distribution Detection?
Hong Yang
Qi Yu
Travis Desel
OODD
36
0
0
20 Apr 2025
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
Liang Peng
Boxi Wu
Haoran Cheng
Yibo Zhao
Xiaofei He
29
0
0
20 Apr 2025
Exploring Generalizable Pre-training for Real-world Change Detection via Geometric Estimation
Yitao Zhao
Sen Lei
Nanqing Liu
Heng Li
Turgay Celik
Qing Zhu
24
0
0
19 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
41
0
0
19 Apr 2025
DAM-Net: Domain Adaptation Network with Micro-Labeled Fine-Tuning for Change Detection
H. Chen
Xin Xu
Fangling Pu
28
0
0
18 Apr 2025
BeetleVerse: A study on taxonomic classification of ground beetles
S M Rayeed
Alyson East
Samuel Stevens
Sydne Record
Charles V. Stewart
21
0
0
18 Apr 2025
6G WavesFM: A Foundation Model for Sensing, Communication, and Localization
Ahmed Aboulfotouh
E. Mohammed
Hatem Abou-Zeid
24
0
0
18 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue
Yulin Wang
Chenxin Tao
Pan Liu
Shiji Song
Gao Huang
MedIm
24
0
0
18 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
X. Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
29
0
0
17 Apr 2025
PSG-MAE: Robust Multitask Sleep Event Monitoring using Multichannel PSG Reconstruction and Inter-channel Contrastive Learning
Yifei Wang
Qi Liu
Fuli Min
Honghao Wang
17
0
0
17 Apr 2025
SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling
Yasin Almalioglu
Andrzej Kucik
Geoffrey French
Dafni Antotsiou
Alexander Adam
Cedric Archambeau
21
0
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
103
0
0
17 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
27
0
0
17 Apr 2025
Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond
Yundi Zhang
Paul Hager
Che Liu
Suprosanna Shit
C. L. P. Chen
Daniel Rueckert
Jiazhen Pan
40
0
0
17 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue
Yulin Wang
Haojun Jiang
Pan Liu
S. Song
Gao Huang
VGen
27
0
0
17 Apr 2025
LIFT+: Lightweight Fine-Tuning for Long-Tail Learning
Jiang-Xin Shi
Tong Wei
Yu-Feng Li
25
0
0
17 Apr 2025
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Shumin Wang
Zhuoran Yang
L. Wang
Zhipeng Tang
Heng Li
Lehan Pan
Sha Zhang
Jie Peng
J. Ji
Y. Zhang
3DPC
41
0
0
17 Apr 2025
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
Yuhao Chao
Jie Liu
J. Tang
Gangshan Wu
25
1
0
16 Apr 2025
Generative Recommendation with Continuous-Token Diffusion
Haohao Qu
Wenqi Fan
Shanru Lin
DiffM
84
0
0
16 Apr 2025
SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction
Xia Wang
Haiyang Sun
Tiantian Cao
Yueying Sun
Min Feng
DiffM
37
0
0
16 Apr 2025
A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning
M. D. Wang
Hanbo Bi
Yingchao Feng
Linlin Xin
Shuo Gong
Tianqi Wang
Zhiyuan Yan
Peijin Wang
Wenhui Diao
Xian Sun
29
0
0
16 Apr 2025