ResearchTrend.AI
Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Enhancing User Sequence Modeling through Barlow Twins-based Self-Supervised Learning
Yuhan Liu
Lin Ning
Neo Wu
Karan Singhal
Philip Mansfield
D. Berlowitz
Sushant Prakash
Bradley Green
SSL
119
0
0
02 May 2025
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
447
0
0
02 May 2025
Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study
Ali Mammadov
Loic Le Folgoc
Julien Adam
Anne Buronfosse
Gilles Hayem
Guillaume Hocquet
Pietro Gori
SSL
75
0
0
02 May 2025
ViSA-Flow: Accelerating Robot Skill Learning via Large-Scale Video Semantic Action Flow
Changhe Chen
Quantao Yang
Xiaohao Xu
Nima Fazeli
Olov Andersson
122
0
0
02 May 2025
A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning
Anan Yaghmour
Melba M. Crawford
Saurabh Prasad
63
0
0
02 May 2025
Protocol-agnostic and Data-free Backdoor Attacks on Pre-trained Models in RF Fingerprinting
Tianya Zhao
Ningning Wang
Junqing Zhang
Xuyu Wang
AAML
70
0
0
01 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
Jieneng Chen
LRM
118
1
0
01 May 2025
Multimodal Masked Autoencoder Pre-training for 3D MRI-Based Brain Tumor Analysis with Missing Modalities
Lucas Robinet
Ahmad Berjaoui
Elizabeth Cohen-Jonathan Moyal
55
0
0
01 May 2025
AdCare-VLM: Leveraging Large Vision Language Model (LVLM) to Monitor Long-Term Medication Adherence and Care
Md Asaduzzaman Jabin
Hanqi Jiang
Yuchen Li
Patrick Kaggwa
Eugene Douglass
Juliet N. Sekandi
Tianming Liu
LM&MA
146
0
0
01 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xuelong Li
Guangliang Cheng
Mamba
267
0
0
01 May 2025
RoboGround: Robotic Manipulation with Grounded Vision-Language Priors
Haifeng Huang
Xinyi Chen
Yuxiao Chen
Haoyang Li
Xiaoshen Han
Zihao Wang
Tai Wang
Jiangmiao Pang
Zhou Zhao
LM&Ro
148
1
0
30 Apr 2025
Wireless Communication as an Information Sensor for Multi-agent Cooperative Perception: A Survey
Zhiying Song
Tenghui Xie
Fuxi Wen
Jun Li
79
1
0
30 Apr 2025
Insulin Resistance Prediction From Wearables and Routine Blood Biomarkers
Ahmed A. Metwally
A. Heydari
Daniel J. McDuff
Alexandru Solot
Zeinab Esmaeilpour
...
David B. Savage
C. Heneghan
Shwetak N. Patel
Cathy Speed
Javier L. Prieto
78
1
0
30 Apr 2025
Embracing Collaboration Over Competition: Condensing Multiple Prompts for Visual In-Context Learning
Jiadong Wang
Tianci Luo
Yaohua Zha
Yan Feng
Ruisheng Luo
Bin Chen
Tao Dai
Long Chen
Yaowei Wang
Shu-Tao Xia
VLM
102
0
0
30 Apr 2025
Adept: Annotation-Denoising Auxiliary Tasks with Discrete Cosine Transform Map and Keypoint for Human-Centric Pretraining
Weizhen He
Yunfeng Yan
Shixiang Tang
Yiheng Deng
Yangyang Zhong
Pengxin Luo
Donglian Qi
VLM
197
1
0
29 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
Jingfeng Guo
Jianfei Chen
Weikai Chen
Zhenyu Sun
Lanjiong Li
Baozhu Zhao
Lingting Zhu
Xinyu Wang
Qi Liu
3DH
155
0
0
29 Apr 2025
Multimodal Large Language Models for Medicine: A Comprehensive Survey
Jiarui Ye
Hao Tang
LM&MA
183
0
0
29 Apr 2025
Towards Robust Multimodal Physiological Foundation Models: Handling Arbitrary Missing Modalities
Xi Fu
Wei-Bang Jiang
Yi Ding
Cuntai Guan
118
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
164
1
0
28 Apr 2025
LR-IAD: Mask-Free Industrial Anomaly Detection with Logical Reasoning
Peijian Zeng
Feiyan Pang
Zhanbo Wang
Aimin Yang
134
0
0
28 Apr 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
439
0
0
27 Apr 2025
HoloDx: Knowledge- and Data-Driven Multimodal Diagnosis of Alzheimer's Disease
Qiuhui Chen
Jintao Wang
Gang Wang
Yi Hong
80
0
0
27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
Siyang Song
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
108
0
0
27 Apr 2025
RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning
Haoran Geng
Feishi Wang
Songlin Wei
Yuchen Li
Bangjun Wang
...
Hao Dong
Siyuan Huang
Yue Wang
Jitendra Malik
Pieter Abbeel
185
8
0
26 Apr 2025
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data
Manuel Weber
Carly Beneke
ViT
121
0
0
26 Apr 2025
What is the Added Value of UDA in the VFM Era?
B. B. Englert
Tommie Kerssies
Gijs Dubbelman
105
0
0
25 Apr 2025
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
Elena Plekhanova
Damien Robert
Johannes Dollinger
Emilia Arens
Philipp Brun
Jan Dirk Wegner
Niklaus Zimmermann
83
0
0
25 Apr 2025
A BERT-Style Self-Supervised Learning CNN for Disease Identification from Retinal Images
Xin Li
Wenhui Zhu
Peijie Qiu
Oana Dumitrascu
Amal Youssef
Yucheng Wang
SSL, MedIm
143
0
0
25 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
169
0
0
25 Apr 2025
A Genealogy of Multi-Sensor Foundation Models in Remote Sensing
Kevin Lane
Morteza Karimzadeh
81
0
0
24 Apr 2025
Fine-tune Smarter, Not Harder: Parameter-Efficient Fine-Tuning for Geospatial Foundation Models
Francesc Marti Escofet
Benedikt Blumenstiel
L. Scheibenreif
P. Fraccaro
Konrad Schindler
91
0
0
24 Apr 2025
Occlusion-Aware Self-Supervised Monocular Depth Estimation for Weak-Texture Endoscopic Images
Zebo Huang
Yinghui Wang
MDE
64
0
0
24 Apr 2025
CIVIL: Causal and Intuitive Visual Imitation Learning
Yinlong Dai
Robert Ramirez Sanchez
Ryan Jeronimus
Shahabedin Sagheb
Cara M. Nunez
Heramb Nemlekar
Dylan P. Losey
133
1
0
24 Apr 2025
A Simple Review of EEG Foundation Models: Datasets, Advancements and Future Perspectives
Junhong Lai
Jiyu Wei
Lin Yao
Yueming Wang
92
0
0
24 Apr 2025
Prompt-Tuning SAM: From Generalist to Specialist with only 2048 Parameters and 16 Training Images
Tristan Piater
Björn Barz
Alexander Freytag
VLM, MedIm
138
0
0
23 Apr 2025
Federated EndoViT: Pretraining Vision Transformers via Federated Learning on Endoscopic Image Collections
Max Kirchner
Alexander C. Jenke
S. Bodenstedt
Fiona Kolbinger
Oliver Saldanha
Jakob N. Kather
M. Wagner
Stefanie Speidel
FedML, MedIm
158
1
0
23 Apr 2025
MTSGL: Multi-Task Structure Guided Learning for Robust and Interpretable SAR Aircraft Recognition
Qishan He
Lingjun Zhao
Ru Luo
Siqian Zhang
Lin Lei
Kefeng Ji
Gangyao Kuang
103
0
0
23 Apr 2025
ForesightNav: Learning Scene Imagination for Efficient Exploration
Hardik Shah
Jiaxu Xing
Nico Messikommer
Boyang Sun
Marc Pollefeys
Davide Scaramuzza
224
1
0
22 Apr 2025
PointLoRA: Low-Rank Adaptation with Token Selection for Point Cloud Learning
Song Wang
Xiaolu Liu
Lingdong Kong
Jianyun Xu
Chunyong Hu
Gongfan Fang
Wentong Li
Jianke Zhu
Xinchao Wang
133
0
0
22 Apr 2025
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro, VLM
135
51
0
22 Apr 2025
Multimodal Perception for Goal-oriented Navigation: A Survey
I-Tak Ieong
Hao Tang
LM&Ro, LRM
102
0
0
22 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
94
0
0
22 Apr 2025
OmniSage: Large Scale, Multi-Entity Heterogeneous Graph Representation Learning
Anirudhan Badrinath
Alex Yang
Kousik Rajesh
Prabhat Agarwal
Jaewon Yang
Haoyu Chen
Jiajing Xu
Charles R. Rosenberg
AI4TS
167
1
0
22 Apr 2025
SparseJEPA: Sparse Representation Learning of Joint Embedding Predictive Architectures
Max Hartman
Lav Varshney
114
0
0
22 Apr 2025
SUDO: Enhancing Text-to-Image Diffusion Models with Self-Supervised Direct Preference Optimization
Liang Peng
Boxi Wu
Haoran Cheng
Yibo Zhao
Xiaofei He
59
0
0
20 Apr 2025
Can We Ignore Labels In Out of Distribution Detection?
Hong Yang
Qi Yu
Travis Desel
OODD
70
0
0
20 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
81
2
0
19 Apr 2025
Exploring Generalizable Pre-training for Real-world Change Detection via Geometric Estimation
Yitao Zhao
Sen Lei
Nanqing Liu
Heng-Chao Li
Turgay Celik
Qing Zhu
72
0
0
19 Apr 2025
6G WavesFM: A Foundation Model for Sensing, Communication, and Localization
Ahmed Aboulfotouh
E. Mohammed
Hatem Abou-Zeid
68
2
0
18 Apr 2025
CheXWorld: Exploring Image World Modeling for Radiograph Representation Learning
Yang Yue
Yulin Wang
Chenxin Tao
Pan Liu
Shiji Song
Gao Huang
MedIm
75
0
0
18 Apr 2025