ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.06377
  4. Cited By
Masked Autoencoders Are Scalable Vision Learners
v1v2v3 (latest)

Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViTTPM
ArXiv (abs)PDFHTML

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Title
Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations
Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations
Tharun Anand
Siva Sankar
Pravin Nair
AAML
99
0
0
28 Mar 2025
Efficient Building Roof Type Classification: A Domain-Specific Self-Supervised Approach
Efficient Building Roof Type Classification: A Domain-Specific Self-Supervised Approach
Guneet Mutreja
Ksenia Bittner
74
0
0
28 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Yiming Lei
Chenkai Zhang
Zeming Liu
Qingjie Liu
Yansen Wang
147
2
0
28 Mar 2025
Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery
Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery
Samira Alkaee Taleghan
Morteza Karimzadeh
A. Barrett
Walter N. Meier
F. Banaei-Kashani
137
0
0
28 Mar 2025
MedCL: Learning Consistent Anatomy Distribution for Scribble-supervised Medical Image Segmentation
MedCL: Learning Consistent Anatomy Distribution for Scribble-supervised Medical Image Segmentation
Ke Zhang
Vishal M. Patel
99
0
0
28 Mar 2025
Test-Time Visual In-Context Tuning
Test-Time Visual In-Context Tuning
Jiahao Xie
A. Tonioni
N. Rauschmayr
F. Tombari
Bernt Schiele
OODVLM
85
0
0
27 Mar 2025
Delving Deep into Semantic Relation Distillation
Delving Deep into Semantic Relation Distillation
Zhaoyi Yan
Kangjun Liu
Qixiang Ye
86
0
0
27 Mar 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
117
0
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
Wentao Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
459
6
0
27 Mar 2025
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li
Yunxing Liu
Xinyu Wang
Yunning Peng
Chen Sun
...
Tian Ke
Xiao Jiang
Tangwei Lu
Anran Zhao
Yanfei Zhong
VLM
108
1
0
27 Mar 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
515
0
0
26 Mar 2025
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen
Lingting Zhu
Zeyu Hu
Shengju Qian
Yuxiao Chen
Xin Wang
G. Lee
202
2
0
26 Mar 2025
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Mohamed Afane
Gabrielle Ebbrecht
Ying Wang
Juntao Chen
Junaid Farooq
68
0
0
26 Mar 2025
RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy
RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy
Oren Z. Kraus
Federico Comitani
John Urbanik
Kian Kenyon-Dean
Lakshmanan Arumugam
Saber Saberian
Cas Wognum
Safiye Celik
I. Haque
127
1
0
26 Mar 2025
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou
Hui Ren
Yijia Weng
Shuwang Zhang
Zhen Wang
...
Zhiwen Fan
Suya You
Ziyi Wang
Leonidas Guibas
A. Kadambi
VGen3DGS
148
3
0
26 Mar 2025
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
LRSCLIP: A Vision-Language Foundation Model for Aligning Remote Sensing Image with Longer Text
Weizhi Chen
Jingbo Chen
Yupeng Deng
Jiansheng Chen
Yuman Feng
Zhihao Xi
Diyou Liu
Kai Li
Yu Meng
VLM
102
1
0
25 Mar 2025
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Self-Supervised Learning of Motion Concepts by Optimizing Counterfactuals
Stefan Stojanov
David Wendt
Seungwoo Kim
R. Venkatesh
Kevin T. Feigelis
Jiajun Wu
Daniel L. K. Yamins
SSL
99
0
0
25 Mar 2025
Scaling Vision Pre-Training to 4K Resolution
Scaling Vision Pre-Training to 4K Resolution
Baifeng Shi
Boyi Li
Han Cai
Yaojie Lu
Sifei Liu
...
Jan Kautz
Enze Xie
Trevor Darrell
Pavlo Molchanov
Hongxu Yin
CLIP
411
0
0
25 Mar 2025
Recover from Horcrux: A Spectrogram Augmentation Method for Cardiac Feature Monitoring from Radar Signal Components
Recover from Horcrux: A Spectrogram Augmentation Method for Cardiac Feature Monitoring from Radar Signal Components
Yize Zhang
Sijie Xiong
Rui Yang
EngGee Lim
Yutao Yue
94
0
0
25 Mar 2025
Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings
Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings
Chengan Che
Chao Wang
Tom Vercauteren
Sophia Tsoka
Luis C. Garcia-Peraza-Herrera
MedIm
84
1
0
25 Mar 2025
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Vanishing Depth: A Depth Adapter with Positional Depth Encoding for Generalized Image Encoders
Paul Koch
Jörg Krüger
Ankit Chowdhury
O. Heimann
MDE
99
0
0
25 Mar 2025
An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
An Overview of Low-Rank Structures in the Training and Adaptation of Large Models
Laura Balzano
Tianjiao Ding
B. Haeffele
Soo Min Kwon
Qing Qu
Peng Wang
Ziyi Wang
Can Yaras
OffRLAI4CE
88
2
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
78
0
0
25 Mar 2025
Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
Explaining Domain Shifts in Language: Concept erasing for Interpretable Image Classification
Zequn Zeng
Yudi Su
Jianqiao Sun
Tiansheng Wen
Hao Zhang
Zhengjue Wang
Bo Chen
Hongwei Liu
Jiawei Ma
VLM
164
0
0
24 Mar 2025
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Linguistics-aware Masked Image Modeling for Self-supervised Scene Text Recognition
Yifei Zhang
Chang-Shu Liu
Jin Wei
Xiaomeng Yang
Yu Zhou
Can Ma
Xiangyang Ji
104
3
0
24 Mar 2025
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation
Self-Supervised Learning based on Transformed Image Reconstruction for Equivariance-Coherent Feature Representation
Qin Wang
Benjamin Bruns
Hanno Scharr
Kai Krajsek
99
0
0
24 Mar 2025
HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
HOIGPT: Learning Long Sequence Hand-Object Interaction with Language Models
Mingzhen Huang
Fu-Jen Chu
Bugra Tekin
Kevin J. Liang
Haoyu Ma
...
Hongfei Xue
Siwei Lyu
Kris Kitani
Matt Feiszli
Hao Tang
VLM
118
4
0
24 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGenVLM
274
9
0
24 Mar 2025
U-REPA: Aligning Diffusion U-Nets to ViTs
U-REPA: Aligning Diffusion U-Nets to ViTs
Yuchuan Tian
Hanting Chen
Mengyu Zheng
Yuchen Liang
Chao Xu
Yunhe Wang
112
2
0
24 Mar 2025
Out-of-distribution evaluations of channel agnostic masked autoencoders in fluorescence microscopy
Out-of-distribution evaluations of channel agnostic masked autoencoders in fluorescence microscopy
Christian John Hurry
Jinjie Zhang
Olubukola Ishola
Emma Slade
Cuong Q. Nguyen
OODOODD
89
0
0
24 Mar 2025
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Revisiting Automatic Data Curation for Vision Foundation Models in Digital Pathology
Boqi Chen
Cédric Vincent-Cuaz
Lydia A. Schoenpflug
Manuel Madeira
Lisa Fournier
...
D. Thanou
V. Koelzer
Pascal Frossard
Gabriele Campanella
Gunnar Rätsch
97
1
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
211
1
0
24 Mar 2025
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
MoST: Efficient Monarch Sparse Tuning for 3D Representation Learning
Xu Han
Yuan Tang
Jinfeng Xu
Xianzhi Li
97
0
0
24 Mar 2025
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li
Qi Ma
Runyi Yang
Huapeng Li
Mengjiao Ma
...
E. Konukoglu
Theo Gevers
Luc Van Gool
Martin R. Oswald
Danda Pani Paudel
3DGSVLM
235
2
0
23 Mar 2025
Interpretable Feature Interaction via Statistical Self-supervised Learning on Tabular Data
Interpretable Feature Interaction via Statistical Self-supervised Learning on Tabular Data
Xiaochen Zhang
Haoyi Xiong
66
0
0
23 Mar 2025
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao
Jinlong Li
Shuang Wang
Mengyao Wu
Qi Zang
N. Sebe
Zhun Zhong
461
1
0
23 Mar 2025
Leveraging Audio Representations for Vibration-Based Crowd Monitoring in Stadiums
Leveraging Audio Representations for Vibration-Based Crowd Monitoring in Stadiums
Yen Cheng Chang
Jesse Codling
Yiwen Dong
Junxuan Zhang
Jiasi Chen
Hae Young Noh
Pei Zhang
168
0
0
22 Mar 2025
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Seeing What Matters: Empowering CLIP with Patch Generation-to-Selection
Gensheng Pei
Tao Chen
Yujia Wang
Xinhao Cai
Xiangbo Shu
Tianfei Zhou
Yazhou Yao
VLM
95
1
0
21 Mar 2025
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Beyond Accuracy: What Matters in Designing Well-Behaved Models?
Robin Hesse
Doğukan Bağcı
Bernt Schiele
Simone Schaub-Meyer
Stefan Roth
VLM
112
0
0
21 Mar 2025
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
ModalTune: Fine-Tuning Slide-Level Foundation Models with Multi-Modal Information for Multi-task Learning in Digital Pathology
Vishwesh Ramanathan
Tony Xu
Pushpak Pati
Faruk Ahmed
Maged Goubran
Anne L. Martel
80
0
0
21 Mar 2025
Should we pre-train a decoder in contrastive learning for dense prediction tasks?
Should we pre-train a decoder in contrastive learning for dense prediction tasks?
S. Quetin
Tapotosh Ghosh
Farhad Maleki
SSL
110
0
0
21 Mar 2025
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Missing Target-Relevant Information Prediction with World Model for Accurate Zero-Shot Composed Image Retrieval
Yuanmin Tang
Jing Yu
Keke Gai
Jiamin Zhuang
Gang Xiong
Gaopeng Gou
Qi Wu
VGen
176
2
0
21 Mar 2025
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
D2C: Unlocking the Potential of Continuous Autoregressive Image Generation with Discrete Tokens
Panpan Wang
Liqiang Niu
Fandong Meng
Jinan Xu
Yufeng Chen
Jie Zhou
DiffM
108
0
0
21 Mar 2025
EasyRobust: A Comprehensive and Easy-to-use Toolkit for Robust and Generalized Vision
EasyRobust: A Comprehensive and Easy-to-use Toolkit for Robust and Generalized Vision
Xiaofeng Mao
YueFeng Chen
Rong Zhang
Hui Xue
Zhao Li
Hang Su
AAMLVLM
81
0
0
21 Mar 2025
Halton Scheduler For Masked Generative Image Transformer
Halton Scheduler For Masked Generative Image Transformer
Victor Besnier
Mickael Chen
David Hurych
Eduardo Valle
Matthieu Cord
101
3
0
21 Mar 2025
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
78
3
0
20 Mar 2025
GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations
GAIR: Improving Multimodal Geo-Foundation Model with Geo-Aligned Implicit Representations
Ziqiang Liu
Fan Zhang
Junfeng Jiao
Ni Lao
Gengchen Mai
91
2
0
20 Mar 2025
Tokenize Image as a Set
Tokenize Image as a Set
Zigang Geng
Mengde Xu
Han Hu
Shuyang Gu
DiffM
82
0
0
20 Mar 2025
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Learning 3D Scene Analogies with Neural Contextual Scene Maps
Junho Kim
Gwangtak Bae
E. Lee
Young Min Kim
3DPC3DV
99
0
0
20 Mar 2025
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
BlockDance: Reuse Structurally Similar Spatio-Temporal Features to Accelerate Diffusion Transformers
Hui Zhang
Tingwei Gao
Jie Shao
Zuxuan Wu
117
2
0
20 Mar 2025
Previous
123...789...949596
Next