Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
TerraFM: A Scalable Foundation Model for Unified Multisensor Earth Observation
M. S. Danish
Muhammad Akhtar Munir
Syed Aziz Shah
M. H. Khan
Rao Muhammad Anwer
Jorma T. Laaksonen
Fahad Shahbaz Khan
Salman Khan
66
0
0
06 Jun 2025
Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration
Fanhu Zeng
Deli Yu
Zhenglun Kong
Hao Tang
ViT
56
1
0
06 Jun 2025
Perfecting Depth: Uncertainty-Aware Enhancement of Metric Depth
Jinyoung Jun
Lei Chu
Jiahao Li
Yan Lu
Chang-Su Kim
MDE
133
0
0
05 Jun 2025
SAM-aware Test-time Adaptation for Universal Medical Image Segmentation
Jianghao Wu
Yicheng Wu
Yutong Xie
Wenjia Bai
You Zhang
Feilong Tang
Yulong Li
Yasmeen George
Imran Razzak
MedIm
162
0
0
05 Jun 2025
Using In-Context Learning for Automatic Defect Labelling of Display Manufacturing Data
Babar Hussain
Qiang Liu
Gang Chen
Bihai She
Dahai Yu
158
0
0
05 Jun 2025
VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
W. Li
Zhu Yu
Alexandre Alahi
3DPC
124
0
0
05 Jun 2025
FRAME: Pre-Training Video Feature Representations via Anticipation and Memory
Sethuraman TV
Savya Khosla
Vignesh Srinivasakumar
Jiahui Huang
Seoung Wug Oh
Simon Jenni
Derek Hoiem
Joon-Young Lee
36
0
0
05 Jun 2025
Fine-Tuning Video Transformers for Word-Level Bangla Sign Language: A Comparative Analysis for Classification Tasks
Jubayer Ahmed Bhuiyan Shawon
H. Mahmud
Kamrul Hasan
43
0
0
04 Jun 2025
Object-level Self-Distillation for Vision Pretraining
Çağlar Hızlı
Çağatay Yıldız
Pekka Marttinen
OCL, VLM
48
0
0
04 Jun 2025
A Generative Adaptive Replay Continual Learning Model for Temporal Knowledge Graph Reasoning
Zhiyu Zhang
Wei Chen
Youfang Lin
Huaiyu Wan
OffRL, CLL
111
0
0
04 Jun 2025
HMAR: Efficient Hierarchical Masked Auto-Regressive Image Generation
Hermann Kumbong
Xian Liu
Tsung-Yi Lin
Ming-Yu Liu
Xihui Liu
Ziwei Liu
Daniel Y. Fu
Christopher Ré
David W. Romero
DiffM
48
0
0
04 Jun 2025
Enhancing Neural Autoregressive Distribution Estimators for Image Reconstruction
Ambrose Emmett-Iwaniw
Nathan Kirk
15
0
0
03 Jun 2025
Efficient Tactile Perception with Soft Electrical Impedance Tomography and Pre-trained Transformer
Huazhi Dong
Ronald B. Liu
Sihao Teng
Delin Hu
Peisan
F. G. Serchi
Yunjie Yang
39
0
0
03 Jun 2025
Zero-Shot Tree Detection and Segmentation from Aerial Forest Imagery
Michelle Chen
David Russell
Amritha Pallavoor
Derek Young
Jane Wu
VLM
57
0
0
03 Jun 2025
Large-scale Self-supervised Video Foundation Model for Intelligent Surgery
Shu Yang
F. Zhou
Leon D. Mayer
Fuxiang Huang
Yiliang Chen
...
Zheng Li
Jing Qin
J. Teoh
Lena Maier-Hein
Hao-tao Chen
75
0
0
03 Jun 2025
FORLA: Federated Object-centric Representation Learning with Slot Attention
Guiqiu Liao
M. Jogan
Eric Eaton
Daniel A. Hashimoto
FedML
64
0
0
03 Jun 2025
MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection
Juntong Li
Lingwei Dang
Yukun Su
Yun Hao
Qingxin Xiao
Yongwei Nie
Qingyao Wu
64
0
0
03 Jun 2025
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia
Zekun Qi
Shaochen Zhang
Wenyao Zhang
Xinqiang Yu
Jiawei He
He Wang
L. Yi
LRM, VLM
50
0
0
03 Jun 2025
VLCD: Vision-Language Contrastive Distillation for Accurate and Efficient Automatic Placenta Analysis
Manas Mehta
Yimu Pan
Kelly Gallagher
Alison D. Gernand
Jeffery A. Goldstein
Delia Mwinyelle
Leena Mithal
J. Z. Wang
24
0
0
02 Jun 2025
MoCA: Multi-modal Cross-masked Autoencoder for Digital Health Measurements
Howon Ryu
Y. Chen
Yacun Wang
Andrea Z. LaCroix
Chongzhi Di
L. Natarajan
Yu Wang
Jingjing Zou
26
0
0
02 Jun 2025
Unraveling Spatio-Temporal Foundation Models via the Pipeline Lens: A Comprehensive Review
Yuchen Fang
Hao Miao
Yuxuan Liang
Liwei Deng
Yue Cui
...
Yan Zhao
T. Pedersen
Christian S. Jensen
Xiaofang Zhou
Kai Zheng
AI4TS, AI4CE
64
0
0
02 Jun 2025
CACTI: Leveraging Copy Masking and Contextual Information to Improve Tabular Data Imputation
Aditya Gorla
Ryan Wang
Zhengtong Liu
Ulzee An
Sriram Sankararaman
30
0
0
02 Jun 2025
Self-supervised ControlNet with Spatio-Temporal Mamba for Real-world Video Super-resolution
Shijun Shi
Jing Xu
Lijing Lu
Zhihang Li
Kai Hu
37
0
0
01 Jun 2025
AuralSAM2: Enabling SAM2 Hear Through Pyramid Audio-Visual Feature Prompting
Yuyuan Liu
Yuanhong Chen
Chong Wang
Junlin Han
Junde Wu
Can Peng
Jingkun Chen
Yu Tian
Gustavo Carneiro
VLM
49
0
0
01 Jun 2025
PCoreSet: Effective Active Learning through Knowledge Distillation from Vision-Language Models
Seongjae Kang
Dong Bok Lee
Hyungjoon Jang
Dongseop Kim
Sung Ju Hwang
VLM
43
0
0
01 Jun 2025
RPRA-ADD: Forgery Trace Enhancement-Driven Audio Deepfake Detection
Ruibo Fu
Xiaopeng Wang
Zhengqi Wen
Jianhua Tao
Yuankun Xie
...
Chunyu Qiang
Xuefei Liu
Cunhang Fan
Chenxing Li
Guanjun Li
22
0
0
31 May 2025
SST: Self-training with Self-adaptive Thresholding for Semi-supervised Learning
Shuai Zhao
Heyan Huang
Xinge Li
Xiaokang Chen
Rui Wang
35
0
0
31 May 2025
IMPACT: Iterative Mask-based Parallel Decoding for Text-to-Audio Generation with Diffusion Modeling
Kuan Po Huang
Shu-Wen Yang
Huy Phan
Bo-Ru Lu
Byeonggeun Kim
...
Qingming Tang
Shalini Ghosh
Hung-yi Lee
Chieh-Chi Kao
Chao Wang
27
0
0
31 May 2025
CineMA: A Foundation Model for Cine Cardiac MRI
Yunguan Fu
Weixi Yi
C. Manisty
A. Bhuva
T. Treibel
James C. Moon
Matthew J. Clarkson
R. Davies
Yipeng Hu
VGen, MedIm
43
0
0
31 May 2025
SenseFlow: Scaling Distribution Matching for Flow-based Text-to-Image Distillation
Xingtong Ge
Xin Zhang
Tongda Xu
Yi Zhang
Xinjie Zhang
Yan Wang
Jun Zhang
DiffM
26
0
0
31 May 2025
Proxy-FDA: Proxy-based Feature Distribution Alignment for Fine-tuning Vision Foundation Models without Forgetting
Chen Huang
Skyler Seto
Hadi Pouransari
Mehrdad Farajtabar
Raviteja Vemulapalli
Fartash Faghri
Oncel Tuzel
B. Theobald
Josh Susskind
CLL
48
0
0
30 May 2025
Pretraining Deformable Image Registration Networks with Random Images
Junyu Chen
Shuwen Wei
Yihao Liu
A. Carass
Yong Du
49
0
0
30 May 2025
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh
Qiang Liu
Georg Kohl
Nils Thuerey
AI4CE
46
1
0
30 May 2025
Contrast-Invariant Self-supervised Segmentation for Quantitative Placental MRI
Xinliu Zhong
Ruiying Liu
Emily S. Nichols
Xuzhe Zhang
Andrew F. Laine
Emma G. Duerden
Yun Wang
44
0
0
30 May 2025
Federated Foundation Model for GI Endoscopy Images
Alina Devkota
Annahita Amireskandari
Joel Palko
Shyam Thakkar
Donald Adjeroh
Xiajun Jiang
Binod Bhattarai
P. Gyawali
MedIm
33
0
0
30 May 2025
Visual Embodied Brain: Let Multimodal Large Language Models See, Think, and Control in Spaces
Gen Luo
Ganlin Yang
Ziyang Gong
Guanzhou Chen
Haonan Duan
...
Wenhai Wang
Jifeng Dai
Yu Qiao
Rongrong Ji
X. Zhu
LM&Ro
27
1
0
30 May 2025
LTM3D: Bridging Token Spaces for Conditional 3D Generation with Auto-Regressive Diffusion Framework
Xin Kang
Zihan Zheng
Lei Chu
Yue Gao
Jiahao Li
Hao Pan
Xuejin Chen
Yan Lu
DiffM
33
0
0
30 May 2025
un$^2$CLIP: Improving CLIP's Visual Detail Capturing Ability via Inverting unCLIP
Yinqi Li
Jiahe Zhao
Hong Chang
Ruibing Hou
Shiguang Shan
Xilin Chen
CLIP, VLM
43
0
0
30 May 2025
Spatio-Temporal Joint Density Driven Learning for Skeleton-Based Action Recognition
Shanaka Ramesh Gunasekara
Wanqing Li
P. Ogunbona
Jack Yang
24
0
0
29 May 2025
Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning
Dionysis Christopoulos
Sotiris Spanos
Eirini Baltzi
Valsamis Ntouskos
Konstantinos Karantzalos
64
0
0
29 May 2025
KVzip: Query-Agnostic KV Cache Compression with Context Reconstruction
Jang-Hyun Kim
Jinuk Kim
S. Kwon
Jae W. Lee
Sangdoo Yun
Hyun Oh Song
MQ, VLM
57
0
0
29 May 2025
Graph Positional Autoencoders as Self-supervised Learners
Yang Liu
Deyu Bo
Wenxuan Cao
Yuan Fang
Yawen Li
C. Shi
SSL
66
1
0
29 May 2025
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Qingyu Shi
Jinbin Bai
Zhuoran Zhao
Wenhao Chai
Kaidong Yu
...
Shuangyong Song
Yunhai Tong
Xiangtai Li
X. Li
Shuicheng Yan
87
2
0
29 May 2025
Navigating the Latent Space Dynamics of Neural Models
Marco Fumero
Luca Moschella
Emanuele Rodolà
Francesco Locatello
23
0
0
28 May 2025
A Survey on Training-free Open-Vocabulary Semantic Segmentation
Naomi Kombol
Ivan Martinović
Sinisa Segvic
ObjD, VLM
73
0
0
28 May 2025
Towards Scalable Language-Image Pre-training for 3D Medical Imaging
Chenhui Zhao
Yiwei Lyu
Asadur Chowdury
Edward Harake
A. Kondepudi
Akshay Rao
X. Hou
Honglak Lee
Todd C. Hollon
LM&MA, MedIm
39
0
0
28 May 2025
From Large AI Models to Agentic AI: A Tutorial on Future Intelligent Communications
Feibo Jiang
Cunhua Pan
Li Dong
Kezhi Wang
O. Dobre
Mérouane Debbah
LLMAG, AI4TS
172
1
0
28 May 2025
Progressive Data Dropout: An Embarrassingly Simple Approach to Faster Training
S. Srinivasan
Xinyue Hao
Shihao Hou
Yang Lu
Laura Sevilla-Lara
Anurag Arnab
Shreyank N Gowda
64
0
0
28 May 2025
On Geometry-Enhanced Parameter-Efficient Fine-Tuning for 3D Scene Segmentation
Liyao Tang
Zhe Chen
Dacheng Tao
3DPC
50
0
0
28 May 2025
Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training
Bolin Lai
Sangmin Lee
Xu Cao
Xiang Li
James M. Rehg
DiffM
70
0
0
27 May 2025