Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
BeetleVerse: A study on taxonomic classification of ground beetles
S M Rayeed
Alyson East
Samuel Stevens
Sydne Record
Charles V. Stewart
53
0
0
18 Apr 2025
DAM-Net: Domain Adaptation Network with Micro-Labeled Fine-Tuning for Change Detection
Ningyu Zhang
Xin Xu
Fangling Pu
86
0
0
18 Apr 2025
EchoWorld: Learning Motion-Aware World Models for Echocardiography Probe Guidance
Yang Yue
Yulin Wang
Haojun Jiang
Pan Liu
S. Song
Gao Huang
VGen
114
0
0
17 Apr 2025
Self-Supervised Pre-training with Combined Datasets for 3D Perception in Autonomous Driving
Shumin Wang
Zhuoran Yang
Liwen Wang
ZhiPeng Tang
Heng Li
Lehan Pan
Sha Zhang
Jie Peng
Jianmin Ji
Y. Zhang
3DPC
100
0
0
17 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD, VOS
329
9
0
17 Apr 2025
PSG-MAE: Robust Multitask Sleep Event Monitoring using Multichannel PSG Reconstruction and Inter-channel Contrastive Learning
Yifei Wang
Qi Liu
Fuli Min
Honghao Wang
48
0
0
17 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
Xinyu Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
100
0
0
17 Apr 2025
SAR Object Detection with Self-Supervised Pretraining and Curriculum-Aware Sampling
Yasin Almalioglu
Andrzej Kucik
Geoffrey French
Dafni Antotsiou
Alexander Adam
Cedric Archambeau
79
0
0
17 Apr 2025
Can Masked Autoencoders Also Listen to Birds?
Lukas Rauch
Ilyass Moummad
René Heinrich
Alexis Joly
Bernhard Sick
Christoph Scholz
151
0
0
17 Apr 2025
LIFT+: Lightweight Fine-Tuning for Long-Tail Learning
Jiang-Xin Shi
Tong Wei
Yu-Feng Li
54
0
0
17 Apr 2025
Towards Cardiac MRI Foundation Models: Comprehensive Visual-Tabular Representations for Whole-Heart Assessment and Beyond
Yundi Zhang
Paul Hager
Che Liu
Suprosanna Shit
Chong Chen
Daniel Rueckert
Jiazhen Pan
136
1
0
17 Apr 2025
A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning
M. D. Wang
Hanbo Bi
Yingchao Feng
Linlin Xin
Shuo Gong
Tianqi Wang
Zhiyuan Yan
Peijin Wang
Wenhui Diao
Xian Sun
65
0
0
16 Apr 2025
Generative Recommendation with Continuous-Token Diffusion
Haohao Qu
Wenqi Fan
Shanru Lin
DiffM
181
1
0
16 Apr 2025
SIDME: Self-supervised Image Demoiréing via Masked Encoder-Decoder Reconstruction
Xia Wang
Haiyang Sun
Tiantian Cao
Yueying Sun
Min Feng
DiffM
88
0
0
16 Apr 2025
A Review of YOLOv12: Attention-Based Enhancements vs. Previous Versions
Rahima Khanam
Muhammad Hussain
98
0
0
16 Apr 2025
AnomalyR1: A GRPO-based End-to-end MLLM for Industrial Anomaly Detection
Yuhao Chao
Jie Liu
J. Tang
Gangshan Wu
113
2
0
16 Apr 2025
Deep Learning Approaches for Medical Imaging Under Varying Degrees of Label Availability: A Comprehensive Survey
Siteng Ma
Honghui Du
Yu An
Jing Wang
Qinqin Wang
Haochang Wu
Aonghus Lawlor
Ruihai Dong
126
0
0
15 Apr 2025
AFiRe: Anatomy-Driven Self-Supervised Learning for Fine-Grained Representation in Radiographic Images
Yihang Liu
Lianghua He
Y. Wen
Longzhen Yang
Hongzhou Chen
MedIm
137
0
0
15 Apr 2025
On the Value of Cross-Modal Misalignment in Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
Tianjiao Jiang
Zhen Zhang
Anton van den Hengel
Javen Qinfeng Shi
149
0
0
14 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
Helen Meng
233
2
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
Xuelong Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
74
5
0
14 Apr 2025
Efficient Generative Model Training via Embedded Representation Warmup
Deyuan Liu
Peng Sun
Xufeng Li
Tao Lin
72
0
0
14 Apr 2025
GFT: Gradient Focal Transformer
Boris Kriuk
Simranjit Kaur Gill
Shoaib Aslam
Amir Fakhrutdinov
94
0
0
14 Apr 2025
Causal integration of chemical structures improves representations of microscopy images for morphological profiling
Yemin Yu
Neil A. Tenenholtz
Lester W. Mackey
Ying Wei
David Alvarez-Melis
Ava P. Amini
Alex X. Lu
75
1
0
13 Apr 2025
Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention
Vasilii Korolkov
Andrey Yanchenko
VLM
76
1
0
13 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
Jing Zhang
...
Jiahui Lv
Ziqiang Liu
Tengyuan Shi
Qingjie Liu
Yansen Wang
MLLM, VLM
121
2
0
13 Apr 2025
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
Zhenru Zhang
Hao Tang
Jinhui Tang
58
0
0
12 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
141
0
0
12 Apr 2025
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu
Xucheng Wang
Xiangyang Yang
Mengyuan Liu
Dan Zeng
Hengzhou Ye
Shuiwang Li
103
0
0
12 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
89
0
0
12 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
472
0
0
11 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
89
1
0
11 Apr 2025
SARFormer -- An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data
Jonathan Prexl
M. Recla
M. Schmitt
58
0
0
11 Apr 2025
Neural Encoding and Decoding at Scale
Yizi Zhang
Yanchen Wang
Mehdi Azabou
Alexandre Andre
Zixuan Wang
Hanrui Lyu
International Brain Laboratory
Eva L. Dyer
Liam Paninski
Cole Hurwitz
AI4CE
160
1
0
11 Apr 2025
Deep Learning-based Intrusion Detection Systems: A Survey
Zhiwei Xu
Yujuan Wu
Shiheng Wang
Jiabao Gao
Tian Qiu
Ziqi Wang
Hai Wan
Xibin Zhao
65
3
0
10 Apr 2025
Deep Learning Meets Teleconnections: Improving S2S Predictions for European Winter Weather
P. Bommer
M. Kretschmer
Fiona R. Spuler
Kirill Bykov
Marina M.-C. Höhne
AI4Cl
51
1
0
10 Apr 2025
Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition
Alexander Brettmann
Jakob Grävinghoff
Marlene Rüschoff
Marie Westhues
SLR
86
0
0
10 Apr 2025
Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases
Andrés Bell-Navas
M. Villalba-Orero
Enrique Lara Pezzi
J. Garicano-Mena
S. L. Clainche
215
0
0
10 Apr 2025
DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction
Xu Zhao
Pengju Zhang
Bo Liu
Yihong Wu
95
0
0
10 Apr 2025
Learning Object Focused Attention
Vivek Trivedy
A. Almalki
Longin Jan Latecki
86
0
0
10 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
101
0
0
10 Apr 2025
Self-Bootstrapping for Versatile Test-Time Adaptation
Shuaicheng Niu
Guohao Chen
P. Zhao
Tianyi Wang
Pengcheng Wu
Zhiqi Shen
ViT, TTA
133
0
0
10 Apr 2025
Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs
Urszula Czerwinska
Cenk Bircanoglu
Jeremy Chamoux
66
0
0
10 Apr 2025
Evolutionary algorithms meet self-supervised learning: a comprehensive survey
Adriano Vinhas
João Correia
Penousal Machado
SSL, SyDa
125
0
0
09 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL, 3DPC
128
0
0
09 Apr 2025
A Comparison of Deep Learning Methods for Cell Detection in Digital Cytology
Marco Acerbis
Natasa Sladoje
Joakim Lindblad
53
0
0
09 Apr 2025
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
Ashutosh Chaubey
Xulang Guan
Mohammad Soleymani
CVBM, MLLM, VLM
179
0
0
09 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT, SSL
118
0
0
08 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
60
0
0
08 Apr 2025
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus
Carl Doersch
Yi Yang
Skanda Koppula
Viorica Patraucean
Xu He
Ignacio Rocco
Mehdi S. M. Sajjadi
Sarath Chandar
Ross Goroshin
89
0
0
08 Apr 2025