Masked Autoencoders Are Scalable Vision Learners


11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT
    TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,611 papers shown
A Complex-valued SAR Foundation Model Based on Physically Inspired Representation Learning
M. D. Wang
Hanbo Bi
Yingchao Feng
Linlin Xin
Shuo Gong
Tianqi Wang
Zhiyuan Yan
Peijin Wang
Wenhui Diao
Xian Sun
29
0
0
16 Apr 2025
AFiRe: Anatomy-Driven Self-Supervised Learning for Fine-Grained Representation in Radiographic Images
Yihang Liu
Lianghua He
Y. Wen
Longzhen Yang
Hongzhou Chen
MedIm
29
0
0
15 Apr 2025
Deep Learning Approaches for Medical Imaging Under Varying Degrees of Label Availability: A Comprehensive Survey
Siteng Ma
Honghui Du
Yu An
Jing Wang
Qinqin Wang
Haochang Wu
Aonghus Lawlor
Ruihai Dong
32
0
0
15 Apr 2025
Negate or Embrace: On How Misalignment Shapes Multimodal Representation Learning
Yichao Cai
Yuhang Liu
Erdun Gao
T. Jiang
Zhen Zhang
Anton van den Hengel
J. Shi
55
0
0
14 Apr 2025
The Scalability of Simplicity: Empirical Analysis of Vision-Language Learning with a Single Transformer
Weixian Lei
Jiacong Wang
Haochen Wang
X. Li
Jun Hao Liew
Jiashi Feng
Zilong Huang
26
2
0
14 Apr 2025
GFT: Gradient Focal Transformer
Boris Kriuk
Simranjit Kaur Gill
Shoaib Aslam
Amir Fakhrutdinov
31
0
0
14 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
H. Meng
85
0
0
14 Apr 2025
Efficient Generative Model Training via Embedded Representation Warmup
Deyuan Liu
Peng Sun
Xufeng Li
Tao Lin
21
0
0
14 Apr 2025
Automatic Detection of Intro and Credits in Video using CLIP and Multihead Attention
Vasilii Korolkov
Andrey Yanchenko
VLM
38
0
0
13 Apr 2025
Vision-Language Model for Object Detection and Segmentation: A Review and Evaluation
Yongchao Feng
Yajie Liu
Shuai Yang
Wenrui Cai
J. Zhang
...
Jiahui Lv
Z. Liu
Tengyuan Shi
Qingjie Liu
Y. Wang
MLLM
VLM
55
1
0
13 Apr 2025
Causal integration of chemical structures improves representations of microscopy images for morphological profiling
Yemin Yu
Neil A. Tenenholtz
Lester W. Mackey
Ying Wei
David Alvarez-Melis
Ava P. Amini
Alex X. Lu
29
0
0
13 Apr 2025
Learning Occlusion-Robust Vision Transformers for Real-Time UAV Tracking
You Wu
Xucheng Wang
Xiangyang Yang
Mengyuan Liu
Dan Zeng
Hengzhou Ye
Shuiwang Li
29
0
0
12 Apr 2025
Multi-scale Activation, Refinement, and Aggregation: Exploring Diverse Cues for Fine-Grained Bird Recognition
Z. Zhang
Hao Tang
Jinhui Tang
19
0
0
12 Apr 2025
Evolved Hierarchical Masking for Self-Supervised Learning
Zhanzhou Feng
Shiliang Zhang
37
0
0
12 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
25
0
0
12 Apr 2025
SARFormer -- An Acquisition Parameter Aware Vision Transformer for Synthetic Aperture Radar Data
Jonathan Prexl
M. Recla
M. Schmitt
29
0
0
11 Apr 2025
FocalLens: Instruction Tuning Enables Zero-Shot Conditional Image Representations
Cheng-Yu Hsieh
Pavan Kumar Anasosalu Vasu
Fartash Faghri
Raviteja Vemulapalli
Chun-Liang Li
Ranjay Krishna
Oncel Tuzel
Hadi Pouransari
VLM
123
0
0
11 Apr 2025
Neural Encoding and Decoding at Scale
Yizi Zhang
Yanchen Wang
Mehdi Azabou
Alexandre Andre
Zixuan Wang
Hanrui Lyu
International Brain Laboratory
Eva L. Dyer
Liam Paninski
Cole Hurwitz
AI4CE
29
0
0
11 Apr 2025
GigaTok: Scaling Visual Tokenizers to 3 Billion Parameters for Autoregressive Image Generation
Tianwei Xiong
Jun Hao Liew
Zilong Huang
Jiashi Feng
Xihui Liu
31
0
0
11 Apr 2025
Learning Object Focused Attention
Vivek Trivedy
A. Almalki
Longin Jan Latecki
31
0
0
10 Apr 2025
DGOcc: Depth-aware Global Query-based Network for Monocular 3D Occupancy Prediction
Xu Zhao
Pengju Zhang
Bo Liu
Yihong Wu
41
0
0
10 Apr 2025
Heart Failure Prediction using Modal Decomposition and Masked Autoencoders for Scarce Echocardiography Databases
Andrés Bell-Navas
M. Villalba-Orero
Enrique Lara Pezzi
J. Garicano-Mena
S. L. Clainche
51
0
0
10 Apr 2025
Memory-efficient Streaming VideoLLMs for Real-time Procedural Video Understanding
Dibyadip Chatterjee
Edoardo Remelli
Yale Song
Bugra Tekin
Abhay Mittal
...
Shreyas Hampali
Eric Sauser
Shugao Ma
Angela Yao
Fadime Sener
VLM
35
0
0
10 Apr 2025
Deep Learning Meets Teleconnections: Improving S2S Predictions for European Winter Weather
P. Bommer
M. Kretschmer
Fiona R. Spuler
Kirill Bykov
Marina M.-C. Höhne
AI4Cl
18
0
0
10 Apr 2025
Benchmarking Image Embeddings for E-Commerce: Evaluating Off-the Shelf Foundation Models, Fine-Tuning Strategies and Practical Trade-offs
Urszula Czerwinska
Cenk Bircanoglu
Jeremy Chamoux
33
0
0
10 Apr 2025
Breaking the Barriers: Video Vision Transformers for Word-Level Sign Language Recognition
Alexander Brettmann
Jakob Grävinghoff
Marlene Rüschoff
Marie Westhues
SLR
51
0
0
10 Apr 2025
Deep Learning-based Intrusion Detection Systems: A Survey
Zhiwei Xu
Yujuan Wu
Shiheng Wang
Jiabao Gao
Tian Qiu
Ziqi Wang
Hai Wan
Xibin Zhao
21
1
0
10 Apr 2025
Self-Bootstrapping for Versatile Test-Time Adaptation
Shuaicheng Niu
Guohao Chen
P. Zhao
Tianyi Wang
Pengcheng Wu
Zhiqi Shen
ViT
TTA
55
0
0
10 Apr 2025
Evolutionary algorithms meet self-supervised learning: a comprehensive survey
Adriano Vinhas
João Correia
Penousal Machado
SSL
SyDa
59
0
0
09 Apr 2025
Masked Scene Modeling: Narrowing the Gap Between Supervised and Self-Supervised Learning in 3D Scene Understanding
Pedro Hermosilla
Christian Stippel
Leon Sick
SSL
3DPC
74
0
0
09 Apr 2025
Face-LLaVA: Facial Expression and Attribute Understanding through Instruction Tuning
Ashutosh Chaubey
Xulang Guan
Mohammad Soleymani
CVBM
MLLM
VLM
66
0
0
09 Apr 2025
A Comparison of Deep Learning Methods for Cell Detection in Digital Cytology
Marco Acerbis
Natasa Sladoje
Joakim Lindblad
22
0
0
09 Apr 2025
Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation
Xiaoxing Hu
Ziyang Gong
Y. Wang
Yuru Jia
Gen Luo
Xue Yang
100
0
0
08 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
29
0
0
08 Apr 2025
SEVERE++: Evaluating Benchmark Sensitivity in Generalization of Video Representation Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Piyush Bagad
Hazel Doughty
Bernard Ghanem
Cees G. M. Snoek
ViT
SSL
46
0
0
08 Apr 2025
ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface
Fangchen Liu
Chuanyu Li
Yihua Qin
Ankit Shaw
J. Xu
Pieter Abbeel
Rui Chen
38
2
0
08 Apr 2025
MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos
Alexey Gavryushin
Xi Wang
Robert J. S. Malate
Chenyu Yang
X. Jia
Shubh Goel
Davide Liconti
René Zurbrugg
Robert K. Katzschmann
Marc Pollefeys
34
0
0
08 Apr 2025
TAPNext: Tracking Any Point (TAP) as Next Token Prediction
Artem Zholus
Carl Doersch
Yi Yang
Skanda Koppula
Viorica Patraucean
Xu He
Ignacio Rocco
Mehdi S. M. Sajjadi
Sarath Chandar
Ross Goroshin
30
0
0
08 Apr 2025
SapiensID: Foundation for Human Recognition
Minchul Kim
Dingqiang Ye
Yiyang Su
Feng Liu
Xiaoming Liu
CVBM
VLM
44
0
0
07 Apr 2025
S^4M: Boosting Semi-Supervised Instance Segmentation with SAM
Heeji Yoon
Heeseong Shin
Eunbeen Hong
Hyunwook Choi
Hansang Cho
Daun Jeong
Seungryong Kim
24
0
0
07 Apr 2025
Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos
Zhi Zuo
Chenyi Zhuang
Zhiqiang Shen
Pan Gao
Jie Qin
3DPC
27
0
0
07 Apr 2025
Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
Shijian Wang
Linxin Song
Ryotaro Shimizu
M. Goto
Hanqian Wu
VLM
25
0
0
06 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA
AI4CE
51
0
0
05 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
39
0
0
05 Apr 2025
MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning
Shiguang Sun
Hanbo Zhang
Zeyang Liu
Xinrui Yang
Lipeng Wan
Bing Yan
Xingyu Chen
Xuguang Lan
32
0
0
05 Apr 2025
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
Hamza Riaz
A. Smeaton
39
0
0
05 Apr 2025
Detecting underdetermination in parameterized quantum circuits
Marie Kempkes
Jakob Spiegelberg
Evert van Nieuwenburg
Vedran Dunjko
34
0
0
04 Apr 2025
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian
Jun Li
Jinpeng Wang
Ruisheng Luo
Yaowei Wang
Shu-Tao Xia
Bin Chen
95
0
0
04 Apr 2025
Temporal-contextual Event Learning for Pedestrian Crossing Intent Prediction
Hongbin Liang
Hezhe Qiao
Wei Huang
Qizhou Wang
Mingsheng Shang
Lin Chen
31
0
0
04 Apr 2025
MIMRS: A Survey on Masked Image Modeling in Remote Sensing
Shabnam Choudhury
Akhil Vasim
Michael Schmitt
Biplab Banerjee
30
0
0
04 Apr 2025