Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM, VLM
96
39
0
20 May 2023
Joint Generative-Contrastive Representation Learning for Anomalous Sound Detection
Xiaoyan Zeng
Yan Song
Zhu Zhuo
Yujie Zhou
Yu-Hong Li
Hui Xue
Lirong Dai
Ian Mcloughlin
83
13
0
20 May 2023
CARD: Channel Aligned Robust Blend Transformer for Time Series Forecasting
Xue Wang
Tian Zhou
Qingsong Wen
Jinyang Gao
Bolin Ding
Rong Jin
AI4TS
85
45
0
20 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
181
103
0
19 May 2023
Neural Foundations of Mental Simulation: Future Prediction of Latent Representations on Dynamic Scenes
Aran Nayebi
R. Rajalingham
M. Jazayeri
G. R. Yang
82
20
0
19 May 2023
S-JEA: Stacked Joint Embedding Architectures for Self-Supervised Visual Representation Learning
Alžběta Manová
A. Durrant
Georgios Leontidis
SSL
73
4
0
19 May 2023
Recycle-and-Distill: Universal Compression Strategy for Transformer-based Speech SSL Models with Attention Map Reusing and Masking Distillation
Kangwook Jang
Sungnyun Kim
Se-Young Yun
Hoi-Rim Kim
100
5
0
19 May 2023
Cinematic Mindscapes: High-quality Video Reconstruction from Brain Activity
Zijiao Chen
Jiaxin Qing
J. Zhou
DiffM, VGen
81
59
0
19 May 2023
PointGPT: Auto-regressively Generative Pre-training from Point Clouds
Guang-Sheng Chen
Meiling Wang
Yi Yang
Kai Yu
Li-xin Yuan
Yufeng Yue
3DPC
61
90
0
19 May 2023
CM-MaskSD: Cross-Modality Masked Self-Distillation for Referring Image Segmentation
Wenxuan Wang
Jing Liu
Xingjian He
Yisi Zhang
Cheng Chen
Jiachen Shen
Yan Zhang
Jiangyun Li
72
14
0
19 May 2023
Reciprocal Attention Mixing Transformer for Lightweight Image Restoration
Haram Choi
Cheolwoong Na
Jihyeon Oh
Seungjae Lee
Jinseop S. Kim
Subeen Choe
Jeongmin Lee
Taehoon Kim
Jihoon Yang
96
9
0
19 May 2023
SurgMAE: Masked Autoencoders for Long Surgical Video Analysis
Muhammad Abdullah Jamal
Omid Mohareri
55
6
0
19 May 2023
Few-Shot Learning with Visual Distribution Calibration and Cross-Modal Distribution Alignment
Runqi Wang
Hao Zheng
Xiaoyue Duan
Jianzhuang Liu
Yuning Lu
Tian Wang
Songcen Xu
Baochang Zhang
VLM
66
12
0
19 May 2023
SlotDiffusion: Object-Centric Generative Modeling with Diffusion Models
Ziyi Wu
Jingyu Hu
Wuyue Lu
Igor Gilitschenski
Animesh Garg
DiffM, OCL
126
47
0
18 May 2023
Information-Ordered Bottlenecks for Adaptive Semantic Compression
Matthew Ho
Xiao-Fen Zhao
Benjamin Dan Wandelt
56
5
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM, MLLM, ObjD
151
122
0
18 May 2023
Universal Domain Adaptation from Foundation Models: A Baseline Study
Bin Deng
Kui Jia
VLM
96
8
0
18 May 2023
Annotation-free Audio-Visual Segmentation
Jinxian Liu
Yu Wang
Chen Ju
Chaofan Ma
Ya Zhang
Weidi Xie
VOS, VLM
111
30
0
18 May 2023
HMSN: Hyperbolic Self-Supervised Learning by Clustering with Ideal Prototypes
A. Durrant
Georgios Leontidis
SSL
76
4
0
18 May 2023
How Deep Learning Sees the World: A Survey on Adversarial Attacks & Defenses
Joana Cabral Costa
Tiago Roxo
Hugo Manuel Proença
Pedro R. M. Inácio
AAML
122
64
0
18 May 2023
A Survey on Time-Series Pre-Trained Models
Qianli Ma
Ziqiang Liu
Zhenjing Zheng
Ziyang Huang
Siying Zhu
Zhongzhong Yu
James T. Kwok
AI4TS
103
56
0
18 May 2023
CLIP-GCD: Simple Language Guided Generalized Category Discovery
Rabah Ouldnoughi
Chia-Wen Kuo
Z. Kira
VLM
82
14
0
17 May 2023
Transfer Learning for Fine-grained Classification Using Semi-supervised Learning and Visual Transformers
Manuel Lagunas
Brayan Impata
Victor Martinez
Virginia Fernandez
Christos Georgakis
Sofia Braun
Felipe Bertrand
ViT
69
8
0
17 May 2023
Understanding 3D Object Interaction from a Single Image
Shengyi Qian
David Fouhey
106
22
0
16 May 2023
Sequence-to-Sequence Pre-training with Unified Modality Masking for Visual Document Understanding
ShuWei Feng
Tianyang Zhan
Zhanming Jie
Trung Quoc Luong
Xiaoran Jin
51
1
0
16 May 2023
Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors
Einari Vaaras
Manu Airaksinen
S. Vanhatalo
Okko Räsänen
102
4
0
16 May 2023
MIMEx: Intrinsic Rewards from Masked Input Modeling
Toru Lin
Allan Jabri
OffRL
105
6
0
15 May 2023
Straightening Out the Straight-Through Estimator: Overcoming Optimization Challenges in Vector Quantized Networks
Minyoung Huh
Brian Cheung
Pulkit Agrawal
Phillip Isola
MQ
62
55
0
15 May 2023
Learning Better Contrastive View from Radiologist's Gaze
Sheng Wang
Zixu Zhuang
Xi Ouyang
Lichi Zhang
Zheren Li
Chong Ma
Tianming Liu
Dinggang Shen
Qian Wang
MedIm
66
2
0
15 May 2023
GeoMAE: Masked Geometric Target Prediction for Self-supervised Point Cloud Pre-Training
Xiaoyu Tian
Haoxi Ran
Yue Wang
Hang Zhao
3DPC, ViT
64
42
0
15 May 2023
Improved baselines for vision-language pre-training
Enrico Fini
Pietro Astolfi
Adriana Romero Soriano
Jakob Verbeek
M. Drozdzal
SSL, CLIP, VLM
131
23
0
15 May 2023
Artificial intelligence to advance Earth observation: a perspective
D. Tuia
Konrad Schindler
Begüm Demir
Gustau Camps-Valls
Xiao Xiang Zhu
...
Mihai Datcu
Jorge-Arnulfo Quiané-Ruiz
Volker Markl
Bertrand Le Saux
Rochelle Schneider
122
12
0
15 May 2023
PLIP: Language-Image Pre-training for Person Representation Learning
Jia-li Zuo
Jiahao Hong
Feng Zhang
Changqian Yu
Hanyu Zhou
Changxin Gao
Nong Sang
Jingdong Wang
VLM, MLLM
136
38
0
15 May 2023
Parameter-Efficient Fine-Tuning with Layer Pruning on Free-Text Sequence-to-Sequence Modeling
Y. Zhu
Xuebing Yang
Yuanyuan Wu
Wensheng Zhang
MedIm
41
2
0
15 May 2023
Parameter-Efficient Fine-Tuning for Medical Image Analysis: The Missed Opportunity
Raman Dutt
Linus Ericsson
Pedro Sanchez
Sotirios A. Tsaftaris
Timothy M. Hospedales
MedIm
131
55
0
14 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
102
101
0
14 May 2023
Multi-task Paired Masking with Alignment Modeling for Medical Vision-Language Pre-training
Kecheng Zhang
Shuai Liu
Jun Yu
Han Jiang
Jianping Fan
Qing-An Huang
Weidong Han
MedIm
80
33
0
13 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
72
3
0
13 May 2023
Exploring the Rate-Distortion-Complexity Optimization in Neural Image Compression
Yixin Gao
Runsen Feng
Zongyu Guo
Zhibo Chen
69
6
0
12 May 2023
OneCAD: One Classifier for All image Datasets using multimodal learning
S. Wadekar
Eugenio Culurciello
108
0
0
11 May 2023
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM, CLIP
117
58
0
11 May 2023
Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, ViT, VLM
86
80
0
11 May 2023
Extending Audio Masked Autoencoders Toward Audio Restoration
Zhi-Wei Zhong
Hao Shi
M. Hirano
Kazuki Shimada
Kazuya Tateishi
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
67
6
0
11 May 2023
XTab: Cross-table Pretraining for Tabular Transformers
Bingzhao Zhu
Xingjian Shi
Nick Erickson
Mu Li
George Karypis
Mahsa Shoaran
LMTD
127
78
0
10 May 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
174
41
0
10 May 2023
MMoT: Mixture-of-Modality-Tokens Transformer for Composed Multimodal Conditional Image Synthesis
Jinsheng Zheng
Daqing Liu
Chaoyue Wang
Minghui Hu
Zuopeng Yang
Changxing Ding
Dacheng Tao
72
1
0
10 May 2023
Medical supervised masked autoencoders: Crafting a better masking strategy and efficient fine-tuning schedule for medical image classification
Jia-ju Mao
Shu-Hua Guo
Yuan Chang
Xuesong Yin
Binling Nie
77
2
0
10 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
82
82
0
09 May 2023
Self-supervised dense representation learning for live-cell microscopy with time arrow prediction
Benjamin Gallusser
Max Stieber
Martin Weigert
126
7
0
09 May 2023
Self-Supervised Learning for Point Clouds Data: A Survey
Changyu Zeng
Wei Wang
A. Nguyen
Yutao Yue
3DPC
88
0
0
09 May 2023