Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Pushing the limits of raw waveform speaker recognition
Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung
16 Mar 2022

Data Efficient 3D Learner via Knowledge Transferred from 2D Model
Ping Yu, Cheng Sun, Min Sun
3DPC
16 Mar 2022

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation
Wenkang Shan, Zhenhua Liu, Xinfeng Zhang, Shanshe Wang, Siwei Ma, Wen Gao
3DH
15 Mar 2022

SuperAnimal pretrained pose estimation models for behavioral analysis
Shaokai Ye, Anastasiia Filippova, Jessy Lauer, Steffen Schneider, Maxime Vidal, Tian Qiu, Alexander Mathis, Mackenzie W. Mathis
14 Mar 2022

Rethinking Minimal Sufficient Representation in Contrastive Learning
Haoqing Wang, Xun Guo, Zhiwei Deng, Yan Lu
SSL
14 Mar 2022

Masked Autoencoders for Point Cloud Self-supervised Learning
Yatian Pang, Wenxiao Wang, Francis E. H. Tay, Wen Liu, Yonghong Tian, Liuliang Yuan
3DPC, ViT
13 Mar 2022

Masked Visual Pre-training for Motor Control
Tete Xiao, Ilija Radosavovic, Trevor Darrell, Jitendra Malik
SSL
11 Mar 2022

Active Token Mixer
Guoqiang Wei, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen
11 Mar 2022

Visualizing and Understanding Patch Interactions in Vision Transformer
Jie Ma, Yalong Bai, Bineng Zhong, Wei Zhang, Ting Yao, Tao Mei
ViT
11 Mar 2022

Self Pre-training with Masked Autoencoders for Medical Image Classification and Segmentation
Lei Zhou, Huidong Liu, Joseph Bae, Junjun He, Dimitris Samaras, Prateek Prasanna
MedIm, ViT
10 Mar 2022

Backbone is All Your Need: A Simplified Architecture for Visual Object Tracking
Boyu Chen, Peixia Li, Lei Bai, Leixian Qiao, Qiuhong Shen, Yue Liu, Weihao Gan, Wei Wu, Wanli Ouyang
ViT, VOT
10 Mar 2022

MVP: Multimodality-guided Visual Pre-training
Longhui Wei, Lingxi Xie, Wen-gang Zhou, Houqiang Li, Qi Tian
10 Mar 2022

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement
Mohamed Ali Souibgui, Sanket Biswas, Andrés Mafla, Ali Furkan Biten, Alicia Fornés, Yousri Kessentini, Josep Lladós, Lluís Gómez, Dimosthenis Karatzas
09 Mar 2022

Multiscale Convolutional Transformer with Center Mask Pretraining for Hyperspectral Image Classification
Sen Jia, Yifan Wang
ViT
09 Mar 2022

Uni4Eye: Unified 2D and 3D Self-supervised Pre-training via Masked Image Modeling Transformer for Ophthalmic Image Classification
Zhiyuan Cai, Li Lin, Huaqing He, Xiaoying Tang
ViT, MedIm
09 Mar 2022

Domain Generalization using Pretrained Models without Fine-tuning
Ziyue Li, Kan Ren, Xinyang Jiang, Yue Liu, Haipeng Zhang, Dongsheng Li
VLM
09 Mar 2022

Gait Recognition with Mask-based Regularization
Chuanfu Shen, Beibei Lin, Shunli Zhang, George Q. Huang, Shiqi Yu, Xin-cen Yu
CVBM
08 Mar 2022

Monocular Robot Navigation with Self-Supervised Pretrained Vision Transformers
Miguel A. Saavedra-Ruiz, Sacha Morin, Liam Paull
MDE, ViT
07 Mar 2022

DiT: Self-supervised Pre-training for Document Image Transformer
Junlong Li, Yiheng Xu, Tengchao Lv, Lei Cui, Chaoxi Zhang, Furu Wei
ViT, VLM
04 Mar 2022

ViT-P: Rethinking Data-efficient Vision Transformers from Locality
Bin Chen, Ran A. Wang, Di Ming, Xin Feng
ViT
04 Mar 2022

Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li, Hao Zhang, Yi-Fan Zhang, Shixuan Liu, Jian Guo, L. Ni, Pengchuan Zhang, Lei Zhang
AI4TS, VLM
03 Mar 2022

Instance Segmentation for Autonomous Log Grasping in Forestry Operations
Jean-Michel Fortin, Olivier Gamache, Vincent Grondin, F. Pomerleau, Philippe Giguère
03 Mar 2022

Audio Self-supervised Learning: A Survey
Shuo Liu, Adria Mallol-Ragolta, Emilia Parada-Cabeleiro, Kun Qian, Xingshuo Jing, Alexander Kathan, Bin Hu, Bjoern W. Schuller
SSL
02 Mar 2022

Learning Moving-Object Tracking with FMCW LiDAR
Yinjuan Gu, Hongzhi Cheng, Kafeng Wang, Dejing Dou, Chengzhong Xu, Hui Kong
02 Mar 2022

LISA: Learning Interpretable Skill Abstractions from Language
Divyansh Garg, Skanda Vaidyanath, Kuno Kim, Jiaming Song, Stefano Ermon
LM&Ro, OffRL
28 Feb 2022

Unsupervised Point Cloud Representation Learning with Deep Neural Networks: A Survey
Aoran Xiao, Jiaxing Huang, Dayan Guan, Xiaoqin Zhang, Shijian Lu, Ling Shao
3DPC
28 Feb 2022

Reconstruction Task Finds Universal Winning Tickets
Ruichen Li, Binghui Li, Qi Qian, Liwei Wang
23 Feb 2022

HiP: Hierarchical Perceiver
João Carreira, Skanda Koppula, Daniel Zoran, Adrià Recasens, Catalin Ionescu, ..., M. Botvinick, Oriol Vinyals, Karen Simonyan, Andrew Zisserman, Andrew Jaegle
VLM
22 Feb 2022

ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
Qiming Zhang, Yufei Xu, Jing Zhang, Dacheng Tao
ViT
21 Feb 2022

Visual Attention Network
Meng-Hao Guo, Chengrou Lu, Zheng-Ning Liu, Ming-Ming Cheng, Shiyong Hu
ViT, VLM
20 Feb 2022

Masked prediction tasks: a parameter identifiability view
Bingbin Liu, Daniel J. Hsu, Pradeep Ravikumar, Andrej Risteski
SSL, OOD
18 Feb 2022

Graph Masked Autoencoders with Transformers
Sixiao Zhang, Hongxu Chen, Haoran Yang, Xiangguo Sun, Philip S. Yu, Guandong Xu
17 Feb 2022

Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal, Quentin Duval, Isaac Seessel, Mathilde Caron, Ishan Misra, Levent Sagun, Armand Joulin, Piotr Bojanowski
VLM, SSL
16 Feb 2022

Should You Mask 15% in Masked Language Modeling?
Alexander Wettig, Tianyu Gao, Zexuan Zhong, Danqi Chen
CVBM
16 Feb 2022

Meta Knowledge Distillation
Jihao Liu, Boxiao Liu, Hongsheng Li, Yu Liu
16 Feb 2022

CommerceMM: Large-Scale Commerce MultiModal Representation Learning with Omni Retrieval
Licheng Yu, Jun Chen, Animesh Sinha, Mengjiao MJ Wang, Hugo Chen, Tamara L. Berg, Ning Zhang
VLM
15 Feb 2022

AI can evolve without labels: self-evolving vision transformer for chest X-ray diagnosis through knowledge distillation
S. Park, Gwanghyun Kim, Y. Oh, J. Seo, Sang Min Lee, Jin Hwan Kim, Sungjun Moon, Jae-Kwang Lim, Changhyun Park, Jong Chul Ye
ViT, MedIm
13 Feb 2022

MaskGIT: Masked Generative Image Transformer
Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman
ViT
08 Feb 2022

How to Understand Masked Autoencoders
Shuhao Cao, Peng Xu, David Clifton
08 Feb 2022

data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language
Alexei Baevski, Wei-Ning Hsu, Qiantong Xu, Arun Babu, Jiatao Gu, Michael Auli
SSL, VLM, ViT
07 Feb 2022

Corrupted Image Modeling for Self-Supervised Visual Pre-Training
Yuxin Fang, Li Dong, Hangbo Bao, Xinggang Wang, Furu Wei
07 Feb 2022

Robust Semantic Communications Against Semantic Noise
Qiyu Hu, Guangyi Zhang, Zhijin Qin, Yunlong Cai, Guanding Yu, Geoffrey Ye Li
AAML
07 Feb 2022

OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang, An Yang, Rui Men, Junyang Lin, Shuai Bai, Zhikang Li, Jianxin Ma, Chang Zhou, Jingren Zhou, Hongxia Yang
MLLM, ObjD
07 Feb 2022

Context Autoencoder for Self-Supervised Representation Learning
Xiaokang Chen, Mingyu Ding, Xiaodi Wang, Ying Xin, Shentong Mo, Yunhao Wang, Shumin Han, Ping Luo, Gang Zeng, Jingdong Wang
SSL
07 Feb 2022

Self-supervised Learning with Random-projection Quantizer for Speech Recognition
Chung-Cheng Chiu, James Qin, Yu Zhang, Jiahui Yu, Yonghui Wu
SSL
03 Feb 2022

AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics
Sebastian Hoffmann, C. Lessig
AI4Cl
02 Feb 2022

Adversarial Masking for Self-Supervised Learning
Yuge Shi, N. Siddharth, Philip Torr, Adam R. Kosiorek
SSL
31 Jan 2022

A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo, Yadong Xi, Rongsheng Zhang, Jing Ma
VLM, MLLM
30 Jan 2022

Research on Patch Attentive Neural Process
Xiaohan Yu, Shao-Chen Mao
29 Jan 2022

Mask-based Latent Reconstruction for Reinforcement Learning
Tao Yu, Zhizheng Zhang, Cuiling Lan, Yan Lu, Zhibo Chen
28 Jan 2022