Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
Jointly Learning Visual and Auditory Speech Representations from Raw Data
A. Haliassos
Pingchuan Ma
Rodrigo Mira
Stavros Petridis
Maja Pantic
SSL
92
49
0
12 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
74
37
0
12 Dec 2022
SRoUDA: Meta Self-training for Robust Unsupervised Domain Adaptation
Wan-Xuan Zhu
Jia-Li Yin
Bo-Hao Chen
Ximeng Liu
89
6
0
12 Dec 2022
ALSO: Automotive Lidar Self-supervision by Occupancy estimation
Alexandre Boulch
Corentin Sautier
Bjoern Michele
Gilles Puy
Renaud Marlet
SSL 3DPC
80
58
0
12 Dec 2022
DeepCut: Unsupervised Segmentation using Graph Neural Networks Clustering
Amit Aflalo
Shai Bagon
Tamar Kashti
Yonina C. Eldar
GNN
106
35
0
12 Dec 2022
BEV-MAE: Bird's Eye View Masked Autoencoders for Point Cloud Pre-training in Autonomous Driving Scenarios
Zhiwei Lin
Yongtao Wang
Shengxiang Qi
Nan Dong
Ming-Hsuan Yang
3DPC
72
16
0
12 Dec 2022
On Pre-Training for Visuo-Motor Control: Revisiting a Learning-from-Scratch Baseline
Nicklas Hansen
Zhecheng Yuan
Yanjie Ze
Tongzhou Mu
Aravind Rajeswaran
H. Su
Huazhe Xu
Xiaolong Wang
91
66
0
12 Dec 2022
Accelerating Dataset Distillation via Model Augmentation
Lei Zhang
Jie M. Zhang
Bowen Lei
Subhabrata Mukherjee
Xiang Pan
Bo Zhao
Caiwen Ding
Yongbin Li
Dongkuan Xu
DD
139
66
0
12 Dec 2022
Masked autoencoders are effective solution to transformer data-hungry
Jia-ju Mao
Honggu Zhou
Xuesong Yin
Binling Nie
MedIm
120
7
0
12 Dec 2022
SEPT: Towards Scalable and Efficient Visual Pre-Training
Yiqi Lin
Huabin Zheng
Huaping Zhong
Jinjing Zhu
Weijia Li
Conghui He
Lin Wang
80
2
0
11 Dec 2022
Complete-to-Partial 4D Distillation for Self-Supervised Point Cloud Sequence Representation Learning
Zhuoyang Zhang
Yu Dong
Yunze Liu
Li Yi
3DPC AI4TS
100
20
0
10 Dec 2022
Uniform Masking Prevails in Vision-Language Pretraining
Siddharth Verma
Yuchen Lu
Rui Hou
Hanchao Yu
Nicolas Ballas
Madian Khabsa
Amjad Almahairi
VLM
50
0
0
10 Dec 2022
A Whac-A-Mole Dilemma: Shortcuts Come in Multiples Where Mitigating One Amplifies Others
Zhiheng Li
Ivan Evtimov
Albert Gordo
C. Hazirbas
Tal Hassner
Cristian Canton Ferrer
Chenliang Xu
Mark Ibrahim
86
78
0
09 Dec 2022
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Aran Komatsuzaki
J. Puigcerver
James Lee-Thorp
Carlos Riquelme Ruiz
Basil Mustafa
Joshua Ainslie
Yi Tay
Mostafa Dehghani
N. Houlsby
MoMe MoE
108
124
0
09 Dec 2022
VindLU: A Recipe for Effective Video-and-Language Pretraining
Feng Cheng
Xizi Wang
Jie Lei
David J. Crandall
Joey Tianyi Zhou
Gedas Bertasius
VLM
125
81
0
09 Dec 2022
Audiovisual Masked Autoencoders
Mariana-Iuliana Georgescu
Eduardo Fonseca
Radu Tudor Ionescu
Mario Lucic
Cordelia Schmid
Anurag Arnab
SSL
118
45
0
09 Dec 2022
Masked Lip-Sync Prediction by Audio-Visual Contextual Exploitation in Transformers
Yasheng Sun
Hang Zhou
Kaisiyuan Wang
Qianyi Wu
Zhibin Hong
Jingtuo Liu
Errui Ding
Jingdong Wang
Ziwei Liu
Koike Hideki
62
34
0
09 Dec 2022
Co-training $2^L$ Submodels for Visual Recognition
Hugo Touvron
Matthieu Cord
Maxime Oquab
Piotr Bojanowski
Jakob Verbeek
Hervé Jégou
VLM
72
10
0
09 Dec 2022
VideoDex: Learning Dexterity from Internet Videos
Kenneth Shaw
Shikhar Bahl
Deepak Pathak
103
96
0
08 Dec 2022
Masked Video Distillation: Rethinking Masked Feature Modeling for Self-supervised Video Representation Learning
Rui Wang
Dongdong Chen
Zuxuan Wu
Yinpeng Chen
Xiyang Dai
Mengchen Liu
Lu Yuan
Yu-Gang Jiang
VGen
124
94
0
08 Dec 2022
OFASys: A Multi-Modal Multi-Task Learning System for Building Generalist Models
Jinze Bai
Rui Men
Han Yang
Xuancheng Ren
Kai Dang
...
Wenhang Ge
Jianxin Ma
Junyang Lin
Jingren Zhou
Chang Zhou
88
16
0
08 Dec 2022
Deep Incubation: Training Large Models by Divide-and-Conquering
Zanlin Ni
Yulin Wang
Jiangwei Yu
Haojun Jiang
Yu Cao
Gao Huang
VLM
94
11
0
08 Dec 2022
Group Generalized Mean Pooling for Vision Transformer
ByungSoo Ko
Han-Gyu Kim
Byeongho Heo
Sangdoo Yun
Sanghyuk Chun
Geonmo Gu
Wonjae Kim
ViT
88
1
0
08 Dec 2022
MixBoost: Improving the Robustness of Deep Neural Networks by Boosting Data Augmentation
Zhendong Liu
Wenyu Jiang
Min Guo
Chongjun Wang
AAML
74
1
0
08 Dec 2022
Occlusion-Robust FAU Recognition by Mining Latent Space of Masked Autoencoders
Minyang Jiang
Yongwei Wang
Martin J. McKeown
Jane Wang
CVBM
49
2
0
08 Dec 2022
Teaching Matters: Investigating the Role of Supervision in Vision Transformers
Matthew Walmer
Saksham Suri
Kamal Gupta
Abhinav Shrivastava
77
33
0
07 Dec 2022
Diffusion Art or Digital Forgery? Investigating Data Replication in Diffusion Models
Gowthami Somepalli
Vasu Singla
Micah Goldblum
Jonas Geiping
Tom Goldstein
105
330
0
07 Dec 2022
ViTPose++: Vision Transformer for Generic Body Pose Estimation
Yufei Xu
Jing Zhang
Qiming Zhang
Dacheng Tao
ViT
181
46
0
07 Dec 2022
SimVTP: Simple Video Text Pre-training with Masked Autoencoders
Yue Ma
Tianyu Yang
Yin Shan
Xiu Li
90
27
0
07 Dec 2022
MEDIAR: Harmony of Data-Centric and Model-Centric for Multi-Modality Microscopy
Gihun Lee
Sangmook Kim
Joonkee Kim
Se-Young Yun
MedIm
63
20
0
07 Dec 2022
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Cheng-Hao Tu
Zheda Mai
Wei-Lun Chao
57
48
0
06 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM VGen
174
332
0
06 Dec 2022
Rethinking the Objectives of Vector-Quantized Tokenizers for Image Synthesis
Yuchao Gu
Xintao Wang
Yixiao Ge
Ying Shan
Xiaohu Qie
Mike Zheng Shou
DiffM
98
22
0
06 Dec 2022
GD-MAE: Generative Decoder for MAE Pre-training on LiDAR Point Clouds
Honghui Yang
Tong He
Jiaheng Liu
Huaguan Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wanli Ouyang
130
62
0
06 Dec 2022
FlowFace: Semantic Flow-guided Shape-aware Face Swapping
Hao Zeng
Wei Zhang
Changjie Fan
Tangjie Lv
Suzhe Wang
Zhimeng Zhang
Bowen Ma
Lincheng Li
Yu-qiong Ding
Xin Yu
CVBM
66
7
0
06 Dec 2022
Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation
Jing-Xuan Zhang
Genshun Wan
Zhenhua Ling
Jia Pan
Jianqing Gao
Cong Liu
SSL
81
13
0
06 Dec 2022
Unifying Vision, Text, and Layout for Universal Document Processing
Zineng Tang
Ziyi Yang
Guoxin Wang
Yuwei Fang
Yang Liu
Chenguang Zhu
Michael Zeng
Chao-Yue Zhang
Joey Tianyi Zhou
VLM
131
115
0
05 Dec 2022
Images Speak in Images: A Generalist Painter for In-Context Visual Learning
Xinlong Wang
Wen Wang
Yue Cao
Chunhua Shen
Tiejun Huang
VLM MLLM
161
262
0
05 Dec 2022
One-shot Implicit Animatable Avatars with Model-based Priors
Yangyi Huang
Hongwei Yi
Weiyang Liu
Haofan Wang
Boxi Wu
Wenxiao Wang
Binbin Lin
Debing Zhang
Deng Cai
3DH
122
33
0
05 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
70
14
0
05 Dec 2022
Learning Imbalanced Data with Vision Transformers
Zhengzhuo Xu
R. Liu
Shuo Yang
Zenghao Chai
Chun Yuan
103
36
0
05 Dec 2022
Exploring Stochastic Autoregressive Image Modeling for Visual Representation
Yu-Hang Qi
Fan Yang
Yousong Zhu
Yufei Liu
Liwei Wu
Rui Zhao
Wei Li
DiffM
57
13
0
03 Dec 2022
MIC: Masked Image Consistency for Context-Enhanced Domain Adaptation
Lukas Hoyer
Dengxin Dai
Haoran Wang
Luc Van Gool
139
230
0
02 Dec 2022
Multi-scale Transformer Network with Edge-aware Pre-training for Cross-Modality MR Image Synthesis
Yonghao Li
Tao Zhou
Kelei He
Yi Zhou
Dinggang Shen
ViT MedIm
58
29
0
02 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
77
12
0
02 Dec 2022
Scaling Language-Image Pre-training via Masking
Yanghao Li
Haoqi Fan
Ronghang Hu
Christoph Feichtenhofer
Kaiming He
CLIP VLM
111
330
0
01 Dec 2022
Improving Zero-Shot Models with Label Distribution Priors
Jonathan Kahana
Niv Cohen
Yedid Hoshen
VLM
136
14
0
01 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
98
35
0
01 Dec 2022
Simplifying and Understanding State Space Models with Diagonal Linear RNNs
Ankit Gupta
Harsh Mehta
Jonathan Berant
75
21
0
01 Dec 2022
Hyperbolic Contrastive Learning for Visual Representations beyond Objects
Songwei Ge
Shlok Kumar Mishra
Simon Kornblith
Chun-Liang Li
David Jacobs
OCL SSL
129
57
0
01 Dec 2022