Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
Title
Exploring the Efficacy of Pre-trained Checkpoints in Text-to-Music Generation Task
Shangda Wu
Maosong Sun
76
20
0
21 Nov 2022
Contrastive Masked Autoencoders for Self-Supervised Video Hashing
Yuting Wang
Jinpeng Wang
Bin Chen
Ziyun Zeng
Shutao Xia
54
22
0
21 Nov 2022
Unifying Vision-Language Representation Space with Single-tower Transformer
Jiho Jang
Chaerin Kong
D. Jeon
Seonhoon Kim
Nojun Kwak
113
21
0
21 Nov 2022
Diffusion-Based Scene Graph to Image Generation with Masked Contrastive Pre-Training
Ling Yang
Zhilin Huang
Yang Song
Shenda Hong
Ge Li
Wentao Zhang
Tengjiao Wang
Guohao Li
Ming-Hsuan Yang
104
57
0
21 Nov 2022
Towards Generalizable Graph Contrastive Learning: An Information Theory Perspective
Yige Yuan
Bingbing Xu
Huawei Shen
Qi Cao
Keting Cen
Wen Zheng
Xueqi Cheng
77
13
0
20 Nov 2022
UniMASK: Unified Inference in Sequential Decision Problems
Micah Carroll
Orr Paradise
Jessy Lin
Raluca Georgescu
Mingfei Sun
...
Stephanie Milani
Katja Hofmann
Matthew J. Hausknecht
Anca Dragan
Sam Devlin
OffRL
109
22
0
20 Nov 2022
Learning to Generate Image Embeddings with User-level Differential Privacy
Zheng Xu
Maxwell D. Collins
Yuxiao Wang
Liviu Panait
Sewoong Oh
S. Augenstein
Ting Liu
Florian Schroff
H. B. McMahan
FedML
99
31
0
20 Nov 2022
Karyotype AI for Precision Oncology
Z. Shamsi
D. Bryant
Jacob M Wilson
X. Qu
Kumar Avinava Dubey
...
F. Appelbaum
K. Choromanski
A. Bashir
M. Fang
Min Fang
114
0
0
20 Nov 2022
Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training
Zhenglun Kong
Haoyu Ma
Geng Yuan
Mengshu Sun
Yanyue Xie
...
Tianlong Chen
Xiaolong Ma
Xiaohui Xie
Zhangyang Wang
Yanzhi Wang
ViT
114
24
0
19 Nov 2022
EVEREST: Efficient Masked Video Autoencoder by Removing Redundant Spatiotemporal Tokens
Sun-Kyoo Hwang
Jaehong Yoon
Youngwan Lee
Sung Ju Hwang
85
6
0
19 Nov 2022
Bayesian autoencoders for data-driven discovery of coordinates, governing equations and fundamental constants
Liyao (Mars) Gao
J. Nathan Kutz
AI4CE
90
22
0
19 Nov 2022
Castling-ViT: Compressing Self-Attention via Switching Towards Linear-Angular Attention at Vision Transformer Inference
Haoran You
Yunyang Xiong
Xiaoliang Dai
Bichen Wu
Peizhao Zhang
Haoqi Fan
Peter Vajda
Yingyan Lin
155
34
0
18 Nov 2022
CroCo v2: Improved Cross-view Completion Pre-training for Stereo Matching and Optical Flow
Philippe Weinzaepfel
Thomas Lucas
Vincent Leroy
Yohann Cabon
Vaibhav Arora
Romain Brégier
G. Csurka
L. Antsfeld
Boris Chidlovskii
Jérôme Revaud
ViT
133
97
0
18 Nov 2022
Masked Autoencoders for Egocentric Video Understanding @ Ego4D Challenge 2022
Jiachen Lei
Shuang Ma
Zhongjie Ba
Sai H. Vemprala
Ashish Kapoor
Kui Ren
EgoV
19
1
0
18 Nov 2022
$α$ DARTS Once More: Enhancing Differentiable Architecture Search by Masked Image Modeling
Bicheng Guo
Shuxuan Guo
Miaojing Shi
Peng Cheng
Shibo He
Jiming Chen
Kaicheng Yu
69
2
0
18 Nov 2022
Contrastive Losses Are Natural Criteria for Unsupervised Video Summarization
Zongshang Pang
Yuta Nakashima
Mayu Otani
Hajime Nagahara
50
6
0
18 Nov 2022
Weighted Ensemble Self-Supervised Learning
Yangjun Ruan
Saurabh Singh
Warren Morningstar
Alexander A. Alemi
Sergey Ioffe
Ian S. Fischer
Joshua V. Dillon
FedML
85
16
0
18 Nov 2022
Data-Centric Debugging: mitigating model failures via targeted data collection
Sahil Singla
Atoosa Malemir Chegini
Mazda Moayeri
Soheil Feiz
99
4
0
17 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
97
42
0
17 Nov 2022
CAE v2: Context Autoencoder with CLIP Target
Xinyu Zhang
Jiahui Chen
Junkun Yuan
Qiang Chen
Jian Wang
...
Jimin Pi
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
VLM, CLIP
109
24
0
17 Nov 2022
Assessing Neural Network Robustness via Adversarial Pivotal Tuning
Peter Ebert Christensen
Vésteinn Snaebjarnarson
Andrea Dittadi
Serge Belongie
Sagie Benaim
AAML
93
1
0
17 Nov 2022
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
Yulin Wang
Yang Yue
Rui Lu
Tian-De Liu
Zhaobai Zhong
S. Song
Gao Huang
90
29
0
17 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
122
113
0
17 Nov 2022
Exploring adaptation of VideoMAE for Audio-Visual Diarization & Social @ Ego4d Looking at me Challenge
Yinan He
Guo Chen
22
0
0
17 Nov 2022
AdaMAE: Adaptive Masking for Efficient Spatiotemporal Learning with Masked Autoencoders
W. G. C. Bandara
Naman Patel
A. Gholami
Mehdi Nikkhah
M. Agrawal
Vishal M. Patel
67
43
0
16 Nov 2022
MAGE: MAsked Generative Encoder to Unify Representation Learning and Image Synthesis
Tianhong Li
Huiwen Chang
Shlok Kumar Mishra
Han Zhang
Dina Katabi
Dilip Krishnan
90
170
0
16 Nov 2022
Stare at What You See: Masked Image Modeling without Reconstruction
Hongwei Xue
Peng Gao
Hongyang Li
Yu Qiao
Hao Sun
Houqiang Li
Jiebo Luo
68
32
0
16 Nov 2022
NEVIS'22: A Stream of 100 Tasks Sampled from 30 Years of Computer Vision Research
J. Bornschein
Alexandre Galashov
Ross Hemsley
Amal Rannen-Triki
Yutian Chen
...
Angeliki Lazaridou
Yee Whye Teh
Andrei A. Rusu
Razvan Pascanu
Marc'Aurelio Ranzato
OOD, VLM, AI4TS
111
18
0
15 Nov 2022
Masked Reconstruction Contrastive Learning with Information Bottleneck Principle
Ziwen Liu
Bonan Li
Congying Han
Tiande Guo
Xuecheng Nie
SSL
66
2
0
15 Nov 2022
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Chao Tao
Ji Qi
Mingning Guo
Qing Zhu
Haifeng Li
SSL
104
59
0
15 Nov 2022
Will Large-scale Generative Models Corrupt Future Datasets?
Ryuichiro Hataya
Han Bao
Hiromi Arai
59
58
0
15 Nov 2022
Physics-Informed Machine Learning: A Survey on Problems, Methods and Applications
Zhongkai Hao
Songming Liu
Yichi Zhang
Chengyang Ying
Yao Feng
Hang Su
Jun Zhu
PINN, AI4CE
130
99
0
15 Nov 2022
FedTune: A Deep Dive into Efficient Federated Fine-Tuning with Pre-trained Transformers
Jinyu Chen
Wenchao Xu
Song Guo
Junxiao Wang
Jie Zhang
Yining Qi
FedML
83
36
0
15 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM, CLIP
249
730
0
14 Nov 2022
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets
Ziyang Ma
Zhisheng Zheng
Changli Tang
Yujin Wang
Xie Chen
124
20
0
14 Nov 2022
ParCNetV2: Oversized Kernel with Enhanced Attention
Ruihan Xu
Haokui Zhang
Wenze Hu
Shiliang Zhang
Xiaoyu Wang
ViT
85
6
0
14 Nov 2022
SSL4EO-S12: A Large-Scale Multi-Modal, Multi-Temporal Dataset for Self-Supervised Learning in Earth Observation
Yi Wang
Nassim Ait Ali Braham
Zhitong Xiong
Chenying Liu
C. Albrecht
Xiao Xiang Zhu
103
73
0
13 Nov 2022
Seeing Beyond the Brain: Conditional Diffusion Model with Sparse Masked Modeling for Vision Decoding
Zijiao Chen
Jiaxin Qing
Tiange Xiang
Wan Lin Yue
J. Zhou
DiffM, MedIm
132
155
0
13 Nov 2022
Demystify Self-Attention in Vision Transformers from a Semantic Perspective: Analysis and Application
Leijie Wu
Song Guo
Yaohong Ding
Junxiao Wang
Wenchao Xu
Richard Yi Da Xu
Jiewei Zhang
57
2
0
13 Nov 2022
Point-DAE: Denoising Autoencoders for Self-supervised Point Cloud Learning
Yabin Zhang
Jiehong Lin
Ruihuang Li
Kui Jia
Lei Zhang
3DPC
88
8
0
13 Nov 2022
Perceptual Video Coding for Machines via Satisfied Machine Ratio Modeling
Qi Zhang
Shanshe Wang
Xinfeng Zhang
Chuanmin Jia
Jingshan Pan
Siwei Ma
Wen Gao
82
4
0
13 Nov 2022
MARLIN: Masked Autoencoder for facial video Representation LearnINg
Zhixi Cai
Shreya Ghosh
Kalin Stefanov
Abhinav Dhall
Jianfei Cai
Hamid Rezatofighi
Reza Haffari
Munawar Hayat
ViT, CVBM
114
62
0
12 Nov 2022
One-Time Model Adaptation to Heterogeneous Clients: An Intra-Client and Inter-Image Attention Design
Yikai Yan
Chaoyue Niu
Fan Wu
Qinya Li
Shaojie Tang
Chengfei Lyu
Guihai Chen
77
0
0
11 Nov 2022
Masked Contrastive Representation Learning
Yuan Yao
Nandakishor Desai
M. Palaniswami
SSL
146
8
0
11 Nov 2022
A Comprehensive Survey of Transformers for Computer Vision
Sonain Jamil
Md. Jalil Piran
Oh-Jin Kwon
ViT
78
54
0
11 Nov 2022
Federated Unsupervised Visual Representation Learning via Exploiting General Content and Personal Style
Yue Yang
Jingwei Sun
Ang Li
H. Li
Yiran Chen
OOD
131
0
0
11 Nov 2022
Unifying Flow, Stereo and Depth Estimation
Haofei Xu
Jing Zhang
Jianfei Cai
Hamid Rezatofighi
Feng Yu
Dacheng Tao
Andreas Geiger
MDE
152
216
0
10 Nov 2022
Efficient Image Generation with Variadic Attention Heads
Steven Walton
Ali Hassani
Xingqian Xu
Zhangyang Wang
Humphrey Shi
ViT
89
23
0
10 Nov 2022
Mask More and Mask Later: Efficient Pre-training of Masked Language Models by Disentangling the [MASK] Token
Baohao Liao
David Thulke
Sanjika Hewavitharana
Hermann Ney
Christof Monz
75
9
0
09 Nov 2022
Masked Vision-Language Transformers for Scene Text Recognition
Jie Wu
Ying Peng
Shenmin Zhang
Weigang Qi
Jian Zhang
71
3
0
09 Nov 2022