Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He, Xinlei Chen, Saining Xie, Yanghao Li, Piotr Dollár, Ross B. Girshick
ViT · TPM
arXiv: 2111.06377 (abs · PDF · HTML)

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

Showing 50 of 4,777 citing papers.

Revisiting adapters with adversarial training
Sylvestre-Alvise Rebuffi, Francesco Croce, Sven Gowal
AAML · 10 Oct 2022

Exploiting map information for self-supervised learning in motion forecasting
Caio Azevedo, Thomas Gilles, S. Sabatini, D. Tsishkou
SSL · 10 Oct 2022

Denoising Masked AutoEncoders Help Robust Classification
Quanlin Wu, Hang Ye, Yuntian Gu, Huishuai Zhang, Liwei Wang, Di He
10 Oct 2022

A Comprehensive Survey of Data Augmentation in Visual Reinforcement Learning
Guozheng Ma, Zhen Wang, Zhecheng Yuan, Xueqian Wang, Bo Yuan, Dacheng Tao
OffRL · 10 Oct 2022

Scaling Up Probabilistic Circuits by Latent Variable Distillation
Hoang Trung-Dung, Honghua Zhang, Guy Van den Broeck
TPM · 10 Oct 2022

Learning to Decompose Visual Features with Latent Textual Prompts
Feng Wang, Manling Li, Xudong Lin, Hairong Lv, Alex Schwing, Heng Ji
VLM · 09 Oct 2022

MAMO: Masked Multimodal Modeling for Fine-Grained Vision-Language Representation Learning
Zijia Zhao, Longteng Guo, Xingjian He, Shuai Shao, Zehuan Yuan, Jing Liu
09 Oct 2022

Deep Span Representations for Named Entity Recognition
Enwei Zhu, Yiyang Liu, Jinpeng Li
09 Oct 2022

Self-supervised Video Representation Learning with Motion-Aware Masked Autoencoders
Haosen Yang, Deng Huang, Bin Wen, Jiannan Wu, Huanjin Yao, Yi Jiang, Xiatian Zhu, Zehuan Yuan
09 Oct 2022

VoLTA: Vision-Language Transformer with Weakly-Supervised Local-Feature Alignment
Shraman Pramanick, Li Jing, Sayan Nag, Jiachen Zhu, Hardik Shah, Yann LeCun, Ramalingam Chellappa
09 Oct 2022

Robustness of Unsupervised Representation Learning without Labels
Aleksandar Petrov, Marta Z. Kwiatkowska
OffRL · 08 Oct 2022

(Fusionformer):Exploiting the Joint Motion Synergy with Fusion Network Based On Transformer for 3D Human Pose Estimation
Xinwei Yu, Xiaohua Zhang
ViT · 08 Oct 2022

ViewFool: Evaluating the Robustness of Visual Recognition to Adversarial Viewpoints
Yinpeng Dong, Shouwei Ruan, Hang Su, Cai Kang, Xingxing Wei, Junyi Zhu
AAML · 08 Oct 2022

AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon, Jeonghoon Kim, Jeongin Bae, Kang Min Yoo, Jin-Hwa Kim, Baeseong Park, Byeongwook Kim, Jung-Woo Ha, Nako Sung, Dongsoo Lee
MQ · 08 Oct 2022

SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Omiros Pantazis, Gabriel J. Brostow, Kate E. Jones, Oisin Mac Aodha
VLM · 07 Oct 2022

Pre-trained Adversarial Perturbations
Y. Ban, Yinpeng Dong
AAML · 07 Oct 2022

Critical Learning Periods for Multisensory Integration in Deep Networks
Michael Kleinman, Alessandro Achille, Stefano Soatto
06 Oct 2022

Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic, Tete Xiao, Stephen James, Pieter Abbeel, Jitendra Malik, Trevor Darrell
SSL · 06 Oct 2022

VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang, Agrim Gupta, Zichen Zhang, Guanzhi Wang, Yongqiang Dou, Yanjun Chen, Li Fei-Fei, Anima Anandkumar, Yuke Zhu, Linxi Fan
LM&Ro · 06 Oct 2022

The Lie Derivative for Measuring Learned Equivariance
Nate Gruver, Marc Finzi, Micah Goldblum, A. Wilson
06 Oct 2022

Effective Self-supervised Pre-training on Low-compute Networks without Distillation
Fuwen Tan, F. Saleh, Brais Martínez
06 Oct 2022

PSVRF: Learning to restore Pitch-Shifted Voice without reference
Yangfu Li, Xiaodan Lin, Jiaxin Yang
06 Oct 2022

Active Image Indexing
Pierre Fernandez, Matthijs Douze, Hervé Jégou, Teddy Furon
VLM · 05 Oct 2022

Image Masking for Robust Self-Supervised Monocular Depth Estimation
Hemang Chawla, Kishaan Jeeveswaran, Elahe Arani, Bahram Zonooz
MDE · 05 Oct 2022

Vision+X: A Survey on Multimodal Learning in the Light of Data
Ye Zhu, Yuehua Wu, N. Sebe, Yan Yan
05 Oct 2022

RankMe: Assessing the downstream performance of pretrained self-supervised representations by their rank
Q. Garrido, Randall Balestriero, Laurent Najman, Yann LeCun
SSL · 05 Oct 2022

Exploring The Role of Mean Teachers in Self-supervised Masked Auto-Encoders
Youngwan Lee, Jeffrey Willette, Jonghee Kim, Juho Lee, Sung Ju Hwang
05 Oct 2022

Self-supervised Pre-training for Semantic Segmentation in an Indoor Scene
Sulabh Shrestha, Yimeng Li, Jana Kosecka
3DPC · SSL · SSeg · 04 Oct 2022

Backdoor Attacks in the Supply Chain of Masked Image Modeling
Xinyue Shen, Xinlei He, Zheng Li, Yun Shen, Michael Backes, Yang Zhang
04 Oct 2022

VICRegL: Self-Supervised Learning of Local Visual Features
Adrien Bardes, Jean Ponce, Yann LeCun
SSL · 04 Oct 2022

Learning from the Best: Contrastive Representations Learning Across Sensor Locations for Wearable Activity Recognition
Vitor Fortes Rey, Sungho Suh, P. Lukowicz
SSL · HAI · 04 Oct 2022

MTSMAE: Masked Autoencoders for Multivariate Time-Series Forecasting
Peiwang Tang, Xianchao Zhang
AI4TS · 04 Oct 2022

CLIP2Point: Transfer CLIP to Point Cloud Classification with Image-Depth Pre-training
Tianyu Huang, Bowen Dong, Yunhan Yang, Xiaoshui Huang, Rynson W. H. Lau, Wanli Ouyang, W. Zuo
VLM · 3DPC · CLIP · 03 Oct 2022

Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning
Weicong Liang, Yuhui Yuan, Henghui Ding, Xiao Luo, Weihong Lin, Ding Jia, Zheng Zhang, Chao Zhang, Hanhua Hu
03 Oct 2022

Attention Distillation: self-supervised vision transformer students need more guidance
Kai Wang, Fei Yang, Joost van de Weijer
ViT · 03 Oct 2022

Enhancing Fine-Grained 3D Object Recognition using Hybrid Multi-Modal Vision Transformer-CNN Models
Songsong Xiong, Georgios Tziafas, Hamidreza Kasaei
ViT · 03 Oct 2022

Masked Supervised Learning for Semantic Segmentation
H. Zunair, A. Ben Hamza
03 Oct 2022

Fill in Fabrics: Body-Aware Self-Supervised Inpainting for Image-Based Virtual Try-On
H. Zunair, Y. Gobeil, Samuel Mercier, A. Ben Hamza
03 Oct 2022

Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu, Jianlong Chang, Lin Liu, Qi Tian, Changan Chen
VPVLM · VLM · 03 Oct 2022

Under the Cover Infant Pose Estimation using Multimodal Data
Daniel G. Kyrollos, A. Fuller, K. Greenwood, J. Harrold, J.R. Green
3DH · 03 Oct 2022

Contrastive Audio-Visual Masked Autoencoder
Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass
02 Oct 2022

Federated Training of Dual Encoding Models on Small Non-IID Client Datasets
Raviteja Vemulapalli, Warren Morningstar, Philip Mansfield, Hubert Eichner, K. Singhal, Arash Afkanpour, Bradley Green
FedML · 30 Sep 2022

VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training
Yecheng Jason Ma, Shagun Sodhani, Dinesh Jayaraman, Osbert Bastani, Vikash Kumar, Amy Zhang
SSL · OffRL · 30 Sep 2022

Where Should I Spend My FLOPS? Efficiency Evaluations of Visual Pre-training Methods
Skanda Koppula, Yazhe Li, Evan Shelhamer, Andrew Jaegle, Nikhil Parthasarathy, Relja Arandjelović, João Carreira, Olivier J. Hénaff
30 Sep 2022

Slimmable Networks for Contrastive Self-supervised Learning
Shuai Zhao, Xiaohan Wang, Linchao Zhu, Yi Yang
30 Sep 2022

Rethinking the Learning Paradigm for Facial Expression Recognition
Weijie Wang, N. Sebe, Bruno Lepri
30 Sep 2022

Learning Transferable Spatiotemporal Representations from Natural Script Knowledge
Ziyun Zeng, Yuying Ge, Xihui Liu, Bin Chen, Ping Luo, Shutao Xia, Yixiao Ge
AI4TS · 30 Sep 2022

Universal Prompt Tuning for Graph Neural Networks
Taoran Fang, Yunchao Zhang, Yang Yang, Chunping Wang, Lei Chen
30 Sep 2022

Self-Distillation for Further Pre-training of Transformers
Seanie Lee, Minki Kang, Juho Lee, Sung Ju Hwang, Kenji Kawaguchi
30 Sep 2022

Improving Molecular Pretraining with Complementary Featurizations
Yanqiao Zhu, Dingshuo Chen, Yuanqi Du, Yingze Wang, Qiang Liu, Shu Wu
AI4CE · 29 Sep 2022