ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Masked Autoencoders Are Scalable Vision Learners
arXiv:2111.06377 (v3, latest)


11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
Towards Flexible Multi-modal Document Models
Naoto Inoue
Kotaro Kikuchi
E. Simo-Serra
Mayu Otani
Kota Yamaguchi
83
22
0
31 Mar 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Arjun Majumdar
Karmesh Yadav
Sergio Arnaud
Yecheng Jason Ma
Claire Chen
...
Dhruv Batra
Yixin Lin
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
LM&Ro
81
185
0
31 Mar 2023
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
109
22
0
31 Mar 2023
STMT: A Spatial-Temporal Mesh Transformer for MoCap-Based Action Recognition
Xiaoyu Zhu
Po-Yao (Bernie) Huang
Junwei Liang
Celso M. de Melo
Alexander G. Hauptmann
76
12
0
31 Mar 2023
Self-Supervised Multimodal Learning: A Survey
Yongshuo Zong
Oisin Mac Aodha
Timothy M. Hospedales
SSL
125
50
0
31 Mar 2023
LaCViT: A Label-aware Contrastive Fine-tuning Framework for Vision Transformers
Zijun Long
Zaiqiao Meng
Gerardo Aragon Camarasa
R. McCreadie
VLM
79
5
0
31 Mar 2023
You Only Train Once: Learning a General Anomaly Enhancement Network with Random Masks for Hyperspectral Anomaly Detection
Zhaoxu Li
Yingqian Wang
Chao Xiao
Qi Ling
Zaiping Lin
Wei An
67
36
0
31 Mar 2023
Exploring the Limits of Deep Image Clustering using Pretrained Models
Nikolas Adaloglou
Félix D. P. Michels
Hamza Kalisch
M. Kollmann
VLM
80
29
0
31 Mar 2023
Visual Anomaly Detection via Dual-Attention Transformer and Discriminative Flow
Haiming Yao
Wei Luo
Wenyong Yu
ViT
78
3
0
31 Mar 2023
Whether and When does Endoscopy Domain Pretraining Make Sense?
Dominik Batić
Felix Holm
Ege Özsoy
Tobias Czempiel
Nassir Navab
35
7
0
30 Mar 2023
Beyond Appearance: a Semantic Controllable Self-Supervised Learning Framework for Human-Centric Visual Tasks
Weihua Chen
Xianzhe Xu
Jian Jia
Haowen Luo
Yaohua Wang
F. Wang
Rong Jin
Xiuyu Sun
SSL
93
100
0
30 Mar 2023
Robo3D: Towards Robust and Reliable 3D Perception against Corruptions
Lingdong Kong
You-Chen Liu
Xin Li
Runnan Chen
Wenwei Zhang
Jiawei Ren
Liang Pan
Kaili Chen
Ziwei Liu
141
94
0
30 Mar 2023
Anatomically aware dual-hop learning for pulmonary embolism detection in CT pulmonary angiograms
Florin Condrea
S. Rapaka
Lucian Itu
Puneet Sharma
J. Sperl
Mohamed Ali
Marius Leordeanu
66
5
0
30 Mar 2023
PoseFormerV2: Exploring Frequency Domain for Efficient and Robust 3D Human Pose Estimation
Qi-jun Zhao
Ce Zheng
Mengyuan Liu
Pichao Wang
Chong Chen
ViT
95
94
0
30 Mar 2023
Complementary Random Masking for RGB-Thermal Semantic Segmentation
Ukcheol Shin
Kyunghyun Lee
In So Kweon
Jean Oh
76
23
0
30 Mar 2023
ISSTAD: Incremental Self-Supervised Learning Based on Transformer for Anomaly Detection and Localization
WenPing Jin
Fei-Yu Guo
Li Zhu
ViT, MedIm
81
1
0
30 Mar 2023
PMatch: Paired Masked Image Modeling for Dense Geometric Matching
Shengjie Zhu
Xiaoming Liu
106
25
0
30 Mar 2023
Masked Autoencoders as Image Processors
Huiyu Duan
Wei Shen
Xiongkuo Min
Danyang Tu
Long Teng
Jia Wang
Guangtao Zhai
ViT
64
11
0
30 Mar 2023
Mixed Autoencoder for Self-supervised Visual Representation Learning
Kai Chen
Zhili Liu
Lanqing Hong
Hang Xu
Zhenguo Li
Dit-Yan Yeung
SSL
123
45
0
30 Mar 2023
Soft Neighbors are Positive Supporters in Contrastive Visual Representation Learning
Chongjian Ge
Jiangliu Wang
Zhan Tong
Shoufa Chen
Yibing Song
Ping Luo
SSL
75
28
0
30 Mar 2023
Masked and Adaptive Transformer for Exemplar Based Image Translation
Changlong Jiang
Fei Gao
Biao Ma
Yuhao Lin
N. Wang
Gang Xu
84
18
0
30 Mar 2023
Dependent Task Offloading in Edge Computing Using GNN and Deep Reinforcement Learning
Zequn Cao
Xiaoheng Deng
32
12
0
30 Mar 2023
ImageNet-E: Benchmarking Neural Network Robustness via Attribute Editing
Xiaodan Li
YueFeng Chen
Yao Zhu
Shuhui Wang
Rong Zhang
Hui Xue
82
26
0
30 Mar 2023
BERT4ETH: A Pre-trained Transformer for Ethereum Fraud Detection
Sihao Hu
Zhen Zhang
B. Luo
Shengliang Lu
Bingsheng He
Ling Liu
74
44
0
29 Mar 2023
ViewRefer: Grasp the Multi-view Knowledge for 3D Visual Grounding with GPT and Prototype Guidance
Zoey Guo
Yiwen Tang
Renrui Zhang
Dong Wang
Zhigang Wang
Bin Zhao
Xuelong Li
117
60
0
29 Mar 2023
MaMMUT: A Simple Architecture for Joint Learning for MultiModal Tasks
Weicheng Kuo
A. Piergiovanni
Dahun Kim
Xiyang Luo
Benjamin Caine
...
Luowei Zhou
Andrew M. Dai
Zhifeng Chen
Claire Cui
A. Angelova
MLLM, VLM
125
25
0
29 Mar 2023
VideoMAE V2: Scaling Video Masked Autoencoders with Dual Masking
Limin Wang
Bingkun Huang
Zhiyu Zhao
Zhan Tong
Yinan He
Yi Wang
Yali Wang
Yu Qiao
VGen
153
364
0
29 Mar 2023
Generalized Relation Modeling for Transformer Tracking
Shenyuan Gao
Chunluan Zhou
Jun Zhang
ViT
73
114
0
29 Mar 2023
Point2Vec for Self-Supervised Representation Learning on Point Clouds
Karim Abou Zeid
Jonas Schult
Alexander Hermans
Bastian Leibe
3DPC
73
30
0
29 Mar 2023
Unlocking Masked Autoencoders as Loss Function for Image and Video Restoration
Man Zhou
Naishan Zheng
Jie Huang
Chunle Guo
Chongyi Li
59
2
0
29 Mar 2023
Towards Foundation Models and Few-Shot Parameter-Efficient Fine-Tuning for Volumetric Organ Segmentation
Julio Silva-Rodríguez
Jose Dolz
Ismail Ben Ayed
205
14
0
29 Mar 2023
TimeBalance: Temporally-Invariant and Temporally-Distinctive Video Representations for Semi-Supervised Action Recognition
I. Dave
Mamshad Nayeem Rizve
Chong Chen
M. Shah
TTA
103
18
0
28 Mar 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM, VLM
179
240
0
28 Mar 2023
Multi-modal learning for geospatial vegetation forecasting
V. Benson
Claire Robin
C. Requena-Mesa
Lazaro Alonso
Nuno Carvalhais
José A. Cortés
Zhihan Gao
Nora Linscheid
M. Weynants
Markus Reichstein
81
12
0
28 Mar 2023
One-Stage 3D Whole-Body Mesh Recovery with Component Aware Transformer
Jing Lin
Ailing Zeng
Haoqian Wang
Lei Zhang
Y. Li
3DH
102
106
0
28 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
136
169
0
28 Mar 2023
TabRet: Pre-training Transformer-based Tabular Models for Unseen Columns
Soma Onishi
Kenta Oono
Kohei Hayashi
LMTD
70
16
0
28 Mar 2023
Mask and Restore: Blind Backdoor Defense at Test Time with Masked Autoencoder
Tao Sun
Lu Pang
Chao Chen
Haibin Ling
AAML
73
9
0
27 Mar 2023
GeoNet: Benchmarking Unsupervised Adaptation across Geographies
Tarun Kalluri
Wangdong Xu
Manmohan Chandraker
OOD
63
15
0
27 Mar 2023
On the Stepwise Nature of Self-Supervised Learning
James B. Simon
Maksis Knutins
Liu Ziyin
Daniel Geisz
Abraham J. Fetterman
Joshua Albrecht
SSL
96
35
0
27 Mar 2023
A Comprehensive Survey on Test-Time Adaptation under Distribution Shifts
Jian Liang
Ran He
Tien-Ping Tan
OOD, VLM, TTA
138
243
0
27 Mar 2023
Sigmoid Loss for Language Image Pre-Training
Xiaohua Zhai
Basil Mustafa
Alexander Kolesnikov
Lucas Beyer
CLIP, VLM
313
1,208
0
27 Mar 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM, VLM
119
116
0
27 Mar 2023
Vision Transformer with Quadrangle Attention
Qiming Zhang
Jing Zhang
Yufei Xu
Dacheng Tao
ViT
80
41
0
27 Mar 2023
Continuous Intermediate Token Learning with Implicit Motion Manifold for Keyframe Based Motion Interpolation
Clinton Mo
Kun Hu
Chengjiang Long
Zhiyong Wang
72
14
0
27 Mar 2023
SEM-POS: Grammatically and Semantically Correct Video Captioning
Asmar Nadeem
A. Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
73
8
0
26 Mar 2023
WinCLIP: Zero-/Few-Shot Anomaly Classification and Segmentation
Jongheon Jeong
Yang Zou
Taewan Kim
Dongqing Zhang
Avinash Ravichandran
Onkar Dabeer
VLM
138
211
0
26 Mar 2023
BlackVIP: Black-Box Visual Prompting for Robust Transfer Learning
Changdae Oh
Hyeji Hwang
Hee-young Lee
Yongtaek Lim
Geunyoung Jung
Jiyoung Jung
Hosik Choi
Kyungwoo Song
VLM, VPVLM
141
62
0
26 Mar 2023
Global-to-Local Modeling for Video-based 3D Human Pose and Shape Estimation
Xi Shen
Zongxin Yang
Xiaohan Wang
Jianxin Ma
Chang Zhou
Yezhou Yang
ViT, 3DH
91
36
0
26 Mar 2023
Joint fMRI Decoding and Encoding with Latent Embedding Alignment
Xuelin Qian
Yikai Wang
Yanwei Fu
Xinwei Sun
Xiangyang Xue
Jianfeng Feng
110
6
0
26 Mar 2023