Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Spatial-Temporal Transformer for Video Snapshot Compressive Imaging
Lishun Wang
Miao Cao
Yong Zhong
Xin Yuan
71
48
0
04 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-Jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
130
65
0
04 Sep 2022
Masked Sinogram Model with Transformer for ill-Posed Computed Tomography Reconstruction: a Preliminary Study
Zhengchun Liu
R. Kettimuthu
Ian Foster
MedIm
90
4
0
03 Sep 2022
BinImg2Vec: Augmenting Malware Binary Image Classification with Data2Vec
Joon Sern Lee
Kai Keng Tay
Zong Fu Chua
13
2
0
02 Sep 2022
SkeletonMAE: Spatial-Temporal Masked Autoencoders for Self-supervised Skeleton Action Recognition
Wenhan Wu
Yilei Hua
Ce Zheng
Shi-Bao Wu
Chong Chen
Aidong Lu
ViT
137
36
0
01 Sep 2022
Visual Prompting via Image Inpainting
Amir Bar
Yossi Gandelsman
Trevor Darrell
Amir Globerson
Alexei A. Efros
VLM, VP
75
212
0
01 Sep 2022
Transformers are Sample-Efficient World Models
Vincent Micheli
Eloi Alonso
François Fleuret
VLM, OffRL
185
189
0
01 Sep 2022
MAPLE: Masked Pseudo-Labeling autoEncoder for Semi-supervised Point Cloud Action Recognition
Xiaodong Chen
Wu Liu
Xinchen Liu
Yongdong Zhang
Jungong Han
Tao Mei
3DPC
79
13
0
01 Sep 2022
TokenCut: Segmenting Objects in Images and Videos with Self-supervised Transformer and Normalized Cut
Yangtao Wang
Xiaoke Shen
Yuan Yuan
Yuming Du
Maomao Li
S. Hu
James L. Crowley
Dominique Vaufreydaz
VOS, ViT
86
86
0
01 Sep 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM, CLIP
100
27
0
29 Aug 2022
MORI-RAN: Multi-view Robust Representation Learning via Hybrid Contrastive Fusion
Guanzhou Ke
Yong-Nan Zhu
Yang Yu
63
7
0
26 Aug 2022
CMD: Self-supervised 3D Action Representation Learning with Cross-modal Mutual Distillation
Yunyao Mao
Wen-gang Zhou
Zhenbo Lu
Jiajun Deng
Houqiang Li
103
44
0
26 Aug 2022
MaskCLIP: Masked Self-Distillation Advances Contrastive Language-Image Pretraining
Xiaoyi Dong
Jianmin Bao
Yinglin Zheng
Ting Zhang
Dongdong Chen
...
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP, VLM
113
167
0
25 Aug 2022
Pix4Point: Image Pretrained Standard Transformers for 3D Point Cloud Understanding
Guocheng Qian
Abdullah Hamdi
Xingdi Zhang
Guohao Li
3DPC, ViT
63
6
0
25 Aug 2022
Masked Autoencoders Enable Efficient Knowledge Distillers
Yutong Bai
Zeyu Wang
Junfei Xiao
Chen Wei
Huiyu Wang
Alan Yuille
Yuyin Zhou
Cihang Xie
CLL
100
44
0
25 Aug 2022
Clustering Egocentric Images in Passive Dietary Monitoring with Self-Supervised Learning
Jiachuan Peng
Peilun Shi
Jianing Qiu
Xinwei Ju
Frank P.-W. Lo
...
M. McCrory
Edward Sazonov
M. Sun
Gary Frost
Benny Lo
38
4
0
25 Aug 2022
Refine and Represent: Region-to-Object Representation Learning
Akash Gokul
Konstantinos Kallidromitis
Shufang Li
Yu Kato
Kazuki Kozuka
Trevor Darrell
Colorado Reed
SSeg
93
5
0
25 Aug 2022
Semi-Supervised and Unsupervised Deep Visual Learning: A Survey
Yanbei Chen
Massimiliano Mancini
Xiatian Zhu
Zeynep Akata
157
121
0
24 Aug 2022
Federated Self-Supervised Contrastive Learning and Masked Autoencoder for Dermatological Disease Diagnosis
Yawen Wu
Dewen Zeng
Zhepeng Wang
Yi Sheng
Lei Yang
A. James
Yiyu Shi
Jingtong Hu
99
7
0
24 Aug 2022
Prompt-Matched Semantic Segmentation
Lingbo Liu
Jianlong Chang
Bruce X. B. Yu
Liang Lin
Qi Tian
Xin Sun
VP, VLM
111
29
0
22 Aug 2022
Heterogeneous Graph Masked Autoencoders
Yijun Tian
Kaiwen Dong
Chunhui Zhang
Chuxu Zhang
Nitesh Chawla
129
82
0
21 Aug 2022
Masked Video Modeling with Correlation-aware Contrastive Learning for Breast Cancer Diagnosis in Ultrasound
Zehui Lin
Ruobing Huang
Dong Ni
Jiayi Wu
B. Luo
35
6
0
21 Aug 2022
VLMAE: Vision-Language Masked Autoencoder
Su He
Taian Guo
Tao Dai
Ruizhi Qiao
Chen Wu
Xiujun Shu
Bohan Ren
VLM
87
11
0
19 Aug 2022
EAA-Net: Rethinking the Autoencoder Architecture with Intra-class Features for Medical Image Segmentation
Shiqiang Ma
Xia Li
Jijun Tang
Fei Guo
56
4
0
19 Aug 2022
Towards Label-efficient Automatic Diagnosis and Analysis: A Comprehensive Survey of Advanced Deep Learning-based Weakly-supervised, Semi-supervised and Self-supervised Techniques in Histopathological Image Analysis
Linhao Qu
Siyu Liu
Xiaoyu Liu
Manning Wang
Zhijian Song
83
58
0
18 Aug 2022
See Finer, See More: Implicit Modality Alignment for Text-based Person Retrieval
Xiujun Shu
Wei Wen
Haoqian Wu
Keyun Chen
Yi-Zhe Song
Ruizhi Qiao
Bohan Ren
Xiao Wang
91
99
0
18 Aug 2022
Dual Modality Prompt Tuning for Vision-Language Pre-Trained Model
Yinghui Xing
Qirui Wu
De Cheng
Shizhou Zhang
Guoqiang Liang
Peng Wang
Yanning Zhang
VLM, VP
144
59
0
17 Aug 2022
Conviformers: Convolutionally guided Vision Transformer
Mohit Vaishnav
Thomas Fel
I. F. Rodriguez
Thomas Serre
ViT
99
1
0
17 Aug 2022
KAM -- a Kernel Attention Module for Emotion Classification with EEG Data
Dongyang Kuang
C. Michoski
16
4
0
17 Aug 2022
Data Augmentation is a Hyperparameter: Cherry-picked Self-Supervision for Unsupervised Anomaly Detection is Creating the Illusion of Success
Jaemin Yoo
Tianchen Zhao
Leman Akoglu
141
8
0
16 Aug 2022
Generating a Terrain-Robustness Benchmark for Legged Locomotion: A Prototype via Terrain Authoring and Active Learning
Chong Zhang
Lizhi Yang
85
4
0
16 Aug 2022
ConTextual Masked Auto-Encoder for Dense Passage Retrieval
Xing Wu
Guangyuan Ma
Meng Lin
Zijia Lin
Zhongyuan Wang
Songlin Hu
RALM
98
27
0
16 Aug 2022
Efficient Multimodal Transformer with Dual-Level Feature Restoration for Robust Multimodal Sentiment Analysis
Guoying Zhao
Zheng Lian
B. Liu
Jianhua Tao
82
54
0
16 Aug 2022
Grasping Core Rules of Time Series through Pure Models
Gedi Liu
Yifeng Jiang
Yicun Ouyang
Keyang Zhong
Yang Wang
AI4TS
93
0
0
15 Aug 2022
Self-Supervised Vision Transformers for Malware Detection
Sachith Seneviratne
Ridwan Shariffdeen
Sanka Rasnayaka
Nuran Kasthuriarachchi
MedIm
59
35
0
15 Aug 2022
Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models
Xingyu Xie
Pan Zhou
Huan Li
Zhouchen Lin
Shuicheng Yan
ODL
94
169
0
13 Aug 2022
BEiT v2: Masked Image Modeling with Vector-Quantized Visual Tokenizers
Zhiliang Peng
Li Dong
Hangbo Bao
QiXiang Ye
Furu Wei
71
322
0
12 Aug 2022
USB: A Unified Semi-supervised Learning Benchmark for Classification
Yidong Wang
Hao Chen
Yue Fan
Wangbin Sun
R. Tao
...
T. Shinozaki
Bernt Schiele
Jindong Wang
Xingxu Xie
Yue Zhang
98
119
0
12 Aug 2022
Exploiting Feature Diversity for Make-up Temporal Video Grounding
Xiujun Shu
Wei Wen
Taian Guo
Su He
Chen Wu
Ruizhi Qiao
80
1
0
12 Aug 2022
MILAN: Masked Image Pretraining on Language Assisted Representation
Zejiang Hou
Fei Sun
Yen-kuang Chen
Yuan Xie
S. Kung
ViT
123
70
0
11 Aug 2022
Semi-supervised Vision Transformers at Scale
Zhaowei Cai
Avinash Ravichandran
Paolo Favaro
Manchen Wang
Davide Modolo
Rahul Bhotika
Zhuowen Tu
Stefano Soatto
ViT
108
58
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP, VLM
180
108
0
10 Aug 2022
PatchDropout: Economizing Vision Transformers Using Patch Dropout
Yue Liu
Christos Matsoukas
Fredrik Strand
Hossein Azizpour
Kevin Smith
64
24
0
10 Aug 2022
Label-Free Synthetic Pretraining of Object Detectors
Hei Law
Jia Deng
76
4
0
08 Aug 2022
Understanding Masked Image Modeling via Learning Occlusion Invariant Feature
Xiangwen Kong
Xiangyu Zhang
SSL
78
54
0
08 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
84
257
0
08 Aug 2022
Frozen CLIP Models are Efficient Video Learners
Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
CLIP, VLM
98
209
0
06 Aug 2022
P2P: Tuning Pre-trained Image Models for Point Cloud Analysis with Point-to-Pixel Prompting
Ziyi Wang
Xumin Yu
Yongming Rao
Jie Zhou
Jiwen Lu
VP, VLM
95
77
0
04 Aug 2022
MVSFormer: Multi-View Stereo by Learning Robust Image Features and Temperature-based Depth
Chenjie Cao
Xinlin Ren
Yanwei Fu
108
54
0
04 Aug 2022
Prompt Tuning for Generative Multimodal Pretrained Models
Han Yang
Junyang Lin
An Yang
Peng Wang
Chang Zhou
Hongxia Yang
VLM, LRM, VP
86
31
0
04 Aug 2022