2111.06377
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,779 papers shown
Multi-modal Expression Recognition with Ensemble Method
Chuanhe Liu
Xinjie Zhang
Xiaolong Liu
Tenggan Zhang
Liyu Meng
Yuchen Liu
Yuanyuan Deng
Wenqiang Jiang
CVBM
41
7
0
17 Mar 2023
LION: Implicit Vision Prompt Tuning
Haixin Wang
Jianlong Chang
Xiao Luo
Jinan Sun
Zhouchen Lin
Qi Tian
VLM
MLLM
VPVLM
81
23
0
17 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
85
38
0
17 Mar 2023
DiffusionSeg: Adapting Diffusion Towards Unsupervised Object Discovery
Chaofan Ma
Yu-Hao Yang
Chen Ju
Feifan Zhang
Jinxian Liu
Yu Wang
Ya Zhang
Yanfeng Wang
DiffM
96
38
0
17 Mar 2023
Denoising Diffusion Autoencoders are Unified Self-supervised Learners
Weilai Xiang
Hongyu Yang
Di Huang
Yunhong Wang
DiffM
125
78
0
17 Mar 2023
LOCATE: Localize and Transfer Object Parts for Weakly Supervised Affordance Grounding
Gen Li
Varun Jampani
Deqing Sun
Laura Sevilla-Lara
97
44
0
16 Mar 2023
Steering Prototypes with Prompt-tuning for Rehearsal-free Continual Learning
Zhuowei Li
Long Zhao
Zizhao Zhang
Han Zhang
Diya Liu
Ting Liu
Dimitris N. Metaxas
CLL
VLM
106
21
0
16 Mar 2023
All4One: Symbiotic Neighbour Contrastive Learning via Self-Attention and Redundancy Reduction
Imanol G. Estepa
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
SSL
88
9
0
16 Mar 2023
MAPSeg: Unified Unsupervised Domain Adaptation for Heterogeneous Medical Image Segmentation Based on 3D Masked Autoencoding and Pseudo-Labeling
Xuzhe Zhang
Yu-Hsun Wu
S. Entringer
H. Simhan
Jia Guo
...
A. Jackowski
Haifeng Li
J. Posner
Andrew F. Laine
Yun Wang
OOD
101
14
0
16 Mar 2023
CSSL-MHTR: Continual Self-Supervised Learning for Scalable Multi-script Handwritten Text Recognition
M. Dhiaf
Mohamed Ali Souibgui
Kai Wang
Yuyang Liu
Yousri Kessentini
Alicia Fornés
Ahmed Cheikh Rouhou
CLL
60
2
0
16 Mar 2023
Multimodal Feature Extraction and Fusion for Emotional Reaction Intensity Estimation and Expression Classification in Videos with Transformers
Jia Li
Yin Chen
Xuesong Zhang
Jian-Hui Nie
Zi-Yang Li
Yang Yu
Yan Zhang
Richang Hong
Ming Wang
54
18
0
16 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang-Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
82
5
0
16 Mar 2023
Sensitivity-Aware Visual Parameter-Efficient Fine-Tuning
Haoyu He
Jianfei Cai
Jing Zhang
Dacheng Tao
Bohan Zhuang
VPVLM
89
58
0
15 Mar 2023
SeqCo-DETR: Sequence Consistency Training for Self-Supervised Object Detection with Transformers
Guoqiang Jin
Fan Yang
Mingshan Sun
R. Zhao
Yakun Liu
Wei Li
Tianpeng Bao
Liwei Wu
Xingyu Zeng
Rui Zhao
ViT
64
2
0
15 Mar 2023
Task-specific Fine-tuning via Variational Information Bottleneck for Weakly-supervised Pathology Whole Slide Image Classification
Honglin Li
Chenglu Zhu
Yunlong Zhang
Yuxuan Sun
Zhongyi Shui
Wenwei Kuang
S. Zheng
Ling Yang
128
59
0
15 Mar 2023
Real Face Foundation Representation Learning for Generalized Deepfake Detection
Liang Shi
Jie Zhang
Shiguang Shan
CVBM
90
9
0
15 Mar 2023
Diversity-Aware Meta Visual Prompting
Qidong Huang
Xiaoyi Dong
DongDong Chen
Weiming Zhang
Feifei Wang
Gang Hua
Neng H. Yu
VLM
VPVLM
90
57
0
14 Mar 2023
PiMAE: Point Cloud and Image Interactive Masked Autoencoders for 3D Object Detection
Anthony Chen
Kevin Zhang
Renrui Zhang
Zihan Wang
Yuheng Lu
Yandong Guo
Shanghang Zhang
3DPC
139
69
0
14 Mar 2023
Adaptive Rotated Convolution for Rotated Object Detection
Yifan Pu
Yiru Wang
Zhuofan Xia
Yizeng Han
Yulin Wang
Weihao Gan
Zidong Wang
S. Song
Gao Huang
92
84
0
14 Mar 2023
OVRL-V2: A simple state-of-art baseline for ImageNav and ObjectNav
Karmesh Yadav
Arjun Majumdar
Ram Ramrakhya
Naoki Yokoyama
Alexei Baevski
Z. Kira
Oleksandr Maksymets
Dhruv Batra
ViT
99
49
0
14 Mar 2023
AdPE: Adversarial Positional Embeddings for Pretraining Vision Transformers via MAE+
Tianlin Li
Ying Wang
Ziwei Xuan
Guo-Jun Qi
ViT
75
3
0
14 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
Ziwei Liu
CLL
106
111
0
13 Mar 2023
Domain Generalization in Machine Learning Models for Wireless Communications: Concepts, State-of-the-Art, and Open Issues
Mohamed Akrout
Amal Feriani
F. Bellili
A. Mezghani
Ekram Hossain
OOD
AI4CE
101
28
0
13 Mar 2023
DPPMask: Masked Image Modeling with Determinantal Point Processes
Junde Xu
Zikai Lin
Donghao Zhou
Yao-Cheng Yang
Xiangyun Liao
Bian Wu
Guangyong Chen
Pheng-Ann Heng
76
1
0
13 Mar 2023
FireRisk: A Remote Sensing Dataset for Fire Risk Assessment with Benchmarks Using Supervised and Self-supervised Learning
Shuchang Shen
Sachith Seneviratne
Xinye Wanyan
M. Kirley
80
14
0
13 Mar 2023
ViM: Vision Middleware for Unified Downstream Transferring
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
98
1
0
13 Mar 2023
CrossFormer++: A Versatile Vision Transformer Hinging on Cross-scale Attention
Wenxiao Wang
Wei Chen
Qibo Qiu
Long Chen
Boxi Wu
Binbin Lin
Xiaofei He
Wei Liu
98
49
0
13 Mar 2023
Traj-MAE: Masked Autoencoders for Trajectory Prediction
Hao Chen
Jiaze Wang
Kun Shao
Furui Liu
Jianye Hao
Chenyong Guan
Guangyong Chen
Pheng-Ann Heng
122
40
0
12 Mar 2023
Improving Masked Autoencoders by Learning Where to Mask
Haijia Chen
Wendong Zhang
Yunbo Wang
Xiaokang Yang
SSL
62
20
0
12 Mar 2023
Towards General Purpose Medical AI: Continual Learning Medical Foundation Model
Huahui Yi
Ziyuan Qin
Qicheng Lao
Wei Xu
Zekun Jiang
Dequan Wang
Shaoting Zhang
Kang Li
OOD
MedIm
CLL
72
15
0
12 Mar 2023
Token Sparsification for Faster Medical Image Segmentation
Lei Zhou
Huidong Liu
Joseph Bae
Junjun He
Dimitris Samaras
Prateek Prasanna
MedIm
62
3
0
11 Mar 2023
Active Visual Exploration Based on Attention-Map Entropy
Adam Pardyl
Grzegorz Rypeść
Grzegorz Kurzejamski
Bartosz Zieliński
Tomasz Trzciński
93
6
0
11 Mar 2023
Semi-supervised Hand Appearance Recovery via Structure Disentanglement and Dual Adversarial Discrimination
Zimeng Zhao
Binghui Zuo
Zhiyu Long
Yangang Wang
58
5
0
11 Mar 2023
AugDiff: Diffusion based Feature Augmentation for Multiple Instance Learning in Whole Slide Image
Zhucheng Shao
Liuxi Dai
Yifeng Wang
Haoqian Wang
Yongbing Zhang
MedIm
DiffM
220
0
0
11 Mar 2023
PRSNet: A Masked Self-Supervised Learning Pedestrian Re-Identification Method
Zhijie Xiao
Zhicheng Dong
Hao Xiang
SSL
53
0
0
11 Mar 2023
DETA: Denoised Task Adaptation for Few-Shot Learning
Ji Zhang
Lianli Gao
Xu Luo
Hengtao Shen
Jingkuan Song
VLM
109
21
0
11 Mar 2023
Stabilizing Transformer Training by Preventing Attention Entropy Collapse
Shuangfei Zhai
Tatiana Likhomanenko
Etai Littwin
Dan Busbridge
Jason Ramapuram
Yizhe Zhang
Jiatao Gu
J. Susskind
AAML
118
78
0
11 Mar 2023
Ignorance is Bliss: Robust Control via Information Gating
Manan Tomar
Riashat Islam
Matthew E. Taylor
Sergey Levine
Philip Bachman
87
9
0
10 Mar 2023
Towards domain-invariant Self-Supervised Learning with Batch Styles Standardization
Marin Scalbert
Maria Vakalopoulou
Florent Couzinié-Devy
104
2
0
10 Mar 2023
MuLTI: Efficient Video-and-Language Understanding with Text-Guided MultiWay-Sampler and Multiple Choice Modeling
Jiaqi Xu
Bo Liu
Yunkuo Chen
Mengli Cheng
Xing Shi
95
1
0
10 Mar 2023
Human Pose Estimation from Ambiguous Pressure Recordings with Spatio-temporal Masked Transformers
Vandad Davoodnia
Ali Etemad
ViT
60
7
0
10 Mar 2023
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
Shixiang Tang
Cheng Chen
Qingsong Xie
Meilin Chen
Yizhou Wang
...
Feng Zhu
Haiyang Yang
Li Yi
Rui Zhao
Wanli Ouyang
VLM
107
36
0
10 Mar 2023
CFR-ICL: Cascade-Forward Refinement with Iterative Click Loss for Interactive Image Segmentation
Shoukun Sun
Min Xian
Fei Xu
L. Capriotti
Tiankai Yao
69
19
0
09 Mar 2023
Cal-QL: Calibrated Offline RL Pre-Training for Efficient Online Fine-Tuning
Mitsuhiko Nakamoto
Yuexiang Zhai
Anika Singh
Max Sobol Mark
Yi-An Ma
Chelsea Finn
Aviral Kumar
Sergey Levine
OffRL
OnRL
190
125
0
09 Mar 2023
Mimic before Reconstruct: Enhancing Masked Autoencoders with Feature Mimicking
Peng Gao
Renrui Zhang
Rongyao Fang
Ziyi Lin
Hongyang Li
Hongsheng Li
Qiao Yu
65
19
0
09 Mar 2023
Refined Vision-Language Modeling for Fine-grained Multi-modal Pre-training
Lisai Zhang
Qingcai Chen
Zhijian Chen
Yunpeng Han
Zhonghua Li
Bo Zhao
VLM
59
1
0
09 Mar 2023
M3AE: Multimodal Representation Learning for Brain Tumor Segmentation with Missing Modalities
Hong Liu
Dong Wei
Donghuan Lu
J. Sun
Liansheng Wang
Yefeng Zheng
73
51
0
09 Mar 2023
From Visual Prompt Learning to Zero-Shot Transfer: Mapping Is All You Need
Ziqing Yang
Zeyang Sha
Michael Backes
Yang Zhang
VPVLM
VLM
89
3
0
09 Mar 2023
Masked Image Modeling with Local Multi-Scale Reconstruction
Haoqing Wang
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhiwei Deng
Kai Han
90
53
0
09 Mar 2023
SLCA: Slow Learner with Classifier Alignment for Continual Learning on a Pre-trained Model
Gengwei Zhang
Liyuan Wang
Guoliang Kang
Ling-Hao Chen
Yunchao Wei
CLL
90
119
0
09 Mar 2023