2111.06377
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,778 papers shown
Co-supervised learning paradigm with conditional generative adversarial networks for sample-efficient classification
Hao Zhen
Yucheng Shi
Jidong J. Yang
Javad Mohammadpour Vehni
GAN
49
1
0
27 Dec 2022
Semi-Supervised Semantic Segmentation Methods for UW-OCTA Diabetic Retinopathy Grade Assessment
Zhuoyi Tan
H. Madzin
Zeyu Ding
31
4
0
27 Dec 2022
Position-Aware Contrastive Alignment for Referring Image Segmentation
Bo Chen
Zhiwei Hu
Zhilong Ji
Jinfeng Bai
W. Zuo
136
7
0
27 Dec 2022
Exploring Transformer Backbones for Image Diffusion Models
Princy Chahal
42
3
0
27 Dec 2022
MVTN: Learning Multi-View Transformations for 3D Understanding
Abdullah Hamdi
Faisal AlZahrani
Silvio Giancola
Guohao Li
3DV
3DPC
139
6
0
27 Dec 2022
HandsOff: Labeled Dataset Generation With No Additional Human Annotations
Austin Xu
Mariya I. Vasileva
Achal Dave
Arjun Seshadri
67
7
0
24 Dec 2022
Reversible Column Networks
Yuxuan Cai
Yi Zhou
Qi Han
Jianjian Sun
Xiangwen Kong
Jun Yu Li
Xiangyu Zhang
VLM
96
59
0
22 Dec 2022
Similarity Contrastive Estimation for Image and Video Soft Contrastive Self-Supervised Learning
J. Denize
Jaonary Rabarisoa
Astrid Orcesi
Romain Hérault
SSL
92
6
0
21 Dec 2022
What Makes for Good Tokenizers in Vision Transformer?
Shengju Qian
Yi Zhu
Wenbo Li
Mu Li
Jiaya Jia
ViT
91
14
0
21 Dec 2022
Joint Embedding of 2D and 3D Networks for Medical Image Anomaly Detection
In-Joo Kang
Jinah Park
3DH
55
1
0
21 Dec 2022
MaskingDepth: Masked Consistency Regularization for Semi-supervised Monocular Depth Estimation
Jongbeom Baek
Gyeongnyeon Kim
Seonghoon Park
Honggyu An
Matteo Poggi
Seung Wook Kim
MDE
103
0
0
21 Dec 2022
UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering
Chenlu Zhan
Peng Peng
Hongsen Wang
Tao Chen
Hongwei Wang
MedIm
77
4
0
21 Dec 2022
Unleashing the Power of Visual Prompting At the Pixel Level
Junyang Wu
Xianhang Li
Chen Wei
Huiyu Wang
Alan Yuille
Yuyin Zhou
Cihang Xie
VPVLM
VLM
97
32
0
20 Dec 2022
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
Simone Klenk
David Bonello
Lukas Koestler
Nikita Araslanov
Zorah Lähner
92
25
0
20 Dec 2022
Are Deep Neural Networks SMARTer than Second Graders?
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Kevin A. Smith
J. Tenenbaum
AAML
LRM
ReLM
112
31
0
20 Dec 2022
MM-3DScene: 3D Scene Understanding by Customizing Masked Modeling with Informative-Preserved Reconstruction and Self-Distilled Consistency
Mingye Xu
Mutian Xu
Tong He
Wanli Ouyang
Yali Wang
Xiaoguang Han
Yu Qiao
79
10
0
20 Dec 2022
AI applications in forest monitoring need remote sensing benchmark datasets
E. Lines
Matthew J. Allen
Carlos Cabo
K. Calders
Amandine Debus
S. Grieve
Milto Miltiadou
Adam Noach
H. Owen
Stefano Puliti
51
9
0
20 Dec 2022
LayoutDETR: Detection Transformer Is a Good Multimodal Layout Designer
Ning Yu
Chia-Chih Chen
Zeyuan Chen
Rui Meng
Ganglu Wu
P. Josel
Juan Carlos Niebles
Caiming Xiong
Ran Xu
ViT
DiffM
94
8
0
19 Dec 2022
Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning
Huimin Wu
Chenyang Lei
Xiao Sun
Pengju Wang
Qifeng Chen
Kwang-Ting Cheng
Stephen Lin
Zhirong Wu
MQ
86
6
0
19 Dec 2022
Query-as-context Pre-training for Dense Passage Retrieval
Xing Wu
Guangyuan Ma
Wanhui Qian
Zijia Lin
Songlin Hu
93
9
0
19 Dec 2022
Universal Object Detection with Large Vision Model
Feng-Huei Lin
Wenze Hu
Yaowei Wang
Yonghong Tian
Guangming Lu
Fanglin Chen
Yong-mei Xu
Xiaoyu Wang
VLM
ObjD
100
8
0
19 Dec 2022
Boosting Automatic COVID-19 Detection Performance with Self-Supervised Learning and Batch Knowledge Ensembling
Guang Li
Ren Togo
Takahiro Ogawa
Miki Haseyama
SSL
64
8
0
19 Dec 2022
ColoristaNet for Photorealistic Video Style Transfer
Xiaowen Qiu
Ruize Xu
Boan He
Yingtao Zhang
Wenqiang Zhang
Weifeng Ge
59
0
0
19 Dec 2022
Fine-Tuning Is All You Need to Mitigate Backdoor Attacks
Zeyang Sha
Xinlei He
Pascal Berrang
Mathias Humbert
Yang Zhang
AAML
90
38
0
18 Dec 2022
BEATs: Audio Pre-Training with Acoustic Tokenizers
Sanyuan Chen
Yu-Huan Wu
Chengyi Wang
Shujie Liu
Daniel C. Tompkins
Zhuo Chen
Furu Wei
124
299
0
18 Dec 2022
Attentive Mask CLIP
Yifan Yang
Weiquan Huang
Yixuan Wei
Houwen Peng
Xinyang Jiang
...
Fangyun Wei
Yin Wang
Han Hu
Lili Qiu
Yuqing Yang
CLIP
VLM
83
27
0
16 Dec 2022
Autoencoders as Cross-Modal Teachers: Can Pretrained 2D Image Transformers Help 3D Representation Learning?
Runpei Dong
Zekun Qi
Linfeng Zhang
Junbo Zhang
Jian-Yuan Sun
Zheng Ge
Li Yi
Kaisheng Ma
ViT
3DPC
115
92
0
16 Dec 2022
DQnet: Cross-Model Detail Querying for Camouflaged Object Detection
Wei Sun
Chengao Liu
Linyan Zhang
Yu Li
Pengxu Wei
Chang-rui Liu
J. Zou
Jianbin Jiao
QiXiang Ye
84
6
0
16 Dec 2022
Improving self-supervised representation learning via sequential adversarial masking
Dylan Sam
Min Bai
Tristan McKinney
Li Erran Li
SSL
88
0
0
16 Dec 2022
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
81
54
0
15 Dec 2022
CLIPPO: Image-and-Language Understanding from Pixels Only
Michael Tschannen
Basil Mustafa
N. Houlsby
CLIP
VLM
102
49
0
15 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim Alabdulmohsin
Filip Pavetić
VLM
153
94
0
15 Dec 2022
Unsupervised Object Localization: Observing the Background to Discover Objects
Oriane Siméoni
Chloé Sekkat
Gilles Puy
Antonín Vobecký
Éloi Zablocki
Patrick Pérez
OCL
SSL
81
59
0
15 Dec 2022
Sim-to-Real Transfer for Quadrupedal Locomotion via Terrain Transformer
Hang Lai
Weinan Zhang
Xialin He
Chen Yu
Zheng Tian
Yong Yu
Jun Wang
114
21
0
15 Dec 2022
Proposal Distribution Calibration for Few-Shot Object Detection
Bohao Li
Chang-rui Liu
Mengnan Shi
Xiaozhong Chen
Xiang Ji
QiXiang Ye
ObjD
84
6
0
15 Dec 2022
Efficient Self-supervised Learning with Contextualized Target Representations for Vision, Speech and Language
Alexei Baevski
Arun Babu
Wei-Ning Hsu
Michael Auli
VLM
SSL
129
97
0
14 Dec 2022
RTMDet: An Empirical Study of Designing Real-Time Object Detectors
Chengqi Lyu
Wenwei Zhang
Haian Huang
Yue Zhou
Yudong Wang
Yanyi Liu
Shilong Zhang
Kai-xiang Chen
ObjD
108
408
0
14 Dec 2022
Policy Adaptation from Foundation Model Feedback
Yuying Ge
Annabella Macaluso
Erran L. Li
Ping Luo
Xiaolong Wang
LM&Ro
76
13
0
14 Dec 2022
Image Compression with Product Quantized Masked Image Modeling
Alaaeldin El-Nouby
Matthew Muckley
Karen Ullrich
Ivan Laptev
Jakob Verbeek
Hervé Jégou
MQ
82
31
0
14 Dec 2022
MAELi: Masked Autoencoder for Large-Scale LiDAR Point Clouds
Georg Krispel
David Schinagl
Christian Fruhwirth-Reisinger
Horst Possegger
Horst Bischof
3DPC
117
15
0
14 Dec 2022
THMA: Tencent HD Map AI System for Creating HD Map Annotations
Kun Tang
Xu Cao
Zhipeng Cao
Tongxi Zhou
Erlong Li
...
Shengtao Zou
Chang-ling Liu
Shuqi Mei
Elena Sizikova
Chao Zheng
62
13
0
14 Dec 2022
NLIP: Noise-robust Language-Image Pre-training
Runhu Huang
Yanxin Long
Jianhua Han
Hang Xu
Xiwen Liang
Chunjing Xu
Xiaodan Liang
VLM
109
30
0
14 Dec 2022
LidarCLIP or: How I Learned to Talk to Point Clouds
Georg Hess
Adam Tonderski
Christoffer Petersson
Kalle Åström
Lennart Svensson
DiffM
87
23
0
13 Dec 2022
GPViT: A High Resolution Non-Hierarchical Vision Transformer with Group Propagation
Chenhongyi Yang
Jiarui Xu
Shalini De Mello
Elliot J. Crowley
Xinyu Wang
ViT
109
22
0
13 Dec 2022
Learning 3D Representations from 2D Pre-trained Models via Image-to-Point Masked Autoencoders
Renrui Zhang
Liuhui Wang
Yu Qiao
Peng Gao
Hongsheng Li
3DPC
92
137
0
13 Dec 2022
What do Vision Transformers Learn? A Visual Exploration
Amin Ghiasi
Hamid Kazemi
Eitan Borgnia
Steven Reich
Manli Shu
Micah Goldblum
A. Wilson
Tom Goldstein
ViT
91
64
0
13 Dec 2022
OAMixer: Object-aware Mixing Layer for Vision Transformers
H. Kang
Sangwoo Mo
Jinwoo Shin
VLM
119
4
0
13 Dec 2022
FastMIM: Expediting Masked Image Modeling Pre-training for Vision
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Yunhe Wang
Chang Xu
80
10
0
13 Dec 2022
Semantics-Consistent Feature Search for Self-Supervised Visual Representation Learning
Kaiyou Song
Shanyi Zhang
Zihao An
Zimeng Luo
Tongzhou Wang
Jin Xie
SSL
87
7
0
13 Dec 2022
TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities
Zhe Zhao
Yudong Li
Cheng-An Hou
Jing-xin Zhao
Rong Tian
...
Xingwu Sun
Zhanhui Kang
Xiaoyong Du
Linlin Shen
Kimmo Yan
VLM
106
24
0
13 Dec 2022