2111.06377
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,779 papers shown
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
102
175
0
28 Dec 2023
Fully Sparse 3D Occupancy Prediction
Haisong Liu
Yang Chen
Haiguang Wang
Zetong Yang
Tianyu Li
Jia Zeng
Li Chen
Hongyang Li
Limin Wang
128
19
0
28 Dec 2023
FILP-3D: Enhancing 3D Few-shot Class-incremental Learning with Pre-trained Vision-Language Models
Wan Xu
Tianyu Huang
Tianyu Qu
Guanglei Yang
Yiwen Guo
Wangmeng Zuo
75
0
0
28 Dec 2023
Forgery-aware Adaptive Transformer for Generalizable Synthetic Image Detection
Huan Liu
Zichang Tan
Chuangchuang Tan
Yunchao Wei
Yao-Min Zhao
Jingdong Wang
ViT
102
55
0
27 Dec 2023
Learning to Embed Time Series Patches Independently
Seunghan Lee
Taeyoung Park
Kibok Lee
SSL
AI4TS
95
32
0
27 Dec 2023
Unraveling the Key Components of OOD Generalization via Diversification
Harold Benoit
Liangze Jiang
Andrei Atanov
Oğuzhan Fatih Kar
Mattia Rigotti
Amir Zamir
CML
102
2
0
26 Dec 2023
BAL: Balancing Diversity and Novelty for Active Learning
Jingyao Li
Pengguang Chen
Shaozuo Yu
Shu Liu
Jiaya Jia
31
7
0
26 Dec 2023
TimesURL: Self-supervised Contrastive Learning for Universal Time Series Representation Learning
Jiexi Liu
Songcan Chen
AI4TS
77
40
0
25 Dec 2023
Partial Fine-Tuning: A Successor to Full Fine-Tuning for Vision Transformers
Peng Ye
Yongqi Huang
Chongjun Tu
Minglei Li
Tao Chen
Tong He
Wanli Ouyang
93
5
0
25 Dec 2023
APTv2: Benchmarking Animal Pose Estimation and Tracking with a Large-scale Dataset and Beyond
Yuxiang Yang
Yingqi Deng
Yufei Xu
Jing Zhang
74
4
0
25 Dec 2023
Segment Any Events via Weighted Adaptation of Pivotal Tokens
Zhiwen Chen
Zhiyu Zhu
Yifan Zhang
Junhui Hou
Guangming Shi
Jinjian Wu
80
7
0
24 Dec 2023
TVE: Learning Meta-attribution for Transferable Vision Explainer
Guanchu Wang
Yu-Neng Chuang
Fan Yang
Mengnan Du
Chia-Yuan Chang
...
Zirui Liu
Zhaozhuo Xu
Kaixiong Zhou
Xuanting Cai
Helen Zhou
117
1
0
23 Dec 2023
SAIC: Integration of Speech Anonymization and Identity Classification
Ming Cheng
Xingjian Diao
Shitong Cheng
Wenjun Liu
103
6
0
23 Dec 2023
DRStageNet: Deep Learning for Diabetic Retinopathy Staging from Fundus Images
Yevgeniy Men
Jonathan Fhima
Leo Anthony Celi
L. Z. Ribeiro
Luis Filipe Nakayama
Joachim A. Behar
81
5
0
22 Dec 2023
BrainVis: Exploring the Bridge between Brain and Visual Signals via Image Reconstruction
Honghao Fu
Zhiqi Shen
Jing Jih Chin
Hao Wang
DiffM
107
7
0
22 Dec 2023
ADA-GAD: Anomaly-Denoised Autoencoders for Graph Anomaly Detection
Junwei He
Qianqian Xu
Yangbangyan Jiang
Zitai Wang
Qingming Huang
48
29
0
22 Dec 2023
Revisiting Few-Shot Object Detection with Vision-Language Models
Anish Madan
Neehar Peri
Shu Kong
Deva Ramanan
VLM
103
11
0
22 Dec 2023
Unveiling Backbone Effects in CLIP: Exploring Representational Synergies and Variances
Cristian Rodriguez-Opazo
Edison Marrese-Taylor
Ehsan Abbasnejad
Hamed Damirchi
Ignacio M. Jara
Felipe Bravo-Marquez
Anton Van Den Hengel
VLM
68
1
0
22 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
311
1,217
0
21 Dec 2023
DUSt3R: Geometric 3D Vision Made Easy
Shuzhe Wang
Vincent Leroy
Yohann Cabon
Boris Chidlovskii
Jérôme Revaud
3DGS
119
406
0
21 Dec 2023
Bootstrap Masked Visual Modeling via Hard Patches Mining
Haochen Wang
Junsong Fan
Yuxi Wang
Kaiyou Song
Tiancai Wang
Xiangyu Zhang
Zhaoxiang Zhang
83
5
0
21 Dec 2023
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
151
113
0
20 Dec 2023
No More Shortcuts: Realizing the Potential of Temporal Self-Supervision
I. Dave
Simon Jenni
Mubarak Shah
68
10
0
20 Dec 2023
TADAP: Trajectory-Aided Drivable area Auto-labeling with Pre-trained self-supervised features in winter driving conditions
Eerik Alamikkotervo
Risto Ojala
Alvari Seppänen
Kari Tammi
50
0
0
20 Dec 2023
ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training
Rongsheng Wang
Qingsong Yao
Zihang Jiang
Zhiyang He
Xiaodong Tao
Zihang Jiang
S. Kevin Zhou
MedIm
VLM
116
0
0
20 Dec 2023
Expediting Contrastive Language-Image Pretraining via Self-distilled Encoders
Bumsoo Kim
Jinhyung Kim
Yeonsik Jo
S. Kim
VLM
103
4
0
19 Dec 2023
RealGen: Retrieval Augmented Generation for Controllable Traffic Scenarios
Wenhao Ding
Yulong Cao
Ding Zhao
Chaowei Xiao
Marco Pavone
84
28
0
19 Dec 2023
Unsupervised Segmentation of Colonoscopy Images
Heming Yao
Jérôme Lüscher
Benjamín Gutiérrez-Becker
Josep Arús-Pous
Tommaso Biancalani
A. Bigorgne
David Richmond
MedIm
91
0
0
19 Dec 2023
Continual-MAE: Adaptive Distribution Masked Autoencoders for Continual Test-Time Adaptation
Jiaming Liu
Ran Xu
Senqiao Yang
Renrui Zhang
Qizhe Zhang
Zehui Chen
Yandong Guo
Shanghang Zhang
TTA
77
12
0
19 Dec 2023
Mask Grounding for Referring Image Segmentation
Yong Xien Chng
Henry Zheng
Yizeng Han
Xuchong Qiu
Gao Huang
ISeg
ObjD
143
21
0
19 Dec 2023
M-BEV: Masked BEV Perception for Robust Autonomous Driving
Siran Chen
Yue Ma
Yu Qiao
Yali Wang
127
11
0
19 Dec 2023
DMT: Comprehensive Distillation with Multiple Self-supervised Teachers
Yuang Liu
Jing Wang
Qiang-feng Zhou
Fan Wang
Jun Wang
Wei Zhang
48
0
0
19 Dec 2023
Big Learning Expectation Maximization
Yulai Cong
Sijia Li
70
2
0
19 Dec 2023
Regulating Intermediate 3D Features for Vision-Centric Autonomous Driving
Junkai Xu
Liang Peng
Haoran Cheng
Linxuan Xia
Qi Zhou
Dan Deng
Wei Qian
Wenxiao Wang
Deng Cai
127
9
0
19 Dec 2023
Appearance-based Refinement for Object-Centric Motion Segmentation
Junyu Xie
Weidi Xie
Andrew Zisserman
VOS
101
3
0
18 Dec 2023
Layerwise complexity-matched learning yields an improved model of cortical area V2
Nikhil Parthasarathy
Olivier J. Hénaff
Eero P. Simoncelli
86
1
0
18 Dec 2023
Unleashing the Power of CNN and Transformer for Balanced RGB-Event Video Recognition
Tianlin Li
Yao Rong
Shiao Wang
Yuan Chen
Zhe Wu
Bowei Jiang
Yonghong Tian
Jin Tang
ViT
158
3
0
18 Dec 2023
ADF & TransApp: A Transformer-Based Framework for Appliance Detection Using Smart Meter Consumption Series
Adrien Petralia
Philippe Charpentier
Themis Palpanas
AI4TS
117
4
0
17 Dec 2023
SHaRPose: Sparse High-Resolution Representation for Human Pose Estimation
Xiaoqi An
Lin Zhao
Chen Gong
Nannan Wang
Di Wang
Jian Yang
3DH
ViT
64
11
0
17 Dec 2023
Towards Compact 3D Representations via Point Feature Enhancement Masked Autoencoders
Yaohua Zha
Huizhen Ji
Jinmin Li
Rongsheng Li
Tao Dai
Bin Chen
Zhi Wang
Shu-Tao Xia
3DPC
109
32
0
17 Dec 2023
How to Efficiently Annotate Images for Best-Performing Deep Learning Based Segmentation Models: An Empirical Study with Weak and Noisy Annotations and Segment Anything Model
Yixin Zhang
Shen Zhao
Han Gu
Maciej A. Mazurowski
VLM
140
4
0
17 Dec 2023
SAME: Sample Reconstruction against Model Extraction Attacks
Yi Xie
Jie Zhang
Shiqian Zhao
Tianwei Zhang
Xiaofeng Chen
AAML
MIACV
105
4
0
17 Dec 2023
Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning
Kaiyou Song
Shan Zhang
Tong Wang
VLM
86
2
0
16 Dec 2023
T-MAE: Temporal Masked Autoencoders for Point Cloud Representation Learning
Weijie Wei
Fatemeh Karimi Nejadasl
Theo Gevers
Martin R. Oswald
3DPC
87
3
0
15 Dec 2023
Test-Time Domain Adaptation by Learning Domain-Aware Batch Normalization
Yanan Wu
Zhixiang Chi
Yang Wang
Konstantinos N. Plataniotis
Songhe Feng
OOD
93
20
0
15 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
Gabriel Loaiza-Ganem
M. Volkovs
127
3
0
15 Dec 2023
Structural Information Guided Multimodal Pre-training for Vehicle-centric Perception
Tianlin Li
Wentao Wu
Chenglong Li
Zhicheng Zhao
Zhe Chen
Yukai Shi
Jin Tang
92
4
0
15 Dec 2023
3DAxiesPrompts: Unleashing the 3D Spatial Task Capabilities of GPT-4V
Dingning Liu
Xiaomeng Dong
Renrui Zhang
Xu Luo
Peng Gao
Xiaoshui Huang
Yongshun Gong
Zhihui Wang
90
11
0
15 Dec 2023
SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery
Xin Guo
Jiangwei Lao
Bo Dang
Yingying Zhang
Lei Yu
...
Jian Wang
Jingdong Chen
Ming Yang
Yongjun Zhang
Yansheng Li
157
129
0
15 Dec 2023
SeiT++: Masked Token Modeling Improves Storage-efficient Training
Min-Seob Lee
Song Park
Byeongho Heo
Dongyoon Han
Hyunjung Shim
MQ
VLM
76
1
0
15 Dec 2023