2111.06377
Cited By
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,778 papers shown
Title
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Kang You
Zekai Xu
Chen Nie
Zhijie Deng
Qinghai Guo
Xiang Wang
Zhezhi He
104
11
0
05 Jun 2024
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
Qiang Chen
Xiangbo Su
Xinyu Zhang
Jian Wang
Jiahui Chen
...
Shan Zhang
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
ViT
118
21
0
05 Jun 2024
FILS: Self-Supervised Video Feature Prediction In Semantic Language Space
Mona Ahmadian
Frank Guerin
Andrew Gilbert
119
1
0
05 Jun 2024
FusionBench: A Comprehensive Benchmark of Deep Model Fusion
Anke Tang
Li Shen
Yong Luo
Han Hu
Di Lin
Dacheng Tao
ELM
MoMe
VLM
82
27
0
05 Jun 2024
Genuine-Focused Learning using Mask AutoEncoder for Generalized Fake Audio Detection
Xiaopeng Wang
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
Yuankun Xie
...
Xuefei Liu
Yongwei Li
Xin Qi
Yi Lu
Shuchen Shi
65
6
0
05 Jun 2024
Self-Supervised Skeleton-Based Action Representation Learning: A Benchmark and Beyond
Jiahang Zhang
Lilang Lin
Shuai Yang
Jiaying Liu
SSL
101
2
0
05 Jun 2024
GraphAlign: Pretraining One Graph Neural Network on Multiple Graphs via Feature Alignment
Zhenyu Hou
Haozhan Li
Yukuo Cen
Jie Tang
Yuxiao Dong
95
8
0
05 Jun 2024
AVFF: Audio-Visual Feature Fusion for Video Deepfake Detection
Trevine Oorloff
Surya Koppisetti
Nicolo Bonettini
Divyaraj Solanki
Ben Colman
Yaser Yacoob
Ali Shahriyari
Gaurav Bharaj
99
26
0
05 Jun 2024
ZeroPur: Succinct Training-Free Adversarial Purification
Xiuli Bi
Zonglin Yang
Bo Liu
Xiaodong Cun
Chi-Man Pun
133
0
0
05 Jun 2024
Multi-layer Learnable Attention Mask for Multimodal Tasks
Wayner Barrios
SouYoung Jin
73
1
0
04 Jun 2024
Enhancing 2D Representation Learning with a 3D Prior
Mehmet Aygun
Prithviraj Dhar
Zhicheng Yan
Oisin Mac Aodha
Rakesh Ranjan
SSL
99
1
0
04 Jun 2024
An Empirical Study into Clustering of Unseen Datasets with Self-Supervised Encoders
Scott C. Lowe
Joakim Bruslund Haurum
Sageev Oore
T. Moeslund
Graham W. Taylor
SSL
122
4
0
04 Jun 2024
Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations
Sarthak Yadav
Zheng-Hua Tan
Mamba
81
17
0
04 Jun 2024
M2D-CLAP: Masked Modeling Duo Meets CLAP for Learning General-purpose Audio-Language Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Masahiro Yasuda
Shunsuke Tsubaki
Keisuke Imoto
VLM
95
7
0
04 Jun 2024
CODE: Contrasting Self-generated Description to Combat Hallucination in Large Multi-modal Models
Junho Kim
Hyunjun Kim
Yeonju Kim
Yong Man Ro
MLLM
117
16
0
04 Jun 2024
Frequency Enhanced Pre-training for Cross-city Few-shot Traffic Forecasting
Zhanyu Liu
Jianrong Ding
Guanjie Zheng
AI4TS
74
3
0
03 Jun 2024
DDA: Dimensionality Driven Augmentation Search for Contrastive Learning in Laparoscopic Surgery
Yuning Zhou
H. Badgery
Matthew Read
James Bailey
Catherine E. Davey
63
2
0
03 Jun 2024
Task-oriented Embedding Counts: Heuristic Clustering-driven Feature Fine-tuning for Whole Slide Image Classification
Xuenian Wang
Shanshan Shi
Renao Yan
Qiehe Sun
Lianghui Zhu
Tian Guan
Yonghong He
78
2
0
02 Jun 2024
Learning Discrete Concepts in Latent Hierarchical Models
Lingjing Kong
Guan-Hong Chen
Erdun Gao
Eric P. Xing
Yuejie Chi
Kun Zhang
132
5
0
01 Jun 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
134
26
0
01 Jun 2024
Whole Heart 3D+T Representation Learning Through Sparse 2D Cardiac MR Images
Yundi Zhang
Chen Chen
Suprosanna Shit
Sophie Starck
Daniel Rueckert
Jiazhen Pan
117
2
0
01 Jun 2024
Cross-Table Pretraining towards a Universal Function Space for Heterogeneous Tabular Data
Jintai Chen
Zhen Lin
Qiyuan Chen
Jimeng Sun
LMTD
94
1
0
01 Jun 2024
Extreme Point Supervised Instance Segmentation
Hyeonjun Lee
S. Hwang
Suha Kwak
61
2
0
31 May 2024
MASA: Motion-aware Masked Autoencoder with Semantic Alignment for Sign Language Recognition
Weichao Zhao
Hezhen Hu
Wen-gang Zhou
Yunyao Mao
Min Wang
Houqiang Li
SLR
83
10
0
31 May 2024
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
Krishnakant Singh
Thanush Navaratnam
Jannik Holmer
Simone Schaub-Meyer
Stefan Roth
DiffM
99
21
0
30 May 2024
Multi-Label Guided Soft Contrastive Learning for Efficient Earth Observation Pretraining
Yi Wang
C. Albrecht
Xiao Xiang Zhu
91
10
0
30 May 2024
Knockout: A simple way to handle missing inputs
Minh Nguyen
Batuhan K. Karaman
Heejong Kim
Alan Q. Wang
Fengbei Liu
M. Sabuncu
OOD
UQCV
73
2
0
30 May 2024
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi-An Ma
Yaodong Yu
Cihang Xie
ViT
114
9
0
30 May 2024
Robust Image Semantic Coding with Learnable CSI Fusion Masking over MIMO Fading Channels
Bingyan Xie
Yongpeng Wu
Yuxuan Shi
Wenjun Zhang
Shuguang Cui
Merouane Debbah
82
3
0
30 May 2024
DeMamba: AI-Generated Video Detection on Million-Scale GenVideo Benchmark
Haoxing Chen
Yan Hong
Zizheng Huang
Zhuoer Xu
Zhangxuan Gu
...
Jun Lan
Huijia Zhu
Jianfu Zhang
Weiqiang Wang
Huaxiong Li
Mamba
164
21
0
30 May 2024
EgoSurgery-Phase: A Dataset of Surgical Phase Recognition from Egocentric Open Surgery Videos
Ryoske Fujii
Masashi Hatano
Hideo Saito
Hiroki Kajita
92
9
0
30 May 2024
SAM-E: Leveraging Visual Foundation Model with Sequence Imitation for Embodied Manipulation
Junjie Zhang
Chenjia Bai
Haoran He
Wenke Xia
Zhigang Wang
Bin Zhao
Xiu Li
Xuelong Li
120
13
0
30 May 2024
DP-IQA: Utilizing Diffusion Prior for Blind Image Quality Assessment in the Wild
Honghao Fu
Yufei Wang
Wenhan Yang
Alex C. Kot
Bihan Wen
102
3
0
30 May 2024
Distribution Aligned Semantics Adaption for Lifelong Person Re-Identification
Qizao Wang
Xuelin Qian
Bin Li
Xiangyang Xue
100
1
0
30 May 2024
Enhancing Vision-Language Model with Unmasked Token Alignment
Jihao Liu
Jinliang Zheng
Boxiao Liu
Yu Liu
Hongsheng Li
CLIP
45
0
0
29 May 2024
EntProp: High Entropy Propagation for Improving Accuracy and Robustness
Shohei Enomoto
AAML
109
1
0
29 May 2024
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
Junjie Wang
Guangjing Yang
Wentao Chen
Huahui Yi
Xiaohu Wu
Qicheng Lao
MoE
ALM
82
0
0
29 May 2024
LetsMap: Unsupervised Representation Learning for Semantic BEV Mapping
Nikhil Gosala
Kürsat Petek
B. R. Kiran
S. Yogamani
Paulo L. J. Drews-Jr
Wolfram Burgard
Abhinav Valada
SSL
122
0
0
29 May 2024
Large Brain Model for Learning Generic Representations with Tremendous EEG Data in BCI
Wei-Bang Jiang
Li-Ming Zhao
Bao-Liang Lu
107
92
0
29 May 2024
FocSAM: Delving Deeply into Focused Objects in Segmenting Anything
You Huang
Zongyu Lan
Liujuan Cao
Xianming Lin
Shengchuan Zhang
Guannan Jiang
Rongrong Ji
VLM
53
2
0
29 May 2024
MEGA: Masked Generative Autoencoder for Human Mesh Recovery
Guénolé Fiche
Simon Leglaive
Xavier Alameda-Pineda
Francesc Moreno-Noguer
3DH
139
1
0
29 May 2024
WIDIn: Wording Image for Domain-Invariant Representation in Single-Source Domain Generalization
Jiawei Ma
Yulei Niu
Shiyuan Huang
G. Han
Shih-Fu Chang
VLM
86
1
0
28 May 2024
SCE-MAE: Selective Correspondence Enhancement with Masked Autoencoder for Self-Supervised Landmark Estimation
Kejia Yin
Varshanth R. Rao
R. Jiang
Xudong Liu
P. Aarabi
David B. Lindell
104
0
0
28 May 2024
Self-Supervised Learning Based Handwriting Verification
Mihir Chauhan
Mohammad Abuzar Shaikh
Abhishek Satbhai
Mir Basheer Ali
B. Ramamurthy
Mingchen Gao
Siwei Lyu
Sargur Srihari
83
2
0
28 May 2024
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu
Fei Zhu
Shijie Ma
Cheng-Lin Liu
79
5
0
28 May 2024
In-Context Symmetries: Self-Supervised Learning through Contextual World Models
Sharut Gupta
Chenyu Wang
Yifei Wang
Tommi Jaakkola
Stefanie Jegelka
74
3
0
28 May 2024
Visualizing the loss landscape of Self-supervised Vision Transformer
Youngwan Lee
Jeffrey Willette
Jonghee Kim
Sung Ju Hwang
ViT
64
1
0
28 May 2024
DMT-JEPA: Discriminative Masked Targets for Joint-Embedding Predictive Architecture
Shentong Mo
Sukmin Yun
91
3
0
28 May 2024
Self-supervised Pre-training for Transferable Multi-modal Perception
Xiaohao Xu
Tianyi Zhang
Jinrong Yang
Matthew Johnson-Roberson
Xiaonan Huang
36
0
0
28 May 2024
PTM-VQA: Efficient Video Quality Assessment Leveraging Diverse PreTrained Models from the Wild
Kun Yuan
Hongbo Liu
Mading Li
Muyi Sun
Ming Sun
Jiachao Gong
Jinhua Hao
Chao Zhou
Yansong Tang
ViT
91
5
0
28 May 2024