2111.06377
Cited By
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,779 papers shown
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
88
13
0
01 Apr 2024
Bridging Remote Sensors with Multisensor Geospatial Foundation Models
Boran Han
Shuai Zhang
Xingjian Shi
Markus Reichstein
90
27
0
01 Apr 2024
Adaptive Query Prompting for Multi-Domain Landmark Detection
Qiusen Wei
Guoheng Huang
Xiaochen Yuan
Xuhang Chen
Guo Zhong
Jianwen Huang
Jiajie Huang
MedIm
67
2
0
01 Apr 2024
SyncMask: Synchronized Attentional Masking for Fashion-centric Vision-Language Pretraining
Chull Hwan Song
Taebaek Hwang
Jooyoung Yoon
Shunghyun Choi
Yeong Hyeon Gu
50
5
0
01 Apr 2024
A Survey on Hypergraph Neural Networks: An In-Depth and Step-By-Step Guide
Sunwoo Kim
Soo Yong Lee
Yue Gao
Alessia Antelmi
Mirko Polato
Kijung Shin
GNN
AI4TS
90
23
0
01 Apr 2024
Diffusion-Driven Domain Adaptation for Generating 3D Molecules
Haokai Hong
Wanyu Lin
Kay Chen Tan
DiffM
80
2
0
01 Apr 2024
Lipsum-FT: Robust Fine-Tuning of Zero-Shot Models Using Random Text Guidance
G. Nam
Byeongho Heo
Juho Lee
VLM
73
7
0
01 Apr 2024
Rethinking Interactive Image Segmentation with Low Latency, High Quality, and Diverse Prompts
Qin Liu
Jaemin Cho
Mohit Bansal
Marc Niethammer
VLM
103
12
0
31 Mar 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo
Zhineng Chen
Peng Zhou
Zuxuan Wu
Xieping Gao
Yu-Gang Jiang
SSL
83
4
0
31 Mar 2024
HypeBoy: Generative Self-Supervised Representation Learning on Hypergraphs
Sunwoo Kim
Shinhwan Kang
Fanchen Bu
Soo Yong Lee
Jaemin Yoo
Kijung Shin
SSL
80
11
0
31 Mar 2024
A Review of Modern Recommender Systems Using Generative Models (Gen-RecSys)
Yashar Deldjoo
Zhankui He
Julian McAuley
Anton Korikov
Scott Sanner
Arnau Ramisa
René Vidal
M. Sathiamoorthy
Atoosa Kasirzadeh
Silvia Milano
VLM
152
61
0
31 Mar 2024
Transformer based Pluralistic Image Completion with Reduced Information Loss
Qiankun Liu
Yuqi Jiang
Zhentao Tan
DongDong Chen
Ying Fu
Qi Chu
Gang Hua
Nenghai Yu
ViT
114
12
0
31 Mar 2024
DailyMAE: Towards Pretraining Masked Autoencoders in One Day
Jiantao Wu
Shentong Mo
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
84
3
0
31 Mar 2024
Bayesian Exploration of Pre-trained Models for Low-shot Image Classification
Yibo Miao
Yu Lei
Feng Zhou
Zhijie Deng
VLM
UQCV
BDL
104
3
0
30 Mar 2024
Long-Tailed Recognition on Binary Networks by Calibrating A Pre-trained Model
Jihun Kim
Dahyun Kim
Hyungrok Jung
Taeil Oh
Jonghyun Choi
MQ
119
0
0
30 Mar 2024
Image-to-Image Matching via Foundation Models: A New Perspective for Open-Vocabulary Semantic Segmentation
Yuan Wang
Rui Sun
Naisong Luo
Yuwen Pan
Tianzhu Zhang
VLM
81
10
0
30 Mar 2024
Learned Scanpaths Aid Blind Panoramic Video Quality Assessment
Kanglong Fan
Wen Wen
Mu Li
Yifan Peng
Kede Ma
66
2
0
30 Mar 2024
InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning
Yan-Shuo Liang
Wu-Jun Li
CLL
127
53
0
30 Mar 2024
Robust Ensemble Person Re-Identification via Orthogonal Fusion with Occlusion Handling
Syeda Nyma Ferdous
Xin Li
99
0
0
29 Mar 2024
Revolutionizing Disease Diagnosis with simultaneous functional PET/MR and Deeply Integrated Brain Metabolic, Hemodynamic, and Perfusion Networks
Luoyu Wang
Yitian Tao
Qing Yang
Yan Liang
Siwei Liu
Hongcheng Shi
Dinggang Shen
Han Zhang
MedIm
37
0
0
29 Mar 2024
A Unified Framework for Human-centric Point Cloud Video Understanding
Yiteng Xu
Kecheng Ye
Xiao Han
Yiming Ren
Xinge Zhu
Yuexin Ma
76
2
0
29 Mar 2024
FairCLIP: Harnessing Fairness in Vision-Language Learning
Yan Luo
Minfei Shi
Muhammad Osama Khan
Muhammad Muneeb Afzal
Hao Huang
...
Luo Song
Ava Kouhana
T. Elze
Yi Fang
Mengyu Wang
VLM
93
38
0
29 Mar 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
85
7
0
28 Mar 2024
MVEB: Self-Supervised Learning with Multi-View Entropy Bottleneck
Liangjiang Wen
Xiasi Wang
Jianzhuang Liu
Zenglin Xu
54
3
0
28 Mar 2024
Low-Rank Rescaled Vision Transformer Fine-Tuning: A Residual Design Approach
Wei Dong
Xing Zhang
Bihui Chen
Dawei Yan
Zhijun Lin
Qingsen Yan
Peng Wang
Yang Yang
77
7
0
28 Mar 2024
ImageNet-D: Benchmarking Neural Network Robustness on Diffusion Synthetic Object
Chenshuang Zhang
Fei Pan
Junmo Kim
In So Kweon
Chengzhi Mao
85
11
1
27 Mar 2024
ParCo: Part-Coordinating Text-to-Motion Synthesis
Qiran Zou
Shangyuan Yuan
Shian Du
Yu Wang
Chang-Shu Liu
Yi Tian Xu
Jie Chen
Xiangyang Ji
75
20
0
27 Mar 2024
ViTAR: Vision Transformer with Any Resolution
Qihang Fan
Quanzeng You
Xiaotian Han
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
ViT
90
16
0
27 Mar 2024
Selective Mixup Fine-Tuning for Optimizing Non-Decomposable Objectives
Shrinivas Ramasubramanian
Harsh Rangwani
S. Takemori
Kunal Samanta
Yuhei Umeda
Venkatesh Babu Radhakrishnan
75
0
0
27 Mar 2024
NeuroPictor: Refining fMRI-to-Image Reconstruction via Multi-individual Pretraining and Multi-level Modulation
Jingyang Huo
Yikai Wang
Xuelin Qian
Yun Wang
Chong Li
Jianfeng Feng
Yanwei Fu
DiffM
MedIm
77
10
0
27 Mar 2024
DODA: Adapting Object Detectors to Dynamic Agricultural Environments in Real-Time with Diffusion
Shuai Xiang
Pieter M. Blok
James Burridge
Haozhou Wang
Wei Guo
105
0
0
27 Mar 2024
Neural Embedding Compression For Efficient Multi-Task Earth Observation Modelling
Carlos Gomes
Thomas Brunschwiler
101
0
0
26 Mar 2024
Efficient Image Pre-Training with Siamese Cropped Masked Autoencoders
Alexandre Eymaël
Renaud Vandeghen
A. Cioppa
Silvio Giancola
Guohao Li
Marc Van Droogenbroeck
ViT
77
8
0
26 Mar 2024
Masked Autoencoders are PDE Learners
Anthony Zhou
A. Farimani
AI4CE
119
7
0
26 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
147
2
0
26 Mar 2024
DiffFAE: Advancing High-fidelity One-shot Facial Appearance Editing with Space-sensitive Customization and Semantic Preservation
Qilin Wang
Jiangning Zhang
Chengming Xu
Weijian Cao
Ying Tai
Yue Han
Yanhao Ge
Hong Gu
Chengjie Wang
Yanwei Fu
DiffM
69
0
0
26 Mar 2024
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang
ZiYun Wang
Lingjie Liu
Kostas Daniilidis
90
32
0
26 Mar 2024
Exploring Dynamic Transformer for Efficient Object Tracking
Jiawen Zhu
Xin Chen
Haiwen Diao
Shuai Li
Jun-Yan He
Chenyang Li
Bin Luo
Dong Wang
Huchuan Lu
146
3
0
26 Mar 2024
SD-DiT: Unleashing the Power of Self-supervised Discrimination in Diffusion Transformer
Rui Zhu
Yingwei Pan
Yehao Li
Ting Yao
Zhenglong Sun
Tao Mei
C. Chen
120
26
0
25 Mar 2024
UrbanVLP: Multi-Granularity Vision-Language Pretraining for Urban Socioeconomic Indicator Prediction
Xixuan Hao
Wei Chen
Yibo Yan
Siru Zhong
Kun Wang
Qingsong Wen
Yuxuan Liang
VLM
118
1
0
25 Mar 2024
SatSynth: Augmenting Image-Mask Pairs through Diffusion Models for Aerial Semantic Segmentation
Aysim Toker
Marvin Eisenberger
Zorah Lähner
Laura Leal-Taixé
DiffM
101
26
0
25 Mar 2024
QKFormer: Hierarchical Spiking Transformer using Q-K Attention
Chenlin Zhou
Han Zhang
Zhaokun Zhou
Liutao Yu
Liwei Huang
Xiaopeng Fan
Liuliang Yuan
Zhengyu Ma
Huihui Zhou
Yonghong Tian
110
18
0
25 Mar 2024
CMViM: Contrastive Masked Vim Autoencoder for 3D Multi-modal Representation Learning for AD classification
Guangqian Yang
Kangrui Du
Zhihan Yang
Ye Du
Yongping Zheng
Shujun Wang
90
19
0
25 Mar 2024
PathoTune: Adapting Visual Foundation Model to Pathological Specialists
Jiaxuan Lu
Fang Yan
Xiaofan Zhang
Yue Gao
Shaoting Zhang
VLM
LM&MA
MedIm
82
7
0
25 Mar 2024
LSTTN: A Long-Short Term Transformer-based Spatio-temporal Neural Network for Traffic Flow Forecasting
Qinyao Luo
Silu He
Xing Han
Yuhan Wang
Haifeng Li
AI4TS
98
49
0
25 Mar 2024
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan
Takehiko Ohkawa
Linlin Yang
Nie Lin
Zhishan Zhou
...
Kun He
Yoichi Sato
Otmar Hilliges
Hyung Jin Chang
Angela Yao
125
16
0
25 Mar 2024
L-MAE: Longitudinal masked auto-encoder with time and severity-aware encoding for diabetic retinopathy progression prediction
Rachid Zeghlache
Pierre-Henri Conze
Mostafa EL HABIB DAHO
Yi-Hsuan Li
Alireza Rezaei
...
Pascale Massin
B. Cochener
Ikram Brahim
G. Quellec
M. Lamard
67
0
0
24 Mar 2024
Out-of-Distribution Detection via Deep Multi-Comprehension Ensemble
Chenhui Xu
Fuxun Yu
Zirui Xu
Nathan Inkawhich
Xiang Chen
OODD
86
6
0
24 Mar 2024
Adversarially Masked Video Consistency for Unsupervised Domain Adaptation
Xiaoyu Zhu
Junwei Liang
Po-Yao Huang
Alex Hauptmann
105
1
0
24 Mar 2024
Segment Anything Model for Road Network Graph Extraction
Congrui Hetang
Haoru Xue
Cindy X. Le
Tianwei Yue
Wenping Wang
Yihui He
141
17
0
24 Mar 2024