2111.06377
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
Chu-Jie Qin
Rui-Qi Wu
Zikun Liu
Xin Lin
Chun-Le Guo
Hyun Hee Park
Chongyi Li
88
8
0
28 Sep 2024
Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning
Xinrui Wang
Chuanxing Geng
Wenhai Wan
Shao-yuan Li
Songcan Chen
CLL
105
3
0
28 Sep 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
163
7
0
27 Sep 2024
Localizing Memorization in SSL Vision Encoders
Wenhao Wang
Adam Dziedzic
Michael Backes
Franziska Boenisch
67
2
0
27 Sep 2024
ProMerge: Prompt and Merge for Unsupervised Instance Segmentation
Dylan Li
Gyungin Shin
78
3
0
27 Sep 2024
UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
Chuang Chen
Xingwu Sun
Zhi Liu
91
1
0
27 Sep 2024
Learning from Pattern Completion: Self-supervised Controllable Generation
Zhiqiang Chen
Guofan Fan
Jinying Gao
Lei Ma
Bo Lei
Tiejun Huang
Shan Yu
52
0
0
27 Sep 2024
Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting
Brandon Victor
Mathilde Letard
Peter Naylor
Karim Douch
Nicolas Longépé
Zhen He
Patrick Ebel
AI4CE
53
1
0
27 Sep 2024
Cross-video Identity Correlating for Person Re-identification Pre-training
Jialong Zuo
Ying Nie
Hanyu Zhou
Huaxin Zhang
Haoyu Wang
Tianyu Guo
Nong Sang
Changxin Gao
90
5
0
27 Sep 2024
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?
Jose Sosa
Mohamed Aloulou
Danila Rukhovich
Rim Sleimi
Boonyarit Changaival
Anis Kacem
Djamila Aouada
75
1
0
27 Sep 2024
Token Caching for Diffusion Transformer Acceleration
Jinming Lou
Wenyang Luo
Yufan Liu
Bing Li
Xinmiao Ding
Weiming Hu
Jiajiong Cao
Yuming Li
Chenguang Ma
88
6
0
27 Sep 2024
CycleNet: Enhancing Time Series Forecasting through Modeling Periodic Patterns
Shengsheng Lin
Weiwei Lin
Xinyi Hu
Wentai Wu
Ruichao Mo
Haocheng Zhong
AI4TS
115
30
0
27 Sep 2024
Self-supervised Pretraining for Cardiovascular Magnetic Resonance Cine Segmentation
Rob A. J. de Mooij
Josien P. W. Pluim
Cian M. Scannell
59
0
0
26 Sep 2024
Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition
Xinpeng Yin
Wenming Cao
80
0
0
26 Sep 2024
Efficient Bias Mitigation Without Privileged Information
Mateo Espinosa Zarlenga
Swami Sankaranarayanan
Jerone T. A. Andrews
Z. Shams
M. Jamnik
Alice Xiang
126
3
0
26 Sep 2024
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
Pengfei Cai
Yan Song
Nan Jiang
Qing Gu
Ian Mcloughlin
60
2
0
26 Sep 2024
Triple Point Masking
Jiaming Liu
Linghe Kong
Yue Wu
Maoguo Gong
Hao Li
Qiguang Miao
Wenping Ma
Can Qin
3DPC
85
0
0
26 Sep 2024
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu
Amir Khasahmadi
Mor Katz
P. Jayaraman
Yewen Pu
K. Willis
Bang Liu
3DV
72
9
0
26 Sep 2024
CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains
Gershom Seneviratne
K. Weerakoon
Mohamed Bashir Elnoor
Vignesh Rajgopal
Harshavarthan Varatharajan
Mohamed Khalid M Jaffar
Jason Pusey
Dinesh Manocha
CVBM
71
0
0
25 Sep 2024
PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization
Yao Ni
Shan Zhang
Piotr Koniusz
463
8
0
25 Sep 2024
Face Forgery Detection with Elaborate Backbone
Zonghui Guo
Y. Liu
Jie Zhang
Haiyong Zheng
Shiguang Shan
AAML
CVBM
97
1
0
25 Sep 2024
3DDX: Bone Surface Reconstruction from a Single Standard-Geometry Radiograph via Dual-Face Depth Estimation
Yi Gu
Y. Otake
Keisuke Uemura
Masaki Takao
Mazen Soufi
S. Okada
Nobuhiko Sugano
Hugues Talbot
Yoshinobu Sato
53
2
0
25 Sep 2024
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
Shoma Iwai
Atsuki Osanai
Shunsuke Kitada
S. Omachi
3DV
53
2
0
25 Sep 2024
Stochastic Subsampling With Average Pooling
Bum Jun Kim
Sang Woo Kim
43
0
0
25 Sep 2024
EMIT- Event-Based Masked Auto Encoding for Irregular Time Series
Hrishikesh Patel
Ruihong Qiu
Adam Irwin
Shazia Sadiq
Sen Wang
AI4TS
117
3
0
25 Sep 2024
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Ayush Shrivastava
Andrew Owens
63
5
0
24 Sep 2024
Segmentation Strategies in Deep Learning for Prostate Cancer Diagnosis: A Comparative Study of Mamba, SAM, and YOLO
Ali Badiezadeh
Amin Malekmohammadi
Seyed Mostafa Mirhassani
Parisa Gifani
Majid Vafaeezadeh
Mamba
65
2
0
24 Sep 2024
Predicting Distance matrix with large language models
Jiaxing Yang
24
0
0
24 Sep 2024
Hyperbolic Image-and-Pointcloud Contrastive Learning for 3D Classification
Naiwen Hu
Haozhe Cheng
Yifan Xie
Pengcheng Shi
Jihua Zhu
3DPC
98
0
0
24 Sep 2024
3D-JEPA: A Joint Embedding Predictive Architecture for 3D Self-Supervised Representation Learning
Naiwen Hu
Haozhe Cheng
Yifan Xie
Shiqi Li
Jihua Zhu
AI4TS
3DV
53
0
0
24 Sep 2024
Towards Universal Large-Scale Foundational Model for Natural Gas Demand Forecasting
Xinxing Zhou
Jiaqi Ye
Shubao Zhao
Ming Jin
Zhaoxiang Hou
Chengyi Yang
Zengxiang Li
Yanlong Wen
Xiaojie Yuan
AI4TS
63
1
0
24 Sep 2024
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems
Matthew Kolodner
Mingxuan Ju
Zihao Fan
Tong Zhao
Elham Ghazizadeh
Yan Wu
Neil Shah
Yozen Liu
83
4
0
23 Sep 2024
Mammo-Clustering: A Multi-views Tri-level Information Fusion Context Clustering Framework for Localization and Classification in Mammography
Shilong Yang
Chulong Zhang
Qi Zang
Juan Yu
Liang Zeng
...
Yexuan Xing
Xin Pan
Qi Li
Xiaokun Liang
Yaoqin Xie
101
0
0
23 Sep 2024
BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance
Ling Wang
Chen Wu
Lin Wang
DiffM
66
0
0
21 Sep 2024
ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
Shihua Sun
Kenechukwu Nwodo
Shridatt Sugrim
Angelos Stavrou
Haining Wang
AAML
85
1
0
20 Sep 2024
Prithvi WxC: Foundation Model for Weather and Climate
J. Schmude
Sujit Roy
Will Trojak
Johannes Jakubik
Daniel Salles Civitarese
...
Campbell Watson
M. Maskey
Tsengdar J Lee
Juan Bernabé-Moreno
Rahul Ramachandran
VLM
AI4Cl
102
10
0
20 Sep 2024
Formula-Supervised Visual-Geometric Pre-training
Ryosuke Yamada
Kensho Hara
Hirokatsu Kataoka
Koshi Makihara
Nakamasa Inoue
Rio Yokota
Y. Satoh
57
1
0
20 Sep 2024
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang
Hongtao Xie
Yuxin Wang
Yadong Qu
Fengjun Guo
Pengwei Liu
DiffM
71
0
0
20 Sep 2024
FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
Feng Qiu
Wei Zhang
Chen Liu
Rudong An
Lincheng Li
Yu Ding
Changjie Fan
Zhipeng Hu
Xin Yu
SLR
3DH
86
0
0
20 Sep 2024
RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning
Wenhui Diao
Haichen Yu
Kaiyue Kang
Tong Ling
Di Liu
...
Hanbo Bi
Libo Ren
Xuexue Li
Yongqiang Mao
Xian Sun
274
1
0
20 Sep 2024
MEXMA: Token-level objectives improve sentence representations
Joao Maria Janeiro
Benjamin Piwowarski
Patrick Gallinari
Loïc Barrault
41
2
0
19 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
95
8
0
19 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
54
2
0
19 Sep 2024
FoME: A Foundation Model for EEG using Adaptive Temporal-Lateral Attention Scaling
Enze Shi
Kui Zhao
Qilong Yuan
Jiaqi Wang
Huawen Hu
Sigang Yu
Shu Zhang
52
4
0
19 Sep 2024
Measuring Sound Symbolism in Audio-visual Models
Wei-Cheng Tseng
Yi-Jen Shih
David Harwath
Raymond Mooney
90
0
0
18 Sep 2024
Unsupervised Feature Orthogonalization for Learning Distortion-Invariant Representations
Sebastian Doerrich
Francesco Di Salvo
Christian Ledig
OOD
53
0
0
18 Sep 2024
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Zichen Jeff Cui
Hengkai Pan
Aadhithya Iyer
Siddhant Haldar
Lerrel Pinto
VGen
116
17
0
18 Sep 2024
Agglomerative Token Clustering
Joakim Bruslund Haurum
Sergio Escalera
Graham W. Taylor
T. Moeslund
83
4
0
18 Sep 2024
EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning
Yukun Tian
Hao Chen
Yongjian Deng
Feihong Shen
Kepan Liu
Wei You
Ziyang Zhang
57
0
0
18 Sep 2024
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP
VLM
63
0
0
18 Sep 2024