Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Restore Anything with Masks: Leveraging Mask Image Modeling for Blind All-in-One Image Restoration
Chu-Jie Qin
Rui-Qi Wu
Zikun Liu
Xin Lin
Chun-Le Guo
Hyun Hee Park
Chongyi Li
88
8
0
28 Sep 2024
Forgetting, Ignorance or Myopia: Revisiting Key Challenges in Online Continual Learning
Xinrui Wang
Chuanxing Geng
Wenhai Wan
Shao-yuan Li
Songcan Chen
CLL
105
3
0
28 Sep 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
163
7
0
27 Sep 2024
Localizing Memorization in SSL Vision Encoders
Wenhao Wang
Adam Dziedzic
Michael Backes
Franziska Boenisch
67
2
0
27 Sep 2024
ProMerge: Prompt and Merge for Unsupervised Instance Segmentation
Dylan Li
Gyungin Shin
78
3
0
27 Sep 2024
UniEmoX: Cross-modal Semantic-Guided Large-Scale Pretraining for Universal Scene Emotion Perception
Chuang Chen
Xingwu Sun
Zhi Liu
91
1
0
27 Sep 2024
Learning from Pattern Completion: Self-supervised Controllable Generation
Zhiqiang Chen
Guofan Fan
Jinying Gao
Lei Ma
Bo Lei
Tiejun Huang
Shan Yu
52
0
0
27 Sep 2024
Off to new Shores: A Dataset & Benchmark for (near-)coastal Flood Inundation Forecasting
Brandon Victor
Mathilde Letard
Peter Naylor
Karim Douch
Nicolas Longépé
Zhen He
Patrick Ebel
AI4CE
53
1
0
27 Sep 2024
Cross-video Identity Correlating for Person Re-identification Pre-training
Jialong Zuo
Ying Nie
Hanyu Zhou
Huaxin Zhang
Haoyu Wang
Tianyu Guo
Nong Sang
Changxin Gao
90
5
0
27 Sep 2024
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?
Jose Sosa
Mohamed Aloulou
Danila Rukhovich
Rim Sleimi
Boonyarit Changaival
Anis Kacem
Djamila Aouada
75
1
0
27 Sep 2024
Token Caching for Diffusion Transformer Acceleration
Jinming Lou
Wenyang Luo
Yufan Liu
Bing Li
Xinmiao Ding
Weiming Hu
Jiajiong Cao
Yuming Li
Chenguang Ma
88
6
0
27 Sep 2024
CycleNet: Enhancing Time Series Forecasting through Modeling Periodic Patterns
Shengsheng Lin
Weiwei Lin
Xinyi Hu
Wentai Wu
Ruichao Mo
Haocheng Zhong
AI4TS
115
30
0
27 Sep 2024
Self-supervised Pretraining for Cardiovascular Magnetic Resonance Cine Segmentation
Rob A. J. de Mooij
Josien P. W. Pluim
Cian M. Scannell
59
0
0
26 Sep 2024
Spatial Hierarchy and Temporal Attention Guided Cross Masking for Self-supervised Skeleton-based Action Recognition
Xinpeng Yin
Wenming Cao
80
0
0
26 Sep 2024
Efficient Bias Mitigation Without Privileged Information
Mateo Espinosa Zarlenga
Swami Sankaranarayanan
Jerone T. A. Andrews
Z. Shams
M. Jamnik
Alice Xiang
126
3
0
26 Sep 2024
Prototype based Masked Audio Model for Self-Supervised Learning of Sound Event Detection
Pengfei Cai
Yan Song
Nan Jiang
Qing Gu
Ian Mcloughlin
60
2
0
26 Sep 2024
Triple Point Masking
Jiaming Liu
Linghe Kong
Yue Wu
Maoguo Gong
Hao Li
Qiguang Miao
Wenping Ma
Can Qin
3DPC
85
0
0
26 Sep 2024
CadVLM: Bridging Language and Vision in the Generation of Parametric CAD Sketches
Sifan Wu
Amir Khasahmadi
Mor Katz
P. Jayaraman
Yewen Pu
K. Willis
Bang Liu
3DV
72
9
0
26 Sep 2024
CROSS-GAiT: Cross-Attention-Based Multimodal Representation Fusion for Parametric Gait Adaptation in Complex Terrains
Gershom Seneviratne
K. Weerakoon
Mohamed Bashir Elnoor
Vignesh Rajgopal
Harshavarthan Varatharajan
Mohamed Khalid M Jaffar
Jason Pusey
Dinesh Manocha
CVBM
71
0
0
25 Sep 2024
PACE: Marrying generalization in PArameter-efficient fine-tuning with Consistency rEgularization
Yao Ni
Shan Zhang
Piotr Koniusz
463
8
0
25 Sep 2024
Face Forgery Detection with Elaborate Backbone
Zonghui Guo
Y. Liu
Jie Zhang
Haiyong Zheng
Shiguang Shan
AAML, CVBM
97
1
0
25 Sep 2024
3DDX: Bone Surface Reconstruction from a Single Standard-Geometry Radiograph via Dual-Face Depth Estimation
Yi Gu
Y. Otake
Keisuke Uemura
Masaki Takao
Mazen Soufi
S. Okada
Nobuhiko Sugano
Hugues Talbot
Yoshinobu Sato
53
2
0
25 Sep 2024
Layout-Corrector: Alleviating Layout Sticking Phenomenon in Discrete Diffusion Model
Shoma Iwai
Atsuki Osanai
Shunsuke Kitada
S. Omachi
3DV
53
2
0
25 Sep 2024
Stochastic Subsampling With Average Pooling
Bum Jun Kim
Sang Woo Kim
43
0
0
25 Sep 2024
EMIT- Event-Based Masked Auto Encoding for Irregular Time Series
Hrishikesh Patel
Ruihong Qiu
Adam Irwin
Shazia Sadiq
Sen Wang
AI4TS
117
3
0
25 Sep 2024
Self-Supervised Any-Point Tracking by Contrastive Random Walks
Ayush Shrivastava
Andrew Owens
63
5
0
24 Sep 2024
Segmentation Strategies in Deep Learning for Prostate Cancer Diagnosis: A Comparative Study of Mamba, SAM, and YOLO
Ali Badiezadeh
Amin Malekmohammadi
Seyed Mostafa Mirhassani
Parisa Gifani
Majid Vafaeezadeh
Mamba
65
2
0
24 Sep 2024
Predicting Distance matrix with large language models
Jiaxing Yang
24
0
0
24 Sep 2024
Hyperbolic Image-and-Pointcloud Contrastive Learning for 3D Classification
Naiwen Hu
Haozhe Cheng
Yifan Xie
Pengcheng Shi
Jihua Zhu
3DPC
98
0
0
24 Sep 2024
3D-JEPA: A Joint Embedding Predictive Architecture for 3D Self-Supervised Representation Learning
Naiwen Hu
Haozhe Cheng
Yifan Xie
Shiqi Li
Jihua Zhu
AI4TS, 3DV
53
0
0
24 Sep 2024
Towards Universal Large-Scale Foundational Model for Natural Gas Demand Forecasting
Xinxing Zhou
Jiaqi Ye
Shubao Zhao
Ming Jin
Zhaoxiang Hou
Chengyi Yang
Zengxiang Li
Yanlong Wen
Xiaojie Yuan
AI4TS
63
1
0
24 Sep 2024
Robust Training Objectives Improve Embedding-based Retrieval in Industrial Recommendation Systems
Matthew Kolodner
Mingxuan Ju
Zihao Fan
Tong Zhao
Elham Ghazizadeh
Yan Wu
Neil Shah
Yozen Liu
83
4
0
23 Sep 2024
Mammo-Clustering: A Multi-views Tri-level Information Fusion Context Clustering Framework for Localization and Classification in Mammography
Shilong Yang
Chulong Zhang
Qi Zang
Juan Yu
Liang Zeng
...
Yexuan Xing
Xin Pan
Qi Li
Xiaokun Liang
Yaoqin Xie
101
0
0
23 Sep 2024
BrainDreamer: Reasoning-Coherent and Controllable Image Generation from EEG Brain Signals via Language Guidance
Ling Wang
Chen Wu
Lin Wang
DiffM
66
0
0
21 Sep 2024
ViTGuard: Attention-aware Detection against Adversarial Examples for Vision Transformer
Shihua Sun
Kenechukwu Nwodo
Shridatt Sugrim
Angelos Stavrou
Haining Wang
AAML
85
1
0
20 Sep 2024
Prithvi WxC: Foundation Model for Weather and Climate
J. Schmude
Sujit Roy
Will Trojak
Johannes Jakubik
Daniel Salles Civitarese
...
Campbell Watson
M. Maskey
Tsengdar J Lee
Juan Bernabé-Moreno
Rahul Ramachandran
VLM, AI4Cl
102
10
0
20 Sep 2024
Formula-Supervised Visual-Geometric Pre-training
Ryosuke Yamada
Kensho Hara
Hirokatsu Kataoka
Koshi Makihara
Nakamasa Inoue
Rio Yokota
Y. Satoh
57
1
0
20 Sep 2024
Leveraging Text Localization for Scene Text Removal via Text-aware Masked Image Modeling
Zixiao Wang
Hongtao Xie
Yuxin Wang
Yadong Qu
Fengjun Guo
Pengwei Liu
DiffM
71
0
0
20 Sep 2024
FreeAvatar: Robust 3D Facial Animation Transfer by Learning an Expression Foundation Model
Feng Qiu
Wei Zhang
Chen Liu
Rudong An
Lincheng Li
Yu Ding
Changjie Fan
Zhipeng Hu
Xin Yu
SLR, 3DH
86
0
0
20 Sep 2024
RingMo-Aerial: An Aerial Remote Sensing Foundation Model With Affine Transformation Contrastive Learning
Wenhui Diao
Haichen Yu
Kaiyue Kang
Tong Ling
Di Liu
...
Hanbo Bi
Libo Ren
Xuexue Li
Yongqiang Mao
Xian Sun
274
1
0
20 Sep 2024
MEXMA: Token-level objectives improve sentence representations
Joao Maria Janeiro
Benjamin Piwowarski
Patrick Gallinari
Loïc Barrault
41
2
0
19 Sep 2024
Is Tokenization Needed for Masked Particle Modelling?
Matthew Leigh
Samuel Klein
François Charton
Tobias Golling
Lukas Heinrich
Michael Kagan
Ines Ochoa
Margarita Osadchy
95
8
0
19 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
54
2
0
19 Sep 2024
FoME: A Foundation Model for EEG using Adaptive Temporal-Lateral Attention Scaling
Enze Shi
Kui Zhao
Qilong Yuan
Jiaqi Wang
Huawen Hu
Sigang Yu
Shu Zhang
52
4
0
19 Sep 2024
Measuring Sound Symbolism in Audio-visual Models
Wei-Cheng Tseng
Yi-Jen Shih
David Harwath
Raymond Mooney
90
0
0
18 Sep 2024
Unsupervised Feature Orthogonalization for Learning Distortion-Invariant Representations
Sebastian Doerrich
Francesco Di Salvo
Christian Ledig
OOD
53
0
0
18 Sep 2024
DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control
Zichen Jeff Cui
Hengkai Pan
Aadhithya Iyer
Siddhant Haldar
Lerrel Pinto
VGen
116
17
0
18 Sep 2024
Agglomerative Token Clustering
Joakim Bruslund Haurum
Sergio Escalera
Graham W. Taylor
T. Moeslund
83
4
0
18 Sep 2024
EventAug: Multifaceted Spatio-Temporal Data Augmentation Methods for Event-based Learning
Yukun Tian
Hao Chen
Yongjian Deng
Feihong Shen
Kepan Liu
Wei You
Ziyang Zhang
57
0
0
18 Sep 2024
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP, VLM
63
0
0
18 Sep 2024