Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
Probing the 3D Awareness of Visual Foundation Models
Mohamed El Banani
Amit Raj
Kevis-Kokitsi Maninis
Abhishek Kar
Yuanzhen Li
Michael Rubinstein
Deqing Sun
Leonidas Guibas
Justin Johnson
Varun Jampani
101
86
0
12 Apr 2024
Masked Image Modeling as a Framework for Self-Supervised Learning across Eye Movements
Robin Weiler
Matthias Brucklacher
C. Pennartz
Sander M. Bohté
77
0
0
12 Apr 2024
TSLANet: Rethinking Transformers for Time Series Representation Learning
Emadeldeen Eldele
Mohamed Ragab
Zhenghua Chen
Min-man Wu
Xiaoli Li
AI4TS, AIFin
100
46
0
12 Apr 2024
NC-TTT: A Noise Contrastive Approach for Test-Time Training
David Osowiechi
G. A. V. Hakim
Mehrdad Noori
Milad Cheraghalikhani
Ali Bahri
Moslem Yazdanpanah
Ismail Ben Ayed
Christian Desrosiers
59
3
0
12 Apr 2024
OmniSat: Self-Supervised Modality Fusion for Earth Observation
Guillaume Astruc
Nicolas Gonthier
Clement Mallet
Loic Landrieu
126
29
0
12 Apr 2024
Emerging Property of Masked Token for Effective Pre-training
Hyesong Choi
Hunsang Lee
Seyoung Joung
Hyejin Park
Jiyeong Kim
Dongbo Min
89
10
0
12 Apr 2024
Salience-Based Adaptive Masking: Revisiting Token Dynamics for Enhanced Pre-training
Hyesong Choi
Hyejin Park
Kwang Moo Yi
Sungmin Cha
Dongbo Min
105
9
0
12 Apr 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
94
4
0
12 Apr 2024
Practical Region-level Attack against Segment Anything Models
Yifan Shen
Zhengyuan Li
Gang Wang
VLM
73
10
0
12 Apr 2024
Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding
Yiwen Tang
Ray Zhang
Jiaming Liu
Zoey Guo
Dong Wang
...
Bin Zhao
Shanghang Zhang
Peng Gao
Hongsheng Li
Xuelong Li
97
13
0
11 Apr 2024
GLID: Pre-training a Generalist Encoder-Decoder Vision Model
Jihao Liu
Jinliang Zheng
Yu Liu
Hongsheng Li
VLM
56
3
0
11 Apr 2024
Solving Masked Jigsaw Puzzles with Diffusion Vision Transformers
Jinyang Liu
Wondmgezahu Teshome
S. Ghimire
Octavia Camps
Mario Sznaier
DiffM
78
2
0
10 Apr 2024
NeuroNet: A Novel Hybrid Self-Supervised Learning Framework for Sleep Stage Classification Using Single-Channel EEG
Cheol-Hui Lee
Hakseung Kim
Hyun-jee Han
Min-Kyung Jung
Byung C. Yoon
Dong-Joo Kim
77
6
0
10 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM, VLM
80
32
0
10 Apr 2024
Adapting LLaMA Decoder to Vision Transformer
Jiahao Wang
Wenqi Shao
Mengzhao Chen
Chengyue Wu
Yong Liu
Taiqiang Wu
Kaipeng Zhang
Songyang Zhang
Kai-xiang Chen
Ping Luo
MLLM
85
4
0
10 Apr 2024
Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero shot Medical Image Segmentation
Sidra Aleem
Fangyijie Wang
Mayug Maniparambil
Eric Arazo
J. Dietlmeier
Guénolé Silvestre
Kathleen M. Curran
Noel E. O'Connor
Suzanne Little
VLM, MedIm
90
14
0
09 Apr 2024
Zero-Shot Relational Learning for Multimodal Knowledge Graphs
Rui Cai
Shichao Pei
Xiangliang Zhang
80
4
0
09 Apr 2024
Masked Modeling Duo: Towards a Universal Audio Pre-training Framework
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
105
15
0
09 Apr 2024
Unified Multi-modal Diagnostic Framework with Reconstruction Pre-training and Heterogeneity-combat Tuning
Yupei Zhang
Li Pan
Qiushi Yang
Tan Li
Zhen Chen
91
1
0
09 Apr 2024
Finding Visual Task Vectors
Alberto Hojel
Yutong Bai
Trevor Darrell
Amir Globerson
Amir Bar
122
8
0
08 Apr 2024
Social-MAE: Social Masked Autoencoder for Multi-person Motion Representation Learning
Mahsa Ehsanpour
Ian Reid
Hamid Rezatofighi
ViT
78
0
0
08 Apr 2024
EB-GAME: A Game-Changer in ECG Heartbeat Anomaly Detection
Juneyoung Park
Da Young Kim
Yunsoo Kim
J. Yoo
Tae Joon Kim
37
0
0
08 Apr 2024
Comparing Self-Supervised Learning Techniques for Wearable Human Activity Recognition
Sannara Ek
Riccardo Presotto
Gabriele Civitarese
François Portet
P. Lalanda
Claudio Bettini
HAI
64
2
0
08 Apr 2024
iVPT: Improving Task-relevant Information Sharing in Visual Prompt Tuning by Cross-layer Dynamic Connection
Nan Zhou
Jiaxin Chen
Di Huang
66
1
0
08 Apr 2024
PairAug: What Can Augmented Image-Text Pairs Do for Radiology?
Yutong Xie
Qi Chen
Sinuo Wang
Minh-Son To
Iris Lee
Ee Win Khoo
Kerolos Hendy
Daniel Koh
Yong-quan Xia
Qi Wu
MedIm, LM&MA
100
9
0
07 Apr 2024
D2SL: Decouple Defogging and Semantic Learning for Foggy Domain-Adaptive Segmentation
Xuan Sun
Zhanfu An
Yuyu Liu
88
0
0
07 Apr 2024
Rethinking Self-training for Semi-supervised Landmark Detection: A Selection-free Approach
Haibo Jin
Haoxuan Che
Hao Chen
92
0
0
06 Apr 2024
Dynamic Switch Layers For Unsupervised Learning
Haiguang Li
Usama Pervaiz
Michal Matuszak
Robert Kamara
Gilles Roux
T. Thormundsson
Joseph Antognini
129
1
0
05 Apr 2024
LOSS-SLAM: Lightweight Open-Set Semantic Simultaneous Localization and Mapping
Kurran Singh
Tim Magoun
John J. Leonard
126
1
0
05 Apr 2024
Test Time Training for Industrial Anomaly Segmentation
Alex Costanzino
Pierluigi Zama Ramirez
Mirko Del Moro
Agostino Aiezzo
Giuseppe Lisanti
Samuele Salti
Luigi Di Stefano
75
0
0
04 Apr 2024
JUICER: Data-Efficient Imitation Learning for Robotic Assembly
Lars Ankile
Anthony Simeonov
Idan Shenfeld
Pulkit Agrawal
LM&Ro
121
19
0
04 Apr 2024
HAPNet: Toward Superior RGB-Thermal Scene Parsing via Hybrid, Asymmetric, and Progressive Heterogeneous Feature Fusion
Jiahang Li
Peng Yun
Qijun Chen
Rui Fan
77
9
0
04 Apr 2024
AdaGlimpse: Active Visual Exploration with Arbitrary Glimpse Position and Scale
Adam Pardyl
Michal Wronka
Maciej Wolczyk
Kamil Adamczewski
Tomasz Trzciński
Bartosz Zieliński
85
2
0
04 Apr 2024
A Comprehensive Survey on Self-Supervised Learning for Recommendation
Xubin Ren
Wei Wei
Lianghao Xia
Chao Huang
SSL
133
14
0
04 Apr 2024
Foundation Model for Advancing Healthcare: Challenges, Opportunities, and Future Directions
Yuting He
Fuxiang Huang
Xinrui Jiang
Yuxiang Nie
Minghao Wang
Jiguang Wang
Hao Chen
LM&MA, AI4CE
146
37
0
04 Apr 2024
Multi Positive Contrastive Learning with Pose-Consistent Generated Images
Sho Inayoshi
Aji Resindra Widya
Satoshi Ozaki
Junji Otsuka
Takeshi Ohashi
3DH
151
1
0
04 Apr 2024
Visual Autoregressive Modeling: Scalable Image Generation via Next-Scale Prediction
Keyu Tian
Yi Jiang
Zehuan Yuan
Bingyue Peng
Liwei Wang
VGen
127
349
0
03 Apr 2024
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
Xiaoshuang Huang
Hongxiang Li
Meng Cao
Long Chen
Chenyu You
Dong An
VLM
97
5
0
03 Apr 2024
Foundation Models for Structural Health Monitoring
Luca Benfenati
Daniele Jahier Pagliari
Luca Zanatta
Yhorman Alexander Bedoya Velez
Andrea Acquaviva
Massimo Poncino
Enrico Macii
Luca Benini
Luca Bompani
AI4CE
82
2
0
03 Apr 2024
On the Efficiency and Robustness of Vibration-based Foundation Models for IoT Sensing: A Case Study
Tomoyoshi Kimura
Jinyang Li
Tianshi Wang
Denizhan Kara
Yizhuo Chen
...
Maggie B. Wigness
Shengzhong Liu
Mani B. Srivastava
Suhas Diggavi
Tarek Abdelzaher
60
5
0
03 Apr 2024
Masked Completion via Structured Diffusion with White-Box Transformers
Druv Pai
Ziyang Wu
Sam Buchanan
Yaodong Yu
Yi-An Ma
67
14
0
03 Apr 2024
Towards Robust 3D Pose Transfer with Adversarial Learning
Haoyu Chen
Hao Tang
Ehsan Adeli
Guoying Zhao
3DH, AAML
75
3
0
02 Apr 2024
Dynamic Pre-training: Towards Efficient and Scalable All-in-One Image Restoration
Akshay Dudhane
Omkar Thawakar
Syed Waqas Zamir
Salman Khan
Fahad Shahbaz Khan
Ming-Hsuan Yang
AI4CE
78
7
0
02 Apr 2024
A Universal Knowledge Embedded Contrastive Learning Framework for Hyperspectral Image Classification
Quanwei Liu
Yanni Dong
Wei Chen
Lefei Zhang
Bo Du
VLM
72
3
0
02 Apr 2024
Propensity Score Alignment of Unpaired Multimodal Data
Johnny Xi
Jason S. Hartford
66
5
0
02 Apr 2024
Can Biases in ImageNet Models Explain Generalization?
Paul Gavrikov
J. Keuper
OOD, VLM
65
15
0
01 Apr 2024
SUGAR: Pre-training 3D Visual Representations for Robotics
Shizhe Chen
Ricardo Garcia Pinel
Ivan Laptev
Cordelia Schmid
107
16
0
01 Apr 2024
Accurate Patient Alignment without Unnecessary Imaging Dose via Synthesizing Patient-specific 3D CT Images from 2D kV Images
Yuzhen Ding
J. Holmes
H. Feng
Baoxin Li
Lisa A. McGee
...
S. A. Vora
Daniel J. Ma
Robert L. Foote
Samir H. Patel
Wei Liu
48
0
0
01 Apr 2024
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad
Sergey Zakharov
Vitor Campagnolo Guizilini
Adrien Gaidon
Z. Kira
Rares Andrei Ambrus
ViT
98
15
0
01 Apr 2024
Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
Kangfu Mei
Zhengzhong Tu
M. Delbracio
Hossein Talebi
Vishal M. Patel
P. Milanfar
DiffM
88
13
0
01 Apr 2024