Masked Autoencoders Are Scalable Vision Learners
Versions: v1, v2, v3 (latest)


11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
Style Transfer and Self-Supervised Learning Powered Myocardium Infarction Super-Resolution Segmentation
Lichao Wang
Jiahao Huang
Xiaodan Xing
Yinzhe Wu
R. Rajakulasingam
Andrew D. Scott
Pedro F. Ferreira
Ranil De Silva
S. Nielles-Vallespin
Guang Yang
MedIm, SupR
21
0
0
27 Sep 2023
M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding
Muhammad Abdullah Jamal
Omid Mohareri
3DPC
89
1
0
26 Sep 2023
SEPT: Towards Efficient Scene Representation Learning for Motion Prediction
Zhiqian Lan
Yuxuan Jiang
Yao Mu
Chong Chen
Keqiang Li
165
28
0
26 Sep 2023
VPA: Fully Test-Time Visual Prompt Adaptation
Jiachen Sun
Mark Ibrahim
Melissa Hall
Ivan Evtimov
Z. Morley Mao
Cristian Canton Ferrer
C. Hazirbas
VLM
94
7
0
26 Sep 2023
SeMAnD: Self-Supervised Anomaly Detection in Multimodal Geospatial Datasets
Daria Reshetova
Swetava Ganguli
C. V. K. Iyer
Vipul Pandey
64
3
0
26 Sep 2023
Class Incremental Learning via Likelihood Ratio Based Task Prediction
Haowei Lin
Yijia Shao
W. Qian
Ningxin Pan
Yiduo Guo
Bing-Quan Liu
CLL
111
14
0
26 Sep 2023
Regress Before Construct: Regress Autoencoder for Point Cloud Self-supervised Learning
Yang Liu
Chong Chen
Can Wang
Xulin King
Mengyuan Liu
3DPC
82
9
0
25 Sep 2023
CLIP-DIY: CLIP Dense Inference Yields Open-Vocabulary Semantic Segmentation For-Free
Monika Wysoczańska
Michael Ramamonjisoa
Tomasz Trzciński
Oriane Siméoni
3DV, VLM
126
22
0
25 Sep 2023
Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition
W. He
Kai Han
Ying Nie
Chengcheng Wang
Yunhe Wang
VLM
99
6
0
25 Sep 2023
Masked Image Residual Learning for Scaling Deeper Vision Transformers
Guoxi Huang
Hongtao Fu
A. Bors
124
7
0
25 Sep 2023
ReMasker: Imputing Tabular Data with Masked Autoencoding
Tianyu Du
Luca Melis
Ting Wang
71
19
0
25 Sep 2023
OneSeg: Self-learning and One-shot Learning based Single-slice Annotation for 3D Medical Image Segmentation
YiXuan Wu
B. Zheng
Jintai Chen
Benlin Liu
Jian Wu
59
1
0
24 Sep 2023
Survey of Social Bias in Vision-Language Models
Nayeon Lee
Yejin Bang
Holy Lovenia
Samuel Cahyawijaya
Wenliang Dai
Pascale Fung
VLM
136
19
0
24 Sep 2023
Robust 6DoF Pose Estimation Against Depth Noise and a Comprehensive Evaluation on a Mobile Dataset
Zixun Huang
Keling Yao
Seth Z. Zhao
Chuanyu Pan
Chenfeng Xu
106
0
0
24 Sep 2023
Beyond Grids: Exploring Elastic Input Sampling for Vision Transformers
Adam Pardyl
Grzegorz Kurzejamski
Jan Olszewski
Tomasz Trzciński
Bartosz Zieliński
66
1
0
23 Sep 2023
RTrack: Accelerating Convergence for Visual Object Tracking via Pseudo-Boxes Exploration
Guotian Zeng
Bi Zeng
Kuanqi Cai
Jianqi Liu
Qingmao Wei
65
1
0
23 Sep 2023
Robotic Offline RL from Internet Videos via Value-Function Pre-Training
Chethan Bhateja
Derek Guo
Dibya Ghosh
Anika Singh
Manan Tomar
Q. Vuong
Yevgen Chebotar
Sergey Levine
Aviral Kumar
OffRL
110
22
0
22 Sep 2023
On Separate Normalization in Self-supervised Transformers
Xiaohui Chen
Yinkai Wang
Yuanqi Du
S. Hassoun
Liping Liu
ViT
77
2
0
22 Sep 2023
Masking Improves Contrastive Self-Supervised Learning for ConvNets, and Saliency Tells You Where
Zhi-Yi Chin
Chieh-Ming Jiang
Ching-Chun Huang
Pin-Yu Chen
Wei-Chen Chiu
SSL
101
0
0
22 Sep 2023
TrTr: A Versatile Pre-Trained Large Traffic Model based on Transformer for Capturing Trajectory Diversity in Vehicle Population
Ruyi Feng
Zhibin Li
Bowen Liu
Yan Ding
121
2
0
22 Sep 2023
Sequential Action-Induced Invariant Representation for Reinforcement Learning
Dayang Liang
Qihang Chen
Yunlong Liu
90
4
0
22 Sep 2023
See to Touch: Learning Tactile Dexterity through Visual Incentives
Irmak Güzey
Yinlong Dai
Ben Evans
Soumith Chintala
Lerrel Pinto
102
37
0
21 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
54
1
0
21 Sep 2023
Beyond Image Borders: Learning Feature Extrapolation for Unbounded Image Composition
Xiaoyu Liu
Ming-Yu Liu
Junyi Li
Shuai Liu
Xiaotao Wang
Lei Lei
Wangmeng Zuo
75
2
0
21 Sep 2023
A Knowledge-Driven Cross-view Contrastive Learning for EEG Representation
Weining Weng
Yang Gu
Qihui Zhang
Yingying Huang
Chunyan Miao
Yiqiang Chen
73
7
0
21 Sep 2023
Exploring Self-supervised Skeleton-based Action Recognition in Occluded Environments
Yifei Chen
Kunyu Peng
Alina Roitberg
David Schneider
Jiaming Zhang
Junwei Zheng
R. Liu
Yufan Chen
Kailun Yang
Rainer Stiefelhagen
103
0
0
21 Sep 2023
Neural Image Compression Using Masked Sparse Visual Representation
Wei Jiang
Wei Wang
Yuewei Chen
72
7
0
20 Sep 2023
Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism
Chengcheng Wang
Wei He
Ying Nie
Jianyuan Guo
Chuanjian Liu
Kai Han
Yunhe Wang
ObjD
131
245
0
20 Sep 2023
StructChart: Perception, Structuring, Reasoning for Visual Chart Understanding
Renqiu Xia
Bo Zhang
Hao Peng
Hancheng Ye
Xiangchao Yan
Peng Ye
Botian Shi
Yu Qiao
Junchi Yan
116
0
0
20 Sep 2023
Ano-SuPs: Multi-size anomaly detection for manufactured products by identifying suspected patches
Hao Xu
Juan Du
Andi Wang
YingCong Chen
125
1
0
20 Sep 2023
Weak Supervision for Label Efficient Visual Bug Detection
F. Rahman
78
2
0
20 Sep 2023
Score Mismatching for Generative Modeling
Senmao Ye
Fei Liu
DiffM
75
8
0
20 Sep 2023
SEMPART: Self-supervised Multi-resolution Partitioning of Image Semantics
Sriram Ravindran
Debraj Basu
100
3
0
20 Sep 2023
Test-Time Training for Speech
Sri Harsha Dumpala
Chandramouli Shama Sastry
Sageev Oore
112
1
0
19 Sep 2023
AI Foundation Models for Weather and Climate: Applications, Design, and Implementation
S. K. Mukkavilli
Daniel Salles Civitarese
J. Schmude
Johannes Jakubik
Anne Jones
...
R. Ganti
Hendrik Hamann
U. Nair
Rahul Ramachandran
Kommy Weldemariam
AI4Cl, AI4CE
97
18
0
19 Sep 2023
AV-SUPERB: A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models
Yuan Tseng
Layne Berry
Yi-Ting Chen
I-Hsiang Chiu
Hsuan-Hao Lin
...
Yu Tsao
Shinji Watanabe
Abdel-rahman Mohamed
Chi-Luen Feng
Hung-yi Lee
VLM, SSL
127
15
0
19 Sep 2023
Few-Shot Panoptic Segmentation With Foundation Models
Markus Kappeler
Kürsat Petek
Niclas Vodisch
Wolfram Burgard
Abhinav Valada
98
18
0
19 Sep 2023
Learning Tri-modal Embeddings for Zero-Shot Soundscape Mapping
Subash Khanal
Srikumar Sastry
Aayush Dhakal
Nathan Jacobs
107
10
0
19 Sep 2023
DCPT: Darkness Clue-Prompted Tracking in Nighttime UAVs
Jiawen Zhu
Huayi Tang
Zhi-Qi Cheng
Ju He
Bin Luo
Shihao Qiu
Shengming Li
Huchuan Lu
103
13
0
19 Sep 2023
PanopticNeRF-360: Panoramic 3D-to-2D Label Transfer in Urban Scenes
Xiao Fu
Shangzhan Zhang
Tianrun Chen
Yichong Lu
Xiaowei Zhou
Andreas Geiger
Yiyi Liao
3DPC
181
9
0
19 Sep 2023
Pre-training on Synthetic Driving Data for Trajectory Prediction
Yiheng Li
Seth Z. Zhao
Chenfeng Xu
Chen Tang
Chenran Li
Mingyu Ding
Masayoshi Tomizuka
Wei Zhan
111
13
0
18 Sep 2023
Long-Tail Learning with Foundation Model: Heavy Fine-Tuning Hurts
Jiang-Xin Shi
Tong Wei
Zhi Zhou
Jiejing Shao
Xin-Yan Han
Yu-Feng Li
111
36
0
18 Sep 2023
Contrastive Learning for Enhancing Robust Scene Transfer in Vision-based Agile Flight
Jiaxu Xing
L. Bauersfeld
Yunlong Song
Chunwei Xing
Davide Scaramuzza
112
12
0
18 Sep 2023
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling
Ziming Wang
Shumin Han
Xiaodi Wang
Jing Hao
Xianbin Cao
Baochang Zhang
VLM
74
0
0
18 Sep 2023
Self-supervised TransUNet for Ultrasound regional segmentation of the distal radius in children
Yuyue Zhou
Jessica Knight
B. Felfeliyan
Christopher Keen
A. Hareendranathan
Jacob L. Jaremko
71
0
0
18 Sep 2023
FactoFormer: Factorized Hyperspectral Transformers with Self-Supervised Pretraining
Shaheera Mohamed
Maryam Haghighat
Tharindu Fernando
Sridha Sridharan
Clinton Fookes
Peyman Moghadam
ViT
88
14
0
18 Sep 2023
LiteTrack: Layer Pruning with Asynchronous Feature Extraction for Lightweight and Efficient Visual Tracking
Qingmao Wei
Bi Zeng
Jianqi Liu
Li He
Guotian Zeng
89
12
0
17 Sep 2023
FrameRS: A Video Frame Compression Model Composed by Self supervised Video Frame Reconstructor and Key Frame Selector
Qiqian Fu
Guanhong Wang
Gaoang Wang
32
0
0
16 Sep 2023
MMST-ViT: Climate Change-aware Crop Yield Prediction via Multi-Modal Spatial-Temporal Vision Transformer
Fudong Lin
Summer Crawford
Kaleb Guillot
Yihe Zhang
Yan Chen
...
Tri Setiyono
B. Tubana
Lu Peng
Magdy A. Bayoumi
N. Tzeng
102
26
0
16 Sep 2023
RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework
Yuelei Wang
Ting Zhang
Liangjin Zhao
Lin Hu
Zhechao Wang
...
Kaiqiang Chen
Xuan Zeng
Zhirui Wang
Hongqi Wang
Xian Sun
99
5
0
16 Sep 2023