Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
arXiv: 2111.06377

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
SiTH: Single-view Textured Human Reconstruction with Image-Conditioned Diffusion
Hsuan-I Ho
Mingli Song
Otmar Hilliges
DiffM
83
36
0
27 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
153
20
0
27 Nov 2023
FLAIR: A Conditional Diffusion Framework with Applications to Face Video Restoration
Zihao Zou
Jiaming Liu
Shirin Shoushtari
Yubo Wang
Weijie Gan
Ulugbek S. Kamilov
VGen, DiffM
89
2
0
26 Nov 2023
Adversarial Purification of Information Masking
Sitong Liu
Z. Lian
Shuangquan Zhang
Liang Xiao
AAML
76
0
0
26 Nov 2023
ChAda-ViT : Channel Adaptive Attention for Joint Representation Learning of Heterogeneous Microscopy Images
Nicolas Bourriez
Ihab Bendidi
Ethan O. Cohen
Gabriel Watkinson
Maxime Sanchez
Guillaume Bollot
Auguste Genovesio
MedIm
58
12
0
26 Nov 2023
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen
Thanh-Dat Truong
Xuan-Bac Nguyen
Ashley Dowling
Xin Li
Khoa Luu
VLM
82
20
0
26 Nov 2023
SpliceMix: A Cross-scale and Semantic Blending Augmentation Strategy for Multi-label Image Classification
Lei Wang
Yibing Zhan
Leilei Ma
Dapeng Tao
Liang Ding
Chen Gong
79
1
0
26 Nov 2023
xTrimoGene: An Efficient and Scalable Representation Learner for Single-Cell RNA-Seq Data
Jing Gong
Minsheng Hao
Xingyi Cheng
Xin Zeng
Chiming Liu
Jianzhu Ma
Xuegong Zhang
Taifeng Wang
Leo T. Song
136
22
0
26 Nov 2023
Predicting Gradient is Better: Exploring Self-Supervised Learning for SAR ATR with a Joint-Embedding Predictive Architecture
Wei-Jang Li
Yang Wei
Tianpeng Liu
Yuenan Hou
Yuxuan Li
Zhen Liu
Yongxiang Liu
Li Liu
101
20
0
26 Nov 2023
CUCL: Codebook for Unsupervised Continual Learning
Chen Cheng
Jingkuan Song
Xiaosu Zhu
Sitong Su
Lianli Gao
Jikang Cheng
CLL
61
2
0
25 Nov 2023
SEGIC: Unleashing the Emergent Correspondence for In-Context Segmentation
Lingchen Meng
Shiyi Lan
Hengduo Li
Jose M. Alvarez
Zuxuan Wu
Yu-Gang Jiang
VLM, ISeg, MLLM
79
9
0
24 Nov 2023
Understanding Self-Supervised Features for Learning Unsupervised Instance Segmentation
Paul Engstler
Luke Melas-Kyriazi
Christian Rupprecht
Iro Laina
SSL
67
5
0
24 Nov 2023
Stable Cluster Discrimination for Deep Clustering
Qi Qian
OOD
80
24
0
24 Nov 2023
Towards Transferable Multi-modal Perception Representation Learning for Autonomy: NeRF-Supervised Masked AutoEncoder
Xiaohao Xu
138
0
0
23 Nov 2023
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Zhen Zhao
Jingqun Tang
Chunhui Lin
Binghong Wu
Can Huang
Hao Liu
Xin Tan
Zhizhong Zhang
Yuan Xie
106
25
0
22 Nov 2023
SegVol: Universal and Interactive Volumetric Medical Image Segmentation
Yuxin Du
Fan Bai
Tiejun Huang
Bo Zhao
VLM
151
44
0
22 Nov 2023
Bridging Generalization Gaps in High Content Imaging Through Online Self-Supervised Domain Adaptation
Johan Fredin Haslum
Christos Matsoukas
Karl‐Johan Leuchowius
Kevin Smith
65
2
0
21 Nov 2023
Echocardiogram Foundation Model -- Application 1: Estimating Ejection Fraction
Adil Dahlan
C. Zakka
Abhinav Kumar
Laura Tang
R. Shad
R. Fong
W. Hiesinger
81
2
0
21 Nov 2023
Long-MIL: Scaling Long Contextual Multiple Instance Learning for Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Chenglu Zhu
Jiatong Cai
Sunyi Zheng
Lin Yang
VLM
94
4
0
21 Nov 2023
Instance-aware 3D Semantic Segmentation powered by Shape Generators and Classifiers
Bo Sun
Qixing Huang
Xiangru Huang
3DV, 3DPC
72
0
0
21 Nov 2023
Provable Representation with Efficient Planning for Partial Observable Reinforcement Learning
Hongming Zhang
Zhaolin Ren
Chenjun Xiao
Dale Schuurmans
Bo Dai
104
4
0
20 Nov 2023
Evaluating Supervision Levels Trade-Offs for Infrared-Based People Counting
David Latortue
Moetez Kdayem
F. Guerrero-Peña
Eric Granger
M. Pedersoli
76
0
0
20 Nov 2023
SA-Med2D-20M Dataset: Segment Anything in 2D Medical Imaging with 20 Million masks
Jin Ye
Junlong Cheng
Jianpin Chen
Zhongying Deng
Tian-Xin Li
...
Hui Sun
Min Zhu
Shaoting Zhang
Junjun He
Yu Qiao
VLM, MedIm
89
40
0
20 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
122
67
0
20 Nov 2023
Masked Autoencoders Are Robust Neural Architecture Search Learners
Yiming Hu
Xiangxiang Chu
Bo Zhang
OOD
109
0
0
20 Nov 2023
CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement
Boni Hu
Lin Chen
Runjian Chen
Shuhui Bu
Pengcheng Han
Haowei Li
68
1
0
20 Nov 2023
Event Camera Data Dense Pre-training
Yan Yang
Liyuan Pan
Liu Liu
65
4
0
20 Nov 2023
Pair-wise Layer Attention with Spatial Masking for Video Prediction
Ping Li
Chenhan Zhang
Zheng Yang
Xianghua Xu
Mingli Song
70
0
0
19 Nov 2023
Morphology-Enhanced CAM-Guided SAM for weakly supervised Breast Lesion Segmentation
Xin Yue
Xiaoling Liu
Qing Zhao
Jianqiang Li
Changwei Song
Suqin Liu
Zhikai Yang
Guanghui Fu
MedIm
68
2
0
18 Nov 2023
On the Out of Distribution Robustness of Foundation Models in Medical Image Segmentation
D. M. Nguyen
Tan Ngoc Pham
Nghiem Tuong Diep
Nghi Quoc Phan
Quang Pham
...
Ngan Hoang Le
Nhat Ho
Pengtao Xie
Daniel Sonntag
Mathias Niepert
VLM, UQCV, OOD
80
7
0
18 Nov 2023
Towards Robust and Accurate Visual Prompting
Qi Li
Liangzhi Li
Zhouqiang Jiang
Bowen Wang
VPVLM, VLM
66
3
0
18 Nov 2023
Deep Tensor Network
Yifan Zhang
120
0
0
18 Nov 2023
Point Cloud Self-supervised Learning via 3D to Multi-view Masked Autoencoder
Zhimin Chen
Yingwei Li
Longlong Jing
Liang Yang
Bing Li
3DPC
102
9
0
17 Nov 2023
Mind the map! Accounting for existing map information when estimating online HDMaps from sensor
Rémy Sun
Li Yang
Diane Lingrand
Frédéric Precioso
60
0
0
17 Nov 2023
Leveraging Multimodal Fusion for Enhanced Diagnosis of Multiple Retinal Diseases in Ultra-wide OCTA
Hao Wei
Peilun Shi
Guitao Bai
Minqing Zhang
Shuangle Li
Wu Yuan
60
1
0
17 Nov 2023
Segment Anything in Defect Detection
Bozhen Hu
Bin Gao
Cheng Tan
Tongle Wu
Stan Z. Li
40
7
0
17 Nov 2023
Multi-entity Video Transformers for Fine-Grained Video Representation Learning
Matthew Walmer
Rose Kanjirathinkal
Kai Sheng Tai
Keyur Muzumdar
Taipeng Tian
Abhinav Shrivastava
ViT
90
0
0
17 Nov 2023
From Pretext to Purpose: Batch-Adaptive Self-Supervised Learning
Jiansong Zhang
Linlin Shen
Peizhong Liu
SSL
59
0
0
16 Nov 2023
Self-supervised learning of multi-omics embeddings in the low-label, high-data regime
Christian John Hurry
Emma Slade
82
0
0
16 Nov 2023
Video-LLaVA: Learning United Visual Representation by Alignment Before Projection
Bin Lin
Yang Ye
Bin Zhu
Jiaxi Cui
Munan Ning
Peng Jin
Li-ming Yuan
VLM, MLLM
390
711
0
16 Nov 2023
Slide-SAM: Medical SAM Meets Sliding Window
Quan Quan
Fenghe Tang
Zikang Xu
Heqin Zhu
S. Kevin Zhou
VLM, MedIm
95
8
0
16 Nov 2023
SQLNet: Scale-Modulated Query and Localization Network for Few-Shot Class-Agnostic Counting
Hefeng Wu
Yandong Chen
Lingbo Liu
Tianshui Chen
Keze Wang
Liang Lin
83
1
0
16 Nov 2023
ConvNet vs Transformer, Supervised vs CLIP: Beyond ImageNet Accuracy
Kirill Vishniakov
Zhiqiang Shen
Zhuang Liu
CLIP
106
17
0
15 Nov 2023
Imagine the Unseen World: A Benchmark for Systematic Generalization in Visual World Models
Yeongbin Kim
Gautam Singh
Junyeong Park
Çağlar Gülçehre
Sungjin Ahn
OCL, VLM
113
5
0
15 Nov 2023
AdapterShadow: Adapting Segment Anything Model for Shadow Detection
Lei Jie
Hui Zhang
VLM
77
3
0
15 Nov 2023
Toulouse Hyperspectral Data Set: a benchmark data set to assess semi-supervised spectral representation learning and pixel-wise classification techniques
R. Thoreau
Laurent Risser
V. Achard
Béatrice Berthelot
X. Briottet
157
6
0
15 Nov 2023
Autoencoder with Group-based Decoder and Multi-task Optimization for Anomalous Sound Detection
Yifan Zhou
Dongxing Xu
Haoran Wei
Yanhua Long
72
0
0
15 Nov 2023
Zero-Shot Segmentation of Eye Features Using the Segment Anything Model (SAM)
Virmarie Maquiling
Sean Anthony Byrne
D. Niehorster
Marcus Nyström
Enkelejda Kasneci
VLM
120
14
0
14 Nov 2023
Dual-channel Prototype Network for few-shot Classification of Pathological Images
Hao Quan
Xinjia Li
Dayu Hu
Tianhang Nan
Xiaoyu Cui
64
0
0
14 Nov 2023
SynthEnsemble: A Fusion of CNN, Vision Transformer, and Hybrid Models for Multi-Label Chest X-Ray Classification
S. M. N. Ashraf
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Rabiul Alam
ViT, MedIm
101
9
0
13 Nov 2023