arXiv: 2111.06377
Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
MVSFormer++: Revealing the Devil in Transformer's Details for Multi-View Stereo
Chenjie Cao
Xinlin Ren
Yanwei Fu
94
29
0
22 Jan 2024
Exploring Missing Modality in Multimodal Egocentric Datasets
Merey Ramazanova
Alejandro Pardo
Humam Alwassel
Guohao Li
EgoV
83
4
0
21 Jan 2024
Unifying Visual and Vision-Language Tracking via Contrastive Learning
Yinchao Ma
Yuyang Tang
Wenfei Yang
Tianzhu Zhang
Jinpeng Zhang
Mengxue Kang
ObjD
76
17
0
20 Jan 2024
Spatial Structure Constraints for Weakly Supervised Semantic Segmentation
Tao Chen
Yazhou Yao
Xing-Rui Huang
Zechao Li
Liqiang Nie
Jinhui Tang
63
20
0
20 Jan 2024
A Novel Benchmark for Few-Shot Semantic Segmentation in the Era of Foundation Models
Reda Bensaid
Vincent Gripon
François Leduc-Primeau
Lukas Mauch
G. B. Hacene
Fabien Cardinaux
VLM
99
7
0
20 Jan 2024
One Step Learning, One Step Review
Xiaolong Huang
Qiankun Li
Xueran Li
Xuesong Gao
78
1
0
19 Jan 2024
Towards Universal Unsupervised Anomaly Detection in Medical Imaging
Cosmin I. Bercea
Benedikt Wiestler
Daniel Rueckert
Julia A. Schnabel
69
2
0
19 Jan 2024
Memorization in Self-Supervised Learning Improves Downstream Generalization
Wenhao Wang
Muhammad Ahmad Kaleem
Adam Dziedzic
Michael Backes
Nicolas Papernot
Franziska Boenisch
SSL
81
11
0
19 Jan 2024
LDReg: Local Dimensionality Regularized Self-Supervised Learning
Hanxun Huang
R. Campello
S. Erfani
Xingjun Ma
Michael E. Houle
James Bailey
87
5
0
19 Jan 2024
Exploring scalable medical image encoders beyond text supervision
Fernando Pérez-García
Harshita Sharma
Sam Bond-Taylor
Kenza Bouzid
Valentina Salvatelli
...
Maria T. A. Wetscherek
Noel C. F. Codella
Stephanie L. Hyland
Javier Alvarez-Valle
Ozan Oktay
LM&MA, MedIm
141
9
0
19 Jan 2024
Enhancing medical vision-language contrastive learning via inter-matching relation modelling
Mingjian Li
Mingyuan Meng
M. Fulham
David Dagan Feng
Lei Bi
Jinman Kim
VLM
144
1
0
19 Jan 2024
Reconstructing the Invisible: Video Frame Restoration through Siamese Masked Conditional Variational Autoencoder
Yongchen Zhou
Richard Jiang
46
0
0
18 Jan 2024
OMG-Seg: Is One Model Good Enough For All Segmentation?
Xiangtai Li
Haobo Yuan
Wei Li
Henghui Ding
Size Wu
Wenwei Zhang
Yining Li
Kai Chen
Chen Change Loy
VLM, MLLM, ViT
150
64
0
18 Jan 2024
A Simple Latent Diffusion Approach for Panoptic Segmentation and Mask Inpainting
Wouter Van Gansbeke
Bert De Brabandere
DiffM
128
11
0
18 Jan 2024
The Manga Whisperer: Automatically Generating Transcriptions for Comics
Ragav Sachdeva
Andrew Zisserman
97
15
0
18 Jan 2024
Supervised Fine-tuning in turn Improves Visual Foundation Models
Xiaohu Jiang
Yixiao Ge
Yuying Ge
Dachuan Shi
Chun Yuan
Ying Shan
VLM, CLIP
94
9
0
18 Jan 2024
GPT4Ego: Unleashing the Potential of Pre-trained Models for Zero-Shot Egocentric Action Recognition
Guangzhao Dai
Xiangbo Shu
Wenhao Wu
Rui Yan
Jiachao Zhang
VLM
115
7
0
18 Jan 2024
Enhancing Small Object Encoding in Deep Neural Networks: Introducing Fast&Focused-Net with Volume-wise Dot Product Layer
Tofik Ali
Partha Pratim Roy
ObjD
66
2
0
18 Jan 2024
HCVP: Leveraging Hierarchical Contrastive Visual Prompt for Domain Generalization
Guanglin Zhou
Zhongyi Han
Shiming Chen
Erdun Gao
Liming Zhu
Tongliang Liu
Lina Yao
Kun Zhang
96
3
0
18 Jan 2024
P$^2$OT: Progressive Partial Optimal Transport for Deep Imbalanced Clustering
Chuyu Zhang
Hui Ren
Xuming He
115
7
0
17 Jan 2024
Classification and Reconstruction Processes in Deep Predictive Coding Networks: Antagonists or Allies?
Jan Rathjens
Laurenz Wiskott
60
2
0
17 Jan 2024
CrossVideo: Self-supervised Cross-modal Contrastive Learning for Point Cloud Video Understanding
Yunze Liu
Changxi Chen
Zifan Wang
Li Yi
3DPC
89
3
0
17 Jan 2024
Visual Robotic Manipulation with Depth-Aware Pretraining
Wanying Wang
Jinming Li
Yichen Zhu
Zhiyuan Xu
Zhengping Che
Chaomin Shen
Yaxin Peng
Dong Liu
Feifei Feng
Jian Tang
MDE
91
4
0
17 Jan 2024
Hearing Loss Detection from Facial Expressions in One-on-one Conversations
Yufeng Yin
Ishwarya Ananthabhotla
V. Ithapu
Stavros Petridis
Yu-Hsiang Wu
Christi Miller
CVBM
70
4
0
17 Jan 2024
Scalable Pre-training of Large Autoregressive Image Models
Alaaeldin El-Nouby
Michal Klein
Shuangfei Zhai
Miguel Angel Bautista
Alexander Toshev
Vaishaal Shankar
J. Susskind
Armand Joulin
VLM
105
80
0
16 Jan 2024
Achieve Fairness without Demographics for Dermatological Disease Diagnosis
Ching-Hao Chiu
Yu-Jen Chen
Yawen Wu
Yiyu Shi
Tsung-Yi Ho
79
6
0
16 Jan 2024
Forging Vision Foundation Models for Autonomous Driving: Challenges, Methodologies, and Opportunities
Xu Yan
Haiming Zhang
Yingjie Cai
Jingming Guo
Weichao Qiu
...
Lihui Jiang
Wei Zhang
Hongbo Zhang
Dengxin Dai
Bingbing Liu
175
20
0
16 Jan 2024
Cross-Level Multi-Instance Distillation for Self-Supervised Fine-Grained Visual Categorization
Qi Bi
Wei Ji
Jingjun Yi
Haolan Zhan
Gui-Song Xia
123
1
0
16 Jan 2024
Exploring Masked Autoencoders for Sensor-Agnostic Image Retrieval in Remote Sensing
Jakob Hackstein
Gencer Sumbul
Kai Norman Clasen
Begüm Demir
116
7
0
15 Jan 2024
Graph Transformer GANs with Graph Masked Modeling for Architectural Layout Generation
Hao Tang
Ling Shao
N. Sebe
Luc Van Gool
95
6
0
15 Jan 2024
MIMIC: Mask Image Pre-training with Mix Contrastive Fine-tuning for Facial Expression Recognition
Fan Zhang
Xiaobao Guo
Xiaojiang Peng
Alex C. Kot
60
1
0
14 Jan 2024
NODI: Out-Of-Distribution Detection with Noise from Diffusion
Jingqiu Zhou
Aojun Zhou
Hongsheng Li
DiffM
72
1
0
13 Jan 2024
Transformer for Object Re-Identification: A Survey
Mang Ye
Shuo Chen
Chenyue Li
Wei-Shi Zheng
David J. Crandall
Bo Du
ViT
156
16
0
13 Jan 2024
Frequency Masking for Universal Deepfake Detection
Chandler Timm C. Doloriel
Ngai-Man Cheung
89
16
0
12 Jan 2024
Self-supervised Learning of Dense Hierarchical Representations for Medical Image Segmentation
Eytan Kats
Jochen G. Hirsch
Mattias P. Heinrich
SSL
66
0
0
12 Jan 2024
A Study on Self-Supervised Pretraining for Vision Problems in Gastrointestinal Endoscopy
Edward Sanderson
B. Matuszewski
74
2
0
11 Jan 2024
Eyes Wide Shut? Exploring the Visual Shortcomings of Multimodal LLMs
Shengbang Tong
Zhuang Liu
Yuexiang Zhai
Yi-An Ma
Yann LeCun
Saining Xie
VLM, MLLM
163
349
0
11 Jan 2024
End-to-end Learnable Clustering for Intent Learning in Recommendation
Yue Liu
Shihao Zhu
Jun Xia
Yingwei Ma
Jian Ma
Wenliang Zhong
Xinwang Liu
Guannan Zhang
Kejun Zhang
112
11
0
11 Jan 2024
Efficient Image Deblurring Networks based on Diffusion Models
Kang Chen
Yuanjie Liu
DiffM
119
2
0
11 Jan 2024
HiCMAE: Hierarchical Contrastive Masked Autoencoder for Self-Supervised Audio-Visual Emotion Recognition
Guoying Zhao
Zheng Lian
Bin Liu
Jianhua Tao
108
32
0
11 Jan 2024
MISS: A Generative Pretraining and Finetuning Approach for Med-VQA
Jiawei Chen
Dingkang Yang
Yue Jiang
Yuxuan Lei
Lihua Zhang
LM&MA, MedIm
62
15
0
10 Jan 2024
Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning
Zhaohui Jiang
Paul Weng
OffRL
71
0
0
10 Jan 2024
HiMTM: Hierarchical Multi-Scale Masked Time Series Modeling for Long-Term Forecasting
Shubao Zhao
Ming Jin
Zhaoxiang Hou
Che-Sheng Yang
Zengxiang Li
Qingsong Wen
Yi Wang
86
2
0
10 Jan 2024
Motion Guided Token Compression for Efficient Masked Video Modeling
Yukun Feng
Yangming Shi
Fengze Liu
Tan Yan
86
0
0
10 Jan 2024
SnapCap: Efficient Snapshot Compressive Video Captioning
Jianqiao Sun
Yudi Su
Hao Zhang
Ziheng Cheng
Zequn Zeng
Zhengjue Wang
Bo Chen
Xin Yuan
144
1
0
10 Jan 2024
Revisiting Adversarial Training at Scale
Zeyu Wang
Xianhang Li
Hongru Zhu
Cihang Xie
133
19
0
09 Jan 2024
Generic Knowledge Boosted Pre-training For Remote Sensing Images
Ziyue Huang
Mingming Zhang
Yuan Gong
Qingjie Liu
Yunhong Wang
VLM
81
15
0
09 Jan 2024
Let's Go Shopping (LGS) -- Web-Scale Image-Text Dataset for Visual Concept Understanding
Yatong Bai
Utsav Garg
Apaar Shanker
Haoming Zhang
Samyak Parajuli
...
Eugenia D Fomitcheva
E. Branson
Aerin Kim
Somayeh Sojoudi
Kyunghyun Cho
60
2
0
09 Jan 2024
Skin Cancer Segmentation and Classification Using Vision Transformer for Automatic Analysis in Dermatoscopy-based Non-invasive Digital System
Galib Muhammad Shahriar Himel
Md. Masudul Islam
Kh Abdullah Al-Aff
Shams Ibne Karim
Md. Kabir Uddin Sikder
MedIm
82
25
0
09 Jan 2024
PhilEO Bench: Evaluating Geo-Spatial Foundation Models
Casper Fibaek
Luke Camilleri
Andreas Luyts
Nikolaos Dionelis
B. L. Saux
105
17
0
09 Jan 2024