Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
Exploring Visual Pre-training for Robot Manipulation: Datasets, Models and Methods
Ya Jing
Xuelin Zhu
Xingbin Liu
Qie Sima
Taozheng Yang
Yunhai Feng
Tao Kong
LM&Ro
76
16
0
07 Aug 2023
Revealing the Underlying Patterns: Investigating Dataset Similarity, Performance, and Generalization
Akshit Achara
R. Pandey
SSL
115
0
0
07 Aug 2023
Heterogeneous Forgetting Compensation for Class-Incremental Learning
Jiahua Dong
Wenqi Liang
Yang Cong
Gan Sun
CLL
110
21
0
07 Aug 2023
Feature-Suppressed Contrast for Self-Supervised Food Pre-training
Xinda Liu
Yaohui Zhu
Linhu Liu
Jiang Tian
Lili Wang
SSL
68
2
0
07 Aug 2023
AI-GOMS: Large AI-Driven Global Ocean Modeling System
Wei Xiong
Yanfei Xiang
Hao Wu
Shuyi Zhou
Yuze Sun
Muyuan Ma
Xiaomeng Huang
AI4Cl, AI4CE
77
22
0
06 Aug 2023
Bootstrapping Contrastive Learning Enhanced Music Cold-Start Matching
Xinping Zhao
Y. Zhang
Qiang Xiao
Yuming Ren
Yingchun Yang
37
6
0
05 Aug 2023
A Symbolic Character-Aware Model for Solving Geometry Problems
Maizhen Ning
Qiufeng Wang
Kaizhu Huang
Xiaowei Huang
77
18
0
05 Aug 2023
A Parameter-efficient Multi-subject Model for Predicting fMRI Activity
Connor Lane
Gregory Kiar
64
2
0
04 Aug 2023
DETR Doesn't Need Multi-Scale or Locality Design
Yutong Lin
Yuhui Yuan
Zheng Zhang
Chen Li
Nanning Zheng
Han Hu
95
5
0
03 Aug 2023
MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction
Jianghao Lin
Yanru Qu
Wei Guo
Xinyi Dai
Ruiming Tang
Yong Yu
Weinan Zhang
77
21
0
03 Aug 2023
A Multidimensional Analysis of Social Biases in Vision Transformers
Jannik Brinkmann
Paul Swoboda
Christian Bartelt
60
8
0
03 Aug 2023
InterAct: Exploring the Potentials of ChatGPT as a Cooperative Agent
Po-Lin Chen
Cheng-Shang Chang
LM&Ro, LLMAG
77
14
0
03 Aug 2023
Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model
Ka Leong Cheng
Wenpo Song
Zheng Ma
Wenhao Zhu
Zi-Yue Zhu
Jianbing Zhang
CLIP, VLM
65
11
0
02 Aug 2023
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Yasheng Sun
Yifan Yang
Houwen Peng
Yifei Shen
Yuqing Yang
Hang-Rui Hu
Lili Qiu
Hideki Koike
DiffM, LM&Ro
87
39
0
02 Aug 2023
AnyLoc: Towards Universal Visual Place Recognition
Nikhil Varma Keetha
Avneesh Mishra
Jay Karhade
Krishna Murthy Jatavallabhula
Sebastian Scherer
Madhava Krishna
Sourav Garg
118
133
0
01 Aug 2023
Patch-wise Auto-Encoder for Visual Anomaly Detection
Yajie Cui
Zhaoxiang Liu
Kai Wang
UQCV, ViT
40
0
0
01 Aug 2023
Gated Driver Attention Predictor
Tianci Zhao
Xue Bai
Jianwu Fang
Jianru Xue
79
2
0
01 Aug 2023
Improving Pixel-based MIM by Reducing Wasted Modeling Capability
Yuan Liu
Songyang Zhang
Jiacheng Chen
Zhaohui Yu
Kai-xiang Chen
Dahua Lin
104
32
0
01 Aug 2023
EEG-based Cognitive Load Classification using Feature Masked Autoencoding and Emotion Transfer Learning
Dustin Pulver
Prithila Angkan
Paul Hungler
Ali Etemad
86
5
0
01 Aug 2023
Disruptive Autoencoders: Leveraging Low-level features for 3D Medical Image Pre-training
Jeya Maria Jose Valanarasu
Yucheng Tang
Dong Yang
Ziyue Xu
Can Zhao
...
Vishal M. Patel
Bennett Landman
Daguang Xu
Yufan He
V. Nath
MedIm
76
13
0
31 Jul 2023
Stochastic positional embeddings improve masked image modeling
Amir Bar
Florian Bordes
Assaf Shocher
Mahmoud Assran
Pascal Vincent
Nicolas Ballas
Trevor Darrell
Amir Globerson
Yann LeCun
77
3
0
31 Jul 2023
Sampling to Distill: Knowledge Transfer from Open-World Data
Yuzheng Wang
Zhaoyu Chen
Jie M. Zhang
Dingkang Yang
Zuhao Ge
Yang Liu
Siao Liu
Yunquan Sun
Wenqiang Zhang
Lizhe Qi
85
9
0
31 Jul 2023
SAMFlow: Eliminating Any Fragmentation in Optical Flow with Segment Anything Model
Shili Zhou
Ruian He
Weimin Tan
Bo Yan
VLM
68
13
0
31 Jul 2023
Bridging the Gap: Exploring the Capabilities of Bridge-Architectures for Complex Visual Reasoning Tasks
Kousik Rajesh
Mrigank Raman
M. A. Karim
Pranit Chawla
VLM
58
2
0
31 Jul 2023
Open-Set Domain Adaptation with Visual-Language Foundation Models
Qing Yu
Go Irie
Kiyoharu Aizawa
VLM
111
7
0
30 Jul 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe, MLLM
126
46
0
30 Jul 2023
HandMIM: Pose-Aware Self-Supervised Learning for 3D Hand Mesh Estimation
Zuyan Liu
Gaojie Lin
Congyi Wang
Min Zheng
Feida Zhu
3DH
70
0
0
29 Jul 2023
VPP: Efficient Conditional 3D Generation via Voxel-Point Progressive Representation
Zekun Qi
Muzhou Yu
Runpei Dong
Kaisheng Ma
3DPC
78
15
0
28 Jul 2023
The RoboDepth Challenge: Methods and Advancements Towards Robust Depth Estimation
Lingdong Kong
Yaru Niu
Shaoyuan Xie
Hanjiang Hu
Lai Xing Ng
...
Zhenyu Li
Runze Chen
Haiyong Luo
Fang Zhao
Jing Yu
101
13
0
27 Jul 2023
Take-A-Photo: 3D-to-2D Generative Pre-training of Point Cloud Models
Ziyi Wang
Xumin Yu
Yongming Rao
Jie Zhou
Jiwen Lu
DiffM, 3DPC
79
19
0
27 Jul 2023
IML-ViT: Benchmarking Image Manipulation Localization by Vision Transformer
Xiaochen Ma
Bo Du
Zhuohang Jiang
Ahmed Y. Al Hammadi
Jizhe Zhou
73
9
0
27 Jul 2023
P2C: Self-Supervised Point Cloud Completion from Single Partial Clouds
Rui-Qing Cui
Shi Qiu
Saeed Anwar
Jiawei Liu
Chaoyue Xing
Jing Zhang
Nick Barnes
3DPC
78
21
0
27 Jul 2023
vox2vec: A Framework for Self-supervised Contrastive Learning of Voxel-level Representations in Medical Images
M. Goncharov
Vera Soboleva
Anvar Kurmukov
Maxim Pisov
Mikhail Belyaev
SSL
69
10
0
27 Jul 2023
Pre-training Vision Transformers with Very Limited Synthesized Images
Ryo Nakamura
Hirokatsu Kataoka
Sora Takashima
Edgar Josafat Martinez-Noriega
Rio Yokota
Nakamasa Inoue
121
7
0
27 Jul 2023
Take Your Pick: Enabling Effective Personalized Federated Learning within Low-dimensional Feature Space
Guogang Zhu
Xuefeng Liu
Shaojie Tang
Jianwei Niu
Xinghao Wu
Jiaxing Shen
114
2
0
26 Jul 2023
Analysis of Video Quality Datasets via Design of Minimalistic Video Quality Models
Wei Sun
Wen Wen
Xiongkuo Min
Long Lan
Guangtao Zhai
Kede Ma
91
26
0
26 Jul 2023
How to Scale Your EMA
Dan Busbridge
Jason Ramapuram
Pierre Ablin
Tatiana Likhomanenko
Eeshan Gunesh Dhekane
Xavier Suau
Russ Webb
82
19
0
25 Jul 2023
When Multi-Task Learning Meets Partial Supervision: A Computer Vision Review
Maxime Fontana
Michael W. Spratling
Miaojing Shi
87
7
0
25 Jul 2023
E^2VPT: An Effective and Efficient Approach for Visual Prompt Tuning
Cheng Han
Qifan Wang
Yiming Cui
Zhiwen Cao
Wenguan Wang
Siyuan Qi
Dongfang Liu
VPVLM, VLM
90
55
0
25 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
148
128
0
25 Jul 2023
Optical Flow boosts Unsupervised Localization and Segmentation
Xinyu Zhang
Abdeslam Boularias
107
5
0
25 Jul 2023
Unlocking the Emotional World of Visual Media: An Overview of the Science, Research, and Impact of Understanding Emotion
James Z. Wang
Sicheng Zhao
Chenyan Wu
Reginald B. Adams
M. Newman
T. Shafir
Rachelle Tsachor
138
33
0
25 Jul 2023
Learning Autonomous Ultrasound via Latent Task Representation and Robotic Skills Adaptation
Xutian Deng
Junnan Jiang
Wen-Huang Cheng
Miao Li
55
3
0
25 Jul 2023
Multi-Granularity Prediction with Learnable Fusion for Scene Text Recognition
Cheng Da
Peng Wang
Cong Yao
52
9
0
25 Jul 2023
Enhancing image captioning with depth information using a Transformer-based framework
Aya Mahmoud Ahmed
Mohamed Yousef
K. Hussain
Yousef B. Mahdy
ViT
71
4
0
24 Jul 2023
CLIP-KD: An Empirical Study of CLIP Model Distillation
Chuanguang Yang
Zhulin An
Libo Huang
Junyu Bi
Xinqiang Yu
Hansheng Yang
Boyu Diao
Yongjun Xu
VLM
106
35
0
24 Jul 2023
AMAE: Adaptation of Pre-Trained Masked Autoencoder for Dual-Distribution Anomaly Detection in Chest X-Rays
Behzad Bozorgtabar
Dwarikanath Mahapatra
Jean-Philippe Thiran
MedIm
94
10
0
24 Jul 2023
MC-JEPA: A Joint-Embedding Predictive Architecture for Self-Supervised Learning of Motion and Content Features
Adrien Bardes
Jean Ponce
Yann LeCun
MDE
107
27
0
24 Jul 2023
Global k-Space Interpolation for Dynamic MRI Reconstruction using Masked Image Modeling
Jia Pan
Suprosanna Shit
Özgün Turgut
Wenqi Huang
Hongwei Bran Li
Nil Stolt Ansó
Thomas Kustner
Kerstin Hammernik
Daniel Rueckert
67
9
0
24 Jul 2023
SwinMM: Masked Multi-view with Swin Transformers for 3D Medical Image Segmentation
Yiqing Wang
Zihan Li
Jieru Mei
Zi-Ying Wei
Li Liu
Chen Wang
Shengtian Sang
Alan Yuille
Cihang Xie
Yuyin Zhou
ViT, MedIm
69
33
0
24 Jul 2023