Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
arXiv:2111.06377

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS, VLM
111
41
0
14 Dec 2023
Tokenize Anything via Prompting
Ting Pan
Lulu Tang
Xinlong Wang
Shiguang Shan
VLM
68
23
0
14 Dec 2023
On the Difficulty of Defending Contrastive Learning against Backdoor Attacks
Changjiang Li
Ren Pang
Bochuan Cao
Zhaohan Xi
Jinghui Chen
Shouling Ji
Ting Wang
AAML
64
6
0
14 Dec 2023
Exploring Transferability for Randomized Smoothing
Kai Qiu
Huishuai Zhang
Zhirong Wu
Stephen Lin
AAML
50
1
0
14 Dec 2023
Weighted Ensemble Models Are Strong Continual Learners
Imad Eddine Marouf
Subhankar Roy
Enzo Tartaglione
Stéphane Lathuilière
CLL
115
22
0
14 Dec 2023
VaLID: Variable-Length Input Diffusion for Novel View Synthesis
Shijie Li
F. G. Zanjani
H. Yahia
Yuki M. Asano
Juergen Gall
A. Habibian
DiffM
75
5
0
14 Dec 2023
Toward General-Purpose Robots via Foundation Models: A Survey and Meta-Analysis
Yafei Hu
Quanting Xie
Vidhi Jain
Jonathan M Francis
Jay Patrikar
...
Xiaolong Wang
Sebastian A. Scherer
Z. Kira
Fei Xia
Yonatan Bisk
LM&Ro, AI4CE
138
75
0
14 Dec 2023
VMT-Adapter: Parameter-Efficient Transfer Learning for Multi-Task Dense Scene Understanding
Yi Xin
Junlong Du
Qiang Wang
Zhiwen Lin
Ke Yan
VPVLM
167
54
0
14 Dec 2023
Semi-supervised Semantic Segmentation Meets Masked Modeling: Fine-grained Locality Learning Matters in Consistency Regularization
W. Pan
Zhe Xu
Jiangpeng Yan
Zihan Wu
Raymond Kai-Yu Tong
Xiu Li
Jianhua Yao
ISeg
60
2
0
14 Dec 2023
NViST: In the Wild New View Synthesis from a Single Image with Transformers
Wonbong Jang
Lourdes Agapito
ViT
89
10
0
13 Dec 2023
ERASE: Error-Resilient Representation Learning on Graphs for Label Noise Tolerance
Ling-Hao Chen
Yuanshuo Zhang
Taohua Huang
Liangcai Su
Zeyi Lin
Xi Xiao
Xiaobo Xia
Tongliang Liu
NoLa
111
9
0
13 Dec 2023
PAD: Self-Supervised Pre-Training with Patchwise-Scale Adapter for Infrared Images
Tao Zhang
Kun Ding
Jinyong Wen
Yu Xiong
Zeyu Zhang
Shiming Xiang
Chunhong Pan
55
3
0
13 Dec 2023
LMD: Faster Image Reconstruction with Latent Masking Diffusion
Zhiyuan Ma
Zhihuan Yu
Jianjun Li
Bowen Zhou
DiffM
66
9
0
13 Dec 2023
Erasing Self-Supervised Learning Backdoor by Cluster Activation Masking
Shengsheng Qian
Yifei Wang
Dizhan Xue
Shengjie Zhang
Huaiwen Zhang
Changsheng Xu
AAML
85
1
0
13 Dec 2023
Modality Plug-and-Play: Elastic Modality Adaptation in Multimodal LLMs for Embodied AI
Kai Huang
Boyuan Yang
Wei Gao
68
1
0
13 Dec 2023
DTL: Disentangled Transfer Learning for Visual Recognition
Minghao Fu
Ke Zhu
Jianxin Wu
106
19
0
13 Dec 2023
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro, AI4CE, LRM, VLM
108
161
0
13 Dec 2023
A Foundational Multimodal Vision Language AI Assistant for Human Pathology
Ming Y. Lu
Bowen Chen
Drew F. K. Williamson
Richard J. Chen
Kenji Ikamura
...
Ivy Liang
L. Le
Tong Ding
Anil V. Parwani
Faisal Mahmood
MedIm, LM&MA
86
23
0
13 Dec 2023
Artificial Intelligence for Digital and Computational Pathology
Andrew H. Song
Guillaume Jaume
Drew F. K. Williamson
Ming Y. Lu
Anurag J. Vaidya
Tiffany R. Miller
Faisal Mahmood
AI4CE
98
152
0
13 Dec 2023
Polynomial-based Self-Attention for Table Representation learning
Jayoung Kim
Yehjin Shin
Jeongwhan Choi
Hyowon Wi
Noseong Park
LMTD
93
2
0
12 Dec 2023
A Perspective of Q-value Estimation on Offline-to-Online Reinforcement Learning
Yinmin Zhang
Jie Liu
Chuming Li
Yazhe Niu
Yaodong Yang
Yu Liu
Wanli Ouyang
OffRL, OnRL
133
12
0
12 Dec 2023
Exploring Plain ViT Reconstruction for Multi-class Unsupervised Anomaly Detection
Jiangning Zhang
Xuhai Chen
Yabiao Wang
Chengjie Wang
Yong Liu
Xiangtai Li
Ming-Hsuan Yang
Dacheng Tao
130
28
0
12 Dec 2023
MinD-3D: Reconstruct High-quality 3D objects in Human Brain
Jianxiong Gao
Yu Fu
Yun Wang
Xuelin Qian
Jianfeng Feng
Yanwei Fu
DiffM
99
6
0
12 Dec 2023
Turbo: Informativity-Driven Acceleration Plug-In for Vision-Language Models
Chen Ju
Haicheng Wang
Zeqian Li
Xu Chen
Zhonghua Zhai
Weilin Huang
Shuai Xiao
VLM
125
8
0
12 Dec 2023
CLIP in Medical Imaging: A Comprehensive Survey
Zihao Zhao
Yuxiao Liu
Han Wu
Yonghao Li
Sheng Wang
L. Teng
Disheng Liu
Zhiming Cui
Qian Wang
Dinggang Shen
CLIP, MedIm, LM&MA, VLM
158
8
0
12 Dec 2023
Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation
Shentong Mo
Enze Xie
Yue Wu
Junsong Chen
Matthias Nießner
Zhenguo Li
69
6
0
12 Dec 2023
Shifted Autoencoders for Point Annotation Restoration in Object Counting
Yuda Zou
Xin Xiao
Peilin Zhou
Zhichao Sun
Bo Du
Yongchao Xu
63
1
0
12 Dec 2023
TransMed: Large Language Models Enhance Vision Transformer for Biomedical Image Classification
Kaipeng Zheng
Weiran Huang
Lichao Sun
LM&MA, MedIm, VLM
88
0
0
12 Dec 2023
Domain Prompt Learning with Quaternion Networks
Qinglong Cao
Zhengqin Xu
Yuntian Chen
Chao Ma
Xiaokang Yang
VLM
123
12
0
12 Dec 2023
Building Universal Foundation Models for Medical Image Analysis with Spatially Adaptive Networks
Lingxiao Luo
Xuanzhong Chen
Bingda Tang
Xinsheng Chen
Rong Han
Chengpeng Hu
Yujiang Li
Ting Chen
MedIm
72
2
0
12 Dec 2023
Template Free Reconstruction of Human-object Interaction with Procedural Interaction Generation
Xianghui Xie
Bharat Lal Bhatnagar
J. E. Lenssen
Gerard Pons-Moll
3DH
133
14
0
12 Dec 2023
CLASS-M: Adaptive stain separation-based contrastive learning with pseudo-labeling for histopathological image classification
Bodong Zhang
Hamid Manoochehri
M. M. Ho
Fahimeh Fooladgar
Yosep Chong
Beatrice Knudsen
Deepika Sirohi
Tolga Tasdizen
116
5
0
12 Dec 2023
Remote Sensing Vision-Language Foundation Models without Annotations via Ground Remote Alignment
Utkarsh Mall
Cheng Perng Phoo
Meilin Kelsey Liu
Carl Vondrick
B. Hariharan
Kavita Bala
VLM
72
42
0
12 Dec 2023
Multimodal Pretraining of Medical Time Series and Notes
Ryan N. King
Tianbao Yang
Bobak J. Mortazavi
67
14
0
11 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
155
201
0
11 Dec 2023
4M: Massively Multimodal Masked Modeling
David Mizrahi
Roman Bachmann
Oğuzhan Fatih Kar
Teresa Yeo
Mingfei Gao
Afshin Dehghan
Amir Zamir
MLLM
99
75
0
11 Dec 2023
Flexible visual prompts for in-context learning in computer vision
Thomas Foster
Ioana Croitoru
Robert Dorfman
Christoffer Edlund
Thomas Varsavsky
Jon Almazán
VLM, VOS
72
0
0
11 Dec 2023
Survey on Foundation Models for Prognostics and Health Management in Industrial Cyber-Physical Systems
Ruonan Liu
Quanhu Zhang
Te Han
AI4CE
79
4
0
11 Dec 2023
Medical Vision Language Pretraining: A survey
Prashant Shrestha
Sanskar Amgain
Bidur Khanal
Cristian A. Linte
Binod Bhattarai
VLM
100
17
0
11 Dec 2023
DisControlFace: Disentangled Control for Personalized Facial Image Editing
Haozhe Jia
Yan Li
Hengfei Cui
Di Xu
Changpeng Yang
Yuwang Wang
Tao Yu
DiffM
62
1
0
11 Dec 2023
Concrete Subspace Learning based Interference Elimination for Multi-task Model Fusion
Anke Tang
Li Shen
Yong Luo
Liang Ding
Han Hu
Bo Du
Dacheng Tao
MoMe
99
22
0
11 Dec 2023
AUGCAL: Improving Sim2Real Adaptation by Uncertainty Calibration on Augmented Synthetic Images
Prithvijit Chattopadhyay
Bharat Goyal
B. Ecsedi
Viraj Prabhu
Judy Hoffman
92
1
0
11 Dec 2023
TabMT: Generating tabular data with masked transformers
Manbir Gulati
Paul F. Roysdon
LMTD
99
38
0
11 Dec 2023
Counterfactual World Modeling for Physical Dynamics Understanding
Rahul Venkatesh
Honglin Chen
Kevin T. Feigelis
Daniel M. Bear
Khaled Jedoui
...
Wanhee Lee
Sherry Liu
Kevin A. Smith
Judith E. Fan
Daniel L. K. Yamins
VGen
94
2
0
11 Dec 2023
Audio-Visual LLM for Video Understanding
Fangxun Shu
Lei Zhang
Hao Jiang
Cihang Xie
VLM, MLLM
76
44
0
11 Dec 2023
Large Scale Foundation Models for Intelligent Manufacturing Applications: A Survey
Haotian Zhang
S. D. Semujju
Zhicheng Wang
Xianwei Lv
Kang Xu
...
Jing Wu
Zhuo Long
Wensheng Liang
Xiaoguang Ma
Ruiyan Zhuang
UQCV, AI4TS, AI4CE
94
4
0
11 Dec 2023
Deciphering 'What' and 'Where' Visual Pathways from Spectral Clustering of Layer-Distributed Neural Representations
Xiao Zhang
David Yunis
Michael Maire
56
4
0
11 Dec 2023
DG-TTA: Out-of-domain Medical Image Segmentation through Augmentation and Descriptor-driven Domain Generalization and Test-Time Adaptation
Christian Weihsbach
Christian N. Kruse
Alexander Bigalke
Mattias P. Heinrich
82
0
0
11 Dec 2023
Diffusion for Natural Image Matting
Yihan Hu
Yiheng Lin
Wei Wang
Yao-Min Zhao
Yunchao Wei
Humphrey Shi
103
9
0
10 Dec 2023
Transformer-based Selective Super-Resolution for Efficient Image Refinement
Tianyi Zhang
Kishore Kasichainula
Yaoxin Zhuo
Baoxin Li
Jae-sun Seo
Yu Cao
48
7
0
10 Dec 2023