Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
Advancing Volumetric Medical Image Segmentation via Global-Local Masked Autoencoder
Jiafan Zhuang
Luyang Luo
Hao Chen
97
11
0
15 Jun 2023
ViP: A Differentially Private Foundation Model for Computer Vision
Yaodong Yu
Maziar Sanjabi
Yi Ma
Kamalika Chaudhuri
Chuan Guo
65
13
0
15 Jun 2023
Description-Enhanced Label Embedding Contrastive Learning for Text Classification
Kun Zhang
Le Wu
Guangyi Lv
Enhong Chen
Shulan Ruan
Jing Liu
Qing Cui
Jun Zhou
Meng Wang
VLM
52
10
0
15 Jun 2023
Explore In-Context Learning for 3D Point Cloud Understanding
Zhongbin Fang
Xiangtai Li
Xia Li
J. M. Buhmann
Chen Change Loy
Mengyuan Liu
3DPC
87
27
0
14 Jun 2023
Towards AGI in Computer Vision: Lessons Learned from GPT and Large Language Models
Lingxi Xie
Longhui Wei
Xiaopeng Zhang
Kaifeng Bi
Xiaotao Gu
Jianlong Chang
Qi Tian
88
7
0
14 Jun 2023
TomoSAM: a 3D Slicer extension using SAM for tomography segmentation
Federico Semeraro
Alexandre Quintart
Sergio Izquierdo
J. Ferguson
49
6
0
14 Jun 2023
Chart2Vec: A Universal Embedding of Context-Aware Visualizations
Qing Chen
Ying Chen
Ruishi Zou
Wei Shuai
Yi Guo
Jiazhe Wang
Nana Cao
83
3
0
14 Jun 2023
Deblurring Masked Autoencoder is Better Recipe for Ultrasound Image Recognition
Qingbo Kang
Jun Gao
Kang Li
Qicheng Lao
106
10
0
14 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
92
8
0
13 Jun 2023
Automated 3D Pre-Training for Molecular Property Prediction
Xu Wang
Huan Zhao
Weiwei Tu
Quanming Yao
AI4CE
93
37
0
13 Jun 2023
Rethinking Polyp Segmentation from an Out-of-Distribution Perspective
Ge-Peng Ji
Jing Zhang
Dylan Campbell
Huan Xiong
Nick Barnes
84
7
0
13 Jun 2023
Dynamically Masked Discriminator for Generative Adversarial Networks
Wentian Zhang
Haozhe Liu
Bing Li
Jinheng Xie
Yawen Huang
Yuexiang Li
Yefeng Zheng
Guohao Li
TTA
93
2
0
13 Jun 2023
Semi-supervised learning made simple with self-supervised clustering
Enrico Fini
Pietro Astolfi
Alahari Karteek
Xavier Alameda-Pineda
Julien Mairal
Moin Nabi
Elisa Ricci
SSL
81
29
0
13 Jun 2023
Learning to Mask and Permute Visual Tokens for Vision Transformer Pre-Training
Lorenzo Baraldi
Roberto Amoroso
Marcella Cornia
Lorenzo Baraldi
Andrea Pilzer
Rita Cucchiara
153
2
0
12 Jun 2023
Controlling Text-to-Image Diffusion by Orthogonal Finetuning
Zeju Qiu
Wei-yu Liu
Haiwen Feng
Yuxuan Xue
Yao Feng
Zhen Liu
Dan Zhang
Adrian Weller
Bernhard Schölkopf
DiffM
137
158
0
12 Jun 2023
A Survey of Vision-Language Pre-training from the Lens of Multimodal Machine Translation
Jeremy Gwinnup
Kevin Duh
VLM
63
3
0
12 Jun 2023
MaskedFusion360: Reconstruct LiDAR Data by Querying Camera Features
Royden Wagner
Marvin Klemp
Carlos Fernandez Lopez
3DPC
93
1
0
12 Jun 2023
Unmasking Deepfakes: Masked Autoencoding Spatiotemporal Transformers for Enhanced Video Forgery Detection
Sayantan Das
Mojtaba Kolahdouzi
Levent Özparlak
Will Hickie
Ali Etemad
ViT, CVBM
62
4
0
12 Jun 2023
Generating Synthetic Datasets by Interpolating along Generalized Geodesics
JiaoJiao Fan
David Alvarez-Melis
104
10
0
12 Jun 2023
Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark
Li Xu
Bo Liu
Ameer Hamza Khan
Lu Fan
Xiao-Ming Wu
LM&MA
67
9
0
10 Jun 2023
SegViTv2: Exploring Efficient and Continual Semantic Segmentation with Plain Vision Transformers
Bowen Zhang
Liyang Liu
Minh Hieu Phan
Zhi Tian
Chunhua Shen
Yifan Liu
ViT
114
30
0
09 Jun 2023
FLSL: Feature-level Self-supervised Learning
Qing Su
Anton Netchaev
Hai Helen Li
Shihao Ji
119
5
0
09 Jun 2023
FasterViT: Fast Vision Transformers with Hierarchical Attention
Ali Hatamizadeh
Greg Heinrich
Hongxu Yin
Andrew Tao
J. Álvarez
Jan Kautz
Pavlo Molchanov
ViT
122
72
0
09 Jun 2023
Leveraging Large Language Models for Scalable Vector Graphics-Driven Image Understanding
Mu Cai
Zeyi Huang
Yuheng Li
Utkarsh Ojha
Haohan Wang
Yong Jae Lee
VLM
37
2
0
09 Jun 2023
Learning Domain-Aware Detection Head with Prompt Tuning
Haochen Li
Rui Zhang
Hantao Yao
Xinkai Song
Yifan Hao
Yongwei Zhao
Ling Li
Yunji Chen
VLM
106
18
0
09 Jun 2023
On the Challenges and Perspectives of Foundation Models for Medical Image Analysis
Shaoting Zhang
Dimitris N. Metaxas
LM&MA, VLM, MedIm, AI4CE
106
156
0
09 Jun 2023
Exploring Effective Mask Sampling Modeling for Neural Image Compression
Lin Liu
Mingming Zhao
Shanxin Yuan
Wenlong Lyu
Wen-gang Zhou
Houqiang Li
Yanfeng Wang
Qi Tian
74
3
0
09 Jun 2023
ADDP: Learning General Representations for Image Recognition and Generation with Alternating Denoising Diffusion Process
Changyao Tian
Chenxin Tao
Jifeng Dai
Hao Li
Ziheng Li
Lewei Lu
Xiaogang Wang
Hongsheng Li
Gao Huang
Xizhou Zhu
DiffM
106
10
0
08 Jun 2023
R-MAE: Regions Meet Masked Autoencoders
Duy-Kien Nguyen
Vaibhav Aggarwal
Yanghao Li
Martin R. Oswald
Alexander Kirillov
Cees G. M. Snoek
Xinlei Chen
TPM
126
11
0
08 Jun 2023
SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding
Paul-Edouard Sarlin
Eduard Trulls
Marc Pollefeys
J. Hosang
Simon Lynen
3DPC, SSL
98
26
0
08 Jun 2023
Connectional-Style-Guided Contextual Representation Learning for Brain Disease Diagnosis
Gongshu Wang
Ning Jiang
Yunxiao Ma
Tiantian Liu
Duanduan Chen
Jinglong Wu
Guoqi Li
Dong Liang
Tianyi Yan
MedIm
74
2
0
08 Jun 2023
Image Clustering via the Principle of Rate Reduction in the Age of Pretrained Models
Tianzhe Chu
Shengbang Tong
Tianjiao Ding
Xili Dai
B. Haeffele
René Vidal
Yi Ma
SSL, VLM
104
14
0
08 Jun 2023
Factorized Contrastive Learning: Going Beyond Multi-view Redundancy
Paul Pu Liang
Zihao Deng
Martin Q. Ma
James Zou
Louis-Philippe Morency
Ruslan Salakhutdinov
SSL
98
56
0
08 Jun 2023
SparseTrack: Multi-Object Tracking by Performing Scene Decomposition based on Pseudo-Depth
Zelin Liu
Xinggang Wang
Cheng Wang
Wenyu Liu
X. Bai
VOS, VOT
149
43
0
08 Jun 2023
FlowFormer: A Transformer Architecture and Its Masked Cost Volume Autoencoding for Optical Flow
Zhaoyang Huang
Xiaoyu Shi
Chao Zhang
Qiang Wang
Yijin Li
Hongwei Qin
Jifeng Dai
Xiaogang Wang
Hongsheng Li
141
4
0
08 Jun 2023
Improving Visual Prompt Tuning for Self-supervised Vision Transformers
S. Yoo
Eunji Kim
Dahuin Jung
Jungbeom Lee
Sung-Hoon Yoon
VLM
128
44
0
08 Jun 2023
Spain on Fire: A novel wildfire risk assessment model based on image satellite processing and atmospheric information
Helena Liz-López
Javier Huertas-Tato
Jorge Pérez-Aracil
C. Casanova-Mateo
J. Sanz-Justo
David Camacho
67
11
0
08 Jun 2023
Joint Channel Estimation and Feedback with Masked Token Transformers in Massive MIMO Systems
Mingming Zhao
Lin Liu
Lifu Liu
Mengke Li
Qing-min Tian
63
0
0
08 Jun 2023
Differentially Private Image Classification by Learning Priors from Random Processes
Xinyu Tang
Ashwinee Panda
Vikash Sehwag
Prateek Mittal
91
21
0
08 Jun 2023
Understanding Masked Autoencoders via Hierarchical Latent Variable Models
Lingjing Kong
Martin Q. Ma
Guan-Hong Chen
Eric Xing
Yuejie Chi
Louis-Philippe Morency
Kun Zhang
87
32
0
08 Jun 2023
Object-Centric Learning for Real-World Videos by Predicting Temporal Feature Similarities
Andrii Zadaianchuk
Maximilian Seitzer
Georg Martius
OCL
133
43
0
07 Jun 2023
UniBoost: Unsupervised Unimodal Pre-training for Boosting Zero-shot Vision-Language Tasks
Yanan Sun
Zi-Qi Zhong
Qi Fan
Chi-Keung Tang
Yu-Wing Tai
VLM
78
4
0
07 Jun 2023
Exposing flaws of generative model evaluation metrics and their unfair treatment of diffusion models
G. Stein
Jesse C. Cresswell
Rasa Hosseinzadeh
Yi Sui
Brendan Leigh Ross
Valentin Villecroze
Zhaoyan Liu
Anthony L. Caterini
J. E. T. Taylor
Gabriel Loaiza-Ganem
EGVM
155
108
0
07 Jun 2023
Coarse Is Better? A New Pipeline Towards Self-Supervised Learning with Uncurated Images
Ke Zhu
Yin He
Jianxin Wu
89
4
0
07 Jun 2023
Randomized 3D Scene Generation for Generalizable Self-Supervised Pre-Training
Lanxiao Li
M. Heizmann
68
0
0
07 Jun 2023
Efficient Vision Transformer for Human Pose Estimation via Patch Selection
K. A. Kinfu
René Vidal
ViT
86
4
0
07 Jun 2023
Extracting Cloud-based Model with Prior Knowledge
Songtao Zhao
Kangjie Chen
Meng Hao
Jian Zhang
Guowen Xu
Hongwei Li
Tianwei Zhang
AAML, MIACV, SILM, MLAU, SLR
117
5
0
07 Jun 2023
Self-supervised Audio Teacher-Student Transformer for Both Clip-level and Frame-level Tasks
Xian Li
Nian Shao
Xiaofei Li
ViT, CLIP
103
28
0
07 Jun 2023
Accurate Fine-Grained Segmentation of Human Anatomy in Radiographs via Volumetric Pseudo-Labeling
C. Seibold
A. Jaus
M. Fink
Moon S. Kim
Simon Reiß
Ken Herrmann
Jens Kleesiek
Rainer Stiefelhagen
65
9
0
06 Jun 2023
Human-imperceptible, Machine-recognizable Images
Fusheng Hao
Fengxiang He
Yikai Wang
Fuxiang Wu
Jing Zhang
Jun Cheng
Dacheng Tao
AAML
80
2
0
06 Jun 2023