Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
Localization vs. Semantics: Visual Representations in Unimodal and Multimodal Models
Zhuowan Li
Cihang Xie
Benjamin Van Durme
Alan Yuille
VLM, SSL
56
2
0
01 Dec 2022
GRiT: A Generative Region-to-text Transformer for Object Understanding
Jialian Wu
Jianfeng Wang
Zhengyuan Yang
Zhe Gan
Zicheng Liu
Junsong Yuan
Lijuan Wang
ObjD, VLM
81
119
0
01 Dec 2022
Spatio-Temporal Crop Aggregation for Video Representation Learning
Sepehr Sameni
Simon Jenni
Paolo Favaro
103
3
0
30 Nov 2022
Exploiting Category Names for Few-Shot Classification with Vision-Language Models
Taihong Xiao
Zirui Wang
Liangliang Cao
Jiahui Yu
Shengyang Dai
Ming-Hsuan Yang
VLM, MLLM
91
5
0
29 Nov 2022
Procedural Image Programs for Representation Learning
Manel Baradad
Chun-Fu
Jonas Wulff
Tongzhou Wang
Rogerio Feris
Antonio Torralba
Phillip Isola
103
23
0
29 Nov 2022
BARTSmiles: Generative Masked Language Models for Molecular Representations
Gayane Chilingaryan
Hovhannes Tamoyan
Ani Tevosyan
N. Babayan
L. Khondkaryan
Karen Hambardzumyan
Zaven Navoyan
Hrant Khachatrian
Armen Aghajanyan
SSL
101
28
0
29 Nov 2022
On the Power of Foundation Models
Yang Yuan
106
38
0
29 Nov 2022
Dimensionality-Varying Diffusion Process
Han Zhang
Ruili Feng
Zhantao Yang
Lianghua Huang
Yu Liu
Yifei Zhang
Yujun Shen
Deli Zhao
Jingren Zhou
Fan Cheng
DiffM
49
10
0
29 Nov 2022
Handling Image and Label Resolution Mismatch in Remote Sensing
Scott Workman
Armin Hadzic
M. U. Rafique
69
5
0
28 Nov 2022
H3WB: Human3.6M 3D WholeBody Dataset and Benchmark
Yue Zhu
Nermin Samet
David Picard
3DH
81
20
0
28 Nov 2022
Perceive, Ground, Reason, and Act: A Benchmark for General-purpose Visual Representation
Jiangyong Huang
William Zhu
Baoxiong Jia
Zan Wang
Xiaojian Ma
Qing Li
Siyuan Huang
123
5
0
28 Nov 2022
SgVA-CLIP: Semantic-guided Visual Adapting of Vision-Language Models for Few-shot Image Classification
Fang Peng
Xiaoshan Yang
Linhui Xiao
Yaowei Wang
Changsheng Xu
VLM
93
50
0
28 Nov 2022
Offline Q-Learning on Diverse Multi-Task Data Both Scales And Generalizes
Aviral Kumar
Rishabh Agarwal
Xinyang Geng
George Tucker
Sergey Levine
OffRL
132
51
0
28 Nov 2022
Class Adaptive Network Calibration
Bingyuan Liu
Jérôme Rony
Adrian Galdran
Jose Dolz
Ismail Ben Ayed
94
10
0
28 Nov 2022
Learning Dense Object Descriptors from Multiple Views for Low-shot Category Generalization
Stefan Stojanov
Anh Thai
Zixuan Huang
James M. Rehg
101
2
0
28 Nov 2022
Deep Active Learning for Computer Vision: Past and Future
Rinyoichi Takezoe
Xu Liu
Shunan Mao
Marco Tianyu Chen
Zhanpeng Feng
Shiliang Zhang
Xiaoyu Wang
VLM
94
22
0
27 Nov 2022
SegCLIP: Patch Aggregation with Learnable Centers for Open-Vocabulary Semantic Segmentation
Huaishao Luo
Junwei Bao
Youzheng Wu
Xiaodong He
Tianrui Li
VLM
122
154
0
27 Nov 2022
Traditional Classification Neural Networks are Good Generators: They are Competitive with DDPMs and GANs
Guangrun Wang
Philip Torr
83
9
0
27 Nov 2022
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers
Yuqi Nie
Nam H. Nguyen
Phanwadee Sinthong
Jayant Kalagnanam
AIFin, AI4TS
153
1,443
0
27 Nov 2022
A Knowledge-based Learning Framework for Self-supervised Pre-training Towards Enhanced Recognition of Biomedical Microscopy Images
Wei Chen
Chen Li
Dan Chen
Xin Luo
MedIm, SSL
77
12
0
27 Nov 2022
Rethinking Alignment and Uniformity in Unsupervised Image Semantic Segmentation
Daoan Zhang
Chenming Li
Haoquan Li
Wen-Fong Huang
Lingyun Huang
Jianguo Zhang
89
20
0
26 Nov 2022
MAEDAY: MAE for few and zero shot AnomalY-Detection
Eli Schwartz
Assaf Arbelle
Leonid Karlinsky
Sivan Harary
Florian Scheidegger
Sivan Doveh
Raja Giryes
ViT, UQCV
89
36
0
25 Nov 2022
Copy-Pasting Coherent Depth Regions Improves Contrastive Learning for Urban-Scene Segmentation
Liang Zeng
A. Lengyel
Nergis Tomen
Jan van Gemert
AI4TS
69
0
0
25 Nov 2022
BatmanNet: Bi-branch Masked Graph Transformer Autoencoder for Molecular Representation
Zhen Wang
Zheng Feng
Yanjun Li
Bowen Li
Yongrui Wang
C. Sha
Min He
Xiaolin Li
AI4CE
90
9
0
25 Nov 2022
Expanding Small-Scale Datasets with Guided Imagination
Yifan Zhang
Daquan Zhou
Bryan Hooi
Kaixin Wang
Jiashi Feng
171
48
0
25 Nov 2022
ILSGAN: Independent Layer Synthesis for Unsupervised Foreground-Background Segmentation
Qiran Zou
Yu Yang
Wing Yin Cheung
Chang-rui Liu
Xiang Ji
GAN
143
4
0
25 Nov 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
112
23
0
25 Nov 2022
Towards Good Practices for Missing Modality Robust Action Recognition
Sangmin Woo
Sumin Lee
Yeonju Park
Muhammad Adi Nugroho
Changick Kim
98
52
0
25 Nov 2022
Self-supervised vision-language pretraining for Medical visual question answering
Pengfei Li
Gang Liu
Lin Tan
Jinying Liao
Shenjun Zhong
MedIm
66
36
0
24 Nov 2022
Video Test-Time Adaptation for Action Recognition
Wei Lin
M. Jehanzeb Mirza
Mateusz Koziński
Horst Possegger
Hilde Kuehne
Horst Bischof
TTA
105
32
0
24 Nov 2022
Pose-disentangled Contrastive Learning for Self-supervised Facial Representation
Y. Liu
Wenbin Wang
Yibing Zhan
Shaoze Feng
Li-Yu Daisy Liu
Zhe Chen
SSL
69
13
0
24 Nov 2022
Self-Supervised Learning based on Heat Equation
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Lu Yuan
Zicheng Liu
Youzuo Lin
77
4
0
23 Nov 2022
ASiT: Local-Global Audio Spectrogram vIsion Transformer for Event Classification
Sara Atito
Muhammad Awais
Wenwu Wang
Mark D. Plumbley
J. Kittler
ViT
71
11
0
23 Nov 2022
ActMAD: Activation Matching to Align Distributions for Test-Time-Training
M. Jehanzeb Mirza
Pol Jané Soneira
W. Lin
Mateusz Koziński
Horst Possegger
Horst Bischof
VLM, TTA
103
29
0
23 Nov 2022
Unsupervised 3D Keypoint Discovery with Multi-View Geometry
S. Honari
Chen Zhao
Mathieu Salzmann
Pascal Fua
3DH
70
1
0
23 Nov 2022
Tell Me What Happened: Unifying Text-guided Video Completion via Multimodal Masked Video Generation
Tsu-Jui Fu
Licheng Yu
Ning Zhang
Cheng-Yang Fu
Jong-Chyi Su
William Yang Wang
Sean Bell
VGen
151
38
0
23 Nov 2022
Reason from Context with Self-supervised Learning
Xinyu Liu
Ankur Sikarwar
Gabriel Kreiman
Zenglin Shi
Mengmi Zhang
ReLM, LRM
94
1
0
23 Nov 2022
A Dual-scale Lead-seperated Transformer With Lead-orthogonal Attention And Meta-information For Ecg Classification
Yongbin Li
Guijin Wang
Zhourui Xia
Wenming Yang
Li Sun
MedIm
54
1
0
23 Nov 2022
Masked Autoencoding for Scalable and Generalizable Decision Making
Fangchen Liu
Hao Liu
Aditya Grover
Pieter Abbeel
OffRL
87
49
0
23 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
98
7
0
23 Nov 2022
DETRs with Collaborative Hybrid Assignments Training
Zhuofan Zong
Guanglu Song
Yu Liu
ViT
142
331
0
22 Nov 2022
A Cross-Residual Learning for Image Recognition
Junle Liang
Songsen Yu
Huan Yang
43
0
0
22 Nov 2022
Generalizable Industrial Visual Anomaly Detection with Self-Induction Vision Transformer
Haiming Yao
Wenyong Yu
ViT
96
5
0
22 Nov 2022
SimVP: Towards Simple yet Powerful Spatiotemporal Predictive Learning
Cheng Tan
Zhangyang Gao
Siyuan Li
Stan Z. Li
VLM, AI4TS
102
3
0
22 Nov 2022
LoopDA: Constructing Self-loops to Adapt Nighttime Semantic Segmentation
Fengyi Shen
Zador Pataki
A. Gurram
Ziyuan Liu
He Wang
Alois Knoll
77
6
0
21 Nov 2022
Multitask Vision-Language Prompt Tuning
Sheng Shen
Shijia Yang
Tianjun Zhang
Bohan Zhai
Joseph E. Gonzalez
Kurt Keutzer
Trevor Darrell
VLM, VPVLM
115
53
0
21 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
109
15
0
21 Nov 2022
N-Gram in Swin Transformers for Efficient Lightweight Image Super-Resolution
Haram Choi
Jeong-Sik Lee
Jihoon Yang
ViT
88
83
0
21 Nov 2022
MATE: Masked Autoencoders are Online 3D Test-Time Learners
M. Jehanzeb Mirza
Inkyu Shin
Wei Lin
Andreas Schriebl
Kunyang Sun
...
Horst Possegger
Mateusz Koziński
In So Kweon
Kun-Jin Yoon
Horst Bischof
TTA, 3DPC
103
16
0
21 Nov 2022
VATLM: Visual-Audio-Text Pre-Training with Unified Masked Prediction for Speech Representation Learning
Qiu-shi Zhu
Long Zhou
Zi-Hua Zhang
Shujie Liu
Binxing Jiao
Jie Zhang
Lirong Dai
Daxin Jiang
Jinyu Li
Furu Wei
113
38
0
21 Nov 2022