arXiv: 2111.06377
Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
A Recipe for Unbounded Data Augmentation in Visual Reinforcement Learning
Abdulaziz Almuzairee
Nicklas Hansen
Henrik I. Christensen
79
7
0
27 May 2024
Unisolver: PDE-Conditional Transformers Are Universal PDE Solvers
Zhou Hang
Yuezhou Ma
Haixu Wu
Haowen Wang
Mingsheng Long
AI4CE
79
11
0
27 May 2024
FedHPL: Efficient Heterogeneous Federated Learning with Prompt Tuning and Logit Distillation
Yuting Ma
Lechao Cheng
Yaxiong Wang
Zhun Zhong
Xiaohua Xu
Meng Wang
FedML
82
4
0
27 May 2024
LCM: Locally Constrained Compact Point Cloud Model for Masked Point Modeling
Yaohua Zha
Naiqi Li
Yanzi Wang
Tao Dai
Hang Guo
Bin Chen
Zhi Wang
Zhihao Ouyang
Shu-Tao Xia
Mamba
120
10
0
27 May 2024
Position: Foundation Agents as the Paradigm Shift for Decision Making
Xiaoqian Liu
Xingzhou Lou
Jianbin Jiao
Junge Zhang
OffRL, LLMAG
105
7
0
27 May 2024
UIT-DarkCow team at ImageCLEFmedical Caption 2024: Diagnostic Captioning for Radiology Images Efficiency with Transformer Models
Quan Van Nguyen
Huy Quang Pham
Dan Quang Tran
Thang Kien-Bao Nguyen
Nhat-Hao Nguyen-Dang
Bao-Thien Nguyen-Tat
MedIm
67
2
0
27 May 2024
TokenUnify: Scalable Autoregressive Visual Pre-training with Mixture Token Prediction
Yinda Chen
Haoyuan Shi
Xiaoyu Liu
Te Shi
Ruobing Zhang
Dong Liu
Zhiwei Xiong
Feng Wu
98
10
0
27 May 2024
Smoke and Mirrors in Causal Downstream Tasks
Riccardo Cadei
Lukas Lindorfer
Sylvia Cremer
Cordelia Schmid
Francesco Locatello
CML
144
6
0
27 May 2024
Synergy and Diversity in CLIP: Enhancing Performance Through Adaptive Backbone Ensembling
Cristian Rodriguez-Opazo
Ehsan Abbasnejad
Damien Teney
Edison Marrese-Taylor
Hamed Damirchi
Anton Van Den Hengel
VLM
136
1
0
27 May 2024
Implicit Multimodal Alignment: On the Generalization of Frozen LLMs to Multimodal Inputs
Mustafa Shukor
Matthieu Cord
141
5
0
26 May 2024
Probabilistic Contrastive Learning with Explicit Concentration on the Hypersphere
H. Li
Ouyang Cheng
Tamaz Amiranashvili
Matthew S. Rosen
Bjoern Menze
J. Iglesias
95
0
0
26 May 2024
Segmentation of Maya hieroglyphs through fine-tuned foundation models
Fnu Shivam
Megan Leight
Mary Kate Kelly
Claire Davis
Kelsey Clodfelter
Jacob Thrasher
Yenumula Reddy
P. Gyawali
52
0
0
26 May 2024
Enhancing Feature Diversity Boosts Channel-Adaptive Vision Transformers
Chau Pham
Bryan A. Plummer
72
6
0
26 May 2024
LoGAH: Predicting 774-Million-Parameter Transformers using Graph HyperNetworks with 1/100 Parameters
Xinyu Zhou
Boris Knyazev
Alexia Jolicoeur-Martineau
Jie Fu
AI4CE
81
0
0
25 May 2024
ModelLock: Locking Your Model With a Spell
Yifeng Gao
Yuhua Sun
Xingjun Ma
Zuxuan Wu
Yu-Gang Jiang
VLM
88
1
0
25 May 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
106
13
0
25 May 2024
Diverse Teacher-Students for Deep Safe Semi-Supervised Learning under Class Mismatch
Qikai Wang
Rundong He
Yongshun Gong
Chunxiao Ren
Hao Sun
Xiaoshui Huang
Yilong Yin
55
0
0
25 May 2024
From Orthogonality to Dependency: Learning Disentangled Representation for Multi-Modal Time-Series Sensing Signals
Ruichu Cai
Zhifan Jiang
Zijian Li
Weilin Chen
Xuexin Chen
Zhifeng Hao
Yifan Shen
Guan-Hong Chen
Kun Zhang
129
1
0
25 May 2024
Certifying Adapters: Enabling and Enhancing the Certification of Classifier Adversarial Robustness
Jieren Deng
Hanbin Hong
A. Palmer
Xin Zhou
Jinbo Bi
Kaleel Mahmood
Yuan Hong
Derek Aguiar
AAML
62
0
0
25 May 2024
Recasting Generic Pretrained Vision Transformers As Object-Centric Scene Encoders For Manipulation Policies
Jianing Qian
Anastasios Panagopoulos
Dinesh Jayaraman
LM&Ro, ViT
76
5
0
24 May 2024
The Road Less Scheduled
Aaron Defazio
Xingyu Yang
Harsh Mehta
Konstantin Mishchenko
Ahmed Khaled
Ashok Cutkosky
120
60
0
24 May 2024
Polyp Segmentation Generalisability of Pretrained Backbones
Edward Sanderson
B. Matuszewski
69
0
0
24 May 2024
PoinTramba: A Hybrid Transformer-Mamba Framework for Point Cloud Analysis
Zicheng Wang
Zhen Chen
Yiming Wu
Zhen Zhao
Luping Zhou
Dong Xu
Mamba
109
15
0
24 May 2024
Text-guided 3D Human Motion Generation with Keyframe-based Parallel Skip Transformer
Zichen Geng
Caren Han
Zeeshan Hayder
Jian Liu
Mubarak Shah
Ajmal Mian
65
4
0
24 May 2024
Uncovering cognitive taskonomy through transfer learning in masked autoencoder-based fMRI reconstruction
Youzhi Qu
Junfeng Xia
Xinyao Jian
Wendu Li
Kaining Peng
Zhichao Liang
Haiyan Wu
Quanying Liu
61
0
0
24 May 2024
Modally Reduced Representation Learning of Multi-Lead ECG Signals through Simultaneous Alignment and Reconstruction
Nabil Ibtehaz
Masood S. Mortazavi
63
0
0
24 May 2024
Enhancing Generalized Fetal Brain MRI Segmentation using A Cascade Network with Depth-wise Separable Convolution and Attention Mechanism
Zhigao Cai
Xing-Ming Zhao
21
1
0
24 May 2024
PS-CAD: Local Geometry Guidance via Prompting and Selection for CAD Reconstruction
Bingchen Yang
Haiyong Jiang
Hao Pan
Peter Wonka
Jun Xiao
Guosheng Lin
3DV
95
0
0
24 May 2024
ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning
Sucheng Ren
Hongru Zhu
Chen Wei
Yijiang Li
Alan Yuille
Cihang Xie
AI4TS, VGen, SSL
87
2
0
24 May 2024
Learning Invariant Causal Mechanism from Vision-Language Models
Changwen Zheng
Siyu Zhao
Xingyu Zhang
Jiangmeng Li
Changwen Zheng
Jingyao Wang
CML, BDL, VLM
127
0
0
24 May 2024
MuDreamer: Learning Predictive World Models without Reconstruction
Maxime Burchi
Radu Timofte
75
4
0
23 May 2024
What Variables Affect Out-Of-Distribution Generalization in Pretrained Models?
Md Yousuf Harun
Kyungbok Lee
Jhair Gallardo
Giri Krishnan
Christopher Kanan
107
6
0
23 May 2024
AstroPT: Scaling Large Observation Models for Astronomy
Michael J. Smith
Ryan J. Roberts
E. Angeloudi
M. Huertas-Company
76
2
0
23 May 2024
Masked Image Modelling for retinal OCT understanding
Theodoros Pissas
Pablo Márquez-Neila
Sebastian Wolf
M. Zinkernagel
Raphael Sznitman
49
1
0
23 May 2024
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Ting Liu
Xuyang Liu
Liangtao Shi
Zunnan Xu
Siteng Huang
Yi Xin
Quanjun Yin
86
8
0
23 May 2024
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Shiqi Yang
Zhi-Wei Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
101
4
0
23 May 2024
MAMOC: MRI Motion Correction via Masked Autoencoding
Lennart Alexander Van der Goten
Jingyu Guo
Kevin Smith
41
0
0
23 May 2024
Does context matter in digital pathology?
Paulina Tomaszewska
Mateusz Sperkowski
Przemysław Biecek
30
0
0
23 May 2024
Tuning-free Universally-Supervised Semantic Segmentation
Xiaobo Yang
Xiaojin Gong
VLM
84
2
0
23 May 2024
Leveraging Semantic Segmentation Masks with Embeddings for Fine-Grained Form Classification
Taylor Archibald
Tony R. Martinez
AI4TS
70
0
0
23 May 2024
Configuring Data Augmentations to Reduce Variance Shift in Positional Embedding of Vision Transformers
Bum Jun Kim
Sang Woo Kim
ViT
66
1
0
23 May 2024
Multi-modality Regional Alignment Network for Covid X-Ray Survival Prediction and Report Generation
Zhusi Zhong
Jie Li
J. Sollee
Scott Collins
Harrison X. Bai
Paul J Zhang
Terrance Healey
Michael Atalay
Xinbo Gao
Zhicheng Jiao
68
1
0
23 May 2024
PEAC: Unsupervised Pre-training for Cross-Embodiment Reinforcement Learning
Chengyang Ying
Zhongkai Hao
Xinning Zhou
Xuezhou Xu
Hang Su
Xingxing Zhang
Jun Zhu
132
5
0
23 May 2024
Dinomaly: The Less Is More Philosophy in Multi-Class Unsupervised Anomaly Detection
Jia Guo
Shuai Lu
Weihang Zhang
Huiqi Li
Hongen Liao
ViT
162
13
0
23 May 2024
Mamba-R: Vision Mamba ALSO Needs Registers
Feng Wang
Jiahao Wang
Sucheng Ren
Guoyizhe Wei
J. Mei
Wei Shao
Yuyin Zhou
Alan Yuille
Cihang Xie
Mamba
87
23
0
23 May 2024
Harmony: A Joint Self-Supervised and Weakly-Supervised Framework for Learning General Purpose Visual Representations
Mohammed Baharoon
Jonathan Klein
D. L. Michels
SSL, VLM
138
0
0
23 May 2024
AnomalyDINO: Boosting Patch-based Few-shot Anomaly Detection with DINOv2
Simon Damm
M. Laszkiewicz
Johannes Lederer
Asja Fischer
128
8
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
335
54
0
23 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
112
3
0
22 May 2024
Unleashing the Power of Unlabeled Data: A Self-supervised Learning Framework for Cyber Attack Detection in Smart Grids
Hanyu Zeng
Pengfei Zhou
Xin Lou
Zhen Wei Ng
D. K. Yau
Marianne Winslett
35
0
0
22 May 2024