Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Expanding Sparse Tuning for Low Memory Usage
Shufan Shen
Junshu Sun
Xiangyang Ji
Qingming Huang
Shuhui Wang
107
0
0
04 Nov 2024
Visual Fourier Prompt Tuning
Runjia Zeng
Cheng Han
Qifan Wang
Chunshu Wu
Tong Geng
Lifu Huang
Ying Nian Wu
Dongfang Liu
VPVLM VLM
124
8
0
02 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo
Jie Peng
Pingzhi Li
Tianlong Chen
MoE
58
0
0
02 Nov 2024
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
72
0
0
02 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
232
1
0
02 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen DiffM
138
40
1
01 Nov 2024
PedSleepMAE: Generative Model for Multimodal Pediatric Sleep Signals
Saurav R. Pandey
Aaqib Saeed
Harlin Lee
60
0
0
01 Nov 2024
Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
Junlin He
Jinxiao Du
Wei Ma
SSL
118
1
0
01 Nov 2024
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
Shengxun Wei
Zan Gao
Yibo Zhao
Weili Guan
Shengyong Chen
135
2
0
01 Nov 2024
Learning Video Representations without Natural Videos
Xueyang Yu
Xinlei Chen
Yossi Gandelsman
VGen AI4TS
90
1
0
31 Oct 2024
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
Botao Ye
Sifei Liu
Haofei Xu
Xueting Li
Marc Pollefeys
Ming-Hsuan Yang
Songyou Peng
86
36
0
31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
85
23
0
31 Oct 2024
Denoising Diffusion Models for Anomaly Localization in Medical Images
Cosmin I. Bercea
P. Cattin
Julia A. Schnabel
J. Wolleb
DiffM MedIm
66
1
0
31 Oct 2024
Context-Aware Token Selection and Packing for Enhanced Vision Transformer
Tianyi Zhang
B. Li
Jae-sun Seo
Yu Cao
72
0
0
31 Oct 2024
FRoundation: Are Foundation Models Ready for Face Recognition?
Tahar Chettaoui
Naser Damer
Fadi Boutros
CVBM
94
8
0
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Sekeun Kim
Pengfei Jin
S. Song
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
107
0
0
30 Oct 2024
Aligning Audio-Visual Joint Representations with an Agentic Workflow
Shentong Mo
Yibing Song
61
0
0
30 Oct 2024
Continuous Spatio-Temporal Memory Networks for 4D Cardiac Cine MRI Segmentation
Meng Ye
Bingyu Xin
L. Axel
Dimitris N. Metaxas
62
0
0
30 Oct 2024
Efficient Adaptation of Pre-trained Vision Transformer via Householder Transformation
Wei Dong
Yuan Sun
Yiting Yang
Xing Zhang
Zhijun Lin
Qingsen Yan
Han Zhang
Peng Wang
Yang Yang
Hengtao Shen
79
1
0
30 Oct 2024
Dataset Awareness is not Enough: Implementing Sample-level Tail Encouragement in Long-tailed Self-supervised Learning
Haowen Xiao
Guanghui Liu
Xinyi Gao
Yang Li
Fengmao Lv
Jielei Chu
106
0
0
30 Oct 2024
Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis
Zhiyuan Min
Yawei Luo
Jianwen Sun
Yi Yang
3DGS
82
3
0
30 Oct 2024
Revisiting MAE pre-training for 3D medical image segmentation
Tassilo Wald
Constantin Ulrich
Stanislav Lukyanenko
Andrei Goncharov
Alberto Paderno
Leander Maerkisch
Paul F. Jäger
Klaus Maier-Hein
127
2
0
30 Oct 2024
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang
Yue Fan
Muhammad Ferjad Naeem
Yongqin Xian
J. E. Lenssen
Liwei Wang
F. Tombari
Bernt Schiele
107
2
0
30 Oct 2024
Pre-Trained Vision Models as Perception Backbones for Safety Filters in Autonomous Driving
Yuxuan Yang
Hussein Sibai
104
2
0
29 Oct 2024
Unified Domain Generalization and Adaptation for Multi-View 3D Object Detection
Gyusam Chang
Jiwon Lee
Donghyun Kim
Jinkyu Kim
Dongwook Lee
Daehyun Ji
Sujin Jang
Sangpil Kim
118
1
0
29 Oct 2024
Robots Pre-train Robots: Manipulation-Centric Robotic Representation from Large-Scale Robot Datasets
Guangqi Jiang
Yifei Sun
Tao Huang
Huanyu Li
Yongyuan Liang
Huazhe Xu
70
7
0
29 Oct 2024
Multi-Class Textual-Inversion Secretly Yields a Semantic-Agnostic Classifier
Kai Wang
Fei Yang
Bogdan Raducanu
Joost van de Weijer
57
2
0
29 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
151
3
0
29 Oct 2024
BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays
Yang Zhou
Tan Li Hui Faith
Yanyu Xu
Sicong Leng
Xinxing Xu
Yong Liu
Rick Siow Mong Goh
SSL VLM LM&MA MedIm
75
1
0
29 Oct 2024
ReMix: Training Generalized Person Re-identification on a Mixture of Data
Timur Mamedov
Anton Konushin
Vadim Konushin
63
1
0
29 Oct 2024
A Fresh Look at Generalized Category Discovery through Non-negative Matrix Factorization
Zhong Ji
Steve Yang
Jingren Liu
Yanwei Pang
Jungong Han
125
1
0
29 Oct 2024
Efficient and Effective Weight-Ensembling Mixture of Experts for Multi-Task Model Merging
Li Shen
Anke Tang
Enneng Yang
G. Guo
Yong Luo
Lefei Zhang
Xiaochun Cao
Di Lin
Dacheng Tao
MoMe
83
9
0
29 Oct 2024
DiffSTR: Controlled Diffusion Models for Scene Text Removal
Sanhita Pathak
V. Kaushik
Brejesh Lall
DiffM
70
0
0
29 Oct 2024
A Large Recurrent Action Model: xLSTM enables Fast Inference for Robotics Tasks
Thomas Schmied
Thomas Adler
Vihang Patil
M. Beck
Korbinian Poppel
Johannes Brandstetter
Günter Klambauer
Razvan Pascanu
Sepp Hochreiter
203
7
0
29 Oct 2024
PK-YOLO: Pretrained Knowledge Guided YOLO for Brain Tumor Detection in Multiplanar MRI Slices
Ming Kang
F. F. Ting
Raphaël C.-W. Phan
C. Ting
ViT MedIm
173
1
0
29 Oct 2024
AdaptGCD: Multi-Expert Adapter Tuning for Generalized Category Discovery
Yuxun Qu
Yongqiang Tang
Chenyang Zhang
Wensheng Zhang
179
0
0
29 Oct 2024
EEG-Driven 3D Object Reconstruction with Style Consistency and Diffusion Prior
Xin Xiang
Wenhui Zhou
Guojun Dai
DiffM
80
0
0
28 Oct 2024
BEVPose: Unveiling Scene Semantics through Pose-Guided Multi-Modal BEV Alignment
M. Hosseinzadeh
Ian Reid
65
1
0
28 Oct 2024
Multi-modal AI for comprehensive breast cancer prognostication
Jan Witowski
Ken Zeng
Joseph Cappadona
Jailan Elayoubi
Elena Diana Chiru
...
Adam Brufsky
Francisco J. Esteva
Lajos Pusztai
Yann LeCun
Krzysztof J. Geras
25
1
0
28 Oct 2024
Accelerating Augmentation Invariance Pretraining
Jinhong Lin
Cheng-En Wu
Yibing Wei
Pedro Morgado
ViT
81
1
0
27 Oct 2024
Idempotent Unsupervised Representation Learning for Skeleton-Based Action Recognition
Lilang Lin
Lehong Wu
Jiahang Zhang
Jiaying Liu
106
3
0
27 Oct 2024
PaPaGei: Open Foundation Models for Optical Physiological Signals
Arvind Pillai
Dimitris Spathis
F. Kawsar
Mohammad Malekzadeh
VLM
97
8
0
27 Oct 2024
GiVE: Guiding Visual Encoder to Perceive Overlooked Information
Junjie Li
Jianghong Ma
Xiaofeng Zhang
Yuhang Li
Jianyang Shi
126
1
0
26 Oct 2024
Your Image is Secretly the Last Frame of a Pseudo Video
Wenlong Chen
Wenlin Chen
Lapo Rastrelli
Yingzhen Li
DiffM VGen
93
0
0
26 Oct 2024
Exploring Self-Supervised Learning with U-Net Masked Autoencoders and EfficientNet B7 for Improved Classification
Vamshi Krishna Kancharla
Pavan Kumar Kaveti
69
3
0
25 Oct 2024
Frozen-DETR: Enhancing DETR with Image Understanding from Frozen Foundation Models
Shenghao Fu
Junkai Yan
Q. Yang
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
VLM
111
3
0
25 Oct 2024
Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning
Shentong Mo
Shengbang Tong
98
1
0
25 Oct 2024
On Occlusions in Video Action Detection: Benchmark Datasets And Training Recipes
Rajat Modi
Vibhav Vineet
Yogesh S Rawat
86
2
0
25 Oct 2024
Transductive Learning for Near-Duplicate Image Detection in Scanned Photo Collections
Francesc Net
Marc Folia
Pep Casals
Lluís Gómez
25
2
0
25 Oct 2024
PointPatchRL -- Masked Reconstruction Improves Reinforcement Learning on Point Clouds
B. Gyenes
Nikolai Franke
P. Becker
Gerhard Neumann
3DPC
96
0
0
24 Oct 2024