Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM
arXiv:2111.06377

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
PaPr: Training-Free One-Step Patch Pruning with Lightweight ConvNets for Faster Inference
Tanvir Mahmud
Burhaneddin Yaman
Chun-Hao Liu
Diana Marculescu
109
3
0
24 Mar 2024
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou
Jiazheng Xing
Yijie Qian
Yaowei Guo
Shuo Xin
...
Kai Tang
Mengmeng Wang
Zhengkai Jiang
Liang Liu
Yong-Jin Liu
105
29
0
24 Mar 2024
Edit3K: Universal Representation Learning for Video Editing Components
Xin Gu
Libo Zhang
Fan Chen
Longyin Wen
Yufei Wang
Tiejian Luo
Sijie Zhu
122
4
0
24 Mar 2024
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content
Zhicheng Du
Zhaotian Xie
Huazhang Ying
Likun Zhang
Peiwu Qin
80
0
0
23 Mar 2024
Centered Masking for Language-Image Pre-Training
Mingliang Liang
Martha Larson
VLM, CLIP
60
4
0
23 Mar 2024
Once for Both: Single Stage of Importance and Sparsity Search for Vision Transformer Compression
Hancheng Ye
Chong Yu
Peng Ye
Renqiu Xia
Yansong Tang
Jiwen Lu
Tao Chen
Bo Zhang
90
3
0
23 Mar 2024
3D-TransUNet for Brain Metastases Segmentation in the BraTS2023 Challenge
Siwei Yang
Xianhang Li
Jieru Mei
Jieneng Chen
Cihang Xie
Yuyin Zhou
MedIm
71
7
0
23 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
88
79
0
22 Mar 2024
Neural Plasticity-Inspired Multimodal Foundation Model for Earth Observation
Zhitong Xiong
Yi Wang
Fahong Zhang
Adam J. Stewart
Joelle Hanna
Damian Borth
Ioannis Papoutsis
B. L. Saux
Gustau Camps-Valls
Xiao Xiang Zhu
AI4CE
117
18
0
22 Mar 2024
Self-Supervised Backbone Framework for Diverse Agricultural Vision Tasks
Sudhir Sornapudi
Rajhans Singh (Corteva Agriscience)
SSL
79
2
0
22 Mar 2024
Improve Cross-domain Mixed Sampling with Guidance Training for Adaptive Segmentation
Wenlve Zhou
Zhiheng Zhou
Tianlei Wang
Delu Zeng
82
0
0
22 Mar 2024
LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors
Saksham Suri
Matthew Walmer
Kamal Gupta
Abhinav Shrivastava
74
7
0
21 Mar 2024
Hierarchical Text-to-Vision Self Supervised Alignment for Improved Histopathology Representation Learning
Hasindri Watawana
Kanchana Ranasinghe
Tariq Mahmood
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
SSL
70
5
0
21 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
307
404
0
21 Mar 2024
Token Transformation Matters: Towards Faithful Post-hoc Explanation for Vision Transformer
Junyi Wu
Bin Duan
Weitai Kang
Hao Tang
Yan Yan
60
9
0
21 Mar 2024
SurroundSDF: Implicit 3D Scene Understanding Based on Signed Distance Field
Lizhe Liu
Bohua Wang
Hongwei Xie
Daqi Liu
Li Liu
Zhiqiang Tian
Kuiyuan Yang
Bing Wang
81
3
0
21 Mar 2024
Exploring Task Unification in Graph Representation Learning via Generative Approach
Yulan Hu
Ouyang Sheng
Zhirui Yang
Ge Chen
Junchen Wan
Xiao Wang
Yong Liu
73
3
0
21 Mar 2024
Training point-based deep learning networks for forest segmentation with synthetic data
Francisco Raverta Capua
Juan Schandin
Pablo De Cristoforis
3DPC
59
3
0
21 Mar 2024
MaskSAM: Towards Auto-prompt SAM with Mask Classification for Volumetric Medical Image Segmentation
Bin Xie
Hao Tang
Bin Duan
Dawen Cai
Yan Yan
Gady Agam
VLM, MedIm
81
0
0
21 Mar 2024
On Pretraining Data Diversity for Self-Supervised Learning
Hasan Hammoud
Tuhin Das
Fabio Pizzati
Philip Torr
Adel Bibi
Guohao Li
155
3
0
20 Mar 2024
Practical End-to-End Optical Music Recognition for Pianoform Music
Jirí Mayer
Milan Straka
Jan Hajic
Pavel Pecina
55
2
0
20 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
Lefei Zhang
83
53
0
20 Mar 2024
vid-TLDR: Training Free Token merging for Light-weight Video Transformer
Joonmyung Choi
Sanghyeok Lee
Jaewon Chu
Minhyuk Choi
Hyunwoo J. Kim
MoMe, ViT
109
17
0
20 Mar 2024
AdaViPro: Region-based Adaptive Visual Prompt for Large-Scale Models Adapting
Mengyu Yang
Ye Tian
Lanshan Zhang
Xiao Liang
Xuming Ran
Wendong Wang
VLM
102
2
0
20 Mar 2024
Mora: Enabling Generalist Video Generation via A Multi-Agent Framework
Zhengqing Yuan
Ruoxi Chen
Zhaoxu Li
Haolong Jia
Lifang He
Chi Wang
Lichao Sun
VGen
109
28
0
20 Mar 2024
When Do We Not Need Larger Vision Models?
Baifeng Shi
Ziyang Wu
Maolin Mao
Xin Wang
Trevor Darrell
VLM, LRM
119
47
0
19 Mar 2024
ViTGaze: Gaze Following with Interaction Features in Vision Transformers
Yuehao Song
Xinggang Wang
Jingfeng Yao
Wenyu Liu
Jinglin Zhang
Xiangmin Xu
ViT
83
3
0
19 Mar 2024
Emotion Recognition Using Transformers with Masked Learning
Seongjae Min
Junseok Yang
Sangjun Lim
Junyong Lee
Sangwon Lee
Sejoon Lim
85
8
0
19 Mar 2024
Compound Expression Recognition via Multi Model Ensemble
Jun-chen Yu
Jichao Zhu
Wangyuan Zhu
77
8
0
19 Mar 2024
Pretraining Codomain Attention Neural Operators for Solving Multiphysics PDEs
Md Ashiqur Rahman
Robert Joseph George
Mogab Elleithy
Daniel Leibovici
Zong-Yi Li
...
Julius Berner
Raymond A. Yeh
Jean Kossaifi
Kamyar Azizzadenesheli
A. Anandkumar
AI4CE
123
23
0
19 Mar 2024
Task-Customized Mixture of Adapters for General Image Fusion
Pengfei Zhu
Yang Sun
Bing Cao
Qinghua Hu
MoMe
122
23
0
19 Mar 2024
NTK-Guided Few-Shot Class Incremental Learning
Jingren Liu
Zhong Ji
Yanwei Pang
YunLong Yu
CLL
95
4
0
19 Mar 2024
Human Mesh Recovery from Arbitrary Multi-view Images
Xiaoben Li
Mancheng Meng
Ziyan Wu
Terrence Chen
Fan Yang
Dinggang Shen
81
1
0
19 Mar 2024
ADAPT to Robustify Prompt Tuning Vision Transformers
Masih Eskandar
Tooba Imtiaz
Zifeng Wang
Jennifer Dy
VPVLM, VLM, AAML
92
0
0
19 Mar 2024
EffiPerception: an Efficient Framework for Various Perception Tasks
Xinhao Xiang
Simon Dräger
Jiawei Zhang
VLM
77
0
0
18 Mar 2024
LN3Diff: Scalable Latent Neural Fields Diffusion for Speedy 3D Generation
Yushi Lan
Fangzhou Hong
Shuai Yang
Shangchen Zhou
Xuyi Meng
Bo Dai
Xingang Pan
Chen Change Loy
81
44
0
18 Mar 2024
GenView: Enhancing View Quality with Pretrained Generative Model for Self-Supervised Learning
Xiaojie Li
Yibo Yang
Hefei Ling
Jianlong Wu
Yue Yu
Guohao Li
Min Zhang
SSL
101
6
0
18 Mar 2024
N-Modal Contrastive Losses with Applications to Social Media Data in Trimodal Space
William Theisen
Walter J. Scheirer
61
1
0
18 Mar 2024
Unimodal Multi-Task Fusion for Emotional Mimicry Intensity Prediction
Tobias Hallmen
Fabian Deuser
Norbert Oswald
Elisabeth André
84
2
0
18 Mar 2024
HVDistill: Transferring Knowledge from Images to Point Clouds via Unsupervised Hybrid-View Distillation
Sha Zhang
Jiajun Deng
Lei Bai
Houqiang Li
Wanli Ouyang
Yanyong Zhang
3DPC
98
8
0
18 Mar 2024
S-JEPA: towards seamless cross-dataset transfer through dynamic spatial attention
Pierre Guetschel
Thomas Moreau
Michael Tangermann
68
9
0
18 Mar 2024
TTT-KD: Test-Time Training for 3D Semantic Segmentation through Knowledge Distillation from Foundation Models
Lisa Weijler
Muhammad Jehanzeb Mirza
Leon Sick
Can Ekkazan
Pedro Hermosilla
TTA
99
0
0
18 Mar 2024
Continual Forgetting for Pre-trained Vision Models
Hongbo Zhao
Bolin Ni
Haochen Wang
Junsong Fan
Fei Zhu
Yuxi Wang
Yuntao Chen
Gaofeng Meng
Zhaoxiang Zhang
MU, VLM
136
13
0
18 Mar 2024
Boosting Order-Preserving and Transferability for Neural Architecture Search: a Joint Architecture Refined Search and Fine-tuning Approach
Beichen Zhang
Xiaoxing Wang
Xiaohan Qin
Junchi Yan
73
4
0
18 Mar 2024
Align and Distill: Unifying and Improving Domain Adaptive Object Detection
Justin Kay
T. Haucke
Suzanne Stathatos
Siqi Deng
Erik Young
Pietro Perona
Sara Beery
Grant Van Horn
115
6
0
18 Mar 2024
Domain-Guided Masked Autoencoders for Unique Player Identification
Bavesh Balaji
Jerrin Bright
Sirisha Rambhatla
Yuhao Chen
Alexander Wong
John S. Zelek
David A Clausi
63
2
0
17 Mar 2024
Self-supervised co-salient object detection via feature correspondence at multiple scales
Souradeep Chakraborty
Dimitris Samaras
86
4
0
17 Mar 2024
Brain-on-Switch: Towards Advanced Intelligent Network Data Plane via NN-Driven Traffic Analysis at Line-Speed
Jinzhu Yan
Haotian Xu
Zhuotao Liu
Qi Li
Ke Xu
Mingwei Xu
Jianping Wu
90
22
0
17 Mar 2024
ViSaRL: Visual Reinforcement Learning Guided by Human Saliency
Anthony Liang
Jesse Thomason
Erdem Biyik
79
7
0
16 Mar 2024
Uncertainty-Aware Adapter: Adapting Segment Anything Model (SAM) for Ambiguous Medical Image Segmentation
Mingzhou Jiang
Jiaying Zhou
Junde Wu
Tianyang Wang
Yueming Jin
Min Xu
MedIm
103
3
0
16 Mar 2024