ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
arXiv: 2111.06377

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,615 papers shown
Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation
S. Ly
Hien Nguyen
82
1
0
28 Nov 2024
PP-SSL : Priority-Perception Self-Supervised Learning for Fine-Grained Recognition
ShuaiHeng Li
Qing Cai
Fan Zhang
Hao Fei
Yangyang Shu
Ziqiang Liu
Yiming Li
Lingqiao Liu
76
0
0
28 Nov 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
80
1
0
28 Nov 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Zihan Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
89
11
0
27 Nov 2024
Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting
Hao Liu
Minglin Chen
Yanni Ma
Haihong Xiao
Ying He
3DGS
3DPC
84
0
0
27 Nov 2024
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu
Peijin Wang
Hanbo Bi
Boyuan Tong
Zehua Wang
...
Ziqi Zhang
QiXiang Ye
Kun Fu
Xian Sun
105
0
0
27 Nov 2024
RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
Ryo Fujii
Hideo Saito
Ryo Hachiuma
AI4TS
109
1
0
26 Nov 2024
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang
Jisang Yoo
Jihyeon Park
Seungtae Nam
Hyeonsoo Im
Sangheon Shin
Sangpil Kim
Eunbyung Park
3DGS
161
3
0
26 Nov 2024
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen
Markus Marks
Zezhou Cheng
84
0
0
25 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
76
3
0
25 Nov 2024
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski
Weitong Zhang
Sarah Cechnicka
Hadrien Reynaud
Bernhard Kainz
72
0
0
25 Nov 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
108
5
0
25 Nov 2024
Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training
Man Yao
Xuerui Qiu
Tianxiang Hu
J. Hu
Yuhong Chou
Keyu Tian
Jianxing Liao
Luziwei Leng
Bo Xu
Guoqi Li
76
7
0
25 Nov 2024
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou
Songze Li
Duanyi Yao
AAML
72
0
0
25 Nov 2024
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du
Z. Chen
Hongtao Xie
Caiyan Jia
Yu-Gang Jiang
90
1
0
24 Nov 2024
Multi-Token Enhancing for Vision Representation Learning
Zhong-Yu Li
Yu-Song Hu
Bo Yin
Ming-Ming Cheng
66
1
0
24 Nov 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
73
0
0
24 Nov 2024
TransFair: Transferring Fairness from Ocular Disease Classification to Progression Prediction
Leila Gheisi
Henry Chu
Raju Gottumukkala
Yan Luo
Xingquan Zhu
Mengyu Wang
Min Shi
MedIm
115
0
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
75
0
0
24 Nov 2024
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
Jiayuan Zhu
Junde Wu
Cheng Ouyang
Konstantinos Kamnitsas
Alison Noble
72
0
0
23 Nov 2024
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation
J. Lee
Y. Oh
Dahyoun Lee
Hyon Keun Joh
Chul-Ho Sohn
...
Cheol Kyu Jung
Jung Hyun Park
Kyu Sung Choi
Byung-Hoon Kim
Jong Chul Ye
DiffM
MedIm
79
0
0
23 Nov 2024
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
Te Yang
Jian Jia
Xiangyu Zhu
Weisong Zhao
Bo Wang
...
Shengyuan Liu
Quan Chen
Peng Jiang
Kun Gai
Zhen Lei
69
1
0
23 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
75
0
0
22 Nov 2024
Optimized Vessel Segmentation: A Structure-Agnostic Approach with Small Vessel Enhancement and Morphological Correction
Dongning Song
Weijian Huang
Jiarun Liu
Md Jahidul Islam
Hao Yang
Shanshan Wang
77
0
0
22 Nov 2024
Aim My Robot: Precision Local Navigation to Any Object
Xiangyun Meng
Xuning Yang
Sanghun Jung
F. Ramos
Srid Sadhan Jujjavarapu
Sanjoy Paul
Dieter Fox
85
1
0
22 Nov 2024
Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting
Nikolai Goncharov
Donald G. Dansereau
VLM
75
1
0
21 Nov 2024
Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals
Hussni Mohd Zakir
Eric Tatt Wei Ho
VLM
77
0
0
21 Nov 2024
NexusSplats: Efficient 3D Gaussian Splatting in the Wild
Yuzhou Tang
Dejun Xu
Yongjie Hou
Zhenzhong Wang
Min Jiang
3DGS
82
1
0
21 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yanjie Wang
Gangshan Wu
Tong He
Limin Wang
103
2
0
21 Nov 2024
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
82
1
0
20 Nov 2024
Generating 3D-Consistent Videos from Unposed Internet Photos
Gene Chou
Kai Zhang
Sai Bi
Hao Tan
Zexiang Xu
Fujun Luan
Bharath Hariharan
Noah Snavely
3DGS
VGen
92
3
0
20 Nov 2024
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training
Ameera Bawazir
Kebin Wu
Wenbin Li
CLIP
77
1
0
20 Nov 2024
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation
Christoph Reinders
Radu Berdan
Beril Besbinar
Junji Otsuka
Daisuke Iso
81
2
0
20 Nov 2024
Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images
Xuechao Zou
Shun Zhang
Kai Li
Shiying Wang
Junliang Xing
Lei Jin
Congyan Lang
Pin Tao
66
1
0
20 Nov 2024
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
Bing Cao
Quanhao Lu
Jiekang Feng
Pengfei Zhu
Q. Hu
Qilong Wang
73
0
0
20 Nov 2024
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Ruyi Ding
Tong Zhou
Lili Su
A. A. Ding
Xiaolin Xu
Yunsi Fei
AAML
69
1
0
19 Nov 2024
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving
Shaoqing Xu
Fang Li
Shengyin Jiang
Ziying Song
Li Liu
Zhi-xin Yang
3DGS
SSL
93
1
0
19 Nov 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
92
0
0
19 Nov 2024
Learning a Neural Association Network for Self-supervised Multi-Object Tracking
Shuai Li
Michael G. Burke
S. Ramamoorthy
Juergen Gall
VOT
83
0
0
18 Nov 2024
Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge
Qinglong Cao
Ding Wang
Xirui Li
Yuntian Chen
Chao Ma
Xiaokang Yang
DiffM
VGen
118
2
0
18 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Cheng Chen
38
0
0
18 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
92
6
0
18 Nov 2024
MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection
Xu Cao
Wenqian Ye
K. Moise
Megan Coffee
41
2
0
16 Nov 2024
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
Zihao Li
Yuan Cao
Cheng Gao
Yihan He
Han Liu
Jason M. Klusowski
Jianqing Fan
Mengdi Wang
MLT
55
6
0
16 Nov 2024
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin
Cheng-En Wu
Huanran Li
Jifan Zhang
Yu Hen Hu
Pedro Morgado
41
0
0
16 Nov 2024
FedAli: Personalized Federated Learning with Aligned Prototypes through Optimal Transport
Sannara Ek
Kaile Wang
François Portet
P. Lalanda
Jiannong Cao
FedML
38
0
0
15 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
55
1
0
15 Nov 2024
Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting
Ziqi Xie
Xiao Lai
Weidong Zhao
Xianhui Liu
Wenlong Hou
52
0
0
15 Nov 2024
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Alexander C. Li
Yuandong Tian
Bin Chen
Deepak Pathak
Xinlei Chen
43
0
0
14 Nov 2024
Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images
Bipasha Kundu
Bidur Khanal
R. Simon
Cristian A. Linte
MedIm
28
2
0
14 Nov 2024