ResearchTrend.AI

arXiv:2111.06377
Masked Autoencoders Are Scalable Vision Learners
Versions: v1, v2, v3 (latest)


11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Text-Queried Audio Source Separation via Hierarchical Modeling
Xinlei Yin
Xiulian Peng
Xue Jiang
Zhiwei Xiong
Yan Lu
46
0
0
27 May 2025
Incorporating Flexible Image Conditioning into Text-to-Video Diffusion Models without Training
Bolin Lai
Sangmin Lee
Xu Cao
Xiang Li
James M. Rehg
DiffM
70
0
0
27 May 2025
Vision Transformers with Self-Distilled Registers
Yinjie Chen
Zipeng Yan
Chong Zhou
Bo Dai
Andrew F. Luo
54
0
0
27 May 2025
HuMoCon: Concept Discovery for Human Motion Understanding
Qihang Fang
Chengcheng Tang
Bugra Tekin
Shugao Ma
Yanchao Yang
43
0
0
27 May 2025
MoE-Gyro: Self-Supervised Over-Range Reconstruction and Denoising for MEMS Gyroscopes
Feiyang Pan
Shenghe Zheng
Chunyan Yin
Guangbin Dou
15
0
0
27 May 2025
Roboflow100-VL: A Multi-Domain Object Detection Benchmark for Vision-Language Models
Peter Robicheaux
Matvei Popov
Anish Madan
Isaac Robinson
Joseph Nelson
Deva Ramanan
Neehar Peri
ObjD, VLM
107
3
0
27 May 2025
The Missing Point in Vision Transformers for Universal Image Segmentation
Sajjad Shahabodini
Mobina Mansoori
Farnoush Bayatmakou
J. Abouei
Konstantinos N. Plataniotis
Arash Mohammadi
ViT, ISeg
31
0
0
26 May 2025
ViTaPEs: Visuotactile Position Encodings for Cross-Modal Alignment in Multimodal Transformers
Fotios Lygerakis
Ozan Özdenizci
Elmar Rückert
45
0
0
26 May 2025
MM-Prompt: Cross-Modal Prompt Tuning for Continual Visual Question Answering
Xu Li
Fan Lyu
LRM
20
0
0
26 May 2025
Spurious Privacy Leakage in Neural Networks
Chenxiang Zhang
Jun Pang
S. Mauw
53
0
0
26 May 2025
Advancements in Medical Image Classification through Fine-Tuning Natural Domain Foundation Models
Mobina Mansoori
Sajjad Shahabodini
Farnoush Bayatmakou
J. Abouei
Konstantinos N. Plataniotis
Arash Mohammadi
41
0
0
26 May 2025
DiSA: Diffusion Step Annealing in Autoregressive Image Generation
Qinyu Zhao
Jaskirat Singh
Ming Xu
Akshay Asthana
Stephen Gould
Liang Zheng
DiffM
68
0
0
26 May 2025
Rotary Masked Autoencoders are Versatile Learners
Uros Zivanovic
Serafina Di Gioia
Andre Scaffidi
Martín de los Rios
Gabriella Contardo
R. Trotta
30
0
0
26 May 2025
Enhancing Contrastive Learning-based Electrocardiogram Pretrained Model with Patient Memory Queue
Xiaoyu Sun
Yang Yang
Xunde Dong
18
0
0
26 May 2025
Advancing Video Self-Supervised Learning via Image Foundation Models
Jingwei Wu
Zhewei Huang
Chang Liu
44
0
0
25 May 2025
Tokenizing Electron Cloud in Protein-Ligand Interaction Learning
H. Lin
Odin Zhang
Jia Xu
Yunfan Liu
Zheng Cheng
Lirong Wu
Yufei Huang
Zhifeng Gao
Stan Z. Li
51
0
0
25 May 2025
Plug-and-Play Context Feature Reuse for Efficient Masked Generation
Xuejie Liu
Anji Liu
Guy Van den Broeck
Yitao Liang
42
0
0
25 May 2025
FHGS: Feature-Homogenized Gaussian Splatting
Q. G. Duan
Benyun Zhao
Mingqiao Han
Yijun Huang
Ben M. Chen
3DGS
36
0
0
25 May 2025
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation
Daniel Csizmadia
Andrei Codreanu
Victor Sim
Vighnesh Prabhu
Michael Lu
Kevin Zhu
Sean O'Brien
Vasu Sharma
CLIP, VLM
71
0
0
25 May 2025
C3R: Channel Conditioned Cell Representations for unified evaluation in microscopy imaging
Umar Marikkar
Syed Sameed Husain
Muhammad Awais
Sara Atito
39
0
0
24 May 2025
VISTA: Vision-Language Inference for Training-Free Stock Time-Series Analysis
Tina Khezresmaeilzadeh
Parsa Razmara
Seyedarmin Azizi
Mohammad Erfan Sadeghi
Erfan Baghaei Portaghloo
AI4TS
276
0
0
24 May 2025
MADCAT: Combating Malware Detection Under Concept Drift with Test-Time Adaptation
Eunjin Roh
Yigitcan Kaya
Christopher Kruegel
Giovanni Vigna
Sanghyun Hong
116
0
0
24 May 2025
CONCORD: Concept-Informed Diffusion for Dataset Distillation
Jianyang Gu
Haonan Wang
Ruoxi Jia
Saeed Vahidian
Vyacheslav Kungurtsev
Wei Jiang
Yiran Chen
DiffM, DD
922
0
0
23 May 2025
SpikeGen: Generative Framework for Visual Spike Stream Processing
Gaole Dai
Menghang Dong
Rongyu Zhang
Ruichuan An
Shanghang Zhang
Tiejun Huang
DiffM, 3DGS
44
0
0
23 May 2025
Bootstrapping Imitation Learning for Long-horizon Manipulation via Hierarchical Data Collection Space
Jinrong Yang
Kexun Chen
Zhuoling Li
Shengkai Wu
Yong Zhao
...
Chaohui Shang
Meiyu Zhi
Linfeng Gao
Mingshan Sun
Hui Cheng
99
0
0
23 May 2025
From Flight to Insight: Semantic 3D Reconstruction for Aerial Inspection via Gaussian Splatting and Language-Guided Segmentation
Mahmoud Chick Zaouali
Todd Charter
Homayoun Najjaran
3DGS
33
0
0
23 May 2025
Center-aware Residual Anomaly Synthesis for Multi-class Industrial Anomaly Detection
Qiyu Chen
Huiyuan Luo
Haiming Yao
Wei Luo
Zhen Qu
Chengkan Lv
Zhengtao Zhang
225
1
0
23 May 2025
BehaveGPT: A Foundation Model for Large-scale User Behavior Modeling
Jiahui Gong
Jingtao Ding
Fanjin Meng
Chen Yang
Hong Chen
Zuojian Wang
Haisheng Lu
Yong Li
51
0
0
23 May 2025
REN: Fast and Efficient Region Encodings from Patch-Based Image Encoders
Savya Khosla
Sethuraman TV
Barnett Lee
Alexander Schwing
Derek Hoiem
VGen
167
0
0
23 May 2025
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
A. Fuller
Yousef Yassin
Junfeng Wen
Daniel G. Kyrollos
Tarek Ibrahim
James R. Green
Evan Shelhamer
ViT
187
0
0
23 May 2025
REPA Works Until It Doesn't: Early-Stopped, Holistic Alignment Supercharges Diffusion Training
Ziqiao Wang
Wangbo Zhao
Yuhao Zhou
Zekai Li
Zhiyuan Liang
...
Pengfei Zhou
Kai Zhang
Zhangyang Wang
Kai Wang
Yang You
92
0
0
22 May 2025
SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation
Guohao Huo
Ruiting Dai
Hao Tang
Mamba
69
0
0
22 May 2025
Investigating Fine- and Coarse-grained Structural Correspondences Between Deep Neural Networks and Human Object Image Similarity Judgments Using Unsupervised Alignment
Soh Takahashi
Masaru Sasaki
Ken Takeda
Masafumi Oizumi
58
0
0
22 May 2025
REOBench: Benchmarking Robustness of Earth Observation Foundation Models
Xiang Li
Yong Tao
Siyuan Zhang
Siwei Liu
Zhitong Xiong
Chunbo Luo
L. J. Liu
Mykola Pechenizkiy
Xiao Xiang Zhu
T. Huang
69
0
0
22 May 2025
Scalable Graph Generative Modeling via Substructure Sequences
Zehong Wang
Zheyuan Zhang
Tianyi Ma
Chuxu Zhang
Yanfang Ye
AI4CE
71
0
0
22 May 2025
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhi-Wei Zhong
Akira Takahashi
Shuyang Cui
Keisuke Toyama
Shusuke Takahashi
Yuki Mitsufuji
VGen
66
0
0
22 May 2025
An Effective Training Framework for Light-Weight Automatic Speech Recognition Models
Abdul Hannan
Alessio Brutti
Shah Nawaz
Mubashir Noman
71
0
0
22 May 2025
CAD: A General Multimodal Framework for Video Deepfake Detection via Cross-Modal Alignment and Distillation
Yuxuan Du
Zhendong Wang
Yuhao Luo
Caiyong Piao
Zhiyuan Yan
Hao Li
Li Yuan
169
0
0
21 May 2025
The Devil is in Fine-tuning and Long-tailed Problems: A New Benchmark for Scene Text Detection
Tianjiao Cao
Jiahao Lyu
Weichao Zeng
Weimin Mu
Yu Zhou
88
0
0
21 May 2025
UWSAM: Segment Anything Model Guided Underwater Instance Segmentation and A Large-scale Benchmark Dataset
Hua Li
Shijie Lian
Zhiyuan Li
Runmin Cong
Sam Kwong
VLM
81
0
0
21 May 2025
gen2seg: Generative Models Enable Generalizable Instance Segmentation
Om Khangaonkar
Hamed Pirsiavash
DiffM, VLM
147
0
0
21 May 2025
An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology
Changchun Yang
Weiqian Dai
Yilan Zhang
Siyuan Chen
Jingdong Hu
...
Yuxuan Chen
Ao Xu
Na Li
Xin Gao
Yongguo Yu
32
0
0
21 May 2025
Exploring The Visual Feature Space for Multimodal Neural Decoding
Weihao Xia
Cengiz Öztireli
75
0
0
21 May 2025
GCNT: Graph-Based Transformer Policies for Morphology-Agnostic Reinforcement Learning
Yingbo Luo
Meibao Yao
Xueming Xiao
75
0
0
21 May 2025
From Pixels to Images: Deep Learning Advances in Remote Sensing Image Semantic Segmentation
Quanwei Liu
Wei Chen
Yanni Dong
Jiaqi Yang
Wei Xiang
146
0
0
21 May 2025
Grouping First, Attending Smartly: Training-Free Acceleration for Diffusion Transformers
Sucheng Ren
Qihang Yu
Ju He
Alan Yuille
Liang-Chieh Chen
131
0
0
20 May 2025
Egocentric Action-aware Inertial Localization in Point Clouds
Mingfang Zhang
Ryo Yonetani
Yifei Huang
Liangyang Ouyang
Ruicong Liu
Yoichi Sato
60
0
0
20 May 2025
Utilizing Strategic Pre-training to Reduce Overfitting: Baguan -- A Pre-trained Weather Forecasting Model
Peisong Niu
Ziqing Ma
Tian Zhou
Weiqi Chen
Lefei Shen
Rong Jin
Liang Sun
AI4CE
45
0
0
20 May 2025
MSVIT: Improving Spiking Vision Transformer Using Multi-scale Attention Fusion
Wei Hua
Chenlin Zhou
Jibin Wu
Yansong Chua
Yangyang Shu
110
0
0
19 May 2025
Enhancing Channel-Independent Time Series Forecasting via Cross-Variate Patch Embedding
Donghwa Shin
Edwin Zhang
AI4TS
84
0
0
19 May 2025