ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Masked Autoencoders Are Scalable Vision Learners
Versions: v1, v2, v3 (latest)

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
DynamicVis: An Efficient and General Visual Foundation Model for Remote Sensing Image Understanding
Keyan Chen
Chenyang Liu
Bowen Chen
Wenyuan Li
Zhengxia Zou
Zhenwei Shi
78
3
0
20 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM, CLIP, MLLM
202
7
0
19 Mar 2025
FedSCA: Federated Tuning with Similarity-guided Collaborative Aggregation for Heterogeneous Medical Image Segmentation
Yumin Zhang
Yan Gao
Haoran Duan
Hanqing Guo
Tejal Shah
R. Ranjan
Bo Wei
FedML
116
0
0
19 Mar 2025
CHROME: Clothed Human Reconstruction with Occlusion-Resilience and Multiview-Consistency from a Single Image
Arindam Dutta
Meng Zheng
Zhongpai Gao
Benjamin Planche
Anwesha Choudhuri
Terrence Chen
Amit K. Roy-Chowdhury
Ziyan Wu
3DH
100
2
0
19 Mar 2025
Aligning Information Capacity Between Vision and Language via Dense-to-Sparse Feature Distillation for Image-Text Matching
Yang Liu
Wentao Feng
Zhuoyao Liu
Shudong Huang
Jiancheng Lv
DiffM, VLM
114
0
0
19 Mar 2025
CAM-Seg: A Continuous-valued Embedding Approach for Semantic Image Generation
Masud Ahmed
Zahid Hasan
Syed Arefinul Haque
A. Faridee
S. Purushotham
Suya You
Nirmalya Roy
182
0
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
200
0
0
19 Mar 2025
Transport-Related Surface Detection with Machine Learning: Analyzing Temporal Trends in Madrid and Vienna
Miguel Ureña Pliego
Rubén Martínez Marín
Nianfang Shi
Takeru Shibayama
Ulrich Leth
Miguel Marchamalo Sacristán
225
0
0
19 Mar 2025
Object-Centric Pretraining via Target Encoder Bootstrapping
Nikola Đukić
Tim Lebailly
Tinne Tuytelaars
OCL
129
0
0
19 Mar 2025
Representational Similarity via Interpretable Visual Concepts
Neehar Kondapaneni
Oisin Mac Aodha
Pietro Perona
DRL
506
2
0
19 Mar 2025
Utilization of Neighbor Information for Image Classification with Different Levels of Supervision
Gihan Jayatilaka
Abhinav Shrivastava
M. Gwilliam
103
0
0
18 Mar 2025
FusDreamer: Label-efficient Remote Sensing World Model for Multimodal Data Classification
Jiadong Wang
Weiwei Song
Hao Chen
Jie Ren
Huimin Zhao
146
0
0
18 Mar 2025
Deeply Supervised Flow-Based Generative Models
Inkyu Shin
Chenglin Yang
Liang-Chieh Chen
93
2
0
18 Mar 2025
SAM2 for Image and Video Segmentation: A Comprehensive Survey
Zhang Jiaxing
Tang Hao
VLM
110
0
0
17 Mar 2025
Graph Generative Models Evaluation with Masked Autoencoder
Chengen Wang
Murat Kantarcioglu
94
0
0
17 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
404
0
0
17 Mar 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Haozhe Si
Yuxuan Wan
Minh Do
Deepak Vasisht
Han Zhao
Hendrik Hamann
173
0
0
17 Mar 2025
Quantum EigenGame for excited state calculation
David Quiroga
Jason Han
Anastasios Kyrillidis
116
0
0
17 Mar 2025
MaskSDM with Shapley values to improve flexibility, robustness, and explainability in species distribution modeling
Robin Zbinden
Nina Van Tiel
Gencer Sumbul
Chiara Vanalli
B. Kellenberger
D. Tuia
75
0
0
17 Mar 2025
Pathology Image Restoration via Mixture of Prompts
Jiangdong Cai
Yan Chen
Zhenrong Shen
Haotian Jiang
Honglin Xiong
Kai Xuan
Lichi Zhang
Qian Wang
MedIm
80
0
0
16 Mar 2025
Multi Activity Sequence Alignment via Implicit Clustering
Taein Kwon
Zador Pataki
Mahdi Rad
Marc Pollefeys
HAI, AI4TS
103
0
0
16 Mar 2025
SAM2-ELNet: Label Enhancement and Automatic Annotation for Remote Sensing Segmentation
Jianhao Yang
Wenshuo Yu
Yuanchao Lv
Jiance Sun
Bokang Sun
Mingyang Liu
81
0
0
16 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
Heikki Kälviäinen
SSL
494
0
0
14 Mar 2025
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo
Seo Jin Lee
Seungwoo Lee
Seohyung Hong
Hyungseok Seo
Kyungsu Kim
80
0
0
14 Mar 2025
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang
Zhitong Xiong
Chenying Liu
Adam J. Stewart
Thomas Dujardin
...
Angelos Zavras
Franziska Gerken
Ioannis Papoutsis
Laura Leal-Taixé
Xiao Xiang Zhu
120
4
0
14 Mar 2025
SpaceSeg: A High-Precision Intelligent Perception Segmentation Method for Multi-Spacecraft On-Orbit Targets
Hao Liu
Pengyu Guo
Siyuan Yang
Zeqing Jiang
Qinglei Hu
Dongyu Li
55
0
0
14 Mar 2025
Similarity-Aware Token Pruning: Your VLM but Faster
Ahmadreza Jeddi
Negin Baghbanzadeh
Elham Dolatabadi
Babak Taati
3DV, VLM
119
2
0
14 Mar 2025
Transformers without Normalization
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
OffRL, ViT
160
20
0
13 Mar 2025
Beyond Atoms: Enhancing Molecular Pretrained Representations with 3D Space Modeling
Shuqi Lu
Xiaohong Ji
Bohang Zhang
Lin Yao
Siyuan Liu
Zhifeng Gao
Linfeng Zhang
Guolin Ke
AI4CE
128
1
0
13 Mar 2025
Masked Sensory-Temporal Attention for Sensor Generalization in Quadruped Locomotion
Dikai Liu
Tianwei Zhang
Jianxiong Yin
Simon See
265
1
0
13 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLM, VLM
Presented at ResearchTrend Connect | VLM on 21 May 2025
230
0
0
13 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
Presented at ResearchTrend Connect | VLM on 14 Mar 2025
225
2
0
13 Mar 2025
CoStoDet-DDPM: Collaborative Training of Stochastic and Deterministic Models Improves Surgical Workflow Anticipation and Recognition
Kaixiang Yang
Xin Li
Qiang Li
Zhiwei Wang
93
0
0
13 Mar 2025
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang
Hongru Wang
Yansen Wang
Di Wang
Mingshuo Chen
...
Yangang Sun
Shuo Wang
L. Lan
Wenjing Yang
Jing Zhang
Mamba
122
3
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Yu Guo
118
6
0
13 Mar 2025
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
Leonard Waldmann
Ando Shah
Yi Wang
Nils Lehmann
Adam J. Stewart
Zhitong Xiong
Xiao Xiang Zhu
Stefan Bauer
John Chuang
74
4
0
13 Mar 2025
Towards Graph Foundation Models: A Transferability Perspective
Yansen Wang
Wenqi Fan
Suhang Wang
Yao Ma
88
1
0
13 Mar 2025
A Self-supervised Motion Representation for Portrait Video Generation
Qiyuan Zhang
Chenyu Wu
Wenzhang Sun
Huaize Liu
Donglin Di
Wei Chen
Changqing Zou
VGen
111
0
0
13 Mar 2025
MotionLab: Unified Human Motion Generation and Editing via the Motion-Condition-Motion Paradigm
Ziyan Guo
Zeyu Hu
Na Zhao
De Wen Soh
VGen
205
3
0
13 Mar 2025
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer
Yury Belousov
S. Voloshynovskiy
AAML
85
0
0
13 Mar 2025
Interactive Multimodal Fusion with Temporal Modeling
Jun-chen Yu
Yongqi Wang
Lei Wang
Yang Zheng
Shengfan Xu
103
1
0
13 Mar 2025
Multi-Modal Foundation Models for Computational Pathology: A Survey
Dong Li
Guihong Wan
Xintao Wu
Xinyu Wu
Xiaohui Chen
Yi He
Christine G. Lian
Peter K. Sorger
Yevgeniy R. Semenov
Chen Zhao
MedIm
123
0
0
12 Mar 2025
Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging
Minjae Chung
Jong Bum Won
Ganghyun Kim
Yujin Kim
Utku Ozbulak
MedIm
198
0
0
12 Mar 2025
Evaluation of state-of-the-art deep learning models in the segmentation of the heart ventricles in parasternal short-axis echocardiograms
Julian Rene Cuellar Buritica
Vu Dinh
Manjula Burri
Julie Roelandts
James Wendling
Jon D. Klingensmith
111
0
0
12 Mar 2025
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
Chuyu Zhang
Xueyang Yu
Peiyan Gu
Xuming He
CLL
144
0
0
12 Mar 2025
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan
Maciej K. Wozniak
Marvin Klingner
Camille Maurice
B. R. Kiran
S. Yogamani
120
0
0
12 Mar 2025
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang
Yifei Liu
Yingdong Shi
Chong Li
Anqi Pang
Sibei Yang
Jingyi Yu
Kan Ren
ViT
251
0
0
12 Mar 2025
HaploVL: A Single-Transformer Baseline for Multi-Modal Understanding
Rui Yang
Lin Song
Yicheng Xiao
Runhui Huang
Yixiao Ge
Ying Shan
Hengshuang Zhao
MLLM
108
3
0
12 Mar 2025
Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
Wenyi Lian
Joakim Lindblad
Patrick Micke
Natasa Sladoje
102
1
0
12 Mar 2025
Pre-trained Models Succeed in Medical Imaging with Representation Similarity Degradation
Wenqiang Zu
Shenghao Xie
Hao Chen
Lei Ma
MedIm
143
0
0
11 Mar 2025