Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
Title
AnimalFormer: Multimodal Vision Framework for Behavior-based Precision Livestock Farming
Ahmed Qazi
Taha Razzaq
Asim Iqbal
58
2
0
14 Jun 2024
ImageNet3D: Towards General-Purpose Object-Level 3D Understanding
Wufei Ma
Guanning Zeng
Guofeng Zhang
Qihao Liu
Letian Zhang
Adam Kortylewski
Yaoyao Liu
Alan Yuille
VLM, 3DV
89
10
0
13 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM, LRM
87
1
0
13 Jun 2024
4M-21: An Any-to-Any Vision Model for Tens of Tasks and Modalities
Roman Bachmann
Oğuzhan Fatih Kar
David Mizrahi
Ali Garjani
Mingfei Gao
David Griffiths
Jiaming Hu
Afshin Dehghan
Amir Zamir
MoE, VLM, MLLM
113
17
0
13 Jun 2024
Neural Assets: 3D-Aware Multi-Object Scene Synthesis with Image Diffusion Models
Ziyi Wu
Yulia Rubanova
Rishabh Kabra
Drew A. Hudson
Igor Gilitschenski
Yusuf Aytar
Sjoerd van Steenkiste
Kelsey R. Allen
Thomas Kipf
VGen, DiffM
104
9
0
13 Jun 2024
Towards Multilingual Audio-Visual Question Answering
Orchid Chetia Phukan
Priyabrata Mallick
Swarup Ranjan Behera
Aalekhya Satya Narayani
Arun Balaji Buduru
Rajesh Sharma
98
0
0
13 Jun 2024
T-JEPA: A Joint-Embedding Predictive Architecture for Trajectory Similarity Computation
Lihuan Li
Hao Xue
Yang Song
Flora Salim
124
1
0
13 Jun 2024
Efficient Multi-View Fusion and Flexible Adaptation to View Missing in Cardiovascular System Signals
Qihan Hu
Daomiao Wang
Hong Wu
Jian Liu
Cuiwei Yang
100
0
0
13 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
72
0
0
13 Jun 2024
UnO: Unsupervised Occupancy Fields for Perception and Forecasting
Ben Agro
Quinlan Sykora
Sergio Casas
Thomas Gilles
R. Urtasun
141
18
0
12 Jun 2024
Unveiling Incomplete Modality Brain Tumor Segmentation: Leveraging Masked Predicted Auto-Encoder and Divergence Learning
Zhongao Sun
Jiameng Li
Yuhan Wang
Jiarong Cheng
Qing Zhou
Chun Li
MedIm
102
0
0
12 Jun 2024
Strategies for Pretraining Neural Operators
Anthony Zhou
Cooper Lorsung
AmirPouya Hemmasian
Amir Barati Farimani
AI4CE
98
6
0
12 Jun 2024
A$^{2}$-MAE: A spatial-temporal-spectral unified remote sensing pre-training method based on anchor-aware masked autoencoder
Lixian Zhang
Yi Zhao
Runmin Dong
Jinxiao Zhang
Shuai Yuan
...
Weijia Li
Wei Liu
Wayne Zhang
Xue Jiang
Haohuan Fu
117
4
0
12 Jun 2024
A Concept-Based Explainability Framework for Large Multimodal Models
Jayneel Parekh
Pegah Khayatan
Mustafa Shukor
A. Newson
Matthieu Cord
102
18
0
12 Jun 2024
Sense Less, Generate More: Pre-training LiDAR Perception with Masked Autoencoders for Ultra-Efficient 3D Sensing
Sina Tayebati
Theja Tulabandhula
A. R. Trivedi
89
6
0
12 Jun 2024
Enhancing End-to-End Autonomous Driving with Latent World Model
Yingyan Li
Lue Fan
Jiawei He
Yuqi Wang
Yuntao Chen
Zhaoxiang Zhang
Tieniu Tan
169
22
0
12 Jun 2024
An Image is Worth 32 Tokens for Reconstruction and Generation
Qihang Yu
Mark Weber
XueQing Deng
Xiaohui Shen
Daniel Cremers
Liang-Chieh Chen
VLM, ViT
172
104
0
11 Jun 2024
Vision Model Pre-training on Interleaved Image-Text Data via Latent Compression Learning
Chenyu Yang
Xizhou Zhu
Jinguo Zhu
Weijie Su
Junjie Wang
...
Lewei Lu
Bin Li
Jie Zhou
Yu Qiao
Jifeng Dai
VLM, CLIP
87
6
0
11 Jun 2024
BAKU: An Efficient Transformer for Multi-Task Policy Learning
Siddhant Haldar
Zhuoran Peng
Lerrel Pinto
OffRL
114
43
0
11 Jun 2024
Autoregressive Pretraining with Mamba in Vision
Sucheng Ren
Xianhang Li
Haoqin Tu
Feng Wang
Fangxun Shu
...
L. Yang
Peng Wang
Heng Wang
Alan Yuille
Cihang Xie
Mamba
127
12
0
11 Jun 2024
Towards Fundamentally Scalable Model Selection: Asymptotically Fast Update and Selection
Wenxiao Wang
Weiming Zhuang
Lingjuan Lyu
103
0
0
11 Jun 2024
Visual Representation Learning with Stochastic Frame Prediction
Huiwon Jang
Dongyoung Kim
Junsu Kim
Jinwoo Shin
Pieter Abbeel
Younggyo Seo
99
3
0
11 Jun 2024
Let Go of Your Labels with Unsupervised Transfer
Artyom Gadetsky
Yulun Jiang
Maria Brbić
VLM
100
8
0
11 Jun 2024
RS-DFM: A Remote Sensing Distributed Foundation Model for Diverse Downstream Tasks
Zhechao Wang
Peirui Cheng
Pengju Tian
Yuchao Wang
Mingxin Chen
Shujing Duan
Zhirui Wang
Xinming Li
Xian Sun
70
2
0
11 Jun 2024
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
96
7
0
11 Jun 2024
UVIS: Unsupervised Video Instance Segmentation
Shuaiyi Huang
Saksham Suri
Kamal Gupta
Sai Saketh Rambhatla
Ser-Nam Lim
Abhinav Shrivastava
VLM
81
3
0
11 Jun 2024
Merlin: A Vision Language Foundation Model for 3D Computed Tomography
Louis Blankemeier
Joseph Paul Cohen
Ashwin Kumar
Dave Van Veen
Syed Jamal Safdar Gardezi
...
Andrew L. Wentland
C. Langlotz
Jason Hom
S. Gatidis
Akshay S. Chaudhari
LM&MA, MedIm
91
41
0
10 Jun 2024
NeuroMoCo: A Neuromorphic Momentum Contrast Learning Method for Spiking Neural Networks
Yuqi Ma
Huamin Wang
Hangchi Shen
Xuemei Chen
Shukai Duan
Shiping Wen
128
0
0
10 Jun 2024
UnSupDLA: Towards Unsupervised Document Layout Analysis
Talha Uddin Sheikh
Tahira Shehzadi
K. Hashmi
Didier Stricker
Muhammad Zeshan Afzal
81
2
0
10 Jun 2024
Extending Segment Anything Model into Auditory and Temporal Dimensions for Audio-Visual Segmentation
Juhyeong Seon
Woobin Im
Sebin Lee
Jumin Lee
Sung-eui Yoon
92
2
0
10 Jun 2024
An Open and Large-Scale Dataset for Multi-Modal Climate Change-aware Crop Yield Predictions
Fudong Lin
Kaleb Guillot
Summer Crawford
Yihe Zhang
Xu Yuan
Nian-Feng Tzeng
AI4Cl, AI4CE
89
7
0
10 Jun 2024
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
114
0
0
10 Jun 2024
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Donghu Kim
Hojoon Lee
Kyungmin Lee
Dongyoon Hwang
Jaegul Choo
OffRL
87
1
0
10 Jun 2024
VCR: A Task for Pixel-Level Complex Reasoning in Vision Language Models via Restoring Occluded Text
Tianyu Zhang
Suyuchen Wang
Lu Li
Ge Zhang
Perouz Taslakian
Sai Rajeswar
Jie Fu
Bang Liu
Yoshua Bengio
116
5
0
10 Jun 2024
SAM-PM: Enhancing Video Camouflaged Object Detection using Spatio-Temporal Attention
Muhammad Nawfal Meeran
Gokul Adethya T
Bhanu Pratyush Mantha
87
4
0
09 Jun 2024
Utilizing Grounded SAM for self-supervised frugal camouflaged human detection
Matthias Pijarowski
Alexander Wolpert
Martin Heckmann
Michael Teutsch
80
1
0
09 Jun 2024
CorrMAE: Pre-training Correspondence Transformers with Masked Autoencoder
Tangfei Liao
Xiaoqin Zhang
Guobao Xiao
Min Li
Tao Wang
Mang Ye
68
1
0
09 Jun 2024
Medical Vision Generalist: Unifying Medical Imaging Tasks in Context
Sucheng Ren
Xiaoke Huang
Xianhang Li
Junfei Xiao
Jieru Mei
Zeyu Wang
Alan Yuille
Yuyin Zhou
MedIm
89
9
0
08 Jun 2024
Training-Free Robust Interactive Video Object Segmentation
Xiaoli Wei
Zhaoqing Wang
Yandong Guo
Chunxia Zhang
Tongliang Liu
Mingming Gong
VLM, VOS
80
1
0
08 Jun 2024
Revisiting Non-Autoregressive Transformers for Efficient Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Jiayi Guo
Jinyi Hu
Zhiyuan Liu
Shiji Song
Yuan Yao
Gao Huang
84
17
0
08 Jun 2024
Weakly Supervised Set-Consistency Learning Improves Morphological Profiling of Single-Cell Images
Heming Yao
Phil Hanslovsky
Jan-Christian Huetter
Burkhard Hoeckendorf
David Richmond
73
5
0
08 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu
Yiran Guan
Dingkang Liang
Yuchao Chen
Yuliang Liu
Xiang Bai
MoE
84
6
0
07 Jun 2024
Parameter-Inverted Image Pyramid Networks
Xizhou Zhu
Xue Yang
Zhaokai Wang
Hao Li
Wenhan Dou
Junqi Ge
Lewei Lu
Ping Luo
Jifeng Dai
79
0
0
06 Jun 2024
M3LEO: A Multi-Modal, Multi-Label Earth Observation Dataset Integrating Interferometric SAR and RGB Data
Matthew J Allen
Francisco Dorr
Joseph A. Gallego-Mejia
Laura Martínez-Ferrer
Anna Jungbluth
Freddie Kalaitzis
Raúl Ramos-Pollán
90
4
0
06 Jun 2024
FPN-fusion: Enhanced Linear Complexity Time Series Forecasting Model
Chu Li
Pingjia Xiao
Q. Yuan
AI4TS
51
0
0
06 Jun 2024
Road Network Representation Learning with the Third Law of Geography
Haicang Zhou
Weiming Huang
Yile Chen
Tiantian He
Gao Cong
Yew-Soon Ong
57
4
0
06 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
137
6
0
06 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Qifeng Chen
Yu Guo
VGen
275
17
0
06 Jun 2024
Alignment Calibration: Machine Unlearning for Contrastive Learning under Auditing
Yihan Wang
Yiwei Lu
Guojun Zhang
Franziska Boenisch
Adam Dziedzic
Yaoliang Yu
Xiao-Shan Gao
MU
106
1
0
05 Jun 2024
Hi5: 2D Hand Pose Estimation with Zero Human Annotation
Masum Hasan
Cengiz Ozel
Nina Long
Alexander Martin
Samuel Potter
Tariq Adnan
Sangwu Lee
Amir Zadeh
Ehsan Hoque
DiffM, 3DH
62
0
0
05 Jun 2024