Masked Autoencoders Are Scalable Vision Learners
arXiv:2111.06377 (versions: v1, v2, v3 (latest))

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
Title
USat: A Unified Self-Supervised Encoder for Multi-Sensor Satellite Imagery
Jeremy Irvin
Lucas Tao
Joanne Zhou
Yuntao Ma
Langston Nashold
Benjamin Liu
Andrew Y. Ng
ViT
101
23
0
02 Dec 2023
SASSL: Enhancing Self-Supervised Learning via Neural Style Transfer
Renan A. Rojas-Gomez
Karan Singhal
Ali Etemad
Alex Bijamov
Warren Morningstar
Philip Mansfield
94
1
0
02 Dec 2023
Beyond Accuracy: Statistical Measures and Benchmark for Evaluation of Representation from Self-Supervised Learning
Jiantao Wu
Shentong Mo
Sara Atito
Josef Kittler
Zhenhua Feng
Muhammad Awais
SSL
77
3
0
02 Dec 2023
Local Masking Meets Progressive Freezing: Crafting Efficient Vision Transformers for Self-Supervised Learning
Utku Mert Topcuoglu
Erdem Akagündüz
88
1
0
02 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
98
42
0
02 Dec 2023
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
82
14
0
02 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL, ViT
67
3
0
01 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM, VLM
94
21
0
01 Dec 2023
Sequential Modeling Enables Scalable Learning for Large Vision Models
Yutong Bai
Xinyang Geng
K. Mangalam
Amir Bar
Alan Yuille
Trevor Darrell
Jitendra Malik
Alexei A. Efros
MLLM, VLM
91
169
0
01 Dec 2023
EfficientSAM: Leveraged Masked Image Pretraining for Efficient Segment Anything
Yunyang Xiong
Bala Varadarajan
Lemeng Wu
Xiaoyu Xiang
Fanyi Xiao
...
Dilin Wang
Fei Sun
Forrest N. Iandola
Raghuraman Krishnamoorthi
Vikas Chandra
VLM
107
160
0
01 Dec 2023
SPOT: Self-Training with Patch-Order Permutation for Object-Centric Learning with Autoregressive Transformers
Ioannis Kakogeorgiou
Spyros Gidaris
Konstantinos Karantzalos
N. Komodakis
ViT, OCL
131
16
0
01 Dec 2023
Learning from One Continuous Video Stream
João Carreira
Michael King
Viorica Patraucean
Dilara Gokay
Catalin Ionescu
...
Joseph Heyward
Carl Doersch
Y. Aytar
Dima Damen
Andrew Zisserman
CLL
91
6
0
01 Dec 2023
Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting
Haotian Gao
Renhe Jiang
Zheng Dong
Jinliang Deng
Yuxin Ma
Xuan Song
AI4TS
105
21
0
01 Dec 2023
Self-Supervised Learning of Spatial Acoustic Representation with Cross-Channel Signal Reconstruction and Multi-Channel Conformer
Bing Yang
Xiaofei Li
SSL
115
3
0
01 Dec 2023
Learning Anatomically Consistent Embedding for Chest Radiography
Ziyu Zhou
Haozhe Luo
Jiaxuan Pang
Xiaowei Ding
Michael B. Gotway
Jianming Liang
SSL
89
5
0
01 Dec 2023
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Xuan-Bac Nguyen
Xin Li
Pawan Sinha
Samee U. Khan
Khoa Luu
ViT, MedIm
98
0
0
30 Nov 2023
InstructSeq: Unifying Vision Tasks with Instruction-conditioned Multi-modal Sequence Generation
Rongyao Fang
Shilin Yan
Zhaoyang Huang
Jingqiu Zhou
Hao Tian
Jifeng Dai
Hongsheng Li
MLLM
108
14
0
30 Nov 2023
DEVIAS: Learning Disentangled Video Representations of Action and Scene for Holistic Video Understanding
Kyungho Bae
Geo Ahn
Youngrae Kim
Jinwoo Choi
75
3
0
30 Nov 2023
Initializing Models with Larger Ones
Zhiqiu Xu
Yanjie Chen
Kirill Vishniakov
Yida Yin
Zhiqiang Shen
Trevor Darrell
Lingjie Liu
Zhuang Liu
95
21
0
30 Nov 2023
Merlin:Empowering Multimodal LLMs with Foresight Minds
En Yu
Liang Zhao
Yana Wei
Jinrong Yang
Dongming Wu
...
Haoran Wei
Tiancai Wang
Zheng Ge
Xiangyu Zhang
Wenbing Tao
LRM
140
27
0
30 Nov 2023
Stochastic Vision Transformers with Wasserstein Distance-Aware Attention
Franciskus Xaverius Erick
Mina Rezaei
Johanna P. Müller
Bernhard Kainz
51
0
0
30 Nov 2023
Perceptual Group Tokenizer: Building Perception with Iterative Grouping
Zhiwei Deng
Ting Chen
Yang Li
ViT, VLM
75
2
0
30 Nov 2023
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
Raviteja Vemulapalli
Hadi Pouransari
Fartash Faghri
Sachin Mehta
Mehrdad Farajtabar
Mohammad Rastegari
Oncel Tuzel
145
11
0
30 Nov 2023
Quantification of cardiac capillarization in single-immunostained myocardial slices using weakly supervised instance segmentation
Zhao Zhang
Xiwen Chen
William Richardson
Bruce Z. Gao
Abolfazl Razi
Tong Ye
40
1
0
30 Nov 2023
Back to 3D: Few-Shot 3D Keypoint Detection with Back-Projected 2D Features
Thomas Wimmer
Peter Wonka
M. Ovsjanikov
118
13
0
29 Nov 2023
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo
Yuxuan Mu
Muhammad Gohar Javed
Sen Wang
Li Cheng
VGen
105
145
0
29 Nov 2023
Do text-free diffusion models learn discriminative visual representations?
Soumik Mukhopadhyay
M. Gwilliam
Yosuke Yamaguchi
Vatsal Agarwal
Namitha Padmanabhan
Archana Swaminathan
Dinesh Manocha
Abhinav Shrivastava
DiffM
136
14
1
29 Nov 2023
SODA: Bottleneck Diffusion Models for Representation Learning
Drew A. Hudson
Daniel Zoran
Mateusz Malinowski
Andrew Kyle Lampinen
Andrew Jaegle
James L. McClelland
Loic Matthey
Felix Hill
Alexander Lerchner
DiffM
108
56
0
29 Nov 2023
DiG-IN: Diffusion Guidance for Investigating Networks -- Uncovering Classifier Differences Neuron Visualisations and Visual Counterfactual Explanations
Maximilian Augustin
Yannic Neuhaus
Matthias Hein
DiffM
113
5
0
29 Nov 2023
PillarNeSt: Embracing Backbone Scaling and Pretraining for Pillar-based 3D Object Detection
Weixin Mao
Tiancai Wang
Diankun Zhang
Junjie Yan
Osamu Yoshie
3DPC
77
8
0
29 Nov 2023
Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning
Yiwen Ye
Yutong Xie
Jianpeng Zhang
Ziyang Chen
Qi Wu
Yong-quan Xia
CLL
64
19
0
29 Nov 2023
LanGWM: Language Grounded World Model
Rudra P. K. Poudel
Harit Pandya
Chao Zhang
Roberto Cipolla
91
5
0
29 Nov 2023
Synchronizing Vision and Language: Bidirectional Token-Masking AutoEncoder for Referring Image Segmentation
Minhyeok Lee
Dogyoon Lee
Jungho Lee
Suhwan Cho
Heeseung Choi
Ig-Jae Kim
Sangyoun Lee
72
0
0
29 Nov 2023
C3Net: Compound Conditioned ControlNet for Multimodal Content Generation
Juntao Zhang
Yuehuai Liu
Yu-Wing Tai
Chi-Keung Tang
DiffM
78
5
0
29 Nov 2023
Efficient Stitchable Task Adaptation
Haoyu He
Zizheng Pan
Jing Liu
Jianfei Cai
Bohan Zhuang
132
3
0
29 Nov 2023
Meta Co-Training: Two Views are Better than One
Jay C. Rothenberger
Dimitrios I. Diochnos
VLM
168
3
0
29 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
161
0
0
28 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames
Shuming Liu
Chen-Da Liu-Zhang
Chen Zhao
Guohao Li
121
29
0
28 Nov 2023
BIM: Block-Wise Self-Supervised Learning with Masked Image Modeling
Yixuan Luo
Mengye Ren
Sai Qian Zhang
69
0
0
28 Nov 2023
No Representation Rules Them All in Category Discovery
S. Vaze
Andrea Vedaldi
Andrew Zisserman
OOD
92
36
0
28 Nov 2023
LLaFS: When Large Language Models Meet Few-Shot Segmentation
Lanyun Zhu
Tianrun Chen
Deyi Ji
Jieping Ye
Jun Liu
VLM
122
42
0
28 Nov 2023
Large Model Based Referring Camouflaged Object Detection
Shupeng Cheng
Ge-Peng Ji
Pengda Qin
Deng-Ping Fan
Bowen Zhou
Peng Xu
ObjD
64
8
0
28 Nov 2023
Rescuing referral failures during automated diagnosis of domain-shifted medical images
Anuj Srivastava
Karm Patel
Pradeep Shenoy
D. Sridharan
OOD
73
0
0
28 Nov 2023
MultiModal-Learning for Predicting Molecular Properties: A Framework Based on Image and Graph Structures
Zhuoyuan Wang
Jiacong Mi
Shan Lu
Jieyue He
67
2
0
28 Nov 2023
HandyPriors: Physically Consistent Perception of Hand-Object Interactions with Differentiable Priors
Shutong Zhang
Yi-Ling Qiao
Guanglei Zhu
Eric Heiden
Dylan Turpin
Jingzhou Liu
Ming-Chyuan Lin
Miles Macklin
Animesh Garg
86
2
0
28 Nov 2023
CLAP: Isolating Content from Style through Contrastive Learning with Augmented Prompts
Yichao Cai
Yuhang Liu
Zhen Zhang
Javen Qinfeng Shi
CLIP, VLM
159
8
0
28 Nov 2023
Making Self-supervised Learning Robust to Spurious Correlation via Learning-speed Aware Sampling
Weicheng Zhu
Sheng Liu
C. Fernandez‐Granda
N. Razavian
127
1
0
27 Nov 2023
Small and Dim Target Detection in IR Imagery: A Review
Nikhil Kumar
Pravendra Singh
ObjD
82
4
0
27 Nov 2023
Diffusion-TTA: Test-time Adaptation of Discriminative Models via Generative Feedback
Mihir Prabhudesai
Tsung-Wei Ke
Alexander C. Li
Deepak Pathak
Katerina Fragkiadaki
TTA
84
15
0
27 Nov 2023
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
99
20
0
27 Nov 2023