Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,779 papers shown
Rethinking Transformers Pre-training for Multi-Spectral Satellite Imagery
Mubashir Noman
Muzammal Naseer
Hisham Cholakkal
Rao Muhammad Anwar
Salman Khan
Fahad Shahbaz Khan
ViT
100
46
0
08 Mar 2024
Fine-tuning a Multiple Instance Learning Feature Extractor with Masked Context Modelling and Knowledge Distillation
Juan Pisula
Katarzyna Bozek
66
2
0
08 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
108
12
0
08 Mar 2024
Tracking Meets LoRA: Faster Training, Larger Model, Stronger Performance
Liting Lin
Heng Fan
Zhipeng Zhang
Yaowei Wang
Yong-mei Xu
Haibin Ling
129
36
0
08 Mar 2024
Denoising Autoregressive Representation Learning
Yazhe Li
J. Bornschein
Ting Chen
DiffM
84
4
0
08 Mar 2024
Evaluating Text-to-Image Generative Models: An Empirical Study on Human Image Synthesis
Mu-Hwa Chen
Yi Liu
Jian Yi
Changran Xu
Qiuxia Lai
Hongliang Wang
Tsung-Yi Ho
Qiang Xu
EGVM
84
10
0
08 Mar 2024
MedFLIP: Medical Vision-and-Language Self-supervised Fast Pre-Training with Masked Autoencoder
Lei Li
Tianfang Zhang
Xinglin Zhang
Jiaqi Liu
Bingqi Ma
Yan-chun Luo
Tao Chen
MedIm
81
0
0
07 Mar 2024
Discriminative Sample-Guided and Parameter-Efficient Feature Space Adaptation for Cross-Domain Few-Shot Learning
Rashindrie Perera
Saman K. Halgamuge
99
2
0
07 Mar 2024
ACC-ViT : Atrous Convolution's Comeback in Vision Transformers
Nabil Ibtehaz
Ning Yan
Masood S. Mortazavi
Daisuke Kihara
ViT
97
3
0
07 Mar 2024
ComFe: An Interpretable Head for Vision Transformers
Evelyn J. Mannix
H. Bondell
Howard Bondell
VLM, ViT
99
1
0
07 Mar 2024
LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition
Jialu Shi
Zhiqiang Wei
Jie Nie
Lei Huang
SSL
91
0
0
06 Mar 2024
On the Effectiveness of Distillation in Mitigating Backdoors in Pre-trained Encoder
Tingxu Han
Shenghan Huang
Ziqi Ding
Weisong Sun
Yebo Feng
...
Hanwei Qian
Cong Wu
Quanjun Zhang
Yang Liu
Zhenyu Chen
54
8
0
06 Mar 2024
Popeye: A Unified Visual-Language Model for Multi-Source Ship Detection from Remote Sensing Imagery
Wei Zhang
Miaoxin Cai
Tong Zhang
Guoqiang Lei
Zhuang Yin
Xuerui Mao
76
8
0
06 Mar 2024
Multi-Grained Cross-modal Alignment for Learning Open-vocabulary Semantic Segmentation from Text Supervision
Yajie Liu
Pu Ge
Qingjie Liu
Di Huang
125
2
0
06 Mar 2024
DPOT: Auto-Regressive Denoising Operator Transformer for Large-Scale PDE Pre-Training
Zhongkai Hao
Chang Su
Songming Liu
Julius Berner
Chengyang Ying
Hang Su
A. Anandkumar
Jian Song
Jun Zhu
AI4TS, AI4CE
137
37
0
06 Mar 2024
World Models for Autonomous Driving: An Initial Survey
Yanchen Guan
Haicheng Liao
Zhenning Li
Jia Hu
Runze Yuan
Yunjian Li
Guohui Zhang
Chengzhong Xu
160
43
0
05 Mar 2024
Enhancing Vision-Language Pre-training with Rich Supervisions
Yuan Gao
Kunyu Shi
Pengkai Zhu
Edouard Belval
Oren Nuriel
Srikar Appalaraju
Shabnam Ghadar
Vijay Mahadevan
Zhuowen Tu
Stefano Soatto
VLM, CLIP
168
12
0
05 Mar 2024
HeAR -- Health Acoustic Representations
Sebastien Baur
Zaid Nabulsi
Wei-Hung Weng
Jake Garrison
Louis Blankemeier
...
Shwetak N. Patel
S. Shetty
Shruthi Prabhakara
Monde Muyoyeta
Diego Ardila
LM&MA
55
14
0
04 Mar 2024
Differentially Private Representation Learning via Image Captioning
Tom Sander
Yaodong Yu
Maziar Sanjabi
Alain Durmus
Yi-An Ma
Kamalika Chaudhuri
Chuan Guo
106
4
0
04 Mar 2024
Self-Supervised Facial Representation Learning with Facial Region Awareness
Zheng Gao
Ioannis Patras
SSL
93
12
0
04 Mar 2024
VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT
Yifang Xu
Yunzhuo Sun
Zien Xie
Benxiang Zhai
Sidan Du
78
7
0
04 Mar 2024
HyperPredict: Estimating Hyperparameter Effects for Instance-Specific Regularization in Deformable Image Registration
Aisha L. Shuaibu
Ivor J. A. Simpson
82
1
0
04 Mar 2024
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta
Shufan Li
Tyler Lixuan Zhu
Jitendra Malik
Trevor Darrell
K. Mangalam
ViT
76
6
0
04 Mar 2024
TopicDiff: A Topic-enriched Diffusion Approach for Multimodal Conversational Emotion Detection
Jiamin Luo
Jingjing Wang
Guodong Zhou
79
1
0
04 Mar 2024
NeuSpeech: Decode Neural signal as Speech
Yiqian Yang
Yiqun Duan
Qiang Zhang
Hyejeong Jo
Jinni Zhou
Won Hee Lee
Renjing Xu
Hui Xiong
88
6
0
04 Mar 2024
Vision-RWKV: Efficient and Scalable Visual Perception with RWKV-Like Architectures
Yuchen Duan
Weiyun Wang
Zhe Chen
Xizhou Zhu
Lewei Lu
Tong Lu
Yu Qiao
Hongsheng Li
Jifeng Dai
Wenhai Wang
ViT
95
50
0
04 Mar 2024
Kick Back & Relax++: Scaling Beyond Ground-Truth Depth with SlowTV & CribsTV
Jaime Spencer
Chris Russell
Simon Hadfield
Richard Bowden
MDE
97
7
0
03 Mar 2024
InfiMM-HD: A Leap Forward in High-Resolution Multimodal Understanding
Haogeng Liu
Quanzeng You
Xiaotian Han
Yiqi Wang
Bohan Zhai
Yongfei Liu
Yunzhe Tao
Huaibo Huang
Ran He
Hongxia Yang
MLLM
77
10
0
03 Mar 2024
Dynamic Adapter Meets Prompt Tuning: Parameter-Efficient Transfer Learning for Point Cloud Analysis
Xin Zhou
Dingkang Liang
Wei Xu
Xingkui Zhu
Yihan Xu
Zhikang Zou
Xiang Bai
94
28
0
03 Mar 2024
LUM-ViT: Learnable Under-sampling Mask Vision Transformer for Bandwidth Limited Optical Signal Acquisition
Lingfeng Liu
Dong Ni
Hangjie Yuan
ViT
95
0
0
03 Mar 2024
Learn Suspected Anomalies from Event Prompts for Video Anomaly Detection
Chenchen Tao
Chong Wang
Yuexian Zou
Xiaohao Peng
Xiaogang Xu
Jiangbo Qian
83
3
0
02 Mar 2024
BootTOD: Bootstrap Task-oriented Dialogue Representations by Aligning Diverse Responses
Weihao Zeng
Keqing He
Yejie Wang
Dayuan Fu
Weiran Xu
74
0
0
02 Mar 2024
Feature Alignment: Rethinking Efficient Active Learning via Proxy in the Context of Pre-trained Models
Ziting Wen
Oscar Pizarro
Stefan B. Williams
64
0
0
02 Mar 2024
Can Transformers Capture Spatial Relations between Objects?
Chuan Wen
Dinesh Jayaraman
Yang Gao
ViT
63
5
0
01 Mar 2024
Rethinking cluster-conditioned diffusion models
Nikolas Adaloglou
Tim Kaiser
Félix D. P. Michels
M. Kollmann
VLM
81
3
0
01 Mar 2024
VisionLLaMA: A Unified LLaMA Backbone for Vision Tasks
Xiangxiang Chu
Jianlin Su
Bo Zhang
Chunhua Shen
MLLM
108
12
0
01 Mar 2024
Learning and Leveraging World Models in Visual Representation Learning
Q. Garrido
Mahmoud Assran
Nicolas Ballas
Adrien Bardes
Laurent Najman
Yann LeCun
SSL
107
30
0
01 Mar 2024
Data-efficient Event Camera Pre-training via Disentangled Masked Modeling
Zhenpeng Huang
Chao Li
Hao Chen
Yongjian Deng
Yifeng Geng
Limin Wang
77
2
0
01 Mar 2024
Revisiting Disentanglement in Downstream Tasks: A Study on Its Necessity for Abstract Visual Reasoning
Ruiqian Nai
Zixin Wen
Ji Li
Yuanzhi Li
Yang Gao
96
2
0
01 Mar 2024
ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting
Chen Duan
Pei Fu
Shan Guo
Qianyi Jiang
Xiaoming Wei
VLM
97
5
0
01 Mar 2024
Semantics-enhanced Cross-modal Masked Image Modeling for Vision-Language Pre-training
Haowei Liu
Yaya Shi
Haiyang Xu
Chunfen Yuan
Qinghao Ye
...
Mingshi Yan
Ji Zhang
Fei Huang
Bing Li
Weiming Hu
VLM
96
0
0
01 Mar 2024
MaskLRF: Self-supervised Pretraining via Masked Autoencoding of Local Reference Frames for Rotation-invariant 3D Point Set Analysis
Takahiko Furuya
3DPC
99
2
0
01 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
168
211
0
29 Feb 2024
Humanoid Locomotion as Next Token Prediction
Ilija Radosavovic
Bike Zhang
Baifeng Shi
Jathushan Rajasegaran
Sarthak Kamat
Trevor Darrell
Koushil Sreenath
Jitendra Malik
LM&Ro
96
67
0
29 Feb 2024
Deep Learning for Cross-Domain Data Fusion in Urban Computing: Taxonomy, Advances, and Outlook
Xingchen Zou
Yibo Yan
Xixuan Hao
Yuehong Hu
Haomin Wen
...
Junbo Zhang
Yong Li
Tianrui Li
Yu Zheng
Yuxuan Liang
HAI, AI4TS
106
45
0
29 Feb 2024
Mirage: Cross-Embodiment Zero-Shot Policy Transfer with Cross-Painting
Lawrence Yunliang Chen
Kush Hari
K. Dharmarajan
Chenfeng Xu
Quan Vuong
Ken Goldberg
150
23
0
29 Feb 2024
BigGait: Learning Gait Representation You Want by Large Vision Models
Dingqiang Ye
Chao Fan
Jingzhe Ma
Xiaoming Liu
Shiqi Yu
CVBM, SLR
130
22
0
29 Feb 2024
VideoMAC: Video Masked Autoencoders Meet ConvNets
Gensheng Pei
Tao Chen
XiRuo Jiang
Huafeng Liu
Zeren Sun
Yazhou Yao
VGen
110
10
0
29 Feb 2024
A Simple yet Effective Network based on Vision Transformer for Camouflaged Object and Salient Object Detection
Chao Hao
Zitong Yu
Xin Liu
Jun Xu
Huanjing Yue
Jingyu Yang
ViT
130
7
0
29 Feb 2024
SwitchLight: Co-design of Physics-driven Architecture and Pre-training Framework for Human Portrait Relighting
Hoon Kim
Minje Jang
Wonjun Yoon
Jisoo Lee
Donghyun Na
Sanghyun Woo
AI4CE
108
24
0
29 Feb 2024