ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Masked Autoencoders Are Scalable Vision Learners
Versions: v1, v2, v3 (latest)

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
    ViT, TPM
arXiv: 2111.06377

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Think Twice Before Recognizing: Large Multimodal Models for General Fine-grained Traffic Sign Recognition
Yaozong Gan
Guang Li
Ren Togo
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
89
1
0
03 Sep 2024
An Examination of Offline-Trained Encoders in Vision-Based Deep Reinforcement Learning for Autonomous Driving
S. Mohammed
Alp Argun
Nicolas Bonnotte
Gerd Ascheid
OffRL
74
0
0
02 Sep 2024
Backdoor Defense through Self-Supervised and Generative Learning
Ivan Sabolić
Ivan Grubišić
Siniša Šegvić
AAML
113
0
0
02 Sep 2024
Understanding Multimodal Hallucination with Parameter-Free Representation Alignment
Yueqian Wang
Jianxin Liang
Yuxuan Wang
Huishuai Zhang
Dongyan Zhao
103
1
0
02 Sep 2024
IVGF: The Fusion-Guided Infrared and Visible General Framework
Fangcen Liu
Chenqiang Gao
Fang Chen
Pengcheng Li
Junjie Guo
Deyu Meng
158
0
0
02 Sep 2024
MaskMol: Knowledge-guided Molecular Image Pre-Training Framework for Activity Cliffs
Zhixiang Cheng
Hongxin Xiang
Pengsen Ma
Li Zeng
Xin Jin
...
Yang Deng
Bosheng Song
Xinxin Feng
Changhui Deng
Xiangxiang Zeng
76
0
0
02 Sep 2024
ViRED: Prediction of Visual Relations in Engineering Drawings
Chao Gu
Ke Lin
Yiyang Luo
Jiahui Hou
Xiang-Yang Li
78
0
0
02 Sep 2024
Towards Student Actions in Classroom Scenes: New Dataset and Baseline
Zhuolin Tan
Chenqiang Gao
Anyong Qin
Ruixin Chen
Tiecheng Song
Feng Yang
Deyu Meng
70
0
0
02 Sep 2024
Affordance-based Robot Manipulation with Flow Matching
Fan Zhang
Michael Gienger
167
14
0
02 Sep 2024
Self-Supervised Vision Transformers for Writer Retrieval
Tim Raven
Arthur Matei
Gernot A. Fink
ViT
71
1
0
01 Sep 2024
Make Your ViT-based Multi-view 3D Detectors Faster via Token Compression
Dingyuan Zhang
Dingkang Liang
Zichang Tan
Xiaoqing Ye
Cheng Zhang
Jingdong Wang
Xiang Bai
ViT
107
2
0
01 Sep 2024
FLUX that Plays Music
Zhengcong Fei
Mingyuan Fan
Changqian Yu
Junshi Huang
138
9
0
01 Sep 2024
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
Chia-Yu Hsu
Wenwen Li
Sizhe Wang
78
14
0
31 Aug 2024
RI-MAE: Rotation-Invariant Masked AutoEncoders for Self-Supervised Point Cloud Representation Learning
Kunming Su
Qiuxia Wu
Panpan Cai
Xiaogang Zhu
Xuequan Lu
Zhiyong Wang
Kun Hu
3DPC
84
4
0
31 Aug 2024
Vote&Mix: Plug-and-Play Token Reduction for Efficient Vision Transformer
Shuai Peng
Di Fu
Baole Wei
Yong Cao
Liangcai Gao
Zhi Tang
ViT
74
1
0
30 Aug 2024
BTMuda: A Bi-level Multi-source unsupervised domain adaptation framework for breast cancer diagnosis
Yuxiang Yang
Xinyi Zeng
Pinxian Zeng
Binyu Yan
Xi Wu
Jiliu Zhou
Yan Wang
OOD
59
2
0
30 Aug 2024
ConDense: Consistent 2D/3D Pre-training for Dense and Sparse Features from Multi-View Images
Xiaoshuai Zhang
Zhicheng Wang
Howard Zhou
Soham Ghosh
Danushen Gnanapragasam
Varun Jampani
Hao Su
Leonidas Guibas
DD
91
5
0
30 Aug 2024
A Survey of the Self Supervised Learning Mechanisms for Vision Transformers
Asifullah Khan
A. Sohail
Mustansar Fiaz
Mehdi Hassan
Tariq Habib Afridi
...
Muhammad Zaigham Zaheer
Kamran Ali
Tangina Sultana
Ziaurrehman Tanoli
Naeem Akhter
274
5
0
30 Aug 2024
Towards Modality-agnostic Label-efficient Segmentation with Entropy-Regularized Distribution Alignment
Liyao Tang
Zhe Chen
Shanshan Zhao
Chaoyue Wang
Dacheng Tao
107
0
0
29 Aug 2024
Adapting Vision-Language Models to Open Classes via Test-Time Prompt Tuning
Zhengqing Gao
Xiang Ao
Xu-Yao Zhang
Cheng-Lin Liu
VLM, VPVLM
89
0
0
29 Aug 2024
MICDrop: Masking Image and Depth Features via Complementary Dropout for Domain-Adaptive Semantic Segmentation
Linyan Yang
Lukas Hoyer
Mark Weber
Tobias Fischer
Dengxin Dai
Laura Leal-Taixé
Marc Pollefeys
Daniel Cremers
Luc Van Gool
MDE
103
4
0
29 Aug 2024
Audio xLSTMs: Learning Self-Supervised Audio Representations with xLSTMs
Sarthak Yadav
Sergios Theodoridis
Zheng-Hua Tan
133
3
0
29 Aug 2024
A Computational Framework for Modeling Emergence of Color Vision in the Human Brain
Atsunobu Kotani
Ren Ng
78
0
0
29 Aug 2024
A Simple and Generalist Approach for Panoptic Segmentation
Nedyalko Prisadnikov
Wouter Van Gansbeke
Danda Pani Paudel
Luc Van Gool
VLM
116
0
0
29 Aug 2024
Does Data-Efficient Generalization Exacerbate Bias in Foundation Models?
Dilermando Queiroz
Anderson Carlos
Maíra Fatoretto
Luis Filipe Nakayama
André Anjos
Lilian Berton
138
0
0
28 Aug 2024
wav2pos: Sound Source Localization using Masked Autoencoders
Axel Berg
Jens Gulin
Mark O'Connor
Chuteng Zhou
Karl Åström
Magnus Oskarsson
64
2
0
28 Aug 2024
GANs Conditioning Methods: A Survey
Anis Bourou
Valérie Mezger
Auguste Genovesio
EGVM, AI4CE
138
2
0
28 Aug 2024
Hierarchical Visual Categories Modeling: A Joint Representation Learning and Density Estimation Framework for Out-of-Distribution Detection
Jinglun Li
Xinyu Zhou
Pinxue Guo
Yixuan Sun
Yiwen Huang
Weifeng Ge
Wenqiang Zhang
93
2
0
28 Aug 2024
ClimDetect: A Benchmark Dataset for Climate Change Detection and Attribution
Sungduk Yu
Brian L. White
Anahita Bhiwandiwalla
Musashi Hinck
Matthew Lyle Olson
Tung Nguyen
Vasudev Lal
112
0
0
28 Aug 2024
GenRec: Unifying Video Generation and Recognition with Diffusion Models
Zejia Weng
Xitong Yang
Zhen Xing
Zuxuan Wu
Yu-Gang Jiang
VGen, DiffM
106
7
0
27 Aug 2024
Multi-Modal Instruction-Tuning Small-Scale Language-and-Vision Assistant for Semiconductor Electron Micrograph Analysis
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
104
1
0
27 Aug 2024
A Preliminary Exploration Towards General Image Restoration
Xiangtao Kong
Jinjin Gu
Yihao Liu
Wenlong Zhang
Xiangyu Chen
Yu Qiao
Chao Dong
DiffM
90
3
0
27 Aug 2024
CVPT: Cross-Attention help Visual Prompt Tuning adapt visual task
Lingyun Huang
Jianxu Mao
Yaonan Wang
Junfei Yi
Ziming Tao
VLM, VPVLM
88
2
0
27 Aug 2024
Unsupervised-to-Online Reinforcement Learning
Junsu Kim
Seohong Park
Sergey Levine
OnRL
100
5
0
27 Aug 2024
Pre-training Everywhere: Parameter-Efficient Fine-Tuning for Medical Image Analysis via Target Parameter Pre-training
Xingliang Lei
Yiwen Ye
Zhisong Wang
Ziyang Chen
Minglei Shu
Weidong (Tom) Cai
Yanning Zhang
Yong-quan Xia
106
1
0
27 Aug 2024
Sequence-aware Pre-training for Echocardiography Probe Movement Guidance
Haojun Jiang
Teng Wang
Zhenguo Sun
Yulin Wang
Yang Yue
...
Ning Jia
Meng Li
Shaqi Luo
Shiji Song
Gao Huang
64
1
0
27 Aug 2024
Advancing Humanoid Locomotion: Mastering Challenging Terrains with Denoising World Model Learning
Xinyang Gu
Yen-Jen Wang
Xiang Zhu
Chengming Shi
Yanjiang Guo
Yichen Liu
Jianyu Chen
97
45
0
26 Aug 2024
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
Peiyan Li
Hongtao Wu
Yan Huang
Chilam Cheang
Liang Wang
Tao Kong
VGen
95
13
0
26 Aug 2024
Affine steerers for structured keypoint description
Georg Bökman
Johan Edstedt
Michael Felsberg
Fredrik Kahl
LLMSV
74
2
0
26 Aug 2024
Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning
Wei An
Xiao Bi
Guanting Chen
Shanhuang Chen
Chengqi Deng
...
Chenggang Zhao
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Yuheng Zou
62
7
0
26 Aug 2024
GenFormer -- Generated Images are All You Need to Improve Robustness of Transformers on Small Datasets
Sven Oehri
Nikolas Ebert
Ahmed Abdullah
Didier Stricker
Oliver Wasenmüller
ViT
88
6
0
26 Aug 2024
Dual-Path Adversarial Lifting for Domain Shift Correction in Online Test-time Adaptation
Yushun Tang
Shuoshuo Chen
Zhihe Lu
Xinchao Wang
Zhihai He
110
1
0
26 Aug 2024
An Embedding is Worth a Thousand Noisy Labels
Francesco Di Salvo
Sebastian Doerrich
Ines Rieger
Christian Ledig
NoLa
153
0
0
26 Aug 2024
Hierarchical Network Fusion for Multi-Modal Electron Micrograph Representation Learning with Foundational Large Language Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Venkataramana Runkana
106
0
0
24 Aug 2024
Preliminary Investigations of a Multi-Faceted Robust and Synergistic Approach in Semiconductor Electron Micrograph Analysis: Integrating Vision Transformers with Large Language and Multimodal Models
Sakhinana Sagar Srinivas
Geethan Sannidhi
Sreeja Gangasani
Chidaksh Ravuru
Venkataramana Runkana
106
0
0
24 Aug 2024
Can Visual Foundation Models Achieve Long-term Point Tracking?
Görkay Aydemir
Weidi Xie
Fatma Guney
80
8
0
24 Aug 2024
Disentangled Generative Graph Representation Learning
Xinyue Hu
Zhibin Duan
Xinyang Liu
Yuxin Li
Bo Chen
Mingyuan Zhou
126
0
0
24 Aug 2024
SeA: Semantic Adversarial Augmentation for Last Layer Features from Unsupervised Representation Learning
Qi Qian
Yuanhong Xu
Juhua Hu
AAML
134
0
0
23 Aug 2024
A New Era in Computational Pathology: A Survey on Foundation and Vision-Language Models
Dibaloke Chanda
Milan Aryal
Nasim Yahya Soltani
Masoud Ganji
AI4CE, VLM
135
7
0
23 Aug 2024
NEST: Self-supervised Fast Conformer as All-purpose Seasoning to Speech Processing Tasks
He Huang
Taejin Park
Kunal Dhawan
Ivan Medennikov
Krishna Puvvada
Nithin Rao Koluguri
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
SSL, AI4TS
79
1
0
23 Aug 2024