Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,778 papers shown
Implicit-Zoo: A Large-Scale Dataset of Neural Implicit Functions for 2D Images and 3D Scenes
Qi Ma
Danda Pani Paudel
E. Konukoglu
Luc Van Gool
108
6
0
25 Jun 2024
Masked Generative Extractor for Synergistic Representation and 3D Generation of Point Clouds
Hongliang Zeng
Ping Zhang
Fang Li
Jiahua Wang
Tingyu Ye
Pengteng Guo
3DPC
112
0
0
25 Jun 2024
MM-SpuBench: Towards Better Understanding of Spurious Biases in Multimodal LLMs
Wenqian Ye
Guangtao Zheng
Yunsheng Ma
Xu Cao
Bolin Lai
James M. Rehg
Aidong Zhang
91
15
0
24 Jun 2024
BrainMAE: A Region-aware Self-supervised Learning Framework for Brain Signals
Yifan Yang
Yutong Mao
Xufu Liu
Xiao Liu
64
3
0
24 Jun 2024
Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs
Shengbang Tong
Ellis L Brown
Penghao Wu
Sanghyun Woo
Manoj Middepogu
...
Xichen Pan
Austin Wang
Rob Fergus
Yann LeCun
Saining Xie
3DV, MLLM
166
377
0
24 Jun 2024
Towards Open Respiratory Acoustic Foundation Models: Pretraining and Benchmarking
Yuwei Zhang
Tong Xia
Jing Han
Yu Wu
Georgios Rizos
Yang Liu
Mohammed Mosuily
Jagmohan Chauhan
Cecilia Mascolo
AI4CE
71
12
0
23 Jun 2024
CAVM: Conditional Autoregressive Vision Model for Contrast-Enhanced Brain Tumor MRI Synthesis
Lujun Gui
Chuyang Ye
Tianyi Yan
MedIm, DiffM
74
2
0
23 Jun 2024
Beyond the Doors of Perception: Vision Transformers Represent Relations Between Objects
Michael A. Lepori
Alexa R. Tartaglini
Wai Keen Vong
Thomas Serre
Brenden M. Lake
Ellie Pavlick
91
4
0
22 Jun 2024
SAM-EG: Segment Anything Model with Egde Guidance framework for efficient Polyp Segmentation
Quoc-Huy Trinh
Hai-Dang Nguyen
Bao-Tram Nguyen Ngoc
Debesh Jha
Ulas Bagci
Minh-Triet Tran
MedIm
67
4
0
21 Jun 2024
Computation-Efficient Semi-Supervised Learning for ECG-based Cardiovascular Diseases Detection
Rushuang Zhou
Zijun Liu
Lei A. Clifton
David Clifton
Kannie W. Y. Chan
Yuan-Ting Zhang
Yining Dong
56
1
0
20 Jun 2024
Two-Stage Depth Enhanced Learning with Obstacle Map For Object Navigation
Yanwei Zheng
Shaopu Feng
Bowen Huang
Changrui Li
Xiao Zhang
Dongxiao Yu
121
0
0
20 Jun 2024
Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation
Jiaming Zhou
Teli Ma
Kun-Yu Lin
Ronghe Qiu
Zifan Wang
Junwei Liang
149
7
0
20 Jun 2024
A Pure Transformer Pretraining Framework on Text-attributed Graphs
Yu Song
Haitao Mao
Jiachen Xiao
Jingzhe Liu
Zhikai Chen
Wei Jin
Carl Yang
Jiliang Tang
Hui Liu
AI4CE
94
4
0
19 Jun 2024
Liveness Detection in Computer Vision: Transformer-based Self-Supervised Learning for Face Anti-Spoofing
Arman Keresh
Pakizar Shamoi
72
7
0
19 Jun 2024
Transferable Tactile Transformers for Representation Learning Across Diverse Sensors and Tasks
Jialiang Zhao
Yuxiang Ma
Lirui Wang
Edward H. Adelson
98
26
0
19 Jun 2024
Conditional score-based diffusion models for solving inverse problems in mechanics
Agnimitra Dasgupta
Harisankar Ramaswamy
Javier Murgoitio-Esandi
Ken Foo
Runze Li
Qifa Zhou
Brendan Kennedy
Assad A. Oberai
DiffM, MedIm
103
4
0
19 Jun 2024
ChangeViT: Unleashing Plain Vision Transformers for Change Detection
Duowang Zhu
Xiaohu Huang
Haiyan Huang
Zhenfeng Shao
Q. Cheng
82
9
0
18 Jun 2024
GFM4MPM: Towards Geospatial Foundation Models for Mineral Prospectivity Mapping
A. Daruna
Vasily Zadorozhnyy
Georgina Lukoczki
Han-Pang Chiu
45
1
0
18 Jun 2024
Cephalometric Landmark Detection across Ages with Prototypical Network
Han Wu
Chong Wang
Lanzhuju Mei
Tong Yang
Min Zhu
Dinggang Shen
Zhiming Cui
80
4
0
18 Jun 2024
Semantic Graph Consistency: Going Beyond Patches for Regularizing Self-Supervised Vision Transformers
Chaitanya Devaguptapu
Sumukh K. Aithal
Shrinivas Ramasubramanian
Moyuru Yamada
Manohar Kaul
ViT
92
0
0
18 Jun 2024
VIRL: Volume-Informed Representation Learning towards Few-shot Manufacturability Estimation
Yu-hsuan Chen
Jonathan Cagan
Levent Burak Kara
87
2
0
18 Jun 2024
Autoregressive Image Generation without Vector Quantization
Tianhong Li
Yonglong Tian
He Li
Mingyang Deng
Kaiming He
DiffM
164
238
0
17 Jun 2024
Scaling the Codebook Size of VQGAN to 100,000 with a Utilization Rate of 99%
Lei Zhu
Fangyun Wei
Yanye Lu
Dong Chen
VLM
97
40
0
17 Jun 2024
AnyMaker: Zero-shot General Object Customization via Decoupled Dual-Level ID Injection
Lingjie Kong
Kai WU
Xiaobin Hu
Wenhui Han
Jinlong Peng
Chengming Xu
Donghao Luo
Jiangning Zhang
Chengjie Wang
Yanwei Fu
DiffM
74
0
0
17 Jun 2024
DocGenome: An Open Large-scale Scientific Document Benchmark for Training and Testing Multi-modal Large Language Models
Renqiu Xia
Song Mao
Xiangchao Yan
Hongbin Zhou
Bo Zhang
...
Yongwei Wang
Bin Wang
Junchi Yan
Fei Wu
Yu Qiao
109
12
0
17 Jun 2024
AnoPatch: Towards Better Consistency in Machine Anomalous Sound Detection
Anbai Jiang
Bing Han
Zhiqiang Lv
Yufeng Deng
Wei-Qiang Zhang
Xie Chen
Yanmin Qian
Jia Liu
Pingyi Fan
64
3
0
17 Jun 2024
Holistic-Motion2D: Scalable Whole-body Human Motion Generation in 2D Space
Yuan Wang
Zhao Wang
Junhao Gong
Di Huang
Tong He
...
J. Jiao
Xuetao Feng
Qi Dou
Shixiang Tang
Dan Xu
92
4
0
17 Jun 2024
Relational Learning in Pre-Trained Models: A Theory from Hypergraph Recovery Perspective
Yang Chen
Cong Fang
Zhouchen Lin
Bing Liu
57
1
0
17 Jun 2024
WeatherQA: Can Multimodal Language Models Reason about Severe Weather?
Chengqian Ma
Zhanxiang Hua
Alexandra Anderson-Frey
Vikram Iyer
Xin Liu
Lianhui Qin
106
6
0
17 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
121
40
0
17 Jun 2024
Duoduo CLIP: Efficient 3D Understanding with Multi-View Images
Han-Hung Lee
Yiming Zhang
Angel X. Chang
3DPC
162
4
0
17 Jun 2024
Harnessing Massive Satellite Imagery with Efficient Masked Image Modeling
Fengxiang Wang
H. Wang
Di Wang
Zonghao Guo
Zhenyu Zhong
Long Lan
Wenjing Yang
Jing Zhang
92
3
0
17 Jun 2024
HyperSIGMA: Hyperspectral Intelligence Comprehension Foundation Model
Di Wang
Meiqi Hu
Yao Jin
Yuchun Miao
Jiaqi Yang
...
Lefei Zhang
Chen Wu
Di Lin
Dacheng Tao
Liangpei Zhang
164
27
0
17 Jun 2024
Self-supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry
Boris Chidlovskii
L. Antsfeld
MDE, ViT
86
2
0
16 Jun 2024
ALPS: An Auto-Labeling and Pre-training Scheme for Remote Sensing Segmentation With Segment Anything Model
Song Zhang
Qingzhong Wang
Junyi Liu
Haoyi Xiong
98
1
0
16 Jun 2024
On the Effectiveness of Supervision in Asymmetric Non-Contrastive Learning
Jeongheon Oh
Kibok Lee
SSL
73
1
0
16 Jun 2024
ExPLoRA: Parameter-Efficient Extended Pre-Training to Adapt Vision Transformers under Domain Shifts
Samar Khanna
Medhanie Irgau
David B. Lobell
Stefano Ermon
VLM
150
6
0
16 Jun 2024
Occam's Razor for Self Supervised Learning: What is Sufficient to Learn Good Representations?
Mark Ibrahim
David Klindt
Randall Balestriero
SSL
130
5
1
15 Jun 2024
SemanticMIM: Marring Masked Image Modeling with Semantics Compression for General Visual Representation
Yike Yuan
Huanzhang Dou
Fengjun Guo
Xi Li
102
2
0
15 Jun 2024
PIG: Prompt Images Guidance for Night-Time Scene Parsing
Zhifeng Xie
Rui Qiu
Sen Wang
Xin Tan
Yuan Xie
Lizhuang Ma
83
2
0
15 Jun 2024
Self Pre-training with Topology- and Spatiality-aware Masked Autoencoders for 3D Medical Image Segmentation
Pengfei Gu
Yejia Zhang
Huimin Li
Chaoli Wang
Benlin Liu
MedIm
123
2
0
15 Jun 2024
Self-Supervised Representation Learning with Spatial-Temporal Consistency for Sign Language Recognition
Weichao Zhao
Wengang Zhou
Hezhen Hu
Min Wang
Houqiang Li
SLR
100
3
0
15 Jun 2024
The BabyView dataset: High-resolution egocentric videos of infants' and young children's everyday experiences
Bria Long
Violet Xiang
Stefan Stojanov
Robert Z. Sparks
Zi Yin
...
Steven Y. Feng
Chengxu Zhuang
V. Marchman
Daniel L. K. Yamins
Michael C. Frank
VGen, EgoV
102
3
0
14 Jun 2024
EFM3D: A Benchmark for Measuring Progress Towards 3D Egocentric Foundation Models
Julian Straub
Daniel DeTone
Tianwei Shen
Nan Yang
Chris Sweeney
Richard Newcombe
EgoV
95
9
0
14 Jun 2024
CarLLaVA: Vision language models for camera-only closed-loop driving
Katrin Renz
Long Chen
Ana-Maria Marcu
Jan Hünermann
Benoît Hanotte
Alice Karnsund
Jamie Shotton
Elahe Arani
Oleg Sinavski
VLM
138
26
0
14 Jun 2024
MapVision: CVPR 2024 Autonomous Grand Challenge Mapless Driving Tech Report
Zhongyu Yang
Mai Liu
Jinluo Xie
Yueming Zhang
Chen Shen
Wei Shao
Jichao Jiao
Tengfei Xing
Runbo Hu
Pengfei Xu
73
2
0
14 Jun 2024
What Does Softmax Probability Tell Us about Classifiers Ranking Across Diverse Test Conditions?
Weijie Tu
Weijian Deng
Liang Zheng
Tom Gedeon
94
1
0
14 Jun 2024
Exploring the Benefits of Vision Foundation Models for Unsupervised Domain Adaptation
B. B. Englert
Fabrizio J. Piva
Tommie Kerssies
Daan de Geus
Gijs Dubbelman
84
11
0
14 Jun 2024
Vision Language Modeling of Content, Distortion and Appearance for Image Quality Assessment
Fei Zhou
Zhicong Huang
Tianhao Gu
Guoping Qiu
CoGe, VLM
137
1
0
14 Jun 2024
A Two-Stage Masked Autoencoder Based Network for Indoor Depth Completion
Kailai Sun
Zhou Yang
Qianchuan Zhao
3DV, ViT, 3DPC, MDE
41
0
0
14 Jun 2024