Masked Autoencoders Are Scalable Vision Learners

11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT, TPM

Papers citing "Masked Autoencoders Are Scalable Vision Learners"

50 / 4,777 papers shown
Real-Time Localization and Bimodal Point Pattern Analysis of Palms Using UAV Imagery
Kangning Cui
Wei Tang
Rongkun Zhu
Manqi Wang
Gregory Larsen
...
Jordan Karubian
Raymond H. Chan
R. Plemmons
Jean-Michel Morel
Miles Silman
61
4
0
14 Oct 2024
EchoApex: A General-Purpose Vision Foundation Model for Echocardiography
A. Amadou
Yanzhe Zhang
Sebastien Piat
Paul Klein
Ingo Schmuecking
Tiziano Passerini
Puneet Sharma
95
5
0
14 Oct 2024
Browsing without Third-Party Cookies: What Do You See?
Maxwell Lin
Shihan Lin
Helen Wu
Karen Wang
Xiaowei Yang
BDL
268
14
0
14 Oct 2024
Adaptive Diffusion Terrain Generator for Autonomous Uneven Terrain Navigation
Youwei Yu
Junhong Xu
Lantao Liu
63
0
0
14 Oct 2024
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling
Wenze Liu
Le Zhuo
Yi Xin
Sheng Xia
Peng Gao
Xiangyu Yue
125
9
0
14 Oct 2024
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
Kai Zhang
Pengzhen Ren
Bingqian Lin
Junfan Lin
Shikui Ma
Hang Xu
Xiaodan Liang
61
2
0
14 Oct 2024
V2M: Visual 2-Dimensional Mamba for Image Representation Learning
Chengkun Wang
Wenzhao Zheng
Yuanhui Huang
Jie Zhou
Jiwen Lu
Mamba
37
2
0
14 Oct 2024
FasterDiT: Towards Faster Diffusion Transformers Training without Architecture Modification
J. Yao
Wang Cheng
Wenyu Liu
Xinggang Wang
91
13
0
14 Oct 2024
Revisiting and Benchmarking Graph Autoencoders: A Contrastive Learning Perspective
Jintang Li
Ruofan Wu
Yuchang Zhu
Huizhe Zhang
Xinzhou Jin
Guibin Zhang
Zulun Zhu
Zibin Zheng
Liang Chen
SSL
82
0
0
14 Oct 2024
LADMIM: Logical Anomaly Detection with Masked Image Modeling in Discrete Latent Space
Shunsuke Sakai
Tatsuhito Hasegawa
Makoto Koshino
85
1
0
14 Oct 2024
Graph Masked Autoencoder for Spatio-Temporal Graph Learning
Qianru Zhang
Haixin Wang
Siu-Ming Yiu
Hongzhi Yin
AI4TS
54
1
0
14 Oct 2024
Exploring Semi-Supervised Learning for Online Mapping
Adam Lilja
Erik Wallin
Junsheng Fu
Lars Hammarstrand
SSL
137
1
0
14 Oct 2024
Multi-modal Vision Pre-training for Medical Image Analysis
Shaohao Rui
Lingzhi Chen
Zhenyu Tang
Lilong Wang
M. Liu
Shanghang Zhang
Xiaosong Wang
69
0
0
14 Oct 2024
Locality Alignment Improves Vision-Language Models
Ian Covert
Tony Sun
James Zou
Tatsunori Hashimoto
VLM
267
7
0
14 Oct 2024
NARAIM: Native Aspect Ratio Autoregressive Image Models
Daniel Gallo Fernández
Robert van der Klis
Rǎzvan-Andrei Matişan
Janusz Partyka
E. Gavves
Samuele Papa
Phillip Lippe
24
0
0
13 Oct 2024
UnSeg: One Universal Unlearnable Example Generator is Enough against All Image Segmentation
Ye Sun
Hao Zhang
Tiehua Zhang
Xingjun Ma
Yu-Gang Jiang
VLM
87
4
0
13 Oct 2024
Large-Scale 3D Medical Image Pre-training with Geometric Context Priors
Linshan Wu
Jiaxin Zhuang
Hao Chen
90
6
0
13 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
71
10
0
13 Oct 2024
AM-SAM: Automated Prompting and Mask Calibration for Segment Anything Model
Yuchen Li
Li Zhang
Youwei Liang
Pengtao Xie
VLM
64
2
0
13 Oct 2024
Point Cloud Mixture-of-Domain-Experts Model for 3D Self-supervised Learning
Yaohua Zha
Tao Dai
Yanzi Wang
Hang Guo
Bin Chen
Zhihao Ouyang
Chunlin Fan
3DPC
88
1
0
13 Oct 2024
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
Yuxi Xie
Anirudh Goyal
Xiaobao Wu
Xunjian Yin
Xiao Xu
Min-Yen Kan
Liangming Pan
William Yang Wang
LRM
358
1
0
12 Oct 2024
Pic@Point: Cross-Modal Learning by Local and Global Point-Picture Correspondence
Vencia Herzog
Stefan Suwelack
3DPC
59
0
0
12 Oct 2024
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
Kun Ding
Qiang Yu
Haojian Zhang
Gaofeng Meng
Shiming Xiang
VLM
67
0
0
11 Oct 2024
A foundation model for generalizable disease diagnosis in chest X-ray images
Lijian Xu
Ziyu Ni
Hao Sun
Hongsheng Li
Shaoting Zhang
LM&MA, MedIm
57
1
0
11 Oct 2024
VideoSAM: Open-World Video Segmentation
Pinxue Guo
Zixu Zhao
Jianxiong Gao
Chongruo Wu
Tong He
Zheng Zhang
Tianjun Xiao
Wenqiang Zhang
VOS
78
1
0
11 Oct 2024
GAI-Enabled Explainable Personalized Federated Semi-Supervised Learning
Yubo Peng
Feibo Jiang
Li Dong
Kezhi Wang
Kun Yang
FedML
134
0
0
11 Oct 2024
Learning General Representation of 12-Lead Electrocardiogram with a Joint-Embedding Predictive Architecture
Sehun Kim
64
2
0
11 Oct 2024
On a Hidden Property in Computational Imaging
Yinan Feng
Yinpeng Chen
Yueh Lee
Youzuo Lin
80
0
0
11 Oct 2024
SmartPretrain: Model-Agnostic and Dataset-Agnostic Representation Learning for Motion Prediction
Yang Zhou
Hao Shao
Letian Wang
Steven Waslander
Hongsheng Li
Yu Liu
84
2
0
11 Oct 2024
Scaling Laws For Diffusion Transformers
Zhengyang Liang
Hao He
Ceyuan Yang
Bo Dai
89
14
0
10 Oct 2024
Parameter-Efficient Fine-Tuning in Spectral Domain for Point Cloud Learning
Dingkang Liang
Tianrui Feng
Xin Zhou
Yumeng Zhang
Zhikang Zou
Xiang Bai
68
7
0
10 Oct 2024
C^2DA: Contrastive and Context-aware Domain Adaptive Semantic Segmentation
Md. Al-Masrur Khan
Zheng Chen
Lantao Liu
63
0
0
10 Oct 2024
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
124
7
0
10 Oct 2024
FLIER: Few-shot Language Image Models Embedded with Latent Representations
Zhinuo Zhou
Peng Zhou
Xiaoyong Pan
VLM
40
0
0
10 Oct 2024
Chain-of-Sketch: Enabling Global Visual Reasoning
Aryo Lotfi
Enrico Fini
Samy Bengio
Moin Nabi
Emmanuel Abbe
LRM
92
0
0
10 Oct 2024
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
126
9
0
10 Oct 2024
MotionAura: Generating High-Quality and Motion Consistent Videos using Discrete Diffusion
Onkar Susladkar
Jishu Sen Gupta
Chirag Sehgal
Sparsh Mittal
Rekha Singhal
DiffM, VGen
105
0
0
10 Oct 2024
Masked Generative Priors Improve World Models Sequence Modelling Capabilities
Cristian Meo
Mircea Lica
Zarif Ikram
Akihiro Nakano
Vedant Shah
Aniket Didolkar
Dianbo Liu
Anirudh Goyal
Justin Dauwels
OffRL
235
0
0
10 Oct 2024
Progressive Multi-Modal Fusion for Robust 3D Object Detection
Rohit Mohan
Daniele Cattaneo
Florian Drews
Abhinav Valada
3DPC
138
4
0
09 Oct 2024
Exploring the design space of deep-learning-based weather forecasting systems
Shoaib Ahmed Siddiqui
Jean Kossaifi
Boris Bonev
Christopher Choy
Jan Kautz
David M. Krueger
Kamyar Azizzadenesheli
AI4CE, OOD
59
3
0
09 Oct 2024
Generalizing Segmentation Foundation Model Under Sim-to-real Domain-shift for Guidewire Segmentation in X-ray Fluoroscopy
Yuxuan Wen
Evgenia Roussinova
Olivier Brina
Paolo Machi
Mohamed Bouri
OOD, MedIm
94
1
0
09 Oct 2024
Self-Supervised Learning for Real-World Object Detection: a Survey
Alina Ciocarlan
Sidonie Lefebvre
S. L. Hégarat-Mascle
Arnaud Woiselle
ObjD
94
1
0
09 Oct 2024
Towards Generalisable Time Series Understanding Across Domains
Özgün Turgut
Philip Muller
Martin J. Menten
Daniel Rueckert
AI4TS
133
3
0
09 Oct 2024
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
Benyuan Meng
Qianqian Xu
Zitai Wang
Zhiyong Yang
Xiaochun Cao
Qingming Huang
96
0
0
09 Oct 2024
MaskBlur: Spatial and Angular Data Augmentation for Light Field Image Super-Resolution
Wentao Chao
Fuqing Duan
Yulan Guo
Guanghui Wang
71
1
0
09 Oct 2024
Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers
Stephen Hausler
Peyman Moghadam
SSL, ViT
68
4
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
184
102
0
09 Oct 2024
GR-2: A Generative Video-Language-Action Model with Web-Scale Knowledge for Robot Manipulation
Chi-Lam Cheang
Guangzeng Chen
Ya Jing
Tao Kong
Hang Li
...
Hongtao Wu
Jiafeng Xu
Yichu Yang
Hanbo Zhang
Minzhao Zhu
VGen, LM&Ro
121
73
0
08 Oct 2024
Guided Self-attention: Find the Generalized Necessarily Distinct Vectors for Grain Size Grading
Fang Gao
XueTao Li
Jiabao Wang
Shengheng Ma
Jun Yu
46
0
0
08 Oct 2024
TimeDART: A Diffusion Autoregressive Transformer for Self-Supervised Time Series Representation
Daoyu Wang
Mingyue Cheng
Ziqiang Liu
Qiang Liu
AI4TS, DiffM
120
1
0
08 Oct 2024