2111.06377
Cited By
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
Probabilistic Language-Image Pre-Training
Sanghyuk Chun
Wonjae Kim
Song Park
Sangdoo Yun
MLLM
VLM
CLIP
489
6
2
24 Oct 2024
TabDPT: Scaling Tabular Foundation Models
Junwei Ma
Valentin Thomas
Rasa Hosseinzadeh
Hamidreza Kamkari
Alex Labach
Jesse C. Cresswell
Keyvan Golestan
Guangwei Yu
M. Volkovs
Anthony L. Caterini
LMTD
113
8
0
23 Oct 2024
Learning Versatile Skills with Curriculum Masking
Yao Tang
Zhihui Xie
Zichuan Lin
Deheng Ye
Shuai Li
OffRL
73
0
0
23 Oct 2024
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin
Syed Shakib Sarwar
Mostafa Elhoushi
Sai Qian Zhang
Yuecheng Li
B. D. Salvo
66
1
0
23 Oct 2024
Bridging the Gaps: Utilizing Unlabeled Face Recognition Datasets to Boost Semi-Supervised Facial Expression Recognition
Jie Song
Mengqiao He
Jinhua Feng
Bo Shen
69
0
0
23 Oct 2024
Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration
Max Wilcoxson
Qiyang Li
Kevin Frans
Sergey Levine
SSL
OffRL
OnRL
189
0
0
23 Oct 2024
EVC-MF: End-to-end Video Captioning Network with Multi-scale Features
Tian-Zi Niu
Zhen-Duo Chen
Xin Luo
Xin-Shun Xu
49
0
0
22 Oct 2024
ViMGuard: A Novel Multi-Modal System for Video Misinformation Guarding
Andrew Kan
Christopher Kan
Zaid Nabulsi
32
0
0
22 Oct 2024
Foundation Models for Rapid Autonomy Validation
Alec Farid
Peter Schleede
Aaron Huang
Christoffer Heckman
98
0
0
22 Oct 2024
Frontiers in Intelligent Colonoscopy
Ge-Peng Ji
Jingyi Liu
Peng Xu
Nick Barnes
Fahad Shahbaz Khan
Salman Khan
Deng-Ping Fan
125
5
0
22 Oct 2024
Granularity Matters in Long-Tail Learning
Shizhen Zhao
Xin Wen
Qingbin Liu
Chuofan Ma
Chun Yuan
Xiaojuan Qi
76
0
0
21 Oct 2024
Zero-Shot Scene Reconstruction from Single Images with Deep Prior Assembly
Junsheng Zhou
Yu-Shen Liu
Zhizhong Han
ViT
115
11
0
21 Oct 2024
Contrastive random lead coding for channel-agnostic self-supervision of biosignals
Thea Brusch
Mikkel N. Schmidt
T. S. Alstrøm
SSL
62
0
0
21 Oct 2024
SeisLM: a Foundation Model for Seismic Waveforms
Tianlin Liu
Jannes Münchmeyer
Laura Laurenti
C. Marone
Maarten V. de Hoop
Ivan Dokmanić
VLM
124
6
0
21 Oct 2024
ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts
Xumeng Han
Longhui Wei
Zhiyang Dou
Zipeng Wang
Chenhui Qiang
Xin He
Yingfei Sun
Zhenjun Han
Qi Tian
MoE
83
5
0
21 Oct 2024
Exploring Stronger Transformer Representation Learning for Occluded Person Re-Identification
Zhangjian Ji
Donglin Cheng
K. Feng
ViT
69
1
0
21 Oct 2024
OpenMU: Your Swiss Army Knife for Music Understanding
Mengjie Zhao
Zhi-Wei Zhong
Zhuoyuan Mao
Shiqi Yang
Wei-Hsiang Liao
Shusuke Takahashi
Hiromi Wakaki
Yuki Mitsufuji
OSLM
99
8
0
21 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
129
3
0
21 Oct 2024
Generalized Multimodal Fusion via Poisson-Nernst-Planck Equation
Jiayu Xiong
Jing Wang
Hengjing Xiang
Jun Xue
Chen Xu
Zhouqiang Jiang
57
0
0
20 Oct 2024
Upsampling DINOv2 features for unsupervised vision tasks and weakly supervised materials segmentation
Ronan Docherty
Antonis Vamvakeros
Samuel J. Cooper
69
2
0
20 Oct 2024
A Survey of Hallucination in Large Visual Language Models
Wei Lan
Wenyi Chen
Qingfeng Chen
Shirui Pan
Huiyu Zhou
Yi-Lun Pan
LRM
92
6
0
20 Oct 2024
FoMo: A Foundation Model for Mobile Traffic Forecasting with Diffusion Model
Haoye Chai
Shiyuan Zhang
Xiaoqian Qi
Yong Li
171
1
0
20 Oct 2024
CAGE: Causal Attention Enables Data-Efficient Generalizable Robotic Manipulation
Shangning Xia
Hongjie Fang
Hao-Shu Fang
Cewu Lu
CML
86
5
0
19 Oct 2024
A Survey on All-in-One Image Restoration: Taxonomy, Evaluation and Future Trends
Junjun Jiang
Zengyuan Zuo
Gang Wu
Kui Jiang
Xianming Liu
113
17
0
19 Oct 2024
BiGR: Harnessing Binary Latent Codes for Image Generation and Improved Visual Representation Capabilities
Shaozhe Hao
Xuantong Liu
Xianbiao Qi
Shihao Zhao
Bojia Zi
Rong Xiao
Kai Han
Kwan-Yee K. Wong
198
3
0
18 Oct 2024
Croc: Pretraining Large Multimodal Models with Cross-Modal Comprehension
Yin Xie
Kaicheng Yang
Ninghua Yang
Weimo Deng
Xiangzi Dai
...
Yumeng Wang
Xiang An
Yongle Zhao
Ziyong Feng
Jiankang Deng
MLLM
VLM
72
1
0
18 Oct 2024
E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model
Zihang Jiang
Qingsong Yao
Rongsheng Wang
Zhiyang He
Xiaodong Tao
Wei Wei
Weifu Lv
Shuoling Zhou
VLM
MedIm
43
5
0
18 Oct 2024
Rethinking Transformer for Long Contextual Histopathology Whole Slide Image Analysis
Honglin Li
Yunlong Zhang
Pingyi Chen
Zhongyi Shui
Chenglu Zhu
Lin Yang
MedIm
97
5
0
18 Oct 2024
Swiss Army Knife: Synergizing Biases in Knowledge from Vision Foundation Models for Multi-Task Learning
Yuxiang Lu
Shengcao Cao
Yu-Xiong Wang
124
1
0
18 Oct 2024
Attuned to Change: Causal Fine-Tuning under Latent-Confounded Shifts
Jialin Yu
Yuxiang Zhou
Yulan He
Nevin L. Zhang
Ricardo Silva
Philip Torr
93
0
0
18 Oct 2024
Improving Vector-Quantized Image Modeling with Latent Consistency-Matching Diffusion
Bac Nguyen
Chieh-Hsin Lai
Yuhta Takida
Naoki Murata
Toshimitsu Uesaka
Stefano Ermon
Yuki Mitsufuji
103
0
0
18 Oct 2024
On Partial Prototype Collapse in the DINO Family of Self-Supervised Methods
Hariprasath Govindarajan
Per Sidén
Jacob Roll
Fredrik Lindsten
64
2
0
17 Oct 2024
Fluid: Scaling Autoregressive Text-to-image Generative Models with Continuous Tokens
Lijie Fan
Tianhong Li
Siyang Qin
Yuanzhen Li
Chen Sun
Michael Rubinstein
Deqing Sun
Kaiming He
Yonglong Tian
VLM
DiffM
131
57
0
17 Oct 2024
Artificial Kuramoto Oscillatory Neurons
Takeru Miyato
Sindy Löwe
Andreas Geiger
Max Welling
AI4CE
204
10
0
17 Oct 2024
The Non-Local Model Merging Problem: Permutation Symmetries and Variance Collapse
Ekansh Sharma
Daniel M. Roy
Gintare Karolina Dziugaite
MoMe
77
4
0
16 Oct 2024
Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation
Yao Shen
Ziwei Wei
Chunmeng Liu
Shuming Wei
Qi Zhao
Kaiyang Zeng
Guangyao Li
VLM
65
0
0
16 Oct 2024
Stabilize the Latent Space for Image Autoregressive Modeling: A Unified Perspective
Yongxin Zhu
Bing Li
Hang Zhang
Xin Li
Linli Xu
Lidong Bing
DiffM
116
9
0
16 Oct 2024
MAX: Masked Autoencoder for X-ray Fluorescence in Geological Investigation
An-Sheng Lee
Yu-Wen Pao
Hsuan-Tien Lin
Sofia Ya Hsuan Liou
55
1
0
16 Oct 2024
Fusion from Decomposition: A Self-Supervised Approach for Image Fusion and Beyond
Pengwei Liang
Junjun Jiang
Qing Ma
Xianming Liu
Jiayi Ma
79
2
0
16 Oct 2024
iFuzzyTL: Interpretable Fuzzy Transfer Learning for SSVEP BCI System
Xiaowei Jiang
Beining Cao
Liang Ou
Yu-Cheng Chang
T. Do
Chin-Teng Lin
87
2
0
16 Oct 2024
TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration
Yiwei Guo
Shaobin Zhuang
Kunchang Li
Yu Qiao
Yali Wang
VLM
CLIP
128
1
0
16 Oct 2024
SAM-Guided Masked Token Prediction for 3D Scene Understanding
Zhimin Chen
Liang Yang
Yingwei Li
Longlong Jing
Bing Li
128
3
0
16 Oct 2024
SOE: SO(3)-Equivariant 3D MRI Encoding
Shizhe He
Magdalini Paschali
J. Ouyang
Adnan Masood
Akshay S. Chaudhari
Ehsan Adeli
57
1
0
15 Oct 2024
Beyond Labels: A Self-Supervised Framework with Masked Autoencoders and Random Cropping for Breast Cancer Subtype Classification
Annalisa Chiocchetti
Marco Dossena
Christopher Irwin
Luigi Portinale
65
0
0
15 Oct 2024
Visual Fixation-Based Retinal Prosthetic Simulation
Yuli Wu
Do Dinh Tan Nguyen
Henning Konermann
Rüveyda Yilmaz
Peter Walter
Johannes Stegmaier
54
0
0
15 Oct 2024
Breaking Modality Gap in RGBT Tracking: Coupled Knowledge Distillation
Andong Lu
Jiacong Zhao
Chenglong Li
Yun Xiao
Bin Luo
94
4
0
15 Oct 2024
Meta-DT: Offline Meta-RL as Conditional Sequence Modeling with World Model Disentanglement
Zhi Wang
Li Zhang
Wenhao Wu
Yuanheng Zhu
Dongbin Zhao
C. L. Philip Chen
OffRL
100
9
0
15 Oct 2024
DRACO: A Denoising-Reconstruction Autoencoder for Cryo-EM
Yingjun Shen
Haizhao Dai
Qihe Chen
Yan Zeng
Jiakai Zhang
Yuan Pei
Jingyi Yu
118
3
0
15 Oct 2024
Multiview Scene Graph
Juexiao Zhang
Gao Zhu
Sihang Li
Xinhao Liu
Haorui Song
Xinran Tang
Chen Feng
3DV
75
2
0
15 Oct 2024
A CLIP-Powered Framework for Robust and Generalizable Data Selection
Steve Yang
Peng Ye
Wanli Ouyang
Dongzhan Zhou
Furao Shen
115
2
0
15 Oct 2024