2111.06377
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
Memory Storyboard: Leveraging Temporal Segmentation for Streaming Self-Supervised Learning from Egocentric Videos
Yanlai Yang
Mengye Ren
474
0
0
21 Jan 2025
A generalizable 3D framework and model for self-supervised learning in medical imaging
Tony Xu
Sepehr Hosseini
Chris Anderson
Anthony Rinaldi
Rahul G. Krishnan
Anne L. Martel
Maged Goubran
MedIm
162
3
0
20 Jan 2025
How Well Do Supervised 3D Models Transfer to Medical Imaging Tasks?
Wenxuan Li
Alan Yuille
Zongwei Zhou
MedIm
148
10
0
20 Jan 2025
Enhancing SAR Object Detection with Self-Supervised Pre-training on Masked Auto-Encoders
Xinyang Pu
Feng Xu
125
0
0
20 Jan 2025
MetaNeRV: Meta Neural Representations for Videos with Spatial-Temporal Guidance
Jialong Guo
Ke Liu
Jiangchao Yao
Zhihua Wang
Jiajun Bu
Haishuai Wang
AI4TS
125
1
0
20 Jan 2025
Enhancing Graph Self-Supervised Learning with Graph Interplay
Xinjian Zhao
Wei Pang
Xiangru Jian
Yaoyao Xu
Chaolong Ying
Tianshu Yu
230
0
0
17 Jan 2025
SynthLight: Portrait Relighting with Diffusion Model by Learning to Re-render Synthetic Faces
Sumit Chaturvedi
Mengwei Ren
Yannick Hold-Geoffroy
Jingyuan Liu
Julie Dorsey
Zhixin Shu
DiffM
99
0
0
17 Jan 2025
Few-Shot Adaptation of Training-Free Foundation Model for 3D Medical Image Segmentation
Xingxin He
Yifan Hu
Zhaoye Zhou
Mohamed Jarraya
Fang Liu
VLM
MedIm
105
2
0
17 Jan 2025
Enhancing Skin Disease Diagnosis: Interpretable Visual Concept Discovery with SAM
Xin Hu
Janet Wang
Jihun Hamm
R. Yotsu
Zhengming Ding
147
1
0
17 Jan 2025
EarthView: A Large Scale Remote Sensing Dataset for Self-Supervision
Diego A. Velázquez
Pau Rodríguez López
Sergio Alonso
Josep M. Gonfaus
Jordi Gonzalez
Gerardo Richarte
Javier Marin
Yoshua Bengio
Alexandre Lacoste
106
1
0
14 Jan 2025
Code and Pixels: Multi-Modal Contrastive Pre-training for Enhanced Tabular Data Analysis
Kankana Roy
Lars Krämer
Sebastian Domaschke
Malik Haris
Roland Aydin
Fabian Isensee
Martin Held
119
0
0
13 Jan 2025
EdgeTAM: On-Device Track Anything Model
Chong Zhou
Chenchen Zhu
Yunyang Xiong
Saksham Suri
Fanyi Xiao
...
Raghuraman Krishnamoorthi
Bo Dai
Chen Change Loy
Vikas Chandra
Bilge Soran
VLM
106
1
0
13 Jan 2025
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
158
4
0
11 Jan 2025
AI-powered virtual tissues from spatial proteomics for clinical diagnostics and biomedical discovery
Johann Wenckstern
Eeshaan Jain
Kiril Vasilev
Matteo Pariset
Andreas Wicki
Gabriele Gut
Charlotte Bunne
69
3
0
10 Jan 2025
AnCoGen: Analysis, Control and Generation of Speech with a Masked Autoencoder
Samir Sadok
Simon Leglaive
Laurent Girin
Gaël Richard
Xavier Alameda-Pineda
129
3
0
10 Jan 2025
EditAR: Unified Conditional Generation with Autoregressive Models
Jiteng Mu
Nuno Vasconcelos
Xinyu Wang
DiffM
89
6
0
08 Jan 2025
Edit as You See: Image-guided Video Editing via Masked Motion Modeling
Zhi-Lin Huang
Yebin Liu
Chujun Qin
Zihan Wang
Dong Zhou
Dong Li
E. Barsoum
DiffM
VGen
77
0
0
08 Jan 2025
Abstracted Shapes as Tokens -- A Generalizable and Interpretable Model for Time-series Classification
Yunshi Wen
Tengfei Ma
Tsui-Wei Weng
Lam M. Nguyen
A. Julius
AI4TS
88
3
0
08 Jan 2025
Learning Informative Latent Representation for Quantum State Tomography
Hailan Ma
Zhenhong Sun
Daoyi Dong
Dong Gong
99
1
0
08 Jan 2025
LLaVA-Mini: Efficient Image and Video Large Multimodal Models with One Vision Token
Shaolei Zhang
Qingkai Fang
Zhe Yang
Yang Feng
MLLM
VLM
168
43
0
07 Jan 2025
Gaussian Masked Autoencoders
Jathushan Rajasegaran
Xinlei Chen
Rulilong Li
Christoph Feichtenhofer
Jitendra Malik
Shiry Ginosar
3DGS
65
1
0
06 Jan 2025
PiLaMIM: Toward Richer Visual Representations by Integrating Pixel and Latent Masked Image Modeling
Junmyeong Lee
Eui Jun Hwang
Sukmin Cho
Jong C. Park
89
0
0
06 Jan 2025
Human Gaze Boosts Object-Centered Representation Learning
Timothy Schaumlöffel
A. Aubret
Gemma Roig
Jochen Triesch
117
0
0
06 Jan 2025
First-place Solution for Streetscape Shop Sign Recognition Competition
Bin Wang
Li Jing
468
0
0
06 Jan 2025
LWFNet: Coherent Doppler Wind Lidar-Based Network for Wind Field Retrieval
R. Tao
Chong Wang
Hao Chen
Mingjiao Jia
Xiang Shang
...
Yanyu Lu
Yanfeng Huo
Junlin Wu
Xianghui Xue
Xiankang Dou
82
0
0
05 Jan 2025
DeTrack: In-model Latent Denoising Learning for Visual Object Tracking
Xinyu Zhou
Jinglun Li
Lingyi Hong
Kaixun Jiang
Pinxue Guo
Weifeng Ge
Wenqiang Zhang
DiffM
85
1
0
05 Jan 2025
Enhancing Contrastive Learning for Retinal Imaging via Adjusted Augmentation Scales
Zijie Cheng
Yangqiu Song
André Altmann
P. Keane
Yukun Zhou
MedIm
76
0
0
05 Jan 2025
FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance
Haicheng Wang
Zhemeng Yu
Gabriele Spadaro
Chen Ju
Victor Quétu
Enzo Tartaglione
VLM
433
6
0
05 Jan 2025
Missing Data as Augmentation in the Earth Observation Domain: A Multi-View Learning Approach
Francisco Mena
Diego Arenas
A. Dengel
119
1
0
03 Jan 2025
Fine-grained Image-to-LiDAR Contrastive Distillation with Visual Foundation Models
Yifan Zhang
Junhui Hou
152
1
0
03 Jan 2025
Keypoint Aware Masked Image Modelling
Madhava Krishna
Convin.AI
136
0
0
03 Jan 2025
Double-Flow GAN model for the reconstruction of perceived faces from brain activities
Zihao Wang
Jing Zhao
Xuetong Ding
Hui Zhang
CVBM
AI4CE
104
0
0
03 Jan 2025
Exploring Structured Semantic Priors Underlying Diffusion Score for Test-time Adaptation
Mingjia Li
Shuang Li
Tongrui Su
Longhui Yuan
Jian Liang
Wei Li
DiffM
90
0
0
03 Jan 2025
Is Segment Anything Model 2 All You Need for Surgery Video Segmentation? A Systematic Evaluation
Cheng Yuan
Jian Jiang
Kunyi Yang
Lv Wu
Rui Wang
...
Yifan Zhou
Wanli Song
Haoran Wang
Qi Dou
Yutong Ban
90
2
0
03 Jan 2025
Advancements in Visual Language Models for Remote Sensing: Datasets, Capabilities, and Enhancement Techniques
Lijie Tao
Han Zhang
Haizhao Jing
Yu Liu
Kelu Yao
Guoting Wei
Xizhe Xue
144
0
0
03 Jan 2025
Neural Network Diffusion
Kaili Wang
Dongwen Tang
Boya Zeng
Yida Yin
Zhaopan Xu
Yukun Zhou
Zelin Zang
Trevor Darrell
Zhuang Liu
Yang You
DiffM
137
5
0
03 Jan 2025
ILDiff: Generate Transparent Animated Stickers by Implicit Layout Distillation
Ting Zhang
Zhiqiang Yuan
Yeshuang Zhu
Jinchao Zhang
DiffM
168
0
0
31 Dec 2024
Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey)
Subba Reddy Oota
Zijiao Chen
Manish Gupta
R. Bapi
G. Jobard
F. Alexandre
X. Hinaut
3DV
AI4CE
145
15
0
31 Dec 2024
Segmentation of Muscularis Propria in Colon Histopathology Images Using Vision Transformers for Hirschsprung's Disease
Youssef Megahed
Anthony Fuller
Saleh Abou-Alwan
Dina El Demellawy
Adrian D. C. Chan
MedIm
60
0
0
31 Dec 2024
SwinIA: Self-Supervised Blind-Spot Image Denoising without Convolutions
Mikhail Papkov
P. Chizhov
L. Parts
ViT
137
2
0
31 Dec 2024
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
134
26
0
31 Dec 2024
Enhancing Visual Representation for Text-based Person Searching
Wei Shen
Ming Fang
Yuxia Wang
Jiafeng Xiao
Diping Li
Ningyu Zhang
Ling Xu
Weinan Zhang
111
4
0
31 Dec 2024
TravelAgent: Generative Agents in the Built Environment
Ariel Noyman
Kai Hu
Kent Larson
AI4CE
52
2
0
25 Dec 2024
The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning
Shentong Mo
100
0
0
23 Dec 2024
VarAD: Lightweight High-Resolution Image Anomaly Detection via Visual Autoregressive Modeling
Yunkang Cao
Haiming Yao
Wei Luo
Nong Sang
139
7
0
23 Dec 2024
Adaptive Dataset Quantization
Muquan Li
Dongyang Zhang
Qiang Dong
Xiurui Xie
Ke Qin
DD
MQ
132
0
0
22 Dec 2024
IV-tuning: Parameter-Efficient Transfer Learning for Infrared-Visible Tasks
Yaming Zhang
Chenqiang Gao
Fangcen Liu
Junjie Guo
Lan Wang
Xinggan Peng
Deyu Meng
192
0
0
21 Dec 2024
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Zehong Wang
Zheyuan Zhang
Tianyi Ma
Nitesh Chawla
Chuxu Zhang
Yanfang Ye
AI4CE
122
0
0
21 Dec 2024
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
Cijo Jose
Théo Moutakanni
Dahyun Kang
Federico Baldassarre
Timothée Darcet
...
Maxime Oquab
Oriane Siméoni
Huy V. Vo
Patrick Labatut
Piotr Bojanowski
CLIP
VLM
178
8
0
20 Dec 2024
Personalized Representation from Personalized Generation
Shobhita Sundaram
Julia Chae
Yonglong Tian
Sara Beery
Phillip Isola
108
1
0
20 Dec 2024