2111.06377
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
VFM-Det: Towards High-Performance Vehicle Detection via Large Foundation Models
Wentao Wu
Fanghua Hong
Xiao Wang
Chenglong Li
Jin Tang
VLM
93
1
0
23 Aug 2024
Image Segmentation in Foundation Model Era: A Survey
Tianfei Zhou
Fei Zhang
Boyu Chang
Wenguan Wang
Ye Yuan
E. Konukoglu
Daniel Cremers
VLM
142
12
0
23 Aug 2024
From Few to More: Scribble-based Medical Image Segmentation via Masked Context Modeling and Continuous Pseudo Labels
Zhisong Wang
Yiwen Ye
Ziyang Chen
Minglei Shu
Yong Xia
86
1
0
23 Aug 2024
Symmetric masking strategy enhances the performance of Masked Image Modeling
Khanh-Binh Nguyen
Chae Jung Park
130
0
0
23 Aug 2024
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
145
82
0
22 Aug 2024
CODE: Confident Ordinary Differential Editing
B. V. Delft
Tommaso Martorella
Alexandre Alahi
DiffM
106
0
0
22 Aug 2024
Multi-Style Facial Sketch Synthesis through Masked Generative Modeling
Bowen Sun
Guo Lu
Shibao Zheng
CVBM
66
0
0
22 Aug 2024
Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis
Zhixiang Guo
Xinming Wu
Luming Liang
Hanlin Sheng
Nuo Chen
Zhengfa Bi
AI4CE
102
4
0
22 Aug 2024
SAM-SP: Self-Prompting Makes SAM Great Again
Chunpeng Zhou
Kangjie Ning
Qianqian Shen
Sheng Zhou
Zhi Yu
Haishuai Wang
VLM
82
3
0
22 Aug 2024
SynPlay: Importing Real-world Diversity for a Synthetic Human Dataset
Jinsub Yim
Hyungtae Lee
Sungmin Eum
Yi-Ting Shen
Yan Zhang
Heesung Kwon
Shuvra S. Bhattacharyya
VGen
111
1
0
21 Aug 2024
SEA: Supervised Embedding Alignment for Token-Level Visual-Textual Integration in MLLMs
Yuanyang Yin
Yaqi Zhao
Yajie Zhang
Ke Lin
Jiahao Wang
Xin Tao
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
LRM
111
9
0
21 Aug 2024
EMCNet : Graph-Nets for Electron Micrographs Classification
Sakhinana Sagar Srinivas
Rajat Kumar Sarkar
Venkataramana Runkana
96
0
0
21 Aug 2024
ShapeSplat: A Large-scale Dataset of Gaussian Splats and Their Self-Supervised Pretraining
Qi Ma
Yue Li
Bin Ren
N. Sebe
E. Konukoglu
Theo Gevers
Luc Van Gool
D. Paudel
3DGS
3DPC
118
16
0
20 Aug 2024
Rethinking Video Segmentation with Masked Video Consistency: Did the Model Learn as Intended?
Chen Liang
Qiang Guo
Xiaochao Qu
Luoqi Liu
Ting Liu
VOS
69
0
0
20 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
Zebang Cheng
Shuyuan Tu
Dawei Huang
Minghan Li
Xiaojiang Peng
Zhi-Qi Cheng
Alexander G. Hauptmann
145
2
0
20 Aug 2024
CooPre: Cooperative Pretraining for V2X Cooperative Perception
Seth Z. Zhao
Hao Xiang
Chenfeng Xu
Xin Xia
Bolei Zhou
Jiaqi Ma
3DPC
148
2
0
20 Aug 2024
PooDLe: Pooled and dense self-supervised learning from naturalistic videos
Alex N. Wang
Christopher Hoang
Yuwen Xiong
Yann LeCun
Mengye Ren
254
0
0
20 Aug 2024
Leveraging Superfluous Information in Contrastive Representation Learning
Xuechu Yu
SSL
67
2
0
19 Aug 2024
Uniting contrastive and generative learning for event sequences models
Aleksandr Yugay
Alexey Zaytsev
AI4TS
97
1
0
19 Aug 2024
Mutually-Aware Feature Learning for Few-Shot Object Counting
Yerim Jeon
Subeen Lee
Jihwan Kim
Jae-Pil Heo
96
1
0
19 Aug 2024
Image-based Freeform Handwriting Authentication with Energy-oriented Self-Supervised Learning
Wenwen Qiang
Luntian Mou
Changwen Zheng
Wen Gao
AAML
76
2
0
19 Aug 2024
Enhancing Modal Fusion by Alignment and Label Matching for Multimodal Emotion Recognition
Qifei Li
Yingming Gao
Yuhua Wen
Cong Wang
Ya Li
61
1
0
18 Aug 2024
EEG-SCMM: Soft Contrastive Masked Modeling for Cross-Corpus EEG-Based Emotion Recognition
Qile Liu
Weishan Ye
Yulu Liu
Zhen Liang
100
0
0
17 Aug 2024
Zero-Shot Object-Centric Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Anirudh Goyal
Mike Mozer
Yoshua Bengio
Georg Martius
Maximilian Seitzer
VLM
OCL
90
8
0
17 Aug 2024
HybridOcc: NeRF Enhanced Transformer-based Multi-Camera 3D Occupancy Prediction
Xiao Zhao
Bo Chen
Mingyang Sun
Dingkang Yang
Youxing Wang
Xukun Zhang
Mingcheng Li
Dongliang Kou
Xiaoyi Wei
Lihua Zhang
101
6
0
17 Aug 2024
ADen: Adaptive Density Representations for Sparse-view Camera Pose Estimation
Hao Tang
Weiyao Wang
Pierre Gleize
Matt Feiszli
3DH
73
1
0
16 Aug 2024
OpenCity: Open Spatio-Temporal Foundation Models for Traffic Prediction
Zhonghang Li
Long Xia
Lei Shi
Yong-mei Xu
D. Yin
Chao Huang
VLM
AI4TS
AI4CE
85
10
0
16 Aug 2024
PCP-MAE: Learning to Predict Centers for Point Masked Autoencoders
Xiangdong Zhang
Shaofeng Zhang
Junchi Yan
3DPC
107
7
0
16 Aug 2024
MAT-SED: A Masked Audio Transformer with Masked-Reconstruction Based Pre-training for Sound Event Detection
Pengfei Cai
Yan Song
Kang Li
Haoyu Song
Ian Mcloughlin
76
6
0
16 Aug 2024
SpectralEarth: Training Hyperspectral Foundation Models at Scale
Nassim Ait Ali Braham
C. Albrecht
Julien Mairal
J. Chanussot
Yi Wang
X. Zhu
82
15
0
15 Aug 2024
HyperTaxel: Hyper-Resolution for Taxel-Based Tactile Signals Through Contrastive Learning
Hongyu Li
Snehal Dikhale
Jinda Cui
Soshi Iba
Nawid Jamali
98
3
0
15 Aug 2024
SLCA++: Unleash the Power of Sequential Fine-tuning for Continual Learning with Pre-training
Gengwei Zhang
Liyuan Wang
Guoliang Kang
Ling Chen
Yunchao Wei
VLM
CLL
68
7
0
15 Aug 2024
Unsupervised Part Discovery via Dual Representation Alignment
Jiahao Xia
Wenjian Huang
Min Xu
Jianguo Zhang
Haimin Zhang
Ziyu Sheng
Dong Xu
92
0
0
15 Aug 2024
Snuffy: Efficient Whole Slide Image Classifier
Hossein Jafarinia
Alireza Alipanah
Danial Hamdi
Saeed Razavi
Nahal Mirzaie
M. Rohban
3DH
96
2
0
15 Aug 2024
End-to-end Semantic-centric Video-based Multimodal Affective Computing
Ronghao Lin
Ying Zeng
Sijie Mai
Haifeng Hu
VGen
118
0
0
14 Aug 2024
Whitening Consistently Improves Self-Supervised Learning
András Kalapos
Bálint Gyires-Tóth
SSL
82
0
0
14 Aug 2024
Domain-invariant Representation Learning via Segment Anything Model for Blood Cell Classification
Yongcheng Li
Lingcong Cai
Ying Lu
Cheng Lin
Yupeng Zhang
...
Genan Dai
Bowen Zhang
Jingzhou Cao
Xiangzhong Zhang
Xiaomao Fan
108
1
0
14 Aug 2024
Connecting Dreams with Visual Brainstorming Instruction
Yasheng Sun
Bohan Li
Mingchen Zhuge
Deng-Ping Fan
Salman Khan
Fahad Shahbaz Khan
Hideki Koike
DiffM
64
0
0
14 Aug 2024
Image-Based Leopard Seal Recognition: Approaches and Challenges in Current Automated Systems
Jorge Yero Salazar
Pablo Rivas
Renato Borras-Chavez
Sarah Kienle
45
0
0
14 Aug 2024
CNN-JEPA: Self-Supervised Pretraining Convolutional Neural Networks Using Joint Embedding Predictive Architecture
András Kalapos
Bálint Gyires-Tóth
110
2
0
14 Aug 2024
Membership Inference Attack Against Masked Image Modeling
Zehan Li
Xinlei He
Ning Yu
Yang Zhang
77
3
0
13 Aug 2024
Token Compensator: Altering Inference Cost of Vision Transformer without Re-Tuning
Shibo Jie
Yehui Tang
Jianyuan Guo
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
62
4
0
13 Aug 2024
Masked Image Modeling: A Survey
Vlad Hondru
Florinel-Alin Croitoru
Shervin Minaee
Radu Tudor Ionescu
N. Sebe
192
8
0
13 Aug 2024
ClickAttention: Click Region Similarity Guided Interactive Segmentation
Long Xu
Shanghong Li
Yongquan Chen
Junkang Chen
Rui Huang
Feng Wu
76
0
0
12 Aug 2024
Diffuse-UDA: Addressing Unsupervised Domain Adaptation in Medical Image Segmentation with Appearance and Structure Aligned Diffusion Models
Haifan Gong
Yitao Wang
Yihan Wang
Jiashun Xiao
Xiang Wan
Haofeng Li
MedIm
116
3
0
12 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
70
3
0
12 Aug 2024
Enhancing 3D Transformer Segmentation Model for Medical Image with Token-level Representation Learning
Xinrong Hu
Dewen Zeng
Yawen Wu
Xueyang Li
Yiyu Shi
ViT
MedIm
75
0
0
12 Aug 2024
HySparK: Hybrid Sparse Masking for Large Scale Medical Image Pre-Training
Fenghe Tang
Ronghao Xu
Qingsong Yao
Xueming Fu
Quan Quan
Heqin Zhu
Zaiyi Liu
S. Kevin Zhou
SSL
MedIm
107
3
0
11 Aug 2024
Contrastive masked auto-encoders based self-supervised hashing for 2D image and 3D point cloud cross-modal retrieval
Rukai Wei
Heng Cui
Yu Liu
Yufeng Hou
Yanzhao Xie
Ke Zhou
3DPC
47
0
0
11 Aug 2024
Scene123: One Prompt to 3D Scene Generation via Video-Assisted and Consistency-Enhanced MAE
Yiying Yang
Fukun Yin
Jiayuan Fan
Xin Chen
Wanzhang Li
Gang Yu
VGen
94
1
0
10 Aug 2024