2111.06377
Cited By
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
Learning a Neural Association Network for Self-supervised Multi-Object Tracking
Shuai Li
Michael G. Burke
S. Ramamoorthy
Juergen Gall
VOT
156
0
0
18 Nov 2024
Teaching Video Diffusion Model with Latent Physical Phenomenon Knowledge
Qinglong Cao
Ding Wang
Xirui Li
Yuntian Chen
Chao Ma
Xiaokang Yang
DiffM
VGen
148
2
0
18 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
103
0
0
18 Nov 2024
LaVin-DiT: Large Vision Diffusion Transformer
Zhaoqing Wang
Xiaobo Xia
Runnan Chen
Dongdong Yu
Changhu Wang
Mingming Gong
Tongliang Liu
180
8
0
18 Nov 2024
MpoxVLM: A Vision-Language Model for Diagnosing Skin Lesions from Mpox Virus Infection
Xu Cao
Wenqian Ye
K. Moise
Megan Coffee
85
2
0
16 Nov 2024
One-Layer Transformer Provably Learns One-Nearest Neighbor In Context
Zihao Li
Yuan Cao
Cheng Gao
Yihan He
Han Liu
Jason M. Klusowski
Jianqing Fan
Mengdi Wang
MLT
166
8
0
16 Nov 2024
From Prototypes to General Distributions: An Efficient Curriculum for Masked Image Modeling
Jinhong Lin
Cheng-En Wu
Huanran Li
Jifan Zhang
Yu Hen Hu
Pedro Morgado
117
0
0
16 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
157
2
0
15 Nov 2024
Modification Takes Courage: Seamless Image Stitching via Reference-Driven Inpainting
Ziqi Xie
Xiao Lai
Weidong Zhao
Xianhui Liu
Wenlong Hou
161
0
0
15 Nov 2024
FedAli: Personalized Federated Learning Alignment with Prototype Layers for Generalized Mobile Services
Sannara Ek
Kaile Wang
François Portet
P. Lalanda
Jiannong Cao
FedML
106
0
0
15 Nov 2024
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Alexander C. Li
Yuandong Tian
Bin Chen
Deepak Pathak
Xinlei Chen
75
3
0
14 Nov 2024
Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images
Bipasha Kundu
Bidur Khanal
R. Simon
Cristian A. Linte
MedIm
48
4
0
14 Nov 2024
Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
Chen-Long Duan
Yong Li
Xiu-Shen Wei
Lin Zhao
61
1
0
14 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi
Minjing Dong
Chang Xu
VLM
118
3
0
14 Nov 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
Youpeng Wen
Junfan Lin
Yinlin Zhu
Jiawei Han
Hang Xu
Shen Zhao
Xiaodan Liang
VGen
DiffM
100
5
0
14 Nov 2024
Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models
Chengdong Dong
Vijayakumar Bhagavatula
Zhenyu Zhou
Ajay Kumar
127
0
0
13 Nov 2024
Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization
Ziyu Shan
Yujie Zhang
Yipeng Liu
Yiling Xu
57
0
0
12 Nov 2024
SAMPart3D: Segment Any Part in 3D Objects
Yanting Yang
Yukun Huang
Yu Guo
Liangjun Lu
Xiaoyang Wu
Edmund Y. Lam
Yan-Pei Cao
Xihui Liu
VLM
115
12
0
11 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
183
3
0
11 Nov 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Yizeng Han
Jiayi Guo
Zhiyuan Liu
Yuan Yao
Gao Huang
103
5
0
11 Nov 2024
White-Box Diffusion Transformer for single-cell RNA-seq generation
Zhuorui Cui
Shengze Dong
Ding Liu
40
1
0
11 Nov 2024
Understanding the Role of Equivariance in Self-supervised Learning
Yifei Wang
Kaiwen Hu
Sharut Gupta
Ziyu Ye
Yisen Wang
Stefanie Jegelka
SSL
97
2
0
10 Nov 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
P. Kulkarni
Gaurav Kumar Nayak
Mubarak Shah
ViT
AI4TS
58
3
0
10 Nov 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
Kaixuan Lu
Ruiqian Zhang
Xiao Huang
Yuxing Xie
Xiaogang Ning
Hanchao Zhang
Mengke Yuan
Pan Zhang
Tao Wang
Tongkui Liao
84
2
0
09 Nov 2024
Concept Bottleneck Language Models For protein design
Aya Abdelsalam Ismail
Tuomas Oikarinen
Amy Wang
Julius Adebayo
Samuel Stanton
...
J. Kleinhenz
Allen Goodman
H. C. Bravo
Kyunghyun Cho
Nathan C. Frey
116
6
0
09 Nov 2024
CROPS: A Deployable Crop Management System Over All Possible State Availabilities
Jing Wu
Zhixin Lai
Shengjie Liu
Suiyao Chen
Ran Tao
Pan Zhao
Chuyuan Tao
Yikun Cheng
N. Hovakimyan
OffRL
101
0
0
09 Nov 2024
GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise
Moseli Motsóehli
Kyungim Baek
90
1
0
08 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
75
4
0
08 Nov 2024
Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning
Francesco Girlanda
Olga Demler
Bjoern Menze
Neda Davoudi
107
0
0
08 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
191
14
0
08 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
83
14
0
07 Nov 2024
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
Shuhong Zheng
Zhipeng Bao
Ruoyu Zhao
Martial Hebert
Yu-Xiong Wang
DiffM
162
0
0
07 Nov 2024
wav2sleep: A Unified Multi-Modal Approach to Sleep Stage Classification from Physiological Signals
Jonathan Carter
Lionel Tarassenko
MLAU
97
0
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
89
4
0
07 Nov 2024
Cross- and Intra-image Prototypical Learning for Multi-label Disease Diagnosis and Interpretation
Chong Wang
Fengbei Liu
Yuanhong Chen
Helen Frazer
Gustavo Carneiro
123
2
0
07 Nov 2024
A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning
Antonin Gagnere
Geoffroy Peeters
S. Essid
73
1
0
06 Nov 2024
AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation
Mingyu Sheng
Jianan Fan
Dongnan Liu
Ron Kikinis
Weidong Cai
80
0
0
06 Nov 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data
Seunggeun Chi
Pin-Hao Huang
Enna Sachdeva
Hengbo Ma
Karthik Ramani
Kwonjoon Lee
DiffM
74
2
0
05 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
122
4
0
05 Nov 2024
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang
Yue Yang
VGen
108
3
0
05 Nov 2024
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo
Bohan Zhou
Zongqing Lu
68
2
0
05 Nov 2024
MA^2: A Self-Supervised and Motion Augmenting Autoencoder for Gait-Based Automatic Disease Detection
Yiqun Liu
Ke Zhang
Yin Zhu
MedIm
40
0
0
05 Nov 2024
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
Qishuai Wen
Chun-Guang Li
ViT
64
0
0
05 Nov 2024
A Mamba Foundation Model for Time Series Forecasting
Haoyu Ma
Yushu Chen
Wenlai Zhao
Jinzhe Yang
Yingsheng Ji
Xinghua Xu
Xiaozhu Liu
Hao Jing
Shengzhuo Liu
Guangwen Yang
AI4TS
Mamba
125
5
0
05 Nov 2024
Multi-Transmotion: Pre-trained Model for Human Motion Prediction
Yang Gao
Po-Chien Luan
Alexandre Alahi
63
10
0
04 Nov 2024
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Kian Kenyon-Dean
Zitong Jerry Wang
John Urbanik
Konstantin Donhauser
Jason Hartford
...
Safiye Celik
Marta M. Fay
Juan Sebastian Rodriguez Vera
I. Haque
Oren Z. Kraus
MedIm
113
6
0
04 Nov 2024
Segment Anything for Dendrites from Electron Microscopy
Zewen Zhuo
I. Belevich
Ville Leinonen
E. Jokitalo
Tarja Malm
Alejandra Sierra
Jussi Tohka
37
1
0
04 Nov 2024
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal
Phillip Isola
Antonio Torralba
William T. Freeman
VLM
102
9
0
04 Nov 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
Maja Pantic
SSL
86
7
0
04 Nov 2024
Masked Autoencoders are Parameter-Efficient Federated Continual Learners
Yuchen He
Xiangfeng Wang
CLL
FedML
62
0
0
04 Nov 2024