2111.06377
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,615 papers shown
Title
Long-Tailed Object Detection Pre-training: Dynamic Rebalancing Contrastive Learning with Dual Reconstruction
Chen-Long Duan
Yong Li
Xiu-Shen Wei
Lin Zhao
36
1
0
14 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi
Minjing Dong
Chang Xu
VLM
43
1
0
14 Nov 2024
VidMan: Exploiting Implicit Dynamics from Video Diffusion Model for Effective Robot Manipulation
Youpeng Wen
Junfan Lin
Bo Li
J. Han
Hang Xu
Shen Zhao
Xiaodan Liang
VGen
DiffM
43
2
0
14 Nov 2024
Towards More Accurate Fake Detection on Images Generated from Advanced Generative and Neural Rendering Models
Chengdong Dong
Vijayakumar Bhagavatula
Zhenyu Zhou
Ajay Kumar
36
0
0
13 Nov 2024
Learning Disentangled Representations for Perceptual Point Cloud Quality Assessment via Mutual Information Minimization
Ziyu Shan
Yujie Zhang
Yipeng Liu
Yiling Xu
41
0
0
12 Nov 2024
SAMPart3D: Segment Any Part in 3D Objects
Yanting Yang
Yukun Huang
Yu Guo
Liangjun Lu
Xiaoyang Wu
Edmund Y. Lam
Yan-Pei Cao
Xihui Liu
VLM
39
7
0
11 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
76
2
0
11 Nov 2024
ENAT: Rethinking Spatial-temporal Interactions in Token-based Image Synthesis
Zanlin Ni
Yulin Wang
Renping Zhou
Yizeng Han
Jiayi Guo
Zhiyuan Liu
Yuan Yao
Gao Huang
63
4
0
11 Nov 2024
White-Box Diffusion Transformer for single-cell RNA-seq generation
Zhuorui Cui
Shengze Dong
Ding Liu
35
1
0
11 Nov 2024
Understanding the Role of Equivariance in Self-supervised Learning
Yifei Wang
Kaiwen Hu
Sharut Gupta
Ziyu Ye
Yisen Wang
Stefanie Jegelka
SSL
50
2
0
10 Nov 2024
CityGuessr: City-Level Video Geo-Localization on a Global Scale
P. Kulkarni
Gaurav Kumar Nayak
Mubarak Shah
ViT
AI4TS
29
2
0
10 Nov 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
Kaixuan Lu
Ruiqian Zhang
Xiao Huang
Yuxing Xie
Xiaogang Ning
Hanchao Zhang
Mengke Yuan
Pan Zhang
Tao Wang
Tongkui Liao
37
0
0
09 Nov 2024
Concept Bottleneck Language Models For protein design
Aya Abdelsalam Ismail
Tuomas Oikarinen
Amy Wang
Julius Adebayo
Samuel Stanton
...
J. Kleinhenz
Allen Goodman
H. C. Bravo
Kyunghyun Cho
Nathan C. Frey
45
4
0
09 Nov 2024
CROPS: A Deployable Crop Management System Over All Possible State Availabilities
Jing Wu
Zhixin Lai
Shengjie Liu
Suiyao Chen
Ran Tao
Pan Zhao
Chuyuan Tao
Yikun Cheng
N. Hovakimyan
OffRL
53
0
0
09 Nov 2024
GCI-ViTAL: Gradual Confidence Improvement with Vision Transformers for Active Learning on Label Noise
Moseli Motsóehli
Kyungim Baek
34
1
0
08 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
39
3
0
08 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
48
9
0
08 Nov 2024
Enhancing Cardiovascular Disease Prediction through Multi-Modal Self-Supervised Learning
Francesco Girlanda
Olga Demler
Bjoern H. Menze
Neda Davoudi
42
0
0
08 Nov 2024
Don't Look Twice: Faster Video Transformers with Run-Length Tokenization
Rohan Choudhury
Guanglei Zhu
Sihan Liu
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
31
11
0
07 Nov 2024
Diff-2-in-1: Bridging Generation and Dense Perception with Diffusion Models
Shuhong Zheng
Zhipeng Bao
Ruoyu Zhao
Martial Hebert
Yu-xiong Wang
DiffM
35
0
0
07 Nov 2024
wav2sleep: A Unified Multi-Modal Approach to Sleep Stage Classification from Physiological Signals
Jonathan Carter
Lionel Tarassenko
MLAU
45
0
0
07 Nov 2024
Image Understanding Makes for A Good Tokenizer for Image Generation
Luting Wang
Yang Zhao
Zijian Zhang
Jiashi Feng
Si Liu
Bingyi Kang
VLM
44
4
0
07 Nov 2024
Cross- and Intra-image Prototypical Learning for Multi-label Disease Diagnosis and Interpretation
Chong Wang
Fengbei Liu
Yuanhong Chen
Helen Frazer
Gustavo Carneiro
32
2
0
07 Nov 2024
A Contrastive Self-Supervised Learning scheme for beat tracking amenable to few-shot learning
Antonin Gagnere
Geoffroy Peeters
S. Essid
45
1
0
06 Nov 2024
AMNCutter: Affinity-Attention-Guided Multi-View Normalized Cutter for Unsupervised Surgical Instrument Segmentation
Mingyu Sheng
Jianan Fan
Dongnan Liu
Ron Kikinis
Weidong Cai
39
0
0
06 Nov 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data
Seunggeun Chi
Pin-Hao Huang
Enna Sachdeva
Hengbo Ma
Karthik Ramani
Kwonjoon Lee
DiffM
47
2
0
05 Nov 2024
Classification Done Right for Vision-Language Pre-Training
Zilong Huang
Qinghao Ye
Bingyi Kang
Jiashi Feng
Haoqi Fan
CLIP
VLM
50
2
0
05 Nov 2024
TIP-I2V: A Million-Scale Real Text and Image Prompt Dataset for Image-to-Video Generation
Wenhao Wang
Yuqing Yang
VGen
47
3
0
05 Nov 2024
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo
Bohan Zhou
Zongqing Lu
30
1
0
05 Nov 2024
MA^2: A Self-Supervised and Motion Augmenting Autoencoder for Gait-Based Automatic Disease Detection
Yiqun Liu
Ke Zhang
Yin Zhu
MedIm
33
0
0
05 Nov 2024
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
Qishuai Wen
Chun-Guang Li
ViT
37
0
0
05 Nov 2024
A Mamba Foundation Model for Time Series Forecasting
Haoyu Ma
Yushu Chen
Wenlai Zhao
Jinzhe Yang
Yingsheng Ji
Xinghua Xu
Xiaozhu Liu
Hao Jing
Shengzhuo Liu
Guangwen Yang
AI4TS
Mamba
47
2
0
05 Nov 2024
Multi-Transmotion: Pre-trained Model for Human Motion Prediction
Yang Gao
Po-Chien Luan
Alexandre Alahi
44
6
0
04 Nov 2024
ViTally Consistent: Scaling Biological Representation Learning for Cell Microscopy
Kian Kenyon-Dean
Zitong Jerry Wang
John Urbanik
Konstantin Donhauser
Jason Hartford
...
Safiye Celik
Marta Fay
Juan Sebastian Rodriguez Vera
I. Haque
Oren Z. Kraus
MedIm
39
4
0
04 Nov 2024
Segment Anything for Dendrites from Electron Microscopy
Zewen Zhuo
I. Belevich
Ville Leinonen
E. Jokitalo
Tarja Malm
Alejandra Sierra
Jussi Tohka
37
1
0
04 Nov 2024
Adaptive Length Image Tokenization via Recurrent Allocation
Shivam Duggal
Phillip Isola
Antonio Torralba
William T. Freeman
VLM
41
5
0
04 Nov 2024
Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos
Rodrigo Mira
Honglie Chen
Zoe Landgraf
Stavros Petridis
M. Pantic
SSL
37
5
0
04 Nov 2024
Masked Autoencoders are Parameter-Efficient Federated Continual Learners
Yuchen He
Xiangfeng Wang
CLL
FedML
40
0
0
04 Nov 2024
Expanding Sparse Tuning for Low Memory Usage
Shufan Shen
Junshu Sun
Xiangyang Ji
Qingming Huang
Shuhui Wang
50
0
0
04 Nov 2024
Visual Fourier Prompt Tuning
Runjia Zeng
Cheng Han
Qifan Wang
Chunshu Wu
Tong Geng
Lifu Huang
Ying Nian Wu
Dongfang Liu
VPVLM
VLM
61
6
0
02 Nov 2024
HEXA-MoE: Efficient and Heterogeneous-aware MoE Acceleration with ZERO Computation Redundancy
Shuqing Luo
Jie Peng
Pingzhi Li
Tianlong Chen
MoE
36
2
0
02 Nov 2024
HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction
Rujiao Long
Pengfei Wang
Zhibo Yang
Cong Yao
41
0
0
02 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
56
0
0
02 Nov 2024
Randomized Autoregressive Visual Generation
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VGen
DiffM
57
30
1
01 Nov 2024
PedSleepMAE: Generative Model for Multimodal Pediatric Sleep Signals
Saurav R. Pandey
Aaqib Saeed
Harlin Lee
30
0
0
01 Nov 2024
Preventing Dimensional Collapse in Self-Supervised Learning via Orthogonality Regularization
Junlin He
Jinxiao Du
Wei Ma
SSL
40
0
0
01 Nov 2024
Multiple Information Prompt Learning for Cloth-Changing Person Re-Identification
Shengxun Wei
Zan Gao
Yibo Zhao
Weili Guan
Shengyong Chen
51
2
0
01 Nov 2024
Learning Video Representations without Natural Videos
Xueyang Yu
Xinlei Chen
Yossi Gandelsman
VGen
AI4TS
54
0
0
31 Oct 2024
No Pose, No Problem: Surprisingly Simple 3D Gaussian Splats from Sparse Unposed Images
Botao Ye
Sifei Liu
Haofei Xu
Xueting Li
Marc Pollefeys
Ming Yang
Songyou Peng
40
21
0
31 Oct 2024
Sparsh: Self-supervised touch representations for vision-based tactile sensing
Carolina Higuera
Akash Sharma
Chaithanya Krishna Bodduluri
Taosha Fan
Patrick E. Lancaster
...
Michael Kaess
Byron Boots
Mike Lambeta
Tingfan Wu
Mustafa Mukadam
47
12
0
31 Oct 2024