2111.06377
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,778 papers shown
Title
4D Contrastive Superflows are Dense 3D Representation Learners
Xiang Xu
Lingdong Kong
Hui Shuai
Wenwei Zhang
Liang Pan
Kai Chen
Ziwei Liu
Qingshan Liu
3DPC
108
10
0
08 Jul 2024
Transfer Learning with Self-Supervised Vision Transformers for Snake Identification
Anthony Miyaguchi
Murilo Gustineli
Austin Fischer
Ryan Lundqvist
43
4
0
08 Jul 2024
Pseudo-triplet Guided Few-shot Composed Image Retrieval
Bohan Hou
Haoqiang Lin
Haokun Wen
Meng Liu
Xuemeng Song
99
5
0
08 Jul 2024
Multimodal Diffusion Transformer: Learning Versatile Behavior from Multimodal Goals
Moritz Reuss
Ömer Erdinç Yagmurlu
Fabian Wenzel
Rudolf Lioutikov
OffRL
111
52
0
08 Jul 2024
KidSat: satellite imagery to map childhood poverty dataset and benchmark
Makkunda Sharma
Fan Yang
Duy-Nhat Vo
Esra Suel
Swapnil Mishra
Samir Bhatt
Oliver Fiala
William Rudgard
Seth Flaxman
118
1
0
08 Jul 2024
Bringing Masked Autoencoders Explicit Contrastive Properties for Point Cloud Self-Supervised Learning
Bin Ren
Guofeng Mei
D. Paudel
Weijie Wang
Yawei Li
Mengyuan Liu
Rita Cucchiara
Luc Van Gool
N. Sebe
3DPC
116
14
0
08 Jul 2024
Cross-domain Few-shot In-context Learning for Enhancing Traffic Sign Recognition
Yaozong Gan
Guang Li
Ren Togo
Keisuke Maeda
Takahiro Ogawa
Miki Haseyama
59
3
0
08 Jul 2024
Learning to Adapt Category Consistent Meta-Feature of CLIP for Few-Shot Classification
Jiaying Shi
Xuetong Xue
Shenghui Xu
VLM
146
0
0
08 Jul 2024
FALIP: Visual Prompt as Foveal Attention Boosts CLIP Zero-Shot Performance
Jiedong Zhuang
Jiaqi Hu
Lianrui Mu
Rui Hu
Xiaoyu Liang
Jiangnan Ye
Haoji Hu
CLIP
VLM
102
4
0
08 Jul 2024
Mamba-FSCIL: Dynamic Adaptation with Selective State Space Model for Few-Shot Class-Incremental Learning
Xiaojie Li
Yibo Yang
Jianlong Wu
Guohao Li
Ming-Hsuan Yang
Liqiang Nie
M. Zhang
Mamba
99
6
0
08 Jul 2024
Forest2Seq: Revitalizing Order Prior for Sequential Indoor Scene Synthesis
Qi Sun
Hang Zhou
Wengang Zhou
Li Li
Houqiang Li
3DPC
3DV
96
7
0
07 Jul 2024
CBM: Curriculum by Masking
Andrei Jarca
Florinel-Alin Croitoru
Radu Tudor Ionescu
62
0
0
06 Jul 2024
SAM Fewshot Finetuning for Anatomical Segmentation in Medical Images
Weiyi Xie
Nathalie Willems
Shubham Patil
Yang Li
Mayank Kumar
99
14
0
05 Jul 2024
Multi-modal Masked Siamese Network Improves Chest X-Ray Representation Learning
Saeed Shurrab
Alejandro Guerra-Manzanares
Farah E. Shamout
94
1
0
05 Jul 2024
Self-Supervised Representation Learning for Adversarial Attack Detection
Yi Li
Plamen Angelov
N. Suri
SSL
AAML
81
4
0
05 Jul 2024
MARS: Paying more attention to visual attributes for text-based person search
Alex Ergasti
Tomaso Fontanini
Claudio Ferrari
Massimo Bertozzi
Andrea Prati
86
10
0
05 Jul 2024
Smart Vision-Language Reasoners
Denisa Roberts
Lucas Roberts
VLM
ReLM
LRM
77
4
0
05 Jul 2024
Learning to Be a Transformer to Pinpoint Anomalies
Alex Costanzino
Pierluigi Zama Ramirez
Giuseppe Lisanti
Luigi Di Stefano
97
0
0
04 Jul 2024
Robust Adaptation of Foundation Models with Black-Box Visual Prompting
Changdae Oh
Gyeongdeok Seo
Geunyoung Jung
Zhi-Qi Cheng
Hosik Choi
Jiyoung Jung
Kyungwoo Song
VLM
128
1
0
04 Jul 2024
How JEPA Avoids Noisy Features: The Implicit Bias of Deep Linear Self Distillation Networks
Etai Littwin
Omid Saremi
Madhu Advani
Vimal Thilak
Preetum Nakkiran
Chen Huang
Joshua Susskind
80
5
0
03 Jul 2024
Precision at Scale: Domain-Specific Datasets On-Demand
Jesús M. Rodríguez-de-Vera
Imanol G. Estepa
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
87
2
0
03 Jul 2024
A Survey on Trustworthiness in Foundation Models for Medical Image Analysis
Congzhen Shi
Ryan Rezai
Jiaxi Yang
Qi Dou
Xiaoxiao Li
MedIm
76
6
0
03 Jul 2024
LPViT: Low-Power Semi-structured Pruning for Vision Transformers
Kaixin Xu
Zhe Wang
Chunyun Chen
Xue Geng
Jie Lin
Xulei Yang
Min-man Wu
Min Wu
Xiaoli Li
Weisi Lin
ViT
VLM
211
10
0
02 Jul 2024
Towards Multimodal Open-Set Domain Generalization and Adaptation through Self-supervision
Hao Dong
Eleni Chatzi
Olga Fink
74
7
0
01 Jul 2024
Mask and Compress: Efficient Skeleton-based Action Recognition in Continual Learning
Matteo Mosconi
Andriy Sorokin
Aniello Panariello
Angelo Porrello
Jacopo Bonato
Marco Cotogni
Luigi Sabetta
Simone Calderara
Rita Cucchiara
CLL
75
1
0
01 Jul 2024
Diffusion Forcing: Next-token Prediction Meets Full-Sequence Diffusion
Boyuan Chen
Diego Marti Monso
Yilun Du
Max Simchowitz
Russ Tedrake
Vincent Sitzmann
DiffM
169
109
0
01 Jul 2024
The Solution for Temporal Sound Localisation Task of ICCV 1st Perception Test Challenge 2023
Yurui Huang
Yang Yang
Shou Chen
Xiangyu Wu
Qingguo Chen
Jianfeng Lu
107
0
0
01 Jul 2024
Efficient Cutting Tool Wear Segmentation Based on Segment Anything Model
Zongshuo Li
Ding Huo
M. Meurer
Thomas Bergs
65
0
0
01 Jul 2024
Coding for Intelligence from the Perspective of Category
Wenhan Yang
Zixuan Hu
Lilang Lin
Jiaying Liu
Ling-Yu Duan
AI4CE
172
1
0
01 Jul 2024
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models
Ruinan Jin
Zikang Xu
Yuan Zhong
Qiongsong Yao
Qi Dou
S. Kevin Zhou
Xiaoxiao Li
VLM
109
17
0
01 Jul 2024
Look Ahead or Look Around? A Theoretical Comparison Between Autoregressive and Masked Pretraining
Qi Zhang
Tianqi Du
Haotian Huang
Yifei Wang
Yisen Wang
71
5
0
01 Jul 2024
StyleShot: A Snapshot on Any Style
Junyao Gao
Yanchen Liu
Yanan Sun
Yinhao Tang
Yanhong Zeng
Kai Chen
Cairong Zhao
TTA
3DH
VLM
180
19
0
01 Jul 2024
Diffusion Models and Representation Learning: A Survey
Michael Fuest
Pingchuan Ma
Ming Gui
Johannes S. Fischer
Vincent Tao Hu
Bjorn Ommer
DiffM
104
24
0
30 Jun 2024
Location embedding based pairwise distance learning for fine-grained diagnosis of urinary stones
Qiangguo Jin
Jiapeng Huang
Changming Sun
Hui Cui
Ping Xuan
...
Leyi Wei
Yu-Jie Wu
Chia-An Wu
H. Duh
Yueh-Hsun Lu
70
1
0
29 Jun 2024
Learning Unsupervised Gaze Representation via Eye Mask Driven Information Bottleneck
Yangzhou Jiang
Yinxin Lin
Yaoming Wang
Teng Li
Bilian Ke
Bingbing Ni
CVBM
85
1
0
29 Jun 2024
LLaVolta: Efficient Multi-modal Models via Stage-wise Visual Context Compression
Jieneng Chen
Luoxin Ye
Ju He
Zhao-Yang Wang
Daniel Khashabi
Alan Yuille
VLM
75
7
0
28 Jun 2024
Segment Anything without Supervision
Xudong Wang
Jingfeng Yang
Trevor Darrell
VLM
121
15
0
28 Jun 2024
GM-DF: Generalized Multi-Scenario Deepfake Detection
Yingxin Lai
Zitong Yu
Jing Yang
Bin Li
Xiangui Kang
Linlin Shen
134
11
0
28 Jun 2024
Fine-tuning of Geospatial Foundation Models for Aboveground Biomass Estimation
Michal Muszynski
Levente Klein
Ademir Ferreira da Silva
Anjani Prasad Atluri
Carlos Gomes
...
Shraddha Singh
Steve Meliksetian
Campbell Watson
Daiki Kimura
Harini Srinivasan
139
4
0
28 Jun 2024
Learning Visual Conditioning Tokens to Correct Domain Shift for Fully Test-time Adaptation
Yushun Tang
Shuoshuo Chen
Zhehan Kan
Yi Zhang
Qinghai Guo
Zhihai He
96
2
0
27 Jun 2024
From Efficient Multimodal Models to World Models: A Survey
Xinji Mai
Zeng Tao
Junxiong Lin
Haoran Wang
Yang Chang
Yanlan Kang
Yan Wang
Wenqiang Zhang
93
6
0
27 Jun 2024
Retain, Blend, and Exchange: A Quality-aware Spatial-Stereo Fusion Approach for Event Stream Recognition
Lan Chen
Dong Li
Xiao Wang
Pengpeng Shao
Wei Zhang
Yaowei Wang
Yonghong Tian
Jin Tang
102
2
0
27 Jun 2024
WV-Net: A foundation model for SAR WV-mode satellite imagery trained using contrastive self-supervised learning on 10 million images
Yannik Glaser
J. Stopa
Linnea M. Wolniewicz
Ralph Foster
Doug Vandemark
A. Mouche
Bertrand Chapron
Peter Sadowski
71
1
0
26 Jun 2024
AlignedCut: Visual Concepts Discovery on Brain-Guided Universal Feature Space
Huzheng Yang
James Gee
Jianbo Shi
VOS
68
2
0
26 Jun 2024
Changen2: Multi-Temporal Remote Sensing Generative Change Foundation Model
Zhuo Zheng
Stefano Ermon
Dongjun Kim
Liangpei Zhang
Yanfei Zhong
DiffM
87
20
0
26 Jun 2024
3D-MVP: 3D Multiview Pretraining for Robotic Manipulation
Shengyi Qian
Kaichun Mo
Valts Blukis
David Fouhey
Dieter Fox
Ankit Goyal
85
3
0
26 Jun 2024
Video Occupancy Models
Manan Tomar
Philippe Hansen-Estruch
Philip Bachman
Alex Lamb
John Langford
Matthew E. Taylor
Sergey Levine
127
3
0
25 Jun 2024
Unified Auto-Encoding with Masked Diffusion
Philippe Hansen-Estruch
S. Vishwanath
Amy Zhang
Manan Tomar
DiffM
93
1
0
25 Jun 2024
Investigating Self-Supervised Methods for Label-Efficient Learning
S. Nandam
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
VLM
79
2
0
25 Jun 2024
Pseudo Labelling for Enhanced Masked Autoencoders
S. Nandam
Sara Atito
Zhenhua Feng
Josef Kittler
Muhammad Awais
99
1
0
25 Jun 2024