2111.06377
Cited By
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,611 papers shown
Title
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian
Jun Li
Jinpeng Wang
Ruisheng Luo
Yaowei Wang
Shu-Tao Xia
Bin Chen
95
0
0
04 Apr 2025
MIMRS: A Survey on Masked Image Modeling in Remote Sensing
Shabnam Choudhury
Akhil Vasim
Michael Schmitt
Biplab Banerjee
30
0
0
04 Apr 2025
Geospatial Artificial Intelligence for Satellite-based Flood Extent Mapping: Concepts, Advances, and Future Perspectives
Hyunho Lee
Wenwen Li
AI4CE
38
0
0
03 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
47
0
0
03 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
36
0
0
03 Apr 2025
ESC: Erasing Space Concept for Knowledge Deletion
Tae-Young Lee
Sundong Park
M. Jeon
Hyoseok Hwang
Gyeong-Moon Park
KELM
MU
37
0
0
03 Apr 2025
A Sensorimotor Vision Transformer
Konrad Gadzicki
K. Schill
C. Zetzsche
49
0
0
03 Apr 2025
Spline-based Transformers
Prashanth Chandran
Agon Serifi
Markus Gross
Moritz Bächer
36
0
0
03 Apr 2025
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang
Jinhong Ni
Yujie Zhong
Kai Han
3DV
VLM
57
0
0
02 Apr 2025
Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation
Mingrui Ye
Lianping Yang
Hegui Zhu
Zenghao Zheng
Xin Wang
Yantao Lo
ViT
31
0
0
02 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
X. Zhang
Zhaoxiang Zhang
59
0
0
02 Apr 2025
Learning from Streaming Video with Orthogonal Gradients
Tengda Han
Dilara Gokay
Joseph Heyward
Chuhan Zhang
Daniel Zoran
Viorica Patraucean
João Carreira
Dima Damen
Andrew Zisserman
40
0
0
02 Apr 2025
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking
Chunhui Zhang
Li Liu
Jialin Gao
Xin Sun
Hao Wen
Xi Zhou
Shiming Ge
Y. Wang
33
0
0
02 Apr 2025
Scene-Centric Unsupervised Panoptic Segmentation
Oliver Hahn
Christoph Reich
Nikita Araslanov
Daniel Cremers
Christian Rupprecht
Stefan Roth
OCL
57
0
0
02 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
50
0
0
02 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
L. Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
46
0
0
01 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
53
0
0
01 Apr 2025
Spingarn's Method and Progressive Decoupling Beyond Elicitable Monotonicity
B. Evens
P. Latafat
Panagiotis Patrinos
46
0
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
56
2
0
01 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao W. Wang
Songruoyao Wu
Jiaxing Yu
K. Zhang
MGen
VGen
65
1
0
01 Apr 2025
From Colors to Classes: Emergence of Concepts in Vision Transformers
Teresa Dorszewski
Lenka Tětková
Robert Jenssen
Lars Kai Hansen
Kristoffer Wickstrøm
37
0
0
31 Mar 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
34
0
0
31 Mar 2025
CBIL: Collective Behavior Imitation Learning for Fish from Real Videos
Yifan Wu
Zhiyang Dou
Yuko Ishiwaka
Shun Ogawa
Yuke Lou
Wenping Wang
Lingjie Liu
Taku Komura
45
3
0
31 Mar 2025
Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography
Lin Zhao
Xin Yu
Yikang Liu
Xiao Chen
Eric Z. Chen
Terrence Chen
Shanhui Sun
DiffM
MedIm
42
0
0
31 Mar 2025
SAVeD: Learning to Denoise Low-SNR Video for Improved Downstream Performance
Suzanne Stathatos
Michael Hobley
Markus Marks
Pietro Perona
27
0
0
31 Mar 2025
SmartScan: An AI-based Interactive Framework for Automated Region Extraction from Satellite Images
S. Nagendra
Kashif Rashid
36
0
0
31 Mar 2025
Self-Supervised Pretraining for Aerial Road Extraction
Rupert Polley
Sai Vignesh Abishek Deenadayalan
Johann Marius Zöllner
SSL
66
0
0
31 Mar 2025
FlexiMo: A Flexible Remote Sensing Foundation Model
Xuyang Li
Chenyu Li
Pedram Ghamisi
Danfeng Hong
40
0
0
31 Mar 2025
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment
Zhichao Liao
Xiaokun Liu
Wenyu Qin
Qingyu Li
Qiulin Wang
Pengfei Wan
Di Zhang
Long Zeng
P. Feng
51
0
0
31 Mar 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
31
0
0
30 Mar 2025
Can Visuo-motor Policies Benefit from Random Exploration Data? A Case Study on Stacking
Shutong Jin
Axel Kaliff
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
VGen
34
0
0
30 Mar 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao
Yang Shen
Jinyang Guo
Yazhou Yao
Xiansheng Hua
ViT
31
0
0
30 Mar 2025
Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection
Aimira Baitieva
Yacine Bouaouni
Alexandre Briot
Dick Ameln
Souhaiel Khalfaoui
S. Akçay
39
0
0
30 Mar 2025
Multi-label classification for multi-temporal, multi-spatial coral reef condition monitoring using vision foundation model with adapter learning
Xinlei Shao
Hongruixuan Chen
Fan Zhao
Kirsty Magson
Jundong Chen
Peiran Li
J. Wang
Jun Sasaki
44
0
0
29 Mar 2025
Function Fitting Based on Kolmogorov-Arnold Theorem and Kernel Functions
Jianpeng Liu
Qizhi Pan
32
0
0
29 Mar 2025
Efficient Building Roof Type Classification: A Domain-Specific Self-Supervised Approach
Guneet Mutreja
Ksenia Bittner
35
0
0
28 Mar 2025
Assessing Foundation Models for Sea Ice Type Segmentation in Sentinel-1 SAR Imagery
Samira Alkaee Taleghan
Morteza Karimzadeh
A. Barrett
Walter N. Meier
F. Banaei-Kashani
56
0
0
28 Mar 2025
Detecting Localized Deepfake Manipulations Using Action Unit-Guided Video Representations
Tharun Anand
Siva Sankar
Pravin Nair
AAML
40
0
0
28 Mar 2025
A Survey on Remote Sensing Foundation Models: From Vision to Multimodality
Ziyue Huang
Hongxi Yan
Qiqi Zhan
Shuai Yang
Mingming Zhang
Chenkai Zhang
Yiming Lei
Zeming Liu
Qingjie Liu
Y. Wang
42
0
0
28 Mar 2025
MedCL: Learning Consistent Anatomy Distribution for Scribble-supervised Medical Image Segmentation
Ke Zhang
Vishal M. Patel
44
0
0
28 Mar 2025
Evaluating Text-to-Image Synthesis with a Conditional Fréchet Distance
Jaywon Koo
J. Hernandez
Moayed Haji-Ali
Ziyan Yang
Vicente Ordonez
EGVM
67
0
0
27 Mar 2025
Delving Deep into Semantic Relation Distillation
Zhaoyi Yan
Kangjun Liu
Qixiang Ye
54
0
0
27 Mar 2025
Test-Time Visual In-Context Tuning
Jiahao Xie
A. Tonioni
N. Rauschmayr
F. Tombari
Bernt Schiele
OOD
VLM
60
0
0
27 Mar 2025
HyperFree: A Channel-adaptive and Tuning-free Foundation Model for Hyperspectral Remote Sensing Imagery
Jingtao Li
Y. Liu
Xinyu Wang
Yunning Peng
Chen Sun
...
Tian Ke
Xiao Jiang
Tangwei Lu
Anran Zhao
Yanfei Zhong
VLM
55
0
0
27 Mar 2025
Harmonizing Visual Representations for Unified Multimodal Understanding and Generation
Size Wu
W. Zhang
Lumin Xu
Sheng Jin
Zhonghua Wu
Qingyi Tao
Wentao Liu
Wei Li
Chen Change Loy
VGen
121
2
0
27 Mar 2025
Feature4X: Bridging Any Monocular Video to 4D Agentic AI with Versatile Gaussian Feature Fields
Shijie Zhou
Hui Ren
Yijia Weng
Shuwang Zhang
Zhen Wang
...
Zhiwen Fan
Suya You
Z. Wang
Leonidas J. Guibas
A. Kadambi
VGen
3DGS
83
0
0
26 Mar 2025
RxRx3-core: Benchmarking drug-target interactions in High-Content Microscopy
Oren Z. Kraus
Federico Comitani
John Urbanik
Kian Kenyon-Dean
Lakshmanan Arumugam
Saber Saberian
Cas Wognum
Safiye Celik
I. Haque
80
0
0
26 Mar 2025
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen
Lingting Zhu
Zeyu Hu
Shengju Qian
Y. Chen
Xin Wang
G. Lee
97
1
0
26 Mar 2025
Mamba-3D as Masked Autoencoders for Accurate and Data-Efficient Analysis of Medical Ultrasound Videos
Jiaheng Zhou
Yanfeng Zhou
Wei Fang
Yuxing Tang
Le Lu
Ge Yang
Mamba
182
0
0
26 Mar 2025
ATP: Adaptive Threshold Pruning for Efficient Data Encoding in Quantum Neural Networks
Mohamed Afane
Gabrielle Ebbrecht
Ying Wang
Juntao Chen
Junaid Farooq
28
0
0
26 Mar 2025