Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.06377
Cited By
v1
v2
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
Earth-Adapter: Bridge the Geospatial Domain Gaps with Mixture of Frequency Adaptation
Xiaoxing Hu
Ziyang Gong
Yansen Wang
Yuru Jia
Gen Luo
Xue Yang
473
1
0
08 Apr 2025
ViTaMIn: Learning Contact-Rich Tasks Through Robot-Free Visuo-Tactile Manipulation Interface
Fangchen Liu
Chuanyu Li
Yihua Qin
Ankit Shaw
Jinfeng Xu
Pieter Abbeel
Rui Chen
130
5
0
08 Apr 2025
Falcon: Fractional Alternating Cut with Overcoming Minima in Unsupervised Segmentation
Xiao Zhang
Xiangyu Han
Xiwen Lai
Yao Sun
Pei Zhang
Konrad Kording
60
0
0
08 Apr 2025
Uni4D: A Unified Self-Supervised Learning Framework for Point Cloud Videos
Zhi Zuo
Chenyi Zhuang
Zhiqiang Shen
Pan Gao
Jie Qin
Nicu Sebe
3DPC
123
0
0
07 Apr 2025
S^4M: Boosting Semi-Supervised Instance Segmentation with SAM
Heeji Yoon
Heeseong Shin
Eunbeen Hong
Hyunwook Choi
Hansang Cho
Daun Jeong
Seungryong Kim
65
0
0
07 Apr 2025
SapiensID: Foundation for Human Recognition
Minchul Kim
Dingqiang Ye
Yiyang Su
Feng Liu
Xiaoming Liu
CVBM
VLM
89
1
0
07 Apr 2025
Attributed Synthetic Data Generation for Zero-shot Domain-specific Image Classification
Shijian Wang
Linxin Song
Ryotaro Shimizu
M. Goto
Hanqian Wu
VLM
58
0
0
06 Apr 2025
Resilience of Vision Transformers for Domain Generalisation in the Presence of Out-of-Distribution Noisy Images
Hamza Riaz
Alan F. Smeaton
82
0
0
05 Apr 2025
Window Token Concatenation for Efficient Visual Large Language Models
Yifan Li
Wentao Bao
Botao Ye
Zhen Tan
Tianlong Chen
Huan Liu
Yu Kong
VLM
103
0
0
05 Apr 2025
MInCo: Mitigating Information Conflicts in Distracted Visual Model-based Reinforcement Learning
Shiguang Sun
Hanbo Zhang
Zeyang Liu
Xinrui Yang
Lipeng Wan
Bing Yan
Xingyu Chen
219
0
0
05 Apr 2025
A Survey of Pathology Foundation Model: Progress and Future Directions
Conghao Xiong
Hao Chen
Joseph J. Y. Sung
LM&MA
AI4CE
173
1
0
05 Apr 2025
Pairwise Optimal Transports for Training All-to-All Flow-Based Condition Transfer Model
Kotaro Ikeda
Masanori Koyama
Jinzhe Zhang
Kohei Hayashi
Kenji Fukumizu
OT
568
1
0
04 Apr 2025
MIMRS: A Survey on Masked Image Modeling in Remote Sensing
Shabnam Choudhury
Akhil Vasim
Michael Schmitt
Biplab Banerjee
78
0
0
04 Apr 2025
REJEPA: A Novel Joint-Embedding Predictive Architecture for Efficient Remote Sensing Image Retrieval
Shabnam Choudhury
Yash Salunkhe
Sarthak Mehrotra
Biplab Banerjee
80
0
0
04 Apr 2025
Detecting underdetermination in parameterized quantum circuits
Marie Kempkes
Jakob Spiegelberg
Evert van Nieuwenburg
Vedran Dunjko
93
0
0
04 Apr 2025
Temporal-contextual Event Learning for Pedestrian Crossing Intent Prediction
Hongbin Liang
Hezhe Qiao
Wei Huang
Qizhou Wang
Mingsheng Shang
Lin Chen
65
0
0
04 Apr 2025
AutoSSVH: Exploring Automated Frame Sampling for Efficient Self-Supervised Video Hashing
Niu Lian
Jun Li
Jinpeng Wang
Ruisheng Luo
Yaowei Wang
Shu-Tao Xia
Bin Chen
425
0
0
04 Apr 2025
Refining CLIP's Spatial Awareness: A Visual-Centric Perspective
Congpei Qiu
Yanhao Wu
Wei Ke
Xiuxiu Bai
Tong Zhang
VLM
104
0
0
03 Apr 2025
ESC: Erasing Space Concept for Knowledge Deletion
Tae-Young Lee
Sundong Park
M. Jeon
Hyoseok Hwang
Gyeong-Moon Park
KELM
MU
85
0
0
03 Apr 2025
Geospatial Artificial Intelligence for Satellite-Based Flood Extent Mapping: Concepts, Advances, and Future Perspectives
Hyunho Lee
Wenwen Li
AI4CE
95
0
0
03 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
116
0
0
03 Apr 2025
Spline-based Transformers
Prashanth Chandran
Agon Serifi
Markus Gross
Moritz Bächer
156
0
0
03 Apr 2025
A Sensorimotor Vision Transformer
Konrad Gadzicki
K. Schill
C. Zetzsche
142
0
0
03 Apr 2025
Dual-stream Transformer-GCN Model with Contextualized Representations Learning for Monocular 3D Human Pose Estimation
Mingrui Ye
Lianping Yang
Hegui Zhu
Zenghao Zheng
Xin Wang
Yantao Lo
ViT
95
0
0
02 Apr 2025
COST: Contrastive One-Stage Transformer for Vision-Language Small Object Tracking
Chunhui Zhang
Li Liu
Jialin Gao
Xin Sun
Hao Wen
Xi Zhou
Shiming Ge
Yucheng Wang
110
1
0
02 Apr 2025
Scene-Centric Unsupervised Panoptic Segmentation
Oliver Hahn
Christoph Reich
Nikita Araslanov
Daniel Cremers
Christian Rupprecht
Stefan Roth
OCL
144
0
0
02 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
173
0
0
02 Apr 2025
v-CLR: View-Consistent Learning for Open-World Instance Segmentation
Chang-Bin Zhang
Jinhong Ni
Yujie Zhong
Kai Han
3DV
VLM
177
0
0
02 Apr 2025
Ross3D: Reconstructive Visual Instruction Tuning with 3D-Awareness
Haochen Wang
Yucheng Zhao
Tiancai Wang
Haoqiang Fan
Xinming Zhang
Zhaoxiang Zhang
153
4
0
02 Apr 2025
Learning from Streaming Video with Orthogonal Gradients
Tengda Han
Dilara Gokay
Joseph Heyward
Chuhan Zhang
Daniel Zoran
Viorica Patraucean
João Carreira
Dima Damen
Andrew Zisserman
116
0
0
02 Apr 2025
SMILE: Infusing Spatial and Motion Semantics in Masked Video Learning
Fida Mohammad Thoker
Letian Jiang
Chen Zhao
Bernard Ghanem
140
0
0
01 Apr 2025
MergeVQ: A Unified Framework for Visual Generation and Representation with Disentangled Token Merging and Quantization
Siyuan Li
Lefei Zhang
Zedong Wang
Juanxi Tian
Cheng Tan
...
Chang Yu
Qingsong Xie
Haonan Lu
Haoqian Wang
Zhen Lei
108
2
0
01 Apr 2025
Spingarn's Method and Progressive Decoupling Beyond Elicitable Monotonicity
B. Evens
P. Latafat
Panagiotis Patrinos
229
1
0
01 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
295
1
0
01 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan
Shengbang Tong
Jiachen Zhu
Koustuv Sinha
Zhuang Liu
...
Michael G. Rabbat
Nicolas Ballas
Yann LeCun
Amir Bar
Saining Xie
CLIP
VLM
Presented at
ResearchTrend Connect | VLM
on
04 Jun 2025
172
6
0
01 Apr 2025
SAVeD: Learning to Denoise Low-SNR Video for Improved Downstream Performance
Suzanne Stathatos
Michael Hobley
Markus Marks
Pietro Perona
79
0
0
31 Mar 2025
FlexiMo: A Flexible Remote Sensing Foundation Model
Xuyang Li
Chenyu Li
Pedram Ghamisi
Danfeng Hong
76
3
0
31 Mar 2025
Order Matters: On Parameter-Efficient Image-to-Video Probing for Recognizing Nearly Symmetric Actions
Thinesh Thiyakesan Ponbagavathi
Alina Roitberg
62
0
0
31 Mar 2025
Leveraging Diffusion Model and Image Foundation Model for Improved Correspondence Matching in Coronary Angiography
Lin Zhao
Xin Yu
Yikang Liu
Xiao Chen
Eric Z. Chen
Terrence Chen
Shanhui Sun
DiffM
MedIm
78
0
0
31 Mar 2025
Self-Supervised Pretraining for Aerial Road Extraction
Rupert Polley
Sai Vignesh Abishek Deenadayalan
Johann Marius Zöllner
SSL
100
0
0
31 Mar 2025
HumanAesExpert: Advancing a Multi-Modality Foundation Model for Human Image Aesthetic Assessment
Zhichao Liao
Xiaokun Liu
Wenyu Qin
Qingyu Li
Qiulin Wang
Pengfei Wan
Di Zhang
Long Zeng
Pingfa Feng
201
1
0
31 Mar 2025
SmartScan: An AI-based Interactive Framework for Automated Region Extraction from Satellite Images
S. Nagendra
Kashif Rashid
78
0
0
31 Mar 2025
From Colors to Classes: Emergence of Concepts in Vision Transformers
Teresa Dorszewski
Lenka Tětková
Robert Jenssen
Lars Kai Hansen
Kristoffer Wickstrøm
78
3
0
31 Mar 2025
CBIL: Collective Behavior Imitation Learning for Fish from Real Videos
Yifan Wu
Zhiyang Dou
Yuko Ishiwaka
Shun Ogawa
Yuke Lou
Wenping Wang
Lingjie Liu
Taku Komura
208
3
0
31 Mar 2025
Beyond Academic Benchmarks: Critical Analysis and Best Practices for Visual Industrial Anomaly Detection
Aimira Baitieva
Yacine Bouaouni
Alexandre Briot
Dick Ameln
Souhaiel Khalfaoui
S. Akçay
90
0
0
30 Mar 2025
Can Visuo-motor Policies Benefit from Random Exploration Data? A Case Study on Stacking
Shutong Jin
Axel Kaliff
Ruiyu Wang
Muhammad Zahid
Florian T. Pokorny
VGen
59
0
0
30 Mar 2025
ViT-Linearizer: Distilling Quadratic Knowledge into Linear-Time Vision Models
Guoyizhe Wei
Rama Chellappa
101
2
0
30 Mar 2025
Efficient Token Compression for Vision Transformer with Spatial Information Preserved
Junzhu Mao
Yang Shen
Jinyang Guo
Yazhou Yao
Xiansheng Hua
ViT
141
0
0
30 Mar 2025
Multi-label classification for multi-temporal, multi-spatial coral reef condition monitoring using vision foundation model with adapter learning
Xinlei Shao
Hongruixuan Chen
Fan Zhao
Kirsty Magson
Jundong Chen
Peiran Li
Jingchao Wang
Jun Sasaki
125
0
0
29 Mar 2025
Function Fitting Based on Kolmogorov-Arnold Theorem and Kernel Functions
Jianpeng Liu
Qizhi Pan
54
0
0
29 Mar 2025
Previous
1
2
3
...
6
7
8
...
94
95
96
Next