2111.06377
Cited By
v3 (latest)
Masked Autoencoders Are Scalable Vision Learners
11 November 2021
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
Papers citing "Masked Autoencoders Are Scalable Vision Learners"
50 / 4,777 papers shown
Title
RandAR: Decoder-only Autoregressive Visual Generation in Random Orders
Ziqi Pang
Tianyuan Zhang
Fujun Luan
Yunze Man
Hao Tan
Kai Zhang
William T. Freeman
Yu-Xiong Wang
VGen
135
20
0
02 Dec 2024
Gen-SIS: Generative Self-augmentation Improves Self-supervised Learning
Varun Belagali
Srikar Yellapragada
Alexandros Graikos
S. Kapse
Zilinghan Li
Tarak Nandi
Ravi K. Madduri
Prateek Prasanna
Joel H. Saltz
Dimitris Samaras
DiffM
144
2
0
02 Dec 2024
COSMOS: Cross-Modality Self-Distillation for Vision Language Pre-training
Sanghwan Kim
Rui Xiao
Mariana-Iuliana Georgescu
Stephan Alaniz
Zeynep Akata
VLM
346
3
0
02 Dec 2024
OmniGuard: Hybrid Manipulation Localization via Augmented Versatile Deep Image Watermarking
Xinyu Zhang
Zecheng Tang
Zhipei Xu
Runyi Li
Youmin Xu
Bin Chen
Feng Gao
Jian Zhang
WIGM
191
5
0
02 Dec 2024
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner
C. Lippert
Aravindh Mahendran
ViT
VLM
104
1
0
01 Dec 2024
FiffDepth: Feed-forward Transformation of Diffusion-Based Generators for Detailed Depth Estimation
Yunpeng Bai
Qixing Huang
DiffM
172
0
0
01 Dec 2024
Rethinking Generalizability and Discriminability of Self-Supervised Learning from Evolutionary Game Theory Perspective
Jiangmeng Li
Zehua Zang
Qirui Ji
Chuxiong Sun
Jingyao Wang
Junge Zhang
Changwen Zheng
Gang Hua
Hui Xiong
SSL
146
0
0
30 Nov 2024
Deepfake Media Generation and Detection in the Generative AI Era: A Survey and Outlook
Florinel-Alin Croitoru
Andrei Iulian Hiji
Vlad Hondru
Nicolae-Cătălin Ristea
Paul Irofti
Marius Popescu
Cristian Rusu
Radu Tudor Ionescu
Fahad Shahbaz Khan
Mubarak Shah
135
5
0
29 Nov 2024
Curriculum Fine-tuning of Vision Foundation Model for Medical Image Classification Under Label Noise
Yeonguk Yu
Minhwan Ko
Sungho Shin
Kangmin Kim
K. Lee
NoLa
120
2
0
29 Nov 2024
Effective Fine-Tuning of Vision-Language Models for Accurate Galaxy Morphology Analysis
Ruoqi Wang
Haitao Wang
Qiong Luo
156
1
0
29 Nov 2024
Pretrained Reversible Generation as Unsupervised Visual Representation Learning
Rongkun Xue
Jinouwen Zhang
Yazhe Niu
Dazhong Shen
Bingqi Ma
Yu Liu
Jing Yang
188
0
0
29 Nov 2024
Enhancing Parameter-Efficient Fine-Tuning of Vision Transformers through Frequency-Based Adaptation
S. Ly
Hien Nguyen
126
1
0
28 Nov 2024
PP-SSL : Priority-Perception Self-Supervised Learning for Fine-Grained Recognition
ShuaiHeng Li
Qing Cai
Fan Zhang
Hao Fei
Yangyang Shu
Ziqiang Liu
Haoyang Li
Lingqiao Liu
127
0
0
28 Nov 2024
TAMT: Temporal-Aware Model Tuning for Cross-Domain Few-Shot Action Recognition
Yilong Wang
Zilin Gao
Qilong Wang
Zhaofeng Chen
P. Li
Q. Hu
182
1
0
28 Nov 2024
Reconstructing Animals and the Wild
Peter Kulits
Michael J. Black
Silvia Zuffi
85
0
0
27 Nov 2024
Lift3D Foundation Policy: Lifting 2D Large-Scale Pretrained Models for Robust 3D Robotic Manipulation
Yueru Jia
Jiaming Liu
Sixiang Chen
Chenyang Gu
Zihan Wang
...
Lily Lee
Pengwei Wang
Zhongyuan Wang
Renrui Zhang
Shanghang Zhang
170
19
0
27 Nov 2024
Point Cloud Unsupervised Pre-training via 3D Gaussian Splatting
Hao Liu
Minglin Chen
Yanni Ma
Haihong Xiao
Ying He
3DGS
3DPC
127
1
0
27 Nov 2024
RS-vHeat: Heat Conduction Guided Efficient Remote Sensing Foundation Model
Huiyang Hu
Peijin Wang
Hanbo Bi
Boyuan Tong
Zehua Wang
...
Ziqi Zhang
Yaowei Wang
QiXiang Ye
Kun Fu
Xian Sun
297
0
0
27 Nov 2024
SelfSplat: Pose-Free and 3D Prior-Free Generalizable 3D Gaussian Splatting
Gyeongjin Kang
Jisang Yoo
Jihyeon Park
Seungtae Nam
Hyeonsoo Im
Sangheon Shin
Sangpil Kim
Eunbyung Park
3DGS
332
6
0
26 Nov 2024
RealTraj: Towards Real-World Pedestrian Trajectory Forecasting
Ryo Fujii
Hideo Saito
Ryo Hachiuma
AI4TS
188
2
0
26 Nov 2024
Probing the Mid-level Vision Capabilities of Self-Supervised Learning
Xuweiyi Chen
Markus Marks
Zezhou Cheng
173
0
0
25 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
121
3
0
25 Nov 2024
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski
Weitong Zhang
Sarah Cechnicka
Hadrien Reynaud
Bernhard Kainz
128
1
0
25 Nov 2024
Cautious Optimizers: Improving Training with One Line of Code
Kaizhao Liang
Lizhang Chen
B. Liu
Qiang Liu
ODL
261
9
0
25 Nov 2024
Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training
Man Yao
Xuerui Qiu
Tianxiang Hu
J. Hu
Yuhong Chou
Keyu Tian
Jianxing Liao
Luziwei Leng
Bo Xu
Guoqi Li
149
16
0
25 Nov 2024
DeDe: Detecting Backdoor Samples for SSL Encoders via Decoders
Sizai Hou
Songze Li
Duanyi Yao
AAML
192
0
0
25 Nov 2024
SVTRv2: CTC Beats Encoder-Decoder Models in Scene Text Recognition
Yongkun Du
Z. Chen
Hongtao Xie
Caiyan Jia
Yu-Gang Jiang
164
1
0
24 Nov 2024
Multi-Token Enhancing for Vision Representation Learning
Zhong-Yu Li
Yu-Song Hu
Bo Yin
Ming-Ming Cheng
174
1
0
24 Nov 2024
PR-MIM: Delving Deeper into Partial Reconstruction in Masked Image Modeling
Zhong-Yu Li
Yunheng Li
Deng-Ping Fan
Ming-Ming Cheng
175
0
0
24 Nov 2024
TransFair: Transferring Fairness from Ocular Disease Classification to Progression Prediction
Leila Gheisi
Henry Chu
Raju Gottumukkala
Yan Luo
Xingquan Zhu
Mengyu Wang
Min Shi
MedIm
150
0
0
24 Nov 2024
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning
Luis Vilaca
Yi Yu
Paula Vinan
186
0
0
24 Nov 2024
SPA: Efficient User-Preference Alignment against Uncertainty in Medical Image Segmentation
Jiayuan Zhu
Junde Wu
Cheng Ouyang
Konstantinos Kamnitsas
Alison Noble
108
0
0
23 Nov 2024
Improving Factuality of 3D Brain MRI Report Generation with Paired Image-domain Retrieval and Text-domain Augmentation
J. Lee
Y. Oh
Dahyoun Lee
Hyon Keun Joh
Chul-Ho Sohn
...
Cheol Kyu Jung
Jung Hyun Park
Kyu Sung Choi
Byung-Hoon Kim
Jong Chul Ye
DiffM
MedIm
129
1
0
23 Nov 2024
Enhancing Instruction-Following Capability of Visual-Language Models by Reducing Image Redundancy
Te Yang
Jian Jia
Xiangyu Zhu
Weisong Zhao
Bo Wang
...
Shengyuan Liu
Quan Chen
Peng Jiang
Kun Gai
Zhen Lei
81
1
0
23 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
118
1
0
22 Nov 2024
Optimized Vessel Segmentation: A Structure-Agnostic Approach with Small Vessel Enhancement and Morphological Correction
Dongning Song
Weijian Huang
Jiarun Liu
Md Jahidul Islam
Hao Yang
Shanshan Wang
130
0
0
22 Nov 2024
Aim My Robot: Precision Local Navigation to Any Object
Xiangyun Meng
Xuning Yang
Sanghun Jung
F. Ramos
Srid Sadhan Jujjavarapu
Sanjoy Paul
Dieter Fox
134
1
0
22 Nov 2024
Segment Anything in Light Fields for Real-Time Applications via Constrained Prompting
Nikolai Goncharov
Donald G. Dansereau
VLM
110
1
0
21 Nov 2024
Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals
Hussni Mohd Zakir
Eric Tatt Wei Ho
VLM
112
0
0
21 Nov 2024
NexusSplats: Efficient 3D Gaussian Splatting in the Wild
Yuzhou Tang
Dejun Xu
Yongjie Hou
Zhenzhong Wang
Min Jiang
3DGS
205
2
0
21 Nov 2024
Tra-MoE: Learning Trajectory Prediction Model from Multiple Domains for Adaptive Policy Conditioning
Jiange Yang
Haoyi Zhu
Yanjie Wang
Gangshan Wu
Tong He
Limin Wang
202
3
0
21 Nov 2024
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming-Hsuan Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
118
1
0
20 Nov 2024
Generating 3D-Consistent Videos from Unposed Internet Photos
Gene Chou
Kai Zhang
Sai Bi
Hao Tan
Zexiang Xu
Fujun Luan
Bharath Hariharan
Noah Snavely
3DGS
VGen
164
3
0
20 Nov 2024
Uni-Mlip: Unified Self-supervision for Medical Vision Language Pre-training
Ameera Bawazir
Kebin Wu
Wenbin Li
CLIP
106
1
0
20 Nov 2024
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation
Christoph Reinders
Radu Berdan
Beril Besbinar
Junji Otsuka
Daisuke Iso
121
2
0
20 Nov 2024
Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images
Xuechao Zou
Shun Zhang
Kai Li
Shiying Wang
Junliang Xing
Lei Jin
Congyan Lang
Pin Tao
110
1
0
20 Nov 2024
Efficient Masked AutoEncoder for Video Object Counting and A Large-Scale Benchmark
Bing Cao
Quanhao Lu
Jiekang Feng
Pengfei Zhu
Q. Hu
Qilong Wang
137
1
0
20 Nov 2024
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Ruyi Ding
Tong Zhou
Lili Su
A. A. Ding
Xiaolin Xu
Yunsi Fei
AAML
152
2
0
19 Nov 2024
GaussianPretrain: A Simple Unified 3D Gaussian Representation for Visual Pre-training in Autonomous Driving
Shaoqing Xu
Fang Li
Shengyin Jiang
Ziying Song
Li Liu
Zhi-xin Yang
3DGS
SSL
128
2
0
19 Nov 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
138
0
0
19 Nov 2024