Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2211.07636
Cited By
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
14 November 2022
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
Re-assign community
ArXiv
PDF
HTML
Papers citing
"EVA: Exploring the Limits of Masked Visual Representation Learning at Scale"
50 / 507 papers shown
Title
What Makes for Good Visual Tokenizers for Large Language Models?
Guangzhi Wang
Yixiao Ge
Xiaohan Ding
Mohan S. Kankanhalli
Ying Shan
MLLM
VLM
25
38
0
20 May 2023
VisionLLM: Large Language Model is also an Open-Ended Decoder for Vision-Centric Tasks
Wen Wang
Zhe Chen
Xiaokang Chen
Jiannan Wu
Xizhou Zhu
...
Ping Luo
Tong Lu
Jie Zhou
Yu Qiao
Jifeng Dai
MLLM
VLM
35
457
0
18 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
42
114
0
18 May 2023
MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts
Qiuhui Chen
Xinyue Hu
Zirui Wang
Yi Hong
LM&MA
MedIm
14
34
0
18 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
29
31
0
15 May 2023
Self-Chained Image-Language Model for Video Localization and Question Answering
Shoubin Yu
Jaemin Cho
Prateek Yadav
Joey Tianyi Zhou
45
129
0
11 May 2023
InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
Wenliang Dai
Junnan Li
Dongxu Li
A. M. H. Tiong
Junqi Zhao
Weisheng Wang
Boyang Albert Li
Pascale Fung
Steven C. H. Hoi
MLLM
VLM
19
1,908
0
11 May 2023
Segment Anything is A Good Pseudo-label Generator for Weakly Supervised Semantic Segmentation
Peng-Tao Jiang
Yuqi Yang
VLM
28
29
0
02 May 2023
A Strong and Reproducible Object Detector with Only Public Datasets
Tianhe Ren
Jianwei Yang
Siyi Liu
Ailing Zeng
Feng Li
Hao Zhang
Hongyang Li
Zhaoyang Zeng
Lei Zhang
ObjD
33
11
0
25 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
44
273
0
24 Apr 2023
SkinGPT-4: An Interactive Dermatology Diagnostic System with Visual Large Language Model
Juexiao Zhou
Xiao-Zhen He
Liyuan Sun
Jiannan Xu
Xiuying Chen
Yuetan Chu
Longxi Zhou
Xingyu Liao
Bin Zhang
Xin Gao
LM&MA
35
24
0
21 Apr 2023
MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models
Deyao Zhu
Jun Chen
Xiaoqian Shen
Xiang Li
Mohamed Elhoseiny
VLM
MLLM
41
1,899
0
20 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
110
3,041
0
14 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
32
2
0
10 Apr 2023
ViT-Calibrator: Decision Stream Calibration for Vision Transformer
Lin Chen
Zhijie Jia
Tian Qiu
Lechao Cheng
Jie Lei
Zunlei Feng
Min-Gyoo Song
24
1
0
10 Apr 2023
Video ChatCaptioner: Towards Enriched Spatiotemporal Descriptions
Jun Chen
Deyao Zhu
Kilichbek Haydarov
Xiang Li
Mohamed Elhoseiny
23
37
0
09 Apr 2023
V3Det: Vast Vocabulary Visual Detection Dataset
Jiaqi Wang
Pan Zhang
Tao Chu
Yuhang Cao
Yujie Zhou
Tong Wu
Bin Wang
Conghui He
Dahua Lin
VLM
ObjD
29
52
0
07 Apr 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Ouguz
Yasher Mehdad
Joey Tianyi Zhou
3DV
20
51
0
29 Mar 2023
EVA-CLIP: Improved Training Techniques for CLIP at Scale
Quan-Sen Sun
Yuxin Fang
Ledell Yu Wu
Xinlong Wang
Yue Cao
CLIP
VLM
52
468
0
27 Mar 2023
Exploring the Benefits of Visual Prompting in Differential Privacy
Yizhe Li
Yu-Lin Tsai
Xuebin Ren
Chia-Mu Yu
Pin-Yu Chen
AAML
VPVLM
21
18
0
22 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
40
259
0
20 Mar 2023
ChatGPT Asks, BLIP-2 Answers: Automatic Questioning Towards Enriched Visual Descriptions
Deyao Zhu
Jun Chen
Kilichbek Haydarov
Xiaoqian Shen
Wenxuan Zhang
Mohamed Elhoseiny
MLLM
29
96
0
12 Mar 2023
A Categorical Framework of General Intelligence
Yang Yuan
28
1
0
08 Mar 2023
DejaVu: Conditional Regenerative Learning to Enhance Dense Prediction
Shubhankar Borse
Debasmit Das
Hyojin Park
H. Cai
Risheek Garrepalli
Fatih Porikli
40
9
0
02 Mar 2023
Offsite-Tuning: Transfer Learning without Full Model
Guangxuan Xiao
Ji Lin
Song Han
35
67
0
09 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
40
160
0
01 Feb 2023
What Makes Good Examples for Visual In-Context Learning?
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
MLLM
VPVLM
VLM
LRM
24
107
0
31 Jan 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
270
4,244
0
30 Jan 2023
Aerial Image Object Detection With Vision Transformer Detector (ViTDet)
Liya Wang
A. Tien
44
7
0
28 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
TikTalk: A Video-Based Dialogue Dataset for Multi-Modal Chitchat in Real World
Hongpeng Lin
Ludan Ruan
Wenke Xia
Peiyu Liu
Jing Wen
...
Di Hu
Ruihua Song
Wayne Xin Zhao
Qin Jin
Zhiwu Lu
VGen
33
9
0
14 Jan 2023
A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Jie Gui
Tuo Chen
Jing Zhang
Qiong Cao
Zhe Sun
Haoran Luo
Dacheng Tao
31
124
0
13 Jan 2023
CARD: Semantic Segmentation with Efficient Class-Aware Regularized Decoder
Ye Huang
Di Kang
Liang Chen
W. Jia
Xiangjian He
Lixin Duan
Xuefei Zhe
Linchao Bao
34
2
0
11 Jan 2023
Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation
Yue Han
Jiangning Zhang
Zhucun Xue
Chao Xu
Xintian Shen
Yabiao Wang
Chengjie Wang
Yong Liu
Xiangtai Li
34
17
0
03 Jan 2023
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM
CLIP
59
739
0
14 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
22
35
0
12 Dec 2022
Spurious Features Everywhere -- Large-Scale Detection of Harmful Spurious Features in ImageNet
Yannic Neuhaus
Maximilian Augustin
Valentyn Boreiko
Matthias Hein
AAML
36
30
0
09 Dec 2022
Direct-Effect Risk Minimization for Domain Generalization
Yuhui Li
Zejia Wu
Chao Zhang
Hongyang R. Zhang
OOD
37
0
0
26 Nov 2022
Fast-iTPN: Integrally Pre-Trained Transformer Pyramid Network with Token Migration
Yunjie Tian
Lingxi Xie
Jihao Qiu
Jianbin Jiao
Yaowei Wang
Qi Tian
Qixiang Ye
ViT
36
6
0
23 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
14
24
0
17 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
36
657
0
10 Nov 2022
Polite Teacher: Semi-Supervised Instance Segmentation with Mutual Learning and Pseudo-Label Thresholding
Dominik Filipiak
Andrzej Zapala
Piotr Tempczyk
A. Fensel
Marek Cygan
ISeg
13
10
0
07 Nov 2022
Neural Eigenfunctions Are Structured Representation Learners
Zhijie Deng
Jiaxin Shi
Hao Zhang
Peng Cui
Cewu Lu
Jun Zhu
50
14
0
23 Oct 2022
Exploring Target Representations for Masked Autoencoders
Xingbin Liu
Jinghao Zhou
Tao Kong
Xianming Lin
Rongrong Ji
100
50
0
08 Sep 2022
Pathway to Future Symbiotic Creativity
Yi-Ting Guo
Qi-fei Liu
Jie Chen
Wei Xue
Jie Fu
...
Fernando Rosas
Jeffrey Shaw
Xing Wu
Jiji Zhang
Jianliang Xu
26
0
0
18 Aug 2022
Group DETR: Fast DETR Training with Group-Wise One-to-Many Assignment
Qiang Chen
Xiaokang Chen
Jian Wang
Shan Zhang
Kun Yao
Haocheng Feng
Junyu Han
Errui Ding
Gang Zeng
Jingdong Wang
ViT
46
120
0
26 Jul 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
88
124
0
27 May 2022
Revealing the Dark Secrets of Masked Image Modeling
Zhenda Xie
Zigang Geng
Jingcheng Hu
Zheng-Wei Zhang
Han Hu
Yue Cao
VLM
191
139
0
26 May 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,785
0
29 Apr 2021
Previous
1
2
3
...
10
11
9
Next