2303.06911
Cited By
ViM: Vision Middleware for Unified Downstream Transferring
13 March 2023
Yutong Feng
Biao Gong
Jianwen Jiang
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
Papers citing "ViM: Vision Middleware for Unified Downstream Transferring"
27 / 27 papers shown
Title
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
103
636
0
22 Aug 2022
PEVL: Position-enhanced Pre-training and Prompt Tuning for Vision-language Models
Yuan Yao
Qi-An Chen
Ao Zhang
Wei Ji
Zhiyuan Liu
Tat-Seng Chua
Maosong Sun
VLM
MLLM
41
38
0
23 May 2022
Vision Transformer Adapter for Dense Predictions
Zhe Chen
Yuchen Duan
Wenhai Wang
Junjun He
Tong Lu
Jifeng Dai
Yu Qiao
50
552
0
17 May 2022
Visual Prompt Tuning
Menglin Jia
Luming Tang
Bor-Chun Chen
Claire Cardie
Serge Belongie
Bharath Hariharan
Ser-Nam Lim
VLM
VPVLM
66
1,576
0
23 Mar 2022
X-Learner: Learning Cross Sources and Tasks for Universal Visual Representation
Yinan He
Gengshi Huang
Siyu Chen
Jianing Teng
Wang Kun
Zhen-fei Yin
Lu Sheng
Ziwei Liu
Yu Qiao
Jing Shao
VLM
SSL
ViT
69
7
0
16 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
107
859
0
07 Feb 2022
INTERN: A New Learning Paradigm Towards General Vision
Jing Shao
Siyu Chen
Yangguang Li
Kun Wang
Zhen-fei Yin
...
F. Yu
Junjie Yan
Dahua Lin
Xiaogang Wang
Yu Qiao
51
34
0
16 Nov 2021
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM
CLIP
156
1,011
0
09 Oct 2021
Rethinking Why Intermediate-Task Fine-Tuning Works
Ting-Yun Chang
Chi-Jen Lu
LRM
32
29
0
26 Aug 2021
Multi-Task Self-Training for Learning General Representations
Golnaz Ghiasi
Barret Zoph
E. D. Cubuk
Quoc V. Le
Nayeon Lee
SSL
39
100
0
25 Aug 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
122
2,785
0
15 Jun 2021
ImageNet-21K Pretraining for the Masses
T. Ridnik
Emanuel Ben-Baruch
Asaf Noy
Lihi Zelnik-Manor
SSeg
VLM
CLIP
265
692
0
22 Apr 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
423
3,952
0
18 Apr 2021
GPT Understands, Too
Xiao Liu
Yanan Zheng
Zhengxiao Du
Ming Ding
Yujie Qian
Zhilin Yang
Jie Tang
VLM
122
1,161
0
18 Mar 2021
TJU-DHD: A Diverse High-Resolution Dataset for Object Detection
Yanwei Pang
Jiale Cao
Yazhao Li
Jin Xie
Hanqing Sun
Jinfeng Gong
ObjD
48
63
0
18 Nov 2020
Google Landmarks Dataset v2 -- A Large-Scale Benchmark for Instance-Level Recognition and Retrieval
Tobias Weyand
A. Araújo
Bingyi Cao
Jack Sim
50
367
0
03 Apr 2020
Improved Baselines with Momentum Contrastive Learning
Xinlei Chen
Haoqi Fan
Ross B. Girshick
Kaiming He
SSL
405
3,397
0
09 Mar 2020
Gradient Surgery for Multi-Task Learning
Tianhe Yu
Saurabh Kumar
Abhishek Gupta
Sergey Levine
Karol Hausman
Chelsea Finn
82
1,190
0
19 Jan 2020
A Large-scale Study of Representation Learning with the Visual Task Adaptation Benchmark
Xiaohua Zhai
J. Puigcerver
Alexander Kolesnikov
P. Ruyssen
C. Riquelme
...
Michael Tschannen
Marcin Michalski
Olivier Bousquet
Sylvain Gelly
N. Houlsby
SSL
49
432
0
01 Oct 2019
LVIS: A Dataset for Large Vocabulary Instance Segmentation
Agrim Gupta
Piotr Dollár
Ross B. Girshick
ISeg
VLM
68
1,352
0
08 Aug 2019
The iMaterialist Fashion Attribute Dataset
Sheng Guo
Weilin Huang
Xiao Zhang
Prasanna Srikhanta
Huayu Chen
Yuan Li
Matthew R. Scott
Hartwig Adam
Serge J. Belongie
VLM
20
77
0
13 Jun 2019
Generalizing from a Few Examples: A Survey on Few-Shot Learning
Yaqing Wang
Quanming Yao
James T. Kwok
L. Ni
66
1,802
0
10 Apr 2019
BigEarthNet: A Large-Scale Benchmark Archive For Remote Sensing Image Understanding
Gencer Sumbul
Marcela Charfuelan
Begüm Demir
Volker Markl
56
447
0
16 Feb 2019
Panoptic Feature Pyramid Networks
Alexander Kirillov
Ross B. Girshick
Kaiming He
Piotr Dollár
ISeg
SSeg
80
1,278
0
08 Jan 2019
From Recognition to Cognition: Visual Commonsense Reasoning
Rowan Zellers
Yonatan Bisk
Ali Farhadi
Yejin Choi
LRM
BDL
OCL
ReLM
121
873
0
27 Nov 2018
Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering
Yash Goyal
Tejas Khot
D. Summers-Stay
Dhruv Batra
Devi Parikh
CoGe
279
3,187
0
02 Dec 2016
The Cityscapes Dataset for Semantic Urban Scene Understanding
Marius Cordts
Mohamed Omran
Sebastian Ramos
Timo Rehfeld
Markus Enzweiler
Rodrigo Benenson
Uwe Franke
Stefan Roth
Bernt Schiele
505
11,540
0
06 Apr 2016