Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.11432
Cited By
Florence: A New Foundation Model for Computer Vision
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Florence: A New Foundation Model for Computer Vision"
50 / 668 papers shown
Title
A survey on knowledge-enhanced multimodal learning
Maria Lymperaiou
Giorgos Stamou
176
15
0
19 Nov 2022
Task Residual for Tuning Vision-Language Models
Tao Yu
Zhihe Lu
Xin Jin
Zhibo Chen
Xinchao Wang
VLM
CLIP
111
92
0
18 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
89
58
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
81
26
0
17 Nov 2022
UniFormerV2: Spatiotemporal Learning by Arming Image ViTs with Video UniFormer
Kunchang Li
Yali Wang
Yinan He
Yizhuo Li
Yi Wang
Limin Wang
Yu Qiao
ViT
136
113
0
17 Nov 2022
Language-Assisted Deep Learning for Autistic Behaviors Recognition
Andong Deng
Taojiannan Yang
Chong Chen
Qian Chen
Leslie C. Neely
Sakiko Oyama
65
8
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
127
106
0
15 Nov 2022
Self-supervised remote sensing feature learning: Learning Paradigms, Challenges, and Future Works
Chao Tao
Ji Qi
Mingning Guo
Qing Zhu
Haifeng Li
SSL
104
59
0
15 Nov 2022
FedTune: A Deep Dive into Efficient Federated Fine-Tuning with Pre-trained Transformers
Jinyu Chen
Wenchao Xu
Song Guo
Junxiao Wang
Jie Zhang
Yining Qi
FedML
83
36
0
15 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
269
730
0
14 Nov 2022
ContextCLIP: Contextual Alignment of Image-Text pairs on CLIP visual representations
Chanda Grover
Indra Deep Mastan
Debayan Gupta
VLM
CLIP
61
4
0
14 Nov 2022
Steps towards prompt-based creation of virtual worlds
Jasmine Roberts
Andrzej Banburski-Fahey
J. Lanier
71
14
0
10 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
193
700
0
10 Nov 2022
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
MLLM
VLM
88
8
0
09 Nov 2022
Deep Learning based Computer Vision Methods for Complex Traffic Environments Perception: A Review
Talha Azfar
Jinlong Li
Hongkai Yu
R. Cheu
Yisheng Lv
Ruimin Ke
63
22
0
09 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
132
0
0
01 Nov 2022
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
Chaofan Ma
Yu-Hao Yang
Yanfeng Wang
Ya Zhang
Weidi Xie
VLM
86
48
0
27 Oct 2022
M
3
^3
3
ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
77
88
0
26 Oct 2022
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Q. Si
Yuanxin Liu
Zheng Lin
Peng Fu
Weiping Wang
VLM
120
1
0
26 Oct 2022
CPL: Counterfactual Prompt Learning for Vision and Language Models
Xuehai He
Diji Yang
Weixi Feng
Tsu-Jui Fu
Arjun Reddy Akula
Varun Jampani
P. Narayana
Sugato Basu
William Yang Wang
Xinze Wang
VPVLM
VLM
100
15
0
19 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP
VLM
MedIm
138
436
0
18 Oct 2022
FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks
S. Rizvi
Nazreen Pallikkavaliyaveetil
David Zhang
Zhuoyang Lyu
Nhi Nguyen
...
Amin Karbasi
Rex Ying
Maria Brbić
Rahul M. Dhodapkar
David van Dijk
GNN
AI4CE
23
1
0
17 Oct 2022
Visual Classification via Description from Large Language Models
Sachit Menon
Carl Vondrick
VLM
113
303
0
13 Oct 2022
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Omiros Pantazis
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
VLM
84
42
0
07 Oct 2022
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
VPVLM
VLM
330
574
0
06 Oct 2022
CLIP model is an Efficient Continual Learner
Vishal Thengane
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
BDL
VLM
CLL
175
51
0
06 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
133
356
0
06 Oct 2022
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu
Jianlong Chang
Lin Liu
Qi Tian
Changan Chen
VPVLM
VLM
115
36
0
03 Oct 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
VLM
81
19
0
30 Sep 2022
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
251
118
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
63
4
0
29 Sep 2022
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training
Rogerio Bonatti
Sai H. Vemprala
Shuang Ma
Felipe Vieira Frujeri
Shuhang Chen
Ashish Kapoor
94
23
0
22 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
152
153
0
15 Sep 2022
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Nanning Zheng
114
8
0
14 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
256
742
0
14 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
137
71
0
14 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
193
29
0
12 Sep 2022
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Felix Vogel
Nina Shvetsova
Leonid Karlinsky
Hilde Kuehne
VLM
113
7
0
12 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
90
20
0
08 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
70
2
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Zhun Sun
99
1
0
05 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Tengjiao Wang
Ming-Hsuan Yang
DiffM
MedIm
572
1,428
0
02 Sep 2022
Injecting Image Details into CLIP's Feature Space
Zilun Zhang
Cuifeng Shen
Yuan-Chung Shen
Huixin Xiong
Xinyu Zhou
VLM
CLIP
79
0
0
31 Aug 2022
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
100
27
0
29 Aug 2022
PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead of Models -- Federated Learning in Age of Foundation Model
Tao Guo
Song Guo
Junxiao Wang
Wenchao Xu
FedML
VLM
LRM
71
128
0
24 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
175
647
0
22 Aug 2022
Deep is a Luxury We Don't Have
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT
MedIm
66
2
0
11 Aug 2022
Language-Guided Face Animation by Recurrent StyleGAN-based Generator
Tiankai Hang
Huan Yang
Bei Liu
Jianlong Fu
Xin Geng
B. Guo
VGen
108
13
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
190
108
0
10 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
90
257
0
08 Aug 2022
Previous
1
2
3
...
11
12
13
14
Next