Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.11432
Cited By
Florence: A New Foundation Model for Computer Vision
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Florence: A New Foundation Model for Computer Vision"
50 / 664 papers shown
Title
Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy
Leo Fillioux
E. Gontran
J. Cartry
J. Mathieu
Sabrina Bedja
A. Boilève
P. Cournède
F. Jaulin
Stergios Christodoulidis
Maria Vakalopoulou
22
6
0
28 Aug 2023
End-to-end Autonomous Driving using Deep Learning: A Systematic Review
Apoorv Singh
43
9
0
27 Aug 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
36
20
0
27 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM
VLM
ObjD
50
811
0
24 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
35
48
0
23 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM
VLM
MoE
60
9
0
23 Aug 2023
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
Qidong Huang
Xiaoyi Dong
Dongdong Chen
Yinpeng Chen
Lu Yuan
Gang Hua
Weiming Zhang
Neng H. Yu
AAML
30
9
0
20 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
33
1
0
20 Aug 2023
Sensitivity analysis of AI-based algorithms for autonomous driving on optical wavefront aberrations induced by the windshield
D. Wolf
Markus Ulrich
Nikhil Kapoor
37
3
0
19 Aug 2023
DPL: Decoupled Prompt Learning for Vision-Language Models
C. Xu
Yuhan Zhu
Guozhen Zhang
Haocheng Shen
Yixuan Liao
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
29
4
0
19 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
32
20
0
15 Aug 2023
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
39
3
0
13 Aug 2023
FoodSAM: Any Food Segmentation
Xing Lan
Jiayi Lyu
Han Jiang
Kunkun Dong
Zehai Niu
Yi Zhang
Jian Xue
VLM
31
25
0
11 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
S. Chaudhuri
Saumik Bhattacharya
27
3
0
07 Aug 2023
Learning Concise and Descriptive Attributes for Visual Recognition
Andy Yan
Yu-Xiang Wang
Yiwu Zhong
Chengyu Dong
Zexue He
Yujie Lu
William Wang
Jingbo Shang
Julian McAuley
VLM
27
60
0
07 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM
CLIP
45
136
0
04 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Jiazheng Xing
Mengmeng Wang
Xiaojun Hou
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
VLM
22
0
0
03 Aug 2023
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding
Runyu Ding
Jihan Yang
Chuhui Xue
Wenqing Zhang
Song Bai
Xiaojuan Qi
3DV
VLM
29
28
0
01 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
61
42
0
30 Jul 2023
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Aayush Dhakal
Adeel Ahmad
Subash Khanal
Srikumar Sastry
Hannah Kerner
Nathan Jacobs
33
13
0
29 Jul 2023
Cross-Modal Concept Learning and Inference for Vision-Language Models
Yi Zhang
Ce Zhang
Yushun Tang
Z. He
VLM
MLLM
CLIP
39
15
0
28 Jul 2023
TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
Moon Ye-Bin
Jisoo Kim
Hong-Kyu Kim
Kilho Son
Tae-Hyun Oh
34
9
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
119
0
25 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
33
137
0
20 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
40
8
0
18 Jul 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
Muhammad Uzair Khattak
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming Yang
Fahad Shahbaz Khan
VLM
23
168
0
13 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
28
5
0
06 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Zhuowen Tu
Haoran Su
VLM
31
25
0
06 Jul 2023
Multi-Similarity Contrastive Learning
Emily Mu
John Guttag
Maggie Makar
SSL
36
2
0
06 Jul 2023
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
Z. Chen
Mingyu Ding
Songlin Yang
Wei Zhan
Masayoshi Tomizuka
Erik Learned-Miller
Chuang Gan
MoE
32
8
0
29 Jun 2023
Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation
Jiaxing Huang
Jingyi Zhang
Han Qiu
Sheng Jin
Shijian Lu
VPVLM
VLM
24
0
0
29 Jun 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM
CoGe
33
8
0
28 Jun 2023
Understanding Prompt Tuning for V-L Models Through the Lens of Neural Collapse
Didi Zhu
Zexi Li
Min Zhang
Junkun Yuan
Yunfeng Shao
Jiashuo Liu
Kun Kuang
Yinchuan Li
Chao Wu
VLM
31
1
0
28 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a \
10,000 Budget; An Extra \
4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP
VLM
56
19
0
27 Jun 2023
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation
Benedikt Blumenstiel
Johannes Jakubik
Hilde Kuhne
Michael Vossing
VLM
32
15
0
27 Jun 2023
Exploring Data Redundancy in Real-world Image Classification through Data Selection
Zhenyu Tang
Shaoting Zhang
Xiaosong Wang
30
2
0
25 Jun 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Ayca Takmaz
Elisabetta Fedele
R. Sumner
Marc Pollefeys
F. Tombari
Francis Engelmann
ISeg
VLM
36
164
0
23 Jun 2023
Robustness of Segment Anything Model (SAM) for Autonomous Driving in Adverse Weather Conditions
Xinru Shan
Chaoning Zhang
VLM
33
12
0
23 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA
AuLLM
VLM
48
264
0
22 Jun 2023
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Jianwei Yin
DiffM
VLM
32
54
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
F. Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
71
193
0
19 Jun 2023
A Universal Semantic-Geometric Representation for Robotic Manipulation
Tong Zhang
Yingdong Hu
Hanchen Cui
Hang Zhao
Yang Gao
84
18
0
18 Jun 2023
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang
Yifang Chen
Gregory H. Canal
Stephen Mussmann
Arnav M. Das
...
Yinglun Zhu
Jeffrey Bilmes
S. Du
Kevin G. Jamieson
Robert D. Nowak
VLM
33
10
0
16 Jun 2023
Robustness Analysis on Foundational Segmentation Models
Madeline Chantry Schiappa
Shehreen Azad
V. Sachidanand
Yunhao Ge
O. Mikšík
Yogesh S Rawat
Vibhav Vineet
OOD
VLM
AAML
30
6
0
15 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
42
37
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM
CLIP
30
8
0
15 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
30
10
0
14 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
26
7
0
13 Jun 2023
Robustness of SAM: Segment Anything Under Corruptions and Beyond
Yu Qiao
Chaoning Zhang
Taegoo Kang
Donghun Kim
Chenshuang Zhang
Choong Seon Hong
AAML
25
33
0
13 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
24
174
0
11 Jun 2023
Previous
1
2
3
...
6
7
8
...
12
13
14
Next