Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 668 papers shown
Video Task Decathlon: Unifying Image and Video Tasks in Autonomous Driving
Thomas E. Huang
Yifan Liu
Luc Van Gool
Fisher Yu
120
5
0
08 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
105
125
0
07 Sep 2023
InstructDiffusion: A Generalist Modeling Interface for Vision Tasks
Zigang Geng
Binxin Yang
Tiankai Hang
Chen Li
Shuyang Gu
...
Jianmin Bao
Zheng Zhang
Han Hu
DongDong Chen
Baining Guo
DiffM, VLM
131
107
0
07 Sep 2023
Distribution-Aware Prompt Tuning for Vision-Language Models
Eulrang Cho
Jooyeon Kim
Hyunwoo J. Kim
VPVLM, VLM
59
23
0
06 Sep 2023
BDC-Adapter: Brownian Distance Covariance for Better Vision-Language Reasoning
Yi Zhang
Ce Zhang
Zihan Liao
Yushun Tang
Zhihai He
BDL, VLM
111
10
0
03 Sep 2023
Contrastive Feature Masking Open-Vocabulary Vision Transformer
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD, VLM
125
27
0
02 Sep 2023
Learning Speech Representation From Contrastive Token-Acoustic Pretraining
Chunyu Qiang
Hao Li
Yixin Tian
Ruibo Fu
Tao Wang
Longbiao Wang
Jianwu Dang
130
5
0
01 Sep 2023
AttrSeg: Open-Vocabulary Semantic Segmentation via Attribute Decomposition-Aggregation
Chaofan Ma
Yu-Hao Yang
Chen Ju
Fei Zhang
Ya Zhang
Yanfeng Wang
VLM
141
19
0
31 Aug 2023
ViLTA: Enhancing Vision-Language Pre-training through Textual Augmentation
Weihan Wang
Zhiyong Yang
Bin Xu
Juanzi Li
Yankui Sun
VLM
96
8
0
31 Aug 2023
A General-Purpose Self-Supervised Model for Computational Pathology
Richard J. Chen
Tong Ding
Ming Y. Lu
Drew F. K. Williamson
Guillaume Jaume
...
Judy J. Wang
Walt Williams
L. Le
Georg Gerber
Faisal Mahmood
MedIm
143
44
0
29 Aug 2023
CoVR: Learning Composed Video Retrieval from Web Video Captions
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
93
21
0
28 Aug 2023
Spatio-Temporal Analysis of Patient-Derived Organoid Videos Using Deep Learning for the Prediction of Drug Efficacy
Leo Fillioux
E. Gontran
J. Cartry
J. Mathieu
Sabrina Bedja
A. Boilève
P. Cournède
F. Jaulin
Stergios Christodoulidis
Maria Vakalopoulou
64
6
0
28 Aug 2023
End-to-end Autonomous Driving using Deep Learning: A Systematic Review
Apoorv Singh
95
9
0
27 Aug 2023
Computation-efficient Deep Learning for Computer Vision: A Survey
Yulin Wang
Yizeng Han
Chaofei Wang
Shiji Song
Qi Tian
Gao Huang
VLM
141
21
0
27 Aug 2023
Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond
Jinze Bai
Shuai Bai
Shusheng Yang
Shijie Wang
Sinan Tan
Peng Wang
Junyang Lin
Chang Zhou
Jingren Zhou
MLLM, VLM, ObjD
229
945
0
24 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM, VLM
120
56
0
23 Aug 2023
EVE: Efficient Vision-Language Pre-training with Masked Prediction and Modality-Aware MoE
Junyi Chen
Longteng Guo
Jianxiang Sun
Shuai Shao
Zehuan Yuan
Liang Lin
Dongyu Zhang
MLLM, VLM, MoE
85
10
0
23 Aug 2023
Improving Adversarial Robustness of Masked Autoencoders via Test-time Frequency-domain Prompting
Qidong Huang
Xiaoyi Dong
DongDong Chen
Yinpeng Chen
Lu Yuan
Gang Hua
Weiming Zhang
Neng H. Yu
AAML
113
9
0
20 Aug 2023
ViT-Lens: Initiating Omni-Modal Exploration through 3D Insights
Weixian Lei
Yixiao Ge
Jianfeng Zhang
Dylan Sun
Kun Yi
Ying Shan
Mike Zheng Shou
70
1
0
20 Aug 2023
Sensitivity analysis of AI-based algorithms for autonomous driving on optical wavefront aberrations induced by the windshield
D. Wolf
Markus Ulrich
Nikhil Kapoor
67
3
0
19 Aug 2023
DPL: Decoupled Prompt Learning for Vision-Language Models
C. Xu
Yuhan Zhu
Guozhen Zhang
Haocheng Shen
Yixuan Liao
Xiaoxin Chen
Gangshan Wu
Limin Wang
VLM
62
5
0
19 Aug 2023
Helping Hands: An Object-Aware Ego-Centric Video Recognition Model
Chuhan Zhang
Ankush Gupta
Andrew Zisserman
VLM
82
23
0
15 Aug 2023
Large Language Models and Foundation Models in Smart Agriculture: Basics, Opportunities, and Challenges
Jiajia Li
Mingle Xu
Lirong Xiang
Dong Chen
Weichao Zhuang
Xunyuan Yin
Zhao Li
134
3
0
13 Aug 2023
FoodSAM: Any Food Segmentation
Xing Lan
Jiayi Lyu
Han Jiang
Kunkun Dong
Zehai Niu
Yi Zhang
Jian Xue
VLM
98
28
0
11 Aug 2023
ViLP: Knowledge Exploration using Vision, Language, and Pose Embeddings for Video Action Recognition
S. Chaudhuri
Saumik Bhattacharya
80
3
0
07 Aug 2023
Learning Concise and Descriptive Attributes for Visual Recognition
Andy Yan
Yu Wang
Yiwu Zhong
Chengyu Dong
Zexue He
Yujie Lu
William Wang
Jingbo Shang
Julian McAuley
VLM
121
64
0
07 Aug 2023
Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP
Qihang Yu
Ju He
XueQing Deng
Xiaohui Shen
Liang-Chieh Chen
VLM, CLIP
103
152
0
04 Aug 2023
Multimodal Adaptation of CLIP for Few-Shot Action Recognition
Jiazheng Xing
Mengmeng Wang
Xiaojun Hou
Guangwen Dai
Jingdong Wang
Yong-Jin Liu
VLM
70
0
0
03 Aug 2023
Lowis3D: Language-Driven Open-World Instance-Level 3D Scene Understanding
Runyu Ding
Jihan Yang
Chuhui Xue
Wenqing Zhang
Song Bai
Xiaojuan Qi
3DV, VLM
84
29
0
01 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe, MLLM
128
46
0
30 Jul 2023
Sat2Cap: Mapping Fine-Grained Textual Descriptions from Satellite Images
Aayush Dhakal
Adeel Ahmad
Subash Khanal
Srikumar Sastry
Hannah Kerner
Nathan Jacobs
84
13
0
29 Jul 2023
Cross-Modal Concept Learning and Inference for Vision-Language Models
Yi Zhang
Ce Zhang
Yushun Tang
Z. He
VLM, MLLM, CLIP
86
16
0
28 Jul 2023
TextManiA: Enriching Visual Feature by Text-driven Manifold Augmentation
Moon Ye-Bin
Jisoo Kim
Hong-Kyu Kim
Kilho Son
Tae-Hyun Oh
86
9
0
27 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
150
128
0
25 Jul 2023
Meta-Transformer: A Unified Framework for Multimodal Learning
Yiyuan Zhang
Kaixiong Gong
Kaipeng Zhang
Hongsheng Li
Yu Qiao
Wanli Ouyang
Xiangyu Yue
105
150
0
20 Jul 2023
What Can Simple Arithmetic Operations Do for Temporal Modeling?
Wenhao Wu
Yuxin Song
Zhun Sun
Jingdong Wang
Chang Xu
Wanli Ouyang
98
11
0
18 Jul 2023
Self-regulating Prompts: Foundational Model Adaptation without Forgetting
Muhammad Uzair Khattak
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
97
189
0
13 Jul 2023
Vision Language Transformers: A Survey
Clayton Fields
C. Kennington
VLM
66
5
0
06 Jul 2023
Distilling Large Vision-Language Model with Out-of-Distribution Generalizability
Xuanlin Li
Yunhao Fang
Minghua Liu
Z. Ling
Zhuowen Tu
Haoran Su
VLM
106
25
0
06 Jul 2023
Multi-Similarity Contrastive Learning
Emily Mu
John Guttag
Maggie Makar
SSL
98
2
0
06 Jul 2023
An Efficient General-Purpose Modular Vision Model via Multi-Task Heterogeneous Training
Z. Chen
Mingyu Ding
Songlin Yang
Wei Zhan
Masayoshi Tomizuka
Erik Learned-Miller
Chuang Gan
MoE
67
8
0
29 Jun 2023
Prompt Ensemble Self-training for Open-Vocabulary Domain Adaptation
Jiaxing Huang
Jingyi Zhang
Han Qiu
Sheng Jin
Shijian Lu
VPVLM, VLM
111
0
0
29 Jun 2023
Benchmarking Zero-Shot Recognition with Vision-Language Models: Challenges on Granularity and Specificity
Zhenlin Xu
Yi Zhu
Tiffany Deng
Abhay Mittal
Yanbei Chen
Manchen Wang
Paolo Favaro
Joseph Tighe
Davide Modolo
VLM, CoGe
141
9
0
28 Jun 2023
Understanding Prompt Tuning for V-L Models Through the Lens of Neural Collapse
Didi Zhu
Zexi Li
Min Zhang
Junkun Yuan
Yunfeng Shao
Jiashuo Liu
Kun Kuang
Yinchuan Li
Chao Wu
VLM
83
2
0
28 Jun 2023
CLIPA-v2: Scaling CLIP Training with 81.1% Zero-shot ImageNet Accuracy within a $10,000 Budget; An Extra $4,000 Unlocks 81.8% Accuracy
Xianhang Li
Zeyu Wang
Cihang Xie
CLIP, VLM
129
20
0
27 Jun 2023
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation
Benedikt Blumenstiel
Johannes Jakubik
Hilde Kuhne
Michael Vossing
VLM
131
18
0
27 Jun 2023
Exploring Data Redundancy in Real-world Image Classification through Data Selection
Zhenyu Tang
Shaoting Zhang
Xiaosong Wang
44
3
0
25 Jun 2023
OpenMask3D: Open-Vocabulary 3D Instance Segmentation
Ayca Takmaz
Elisabetta Fedele
R. Sumner
Marc Pollefeys
F. Tombari
Francis Engelmann
ISeg, VLM
97
173
0
23 Jun 2023
Robustness of Segment Anything Model (SAM) for Autonomous Driving in Adverse Weather Conditions
Xinru Shan
Chaoning Zhang
VLM
92
14
0
23 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MA, AuLLM, VLM
149
295
0
22 Jun 2023