ResearchTrend.AI
  • Papers
  • Communities
  • Organizations
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXiv (abs)PDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 668 papers shown
Title
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across
  Applications
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications
Charith Chandra Sai Balne
S. Bhaduri
Tamoghna Roy
Vinija Jain
Aman Chadha
108
20
0
21 Apr 2024
Progressive Multi-modal Conditional Prompt Tuning
Progressive Multi-modal Conditional Prompt Tuning
Xiaoyu Qiu
Hao Feng
Yuechen Wang
Wen-gang Zhou
Houqiang Li
VLM
95
3
0
18 Apr 2024
Pretraining Billion-scale Geospatial Foundational Models on Frontier
Pretraining Billion-scale Geospatial Foundational Models on Frontier
A. Tsaris
P. Dias
Abhishek Potnis
Junqi Yin
Feiyi Wang
D. Lunga
AI4CE
47
5
0
17 Apr 2024
A Progressive Framework of Vision-language Knowledge Distillation and
  Alignment for Multilingual Scene
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Wenbo Zhang
Yifan Zhang
Jianfeng Lin
Binqiang Huang
Jinlu Zhang
Wenhao Yu
VLM
105
2
0
17 Apr 2024
Optimization of Prompt Learning via Multi-Knowledge Representation for
  Vision-Language Models
Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models
Enming Zhang
Bingke Zhu
Yingying Chen
Qinghai Miao
Ming Tang
Jinqiao Wang
VLM
69
0
0
16 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
103
11
0
15 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
98
2
0
15 Apr 2024
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Junchi Wang
Lei Ke
MLLMLRMVLM
85
29
0
12 Apr 2024
On the Robustness of Language Guidance for Low-Level Vision Tasks:
  Findings from Depth Estimation
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee
Tejas Gokhale
Chitta Baral
Yezhou Yang
VLM
65
2
0
12 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
143
1
0
12 Apr 2024
PromptSync: Bridging Domain Gaps in Vision-Language Models through
  Class-Aware Prototype Alignment and Discrimination
PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination
Anant Khandelwal
VLM
77
1
0
11 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
BRAVE: Broadening the visual encoding of vision-language models
Ouguzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLMVLM
80
32
0
10 Apr 2024
On the Surprising Efficacy of Distillation as an Alternative to
  Pre-Training Small Models
On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Sean Farhat
Deming Chen
120
0
0
04 Apr 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
84
1
0
03 Apr 2024
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image
  Segmentation
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
Xiaoshuang Huang
Hongxiang Li
Meng Cao
Long Chen
Chenyu You
Dong An
VLM
99
5
0
03 Apr 2024
Foundation Models for Structural Health Monitoring
Foundation Models for Structural Health Monitoring
Luca Benfenati
Daniele Jahier Pagliari
Luca Zanatta
Yhorman Alexander Bedoya Velez
Andrea Acquaviva
Massimo Poncino
Enrico Macii
Luca Benini
Luca Bompani
AI4CE
84
2
0
03 Apr 2024
Segment Any 3D Object with Language
Segment Any 3D Object with Language
Seungjun Lee
Yuyang Zhao
Gim Hee Lee
84
1
0
02 Apr 2024
Transformer based Pluralistic Image Completion with Reduced Information
  Loss
Transformer based Pluralistic Image Completion with Reduced Information Loss
Qiankun Liu
Yuqi Jiang
Zhentao Tan
DongDong Chen
Ying Fu
Qi Chu
Gang Hua
Nenghai Yu
ViT
114
12
0
31 Mar 2024
LORD: Large Models based Opposite Reward Design for Autonomous Driving
LORD: Large Models based Opposite Reward Design for Autonomous Driving
Xin Ye
Feng Tao
Abhirup Mallik
Burhaneddin Yaman
Liu Ren
OffRL
119
5
0
27 Mar 2024
Open-Set Recognition in the Age of Vision-Language Models
Open-Set Recognition in the Age of Vision-Language Models
Dimity Miller
Niko Sünderhauf
Alex Kenna
Keita Mason
VLM
68
6
0
25 Mar 2024
Enhancing Visual Continual Learning with Language-Guided Supervision
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni
Hongbo Zhao
Chenghao Zhang
Ke Hu
Gaofeng Meng
Zhaoxiang Zhang
Shiming Xiang
CLLVLM
135
4
0
24 Mar 2024
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal
  Visual Object Tracking
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou
Jiazheng Xing
Yijie Qian
Yaowei Guo
Shuo Xin
...
Kai Tang
Mengmeng Wang
Zhengkai Jiang
Liang Liu
Yong-Jin Liu
105
29
0
24 Mar 2024
Few-Shot Adversarial Prompt Learning on Vision-Language Models
Few-Shot Adversarial Prompt Learning on Vision-Language Models
Yiwei Zhou
Xiaobo Xia
Zhiwei Lin
Bo Han
Tongliang Liu
VLM
106
16
0
21 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task
  Pretraining
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
Lefei Zhang
83
53
0
20 Mar 2024
FaceXFormer: A Unified Transformer for Facial Analysis
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan
VS Vibashan
Rama Chellappa
Vishal M. Patel
ViT
143
13
0
19 Mar 2024
Compositional Kronecker Context Optimization for Vision-Language Models
Compositional Kronecker Context Optimization for Vision-Language Models
Kun Ding
Xiaohui Li
Qiang Yu
Ying Wang
Haojian Zhang
Shiming Xiang
VLM
79
0
0
18 Mar 2024
Generative Region-Language Pretraining for Open-Ended Object Detection
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin
Yi Jiang
Zhuang Li
Zehuan Yuan
Jianfei Cai
ObjDVLM
86
20
0
15 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language
  Models
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLMMLLM
86
9
0
14 Mar 2024
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with
  Ground Truth Flow
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan
Burhaneddin Yaman
Senem Velipasalar
Liu Ren
101
11
0
13 Mar 2024
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with
  Module-wise Pruning Error Metric
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin
Haoli Bai
Zhili Liu
Lu Hou
Muyi Sun
Linqi Song
Ying Wei
Zhenan Sun
CLIPVLM
96
17
0
12 Mar 2024
Improving deep learning with prior knowledge and cognitive models: A
  survey on enhancing explainability, adversarial robustness and zero-shot
  learning
Improving deep learning with prior knowledge and cognitive models: A survey on enhancing explainability, adversarial robustness and zero-shot learning
F. Mumuni
A. Mumuni
AAML
105
7
0
11 Mar 2024
RESTORE: Towards Feature Shift for Vision-Language Prompt Learning
RESTORE: Towards Feature Shift for Vision-Language Prompt Learning
Yuncheng Yang
Chuyan Zhang
Zuopeng Yang
Yuting Gao
Yulei Qin
Ke Li
Xing Sun
Jie Yang
Yun Gu
VLMVPVLM
123
0
0
10 Mar 2024
In-context Prompt Learning for Test-time Vision Recognition with Frozen
  Vision-language Model
In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model
Junhui Yin
Xinyu Zhang
Lin Wu
Xianghua Xie
Xiaojie Wang
VPVLMVLMMLLM
65
2
0
10 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander DÁmour
Xiao-Qi Zhai
84
21
0
07 Mar 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi
Qi Dong
Luis Goncalves
Zhuowen Tu
Stefano Soatto
VLM
151
3
0
04 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary
  Action Recognition
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
125
18
0
03 Mar 2024
Multi-modal Attribute Prompting for Vision-Language Models
Multi-modal Attribute Prompting for Vision-Language Models
Xin Liu
Jiamin Wu
and Wenfei Yang
Xu Zhou
Tianzhu Zhang
VLM
89
12
0
01 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
176
211
0
29 Feb 2024
CricaVPR: Cross-image Correlation-aware Representation Learning for
  Visual Place Recognition
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu
Xiangyuan Lan
Lijun Zhang
Dongmei Jiang
Yaowei Wang
Chun Yuan
101
37
0
29 Feb 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of
  Foundation Models for Open-World Video Recognition
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
72
5
0
29 Feb 2024
Demonstrating and Reducing Shortcuts in Vision-Language Representation
  Learning
Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
100
2
0
27 Feb 2024
StochCA: A Novel Approach for Exploiting Pretrained Models with
  Cross-Attention
StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention
SeungWon Seo
Suho Lee
Sangheum Hwang
87
0
0
25 Feb 2024
Towards Seamless Adaptation of Pre-trained Models for Visual Place
  Recognition
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
Feng Lu
Lijun Zhang
Xiangyuan Lan
Shuting Dong
Yaowei Wang
Chun Yuan
116
34
0
22 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech
  Representation Learning
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
96
3
0
21 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
136
36
0
20 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on
  Unlabeled Public Videos
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
78
1
0
14 Feb 2024
Towards a Foundation Model for Brain Age Prediction using coVariance
  Neural Networks
Towards a Foundation Model for Brain Age Prediction using coVariance Neural Networks
Saurabh Sihag
Gonzalo Mateos
Alejandro Ribeiro
74
6
0
12 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRLVLMLM&Ro
118
54
0
08 Feb 2024
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained
  Descriptors
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
Sheng Jin
Xue-Qiu Jiang
Jiaxing Huang
Lewei Lu
Shijian Lu
VLMObjD
98
26
0
07 Feb 2024
The Essential Role of Causality in Foundation World Models for Embodied
  AI
The Essential Role of Causality in Foundation World Models for Embodied AI
Tarun Gupta
Wenbo Gong
Chao Ma
Nick Pawlowski
Agrin Hilmkil
...
Jianfeng Gao
Stefan Bauer
Danica Kragic
Bernhard Schölkopf
Cheng Zhang
92
17
0
06 Feb 2024
Previous
12345...121314
Next