ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXivPDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 664 papers shown
Title
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance
  Segmentation
Instance Brownian Bridge as Texts for Open-vocabulary Video Instance Segmentation
Ze-Long Cheng
Kehan Li
Hao Li
Peng Jin
Chang Liu
Xiawu Zheng
Rongrong Ji
Jie Chen
VOS
38
2
0
18 Jan 2024
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal
  Data
Connect, Collapse, Corrupt: Learning Cross-Modal Tasks with Uni-Modal Data
Yuhui Zhang
Elaine Sui
Serena Yeung-Levy
39
9
0
16 Jan 2024
Vehicle: Bridging the Embedding Gap in the Verification of
  Neuro-Symbolic Programs
Vehicle: Bridging the Embedding Gap in the Verification of Neuro-Symbolic Programs
M. Daggitt
Wen Kokke
R. Atkey
Natalia Slusarz
Luca Arnaboldi
Ekaterina Komendantskaya
NAI
36
10
0
12 Jan 2024
Distilling Vision-Language Models on Millions of Videos
Distilling Vision-Language Models on Millions of Videos
Yue Zhao
Long Zhao
Xingyi Zhou
Jialin Wu
Chun-Te Chu
...
Hartwig Adam
Ting Liu
Boqing Gong
Philipp Krahenbuhl
Liangzhe Yuan
VLM
34
13
0
11 Jan 2024
Learning to Prompt with Text Only Supervision for Vision-Language Models
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Muzammal Naseer
Luc Van Gool
F. Tombari
VLM
VPVLM
33
19
0
04 Jan 2024
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for
  Multimodal Alignment
SyCoCa: Symmetrizing Contrastive Captioners with Attentive Masking for Multimodal Alignment
Ziping Ma
Furong Xu
Jian Liu
Ming Yang
Qingpei Guo
VLM
42
3
0
04 Jan 2024
Improved Zero-Shot Classification by Adapting VLMs with Text
  Descriptions
Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions
Oindrila Saha
Grant Van Horn
Subhransu Maji
VLM
45
20
0
04 Jan 2024
Black-Box Tuning of Vision-Language Models with Effective Gradient
  Approximation
Black-Box Tuning of Vision-Language Models with Effective Gradient Approximation
Zixian Guo
Yuxiang Wei
Ming-Yu Liu
Zhilong Ji
Jinfeng Bai
Yiwen Guo
Wangmeng Zuo
VLM
36
8
0
26 Dec 2023
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
UniRef++: Segment Every Reference Object in Spatial and Temporal Spaces
Jiannan Wu
Yi-Xin Jiang
Bin Yan
Huchuan Lu
Zehuan Yuan
Ping Luo
VOS
37
17
0
25 Dec 2023
InternVL: Scaling up Vision Foundation Models and Aligning for Generic
  Visual-Linguistic Tasks
InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
Zhe Chen
Jiannan Wu
Wenhai Wang
Weijie Su
Guo Chen
...
Bin Li
Ping Luo
Tong Lu
Yu Qiao
Jifeng Dai
VLM
MLLM
176
943
0
21 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large
  Multimodal and Language Models
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
44
1
0
21 Dec 2023
Testing the Segment Anything Model on radiology data
Testing the Segment Anything Model on radiology data
J. Almeida
N. M. Rodrigues
Sara Silva
Nickolas Papanikolaou
MedIm
VLM
47
1
0
20 Dec 2023
Understanding the Multi-modal Prompts of the Pre-trained Vision-Language
  Model
Understanding the Multi-modal Prompts of the Pre-trained Vision-Language Model
Shuailei Ma
Chen-Wei Xie
Ying-yu Wei
Siyang Sun
Jiaqi Fan
Xiaoyi Bao
Yuxin Guo
Yun Zheng
VLM
VPVLM
26
2
0
18 Dec 2023
Data-Efficient Multimodal Fusion on a Single GPU
Data-Efficient Multimodal Fusion on a Single GPU
Noël Vouitsis
Zhaoyan Liu
S. Gorti
Valentin Villecroze
Jesse C. Cresswell
Guangwei Yu
G. Loaiza-Ganem
M. Volkovs
51
3
0
15 Dec 2023
TF-CLIP: Learning Text-free CLIP for Video-based Person
  Re-Identification
TF-CLIP: Learning Text-free CLIP for Video-based Person Re-Identification
Chenyang Yu
Xuehu Liu
Yingquan Wang
Pingping Zhang
Huchuan Lu
VLM
27
21
0
15 Dec 2023
VL-GPT: A Generative Pre-trained Transformer for Vision and Language
  Understanding and Generation
VL-GPT: A Generative Pre-trained Transformer for Vision and Language Understanding and Generation
Jinguo Zhu
Xiaohan Ding
Yixiao Ge
Yuying Ge
Sijie Zhao
Hengshuang Zhao
Xiaohua Wang
Ying Shan
ViT
VLM
19
32
0
14 Dec 2023
General Object Foundation Model for Images and Videos at Scale
General Object Foundation Model for Images and Videos at Scale
Junfeng Wu
Yi-Xin Jiang
Qihao Liu
Zehuan Yuan
Xiang Bai
Song Bai
VOS
VLM
38
39
0
14 Dec 2023
On Robustness to Missing Video for Audiovisual Speech Recognition
On Robustness to Missing Video for Audiovisual Speech Recognition
Oscar Chang
Otavio Braga
H. Liao
Dmitriy Serdyuk
Olivier Siohan
45
11
0
13 Dec 2023
Honeybee: Locality-enhanced Projector for Multimodal LLM
Honeybee: Locality-enhanced Projector for Multimodal LLM
Junbum Cha
Wooyoung Kang
Jonghwan Mun
Byungseok Roh
MLLM
43
112
0
11 Dec 2023
Learning Hierarchical Prompt with Structured Linguistic Knowledge for
  Vision-Language Models
Learning Hierarchical Prompt with Structured Linguistic Knowledge for Vision-Language Models
Yubin Wang
Xinyang Jiang
De Cheng
Dongsheng Li
Cairong Zhao
VLM
40
15
0
11 Dec 2023
Language-assisted Vision Model Debugger: A Sample-Free Approach to
  Finding and Fixing Bugs
Language-assisted Vision Model Debugger: A Sample-Free Approach to Finding and Fixing Bugs
Chaoquan Jiang
Jinqiang Wang
Rui Hu
Jitao Sang
32
0
0
09 Dec 2023
Uni3DL: Unified Model for 3D and Language Understanding
Uni3DL: Unified Model for 3D and Language Understanding
Xiang Li
Jian Ding
Zhaoyang Chen
Mohamed Elhoseiny
38
3
0
05 Dec 2023
Foundation Models for Weather and Climate Data Understanding: A
  Comprehensive Survey
Foundation Models for Weather and Climate Data Understanding: A Comprehensive Survey
Shengchao Chen
Guodong Long
Jing Jiang
Dikai Liu
Chengqi Zhang
SyDa
AI4CE
49
24
0
05 Dec 2023
Towards General Purpose Vision Foundation Models for Medical Image
  Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks
Towards General Purpose Vision Foundation Models for Medical Image Analysis: An Experimental Study of DINOv2 on Radiology Benchmarks
Mohammed Baharoon
Waseem Qureshi
J. Ouyang
Yanwu Xu
Abdulrhman Aljouie
Wei Peng
MedIm
AI4CE
51
8
0
04 Dec 2023
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
SCLIP: Rethinking Self-Attention for Dense Vision-Language Inference
Feng Wang
Jieru Mei
Alan Yuille
VLM
35
55
0
04 Dec 2023
APoLLo: Unified Adapter and Prompt Learning for Vision Language Models
APoLLo: Unified Adapter and Prompt Learning for Vision Language Models
Sanjoy Chowdhury
Sayan Nag
Dinesh Manocha
VLM
33
17
0
04 Dec 2023
Grounding Everything: Emerging Localization Properties in
  Vision-Language Transformers
Grounding Everything: Emerging Localization Properties in Vision-Language Transformers
Walid Bousselham
Felix Petersen
Vittorio Ferrari
Hilde Kuehne
ObjD
VLM
48
39
0
01 Dec 2023
Segment and Caption Anything
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
34
18
0
01 Dec 2023
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models
  via fMRI
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Xuan-Bac Nguyen
Xin Li
Pawan Sinha
Samee U. Khan
Khoa Luu
ViT
MedIm
35
0
0
30 Nov 2023
BioCLIP: A Vision Foundation Model for the Tree of Life
BioCLIP: A Vision Foundation Model for the Tree of Life
Samuel Stevens
Jiaman Wu
Matthew J Thompson
Elizabeth G Campolongo
Chan Hee Song
...
Wasila M Dahdul
Charles V. Stewart
Tanya Berger-Wolf
Wei-Lun Chao
Yu-Chuan Su
39
64
0
30 Nov 2023
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for
  General Video Recognition
OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition
Tom Tongjia Chen
Hongshan Yu
Zhengeng Yang
Zechuan Li
Wei Sun
Chen Chen
23
8
0
30 Nov 2023
Knowledge Transfer from Vision Foundation Models for Efficient Training
  of Small Task-specific Models
Knowledge Transfer from Vision Foundation Models for Efficient Training of Small Task-specific Models
Raviteja Vemulapalli
Hadi Pouransari
Fartash Faghri
Sachin Mehta
Mehrdad Farajtabar
Mohammad Rastegari
Oncel Tuzel
43
7
0
30 Nov 2023
GELDA: A generative language annotation framework to reveal visual
  biases in datasets
GELDA: A generative language annotation framework to reveal visual biases in datasets
Krish Kabra
Kathleen M. Lewis
Guha Balakrishnan
VLM
24
1
0
29 Nov 2023
Explaining CLIP's performance disparities on data from blind/low vision
  users
Explaining CLIP's performance disparities on data from blind/low vision users
Daniela Massiceti
Camilla Longden
Agnieszka Slowik
Samuel Wills
Martin Grayson
C. Morrison
VLM
29
9
0
29 Nov 2023
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with
  Semantic Vector-Quantized Tokenizer
E-ViLM: Efficient Video-Language Model via Masked Video Modeling with Semantic Vector-Quantized Tokenizer
Jacob Zhiyuan Fang
Skyler Zheng
Vasu Sharma
Robinson Piramuthu
VLM
38
0
0
28 Nov 2023
The curse of language biases in remote sensing VQA: the role of spatial
  attributes, language diversity, and the need for clear evaluation
The curse of language biases in remote sensing VQA: the role of spatial attributes, language diversity, and the need for clear evaluation
Christel Chappuis
Eliot Walt
Vincent Mendez
Sylvain Lobry
B. L. Saux
D. Tuia
31
3
0
28 Nov 2023
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating
  Video-based Large Language Models
Video-Bench: A Comprehensive Benchmark and Toolkit for Evaluating Video-based Large Language Models
Munan Ning
Bin Zhu
Yujia Xie
Bin Lin
Jiaxi Cui
Lu Yuan
Dongdong Chen
Li-ming Yuan
ELM
MLLM
27
58
0
27 Nov 2023
ViT-Lens: Towards Omni-modal Representations
ViT-Lens: Towards Omni-modal Representations
Weixian Lei
Yixiao Ge
Kun Yi
Jianfeng Zhang
Difei Gao
Dylan Sun
Yuying Ge
Ying Shan
Mike Zheng Shou
21
18
0
27 Nov 2023
Align before Adapt: Leveraging Entity-to-Region Alignments for
  Generalizable Video Action Recognition
Align before Adapt: Leveraging Entity-to-Region Alignments for Generalizable Video Action Recognition
Yifei Chen
Dapeng Chen
Ruijin Liu
Sai Zhou
Wenyuan Xue
Wei Peng
33
6
0
27 Nov 2023
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai
Kuofeng Gao
Shaobo Min
Shu-Tao Xia
Zhifeng Li
Wei Liu
VLM
29
38
0
26 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video
  Understanding
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
34
3
0
25 Nov 2023
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in
  Radiology
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Asma Ben Abacha
Alberto Santamaría-Pang
Ho Hin Lee
J. Merkow
Qin Cai
...
Julia Gong
M. Lungren
Thomas Lin
Noel C. F. Codella
Ivan Tarapov
31
5
0
23 Nov 2023
Active Prompt Learning in Vision Language Models
Active Prompt Learning in Vision Language Models
Jihwan Bang
Sumyeong Ahn
Jae-Gil Lee
VLM
16
9
0
18 Nov 2023
Domain Aligned CLIP for Few-shot Classification
Domain Aligned CLIP for Few-shot Classification
Muhammad Waleed Gondal
Jochen Gast
Inigo Alonso Ruiz
Richard Droste
Tommaso Macri
Suren Kumar
Luitpold Staudigl
VLM
21
11
0
15 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision
  Tasks
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
50
143
0
10 Nov 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of
  mixture-of-datasets
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain
Harkirat Singh Behl
Z. Kira
Vibhav Vineet
25
12
0
08 Nov 2023
Exploring Dataset-Scale Indicators of Data Quality
Exploring Dataset-Scale Indicators of Data Quality
Ben Feuer
Chinmay Hegde
29
1
0
07 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D
  Data
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
24
55
0
06 Nov 2023
Adapting Segment Anything Model (SAM) through Prompt-based Learning for
  Enhanced Protein Identification in Cryo-EM Micrographs
Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs
Fei He
Zhiyuan Yang
Mingyue Gao
Biplab Poudel
Newgin Sam Ebin Sam Dhas
Rajan Gyawali
Ashwin Dhakal
Jianlin Cheng
Dong Xu
23
4
0
04 Nov 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for
  Zero-Shot Generalization
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Jameel Hassan
Hanan Gani
Noor Hussein
Muhammad Uzair Khattak
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
VLM
OOD
75
62
0
02 Nov 2023
Previous
123456...121314
Next