ResearchTrend.AI
Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 664 papers shown
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
52
1
0
09 Jul 2024
CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding
Wenhao Xu
Wenming Weng
Yueyi Zhang
Zhiwei Xiong
VLM
39
0
0
09 Jul 2024
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han
Qifan Wang
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Yi Fang
Qiang Guan
Lifu Huang
Dongfang Liu
VLM
41
4
0
05 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
26
0
0
04 Jul 2024
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models
Ruinan Jin
Zikang Xu
Yuan Zhong
Qiongsong Yao
Qi Dou
S. Kevin Zhou
Xiaoxiao Li
VLM
32
13
0
01 Jul 2024
The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers
Abhi Kamboj
32
0
0
24 Jun 2024
Self-supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry
Boris Chidlovskii
L. Antsfeld
MDE
ViT
33
1
0
16 Jun 2024
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
Yikai Zhang
Qianyu He
Xintao Wang
Siyu Yuan
Jiaqing Liang
Yanghua Xiao
VLM
49
0
0
16 Jun 2024
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang
Qinglai Wei
Feng Liu
Lei Feng
VLM
31
7
0
15 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
49
1
0
13 Jun 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang
Yixuan Wei
Zhen Xing
Yifei Ma
Zuxuan Wu
...
Zheng-Wei Zhang
Qi Dai
Chong Luo
Xin Geng
Baining Guo
VLM
51
1
0
13 Jun 2024
GraphFM: A Comprehensive Benchmark for Graph Foundation Model
Yuhao Xu
Xinqi Liu
Keyu Duan
Yi Fang
Yu-Neng Chuang
Daochen Zha
Qiaoyu Tan
AI4CE
35
1
0
12 Jun 2024
Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation
Diwei Sheng
Giles Hamilton-Fletcher
Mahya Beheshti
Chen Feng
John-Ross Rizzo
40
2
0
11 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
38
2
0
08 Jun 2024
CTSyn: A Foundational Model for Cross Tabular Data Generation
Xiaofeng Lin
Chenheng Xu
Matthew Yang
Guang Cheng
43
3
0
07 Jun 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar
Muhammad Awais
Sanath Narayan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
45
0
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
35
6
0
05 Jun 2024
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li
Haopeng Li
S. Erfani
Lei Feng
James Bailey
Feng Liu
VLM
34
3
0
05 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
101
11
0
29 May 2024
A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation
Niclas Vodisch
Kürsat Petek
Markus Kappeler
Abhinav Valada
Wolfram Burgard
VLM
40
4
0
29 May 2024
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
Rui Yang
Shuang Wang
Yi Han
Yuanheng Li
Dong Zhao
Dou Quan
Yanhe Guo
Licheng Jiao
68
3
0
29 May 2024
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
Somnath Kumar
Yash Gadhia
T. Ganu
A. Nambi
LRM
55
1
0
28 May 2024
ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text
Han Yu
Peikun Guo
Akane Sano
34
16
0
26 May 2024
Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey
Mang Ye
Wei Shen
Bo Du
E. Snezhko
Vassili Kovalev
PongChi Yuen
FedML
80
3
0
25 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
43
0
23 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiao-Qi Zhai
Ibrahim M. Alabdulmohsin
VLM
33
7
0
22 May 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Haonan Zhang
Pengpeng Zeng
Lianli Gao
Jingkuan Song
Yihang Duan
Xinyu Lyu
Hengtao Shen
VLM
CLIP
40
2
0
21 May 2024
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park
Chanhwi Jeong
Junoh Lee
Hae-Gon Jeon
MDE
VLM
48
8
0
20 May 2024
Adjacent Leader Decentralized Stochastic Gradient Descent
Haoze He
Jing Wang
A. Choromańska
30
0
0
18 May 2024
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
Weitao Feng
Wenbo Zhou
Jiyan He
Jie Zhang
Tianyi Wei
Guanlin Li
Tianwei Zhang
Weiming Zhang
Neng H. Yu
38
18
0
18 May 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
...
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
ObjD
VLM
42
34
0
16 May 2024
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei
Zixuan Pan
Andrew Owens
VLM
29
8
0
14 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
40
20
0
13 May 2024
Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare
Xingyu Li
Lu Peng
Yuping Wang
Weihua Zhang
AI4CE
MedIm
LM&MA
71
5
0
10 May 2024
Selective Classification Under Distribution Shifts
Hengyue Liang
Le Peng
Ju Sun
UQCV
43
1
0
08 May 2024
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
Gahyeon Kim
Sohee Kim
Seokju Lee
VLM
33
5
0
25 Apr 2024
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
Haozhe Cheng
Chen Ju
Haicheng Wang
Jinxiang Liu
Mengting Chen
Qiang Hu
Xiaoyun Zhang
Yanfeng Wang
DiffM
VLM
43
5
0
23 Apr 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
45
1
0
22 Apr 2024
LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions
Xiaoran Zhao
Tianhao Wu
Yu Lai
Zhiliang Tian
Zhen Huang
Yahui Liu
Zejiang He
Dongsheng Li
DiffM
38
1
0
21 Apr 2024
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications
Charith Chandra Sai Balne
S. Bhaduri
Tamoghna Roy
Vinija Jain
Aman Chadha
40
12
0
21 Apr 2024
Progressive Multi-modal Conditional Prompt Tuning
Xiaoyu Qiu
Hao Feng
Yuechen Wang
Wen-gang Zhou
Houqiang Li
VLM
29
1
0
18 Apr 2024
Pretraining Billion-scale Geospatial Foundational Models on Frontier
A. Tsaris
P. Dias
Abhishek Potnis
Junqi Yin
Feiyi Wang
D. Lunga
AI4CE
38
4
0
17 Apr 2024
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Wenbo Zhang
Yifan Zhang
Jianfeng Lin
Binqiang Huang
Jinlu Zhang
Wenhao Yu
VLM
49
2
0
17 Apr 2024
Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models
Enming Zhang
Bingke Zhu
Yingying Chen
Qinghai Miao
Ming Tang
Jinqiao Wang
VLM
49
0
0
16 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
30
10
0
15 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
51
2
0
15 Apr 2024
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Junchi Wang
Lei Ke
MLLM
LRM
VLM
44
20
0
12 Apr 2024
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee
Tejas Gokhale
Chitta Baral
Yezhou Yang
VLM
35
2
0
12 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
91
1
0
12 Apr 2024
PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination
Anant Khandelwal
VLM
23
1
0
11 Apr 2024