ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXiv (abs)PDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 668 papers shown
Title
CalFuse: Feature Calibration Enhanced Parameter Fusion for Class-Continual Learning
CalFuse: Feature Calibration Enhanced Parameter Fusion for Class-Continual Learning
Jiaxin Guo
Xiaoguang Zhu
Xiaoguang Zhu
Lianlong Sun
Liangyu Teng
...
Di Li
Wei Zhou
Liang Song
Wei Zhou
Liang Song
CLLVLM
153
1
0
01 Jul 2025
Vision Generalist Model: A Survey
Vision Generalist Model: A Survey
Ziyi Wang
Yongming Rao
Shuofeng Sun
Xinrun Liu
Yi Wei
...
Zuyan Liu
Yanbo Wang
Hongmin Liu
Jie Zhou
Jiwen Lu
76
0
0
11 Jun 2025
Fusing Cross-modal and Uni-modal Representations: A Kronecker Product Approach
Youqi Wu
Jingwei Zhang
Farzan Farnia
32
0
0
10 Jun 2025
Multimodal Representation Alignment for Cross-modal Information Retrieval
Fan Xu
Luis A. Leiva
21
0
0
10 Jun 2025
OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation
OV-COAST: Cost Aggregation with Optimal Transport for Open-Vocabulary Semantic Segmentation
Aditya Gandhamal
Aniruddh Sikdar
Suresh Sundaram
OT
99
0
0
04 Jun 2025
AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives
AetherVision-Bench: An Open-Vocabulary RGB-Infrared Benchmark for Multi-Angle Segmentation across Aerial and Ground Perspectives
Aniruddh Sikdar
Aditya Gandhamal
Suresh Sundaram
VLM
69
0
0
04 Jun 2025
Attacking Attention of Foundation Models Disrupts Downstream Tasks
Attacking Attention of Foundation Models Disrupts Downstream Tasks
Hondamunige Prasanna Silva
Federico Becattini
Lorenzo Seidenari
AAML
37
0
0
03 Jun 2025
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh
Qiang Liu
Georg Kohl
Nils Thuerey
AI4CE
54
1
0
30 May 2025
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
DINO-R1: Incentivizing Reasoning Capability in Vision Foundation Models
Chenbin Pan
Wenbin He
Zhengzhong Tu
Liu Ren
LRMVLM
92
0
0
29 May 2025
From Theory to Application: Fine-Tuning Large EEG Model with Real-World Stress Data
From Theory to Application: Fine-Tuning Large EEG Model with Real-World Stress Data
Siwen Wang
Shitou Zhang
Wan-Lin Chen
Dung Truong
Tzyy-Ping Jung
43
0
0
29 May 2025
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Point-RFT: Improving Multimodal Reasoning with Visually Grounded Reinforcement Finetuning
Minheng Ni
Zhengyuan Yang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
W. Zuo
Lijuan Wang
ReLMLRM
97
1
0
26 May 2025
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation
Distill CLIP (DCLIP): Enhancing Image-Text Retrieval via Cross-Modal Transformer Distillation
Daniel Csizmadia
Andrei Codreanu
Victor Sim
Vighnesh Prabhu
Michael Lu
Kevin Zhu
Sean O'Brien
Vasu Sharma
CLIPVLM
80
0
0
25 May 2025
Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation
Clip4Retrofit: Enabling Real-Time Image Labeling on Edge Devices via Cross-Architecture CLIP Distillation
Li Zhong
Ahmed Ghazal
Jun-Jun Wan
Frederik Zilly
Patrick Mackens
Joachim E. Vollrath
Bogdan Sorin Coseriu
252
0
0
23 May 2025
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Seeing the Trees for the Forest: Rethinking Weakly-Supervised Medical Visual Grounding
Ta Duc Huy
Duy Anh Huynh
Yutong Xie
Yuankai Qi
Qi Chen
...
Anton van den Hengel
Zhibin Liao
Minh-Son To
Johan Verjans
Vu Minh Hieu Phan
114
0
0
21 May 2025
TAGS: 3D Tumor-Adaptive Guidance for SAM
TAGS: 3D Tumor-Adaptive Guidance for SAM
Sirui Li
Linkai Peng
Zheyuan Zhang
Gorkem Durak
Ulas Bagci
MedImVLM
211
0
0
21 May 2025
GeoMM: On Geodesic Perspective for Multi-modal Learning
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
82
0
0
16 May 2025
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu
Lutao Yan
Yizhang Zhu
Yinan Mei
Jiannan Wang
Nan Tang
Yuyu Luo
124
1
0
15 May 2025
Griffin: Towards a Graph-Centric Relational Database Foundation Model
Griffin: Towards a Graph-Centric Relational Database Foundation Model
Yanbo Wang
Xiyuan Wang
Quan Gan
Minjie Wang
Qibin Yang
David Wipf
Muhan Zhang
385
0
0
08 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
Xiaochen Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
130
0
0
04 May 2025
A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation viaSynergistic Pseudo-Labeling and Generative Learning
A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation viaSynergistic Pseudo-Labeling and Generative Learning
Anan Yaghmour
Melba M. Crawford
Saurabh Prasad
65
0
0
02 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Ziyi Wang
Tao Jin
DiffM
316
2
0
30 Apr 2025
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models
Mainak Singha
Subhankar Roy
Sarthak Mehrotra
Ankit Jha
Moloud Abdar
Biplab Banerjee
Elisa Ricci
VLMVPVLM
198
0
0
29 Apr 2025
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
Hariseetharam Gunduboina
Muhammad Haris Khan
Biplab Banerjee
VLM
99
0
0
23 Apr 2025
CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
Jindong Li
Yongqian Li
Yali Fu
Jiahong Liu
Yixin Liu
Menglin Yang
Irwin King
VLM
91
0
0
19 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
Xinyu Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
105
0
0
17 Apr 2025
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
Hairui Ren
Fan Tang
He Zhao
Zixuan Wang
Dandan Guo
Yi Chang
VLM
87
0
0
16 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLMKELMOffRL
128
0
0
08 Apr 2025
Think When You Need: Self-Adaptive Chain-of-Thought Learning
Think When You Need: Self-Adaptive Chain-of-Thought Learning
Junjie Yang
Ke Lin
Xing Yu
ReLMLRMAI4CE
131
2
0
04 Apr 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan
A. Ahmed
Mohamad Alansari
Neha Gour
Abderaouf Behouch
...
Muzammal Naseer
Juergen Gall
Mohammed Bennamoun
Ernesto Damiani
Naoufel Werghi
120
0
0
03 Apr 2025
Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
Amar Kumar
Anita Kriz
Barak Pertzov
Tal Arbel
MedIm
84
0
0
30 Mar 2025
GOAL: Global-local Object Alignment Learning
GOAL: Global-local Object Alignment Learning
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
422
0
0
22 Mar 2025
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
M. Cui
Divyam Gupta
Mainak Singha
Sai Bhargav Rongali
Ankit Jha
Muhammad Haris Khan
Biplab Banerjee
VLM
133
1
0
20 Mar 2025
M3: 3D-Spatial MultiModal Memory
M3: 3D-Spatial MultiModal Memory
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
114
0
0
20 Mar 2025
Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation
Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation
Umar Farooq
Jean-Yves Guillemaut
Adrian Hilton
M. Volino
3DGS
115
0
0
18 Mar 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
84
0
0
18 Mar 2025
SAM2 for Image and Video Segmentation: A Comprehensive Survey
SAM2 for Image and Video Segmentation: A Comprehensive Survey
Zhang Jiaxing
Tang Hao
VLM
117
1
0
17 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
171
0
0
15 Mar 2025
Towards Graph Foundation Models: A Transferability Perspective
Yansen Wang
Wenqi Fan
Suhang Wang
Yao Ma
88
1
0
13 Mar 2025
Keeping Representation Similarity in Finetuning for Medical Image Analysis
Wenqiang Zu
Shenghao Xie
Hao Chen
Yiming Liang
Lei Ma
MedImOOD
145
0
0
10 Mar 2025
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
Souvik Kundu
Anahita Bhiwandiwalla
Sungduk Yu
Phillip Howard
Tiep Le
S. N. Sridhar
David Cobbley
Hao Kang
Vasudev Lal
MQ
92
2
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
412
0
0
05 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&RoVLM
186
10
0
05 Mar 2025
PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion
Amar Kumar
Anita Kriz
Mohammad Havaei
Tal Arbel
MedIm
100
3
0
28 Feb 2025
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
Shuchang Zhou
Jiwei Wei
Shiyuan He
Yuyang Zhou
Chaoning Zhang
Jie Zou
Ning Xie
Yang Yang
VLMVPVLM
156
0
0
27 Feb 2025
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Ibrahim Fayad
Max Zimmer
Martin Schwartz
P. Ciais
Fabian Gieseke
Gabriel Belouze
Sarah Brood
A. D. Truchis
Alexandre d’Aspremont
AI4TS
107
0
0
24 Feb 2025
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
Feng Lu
Tong Jin
X. Lan
Lijun Zhang
Yunpeng Liu
Yaowei Wang
Chun Yuan
87
1
0
23 Feb 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
193
0
0
25 Jan 2025
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
Know "No'' Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
125
3
0
19 Jan 2025
Explore the Use of Time Series Foundation Model for Car-Following Behavior Analysis
Explore the Use of Time Series Foundation Model for Car-Following Behavior Analysis
Luwei Zeng
Runze Yan
AI4TS
94
0
0
13 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedImViTLM&MA
79
1
0
05 Jan 2025
1234...121314
Next