2111.11432
Cited By
Florence: A New Foundation Model for Computer Vision
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Papers citing
"Florence: A New Foundation Model for Computer Vision"
50 / 664 papers shown
Title
GeoMM: On Geodesic Perspective for Multi-modal Learning
Shibin Mei
Hang Wang
Bingbing Ni
22
0
0
16 May 2025
Boosting Text-to-Chart Retrieval through Training with Synthesized Semantic Insights
Yifan Wu
Lutao Yan
Yizhang Zhu
Yinan Mei
Jiannan Wang
Nan Tang
Yuyu Luo
27
0
0
15 May 2025
Griffin: Towards a Graph-Centric Relational Database Foundation Model
Yanbo Wang
Xiyuan Wang
Quan Gan
Minjie Wang
Qibin Yang
David Wipf
Muhan Zhang
138
0
0
08 May 2025
Handling Imbalanced Pseudolabels for Vision-Language Models with Concept Alignment and Confusion-Aware Calibrated Margin
Yuchen Wang
X. Bai
X. Li
Weili Guan
Liqiang Nie
Xinyang Chen
VLM
49
0
0
04 May 2025
A Sensor Agnostic Domain Generalization Framework for Leveraging Geospatial Foundation Models: Enhancing Semantic Segmentation via Synergistic Pseudo-Labeling and Generative Learning
Anan Yaghmour
Melba M. Crawford
Saurabh Prasad
29
0
0
02 May 2025
Diff-Prompt: Diffusion-Driven Prompt Generator with Mask Supervision
Weicai Yan
Wang Lin
Zirun Guo
Ye Wang
Fangming Feng
Xiaoda Yang
Zihan Wang
Tao Jin
DiffM
150
2
0
30 Apr 2025
FedMVP: Federated Multi-modal Visual Prompt Tuning for Vision-Language Models
Mainak Singha
Subhankar Roy
Sarthak Mehrotra
Ankit Jha
Moloud Abdar
Biplab Banerjee
Elisa Ricci
VLM
VPVLM
119
0
0
29 Apr 2025
FrogDogNet: Fourier frequency Retained visual prompt Output Guidance for Domain Generalization of CLIP in Remote Sensing
Hariseetharam Gunduboina
Muhammad Haris Khan
Biplab Banerjee
VLM
47
0
0
23 Apr 2025
CLIP-Powered Domain Generalization and Domain Adaptation: A Comprehensive Survey
Jindong Li
Yong Li
Yali Fu
Jiahong Liu
Yixin Liu
Menglin Yang
Irwin King
VLM
41
0
0
19 Apr 2025
CM3AE: A Unified RGB Frame and Event-Voxel/-Frame Pre-training Framework
Wentao Wu
Xueliang Wang
Chenglong Li
Bo Jiang
Jin Tang
Bin Luo
Qi Liu
34
0
0
17 Apr 2025
Beyond Words: Augmenting Discriminative Richness via Diffusions in Unsupervised Prompt Learning
Hairui Ren
Fan Tang
He Zhao
Zixuan Wang
Dandan Guo
Yi Chang
VLM
41
0
0
16 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang
Ahmet Iscen
Eunchan Jo
Sua Choi
Minsu Cho
Cordelia Schmid
VLM
KELM
OffRL
39
0
0
08 Apr 2025
Think When You Need: Self-Adaptive Chain-of-Thought Learning
Junjie Yang
Ke Lin
Xing Yu
ReLM
LRM
AI4CE
57
1
0
04 Apr 2025
STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Divya Velayudhan
A. Ahmed
Mohamad Alansari
Neha Gour
Abderaouf Behouch
...
Muzammal Naseer
Juergen Gall
Mohammed Bennamoun
Ernesto Damiani
Naoufel Werghi
50
0
0
03 Apr 2025
Leveraging Vision-Language Foundation Models to Reveal Hidden Image-Attribute Relationships in Medical Imaging
Amar Kumar
Anita Kriz
Barak Pertzov
Tal Arbel
MedIm
56
0
0
30 Mar 2025
Feature Calibration enhanced Parameter Synthesis for CLIP-based Class-incremental Learning
Jiaxin Guo
Xiaoguang Zhu
Lianlong Sun
Liangyu Teng
Yang Liu
Di Li
Wei Zhou
Liang Song
CLL
VLM
59
1
0
24 Mar 2025
GOAL: Global-local Object Alignment Learning
Hyungyu Choi
Young Kyun Jang
Chanho Eom
VLM
177
0
0
22 Mar 2025
OSLoPrompt: Bridging Low-Supervision Challenges and Open-Set Domain Generalization in CLIP
M. Cui
Divyam Gupta
Mainak Singha
Sai Bhargav Rongali
Ankit Jha
Muhammad Haris Khan
Biplab Banerjee
VLM
53
1
0
20 Mar 2025
M3: 3D-Spatial MultiModal Memory
Xueyan Zou
Yuchen Song
Ri-Zhao Qiu
Xuanbin Peng
Jianglong Ye
Sifei Liu
Xiaolong Wang
3DGS
62
0
0
20 Mar 2025
Squeeze Out Tokens from Sample for Finer-Grained Data Governance
Weixiong Lin
Chen Ju
Haicheng Wang
Shengchao Hu
Shuai Xiao
...
Yuheng Jiao
Mingshuai Yao
Jinsong Lan
Qingwen Liu
Ying Chen
55
0
0
18 Mar 2025
Optimized 3D Gaussian Splatting using Coarse-to-Fine Image Frequency Modulation
Umar Farooq
Jean-Yves Guillemaut
Adrian Hilton
M. Volino
3DGS
72
0
0
18 Mar 2025
SAM2 for Image and Video Segmentation: A Comprehensive Survey
Zhang Jiaxing
Tang Hao
VLM
54
0
0
17 Mar 2025
TLAC: Two-stage LMM Augmented CLIP for Zero-Shot Classification
Ans Munir
Faisal Z. Qureshi
M. H. Khan
Mohsen Ali
VLM
70
0
0
15 Mar 2025
Towards Graph Foundation Models: A Transferability Perspective
Yansen Wang
Wenqi Fan
Suhang Wang
Yao Ma
43
1
0
13 Mar 2025
Keeping Representation Similarity in Finetuning for Medical Image Analysis
Wenqiang Zu
Shenghao Xie
Hao Chen
Yiming Liang
Lei Ma
MedIm
OOD
48
0
0
10 Mar 2025
LVLM-Compress-Bench: Benchmarking the Broader Impact of Large Vision-Language Model Compression
Souvik Kundu
Anahita Bhiwandiwalla
Sungduk Yu
Phillip Howard
Tiep Le
S. N. Sridhar
David Cobbley
Hao Kang
Vasudev Lal
MQ
59
1
0
06 Mar 2025
Enhancing Collective Intelligence in Large Language Models Through Emotional Integration
Likith Kadiyala
Ramteja Sajja
Y. Sermet
Ibrahim Demir
188
0
0
05 Mar 2025
OTTER: A Vision-Language-Action Model with Text-Aware Visual Feature Extraction
Huang Huang
Fangchen Liu
Letian Fu
Tingfan Wu
Mustafa Mukadam
Jitendra Malik
Ken Goldberg
Pieter Abbeel
LM&Ro
VLM
85
6
0
05 Mar 2025
PRISM: High-Resolution & Precise Counterfactual Medical Image Generation using Language-guided Stable Diffusion
Amar Kumar
Anita Kriz
Mohammad Havaei
Tal Arbel
MedIm
49
2
0
28 Feb 2025
InPK: Infusing Prior Knowledge into Prompt for Vision-Language Models
Shuchang Zhou
Jiwei Wei
Shiyuan He
Yuyang Zhou
Chaoning Zhang
Jie Zou
Ning Xie
Yang Yang
VLM
VPVLM
84
0
0
27 Feb 2025
DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications
Ibrahim Fayad
Max Zimmer
Martin Schwartz
P. Ciais
Fabian Gieseke
Gabriel Belouze
Sarah Brood
A. D. Truchis
Alexandre d’Aspremont
AI4TS
43
0
0
24 Feb 2025
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
Feng Lu
Tong Jin
X. Lan
Lijun Zhang
Yunpeng Liu
Yaowei Wang
Chun Yuan
39
0
0
23 Feb 2025
Recognize Any Surgical Object: Unleashing the Power of Weakly-Supervised Data
Jiajie Li
Brian R Quaranto
Chenhui Xu
Ishan Mishra
Ruiyang Qin
Dancheng Liu
Peter C W Kim
Jinjun Xiong
94
0
0
25 Jan 2025
Know "No" Better: A Data-Driven Approach for Enhancing Negation Awareness in CLIP
J. Park
Jungbeom Lee
Jongyoon Song
Sangwon Yu
Dahuin Jung
Sungroh Yoon
47
0
0
19 Jan 2025
Explore the Use of Time Series Foundation Model for Car-Following Behavior Analysis
Luwei Zeng
Runze Yan
AI4TS
48
0
0
13 Jan 2025
Feedback-Driven Vision-Language Alignment with Minimal Human Supervision
Giorgio Giannone
Ruoteng Li
Qianli Feng
Evgeny Perevodchikov
Rui Chen
Aleix M. Martinez
VLM
66
0
0
08 Jan 2025
GIT-CXR: End-to-End Transformer for Chest X-Ray Report Generation
Iustin Sîrbu
Iulia-Renata Sîrbu
Jasmina Bogojeska
Traian Rebedea
MedIm
ViT
LM&MA
36
0
0
05 Jan 2025
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
Benjamin Laufer
Jon M. Kleinberg
Hoda Heidari
60
8
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
69
24
0
31 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
45
0
0
25 Dec 2024
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Zehong Wang
Zheyuan Zhang
Tianyi Ma
Nitesh V. Chawla
Chuxu Zhang
Yanfang Ye
AI4CE
84
0
0
21 Dec 2024
Do Language Models Understand Time?
Xi Ding
Lei Wang
184
0
0
18 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
79
3
0
17 Dec 2024
Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality
Qitong Wang
Tang Li
Kien X. Nguyen
Xi Peng
90
0
0
17 Dec 2024
Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling
Donggeun Kim
Yujin Jo
Myungjoo Lee
Taesup Kim
VLM
83
0
0
10 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
74
0
0
05 Dec 2024
Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection
Kun Qian
Tianyu Sun
Wenhong Wang
71
0
0
01 Dec 2024
EDTformer: An Efficient Decoder Transformer for Visual Place Recognition
Tong Jin
Feng Lu
Shuyu Hu
Chun Yuan
Yunpeng Liu
ViT
77
0
0
01 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLM
CLIP
80
4
0
30 Nov 2024
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Mohamed Imam
Rufael Fedaku Marew
Jameel Hassan
M. Fiaz
Alham Fikri Aji
Hisham Cholakkal
VLM
229
0
0
28 Nov 2024