2111.11432
Cited By
Florence: A New Foundation Model for Computer Vision
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Papers citing
"Florence: A New Foundation Model for Computer Vision"
50 / 668 papers shown
Title
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan
Weijie Chen
Shicai Yang
Di Xie
Luojun Lin
Yueting Zhuang
VLM
116
4
0
21 Jul 2024
Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols
Gertjan J. Burghouts
Fieke Hillerstrom
Erwin Walraven
M. V. Bekkum
Frank Ruis
J. Sijs
Jelle van Mil
Judith Dijk
NAI
74
1
0
18 Jul 2024
CoAPT: Context Attribute words for Prompt Tuning
Gun Lee
Subin An
Sungyong Baik
Soochahn Lee
VPVLM
VLM
67
1
0
18 Jul 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Xue Jiang
Wayne Zhang
VLM
129
29
0
17 Jul 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
92
3
0
15 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM
MQ
94
5
0
15 Jul 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng
Kaipeng Zhang
Yue Yang
Hao Zhang
Ping Luo
VLM
84
3
0
11 Jul 2024
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Yi Zhang
Chun-Wun Cheng
Ke Yu
Zhihai He
Carola-Bibiane Schonlieb
Angelica I Aviles-Rivero
VLM
89
2
0
11 Jul 2024
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization
Jinlong Li
Zequn Jie
Elisa Ricci
Lin Ma
N. Sebe
VLM
104
1
0
11 Jul 2024
Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software
Dezhi Ran
Mengzhou Wu
Wei Yang
Tao Xie
AI4CE
84
2
0
11 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
110
4
0
10 Jul 2024
Rethinking Image-to-Video Adaptation: An Object-centric Perspective
Rui Qian
Shuangrui Ding
Dahua Lin
OCL
101
1
0
09 Jul 2024
CEIA: CLIP-Based Event-Image Alignment for Open-World Event-Based Understanding
Wenhao Xu
Wenming Weng
Yueyi Zhang
Zhiwei Xiong
VLM
80
0
0
09 Jul 2024
AMD: Automatic Multi-step Distillation of Large-scale Vision Models
Cheng Han
Qifan Wang
S. Dianat
Majid Rabbani
Raghuveer M. Rao
Yi Fang
Qiang Guan
Lifu Huang
Dongfang Liu
VLM
86
5
0
05 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
54
0
0
04 Jul 2024
FairMedFM: Fairness Benchmarking for Medical Imaging Foundation Models
Ruinan Jin
Zikang Xu
Yuan Zhong
Qiongsong Yao
Qi Dou
S. Kevin Zhou
Xiaoxiao Li
VLM
120
17
0
01 Jul 2024
The Progression of Transformers from Language to Vision to MOT: A Literature Review on Multi-Object Tracking with Transformers
Abhi Kamboj
79
0
0
24 Jun 2024
Self-supervised Pretraining and Finetuning for Monocular Depth and Visual Odometry
Boris Chidlovskii
L. Antsfeld
MDE
ViT
86
2
0
16 Jun 2024
Light Up the Shadows: Enhance Long-Tailed Entity Grounding with Concept-Guided Vision-Language Models
Yikai Zhang
Qianyu He
Xintao Wang
Siyu Yuan
Jiaqing Liang
Yanghua Xiao
VLM
86
0
0
16 Jun 2024
Candidate Pseudolabel Learning: Enhancing Vision-Language Models by Prompt Tuning with Unlabeled Data
Jiahan Zhang
Qinglai Wei
Feng Liu
Lei Feng
VLM
86
12
0
15 Jun 2024
Explore the Limits of Omni-modal Pretraining at Scale
Yiyuan Zhang
Handong Li
Jing Liu
Xiangyu Yue
VLM
LRM
92
1
0
13 Jun 2024
Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms
Miaosen Zhang
Yixuan Wei
Zhen Xing
Yifei Ma
Zuxuan Wu
...
Zheng Zhang
Qi Dai
Chong Luo
Xin Geng
Baining Guo
VLM
98
1
0
13 Jun 2024
GraphFM: A Comprehensive Benchmark for Graph Foundation Model
Yuhao Xu
Xinqi Liu
Keyu Duan
Yi Fang
Yu-Neng Chuang
Daochen Zha
Qiaoyu Tan
AI4CE
62
1
0
12 Jun 2024
Can Foundation Models Reliably Identify Spatial Hazards? A Case Study on Curb Segmentation
Diwei Sheng
Giles Hamilton-Fletcher
Mahya Beheshti
Chen Feng
John-Ross Rizzo
82
2
0
11 Jun 2024
Generalist Multimodal AI: A Review of Architectures, Challenges and Opportunities
Sai Munikoti
Ian Stewart
Sameera Horawalavithana
Henry Kvinge
Tegan H. Emerson
Sandra E Thompson
Karl Pazdernik
114
2
0
08 Jun 2024
CTSyn: A Foundational Model for Cross Tabular Data Generation
Xiaofeng Lin
Chenheng Xu
Matthew Yang
Guang Cheng
86
4
0
07 Jun 2024
Efficient 3D-Aware Facial Image Editing via Attribute-Specific Prompt Learning
Amandeep Kumar
Muhammad Awais
Sanath Narayan
Hisham Cholakkal
Salman Khan
Rao Muhammad Anwer
96
0
0
06 Jun 2024
Wings: Learning Multimodal LLMs without Text-only Forgetting
Yi-Kai Zhang
Shiyin Lu
Yang Li
Yanqing Ma
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
De-Chuan Zhan
Han-Jia Ye
VLM
131
10
0
05 Jun 2024
Visual-Text Cross Alignment: Refining the Similarity Score in Vision-Language Models
Jinhao Li
Haopeng Li
S. Erfani
Lei Feng
James Bailey
Feng Liu
VLM
108
6
0
05 Jun 2024
RAP: Efficient Text-Video Retrieval with Sparse-and-Correlated Adapter
Meng Cao
Haoran Tang
Jinfa Huang
Peng Jin
Can Zhang
Ruyang Liu
Long Chen
Xiaodan Liang
Li-ming Yuan
Ge Li
171
14
0
29 May 2024
A Good Foundation is Worth Many Labels: Label-Efficient Panoptic Segmentation
Niclas Vodisch
Kürsat Petek
Markus Kappeler
Abhinav Valada
Wolfram Burgard
VLM
78
4
0
29 May 2024
Transcending Fusion: A Multi-Scale Alignment Method for Remote Sensing Image-Text Retrieval
Rui Yang
Shuang Wang
Yi Han
Yuanheng Li
Dong Zhao
Dou Quan
Yanhe Guo
Licheng Jiao
98
4
0
29 May 2024
MMCTAgent: Multi-modal Critical Thinking Agent Framework for Complex Visual Reasoning
Somnath Kumar
Yash Gadhia
T. Ganu
A. Nambi
LRM
141
4
0
28 May 2024
ECG Semantic Integrator (ESI): A Foundation ECG Model Pretrained with LLM-Enhanced Cardiological Text
Han Yu
Peikun Guo
Akane Sano
82
19
0
26 May 2024
Vertical Federated Learning for Effectiveness, Security, Applicability: A Survey
Mang Ye
Wei Shen
Bo Du
E. Snezhko
Vassili Kovalev
PongChi Yuen
FedML
160
5
0
25 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
349
54
0
23 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiao-Qi Zhai
Ibrahim Alabdulmohsin
VLM
94
9
0
22 May 2024
Text-Video Retrieval with Global-Local Semantic Consistent Learning
Haonan Zhang
Pengpeng Zeng
Lianli Gao
Jingkuan Song
Yihang Duan
Xinyu Lyu
Hengtao Shen
VLM
CLIP
104
2
0
21 May 2024
Depth Prompting for Sensor-Agnostic Depth Estimation
Jin-Hwi Park
Chanhwi Jeong
Junoh Lee
Hae-Gon Jeon
MDE
VLM
92
10
0
20 May 2024
Adjacent Leader Decentralized Stochastic Gradient Descent
Haoze He
Jing Wang
A. Choromańska
67
0
0
18 May 2024
AquaLoRA: Toward White-box Protection for Customized Stable Diffusion Models via Watermark LoRA
Weitao Feng
Wenbo Zhou
Jiyan He
Jie Zhang
Tianyi Wei
Guanlin Li
Tianwei Zhang
Weiming Zhang
Neng H. Yu
96
21
0
18 May 2024
Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
Tianhe Ren
Qing Jiang
Shilong Liu
Zhaoyang Zeng
Wenlong Liu
...
Hao Zhang
Feng Li
Peijun Tang
Kent Yu
Lei Zhang
ObjD
VLM
137
38
0
16 May 2024
Efficient Vision-Language Pre-training by Cluster Masking
Zihao Wei
Zixuan Pan
Andrew Owens
VLM
95
10
0
14 May 2024
FreeVA: Offline MLLM as Training-Free Video Assistant
Wenhao Wu
VLM
OffRL
87
20
0
13 May 2024
Open Challenges and Opportunities in Federated Foundation Models Towards Biomedical Healthcare
Xingyu Li
Lu Peng
Yuping Wang
Weihua Zhang
AI4CE
MedIm
LM&MA
124
12
0
10 May 2024
Selective Classification Under Distribution Shifts
Hengyue Liang
Le Peng
Ju Sun
UQCV
101
2
0
08 May 2024
AAPL: Adding Attributes to Prompt Learning for Vision-Language Models
Gahyeon Kim
Sohee Kim
Seokju Lee
VLM
95
5
0
25 Apr 2024
DENOISER: Rethinking the Robustness for Open-Vocabulary Action Recognition
Haozhe Cheng
Chen Ju
Haicheng Wang
Jinxiang Liu
Mengting Chen
Qiang Hu
Xiaoyun Zhang
Yanfeng Wang
DiffM
VLM
84
6
0
23 Apr 2024
SHE-Net: Syntax-Hierarchy-Enhanced Text-Video Retrieval
Xuzheng Yu
Chen Jiang
Xingning Dong
Tian Gan
Ming Yang
Qingpei Guo
120
2
0
22 Apr 2024
LTOS: Layout-controllable Text-Object Synthesis via Adaptive Cross-attention Fusions
Xiaoran Zhao
Tianhao Wu
Yu Lai
Zhiliang Tian
Zhen Huang
Yahui Liu
Zejiang He
Dongsheng Li
DiffM
116
1
0
21 Apr 2024