Florence: A New Foundation Model for Computer Vision
arXiv: 2111.11432
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Papers citing "Florence: A New Foundation Model for Computer Vision"
50 / 668 papers shown
Parameter Efficient Fine Tuning: A Comprehensive Analysis Across Applications
Charith Chandra Sai Balne
S. Bhaduri
Tamoghna Roy
Vinija Jain
Aman Chadha
108
20
0
21 Apr 2024
Progressive Multi-modal Conditional Prompt Tuning
Xiaoyu Qiu
Hao Feng
Yuechen Wang
Wen-gang Zhou
Houqiang Li
VLM
95
3
0
18 Apr 2024
Pretraining Billion-scale Geospatial Foundational Models on Frontier
A. Tsaris
P. Dias
Abhishek Potnis
Junqi Yin
Feiyi Wang
D. Lunga
AI4CE
47
5
0
17 Apr 2024
A Progressive Framework of Vision-language Knowledge Distillation and Alignment for Multilingual Scene
Wenbo Zhang
Yifan Zhang
Jianfeng Lin
Binqiang Huang
Jinlu Zhang
Wenhao Yu
VLM
105
2
0
17 Apr 2024
Optimization of Prompt Learning via Multi-Knowledge Representation for Vision-Language Models
Enming Zhang
Bingke Zhu
Yingying Chen
Qinghai Miao
Ming Tang
Jinqiao Wang
VLM
69
0
0
16 Apr 2024
Evolving Interpretable Visual Classifiers with Large Language Models
Mia Chiquier
Utkarsh Mall
Carl Vondrick
VLM
103
11
0
15 Apr 2024
Leveraging Temporal Contextualization for Video Action Recognition
Minji Kim
Dongyoon Han
Taekyung Kim
Bohyung Han
98
2
0
15 Apr 2024
LLM-Seg: Bridging Image Segmentation and Large Language Model Reasoning
Junchi Wang
Lei Ke
MLLM
LRM
VLM
85
29
0
12 Apr 2024
On the Robustness of Language Guidance for Low-Level Vision Tasks: Findings from Depth Estimation
Agneet Chatterjee
Tejas Gokhale
Chitta Baral
Yezhou Yang
VLM
65
2
0
12 Apr 2024
Generalized Contrastive Learning for Multi-Modal Retrieval and Ranking
Tianyu Zhu
M. Jung
Jesse Clark
143
1
0
12 Apr 2024
PromptSync: Bridging Domain Gaps in Vision-Language Models through Class-Aware Prototype Alignment and Discrimination
Anant Khandelwal
VLM
77
1
0
11 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM
VLM
80
32
0
10 Apr 2024
On the Surprising Efficacy of Distillation as an Alternative to Pre-Training Small Models
Sean Farhat
Deming Chen
120
0
0
04 Apr 2024
SalFoM: Dynamic Saliency Prediction with Video Foundation Models
Morteza Moradi
Mohammad Moradi
Francesco Rundo
C. Spampinato
Ali Borji
S. Palazzo
84
1
0
03 Apr 2024
Cross-Modal Conditioned Reconstruction for Language-guided Medical Image Segmentation
Xiaoshuang Huang
Hongxiang Li
Meng Cao
Long Chen
Chenyu You
Dong An
VLM
99
5
0
03 Apr 2024
Foundation Models for Structural Health Monitoring
Luca Benfenati
Daniele Jahier Pagliari
Luca Zanatta
Yhorman Alexander Bedoya Velez
Andrea Acquaviva
Massimo Poncino
Enrico Macii
Luca Benini
Luca Bompani
AI4CE
84
2
0
03 Apr 2024
Segment Any 3D Object with Language
Seungjun Lee
Yuyang Zhao
Gim Hee Lee
84
1
0
02 Apr 2024
Transformer based Pluralistic Image Completion with Reduced Information Loss
Qiankun Liu
Yuqi Jiang
Zhentao Tan
DongDong Chen
Ying Fu
Qi Chu
Gang Hua
Nenghai Yu
ViT
114
12
0
31 Mar 2024
LORD: Large Models based Opposite Reward Design for Autonomous Driving
Xin Ye
Feng Tao
Abhirup Mallik
Burhaneddin Yaman
Liu Ren
OffRL
119
5
0
27 Mar 2024
Open-Set Recognition in the Age of Vision-Language Models
Dimity Miller
Niko Sünderhauf
Alex Kenna
Keita Mason
VLM
68
6
0
25 Mar 2024
Enhancing Visual Continual Learning with Language-Guided Supervision
Bolin Ni
Hongbo Zhao
Chenghao Zhang
Ke Hu
Gaofeng Meng
Zhaoxiang Zhang
Shiming Xiang
CLL
VLM
135
4
0
24 Mar 2024
SDSTrack: Self-Distillation Symmetric Adapter Learning for Multi-Modal Visual Object Tracking
Xiaojun Hou
Jiazheng Xing
Yijie Qian
Yaowei Guo
Shuo Xin
...
Kai Tang
Mengmeng Wang
Zhengkai Jiang
Liang Liu
Yong-Jin Liu
105
29
0
24 Mar 2024
Few-Shot Adversarial Prompt Learning on Vision-Language Models
Yiwei Zhou
Xiaobo Xia
Zhiwei Lin
Bo Han
Tongliang Liu
VLM
106
16
0
21 Mar 2024
MTP: Advancing Remote Sensing Foundation Model via Multi-Task Pretraining
Di Wang
Jing Zhang
Minqiang Xu
Lin Liu
Dongsheng Wang
...
Chengxi Han
Haonan Guo
Bo Du
Dacheng Tao
Lefei Zhang
83
53
0
20 Mar 2024
FaceXFormer: A Unified Transformer for Facial Analysis
Kartik Narayan
VS Vibashan
Rama Chellappa
Vishal M. Patel
ViT
143
13
0
19 Mar 2024
Compositional Kronecker Context Optimization for Vision-Language Models
Kun Ding
Xiaohui Li
Qiang Yu
Ying Wang
Haojian Zhang
Shiming Xiang
VLM
79
0
0
18 Mar 2024
Generative Region-Language Pretraining for Open-Ended Object Detection
Chuang Lin
Yi Jiang
Zhuang Li
Zehuan Yuan
Jianfei Cai
ObjD
VLM
86
20
0
15 Mar 2024
UniCode: Learning a Unified Codebook for Multimodal Large Language Models
Sipeng Zheng
Bohan Zhou
Yicheng Feng
Ye Wang
Zongqing Lu
VLM
MLLM
86
9
0
14 Mar 2024
CLIP-BEVFormer: Enhancing Multi-View Image-Based BEV Detector with Ground Truth Flow
Chenbin Pan
Burhaneddin Yaman
Senem Velipasalar
Liu Ren
101
11
0
13 Mar 2024
MoPE-CLIP: Structured Pruning for Efficient Vision-Language Models with Module-wise Pruning Error Metric
Haokun Lin
Haoli Bai
Zhili Liu
Lu Hou
Muyi Sun
Linqi Song
Ying Wei
Zhenan Sun
CLIP
VLM
96
17
0
12 Mar 2024
Improving deep learning with prior knowledge and cognitive models: A survey on enhancing explainability, adversarial robustness and zero-shot learning
F. Mumuni
A. Mumuni
AAML
105
7
0
11 Mar 2024
RESTORE: Towards Feature Shift for Vision-Language Prompt Learning
Yuncheng Yang
Chuyan Zhang
Zuopeng Yang
Yuting Gao
Yulei Qin
Ke Li
Xing Sun
Jie Yang
Yun Gu
VLM
VPVLM
123
0
0
10 Mar 2024
In-context Prompt Learning for Test-time Vision Recognition with Frozen Vision-language Model
Junhui Yin
Xinyu Zhang
Lin Wu
Xianghua Xie
Xiaojie Wang
VPVLM
VLM
MLLM
65
2
0
10 Mar 2024
CLIP the Bias: How Useful is Balancing Data in Multimodal Learning?
Ibrahim Alabdulmohsin
Xiao Wang
Andreas Steiner
Priya Goyal
Alexander D'Amour
Xiao-Qi Zhai
84
21
0
07 Mar 2024
Non-autoregressive Sequence-to-Sequence Vision-Language Models
Kunyu Shi
Qi Dong
Luis Goncalves
Zhuowen Tu
Stefano Soatto
VLM
151
3
0
04 Mar 2024
Rethinking CLIP-based Video Learners in Cross-Domain Open-Vocabulary Action Recognition
Kun-Yu Lin
Henghui Ding
Jiaming Zhou
Yu-Ming Tang
Yi-Xing Peng
Zhilin Zhao
Chen Change Loy
Wei-Shi Zheng
VLM
125
18
0
03 Mar 2024
Multi-modal Attribute Prompting for Vision-Language Models
Xin Liu
Jiamin Wu
Wenfei Yang
Xu Zhou
Tianzhu Zhang
VLM
89
12
0
01 Mar 2024
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers
Tsai-Shien Chen
Aliaksandr Siarohin
Willi Menapace
Ekaterina Deyneka
Hsiang-wei Chao
...
Yuwei Fang
Hsin-Ying Lee
Jian Ren
Ming-Hsuan Yang
Sergey Tulyakov
VGen
176
211
0
29 Feb 2024
CricaVPR: Cross-image Correlation-aware Representation Learning for Visual Place Recognition
Feng Lu
Xiangyuan Lan
Lijun Zhang
Dongmei Jiang
Yaowei Wang
Chun Yuan
101
37
0
29 Feb 2024
Percept, Chat, and then Adapt: Multimodal Knowledge Transfer of Foundation Models for Open-World Video Recognition
Boyu Chen
Siran Chen
Kunchang Li
Qinglin Xu
Yu Qiao
Yali Wang
72
5
0
29 Feb 2024
Demonstrating and Reducing Shortcuts in Vision-Language Representation Learning
Maurits J. R. Bleeker
Mariya Hendriksen
Andrew Yates
Maarten de Rijke
VLM
100
2
0
27 Feb 2024
StochCA: A Novel Approach for Exploiting Pretrained Models with Cross-Attention
SeungWon Seo
Suho Lee
Sangheum Hwang
87
0
0
25 Feb 2024
Towards Seamless Adaptation of Pre-trained Models for Visual Place Recognition
Feng Lu
Lijun Zhang
Xiangyuan Lan
Shuting Dong
Yaowei Wang
Chun Yuan
116
34
0
22 Feb 2024
The Effect of Batch Size on Contrastive Self-Supervised Speech Representation Learning
Nik Vaessen
David A. van Leeuwen
96
3
0
21 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
136
36
0
20 Feb 2024
Advancing Human Action Recognition with Foundation Models trained on Unlabeled Public Videos
Yang Qian
Yinan Sun
A. Kargarandehkordi
Parnian Azizian
O. Mutlu
Saimourya Surabhi
Pingyi Chen
Zain Jabbar
Dennis Paul Wall
Peter Washington
OffRL
78
1
0
14 Feb 2024
Towards a Foundation Model for Brain Age Prediction using coVariance Neural Networks
Saurabh Sihag
Gonzalo Mateos
Alejandro Ribeiro
74
6
0
12 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
118
54
0
08 Feb 2024
LLMs Meet VLMs: Boost Open Vocabulary Object Detection with Fine-grained Descriptors
Sheng Jin
Xue-Qiu Jiang
Jiaxing Huang
Lewei Lu
Shijian Lu
VLM
ObjD
98
26
0
07 Feb 2024
The Essential Role of Causality in Foundation World Models for Embodied AI
Tarun Gupta
Wenbo Gong
Chao Ma
Nick Pawlowski
Agrin Hilmkil
...
Jianfeng Gao
Stefan Bauer
Danica Kragic
Bernhard Schölkopf
Cheng Zhang
92
17
0
06 Feb 2024