2111.11432
Florence: A New Foundation Model for Computer Vision
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Papers citing "Florence: A New Foundation Model for Computer Vision"
50 / 664 papers shown
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
81
0
0
24 Nov 2024
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Ruyi Ding
Tong Zhou
Lili Su
A. A. Ding
Xiaolin Xu
Yunsi Fei
AAML
69
1
0
19 Nov 2024
Transmission Line Defect Detection Based on UAV Patrol Images and Vision-language Pretraining
Ke Zhang
Zhaoye Zheng
Yurong Guo
Jiacun Wang
Jiyuan Yang
Yangjie Xiao
VLM
79
0
0
18 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
54
0
0
18 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
55
1
0
15 Nov 2024
GFT: Graph Foundation Model with Transferable Tree Vocabulary
Zehong Wang
Zheyuan Zhang
Nitesh V. Chawla
Chuxu Zhang
Yanfang Ye
49
10
0
09 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
51
0
0
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Sekeun Kim
Pengfei Jin
S. Song
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
39
0
0
30 Oct 2024
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
Tianyu Yang
Lisen Dai
Zheyuan Liu
Xiangqi Wang
Meng Jiang
Yapeng Tian
Xiangliang Zhang
VLM
MU
37
4
0
30 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
49
3
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
62
1
0
29 Oct 2024
Random Policy Enables In-Context Reinforcement Learning within Trust Horizons
Weiqin Chen
Santiago Paternain
OffRL
42
0
0
25 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
26
0
0
23 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
35
3
0
21 Oct 2024
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Ziqiang Liu
Shiwei Li
...
Yiming Lei
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
42
18
0
16 Oct 2024
FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
Jiacheng Ruan
Yebin Yang
Zehao Lin
Zhiyu Li
Zeyun Tang
VLM
42
3
0
16 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
45
1
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDL
VLM
CLL
35
1
0
14 Oct 2024
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
Kun Ding
Qiang Yu
Haojian Zhang
Gaofeng Meng
Shiming Xiang
VLM
32
0
0
11 Oct 2024
MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging
Noel C. F. Codella
Ying Jin
Shrey Jain
Yu Gu
Ho Hin Lee
...
Lei Li
Thomas Lin
Ivan Tarapov
M. Lungren
Mu-Hsin Wei
LM&MA
VLM
MedIm
48
8
0
09 Oct 2024
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
Mingyi Guo
Yuyang Liu
Zongying Lin
Peixi Peng
Yonghong Tian
VLM
35
0
0
08 Oct 2024
Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models
Yunhao Yang
Yuxin Hu
Mao Ye
Zaiwei Zhang
Zhichao Lu
Yi Xu
Ufuk Topcu
Ben Snyder
26
2
0
02 Oct 2024
Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
Yanan Zhang
Jiangmeng Li
Lixiang Liu
Wenwen Qiang
VLM
29
1
0
01 Oct 2024
FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification
Kexue Fu
Xiaoyuan Luo
Linhao Qu
Shuo Wang
Ying Xiong
Ilias Maglogiannis
Longxiang Gao
Manning Wang
31
1
0
29 Sep 2024
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?
Jose Sosa
Mohamed Aloulou
Danila Rukhovich
Rim Sleimi
Boonyarit Changaival
Anis Kacem
Djamila Aouada
40
0
0
27 Sep 2024
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Ming Li
Keyu Chen
Ziqian Bi
Ming Liu
Benji Peng
...
Jinlang Wang
Sen Zhang
X. Pan
Jiawei Xu
Pohsun Feng
OffRL
54
2
0
17 Sep 2024
VidLPRO: A Video-Language Pre-training Framework for Robotic and Laparoscopic Surgery
Mohammadmahdi Honarmand
Muhammad Abdullah Jamal
Omid Mohareri
63
1
0
07 Sep 2024
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
Chia-Yu Hsu
Wenwen Li
Sizhe Wang
42
12
0
31 Aug 2024
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Wei-Jhe Huang
Min-Hung Chen
Shang-Hong Lai
40
0
0
28 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV
VLM
25
0
0
28 Aug 2024
HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling
Yubin Wang
Xinyang Jiang
De Cheng
Wenli Sun
Dongsheng Li
Cairong Zhao
VLM
48
0
0
27 Aug 2024
Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis
Zhixiang Guo
Xinming Wu
Luming Liang
Hanlin Sheng
Nuo Chen
Zhengfa Bi
AI4CE
57
1
0
22 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
41
0
0
11 Aug 2024
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang
Minsu Cho
ObjD
VLM
40
9
0
09 Aug 2024
Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Jianye Hao
Irwin King
44
5
0
02 Aug 2024
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
36
3
0
01 Aug 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
47
1
0
28 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
41
1
0
23 Jul 2024
Assessing Brittleness of Image-Text Retrieval Benchmarks from Vision-Language Models Perspective
Mariya Hendriksen
Shuo Zhang
R. Reinanda
Mohamed Yahya
Edgar Meij
Maarten de Rijke
56
0
0
21 Jul 2024
Distilling Vision-Language Foundation Models: A Data-Free Approach via Prompt Diversification
Yunyi Xuan
Weijie Chen
Shicai Yang
Di Xie
Luojun Lin
Yueting Zhuang
VLM
40
4
0
21 Jul 2024
Open-World Visual Reasoning by a Neuro-Symbolic Program of Zero-Shot Symbols
Gertjan J. Burghouts
Fieke Hillerstrom
Erwin Walraven
M. V. Bekkum
Frank Ruis
J. Sijs
Jelle van Mil
Judith Dijk
NAI
32
1
0
18 Jul 2024
CoAPT: Context Attribute words for Prompt Tuning
Gun Lee
Subin An
Sungyong Baik
Soochahn Lee
VPVLM
VLM
35
1
0
18 Jul 2024
ClearCLIP: Decomposing CLIP Representations for Dense Vision-Language Inference
Mengcheng Lan
Chaofeng Chen
Yiping Ke
Xinjiang Wang
Xue Jiang
Wayne Zhang
VLM
50
24
0
17 Jul 2024
Unconstrained Open Vocabulary Image Classification: Zero-Shot Transfer from Text to Image via CLIP Inversion
Philipp Allgeuer
Kyra Ahrens
Stefan Wermter
CLIP
VLM
35
3
0
15 Jul 2024
Quantized Prompt for Efficient Generalization of Vision-Language Models
Tianxiang Hao
Xiaohan Ding
Juexiao Feng
Yuhong Yang
Hui Chen
Guiguang Ding
VLM
MQ
32
5
0
15 Jul 2024
Data Adaptive Traceback for Vision-Language Foundation Models in Image Classification
Wenshuo Peng
Kaipeng Zhang
Yue Yang
Hao Zhang
Ping Luo
VLM
34
2
0
11 Jul 2024
NODE-Adapter: Neural Ordinary Differential Equations for Better Vision-Language Reasoning
Yi Zhang
Chun-Wun Cheng
Ke Yu
Zhihai He
Carola-Bibiane Schonlieb
Angelica I Aviles-Rivero
VLM
55
2
0
11 Jul 2024
Enhancing Robustness of Vision-Language Models through Orthogonality Learning and Cross-Regularization
Jinlong Li
Zequn Jie
Elisa Ricci
Lin Ma
N. Sebe
VLM
39
0
0
11 Jul 2024
Foundation Model Engineering: Engineering Foundation Models Just as Engineering Software
Dezhi Ran
Mengzhou Wu
Wei Yang
Tao Xie
AI4CE
39
1
0
11 Jul 2024
iiANET: Inception Inspired Attention Hybrid Network for efficient Long-Range Dependency
Haruna Yunusa
Qin Shiyin
Abdulrahman Hamman Adama Chukkol
Isah Bello
A. Lawan
46
4
0
10 Jul 2024