ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXiv (abs)PDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 668 papers shown
Title
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
Fine-Tuning Games: Bargaining and Adaptation for General-Purpose Models
Benjamin Laufer
Jon M. Kleinberg
Hoda Heidari
161
11
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
134
26
0
31 Dec 2024
Improving Generated and Retrieved Knowledge Combination Through
  Zero-shot Generation
Improving Generated and Retrieved Knowledge Combination Through Zero-shot Generation
Xinkai Du
Quanjie Han
Chao Lv
Yi Liu
Yalin Sun
Hao Shu
Hongbo Shan
Maosong Sun
RALM
152
2
0
25 Dec 2024
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Towards Graph Foundation Models: Learning Generalities Across Graphs via Task-Trees
Zehong Wang
Zheyuan Zhang
Tianyi Ma
Nitesh Chawla
Chuxu Zhang
Yanfang Ye
AI4CE
152
0
0
21 Dec 2024
Do Language Models Understand Time?
Do Language Models Understand Time?
Xi Ding
Lei Wang
358
2
0
18 Dec 2024
Bringing Multimodality to Amazon Visual Search System
Bringing Multimodality to Amazon Visual Search System
Xinliang Zhu
Michael Huang
Han Ding
Jinyu Yang
Kelvin Chen
...
Son Dinh Tran
Benjamin Z. Yao
Doug Gray
Anuj Bindal
Arnab Dhua
125
3
0
17 Dec 2024
Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality
Beyond Accuracy: On the Effects of Fine-tuning Towards Vision-Language Model's Prediction Rationality
Qitong Wang
Tang Li
Kien X. Nguyen
Xi Peng
187
0
0
17 Dec 2024
Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models
  with Prompt Ensembling
Retaining and Enhancing Pre-trained Knowledge in Vision-Language Models with Prompt Ensembling
Donggeun Kim
Yujin Jo
Myungjoo Lee
Taesup Kim
VLM
110
1
0
10 Dec 2024
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
CLIP-PING: Boosting Lightweight Vision-Language Models with Proximus Intrinsic Neighbors Guidance
Chu Myaet Thwal
Ye Lin Tun
Minh N. H. Nguyen
Eui-nam Huh
Choong Seon Hong
VLM
174
0
0
05 Dec 2024
Exploring Large Vision-Language Models for Robust and Efficient
  Industrial Anomaly Detection
Exploring Large Vision-Language Models for Robust and Efficient Industrial Anomaly Detection
Kun Qian
Tianyu Sun
Wenhong Wang
115
0
0
01 Dec 2024
EDTformer: An Efficient Decoder Transformer for Visual Place Recognition
EDTformer: An Efficient Decoder Transformer for Visual Place Recognition
Tong Jin
Feng Lu
Shuyu Hu
Chun Yuan
Yunpeng Liu
ViT
177
0
0
01 Dec 2024
Advancing Myopia To Holism: Fully Contrastive Language-Image
  Pre-training
Advancing Myopia To Holism: Fully Contrastive Language-Image Pre-training
Haicheng Wang
Chen Ju
Weixiong Lin
Shuai Xiao
Mengting Chen
...
Mingshuai Yao
Jinsong Lan
Ying Chen
Qingwen Liu
Yanfeng Wang
VLMCLIP
133
4
0
30 Nov 2024
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
CLIP meets DINO for Tuning Zero-Shot Classifier using Unlabeled Image Collections
Mohamed Fazli Mohamed Imam
Rufael Fedaku Marew
Jameel Hassan
Mustansar Fiaz
Alham Fikri Aji
Hisham Cholakkal
VLM
548
1
0
28 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language
  Inference
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
120
1
0
24 Nov 2024
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Probe-Me-Not: Protecting Pre-trained Encoders from Malicious Probing
Ruyi Ding
Tong Zhou
Lili Su
A. A. Ding
Xiaolin Xu
Yunsi Fei
AAML
163
2
0
19 Nov 2024
Efficient Transfer Learning for Video-language Foundation Models
Haoxing Chen
Zizheng Huang
Y. Hong
Yanshuo Wang
Zhongcai Lyu
Zhuoer Xu
Jun Lan
Zhangxuan Gu
VLM
123
0
0
18 Nov 2024
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
CorrCLIP: Reconstructing Patch Correlations in CLIP for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
172
2
0
15 Nov 2024
GFT: Graph Foundation Model with Transferable Tree Vocabulary
GFT: Graph Foundation Model with Transferable Tree Vocabulary
Zehong Wang
Zheyuan Zhang
Nitesh Chawla
Chuxu Zhang
Yanfang Ye
105
20
0
09 Nov 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream
  Generalization of CLIP
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
83
1
0
31 Oct 2024
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
EchoFM: Foundation Model for Generalizable Echocardiogram Analysis
Sekeun Kim
Pengfei Jin
S. Song
Cheng Chen
Yiwei Li
Hui Ren
Xiang Li
Tianming Liu
Quanzheng Li
116
0
0
30 Oct 2024
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP
Tianyu Yang
Lisen Dai
Zheyuan Liu
Minhao Cheng
Meng Jiang
Yapeng Tian
VLMMU
110
5
0
30 Oct 2024
Towards Unifying Understanding and Generation in the Era of Vision
  Foundation Models: A Survey from the Autoregression Perspective
Towards Unifying Understanding and Generation in the Era of Vision Foundation Models: A Survey from the Autoregression Perspective
Shenghao Xie
Wenqiang Zu
Mingyang Zhao
Duo Su
Shilong Liu
Ruohua Shi
Guoqi Li
Shanghang Zhang
Lei Ma
LRM
167
3
0
29 Oct 2024
Preserving Pre-trained Representation Space: On Effectiveness of
  Prefix-tuning for Large Multi-modal Models
Preserving Pre-trained Representation Space: On Effectiveness of Prefix-tuning for Large Multi-modal Models
Donghoon Kim
Gusang Lee
Kyuhong Shim
B. Shim
104
1
0
29 Oct 2024
Random Policy Enables In-Context Reinforcement Learning within Trust Horizons
Random Policy Enables In-Context Reinforcement Learning within Trust Horizons
Weiqin Chen
Santiago Paternain
OffRL
78
0
0
25 Oct 2024
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language
  Tuning
ADEM-VL: Adaptive and Embedded Fusion for Efficient Vision-Language Tuning
Zhiwei Hao
Jianyuan Guo
Li Shen
Yong Luo
Han Hu
Yonggang Wen
VLM
102
0
0
23 Oct 2024
TIPS: Text-Image Pretraining with Spatial awareness
TIPS: Text-Image Pretraining with Spatial awareness
Kevis-Kokitsi Maninis
Kaifeng Chen
Soham Ghosh
Arjun Karpur
Koert Chen
...
Jan Dlabal
Dan Gnanapragasam
Mojtaba Seyedhosseini
Howard Zhou
Andre Araujo
VLM
145
3
0
21 Oct 2024
A Survey on Data Synthesis and Augmentation for Large Language Models
A Survey on Data Synthesis and Augmentation for Large Language Models
Ke Wang
Jiahui Zhu
Minjie Ren
Ziqiang Liu
Shiwei Li
...
Yiming Lei
Xiaoyu Wu
Qiqi Zhan
Qingjie Liu
Yunhong Wang
SyDa
186
21
0
16 Oct 2024
FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with
  Image Insertion
FTII-Bench: A Comprehensive Multimodal Benchmark for Flow Text with Image Insertion
Jiacheng Ruan
Yebin Yang
Zehao Lin
Feiyu Xiong
Zeyun Tang
Zhiyu Li
Zhiyu Li
VLM
87
3
0
16 Oct 2024
MoTE: Reconciling Generalization with Specialization for Visual-Language
  to Video Knowledge Transfer
MoTE: Reconciling Generalization with Specialization for Visual-Language to Video Knowledge Transfer
Minghao Zhu
Zhengpu Wang
Mengxian Hu
Ronghao Dang
Xiao Lin
Xun Zhou
Chengju Liu
Qijun Chen
81
1
0
14 Oct 2024
Continual Learning Improves Zero-Shot Action Recognition
Continual Learning Improves Zero-Shot Action Recognition
Shreyank N. Gowda
Davide Moltisanti
Laura Sevilla-Lara
BDLVLMCLL
140
1
0
14 Oct 2024
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
Calibrated Cache Model for Few-Shot Vision-Language Model Adaptation
Kun Ding
Qiang Yu
Haojian Zhang
Gaofeng Meng
Shiming Xiang
VLM
69
0
0
11 Oct 2024
MedImageInsight: An Open-Source Embedding Model for General Domain
  Medical Imaging
MedImageInsight: An Open-Source Embedding Model for General Domain Medical Imaging
Noel C. F. Codella
Ying Jin
Shrey Jain
Yu Gu
Ho Hin Lee
...
Lei Li
Thomas Lin
Ivan Tarapov
M. Lungren
Mu-Hsin Wei
LM&MAVLMMedIm
93
9
0
09 Oct 2024
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
CASA: Class-Agnostic Shared Attributes in Vision-Language Models for Efficient Incremental Object Detection
Mingyi Guo
Yuyang Liu
Zongying Lin
Peixi Peng
Yonghong Tian
Yonghong Tian
VLM
146
0
0
08 Oct 2024
Uncertainty-Guided Enhancement on Driving Perception System via
  Foundation Models
Uncertainty-Guided Enhancement on Driving Perception System via Foundation Models
Yunhao Yang
Yuxin Hu
Mao Ye
Zaiwei Zhang
Zhichao Lu
Yi Xu
Ufuk Topcu
Ben Snyder
85
2
0
02 Oct 2024
Rethinking Misalignment in Vision-Language Model Adaptation from a
  Causal Perspective
Rethinking Misalignment in Vision-Language Model Adaptation from a Causal Perspective
Yanan Zhang
Jiangmeng Li
Lixiang Liu
Jingyao Wang
VLM
137
4
0
01 Oct 2024
FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image
  Classification
FAST: A Dual-tier Few-Shot Learning Paradigm for Whole Slide Image Classification
Kexue Fu
Xiaoyuan Luo
Linhao Qu
Shuo Wang
Ying Xiong
Ilias Maglogiannis
Longxiang Gao
Manning Wang
67
2
0
29 Sep 2024
How Effective is Pre-training of Large Masked Autoencoders for
  Downstream Earth Observation Tasks?
How Effective is Pre-training of Large Masked Autoencoders for Downstream Earth Observation Tasks?
Jose Sosa
Mohamed Aloulou
Danila Rukhovich
Rim Sleimi
Boonyarit Changaival
Anis Kacem
Djamila Aouada
80
2
0
27 Sep 2024
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Surveying the MLLM Landscape: A Meta-Review of Current Surveys
Ming Li
Keyu Chen
Ziqian Bi
Ming Liu
Benji Peng
...
Jinlang Wang
Sen Zhang
X. Pan
Jiawei Xu
Pohsun Feng
OffRL
128
2
0
17 Sep 2024
VidLPRO: A $\underline{Vid}$eo-$\underline{L}$anguage
  $\underline{P}$re-training Framework for $\underline{Ro}$botic and
  Laparoscopic Surgery
VidLPRO: A Vid‾\underline{Vid}Vid​eo-L‾\underline{L}L​anguage P‾\underline{P}P​re-training Framework for Ro‾\underline{Ro}Ro​botic and Laparoscopic Surgery
Mohammadmahdi Honarmand
Muhammad Abdullah Jamal
Omid Mohareri
151
2
0
07 Sep 2024
Geospatial foundation models for image analysis: evaluating and
  enhancing NASA-IBM Prithvi's domain adaptability
Geospatial foundation models for image analysis: evaluating and enhancing NASA-IBM Prithvi's domain adaptability
Chia-Yu Hsu
Wenwen Li
Sizhe Wang
93
14
0
31 Aug 2024
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Spatio-Temporal Context Prompting for Zero-Shot Action Detection
Wei-Jhe Huang
Min-Hung Chen
Shang-Hong Lai
75
0
0
28 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DVVLM
90
1
0
28 Aug 2024
HPT++: Hierarchically Prompting Vision-Language Models with
  Multi-Granularity Knowledge Generation and Improved Structure Modeling
HPT++: Hierarchically Prompting Vision-Language Models with Multi-Granularity Knowledge Generation and Improved Structure Modeling
Yubin Wang
Xinyang Jiang
De Cheng
Wenli Sun
Dongsheng Li
Cairong Zhao
VLM
108
0
0
27 Aug 2024
Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision
  Models for Geophysical Data Analysis
Cross-Domain Foundation Model Adaptation: Pioneering Computer Vision Models for Geophysical Data Analysis
Zhixiang Guo
Xinming Wu
Luming Liang
Hanlin Sheng
Nuo Chen
Zhengfa Bi
AI4CE
105
4
0
22 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
L. Wang
Jianwu Dang
J. Tao
AI4TS
105
0
0
11 Aug 2024
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic
  Segmentation
In Defense of Lazy Visual Grounding for Open-Vocabulary Semantic Segmentation
Dahyun Kang
Minsu Cho
ObjDVLM
140
11
0
09 Aug 2024
Actra: Optimized Transformer Architecture for Vision-Language-Action
  Models in Robot Learning
Actra: Optimized Transformer Architecture for Vision-Language-Action Models in Robot Learning
Yueen Ma
Dafeng Chi
Shiguang Wu
Yuecheng Liu
Yuzheng Zhuang
Jianye Hao
Irwin King
71
5
0
02 Aug 2024
Text-Guided Video Masked Autoencoder
Text-Guided Video Masked Autoencoder
D. Fan
Jue Wang
Shuai Liao
Zhikang Zhang
Vimal Bhat
Xinyu Li
VGen
62
3
0
01 Aug 2024
Look Hear: Gaze Prediction for Speech-directed Human Attention
Look Hear: Gaze Prediction for Speech-directed Human Attention
Sounak Mondal
Seoyoung Ahn
Zhibo Yang
Niranjan Balasubramanian
Dimitris Samaras
G. Zelinsky
Minh Hoai
117
2
0
28 Jul 2024
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
PartGLEE: A Foundation Model for Recognizing and Parsing Any Objects
Junyi Li
Junfeng Wu
Weizhi Zhao
Song Bai
Xiang Bai
101
3
0
23 Jul 2024
Previous
12345...121314
Next