Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 668 papers shown
BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP
Jiawang Bai
Kuofeng Gao
Shaobo Min
Shu-Tao Xia
Zhifeng Li
Wei Liu
VLM
124
45
0
26 Nov 2023
Mug-STAN: Adapting Image-Language Pretrained Models for General Video Understanding
Ruyang Liu
Jingjia Huang
Wei-Nan Gao
Thomas H. Li
Ge Li
VLM
116
3
0
25 Nov 2023
3D-MIR: A Benchmark and Empirical Study on 3D Medical Image Retrieval in Radiology
Asma Ben Abacha
Alberto Santamaría-Pang
Ho Hin Lee
J. Merkow
Qin Cai
...
Julia Gong
M. Lungren
Thomas Lin
Noel C. F. Codella
Ivan Tarapov
65
5
0
23 Nov 2023
Active Prompt Learning in Vision Language Models
Jihwan Bang
Sumyeong Ahn
Jae-Gil Lee
VLM
75
14
0
18 Nov 2023
Domain Aligned CLIP for Few-shot Classification
Muhammad Waleed Gondal
Jochen Gast
Inigo Alonso Ruiz
Richard Droste
Tommaso Macri
Suren Kumar
Luitpold Staudigl
VLM
68
12
0
15 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
141
175
0
10 Nov 2023
DAMEX: Dataset-aware Mixture-of-Experts for visual understanding of mixture-of-datasets
Yash Jain
Harkirat Singh Behl
Z. Kira
Vibhav Vineet
71
15
0
08 Nov 2023
Exploring Dataset-Scale Indicators of Data Quality
Ben Feuer
Chinmay Hegde
79
1
0
07 Nov 2023
OVIR-3D: Open-Vocabulary 3D Instance Retrieval Without Training on 3D Data
Shiyang Lu
Haonan Chang
E. Jing
Abdeslam Boularias
Kostas Bekris
93
58
0
06 Nov 2023
Adapting Segment Anything Model (SAM) through Prompt-based Learning for Enhanced Protein Identification in Cryo-EM Micrographs
Fei He
Zhiyuan Yang
Mingyue Gao
Biplab Poudel
Newgin Sam Ebin Sam Dhas
Rajan Gyawali
Ashwin Dhakal
Jianlin Cheng
Dong Xu
69
4
0
04 Nov 2023
Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
Jameel Hassan
Hanan Gani
Noor Hussein
Muhammad Uzair Khattak
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
VLM OOD
141
68
0
02 Nov 2023
Multimodal Foundation Models for Zero-shot Animal Species Recognition in Camera Trap Images
Zalan Fabian
Zhongqi Miao
Chunyuan Li
Yuanhan Zhang
Ziwei Liu
...
Laura Siabatto
Andrés Link
Pablo Arbelaez
Rahul Dodhia
J. L. Ferres
111
11
0
02 Nov 2023
fMRI-PTE: A Large-scale fMRI Pretrained Transformer Encoder for Multi-Subject Brain Activity Decoding
Xuelin Qian
Yun Wang
Jingyang Huo
Jianfeng Feng
Yanwei Fu
MedIm
54
8
0
01 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
169
44
0
01 Nov 2023
MM-VID: Advancing Video Understanding with GPT-4V(ision)
Kevin Qinghong Lin
Faisal Ahmed
Linjie Li
Chung-Ching Lin
E. Azarnasab
...
Lin Liang
Zicheng Liu
Yumao Lu
Ce Liu
Lijuan Wang
MLLM
86
65
0
30 Oct 2023
Foundation Models for Generalist Geospatial Artificial Intelligence
Johannes Jakubik
Sujit Roy
C. Phillips
P. Fraccaro
Denys Godwin
...
Hamed Alemohammad
M. Maskey
R. Ganti
Kommy Weldemariam
Rahul Ramachandran
AI4CE VLM
98
105
0
28 Oct 2023
CoDet: Co-Occurrence Guided Region-Word Alignment for Open-Vocabulary Object Detection
Chuofan Ma
Yi Jiang
Xin Wen
Zehuan Yuan
Xiaojuan Qi
ObjD VLM
92
50
0
25 Oct 2023
Videoprompter: an ensemble of foundational models for zero-shot video understanding
Adeel Yousaf
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
Mubarak Shah
VLM
77
2
0
23 Oct 2023
Open-Set Image Tagging with Multi-Grained Text Supervision
Xinyu Huang
Yi-Jie Huang
Youcai Zhang
Weiwei Tian
Rui Feng
Yuejie Zhang
Yanchun Xie
Yaqian Li
Lei Zhang
VLM
87
35
0
23 Oct 2023
HairCLIPv2: Unifying Hair Editing via Proxy Feature Blending
Tianyi Wei
DongDong Chen
Wenbo Zhou
Jing Liao
Weiming Zhang
Gang Hua
Neng H. Yu
73
13
0
16 Oct 2023
Black-box Targeted Adversarial Attack on Segment Anything (SAM)
Sheng Zheng
Chaoning Zhang
Xinhong Hao
AAML
125
7
0
16 Oct 2023
VeCLIP: Improving CLIP Training via Visual-enriched Captions
Zhengfeng Lai
Haotian Zhang
Bowen Zhang
Wentao Wu
Haoping Bai
...
Zhe Gan
Jiulong Shan
Chen-Nee Chuah
Yinfei Yang
Meng Cao
CLIP VLM
119
31
0
11 Oct 2023
Heuristic Vision Pre-Training with Self-Supervised and Supervised Multi-Task Learning
Zhiming Qian
VLM SSL
62
0
0
11 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
94
27
0
11 Oct 2023
Lightweight In-Context Tuning for Multimodal Unified Models
Yixin Chen
Shuai Zhang
Boran Han
Jiaya Jia
70
2
0
08 Oct 2023
Analyzing Zero-Shot Abilities of Vision-Language Models on Video Understanding Tasks
Avinash Madasu
Anahita Bhiwandiwalla
Vasudev Lal
VLM
78
0
0
07 Oct 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang
Jiahui Yu
Ming-Hsuan Yang
Matthew A. Brown
Huayu Chen
Tuo Zhao
Boqing Gong
Tianyi Zhou
111
10
0
06 Oct 2023
Toward a Foundation Model for Time Series Data
Chin-Chia Michael Yeh
Xin Dai
Huiyuan Chen
Yan Zheng
Yujie Fan
...
Vivian Lai
Zhongfang Zhuang
Junpeng Wang
Liang Wang
Wei Zhang
AI4TS AI4CE
159
26
0
05 Oct 2023
Talking Models: Distill Pre-trained Knowledge to Downstream Models via Interactive Communication
Zhe Zhao
Qingyun Liu
Huan Gui
Bang An
Lichan Hong
Ed H. Chi
79
1
0
04 Oct 2023
Reformulating Vision-Language Foundation Models and Datasets Towards Universal Multimodal Assistants
Tianyu Yu
Jinyi Hu
Yuan Yao
Haoye Zhang
Yue Zhao
...
Jiao Xue
Dahai Li
Zhiyuan Liu
Hai-Tao Zheng
Maosong Sun
VLM MLLM
50
20
0
01 Oct 2023
InstructCV: Instruction-Tuned Text-to-Image Diffusion Models as Vision Generalists
Yulu Gan
Sungwoo Park
Alexander Schubert
Anthony Philippakis
Ahmed Alaa
VLM
116
25
0
30 Sep 2023
Domain-Controlled Prompt Learning
Qinglong Cao
Zhengqin Xu
Yuantian Chen
Chao Ma
Xiaokang Yang
VLM
100
18
0
30 Sep 2023
Training a Large Video Model on a Single Machine in a Day
Yue Zhao
Philipp Krahenbuhl
VLM
106
17
0
28 Sep 2023
FLIP: Cross-domain Face Anti-spoofing with Language Guidance
K. Srivatsan
Muzammal Naseer
Karthik Nandakumar
CVBM
113
47
0
28 Sep 2023
End-to-End (Instance)-Image Goal Navigation through Correspondence as an Emergent Phenomenon
G. Bono
L. Antsfeld
Boris Chidlovskii
Zhi Zheng
Christian Wolf
3DV
72
10
0
28 Sep 2023
AutoCLIP: Auto-tuning Zero-Shot Classifiers for Vision-Language Models
Sanghwan Kim
Hao Tang
Fisher Yu
VLM CLIP
73
5
0
28 Sep 2023
BT-Adapter: Video Conversation is Feasible Without Video Instruction Tuning
Ruyang Liu
Chen Li
Yixiao Ge
Ying Shan
Thomas H. Li
Ge Li
116
30
0
27 Sep 2023
Object-Centric Open-Vocabulary Image-Retrieval with Aggregated Features
Hila Levi
Guy Heller
Dan Levi
Ethan Fetaya
OCL VLM
74
4
0
26 Sep 2023
DeepSpeed-VisualChat: Multi-Round Multi-Image Interleave Chat via Multi-Modal Causal Attention
Z. Yao
Xiaoxia Wu
Conglong Li
Minjia Zhang
Heyang Qi
Olatunji Ruwase
A. A. Awan
Samyam Rajbhandari
Yuxiong He
102
11
0
25 Sep 2023
MUTEX: Learning Unified Policies from Multimodal Task Specifications
Rutav Shah
Roberto Martín-Martín
Yuke Zhu
OffRL
103
58
0
25 Sep 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
102
28
0
25 Sep 2023
Multimodal Deep Learning for Scientific Imaging Interpretation
Abdulelah S. Alshehri
Franklin L. Lee
Shihu Wang
47
2
0
21 Sep 2023
TinyCLIP: CLIP Distillation via Affinity Mimicking and Weight Inheritance
Kan Wu
Houwen Peng
Zhenghong Zhou
Bin Xiao
Mengchen Liu
...
Xi Chen
Xinggang Wang
Hongyang Chao
Han Hu
VLM OODD
95
64
0
21 Sep 2023
Dataset Factory: A Toolchain For Generative Computer Vision Datasets
Daniel Kharitonov
Ryan Turner
106
1
0
20 Sep 2023
Integrating Visual Foundation Models for Enhanced Robot Manipulation and Motion Planning: A Layered Approach
Chenguang Yang
Peng Zhou
Jiaming Qi
34
9
0
20 Sep 2023
Efficient Pyramid Channel Attention Network for Pathological Myopia Recognition
Xiaoqing Zhang
Jilu Zhao
Yan Li
Hao Wu
Xiangtian Zhou
Jiang Liu
60
1
0
17 Sep 2023
MA-SAM: Modality-agnostic SAM Adaptation for 3D Medical Image Segmentation
Cheng Chen
Juzheng Miao
Dufan Wu
Zhiling Yan
Sekeun Kim
...
Lichao Sun
Xiang Li
Tianming Liu
Pheng-Ann Heng
Quanzheng Li
MedIm
129
63
0
16 Sep 2023
Disentangling Spatial and Temporal Learning for Efficient Image-to-Video Transfer Learning
Zhiwu Qing
Shiwei Zhang
Ziyuan Huang
Yingya Zhang
Changxin Gao
Deli Zhao
Nong Sang
86
20
0
14 Sep 2023
Zero-Shot Visual Classification with Guided Cropping
Piyapat Saranrittichai
Mauricio Muñoz
Volker Fischer
Chaithanya Kumar Mummadi
VLM
67
1
0
12 Sep 2023
Progressive Feature Adjustment for Semi-supervised Learning from Pretrained Models
Hai-Ming Xu
Lingqiao Liu
Hao Chen
Ehsan Abbasnejad
Rafael Felix
76
0
0
09 Sep 2023