ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXivPDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 664 papers shown
Title
Steps towards prompt-based creation of virtual worlds
Steps towards prompt-based creation of virtual worlds
Jasmine Roberts
Andrzej Banburski-Fahey
J. Lanier
14
13
0
10 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with
  Deformable Convolutions
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
47
660
0
10 Nov 2022
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for
  Understanding and Generation
ERNIE-UniX2: A Unified Cross-lingual Cross-modal Framework for Understanding and Generation
Bin Shan
Yaqian Han
Weichong Yin
Shuohuan Wang
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
MLLM
VLM
24
7
0
09 Nov 2022
Deep Learning based Computer Vision Methods for Complex Traffic
  Environments Perception: A Review
Deep Learning based Computer Vision Methods for Complex Traffic Environments Perception: A Review
Talha Azfar
Jinlong Li
Hongkai Yu
R. Cheu
Yisheng Lv
Ruimin Ke
38
21
0
09 Nov 2022
State-of-the-art Models for Object Detection in Various Fields of
  Application
State-of-the-art Models for Object Detection in Various Fields of Application
S. A. G. Naqvi
Syed Shahnawaz Ali
ObjD
OOD
38
0
0
01 Nov 2022
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
Chaofan Ma
Yu-Hao Yang
Yanfeng Wang
Ya Zhang
Weidi Xie
VLM
31
48
0
27 Oct 2022
M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task
  Learning with Model-Accelerator Co-design
M3^33ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design
Hanxue Liang
Zhiwen Fan
Rishov Sarkar
Ziyu Jiang
Tianlong Chen
Kai Zou
Yu Cheng
Cong Hao
Zhangyang Wang
MoE
42
82
0
26 Oct 2022
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual
  Question Answering
Compressing And Debiasing Vision-Language Pre-Trained Models for Visual Question Answering
Q. Si
Yuanxin Liu
Zheng Lin
Peng Fu
Weiping Wang
VLM
42
1
0
26 Oct 2022
CPL: Counterfactual Prompt Learning for Vision and Language Models
CPL: Counterfactual Prompt Learning for Vision and Language Models
Xuehai He
Diji Yang
Weixi Feng
Tsu-Jui Fu
Arjun Reddy Akula
Varun Jampani
P. Narayana
Sugato Basu
William Yang Wang
Junfeng Fang
VPVLM
VLM
52
15
0
19 Oct 2022
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
MedCLIP: Contrastive Learning from Unpaired Medical Images and Text
Zifeng Wang
Zhenbang Wu
Dinesh Agarwal
Jimeng Sun
CLIP
VLM
MedIm
49
402
0
18 Oct 2022
FIMP: Foundation Model-Informed Message Passing for Graph Neural
  Networks
FIMP: Foundation Model-Informed Message Passing for Graph Neural Networks
S. Rizvi
Nazreen Pallikkavaliyaveetil
David Zhang
Zhuoyang Lyu
Nhi Nguyen
...
Amin Karbasi
Rex Ying
Maria Brbić
Rahul M. Dhodapkar
David van Dijk
GNN
AI4CE
21
1
0
17 Oct 2022
Visual Classification via Description from Large Language Models
Visual Classification via Description from Large Language Models
Sachit Menon
Carl Vondrick
VLM
38
291
0
13 Oct 2022
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained
  Models
SVL-Adapter: Self-Supervised Adapter for Vision-Language Pretrained Models
Omiros Pantazis
Gabriel J. Brostow
Kate E. Jones
Oisin Mac Aodha
VLM
36
40
0
07 Oct 2022
MaPLe: Multi-modal Prompt Learning
MaPLe: Multi-modal Prompt Learning
Muhammad Uzair Khattak
H. Rasheed
Muhammad Maaz
Salman Khan
Fahad Shahbaz Khan
VPVLM
VLM
212
538
0
06 Oct 2022
CLIP model is an Efficient Continual Learner
CLIP model is an Efficient Continual Learner
Vishal Thengane
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
BDL
VLM
CLL
117
46
0
06 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
38
338
0
06 Oct 2022
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu
Jianlong Chang
Lin Liu
Qi Tian
Changan Chen
VPVLM
VLM
68
34
0
03 Oct 2022
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text
  Pre-training
ERNIE-ViL 2.0: Multi-view Contrastive Learning for Image-Text Pre-training
Bin Shan
Weichong Yin
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
VLM
27
19
0
30 Sep 2022
Linearly Mapping from Image to Text Space
Linearly Mapping from Image to Text Space
Jack Merullo
Louis Castricato
Carsten Eickhoff
Ellie Pavlick
VLM
170
106
0
30 Sep 2022
REST: REtrieve & Self-Train for generative action recognition
REST: REtrieve & Self-Train for generative action recognition
Adrian Bulat
Enrique Sanchez
Brais Martínez
Georgios Tzimiropoulos
VLM
29
4
0
29 Sep 2022
PACT: Perception-Action Causal Transformer for Autoregressive Robotics
  Pre-Training
PACT: Perception-Action Causal Transformer for Autoregressive Robotics Pre-Training
Rogerio Bonatti
Sai H. Vemprala
Shuang Ma
Felipe Vieira Frujeri
Shuhang Chen
Ashish Kapoor
39
22
0
22 Sep 2022
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
OmniVL:One Foundation Model for Image-Language and Video-Language Tasks
Junke Wang
Dongdong Chen
Zuxuan Wu
Chong Luo
Luowei Zhou
Yucheng Zhao
Yujia Xie
Ce Liu
Yu-Gang Jiang
Lu Yuan
MLLM
VLM
38
149
0
15 Sep 2022
Correlation Information Bottleneck: Towards Adapting Pretrained
  Multimodal Models for Robust Visual Question Answering
Correlation Information Bottleneck: Towards Adapting Pretrained Multimodal Models for Robust Visual Question Answering
Jingjing Jiang
Zi-yi Liu
Nanning Zheng
26
8
0
14 Sep 2022
PaLI: A Jointly-Scaled Multilingual Language-Image Model
PaLI: A Jointly-Scaled Multilingual Language-Image Model
Xi Chen
Tianlin Li
Soravit Changpinyo
A. Piergiovanni
Piotr Padlewski
...
Andreas Steiner
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
MLLM
VLM
37
688
0
14 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language
  Representation Alignment
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
25
68
0
14 Sep 2022
PreSTU: Pre-Training for Scene-Text Understanding
PreSTU: Pre-Training for Scene-Text Understanding
Jihyung Kil
Soravit Changpinyo
Xi Chen
Hexiang Hu
Sebastian Goodman
Wei-Lun Chao
Radu Soricut
VLM
142
29
0
12 Sep 2022
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of
  Vision-Language Models
VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models
Felix Vogel
Nina Shvetsova
Leonid Karlinsky
Hilde Kuehne
VLM
63
7
0
12 Sep 2022
FETA: Towards Specializing Foundation Models for Expert Task
  Applications
FETA: Towards Specializing Foundation Models for Expert Task Applications
Amit Alfassy
Assaf Arbelle
Oshri Halimi
Sivan Harary
Roei Herzig
...
Christoph Auer
Kate Saenko
Peter W. J. Staar
Rogerio Feris
Leonid Karlinsky
23
19
0
08 Sep 2022
Statistical Foundation Behind Machine Learning and Its Impact on
  Computer Vision
Statistical Foundation Behind Machine Learning and Its Impact on Computer Vision
Lei Zhang
H. Shum
VLM
SSL
30
2
0
06 Sep 2022
Design of the topology for contrastive visual-textual alignment
Design of the topology for contrastive visual-textual alignment
Zhun Sun
32
1
0
05 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
226
1,314
0
02 Sep 2022
Injecting Image Details into CLIP's Feature Space
Injecting Image Details into CLIP's Feature Space
Zilun Zhang
Cuifeng Shen
Yuan-Chung Shen
Huixin Xiong
Xinyu Zhou
VLM
CLIP
32
0
0
31 Aug 2022
Efficient Vision-Language Pretraining with Visual Concepts and
  Hierarchical Alignment
Efficient Vision-Language Pretraining with Visual Concepts and Hierarchical Alignment
Mustafa Shukor
Guillaume Couairon
Matthieu Cord
VLM
CLIP
24
27
0
29 Aug 2022
PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead
  of Models -- Federated Learning in Age of Foundation Model
PromptFL: Let Federated Participants Cooperatively Learn Prompts Instead of Models -- Federated Learning in Age of Foundation Model
Tao Guo
Song Guo
Junxiao Wang
Wenchao Xu
FedML
VLM
LRM
24
110
0
24 Aug 2022
Image as a Foreign Language: BEiT Pretraining for All Vision and
  Vision-Language Tasks
Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks
Wenhui Wang
Hangbo Bao
Li Dong
Johan Bjorck
Zhiliang Peng
...
Kriti Aggarwal
O. Mohammed
Saksham Singhal
Subhojit Som
Furu Wei
MLLM
VLM
ViT
75
629
0
22 Aug 2022
Deep is a Luxury We Don't Have
Deep is a Luxury We Don't Have
Ahmed Taha
Yen Nhi Truong Vu
Brent Mombourquette
Thomas P. Matthews
Jason Su
Sadanand Singh
ViT
MedIm
26
2
0
11 Aug 2022
Language-Guided Face Animation by Recurrent StyleGAN-based Generator
Language-Guided Face Animation by Recurrent StyleGAN-based Generator
Tiankai Hang
Huan Yang
Bei Liu
Jianlong Fu
Xin Geng
B. Guo
VGen
35
13
0
11 Aug 2022
Quality Not Quantity: On the Interaction between Dataset Design and
  Robustness of CLIP
Quality Not Quantity: On the Interaction between Dataset Design and Robustness of CLIP
Thao Nguyen
Gabriel Ilharco
Mitchell Wortsman
Sewoong Oh
Ludwig Schmidt
CLIP
VLM
53
99
0
10 Aug 2022
Advancing Plain Vision Transformer Towards Remote Sensing Foundation
  Model
Advancing Plain Vision Transformer Towards Remote Sensing Foundation Model
Di Wang
Qiming Zhang
Yufei Xu
Jing Zhang
Bo Du
Dacheng Tao
Lefei Zhang
36
243
0
08 Aug 2022
Frozen CLIP Models are Efficient Video Learners
Frozen CLIP Models are Efficient Video Learners
Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
CLIP
VLM
23
200
0
06 Aug 2022
LATTE: LAnguage Trajectory TransformEr
LATTE: LAnguage Trajectory TransformEr
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Sai H. Vemprala
Rogerio Bonatti
LM&Ro
41
59
0
04 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLM
CLIP
ViT
40
314
0
04 Aug 2022
GPPF: A General Perception Pre-training Framework via Sparsely Activated
  Multi-Task Learning
GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning
Benyuan Sun
Jinqiao Dai
Zihao Liang
Cong Liu
Yi Yang
Bo Bai
MoE
18
4
0
03 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation
  Learning
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
36
67
0
03 Aug 2022
PEA: Improving the Performance of ReLU Networks for Free by Using
  Progressive Ensemble Activations
PEA: Improving the Performance of ReLU Networks for Free by Using Progressive Ensemble Activations
Á. Utasi
35
0
0
28 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive
  Language-Image Pre-training
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIP
VLM
27
47
0
26 Jul 2022
Applying Spatiotemporal Attention to Identify Distracted and Drowsy
  Driving with Vision Transformers
Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers
Samay Lakhani
ViT
MedIm
21
1
0
22 Jul 2022
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
Kan Wu
Jinnian Zhang
Houwen Peng
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
21
248
0
21 Jul 2022
Progress and limitations of deep networks to recognize objects in
  unusual poses
Progress and limitations of deep networks to recognize objects in unusual poses
Amro Abbas
Stéphane Deny
OOD
AAML
15
18
0
16 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision
  and Language Models
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming Yang
Serge Belongie
Huayu Chen
VLM
41
22
0
15 Jul 2022
Previous
123...11121314
Next