ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXivPDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 664 papers shown
Title
Towards Grand Unification of Object Tracking
Towards Grand Unification of Object Tracking
B. Yan
Yi-Xin Jiang
Pei Sun
D. Wang
Zehuan Yuan
Ping Luo
Huchuan Lu
ViT
VOT
51
136
0
14 Jul 2022
Big Learning
Big Learning
Yulai Cong
Miaoyun Zhao
AI4CE
32
0
0
08 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video
  Recognition
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
105
93
0
04 Jul 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjD
VLM
MLLM
77
393
0
17 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product
  Retrieval
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
32
11
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
25
83
0
16 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal
  Learners
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLM
AI4CE
29
16
0
15 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjD
VLM
30
291
0
12 Jun 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional
  MoEs
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai
MoMe
MoE
39
66
0
09 Jun 2022
Neural Prompt Search
Neural Prompt Search
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
VPVLM
VLM
55
145
0
09 Jun 2022
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture
  of Experts
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa
C. Riquelme
J. Puigcerver
Rodolphe Jenatton
N. Houlsby
VLM
MoE
33
185
0
06 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image
  Paragraph Captioning
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLM
MLLM
37
28
0
03 Jun 2022
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual
  Question Answering
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin
Yujia Xie
Dongdong Chen
Yichong Xu
Chenguang Zhu
Lu Yuan
55
71
0
02 Jun 2022
An Efficient Modern Baseline for FloodNet VQA
An Efficient Modern Baseline for FloodNet VQA
Aditya Kane
Sahil Khose
39
4
0
30 May 2022
Generalizing Multimodal Pre-training into Multilingual via Language
  Acquisition
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition
Liang Zhang
Anwen Hu
Qin Jin
VLM
33
5
0
29 May 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via
  Feature Distillation
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
88
124
0
27 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
61
529
0
27 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal
  Skip-connections
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLM
MLLM
36
213
0
24 May 2022
Language Models with Image Descriptors are Strong Few-Shot
  Video-Language Learners
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLM
VLM
170
138
0
22 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
85
1,263
0
04 May 2022
All You May Need for VQA are Image Captions
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
32
70
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
DongDong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
40
45
0
03 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
30
0
0
03 May 2022
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
Shuai Zhao
Linchao Zhu
Xiaohan Wang
Yi Yang
VLM
CLIP
20
112
0
02 May 2022
Can Information Behaviour Inform Machine Learning?
Can Information Behaviour Inform Machine Learning?
M. Ridley
AI4CE
29
0
0
01 May 2022
Continual Learning with Foundation Models: An Empirical Study of Latent
  Replay
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
O. Ostapenko
Timothée Lesort
P. Rodríguez
Md Rifat Arefin
Arthur Douillard
Irina Rish
Laurent Charlin
39
52
0
30 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
51
3,369
0
29 Apr 2022
Reducing Predictive Feature Suppression in Resource-Constrained
  Contrastive Image-Caption Retrieval
Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval
Maurits J. R. Bleeker
Andrew Yates
Maarten de Rijke
52
4
0
28 Apr 2022
TEMOS: Generating diverse human motions from textual descriptions
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
45
373
0
25 Apr 2022
CLIP-Dissect: Automatic Description of Neuron Representations in Deep
  Vision Networks
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
Tuomas P. Oikarinen
Tsui-Wei Weng
VLM
23
82
1
23 Apr 2022
Transformer Decoders with MultiModal Regularization for Cross-Modal Food
  Retrieval
Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
Mustafa Shukor
Guillaume Couairon
Asya Grechka
Matthieu Cord
ViT
47
18
0
20 Apr 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen
Chunyuan Li
Xiaowei Hu
Jianwei Yang
Yujia Xie
...
Ce Liu
Kurt Keutzer
Trevor Darrell
Anna Rohrbach
Jianfeng Gao
CLIP
VLM
36
83
0
20 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
  Visual Models
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
50
145
0
19 Apr 2022
TOV: The Original Vision Model for Optical Remote Sensing Image
  Understanding via Self-supervised Learning
TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning
Chao Tao
Ji Qi
Guo Zhang
Qing Zhu
Weipeng Lu
Haifeng Li
42
40
0
10 Apr 2022
Unsupervised Prompt Learning for Vision-Language Models
Unsupervised Prompt Learning for Vision-Language Models
Hao Huang
Jack Chu
Fangyun Wei
VPVLM
MLLM
VLM
38
132
0
07 Apr 2022
DaViT: Dual Attention Vision Transformers
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
51
242
0
07 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLM
SSL
56
221
0
07 Apr 2022
"This is my unicorn, Fluffy": Personalizing frozen vision-language
  representations
"This is my unicorn, Fluffy": Personalizing frozen vision-language representations
Niv Cohen
Rinon Gal
E. Meirom
Gal Chechik
Yuval Atzmon
VLM
MLLM
56
83
0
04 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIP
VLM
27
16
0
27 Mar 2022
Multi-modal Misinformation Detection: Approaches, Challenges and
  Opportunities
Multi-modal Misinformation Detection: Approaches, Challenges and Opportunities
S. Abdali
Sina shaham
Bhaskar Krishnamachari
22
18
0
25 Mar 2022
Reshaping Robot Trajectories Using Natural Language Commands: A Study of
  Multi-Modal Data Alignment Using Transformers
Reshaping Robot Trajectories Using Natural Language Commands: A Study of Multi-Modal Data Alignment Using Transformers
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Rogerio Bonatti
LM&Ro
45
49
0
25 Mar 2022
Focal Modulation Networks
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
38
263
0
22 Mar 2022
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object
  Detection
DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection
Hao Zhang
Feng Li
Shilong Liu
Lei Zhang
Hang Su
Jun Zhu
L. Ni
H. Shum
ViT
98
1,376
0
07 Mar 2022
Vision-Language Intelligence: Tasks, Representation Learning, and Large
  Models
Vision-Language Intelligence: Tasks, Representation Learning, and Large Models
Feng Li
Hao Zhang
Yi-Fan Zhang
Shixuan Liu
Jian Guo
L. Ni
Pengchuan Zhang
Lei Zhang
AI4TS
VLM
24
36
0
03 Mar 2022
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple
  Sequence-to-Sequence Learning Framework
OFA: Unifying Architectures, Tasks, and Modalities Through a Simple Sequence-to-Sequence Learning Framework
Peng Wang
An Yang
Rui Men
Junyang Lin
Shuai Bai
Zhikang Li
Jianxin Ma
Chang Zhou
Jingren Zhou
Hongxia Yang
MLLM
ObjD
74
850
0
07 Feb 2022
Multiview Transformers for Video Recognition
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
212
0
12 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and
  Sound
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
48
207
0
07 Jan 2022
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Masked Feature Prediction for Self-Supervised Visual Pre-Training
Chen Wei
Haoqi Fan
Saining Xie
Chaoxia Wu
Alan Yuille
Christoph Feichtenhofer
ViT
100
655
0
16 Dec 2021
RegionCLIP: Region-based Language-Image Pretraining
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
40
557
0
16 Dec 2021
HairCLIP: Design Your Hair by Text and Reference Image
HairCLIP: Design Your Hair by Text and Reference Image
Tianyi Wei
Dongdong Chen
Wenbo Zhou
Jing Liao
Zhentao Tan
Lu Yuan
Weiming Zhang
Nenghai Yu
CLIP
33
108
0
09 Dec 2021
Previous
123...121314
Next