ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXiv (abs)PDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 668 papers shown
Title
Frozen CLIP Models are Efficient Video Learners
Frozen CLIP Models are Efficient Video Learners
Ziyi Lin
Shijie Geng
Renrui Zhang
Peng Gao
Gerard de Melo
Xiaogang Wang
Jifeng Dai
Yu Qiao
Hongsheng Li
CLIPVLM
98
209
0
06 Aug 2022
LATTE: LAnguage Trajectory TransformEr
LATTE: LAnguage Trajectory TransformEr
A. Bucker
Luis F. C. Figueredo
Sami Haddadin
Ashish Kapoor
Shuang Ma
Sai H. Vemprala
Rogerio Bonatti
LM&Ro
146
59
0
04 Aug 2022
Expanding Language-Image Pretrained Models for General Video Recognition
Expanding Language-Image Pretrained Models for General Video Recognition
Bolin Ni
Houwen Peng
Minghao Chen
Songyang Zhang
Gaofeng Meng
Jianlong Fu
Shiming Xiang
Haibin Ling
VLMCLIPViT
144
330
0
04 Aug 2022
GPPF: A General Perception Pre-training Framework via Sparsely Activated
  Multi-Task Learning
GPPF: A General Perception Pre-training Framework via Sparsely Activated Multi-Task Learning
Benyuan Sun
Jinqiao Dai
Zihao Liang
Cong Liu
Yi Yang
Bo Bai
MoE
84
4
0
03 Aug 2022
Masked Vision and Language Modeling for Multi-modal Representation
  Learning
Masked Vision and Language Modeling for Multi-modal Representation Learning
Gukyeong Kwon
Zhaowei Cai
Avinash Ravichandran
Erhan Bas
Rahul Bhotika
Stefano Soatto
92
68
0
03 Aug 2022
PEA: Improving the Performance of ReLU Networks for Free by Using
  Progressive Ensemble Activations
PEA: Improving the Performance of ReLU Networks for Free by Using Progressive Ensemble Activations
Á. Utasi
49
0
0
28 Jul 2022
Learning Visual Representation from Modality-Shared Contrastive
  Language-Image Pre-training
Learning Visual Representation from Modality-Shared Contrastive Language-Image Pre-training
Haoxuan You
Luowei Zhou
Bin Xiao
Noel Codella
Yu Cheng
Ruochen Xu
Shih-Fu Chang
Lu Yuan
CLIPVLM
89
48
0
26 Jul 2022
Applying Spatiotemporal Attention to Identify Distracted and Drowsy
  Driving with Vision Transformers
Applying Spatiotemporal Attention to Identify Distracted and Drowsy Driving with Vision Transformers
Samay Lakhani
ViTMedIm
50
2
0
22 Jul 2022
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
TinyViT: Fast Pretraining Distillation for Small Vision Transformers
Kan Wu
Jinnian Zhang
Houwen Peng
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
87
267
0
21 Jul 2022
Progress and limitations of deep networks to recognize objects in
  unusual poses
Progress and limitations of deep networks to recognize objects in unusual poses
Amro Abbas
Stéphane Deny
OODAAML
60
17
0
16 Jul 2022
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision
  and Language Models
Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models
Rui Qian
Yeqing Li
Zheng Xu
Ming-Hsuan Yang
Serge Belongie
Huayu Chen
VLM
74
22
0
15 Jul 2022
Towards Grand Unification of Object Tracking
Towards Grand Unification of Object Tracking
B. Yan
Yi Jiang
Pei Sun
D. Wang
Zehuan Yuan
Ping Luo
Huchuan Lu
ViTVOT
173
140
0
14 Jul 2022
Big Learning
Big Learning
Yulai Cong
Miaoyun Zhao
AI4CE
108
0
0
08 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video
  Recognition
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
197
99
0
04 Jul 2022
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Unified-IO: A Unified Model for Vision, Language, and Multi-Modal Tasks
Jiasen Lu
Christopher Clark
Rowan Zellers
Roozbeh Mottaghi
Aniruddha Kembhavi
ObjDVLMMLLM
173
412
0
17 Jun 2022
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product
  Retrieval
Entity-Graph Enhanced Cross-Modal Pretraining for Instance-level Product Retrieval
Xiao Dong
Xunlin Zhan
Yunchao Wei
Xiaoyong Wei
Yaowei Wang
Minlong Lu
Xiaochun Cao
Xiaodan Liang
82
11
0
17 Jun 2022
MixGen: A New Multi-Modal Data Augmentation
MixGen: A New Multi-Modal Data Augmentation
Xiaoshuai Hao
Yi Zhu
Srikar Appalaraju
Aston Zhang
Wanqian Zhang
Boyang Li
Mu Li
VLM
127
90
0
16 Jun 2022
Write and Paint: Generative Vision-Language Models are Unified Modal
  Learners
Write and Paint: Generative Vision-Language Models are Unified Modal Learners
Shizhe Diao
Wangchunshu Zhou
Xinsong Zhang
Jiawei Wang
MLLMAI4CE
99
17
0
15 Jun 2022
GLIPv2: Unifying Localization and Vision-Language Understanding
GLIPv2: Unifying Localization and Vision-Language Understanding
Haotian Zhang
Pengchuan Zhang
Xiaowei Hu
Yen-Chun Chen
Liunian Harold Li
Xiyang Dai
Lijuan Wang
Lu Yuan
Lei Li
Jianfeng Gao
ObjDVLM
106
304
0
12 Jun 2022
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional
  MoEs
Uni-Perceiver-MoE: Learning Sparse Generalist Models with Conditional MoEs
Jinguo Zhu
Xizhou Zhu
Wenhai Wang
Xiaohua Wang
Hongsheng Li
Xiaogang Wang
Jifeng Dai
MoMeMoE
107
70
0
09 Jun 2022
Neural Prompt Search
Neural Prompt Search
Yuanhan Zhang
Kaiyang Zhou
Ziwei Liu
VPVLMVLM
110
152
0
09 Jun 2022
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture
  of Experts
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa
C. Riquelme
J. Puigcerver
Rodolphe Jenatton
N. Houlsby
VLMMoE
172
205
0
06 Jun 2022
Visual Clues: Bridging Vision and Language Foundations for Image
  Paragraph Captioning
Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning
Yujia Xie
Luowei Zhou
Xiyang Dai
Lu Yuan
Nguyen Bach
Ce Liu
Michael Zeng
VLMMLLM
81
28
0
03 Jun 2022
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual
  Question Answering
REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering
Yuanze Lin
Yujia Xie
Dongdong Chen
Yichong Xu
Chenguang Zhu
Lu Yuan
93
75
0
02 Jun 2022
An Efficient Modern Baseline for FloodNet VQA
An Efficient Modern Baseline for FloodNet VQA
Aditya Kane
Sahil Khose
65
4
0
30 May 2022
Generalizing Multimodal Pre-training into Multilingual via Language
  Acquisition
Generalizing Multimodal Pre-training into Multilingual via Language Acquisition
Liang Zhang
Anwen Hu
Qin Jin
VLM
52
5
0
29 May 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via
  Feature Distillation
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
161
128
0
27 May 2022
GIT: A Generative Image-to-text Transformer for Vision and Language
GIT: A Generative Image-to-text Transformer for Vision and Language
Jianfeng Wang
Zhengyuan Yang
Xiaowei Hu
Linjie Li
Kevin Qinghong Lin
Zhe Gan
Zicheng Liu
Ce Liu
Lijuan Wang
VLM
180
564
0
27 May 2022
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal
  Skip-connections
mPLUG: Effective and Efficient Vision-Language Learning by Cross-modal Skip-connections
Chenliang Li
Haiyang Xu
Junfeng Tian
Wei Wang
Ming Yan
...
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
Luo Si
VLMMLLM
102
224
0
24 May 2022
Language Models with Image Descriptors are Strong Few-Shot
  Video-Language Learners
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Joey Tianyi Zhou
Heng Ji
MLLMVLM
230
142
0
22 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLMCLIPOffRL
340
1,314
0
04 May 2022
All You May Need for VQA are Image Captions
All You May Need for VQA are Image Captions
Soravit Changpinyo
Doron Kukliansky
Idan Szpektor
Xi Chen
Nan Ding
Radu Soricut
101
76
0
04 May 2022
i-Code: An Integrative and Composable Multimodal Learning Framework
i-Code: An Integrative and Composable Multimodal Learning Framework
Ziyi Yang
Yuwei Fang
Chenguang Zhu
Reid Pryzant
DongDong Chen
...
Bin Xiao
Yuanxun Lu
Takuya Yoshioka
Michael Zeng
Xuedong Huang
109
49
0
03 May 2022
In Defense of Image Pre-Training for Spatiotemporal Recognition
In Defense of Image Pre-Training for Spatiotemporal Recognition
Xianhang Li
Huiyu Wang
Chen Wei
Jieru Mei
Alan Yuille
Yuyin Zhou
Cihang Xie
74
0
0
03 May 2022
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
CenterCLIP: Token Clustering for Efficient Text-Video Retrieval
Shuai Zhao
Linchao Zhu
Xiaohan Wang
Yi Yang
VLMCLIP
76
122
0
02 May 2022
Can Information Behaviour Inform Machine Learning?
Can Information Behaviour Inform Machine Learning?
M. Ridley
AI4CE
60
0
0
01 May 2022
Continual Learning with Foundation Models: An Empirical Study of Latent
  Replay
Continual Learning with Foundation Models: An Empirical Study of Latent Replay
O. Ostapenko
Timothée Lesort
P. Rodríguez
Md Rifat Arefin
Arthur Douillard
Irina Rish
Laurent Charlin
120
53
0
30 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLMVLM
434
3,621
0
29 Apr 2022
Reducing Predictive Feature Suppression in Resource-Constrained
  Contrastive Image-Caption Retrieval
Reducing Predictive Feature Suppression in Resource-Constrained Contrastive Image-Caption Retrieval
Maurits J. R. Bleeker
Andrew Yates
Maarten de Rijke
100
4
0
28 Apr 2022
TEMOS: Generating diverse human motions from textual descriptions
TEMOS: Generating diverse human motions from textual descriptions
Mathis Petrovich
Michael J. Black
Gül Varol
190
393
0
25 Apr 2022
CLIP-Dissect: Automatic Description of Neuron Representations in Deep
  Vision Networks
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks
Tuomas P. Oikarinen
Tsui-Wei Weng
VLM
63
90
1
23 Apr 2022
Transformer Decoders with MultiModal Regularization for Cross-Modal Food
  Retrieval
Transformer Decoders with MultiModal Regularization for Cross-Modal Food Retrieval
Mustafa Shukor
Guillaume Couairon
Asya Grechka
Matthieu Cord
ViT
89
19
0
20 Apr 2022
K-LITE: Learning Transferable Visual Models with External Knowledge
K-LITE: Learning Transferable Visual Models with External Knowledge
Sheng Shen
Chunyuan Li
Xiaowei Hu
Jianwei Yang
Yujia Xie
...
Ce Liu
Kurt Keutzer
Trevor Darrell
Anna Rohrbach
Jianfeng Gao
CLIPVLM
72
85
0
20 Apr 2022
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented
  Visual Models
ELEVATER: A Benchmark and Toolkit for Evaluating Language-Augmented Visual Models
Chunyuan Li
Haotian Liu
Liunian Harold Li
Pengchuan Zhang
J. Aneja
...
Ping Jin
Houdong Hu
Zicheng Liu
Yong Jae Lee
Jianfeng Gao
112
153
0
19 Apr 2022
TOV: The Original Vision Model for Optical Remote Sensing Image
  Understanding via Self-supervised Learning
TOV: The Original Vision Model for Optical Remote Sensing Image Understanding via Self-supervised Learning
Chao Tao
Ji Qi
Guo Zhang
Qing Zhu
Weipeng Lu
Haifeng Li
111
47
0
10 Apr 2022
Unsupervised Prompt Learning for Vision-Language Models
Unsupervised Prompt Learning for Vision-Language Models
Hao Huang
Jack Chu
Fangyun Wei
VPVLMMLLMVLM
112
133
0
07 Apr 2022
DaViT: Dual Attention Vision Transformers
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
173
255
0
07 Apr 2022
Unified Contrastive Learning in Image-Text-Label Space
Unified Contrastive Learning in Image-Text-Label Space
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Bin Xiao
Ce Liu
Lu Yuan
Jianfeng Gao
VLMSSL
176
227
0
07 Apr 2022
"This is my unicorn, Fluffy": Personalizing frozen vision-language
  representations
"This is my unicorn, Fluffy": Personalizing frozen vision-language representations
Niv Cohen
Rinon Gal
E. Meirom
Gal Chechik
Yuval Atzmon
VLMMLLM
130
88
0
04 Apr 2022
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Single-Stream Multi-Level Alignment for Vision-Language Pretraining
Zaid Khan
B. Vijaykumar
Xiang Yu
S. Schulter
Manmohan Chandraker
Y. Fu
CLIPVLM
127
17
0
27 Mar 2022
Previous
123...121314
Next