ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2111.11432
  4. Cited By
Florence: A New Foundation Model for Computer Vision

Florence: A New Foundation Model for Computer Vision

22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
    VLM
ArXivPDFHTML

Papers citing "Florence: A New Foundation Model for Computer Vision"

50 / 664 papers shown
Title
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
26
368
0
20 Mar 2023
EVA-02: A Visual Representation for Neon Genesis
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
44
263
0
20 Mar 2023
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D
  Recognition
CLIP goes 3D: Leveraging Prompt Tuning for Language Grounded 3D Recognition
Deepti Hegde
Jeya Maria Jose Valanarasu
Vishal M. Patel
CLIP
44
65
0
20 Mar 2023
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
A Region-Prompted Adapter Tuning for Visual Abductive Reasoning
Hao Zhang
Yeo Keat Ee
Basura Fernando
VLM
31
3
0
18 Mar 2023
Investigating the Role of Attribute Context in Vision-Language Models
  for Object Recognition and Detection
Investigating the Role of Attribute Context in Vision-Language Models for Object Recognition and Detection
Kyle Buettner
Adriana Kovashka
28
0
0
17 Mar 2023
Towards a Foundation Model for Neural Network Wavefunctions
Towards a Foundation Model for Neural Network Wavefunctions
Michael Scherbela
Leon Gerard
Philipp Grohs
34
9
0
17 Mar 2023
Dual-path Adaptation from Image to Video Transformers
Dual-path Adaptation from Image to Video Transformers
Jungin Park
Jiyoung Lee
Kwanghoon Sohn
ViT
21
37
0
17 Mar 2023
Video Action Recognition with Attentive Semantic Units
Video Action Recognition with Attentive Semantic Units
Yifei Chen
Dapeng Chen
Ruijin Liu
Hao Li
Wei Peng
21
11
0
17 Mar 2023
GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation
  Learning
GridCLIP: One-Stage Object Detection by Grid-Level CLIP Representation Learning
Jiaying Lin
S. Gong
VLM
CLIP
ObjD
25
22
0
16 Mar 2023
Text-to-image Diffusion Models in Generative AI: A Survey
Text-to-image Diffusion Models in Generative AI: A Survey
Chenshuang Zhang
Chaoning Zhang
Mengchun Zhang
In So Kweon
VLM
51
267
0
14 Mar 2023
Challenges and Practices of Deep Learning Model Reengineering: A Case
  Study on Computer Vision
Challenges and Practices of Deep Learning Model Reengineering: A Case Study on Computer Vision
Wenxin Jiang
Vishnu Banna
Naveen Vivek
Abhinav Goel
Nicholas Synovic
George K. Thiruvathukal
James C. Davis
VLM
45
18
0
13 Mar 2023
Revisiting Class-Incremental Learning with Pre-Trained Models:
  Generalizability and Adaptivity are All You Need
Revisiting Class-Incremental Learning with Pre-Trained Models: Generalizability and Adaptivity are All You Need
Da-Wei Zhou
Han-Jia Ye
De-Chuan Zhan
Ziwei Liu
CLL
41
101
0
13 Mar 2023
Scaling Vision-Language Models with Sparse Mixture of Experts
Scaling Vision-Language Models with Sparse Mixture of Experts
Sheng Shen
Z. Yao
Chunyuan Li
Trevor Darrell
Kurt Keutzer
Yuxiong He
VLM
MoE
26
63
0
13 Mar 2023
Multimodal Data Integration for Oncology in the Era of Deep Neural
  Networks: A Review
Multimodal Data Integration for Oncology in the Era of Deep Neural Networks: A Review
Asim Waqas
Aakash Tripathi
Ravichandran Ramachandran
Paul Stewart
Ghulam Rasool
AI4CE
42
32
0
11 Mar 2023
HumanBench: Towards General Human-centric Perception with Projector
  Assisted Pretraining
HumanBench: Towards General Human-centric Perception with Projector Assisted Pretraining
Shixiang Tang
Cheng Chen
Qingsong Xie
Meilin Chen
Yizhou Wang
...
Feng Zhu
Haiyang Yang
Li Yi
Rui Zhao
Wanli Ouyang
VLM
34
36
0
10 Mar 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set
  Object Detection
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
124
1,831
0
09 Mar 2023
UniHCP: A Unified Model for Human-Centric Perceptions
UniHCP: A Unified Model for Human-Centric Perceptions
Yuanzheng Ci
Yizhou Wang
Meilin Chen
Shixiang Tang
Lei Bai
Feng Zhu
Rui Zhao
F. Yu
Donglian Qi
Wanli Ouyang
82
51
0
06 Mar 2023
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on
  Tasks and Challenges
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
34
4
0
04 Mar 2023
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Prophet: Prompting Large Language Models with Complementary Answer Heuristics for Knowledge-based Visual Question Answering
Zhou Yu
Xuecheng Ouyang
Zhenwei Shao
Mei Wang
Jun Yu
MLLM
94
11
0
03 Mar 2023
Image as Set of Points
Image as Set of Points
Xu Ma
Yuqian Zhou
Huan Wang
Can Qin
Bin Sun
Chang Liu
Yun Fu
VLM
50
49
0
02 Mar 2023
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection
  to Image-Text Pre-Training
Towards Generalisable Video Moment Retrieval: Visual-Dynamic Injection to Image-Text Pre-Training
Dezhao Luo
Jiabo Huang
S. Gong
Hailin Jin
Yang Liu
VGen
21
28
0
28 Feb 2023
TextIR: A Simple Framework for Text-based Editable Image Restoration
TextIR: A Simple Framework for Text-based Editable Image Restoration
Yun-Hao Bai
Cairong Wang
Shuzhao Xie
Chao Dong
Chun Yuan
Zhi Wang
DiffM
30
15
0
28 Feb 2023
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense
  Video Captioning
Vid2Seq: Large-Scale Pretraining of a Visual Language Model for Dense Video Captioning
Antoine Yang
Arsha Nagrani
Paul Hongsuck Seo
Antoine Miech
Jordi Pont-Tuset
Ivan Laptev
Josef Sivic
Cordelia Schmid
AI4TS
VLM
41
221
0
27 Feb 2023
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Side Adapter Network for Open-Vocabulary Semantic Segmentation
Mengde Xu
Zheng-Wei Zhang
Fangyun Wei
Han Hu
Xiang Bai
VLM
26
249
0
23 Feb 2023
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Large-scale Multi-Modal Pre-trained Models: A Comprehensive Survey
Tianlin Li
Guangyao Chen
Guangwu Qian
Pengcheng Gao
Xiaoyong Wei
Yaowei Wang
Yonghong Tian
Wen Gao
AI4CE
VLM
48
205
0
20 Feb 2023
Few-shot Multimodal Multitask Multilingual Learning
Few-shot Multimodal Multitask Multilingual Learning
Aman Chadha
Vinija Jain
53
0
0
19 Feb 2023
Zero-Shot Anomaly Detection via Batch Normalization
Zero-Shot Anomaly Detection via Batch Normalization
Aodong Li
Chen Qiu
Marius Kloft
Padhraic Smyth
Maja R. Rudolph
Stephan Mandt
32
0
0
15 Feb 2023
Semantic Image Segmentation: Two Decades of Research
Semantic Image Segmentation: Two Decades of Research
G. Csurka
Riccardo Volpi
Boris Chidlovskii
3DV
37
50
0
13 Feb 2023
Diagnosing and Rectifying Vision Models using Language
Diagnosing and Rectifying Vision Models using Language
Yuhui Zhang
Jeff Z. HaoChen
Shih-Cheng Huang
Kuan-Chieh Jackson Wang
James Zou
Serena Yeung
39
45
0
08 Feb 2023
SimCon Loss with Multiple Views for Text Supervised Semantic
  Segmentation
SimCon Loss with Multiple Views for Text Supervised Semantic Segmentation
Yash J. Patel
Yusheng Xie
Yi Zhu
Srikar Appalaraju
R. Manmatha
40
4
0
07 Feb 2023
AIM: Adapting Image Models for Efficient Video Action Recognition
AIM: Adapting Image Models for Efficient Video Action Recognition
Taojiannan Yang
Yi Zhu
Yusheng Xie
Aston Zhang
Chong Chen
Mu Li
ViT
65
145
0
06 Feb 2023
Grounding Large Language Models in Interactive Environments with Online
  Reinforcement Learning
Grounding Large Language Models in Interactive Environments with Online Reinforcement Learning
Thomas Carta
Clément Romac
Thomas Wolf
Sylvain Lamprier
Olivier Sigaud
Pierre-Yves Oudeyer
LM&Ro
LLMAG
25
183
0
06 Feb 2023
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image
  and Video
mPLUG-2: A Modularized Multi-modal Foundation Model Across Text, Image and Video
Haiyang Xu
Qinghao Ye
Mingshi Yan
Yaya Shi
Jiabo Ye
...
Guohai Xu
Ji Zhang
Songfang Huang
Feiran Huang
Jingren Zhou
MLLM
VLM
MoE
46
161
0
01 Feb 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
320
4,300
0
30 Jan 2023
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge
  Transferring
Revisiting Temporal Modeling for CLIP-based Image-to-Video Knowledge Transferring
Ruyang Liu
Jingjia Huang
Ge Li
Jiashi Feng
Xing Wu
Thomas H. Li
AI4TS
CLIP
VLM
35
47
0
26 Jan 2023
Affective Faces for Goal-Driven Dyadic Communication
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng
Revant Teotia
Purva Tendulkar
Sachit Menon
Carl Vondrick
VGen
34
19
0
26 Jan 2023
ClimaX: A foundation model for weather and climate
ClimaX: A foundation model for weather and climate
Tung Nguyen
Johannes Brandstetter
Ashish Kapoor
Jayesh K. Gupta
Aditya Grover
AI4Cl
AI4CE
11
246
0
24 Jan 2023
ATP: Adaptive Tensor Parallelism for Foundation Models
ATP: Adaptive Tensor Parallelism for Foundation Models
Shenggan Cheng
Ziming Liu
Jiangsu Du
Yang You
21
6
0
20 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
Joint Representation Learning for Text and 3D Point Cloud
Joint Representation Learning for Text and 3D Point Cloud
Rui Huang
Xuran Pan
Henry Zheng
Haojun Jiang
Zhifeng Xie
S. Song
Gao Huang
38
19
0
18 Jan 2023
Towards Models that Can See and Read
Towards Models that Can See and Read
Roy Ganz
Oren Nuriel
Aviad Aberdam
Yair Kittenplon
Shai Mazor
Ron Litman
24
13
0
18 Jan 2023
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Learning Customized Visual Models with Retrieval-Augmented Knowledge
Haotian Liu
Kilho Son
Jianwei Yang
Ce Liu
Jianfeng Gao
Yong Jae Lee
Chunyuan Li
VLM
40
54
0
17 Jan 2023
GLIGEN: Open-Set Grounded Text-to-Image Generation
GLIGEN: Open-Set Grounded Text-to-Image Generation
Yuheng Li
Haotian Liu
Qingyang Wu
Fangzhou Mu
Jianwei Yang
Jianfeng Gao
Chunyuan Li
Yong Jae Lee
VLM
80
570
1
17 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
26
46
0
16 Jan 2023
Toward Building General Foundation Models for Language, Vision, and
  Vision-Language Understanding Tasks
Toward Building General Foundation Models for Language, Vision, and Vision-Language Understanding Tasks
Xinsong Zhang
Yan Zeng
Jipeng Zhang
Hang Li
VLM
AI4CE
LRM
34
17
0
12 Jan 2023
Filtering, Distillation, and Hard Negatives for Vision-Language
  Pre-Training
Filtering, Distillation, and Hard Negatives for Vision-Language Pre-Training
Filip Radenovic
Abhimanyu Dubey
Abhishek Kadian
Todor Mihaylov
Simon Vandenhende
Yash J. Patel
Y. Wen
Vignesh Ramanathan
D. Mahajan
VLM
40
82
0
05 Jan 2023
CiT: Curation in Training for Effective Vision-Language Data
CiT: Curation in Training for Effective Vision-Language Data
Hu Xu
Saining Xie
Po-Yao (Bernie) Huang
Licheng Yu
Russ Howes
Gargi Ghosh
Luke Zettlemoyer
Christoph Feichtenhofer
VLM
DiffM
33
25
0
05 Jan 2023
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
All in Tokens: Unifying Output Space of Visual Tasks via Soft Token
Jia Ning
Chen Li
Zheng-Wei Zhang
Zigang Geng
Qi Dai
Kun He
Han Hu
56
44
0
05 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
86
36
0
05 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition
  with Pre-trained Vision-Language Models
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
106
48
0
31 Dec 2022
Previous
123...10111213149
Next