Papers
Communities
Organizations
Events
Blog
Pricing
Search
Open menu
Home
Papers
2111.11432
Cited By
Florence: A New Foundation Model for Computer Vision
22 November 2021
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
Jianfeng Gao
Houdong Hu
Xuedong Huang
Boxin Li
Chunyuan Li
Ce Liu
Mengchen Liu
Zicheng Liu
Yumao Lu
Yu Shi
Lijuan Wang
Jianfeng Wang
Bin Xiao
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Florence: A New Foundation Model for Computer Vision"
50 / 668 papers shown
Title
RS5M and GeoRSCLIP: A Large Scale Vision-Language Dataset and A Large Vision-Language Model for Remote Sensing
Zilun Zhang
Tiancheng Zhao
Yulong Guo
Yuxiang Cai
DiffM
VLM
177
66
0
20 Jun 2023
RemoteCLIP: A Vision Language Foundation Model for Remote Sensing
Fan Liu
Delong Chen
Zhan-Rong Guan
Xiaocong Zhou
Jiale Zhu
Qiaolin Ye
Liyong Fu
Jun Zhou
VLM
170
230
0
19 Jun 2023
A Universal Semantic-Geometric Representation for Robotic Manipulation
Tong Zhang
Yingdong Hu
Hanchen Cui
Hang Zhao
Yang Gao
142
18
0
18 Jun 2023
LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning
Jifan Zhang
Yifang Chen
Gregory H. Canal
Stephen Mussmann
Arnav M. Das
...
Yinglun Zhu
Jeffrey Bilmes
S. Du
Kevin Jamieson
Robert D. Nowak
VLM
105
12
0
16 Jun 2023
Robustness Analysis on Foundational Segmentation Models
Madeline Chantry Schiappa
Shehreen Azad
V. Sachidanand
Yunhao Ge
O. Mikšík
Yogesh S Rawat
Vibhav Vineet
OOD
VLM
AAML
82
9
0
15 Jun 2023
Encyclopedic VQA: Visual questions about detailed properties of fine-grained categories
Thomas Mensink
J. Uijlings
Lluis Castrejon
A. Goel
Felipe Cadar
Howard Zhou
Fei Sha
A. Araújo
V. Ferrari
96
44
0
15 Jun 2023
COSA: Concatenated Sample Pretrained Vision-Language Foundation Model
Sihan Chen
Xingjian He
Handong Li
Xiaojie Jin
Jiashi Feng
Qingbin Liu
VLM
CLIP
88
9
0
15 Jun 2023
Extending CLIP's Image-Text Alignment to Referring Image Segmentation
Seoyeon Kim
Minguk Kang
Dongwon Kim
Jaesik Park
Suha Kwak
VLM
107
11
0
14 Jun 2023
MOFI: Learning Image Representations from Noisy Entity Annotated Images
Wentao Wu
Aleksei Timofeev
Chen Chen
Bowen Zhang
Kun Duan
...
Yantao Zheng
Jonathon Shlens
Xianzhi Du
Zhe Gan
Yinfei Yang
VLM
94
8
0
13 Jun 2023
Robustness of SAM: Segment Anything Under Corruptions and Beyond
Yu Qiao
Chaoning Zhang
Taegoo Kang
Donghun Kim
Chenshuang Zhang
Choong Seon Hong
AAML
56
34
0
13 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
98
212
0
11 Jun 2023
Genomic Interpreter: A Hierarchical Genomic Deep Neural Network with 1D Shifted Window Transformer
Zehui Li
Akashaditya Das
W. Beardall
Yiren Zhao
Guy-Bart Stan
57
4
0
08 Jun 2023
MMSum: A Dataset for Multimodal Summarization and Thumbnail Generation of Videos
Jielin Qiu
Jiacheng Zhu
William Jongwon Han
Aditesh Kumar
Karthik Mittal
...
Linjie Li
Jianfeng Wang
Ding Zhao
Bo Li
Lijuan Wang
VGen
77
8
0
07 Jun 2023
Q: How to Specialize Large Vision-Language Models to Data-Scarce VQA Tasks? A: Self-Train on Unlabeled Images!
Zaid Khan
B. Vijaykumar
S. Schulter
Xiang Yu
Y. Fu
Manmohan Chandraker
VLM
MLLM
98
18
0
06 Jun 2023
Diversifying Joint Vision-Language Tokenization Learning
Vardaan Pahuja
A. Piergiovanni
A. Angelova
85
0
0
06 Jun 2023
Understanding Segment Anything Model: SAM is Biased Towards Texture Rather than Shape
Chaoning Zhang
Yu Qiao
Shehbaz Tariq
Sheng Zheng
Chenshuang Zhang
Chenghao Li
Hyundong Shin
Choong Seon Hong
VLM
73
10
0
03 Jun 2023
Enhancing CLIP with CLIP: Exploring Pseudolabeling for Limited-Label Prompt Tuning
Cristina Menghini
Andrew T. Delworth
Stephen H. Bach
VLM
147
26
0
02 Jun 2023
Towards In-context Scene Understanding
Ivana Balazevic
David Steiner
Nikhil Parthasarathy
Relja Arandjelović
Olivier J. Hénaff
103
31
0
02 Jun 2023
Consistency-guided Prompt Learning for Vision-Language Models
Shuvendu Roy
Ali Etemad
VLM
VPVLM
123
63
0
01 Jun 2023
Exploring Open-Vocabulary Semantic Segmentation without Human Labels
Jun Chen
Deyao Zhu
Guocheng Qian
Guohao Li
Zhicheng Yan
Chenchen Zhu
Fanyi Xiao
Mohamed Elhoseiny
Sean Culatana
VLM
100
11
0
01 Jun 2023
Joint Adaptive Representations for Image-Language Learning
A. Piergiovanni
A. Angelova
VLM
76
0
0
31 May 2023
Multi-modal Queried Object Detection in the Wild
Yifan Xu
Mengdan Zhang
Chaoyou Fu
Peixian Chen
Xiaoshan Yang
Ke Li
Changsheng Xu
ObjD
VLM
133
32
0
30 May 2023
Learning without Forgetting for Vision-Language Models
Da-Wei Zhou
Yuanhan Zhang
Jingyi Ning
Jingyi Ning
De-Chuan Zhan
De-Chuan Zhan
Ziwei Liu
VLM
CLL
168
44
0
30 May 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
244
112
0
29 May 2023
Test-Time Adaptation with CLIP Reward for Zero-Shot Generalization in Vision-Language Models
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
VLM
111
23
0
29 May 2023
Deeply Coupled Cross-Modal Prompt Learning
Xuejing Liu
Wei Tang
Jinghui Lu
Rui Zhao
Zhaojun Guo
Fei Tan
VLM
77
17
0
29 May 2023
KAFA: Rethinking Image Ad Understanding with Knowledge-Augmented Feature Adaptation of Vision-Language Models
Zhiwei Jia
P. Narayana
Arjun Reddy Akula
G. Pruthi
Haoran Su
Sugato Basu
Varun Jampani
VLM
OffRL
91
4
0
28 May 2023
Learning from Children: Improving Image-Caption Pretraining via Curriculum
Hammad A. Ayyubi
R. Lokesh
Alireza Zareian
Bohong Wu
Shih-Fu Chang
VLM
CLIP
64
2
0
27 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
69
13
0
26 May 2023
MEMEX: Detecting Explanatory Evidence for Memes via Knowledge-Enriched Contextualization
Shivam Sharma
S Ramaneswaran
Udit Arora
Md. Shad Akhtar
Tanmoy Chakraborty
77
9
0
25 May 2023
ViTMatte: Boosting Image Matting with Pretrained Plain Vision Transformers
J. Yao
Xinggang Wang
Shusheng Yang
Baoyuan Wang
ViT
108
64
0
24 May 2023
PaCE: Unified Multi-modal Dialogue Pre-training with Progressive and Compositional Experts
Yunshui Li
Binyuan Hui
Zhichao Yin
Min Yang
Fei Huang
Yongbin Li
MoE
95
21
0
24 May 2023
Training Transitive and Commutative Multimodal Transformers with LoReTTa
Manuel Tran
Yashin Dicente Cid
Amal Lahiani
Fabian J. Theis
Tingying Peng
Eldad Klaiman
87
2
0
23 May 2023
Parts of Speech-Grounded Subspaces in Vision-Language Models
James Oldfield
Christos Tzelepis
Yannis Panagakis
M. Nicolaou
Ioannis Patras
89
9
0
23 May 2023
CLIP4STR: A Simple Baseline for Scene Text Recognition with Pre-trained Vision-Language Model
Shuai Zhao
Xiaohan Wang
Linchao Zhu
Yezhou Yang
CLIP
VLM
131
27
0
23 May 2023
i-Code Studio: A Configurable and Composable Framework for Integrative AI
Yuwei Fang
Mahmoud Khademi
Chenguang Zhu
Ziyi Yang
Reid Pryzant
...
Yao Qian
Takuya Yoshioka
Lu Yuan
Michael Zeng
Xuedong Huang
86
2
0
23 May 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
Qingbin Liu
Jiashi Feng
VLM
CLIP
108
18
0
22 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
160
64
0
22 May 2023
Album Storytelling with Iterative Story-aware Captioning and Large Language Models
Munan Ning
Yujia Xie
Dongdong Chen
Zeyin Song
Lu Yuan
Yonghong Tian
QiXiang Ye
Liuliang Yuan
76
8
0
22 May 2023
i-Code V2: An Autoregressive Generation Framework over Vision, Language, and Speech Data
Ziyi Yang
Mahmoud Khademi
Yichong Xu
Reid Pryzant
Yuwei Fang
...
Yu Shi
Lu Yuan
Takuya Yoshioka
Michael Zeng
Xuedong Huang
70
2
0
21 May 2023
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
131
182
0
19 May 2023
Enhancing Vision-Language Pre-Training with Jointly Learned Questioner and Dense Captioner
Zikang Liu
Sihan Chen
Longteng Guo
Handong Li
Xingjian He
Qingbin Liu
82
1
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
151
122
0
18 May 2023
Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning
Wenhao Li
Dan Qiao
Baoxiang Wang
Xiangfeng Wang
Bo Jin
H. Zha
93
6
0
18 May 2023
Discffusion: Discriminative Diffusion Models as Few-shot Vision and Language Learners
Xuehai He
Weixi Feng
Tsu-Jui Fu
Varun Jampani
Arjun Reddy Akula
P. Narayana
Sugato Basu
William Yang Wang
Xinze Wang
DiffM
114
8
0
18 May 2023
Paxion: Patching Action Knowledge in Video-Language Foundation Models
Zhenhailong Wang
Ansel Blume
Sha Li
Genglin Liu
Jaemin Cho
Zineng Tang
Joey Tianyi Zhou
Heng Ji
KELM
VGen
60
32
0
18 May 2023
Rethinking Multimodal Content Moderation from an Asymmetric Angle with Mixed-modality
Jialing Yuan
Ye Yu
Gaurav Mittal
Matthew Hall
Sandra Sajeev
Mei Chen
93
11
0
17 May 2023
Online Continual Learning Without the Storage Constraint
Ameya Prabhu
Zhipeng Cai
P. Dokania
Philip Torr
V. Koltun
Ozan Sener
CLL
177
32
0
16 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
102
101
0
14 May 2023
On the Hidden Mystery of OCR in Large Multimodal Models
Yuliang Liu
Zhang Li
Mingxin Huang
Chunyuan Li
Dezhi Peng
Mingyu Liu
Lianwen Jin
Xiang Bai
VLM
MLLM
154
97
0
13 May 2023
Previous
1
2
3
...
7
8
9
...
12
13
14
Next