Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2205.06160
Cited By
Localized Vision-Language Matching for Open-vocabulary Object Detection
12 May 2022
M. A. Bravo
Sudhanshu Mittal
Thomas Brox
VLM
ObjD
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Localized Vision-Language Matching for Open-vocabulary Object Detection"
16 / 16 papers shown
Title
RegionCLIP: Region-based Language-Image Pretraining
Yiwu Zhong
Jianwei Yang
Pengchuan Zhang
Chunyuan Li
Noel Codella
...
Luowei Zhou
Xiyang Dai
Lu Yuan
Yin Li
Jianfeng Gao
VLM
CLIP
138
576
0
16 Dec 2021
Open-Vocabulary Instance Segmentation via Robust Cross-Modal Pseudo-Labeling
Dat T. Huynh
Jason Kuen
Zhe Lin
Jiuxiang Gu
Ehsan Elhamifar
ISeg
VLM
51
85
0
24 Nov 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Nayeon Lee
Weicheng Kuo
Huayu Chen
VLM
ObjD
274
917
0
28 Apr 2021
Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers
Antoine Miech
Jean-Baptiste Alayrac
Ivan Laptev
Josef Sivic
Andrew Zisserman
ViT
54
139
0
30 Mar 2021
Open-Vocabulary Object Detection Using Captions
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
VLM
ObjD
120
430
0
20 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning
Simon Ging
Mohammadreza Zolfaghari
Hamed Pirsiavash
Thomas Brox
ViT
CLIP
70
172
0
01 Nov 2020
COBE: Contextualized Object Embeddings from Narrated Instructional Video
Gedas Bertasius
Lorenzo Torresani
27
24
0
14 Jul 2020
A Simple Semi-Supervised Learning Framework for Object Detection
Kihyuk Sohn
Zizhao Zhang
Chun-Liang Li
Han Zhang
Chen-Yu Lee
Tomas Pfister
86
497
0
10 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
105
1,938
0
13 Apr 2020
Instance-aware, Context-focused, and Memory-efficient Weakly Supervised Object Detection
Zhongzheng Ren
Zhiding Yu
Xiaodong Yang
Xuan Li
Yong Jae Lee
Alex Schwing
Jan Kautz
WSOD
62
201
0
09 Apr 2020
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
Zhu Zhang
Zhou Zhao
Yang Zhao
Qi. Wang
Huasheng Liu
Lianli Gao
64
115
0
19 Jan 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
347
941
0
24 Sep 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
153
1,663
0
22 Aug 2019
LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Hao Hao Tan
Joey Tianyi Zhou
VLM
MLLM
237
2,479
0
20 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
226
3,678
0
06 Aug 2019
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
213
2,475
0
01 Apr 2015
1