ResearchTrend.AI
arXiv: 2011.10678
Open-Vocabulary Object Detection Using Captions

20 November 2020
Alireza Zareian
Kevin Dela Rosa
Derek Hao Hu
Shih-Fu Chang
    VLM
    ObjD

Papers citing "Open-Vocabulary Object Detection Using Captions"

40 / 40 papers shown
Decoupling Classifier for Boosting Few-shot Object Detection and Instance Segmentation
Bin-Bin Gao
Xiaochen Chen
Z. Huang
Congchong Nie
Jun Liu
Jinxiang Lai
Guannan Jiang
Xi-Zhao Wang
Chengjie Wang
100
28
0
20 May 2025
FG-CLIP: Fine-Grained Visual and Textual Alignment
Chunyu Xie
Bin Wang
Fanjing Kong
Jincheng Li
Dawei Liang
Gengshen Zhang
Dawei Leng
Yuhui Yin
CLIP
VLM
99
0
0
08 May 2025
CQ-DINO: Mitigating Gradient Dilution via Category Queries for Vast Vocabulary Object Detection
Zhichao Sun
Huazhang Hu
Yidong Ma
Gang Liu
Nemo Chen
Xu Tang
Feng-Long Xie
Yongchao Xu
ObjD
82
0
0
24 Mar 2025
Cyclic Contrastive Knowledge Transfer for Open-Vocabulary Object Detection
Chuhan Zhang
Chaoyang Zhu
Pingcheng Dong
Long Chen
Dong Zhang
ObjD
VLM
387
0
0
14 Mar 2025
Referring to Any Person
Qing Jiang
Lin Wu
Zhaoyang Zeng
Tianhe Ren
Yuda Xiong
Yihao Chen
Qin Liu
Lei Zhang
373
0
0
11 Mar 2025
Enhancing Novel Object Detection via Cooperative Foundational Models
Rohit K Bharadwaj
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
ObjD
VLM
234
1
0
17 Jan 2025
Towards Visual Grounding: A Survey
Linhui Xiao
Xiaoshan Yang
X. Lan
Yaowei Wang
Changsheng Xu
ObjD
146
4
0
31 Dec 2024
OpenAD: Open-World Autonomous Driving Benchmark for 3D Object Detection
Zhongyu Xia
Jishuo Li
Zhiwei Lin
Xinhao Wang
Yansen Wang
Ming-Hsuan Yang
VLM
106
2
0
26 Nov 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
74
0
0
07 Sep 2024
DiPEx: Dispersing Prompt Expansion for Class-Agnostic Object Detection
Jia Syuen Lim
Zhuoxiao Chen
Mahsa Baktashmotlagh
Zhi Chen
Xin Yu
Zi Huang
Yadan Luo
VLM
ObjD
109
1
0
21 Jun 2024
Unsupervised Vision-and-Language Pre-training Without Parallel Images and Captions
Liunian Harold Li
Haoxuan You
Zhecan Wang
Alireza Zareian
Shih-Fu Chang
Kai-Wei Chang
SSL
VLM
81
12
0
24 Oct 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
128
433
0
11 Jun 2020
Pixel-BERT: Aligning Image Pixels with Text by Deep Multi-Modal Transformers
Zhicheng Huang
Zhaoyang Zeng
Bei Liu
Dongmei Fu
Jianlong Fu
ViT
114
437
0
02 Apr 2020
Don't Even Look Once: Synthesizing Features for Zero-Shot Detection
Pengkai Zhu
Hanxiao Wang
Venkatesh Saligrama
ObjD
54
88
0
18 Nov 2019
VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Weijie Su
Xizhou Zhu
Yue Cao
Bin Li
Lewei Lu
Furu Wei
Jifeng Dai
VLM
MLLM
SSL
124
1,657
0
22 Aug 2019
VisualBERT: A Simple and Performant Baseline for Vision and Language
Liunian Harold Li
Mark Yatskar
Da Yin
Cho-Jui Hsieh
Kai-Wei Chang
VLM
109
1,939
0
09 Aug 2019
ViLBERT: Pretraining Task-Agnostic Visiolinguistic Representations for Vision-and-Language Tasks
Jiasen Lu
Dhruv Batra
Devi Parikh
Stefan Lee
SSL
VLM
197
3,659
0
06 Aug 2019
Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection
Keren Ye
Ruotong Wang
Adriana Kovashka
Wei Li
Danfeng Qin
Jesse Berent
94
60
0
23 Jul 2019
C-MIL: Continuation Multiple Instance Learning for Weakly Supervised Object Detection
Fang Wan
Chang-rui Liu
Wei Ke
Xiangyang Ji
Jianbin Jiao
QiXiang Ye
WSOD
43
229
0
11 Apr 2019
Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment
Samyak Datta
Karan Sikka
Anirban Roy
Karuna Ahuja
Devi Parikh
Ajay Divakaran
44
103
0
27 Mar 2019
NOTE-RCNN: NOise Tolerant Ensemble RCNN for Semi-Supervised Object Detection
J. Gao
Jiang Wang
Shengyang Dai
Li Li
Ram Nevatia
ObjD
46
93
0
01 Dec 2018
Multi-level Multimodal Common Semantic Space for Image-Phrase Grounding
Hassan Akbari
Svebor Karaman
Surabhi Bhargava
Brian Chen
Carl Vondrick
Shih-Fu Chang
40
82
0
28 Nov 2018
Learning to discover and localize visual objects with open vocabulary
Keren Ye
Ruotong Wang
Wei Li
Danfeng Qin
Adriana Kovashka
Jesse Berent
ObjD
35
4
0
25 Nov 2018
The Open Images Dataset V4: Unified image classification, object detection, and visual relationship detection at scale
Alina Kuznetsova
H. Rom
N. Alldrin
J. Uijlings
Ivan Krasin
...
S. Popov
Matteo Malloci
Alexander Kolesnikov
Tom Duerig
V. Ferrari
ObjD
VLM
87
1,340
0
02 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
953
93,936
0
11 Oct 2018
Zero-Shot Object Detection
Ankan Bansal
Karan Sikka
Gaurav Sharma
Rama Chellappa
Ajay Divakaran
VLM
ObjD
74
360
0
12 Apr 2018
Zero-Shot Detection
Pengkai Zhu
Hanxiao Wang
Venkatesh Saligrama
VLM
ObjD
53
99
0
19 Mar 2018
Knowledge Aided Consistency for Weakly Supervised Phrase Grounding
Kan Chen
J. Gao
Ram Nevatia
58
90
0
11 Mar 2018
Visual and Semantic Knowledge Transfer for Large Scale Semi-supervised Object Detection
Yuxing Tang
Josiah Wang
Xiaofang Wang
Boyang Gao
Emmanuel Dellandrea
R. Gaizauskas
Liming Chen
ObjD
44
115
0
09 Jan 2018
Revisiting knowledge transfer for training object class detectors
J. Uijlings
S. Popov
V. Ferrari
VLM
ObjD
34
71
0
21 Aug 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures
Fanyi Xiao
Leonid Sigal
Yong Jae Lee
56
139
0
03 May 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
300
27,018
0
20 Mar 2017
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations
Ranjay Krishna
Yuke Zhu
Oliver Groth
Justin Johnson
Kenji Hata
...
Yannis Kalantidis
Li Li
David A. Shamma
Michael S. Bernstein
Fei-Fei Li
167
5,706
0
23 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.4K
192,638
0
10 Dec 2015
Weakly Supervised Deep Detection Networks
Hakan Bilen
Andrea Vedaldi
WSOD
52
784
0
09 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
410
61,900
0
04 Jun 2015
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Tsung-Yi Lin
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollár
C. L. Zitnick
153
2,461
0
01 Apr 2015
Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning
R. G. Cinbis
Jakob Verbeek
Cordelia Schmid
WSOD
SSL
58
430
0
03 Mar 2015
LSDA: Large Scale Detection Through Adaptation
Judy Hoffman
S. Guadarrama
Eric Tzeng
Ronghang Hu
Jeff Donahue
Ross B. Girshick
Trevor Darrell
Kate Saenko
ObjD
69
334
0
18 Jul 2014
Microsoft COCO: Common Objects in Context
Tsung-Yi Lin
Michael Maire
Serge J. Belongie
Lubomir Bourdev
Ross B. Girshick
James Hays
Pietro Perona
Deva Ramanan
C. L. Zitnick
Piotr Dollár
ObjD
255
43,290
0
01 May 2014