ResearchTrend.AI
Versions: v1, v2, v3, v4 (latest)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXiv (abs) | PDF | HTML | GitHub (8136★)

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

40 / 690 papers shown
Title
MiniGPT-v2: large language model as a unified interface for vision-language multi-task learning
Jun Chen
Deyao Zhu
Xiaoqian Shen
Xiang Li
Zechun Liu
Pengchuan Zhang
Raghuraman Krishnamoorthi
Vikas Chandra
Yunyang Xiong
Mohamed Elhoseiny
MLLM
258
475
0
14 Oct 2023
Interactive Navigation in Environments with Traversable Obstacles Using Large Language and Vision-Language Models
Zhen Zhang
Anran Lin
Chun Wai Wong
Xiangyu Chu
Qi Dou
K. W. S. Au
LM&Ro
96
8
0
13 Oct 2023
Learning to Act from Actionless Videos through Dense Correspondences
Po-Chen Ko
Jiayuan Mao
Yilun Du
Shao-Hua Sun
Josh Tenenbaum
111
89
0
12 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
107
5
0
02 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD VLM
82
4
0
29 Sep 2023
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Yuanyi Zhong
Alihusein Kuwajerwala
Sacha Morin
Krishna Murthy Jatavallabhula
Bipasha Sen
...
Celso Miguel de Melo
Joshua B. Tenenbaum
Antonio Torralba
Florian Shkurti
Liam Paull
LM&Ro
142
189
0
28 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
198
241
0
26 Sep 2023
Detect Everything with Few Examples
Xinyu Zhang
Yuting Wang
Abdeslam Boularias
ObjD VLM
106
14
0
22 Sep 2023
Triple Regression for Camera Agnostic Sim2Real Robot Grasping and Manipulation Tasks
Yuanhong Zeng
Yizhou Zhao
Ying Nian Wu
83
0
0
16 Sep 2023
GRID: Scene-Graph-based Instruction-driven Robotic Task Planning
Zhe Ni
Xiao-Xin Deng
Cong Tai
Xin-Yue Zhu
Qinghongbing Xie
Yang Liu
Xiang Wu
Long Zeng
LM&Ro
93
15
0
14 Sep 2023
A One Stop 3D Target Reconstruction and multilevel Segmentation Method
Jinfeng Xu
Wei Zhao
Zhiyan Tang
X. Gan
3DV
57
2
0
14 Aug 2023
Follow Anything: Open-set detection, tracking, and following in real-time
Alaa Maalouf
Ninad Jadhav
Krishna Murthy Jatavallabhula
Makram Chahine
Daniel M. Vogt
Robert J. Wood
Antonio Torralba
Daniela Rus
107
25
0
10 Aug 2023
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Yasheng Sun
Yifan Yang
Houwen Peng
Yifei Shen
Yuqing Yang
Hang-Rui Hu
Lili Qiu
Hideki Koike
DiffM LM&Ro
87
39
0
02 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG SyDa
155
44
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
148
128
0
25 Jul 2023
Fashion Matrix: Editing Photos by Just Talking
Zheng Chong
Xujie Zhang
Fuwei Zhao
Zhenyu Xie
Xiaodan Liang
DiffM
80
2
0
25 Jul 2023
OG: Equip vision occupancy with instance segmentation and visual grounding
Zichao Dong
Hang Ji
Weikun Zhang
Xufeng Huang
Junbo Chen
ISeg VLM
46
0
0
12 Jul 2023
GPT4RoI: Instruction Tuning Large Language Model on Region-of-Interest
Shilong Zhang
Pei Sun
Shoufa Chen
Min Xiao
Wenqi Shao
Wenwei Zhang
Yu Liu
Kai-xiang Chen
Ping Luo
MLLM VLM
173
238
0
07 Jul 2023
Counting Guidance for High Fidelity Text-to-Image Synthesis
Wonjune Kang
Kevin Galim
H. Koo
Nam Ik Cho
DiffM
126
10
0
30 Jun 2023
What a MESS: Multi-Domain Evaluation of Zero-Shot Semantic Segmentation
Benedikt Blumenstiel
Johannes Jakubik
Hilde Kuhne
Michael Vossing
VLM
129
18
0
27 Jun 2023
When Foundation Model Meets Federated Learning: Motivations, Challenges, and Future Directions
Weiming Zhuang
Chen Chen
Lingjuan Lyu
Chong Chen
Yaochu Jin
Lingjuan Lyu
AIFin AI4CE
231
99
0
27 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
118
76
0
14 Jun 2023
Transferring Foundation Models for Generalizable Robotic Manipulation
Jiange Yang
Wenhui Tan
Chuhao Jin
Keling Yao
Bei Liu
Jianlong Fu
Ruihua Song
Gangshan Wu
Limin Wang
LM&Ro
142
9
0
09 Jun 2023
LRVS-Fashion: Extending Visual Search with Referring Instructions
Simon Lepage
Jérémie Mary
David Picard
101
1
0
05 Jun 2023
Segment Anything in High Quality
Lei Ke
Mingqiao Ye
Martin Danelljan
Yifan Liu
Yu-Wing Tai
Chi-Keung Tang
Feng Yu
VLM
129
341
0
02 Jun 2023
Building One-class Detector for Anything: Open-vocabulary Zero-shot OOD Detection Using Text-image Models
Yunhao Ge
Jie Jessie Ren
Jiaping Zhao
Kaifeng Chen
Andrew Gallagher
Laurent Itti
Balaji Lakshminarayanan
VLM ObjD
53
1
0
26 May 2023
ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs
Zihao Zhao
Sheng Wang
Jinchen Gu
Yitao Zhu
Lanzhuju Mei
Zixu Zhuang
Zhiming Cui
Qian Wang
Dinggang Shen
LM&MA
122
43
0
25 May 2023
Visual Programming for Text-to-Image Generation and Evaluation
Jaemin Cho
Abhaysinh Zala
Joey Tianyi Zhou
MLLM
130
51
0
24 May 2023
AMELI: Enhancing Multimodal Entity Linking with Fine-Grained Attributes
Barry Menglong Yao
Yu Chen
Qifan Wang
Sijia Wang
Minqian Liu
Zhiyang Xu
Licheng Yu
Lifu Huang
130
8
0
24 May 2023
Compositional Text-to-Image Synthesis with Attention Map Control of Diffusion Models
Ruichen Wang
Zekang Chen
Chen Chen
Jiancang Ma
H. Lu
Xiaodong Lin
DiffM
96
73
0
23 May 2023
Matcher: Segment Anything with One Shot Using All-Purpose Feature Matching
Yang Liu
Muzhi Zhu
Hengtao Li
Hao Chen
Xinlong Wang
Chunhua Shen
VLM MLLM
181
90
0
22 May 2023
Make-A-Protagonist: Generic Video Editing with An Ensemble of Experts
Yuyang Zhao
Enze Xie
Lanqing Hong
Zhenguo Li
G. Lee
DiffM VGen
110
34
0
15 May 2023
Bridging the Domain Gap: Self-Supervised 3D Scene Understanding with Foundation Models
Zhimin Chen
Longlong Jing
Yingwei Li
Bing Li
116
34
0
15 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
102
101
0
14 May 2023
Segment and Track Anything
Yangming Cheng
Liulei Li
Yuanyou Xu
Xiaodi Li
Zongxin Yang
Wenguan Wang
Yi Yang
VOS
98
205
0
11 May 2023
Learnable Ophthalmology SAM
Zhongxi Qiu
Yan Hu
Heng Li
Jiang-Dong Liu
VLM MedIm
95
27
0
26 Apr 2023
Expressive Text-to-Image Generation with Rich Text
Songwei Ge
Taesung Park
Jun-Yan Zhu
Jia-Bin Huang
DiffM
175
82
0
13 Apr 2023
SATR: Zero-Shot Semantic Segmentation of 3D Shapes
Ahmed Abdelreheem
Ivan Skorokhodov
M. Ovsjanikov
Peter Wonka
3DPC
115
39
0
11 Apr 2023
Virtual Guidance as a Mid-level Representation for Navigation with Augmented Reality
Hsuan-Kung Yang
Tsung-Chih Chiang
Tingxin Liu
Chun-Wei Huang
Jou-Min Liu
Tsu-Ching Hsiao
Chun-Yi Lee
69
1
0
05 Mar 2023
CLIP-Adapter: Better Vision-Language Models with Feature Adapters
Peng Gao
Shijie Geng
Renrui Zhang
Teli Ma
Rongyao Fang
Yongfeng Zhang
Hongsheng Li
Yu Qiao
VLM CLIP
355
1,066
0
09 Oct 2021