ResearchTrend.AI
Versions: v1, v2, v3, v4 (latest)

Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
arXiv:2303.05499 | GitHub (8136★)

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 691 papers shown
Title
OneRef: Unified One-tower Expression Grounding and Segmentation with Mask Referring Modeling
Linhui Xiao
Xiaoshan Yang
Fang Peng
Yaowei Wang
Changsheng Xu
ObjD
131
7
0
10 Oct 2024
TweedieMix: Improving Multi-Concept Fusion for Diffusion-based Image/Video Generation
Gihyun Kwon
Jong Chul Ye
DiffM
132
5
0
08 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
Xiaorui Sun
Jing Liu
Jikang Cheng
Xiaofeng Zhu
Ping Hu
VLM
147
7
0
07 Oct 2024
Is What You Ask For What You Get? Investigating Concept Associations in Text-to-Image Models
Salma Abdel Magid
Weiwei Pan
Simon Warchol
Grace Guo
Junsik Kim
Mahia Rahman
Hanspeter Pfister
201
0
0
06 Oct 2024
AI-rays: Exploring Bias in the Gaze of AI Through a Multimodal Interactive Installation
Ziyao Gao
Yiwen Zhang
Ling Li
Theodoros Papatheodorou
Wei Zeng
46
0
0
03 Oct 2024
SPINE: Online Semantic Planning for Missions with Incomplete Natural Language Specifications in Unstructured Environments
Zachary Ravichandran
Varun Murali
Mariliza Tzes
George J. Pappas
Vijay Kumar
LRM
148
10
0
03 Oct 2024
ET-Plan-Bench: Embodied Task-level Planning Benchmark Towards Spatial-Temporal Cognition with Foundation Models
Lingfeng Zhang
Yuening Wang
Hongjian Gu
Atia Hamidizadeh
Zhanguang Zhang
...
Tongtong Cao
Yuzheng Zhuang
Yingxue Zhang
Jianye Hao
LM&Ro
125
2
0
02 Oct 2024
iTeach: Interactive Teaching for Robot Perception using Mixed Reality
Jishnu Jaykumar P
Cole Salvato
Vinaya Bomnale
Jikai Wang
Yu Xiang
140
0
0
01 Oct 2024
Find Everything: A General Vision Language Model Approach to Multi-Object Search
Daniel Choi
Angus Fung
Haitong Wang
Aaron Hao Tan
132
3
0
01 Oct 2024
Playful DoggyBot: Learning Agile and Precise Quadrupedal Locomotion
Xin Duan
Ziwen Zhuang
Hang Zhao
Soeren Schwertfeger
133
2
0
30 Sep 2024
SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning
Rimvydas Rubavicius
Peter David Fagan
A. Lascarides
Subramanian Ramamoorthy
LM&Ro
458
0
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
110
2
0
26 Sep 2024
RT-GuIDE: Real-Time Gaussian splatting for Information-Driven Exploration
Yuezhan Tao
Dexter Ong
Varun Murali
Igor Spasojevic
Pratik Chaudhari
Vijay Kumar
3DGS
130
7
0
26 Sep 2024
Generative Object Insertion in Gaussian Splatting with a Multi-View Diffusion Model
Hongliang Zhong
Can Wang
Jingbo Zhang
Jing Liao
3DGS, DiffM
88
2
0
25 Sep 2024
CloudTrack: Scalable UAV Tracking with Cloud Semantics
Yannik Blei
Michael Krawez
Nisarga Nilavadi
Tanja Katharina Kaiser
Wolfram Burgard
108
1
0
24 Sep 2024
Articulated Object Manipulation using Online Axis Estimation with SAM2-Based Tracking
Xi Wang
Tianxing Chen
Qiaojun Yu
Tianling Xu
Zanxin Chen
Yiting Fu
Cewu Lu
Ping Luo
126
6
0
24 Sep 2024
OW-Rep: Open World Object Detection with Instance Representation Learning
Sunoh Lee
Minsik Jeon
Jihong Min
Junwon Seo
ObjD
496
0
0
24 Sep 2024
Tiny Robotics Dataset and Benchmark for Continual Object Detection
Francesco Pasti
Riccardo De Monte
Davide Dalle Pezze
Gian Antonio Susto
Nicola Bellotto
114
1
0
24 Sep 2024
FineCops-Ref: A new Dataset and Task for Fine-Grained Compositional Referring Expression Comprehension
Junzhuo Liu
Xiaohu Yang
Weiwei Li
Peng Wang
ObjD
141
5
0
23 Sep 2024
Autonomous Exploration and Semantic Updating of Large-Scale Indoor Environments with Mobile Robots
Sai Haneesh Allu
Itay Kadosh
Tyler Summers
Yu Xiang
108
0
0
23 Sep 2024
COHERENT: Collaboration of Heterogeneous Multi-Robot System with Large Language Models
Kehui Liu
Zixin Tang
Dong Wang
Ziyi Wang
Bin Zhao
151
14
0
23 Sep 2024
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan
Vaibhav Agrawal
Vineeth Bhat
Sarthak Chittawar
Siddharth Srivastava
Chetan Arora
K. M. Krishna
159
0
0
18 Sep 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
151
6
0
16 Sep 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis Llabrés
Marco Bertini
Dimosthenis Karatzas
128
5
0
14 Sep 2024
QueryCAD: Grounded Question Answering for CAD Models
Claudius Kienle
Benjamin Alt
Darko Katic
Rainer Jäkel
Jan Peters
108
2
0
13 Sep 2024
Click2Mask: Local Editing with Dynamic Mask Generation
Omer Regev
Omri Avrahami
Dani Lischinski
DiffM
118
2
0
12 Sep 2024
Rethinking The Training And Evaluation of Rich-Context Layout-to-Image Generation
Jiaxin Cheng
Zixu Zhao
Tong He
Tianjun Xiao
Yicong Zhou
Zheng Zhang
DiffM
154
0
0
07 Sep 2024
RoomDiffusion: A Specialized Diffusion Model in the Interior Design Industry
Zhaowei Wang
Ying Hao
Hao Wei
Qing Xiao
Lulu Chen
Yulong Li
Yue Yang
Tianyi Li
DiffM
51
0
0
05 Sep 2024
Training-Free Sketch-Guided Diffusion with Latent Optimization
Sandra Zhang Ding
Jiafeng Mao
Kiyoharu Aizawa
DiffM
190
3
0
31 Aug 2024
UrBench: A Comprehensive Benchmark for Evaluating Large Multimodal Models in Multi-View Urban Scenarios
Baichuan Zhou
Haote Yang
Dairong Chen
Junyan Ye
Tianyi Bai
Jinhua Yu
Songyang Zhang
Dahua Lin
Conghui He
Weijia Li
VLM
182
7
0
30 Aug 2024
Generic Objects as Pose Probes for Few-shot View Synthesis
Zhirui Gao
Renjiao Yi
Chenyang Zhu
Ke Zhuang
Wei Chen
K. Xu
173
2
0
29 Aug 2024
Depth-Weighted Detection of Behaviours of Risk in People with Dementia using Cameras
Pratik K. Mishra
Irene Ballester
Andrea Iaboni
Bing Ye
Kristine Newman
Alex Mihailidis
Shehroz S. Khan
81
2
0
28 Aug 2024
Segment Any Mesh
George Tang
William Zhao
Logan Ford
David Benhaim
Paul Zhang
97
9
0
24 Aug 2024
FungiTastic: A multi-modal dataset and benchmark for image categorization
Lukáš Picek
Klara Janouskova
Milan Šulc
Jiří Matas
160
1
0
24 Aug 2024
NuSegDG: Integration of Heterogeneous Space and Gaussian Kernel for Domain-Generalized Nuclei Segmentation
Zhenye Lou
Qing Xu
Zekun Jiang
Xiangjian He
Z. Chen
Yi Wang
Chenxin Li
Maggie M. He
Wenting Duan
109
2
0
21 Aug 2024
Sycophancy in Vision-Language Models: A Systematic Analysis and an Inference-Time Mitigation Framework
Yunpu Zhao
Rui Zhang
Junbin Xiao
Changxin Ke
Ruibo Hou
Yifan Hao
Qi Guo
70
4
0
21 Aug 2024
Target-Oriented Object Grasping via Multimodal Human Guidance
Pengwei Xie
Siang Chen
Dingchang Hu
Yixiang Dai
Kaiqin Yang
Guijin Wang
112
4
0
20 Aug 2024
OpenScan: A Benchmark for Generalized Open-Vocabulary 3D Scene Understanding
Youjun Zhao
Jiaying Lin
Shuquan Ye
Qianshi Pang
Rynson W. H. Lau
178
2
0
20 Aug 2024
Locate Anything on Earth: Advancing Open-Vocabulary Object Detection for Remote Sensing Community
Jiancheng Pan
Yanxing Liu
Yuqian Fu
Muyuan Ma
Jiaohao Li
D. Paudel
Luc Van Gool
Xiaomeng Huang
ObjD
140
9
0
17 Aug 2024
xGen-MM (BLIP-3): A Family of Open Large Multimodal Models
Le Xue
Manli Shu
Anas Awadalla
Jun Wang
An Yan
...
Zeyuan Chen
Silvio Savarese
Juan Carlos Niebles
Caiming Xiong
Ran Xu
VLM
112
96
0
16 Aug 2024
A Survey of Text-to-SQL in the Era of LLMs: Where are we, and where are we going?
Xinyu Liu
Shuyu Shen
Boyan Li
Peixian Ma
Runzhi Jiang
Yuxin Zhang
Ju Fan
Guoliang Li
Nan Tang
Yuyu Luo
78
32
0
09 Aug 2024
LAC-Net: Linear-Fusion Attention-Guided Convolutional Network for Accurate Robotic Grasping Under the Occlusion
Jinyu Zhang
Yongchong Gu
Jianxiong Gao
Haitao Lin
Qiang Sun
Xinwei Sun
Xiangyang Xue
Yanwei Fu
89
2
0
06 Aug 2024
Self-Introspective Decoding: Alleviating Hallucinations for Large Vision-Language Models
Fushuo Huo
Wenchao Xu
Zhong Zhang
Yining Qi
Zhicheng Chen
Peilin Zhao
VLM, MLLM
214
31
0
04 Aug 2024
Generalized Out-of-Distribution Detection and Beyond in Vision Language Model Era: A Survey
Atsuyuki Miyai
Jingkang Yang
Jingyang Zhang
Yifei Ming
Sisir Dhakal
...
Yixuan Li
Hai "Helen" Li
Ziwei Liu
Toshihiko Yamasaki
Kiyoharu Aizawa
141
13
0
31 Jul 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
71
1
0
30 Jul 2024
BIV-Priv-Seg: Locating Private Content in Images Taken by People With Visual Impairments
Yu-Yun Tseng
Tanusree Sharma
Lotus Zhang
Abigale Stangl
Leah Findlater
Yang Wang
Danna Gurari
208
0
0
25 Jul 2024
SAM-CP: Marrying SAM with Composable Prompts for Versatile Segmentation
Pengfei Chen
Lingxi Xie
Xinyue Huo
Xuehui Yu
Xiaopeng Zhang
Yingfei Sun
Zhenjun Han
Qi Tian
VLM
204
1
0
23 Jul 2024
VCP-CLIP: A visual context prompting model for zero-shot anomaly segmentation
Zhen Qu
Xian Tao
Mukesh Prasad
Fei Shen
Zhengtao Zhang
Xinyi Gong
Guiguang Ding
VLM
108
16
0
17 Jul 2024
DreamStory: Open-Domain Story Visualization by LLM-Guided Multi-Subject Consistent Diffusion
Huiguo He
Huan Yang
Zixi Tuo
Yuan Zhou
Qiuyue Wang
Yuhang Zhang
Zeyu Liu
Wenhao Huang
Hongyang Chao
Jian Yin
DiffM, VGen
200
17
0
17 Jul 2024
Robotic Control via Embodied Chain-of-Thought Reasoning
Michał Zawalski
William Chen
Karl Pertsch
Oier Mees
Chelsea Finn
Sergey Levine
LRM, LM&Ro
169
88
0
11 Jul 2024