ResearchTrend.AI
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chunyuan Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXiv (abs) | PDF | HTML | Github (8136★)

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 690 papers shown
InstructPart: Task-Oriented Part Segmentation with Instruction Reasoning
Zifu Wan
Yaqi Xie
Ce Zhang
Zhiqiu Lin
Zihan Wang
Simon Stepputtis
Deva Ramanan
Katia Sycara
34
0
0
23 May 2025
SEM: Enhancing Spatial Understanding for Robust Robot Manipulation
Xuewu Lin
Tianwei Lin
Lichao Huang
Hongyu Xie
Yiwei Jin
Keyu Li
Zhizhong Su
52
0
0
22 May 2025
Analyzing Fine-Grained Alignment and Enhancing Vision Understanding in Multimodal Language Models
Jiachen Jiang
Jinxin Zhou
Bo Peng
Xia Ning
Zhihui Zhu
109
0
0
22 May 2025
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
299
0
0
22 May 2025
Expanding Zero-Shot Object Counting with Rich Prompts
Huilin Zhu
Senyao Li
Jingling Yuan
Zhengwei Yang
Yu Guo
Wenxuan Liu
Xian Zhong
Shengfeng He
VLM
109
0
0
21 May 2025
From Grounding to Manipulation: Case Studies of Foundation Model Integration in Embodied Robotic Systems
Xiuchao Sui
Daiying Tian
Qi Sun
Ruirui Chen
Dongkyu Choi
Kenneth Kwok
Soujanya Poria
LM&Ro
121
0
0
21 May 2025
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou
Ke Ye
Jiayi Liu
Teli Ma
Zifang Wang
Ronghe Qiu
Kun-Yu Lin
Zhilin Zhao
Junwei Liang
132
2
0
21 May 2025
Object-Focus Actor for Data-efficient Robot Generalization Dexterous Manipulation
Yihang Li
Tianle Zhang
Xuelong Wei
Jiayi Li
Lin Zhao
Dongchi Huang
Zhirui Fang
Minhua Zheng
Wenjun Dai
Xiaodong He
82
0
0
21 May 2025
Ground-V: Teaching VLMs to Ground Complex Instructions in Pixels
Yongshuo Zong
Qin Zhang
Dongsheng An
Zhihua Li
Xiang Xu
Linghan Xu
Zhuowen Tu
Yifan Xing
Onkar Dabeer
ObjD
103
0
0
20 May 2025
From Assistants to Adversaries: Exploring the Security Risks of Mobile LLM Agents
Liangxuan Wu
Chao Wang
Tianming Liu
Yanjie Zhao
Haoyu Wang
AAML
85
0
0
19 May 2025
Policy Contrastive Decoding for Robotic Foundation Models
Shihan Wu
Ji Zhang
Xu Luo
Junlin Xie
Jingkuan Song
Heng Tao Shen
Lianli Gao
OffRL
284
0
0
19 May 2025
GraspMolmo: Generalizable Task-Oriented Grasping via Large-Scale Synthetic Data Generation
Abhay Deshpande
Yuquan Deng
Arijit Ray
Jordi Salvador
Winson Han
Jiafei Duan
Kuo-Hao Zeng
Yuke Zhu
Ranjay Krishna
Rose Hendrix
109
0
0
19 May 2025
Scalable Video-to-Dataset Generation for Cross-Platform Mobile Agents
Yunseok Jang
Yeda Song
Sungryull Sohn
Lajanugen Logeswaran
Tiange Luo
Dong-Ki Kim
Kyunghoon Bae
Honglak Lee
VGen
62
0
0
19 May 2025
Is Semantic SLAM Ready for Embedded Systems ? A Comparative Survey
Calvin Galagain
Martyna Poreba
François Goulette
80
1
0
18 May 2025
PRS-Med: Position Reasoning Segmentation with Vision-Language Model in Medical Imaging
Quoc-Huy Trinh
Minh-Van Nguyen
Jung Peng
Ulas Bagci
Debesh Jha
210
0
0
17 May 2025
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning
Yuqi Liu
Tianyuan Qu
Zhisheng Zhong
Bohao Peng
Shu Liu
Bei Yu
Jiaya Jia
VLM, LRM
143
3
0
17 May 2025
MedSG-Bench: A Benchmark for Medical Image Sequences Grounding
Jingkun Yue
Siqi Zhang
Zinan Jia
Huihuan Xu
Zongbo Han
Xiaohong Liu
Guangyu Wang
VLM
72
0
0
17 May 2025
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
Jiarui Wang
Huiyu Duan
Ziheng Jia
Yu Zhao
Woo Yi Yang
...
Zhongfu Chen
Juntong Wang
Yuke Xing
Guangtao Zhai
Xiongkuo Min
VGen
84
1
0
17 May 2025
PSDiffusion: Harmonized Multi-Layer Image Generation via Layout and Appearance Alignment
Dingbang Huang
Wenbo Li
Yifei Zhao
Xinyu Pan
Yanhong Zeng
Bo Dai
DiffM
68
0
0
16 May 2025
Disambiguating Reference in Visually Grounded Dialogues through Joint Modeling of Textual and Multimodal Semantic Structures
Shun Inadumi
Nobuhiro Ueda
Koichiro Yoshino
ObjD
93
0
0
16 May 2025
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Yiwen Liu
Jessica Bader
Jae Myung Kim
DiffM
81
1
0
15 May 2025
Towards Safe Robot Foundation Models Using Inductive Biases
Maximilian Tölle
Theo Gruner
Daniel Palenicek
Tim Schneider
Jonas Günster
Joe Watson
Davide Tateo
Puze Liu
Jan Peters
OffRL, AI4CE
69
0
0
15 May 2025
A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Jie Zhu
Jirong Zha
Ding Li
Leye Wang
147
1
0
15 May 2025
Air-Ground Collaboration for Language-Specified Missions in Unknown Environments
Fernando Cladera
Zachary Ravichandran
Jason Hughes
Varun Murali
Carlos Nieto-Granda
M. Hsieh
George J. Pappas
Camillo J Taylor
Vijay Kumar
85
2
0
14 May 2025
ManipBench: Benchmarking Vision-Language Models for Low-Level Robot Manipulation
Enyu Zhao
Vedant Raval
Hejia Zhang
Jiageng Mao
Zeyu Shangguan
Stefanos Nikolaidis
Yun Wang
Daniel Seita
LM&Ro, CoGe
100
0
0
14 May 2025
Augmented Reality for RObots (ARRO): Pointing Visuomotor Policies Towards Visual Robustness
Reihaneh Mirjalili
Tobias Jülg
Florian Walter
Wolfram Burgard
69
0
0
13 May 2025
LaDi-WM: A Latent Diffusion-based World Model for Predictive Manipulation
Yuhang Huang
Jiazhao Zhang
Shilong Zou
Xinwang Liu
Ruizhen Hu
Kai Xu
94
0
0
13 May 2025
OpenThinkIMG: Learning to Think with Images via Visual Tool Reinforcement Learning
Zhaochen Su
Linjie Li
Mingyang Song
Yunzhuo Hao
Zhengyuan Yang
...
Guanjie Chen
Jiawei Gu
Juntao Li
Xiaoye Qu
Yu Cheng
OffRL, LRM
99
11
0
13 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
160
0
0
13 May 2025
BETTY Dataset: A Multi-modal Dataset for Full-Stack Autonomy
Micah Nye
Ayoub Raji
Andrew Saba
Eidan Erlich
Robert Exley
...
Ritesh Misra
Matthew Sivaprakasam
Marko Bertogna
Deva Ramanan
Sebastian A. Scherer
136
0
0
12 May 2025
The First WARA Robotics Mobile Manipulation Challenge -- Lessons Learned
David Cáceres-Domínguez
M. Iannotta
Abhishek Kashyap
Shuo Sun
Yuxuan Yang
...
Zheng Jia
Graziano Carriero
Sofia Lindqvist
Silvio Di Castro
Matteo Iovino
82
0
0
11 May 2025
Towards Artificial General or Personalized Intelligence? A Survey on Foundation Models for Personalized Federated Intelligence
Yu Qiao
Huy Q. Le
Avi Deb Raha
Phuong-Nam Tran
Apurba Adhikary
Mengchun Zhang
Loc X. Nguyen
Eui-nam Huh
Dusit Niyato
Choong Seon Hong
AI4CE
163
1
0
11 May 2025
UniDiffGrasp: A Unified Framework Integrating VLM Reasoning and VLM-Guided Part Diffusion for Open-Vocabulary Constrained Grasping with Dual Arms
Xueyang Guo
Hongwei Hu
Chengye Song
Jingshu Chen
Zilin Zhao
Yu Fu
Bowen Guan
Zhenze Liu
102
0
0
11 May 2025
Causal Prompt Calibration Guided Segment Anything Model for Open-Vocabulary Multi-Entity Segmentation
Wenwen Qiang
Jianqi Zhang
Jingyao Wang
Changwen Zheng
VLM
147
0
0
10 May 2025
Describe Anything in Medical Images
Xi Xiao
Yunbei Zhang
Thanh-Huy Nguyen
Ba Thinh Lam
Janet Wang
...
Xiaobei Wang
Xiao Wang
Hao Xu
Tianming Liu
Min Xu
MedIm, VLM
198
0
0
09 May 2025
Federated Deconfounding and Debiasing Learning for Out-of-Distribution Generalization
Zhuang Qi
Sijin Zhou
Lei Meng
Han Hu
Han Yu
Xiangxu Meng
FedML, CML
497
1
0
08 May 2025
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
Xavier Hu
Yexin Chen
Shengyu Zhang
Hongxia Yang
Fan Wu
Leilei Gan
LLMAG
498
0
0
08 May 2025
Visual Affordances: Enabling Robots to Understand Object Functionality
Tommaso Apicella
Alessio Xompero
Andrea Cavallaro
134
0
0
08 May 2025
CityNavAgent: Aerial Vision-and-Language Navigation with Hierarchical Semantic Planning and Global Memory
Weichen Zhang
Chen Gao
Shiquan Yu
Ruiying Peng
Baining Zhao
Qian Zhang
Jinqiang Cui
Xinlei Chen
Yongqian Li
LLMAG, LM&Ro
151
0
0
08 May 2025
Lay-Your-Scene: Natural Scene Layout Generation with Diffusion Transformers
Divyansh Srivastava
Xiang Zhang
He Wen
Chenru Wen
Zhuowen Tu
DiffM
84
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM, VGen
201
6
0
07 May 2025
Corner Cases: How Size and Position of Objects Challenge ImageNet-Trained Models
Mishal Fatima
Steffen Jung
Margret Keuper
81
0
0
06 May 2025
From Word to Sentence: A Large-Scale Multi-Instance Dataset for Open-Set Aerial Detection
Guoting Wei
Yu Liu
Xia Yuan
Xizhe Xue
Linlin Guo
Yifan Yang
Chunxia Zhao
Zongwen Bai
Haokui Zhang
Rong Xiao
ObjD
85
0
0
06 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
Dengyang Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
123
0
0
05 May 2025
6D Pose Estimation on Spoons and Hands
Kevin Tan
Fan Yang
Yuxiao Chen
79
0
0
05 May 2025
Unified Multimodal Understanding and Generation Models: Advances, Challenges, and Opportunities
Wei Wei
Jintao Guo
Shanshan Zhao
Minghao Fu
Lunhao Duan
...
Guo-Hua Wang
Qing-Guo Chen
Zhao Xu
Weihua Luo
Kaifu Zhang
DiffM
323
1
0
05 May 2025
LLM-Guided Probabilistic Program Induction for POMDP Model Estimation
Aidan Curtis
Hao Tang
Thiago Veloso
Kevin Ellis
Joshua B. Tenenbaum
Tomás Lozano-Pérez
Leslie Pack Kaelbling
405
1
0
04 May 2025
Compositional Image-Text Matching and Retrieval by Grounding Entities
Madhukar Reddy Vongala
Saurabh Srivastava
Jana Kosecka
CLIP, CoGe, VLM
99
0
0
04 May 2025
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li
Lingyun Xu
Hao Fei
Jiaming Liu
Yan Shen
...
Jiahui Xu
Liang Heng
Siyuan Huang
Shanghang Zhang
Hao Dong
LM&Ro
131
0
0
04 May 2025
RESAnything: Attribute Prompting for Arbitrary Referring Segmentation
Ruiqi Wang
Hao Zhang
VLM
113
1
0
03 May 2025