ResearchTrend.AI
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD
ArXiv (abs) | PDF | HTML | GitHub (8136★)

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 690 papers shown
ZeroCAP: Zero-Shot Multi-Robot Context Aware Pattern Formation via Large Language Models
Vishnunandan L. N. Venkatesh
Byung-Cheol Min
LM&Ro
196
2
0
02 Apr 2024
Data-Efficient 3D Visual Grounding via Order-Aware Referring
Tung-Yu Wu
Sheng-Yu Huang
Yu-Chiang Frank Wang
147
0
0
25 Mar 2024
Segment Anything Model for Road Network Graph Extraction
Congrui Hetang
Haoru Xue
Cindy X. Le
Tianwei Yue
Wenping Wang
Yihui He
141
17
0
24 Mar 2024
DreamMotion: Space-Time Self-Similar Score Distillation for Zero-Shot Video Editing
Hyeonho Jeong
Jinho Chang
Geon Yeong Park
Jong Chul Ye
DiffM, VGen
106
18
0
18 Mar 2024
GazeFusion: Saliency-Guided Image Generation
Yunxiang Zhang
Nan Wu
Connor Z. Lin
Gordon Wetzstein
Qi Sun
113
0
0
16 Mar 2024
Griffon v2: Advancing Multimodal Perception with High-Resolution Scaling and Visual-Language Co-Referring
Yufei Zhan
Yousong Zhu
Hongyin Zhao
Fan Yang
Ming Tang
Jinqiao Wang
ObjD
103
14
0
14 Mar 2024
Annotation Free Semantic Segmentation with Vision Foundation Models
Soroush Seifi
Daniel Olmeda Reino
Fabien Despinoy
Rahaf Aljundi
VLM
108
1
0
14 Mar 2024
CART: Caltech Aerial RGB-Thermal Dataset in the Wild
Connor T. Lee
Matthew O. Anderson
Nikhil Raganathan
Xingxing Zuo
Kevin Do
Georgia Gkioxari
Soon-Jo Chung
80
8
0
13 Mar 2024
NavCoT: Boosting LLM-Based Vision-and-Language Navigation via Learning Disentangled Reasoning
Bingqian Lin
Yunshuang Nie
Ziming Wei
Jiaqi Chen
Shikui Ma
Jianhua Han
Hang Xu
Xiaojun Chang
Xiaodan Liang
LM&Ro, LRM
143
28
0
12 Mar 2024
Contrastive Region Guidance: Improving Grounding in Vision-Language Models without Training
David Wan
Jaemin Cho
Elias Stengel-Eskin
Mohit Bansal
VLM, ObjD
118
36
0
04 Mar 2024
The Revolution of Multimodal Large Language Models: A Survey
Davide Caffagni
Federico Cocchi
Luca Barsellotti
Nicholas Moratelli
Sara Sarto
Lorenzo Baraldi
Lorenzo Baraldi
Marcella Cornia
Rita Cucchiara
LRM, VLM
139
64
0
19 Feb 2024
OpenFMNav: Towards Open-Set Zero-Shot Object Navigation via Vision-Language Foundation Models
Yuxuan Kuang
Hai Lin
Meng Jiang
LM&Ro
105
34
0
16 Feb 2024
Domain Adaptable Fine-Tune Distillation Framework For Advancing Farm Surveillance
Raza Imam
Muhammad Huzaifa
Nabil Mansour
Shaher Bano Mirza
Fouad Lamghari
126
1
0
10 Feb 2024
SPHINX-X: Scaling Data and Parameters for a Family of Multi-modal Large Language Models
Chris Liu
Renrui Zhang
Longtian Qiu
Siyuan Huang
Weifeng Lin
...
Hao Shao
Pan Lu
Hongsheng Li
Yu Qiao
Peng Gao
MLLM
248
116
0
08 Feb 2024
Multimodal Rationales for Explainable Visual Question Answering
Kun Li
G. Vosselman
Michael Ying Yang
134
2
0
06 Feb 2024
Unified Hallucination Detection for Multimodal Large Language Models
Xiang Chen
Chenxi Wang
Yida Xue
Ningyu Zhang
Xiaoyan Yang
Qian Li
Yue Shen
Lei Liang
Jinjie Gu
Huajun Chen
HILM
130
45
0
05 Feb 2024
Conditioning non-linear and infinite-dimensional diffusion processes
E. Baker
Gefan Yang
Michael L. Severinsen
C. Hipsley
Stefan Sommer
DiffM
84
8
0
02 Feb 2024
SAGD: Boundary-Enhanced Segment Anything in 3D Gaussian via Gaussian Decomposition
Xu Hu
Yuxi Wang
Lue Fan
Junsong Fan
Junran Peng
Zhen Lei
Qing Li
Zhaoxiang Zhang
3DGS
173
9
0
31 Jan 2024
Rapid post-disaster infrastructure damage characterisation enabled by remote sensing and deep learning technologies -- a tiered approach
Nadiia Kopiika
A. Karavias
P. Krassakis
Zehao Ye
Jelena Ninić
N. Shakhovska
Nikolaos Koukouzas
S. Argyroudis
S. Mitoulis
45
13
0
31 Jan 2024
InternLM-XComposer2: Mastering Free-form Text-Image Composition and Comprehension in Vision-Language Large Model
Xiao-wen Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Bin Wang
...
Conghui He
Xingcheng Zhang
Yu Qiao
Dahua Lin
Jiaqi Wang
VLM, MLLM
166
268
0
29 Jan 2024
Mobile-Agent: Autonomous Multi-Modal Mobile Device Agent with Visual Perception
Junyang Wang
Haiyang Xu
Jiabo Ye
Mingshi Yan
Weizhou Shen
Ji Zhang
Fei Huang
Jitao Sang
147
129
0
29 Jan 2024
HAZARD Challenge: Embodied Decision Making in Dynamically Changing Environments
Qinhong Zhou
Sunli Chen
Yisong Wang
Haozhe Xu
Weihua Du
Hongxin Zhang
Yilun Du
Josh Tenenbaum
Chuang Gan
AI4CE
81
18
0
23 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
164
5
0
23 Jan 2024
Semantic Prompt Learning for Weakly-Supervised Semantic Segmentation
Ci-Siang Lin
Chien-Yi Wang
Yu-Chiang Frank Wang
Min-Hung Chen
VLM
254
0
0
22 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro, LLMAG
196
41
0
16 Jan 2024
Seeing the Unseen: Visual Common Sense for Semantic Placement
Ram Ramrakhya
Aniruddha Kembhavi
Dhruv Batra
Z. Kira
Kuo-Hao Zeng
Luca Weihs
VLM
108
6
0
15 Jan 2024
RePLan: Robotic Replanning with Perception and Language Models
Marta Skreta
Zihan Zhou
Jia Lin Yuan
Kourosh Darvish
Alán Aspuru-Guzik
Animesh Garg
LM&Ro, LRM
125
26
0
08 Jan 2024
Learning to Prompt with Text Only Supervision for Vision-Language Models
Muhammad Uzair Khattak
Muhammad Ferjad Naeem
Muzammal Naseer
Luc Van Gool
F. Tombari
VLM, VPVLM
97
22
0
04 Jan 2024
A Semantic Space is Worth 256 Language Descriptions: Make Stronger Segmentation Models with Descriptive Properties
Junfei Xiao
Ziqi Zhou
Wenxuan Li
Shiyi Lan
Jieru Mei
Zhiding Yu
Alan Yuille
Yuyin Zhou
Cihang Xie
VLM
65
1
0
21 Dec 2023
ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation
Difei Gao
Lei Ji
Zechen Bai
Mingyu Ouyang
Peiran Li
...
Peiyi Wang
Xiangwu Guo
Hengxu Wang
Luowei Zhou
Mike Zheng Shou
LLMAG
109
24
0
20 Dec 2023
Collaborating Foundation Models for Domain Generalized Semantic Segmentation
Yasser Benigmim
Subhankar Roy
S. Essid
Vicky Kalogeiton
Stéphane Lathuilière
139
14
0
15 Dec 2023
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
Shufan Li
Harkanwar Singh
Aditya Grover
DiffM
93
10
0
11 Dec 2023
Make-A-Storyboard: A General Framework for Storyboard with Disentangled and Merged Control
Jingkuan Song
Litao Guo
Lianli Gao
Hengtao Shen
Jingkuan Song
DiffM
72
3
0
06 Dec 2023
AI-SAM: Automatic and Interactive Segment Anything Model
Yimu Pan
Sitao Zhang
Alison D. Gernand
Jeffery A. Goldstein
J. Z. Wang
VLM
68
4
0
05 Dec 2023
Human Demonstrations are Generalizable Knowledge for Robots
Te Cui
Guangyan Chen
Tianxing Zhou
Zicai Peng
Mengxiao Hu
Haoyang Lu
Haizhou Li
Meiling Wang
Yi Yang
Yufeng Yue
LM&Ro
98
6
0
05 Dec 2023
Learning Efficient Unsupervised Satellite Image-based Building Damage Detection
Yiyun Zhang
Zijian Wang
Yadan Luo
Xin Yu
Zi Huang
51
4
0
04 Dec 2023
LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor
Yiming Zeng
Mingdong Wu
Long Yang
Jiyao Zhang
Hao Ding
Hui Cheng
Hao Dong
DiffM
73
8
0
03 Dec 2023
Segment Any 3D Gaussians
Jiazhong Cen
Jiemin Fang
Chen Yang
Lingxi Xie
Xiaopeng Zhang
Wei Shen
Qi Tian
3DGS
178
76
0
01 Dec 2023
GaussianEditor: Editing 3D Gaussians Delicately with Text Instructions
Jiemin Fang
Junjie Wang
Xiaopeng Zhang
Lingxi Xie
Qi Tian
3DGS, DiffM
133
117
0
27 Nov 2023
Obj-NeRF: Extract Object NeRFs from Multi-view Images
Zhiyi Li
Lihe Ding
Tianfan Xue
61
1
0
26 Nov 2023
Griffon: Spelling out All Object Locations at Any Granularity with Large Language Models
Yufei Zhan
Yousong Zhu
Zhiyang Chen
Fan Yang
E. Goles
Jinqiao Wang
ObjD
114
17
0
24 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
141
175
0
10 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chun-yue Li
MLLM, VLM
121
126
0
09 Nov 2023
Get the Ball Rolling: Alerting Autonomous Robots When to Help to Close the Healthcare Loop
Jiaxin Shen
Yanyao Liu
Ziming Wang
Ziyuan Jiao
Yufeng Chen
Wenjuan Han
39
0
0
05 Nov 2023
UniFolding: Towards Sample-efficient, Scalable, and Generalizable Robotic Garment Folding
Han Xue
Yutong Li
Wenqiang Xu
Huanyu Li
Dongzhe Zheng
Cewu Lu
105
15
0
02 Nov 2023
Audio-Visual Instance Segmentation
Ruohao Guo
Yaru Chen
Yanyu Qi
Wenzhen Yue
Dantong Niu
...
Wenzhen Yue
Ji Shi
Qixun Wang
Peiliang Zhang
Buwen Liang
VLM, VOS
127
2
0
28 Oct 2023
Fine-Tuning Language Models Using Formal Methods Feedback
Yunhao Yang
N. Bhatt
Tyler Ingebrand
William Ward
Steven Carr
Zhangyang Wang
Ufuk Topcu
69
9
0
27 Oct 2023
Drive Anywhere: Generalizable End-to-end Autonomous Driving with Multi-modal Foundation Models
Tsun-Hsuan Wang
Alaa Maalouf
Wei Xiao
Yutong Ban
Alexander Amini
Guy Rosman
S. Karaman
Daniela Rus
73
46
0
26 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM, MLLM
108
133
0
24 Oct 2023
Interactive Task Planning with Language Models
Boyi Li
Philipp Wu
Pieter Abbeel
Jitendra Malik
LM&Ro
116
38
0
16 Oct 2023