ResearchTrend.AI
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection

9 March 2023
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
Jie-jin Yang
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
    ObjD

Papers citing "Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection"

50 / 1,339 papers shown
Title
Adaptive Visual Scene Understanding: Incremental Scene Graph Generation
Naitik Khandelwal
Xiao Liu
Mengmi Zhang
CLL
31
0
0
02 Oct 2023
CLIPSelf: Vision Transformer Distills Itself for Open-Vocabulary Dense Prediction
Size Wu
Wenwei Zhang
Lumin Xu
Sheng Jin
Xiangtai Li
Wentao Liu
Chen Change Loy
CLIP
VLM
32
69
0
02 Oct 2023
Towards reporting bias in visual-language datasets: bimodal augmentation by decoupling object-attribute association
Qiyu Wu
Mengjie Zhao
Yutong He
Lang Huang
Junya Ono
Hiromi Wakaki
Yuki Mitsufuji
33
4
0
02 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
32
10
0
02 Oct 2023
Comics for Everyone: Generating Accessible Text Descriptions for Comic Strips
Reshma Ramaprasad
6
5
0
01 Oct 2023
Region-centric Image-Language Pretraining for Open-Vocabulary Detection
Dahun Kim
A. Angelova
Weicheng Kuo
ObjD
VLM
17
3
0
29 Sep 2023
UniQuadric: A SLAM Backend for Unknown Rigid Object 3D Tracking and Light-Weight Modeling
Linghao Yang
Yanmin Wu
Yu Deng
Rui Tian
Xinggang Hu
Tiefeng Ma
28
1
0
29 Sep 2023
ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning
Yuanyi Zhong
Alihusein Kuwajerwala
Sacha Morin
Krishna Murthy Jatavallabhula
Bipasha Sen
...
Celso Miguel de Melo
Joshua B. Tenenbaum
Antonio Torralba
Florian Shkurti
Liam Paull
LM&Ro
36
169
0
28 Sep 2023
InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition
Pan Zhang
Xiaoyi Wang
Bin Wang
Yuhang Cao
Chao Xu
...
Conghui He
Xingcheng Zhang
Yu Qiao
Da Lin
Jiaqi Wang
MLLM
80
226
0
26 Sep 2023
VideoDirectorGPT: Consistent Multi-scene Video Generation via LLM-Guided Planning
Han Lin
Abhaysinh Zala
Jaemin Cho
Joey Tianyi Zhou
LM&Ro
VGen
DiffM
56
74
0
26 Sep 2023
MoCaE: Mixture of Calibrated Experts Significantly Improves Object Detection
Kemal Oksuz
Selim Kuzucu
Tom Joy
P. Dokania
MoE
24
5
0
26 Sep 2023
Motion Segmentation from a Moving Monocular Camera
Yuxiang Huang
John S. Zelek
VOS
31
5
0
24 Sep 2023
Detect Everything with Few Examples
Xinyu Zhang
Yuting Wang
Abdeslam Boularias
ObjD
VLM
32
13
0
22 Sep 2023
A Large-scale Dataset for Audio-Language Representation Learning
Luoyi Sun
Xuenan Xu
Mengyue Wu
Weidi Xie
34
20
0
20 Sep 2023
Bridging Zero-shot Object Navigation and Foundation Models through Pixel-Guided Navigation Skill
Wenzhe Cai
Siyuan Huang
Guangran Cheng
Yuxing Long
Peng Gao
Changyin Sun
Hao Dong
LM&Ro
25
42
0
19 Sep 2023
Specification-Driven Video Search via Foundation Models and Formal Verification
Yunhao Yang
Jean-Raphael Gaglione
Sandeep Chinchali
Ufuk Topcu
65
6
0
18 Sep 2023
Triple Regression for Camera Agnostic Sim2Real Robot Grasping and Manipulation Tasks
Yuanhong Zeng
Yizhou Zhao
Ying Nian Wu
38
0
0
16 Sep 2023
Efficient Object Rearrangement via Multi-view Fusion
Dehao Huang
Chao Tang
Hong Zhang
OCL
32
4
0
16 Sep 2023
GRID: Scene-Graph-based Instruction-driven Robotic Task Planning
Zhe Ni
Xiao-Xin Deng
Cong Tai
Xin-Yue Zhu
Qinghongbing Xie
Yong-Jin Liu
Xiang Wu
Long Zeng
LM&Ro
32
14
0
14 Sep 2023
Leveraging Foundation models for Unsupervised Audio-Visual Segmentation
Swapnil Bhosale
Haosen Yang
Diptesh Kanojia
Xiatian Zhu
VOS
47
5
0
13 Sep 2023
Knowledge-Guided Short-Context Action Anticipation in Human-Centric Videos
Sarthak Bhagat
Simon Stepputtis
Joseph Campbell
Katia P. Sycara
33
4
0
12 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning
Jiaming Han
Renrui Zhang
Wenqi Shao
Peng Gao
Peng Xu
...
Yafei Wen
Xiaoxin Chen
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
49
117
0
07 Sep 2023
Tracking Anything with Decoupled Video Segmentation
Ho Kei Cheng
Seoung Wug Oh
Brian L. Price
Alexander Schwing
Joon-Young Lee
VOS
VLM
43
121
0
07 Sep 2023
Prompt me a Dataset: An investigation of text-image prompting for historical image dataset creation using foundation models
Hassan el-Hajj
Matteo Valleriani
21
0
0
04 Sep 2023
Big-model Driven Few-shot Continual Learning
Ziqi Gu
Chunyan Xu
Zihan Lu
Xin Liu
Anbo Dai
Zhen Cui
CLL
35
1
0
02 Sep 2023
OpenIns3D: Snap and Lookup for 3D Open-vocabulary Instance Segmentation
Zhening Huang
Xiaoyang Wu
Xi Chen
Hengshuang Zhao
Lei Zhu
Joan Lasenby
ISeg
3DPC
VLM
55
47
0
01 Sep 2023
GREC: Generalized Referring Expression Comprehension
Shuting He
Henghui Ding
Chang Liu
Xudong Jiang
ObjD
27
14
0
30 Aug 2023
WALL-E: Embodied Robotic WAiter Load Lifting with Large Language Model
Tianyu Wang
Yifan Li
Haitao Lin
Xiangyang Xue
Yanwei Fu
LM&Ro
27
8
0
30 Aug 2023
Zero-Shot Edge Detection with SCESAME: Spectral Clustering-based Ensemble for Segment Anything Model Estimation
Hiroaki Yamagiwa
Yusuke Takase
Hiroyuki Kambe
Ryosuke Nakamoto
VLM
37
5
0
26 Aug 2023
Position-Enhanced Visual Instruction Tuning for Multimodal Large Language Models
Chi Chen
Ruoyu Qin
Fuwen Luo
Xiaoyue Mi
Peng Li
Maosong Sun
Yang Liu
MLLM
VLM
27
45
0
25 Aug 2023
How to Evaluate the Generalization of Detection? A Benchmark for Comprehensive Open-Vocabulary Detection
Yi Yao
Peng Liu
Tiancheng Zhao
Qianqian Zhang
Jiajia Liao
Chunxin Fang
Kyusong Lee
Qing Wang
VLM
ObjD
29
10
0
25 Aug 2023
Large Multilingual Models Pivot Zero-Shot Multimodal Learning across Languages
Jinyi Hu
Yuan Yao
Chong Wang
Shanonan Wang
Yinxu Pan
...
Yankai Lin
Jiao Xue
Dahai Li
Zhiyuan Liu
Maosong Sun
MLLM
VLM
35
48
0
23 Aug 2023
ASPIRE: Language-Guided Data Augmentation for Improving Robustness Against Spurious Correlations
Sreyan Ghosh
Chandra Kiran Reddy Evuru
Sonal Kumar
Utkarsh Tyagi
Sakshi Singh
Sanjoy Chowdhury
Dinesh Manocha
OOD
30
1
0
19 Aug 2023
MV-ROPE: Multi-view Constraints for Robust Category-level Object Pose and Size Estimation
Jiaqi Yang
Yucong Chen
Xiangting Meng
C. Yan
Ming Li
Ran Chen
Lige Liu
Tao Sun
L. Kneip
47
2
0
17 Aug 2023
A One Stop 3D Target Reconstruction and multilevel Segmentation Method
J. Xu
Wei-Ye Zhao
Zhiyan Tang
X. Gan
3DV
24
2
0
14 Aug 2023
Polyp-SAM++: Can A Text Guided SAM Perform Better for Polyp Segmentation?
Risab Biswas
MedIm
33
22
0
12 Aug 2023
Follow Anything: Open-set detection, tracking, and following in real-time
Alaa Maalouf
Ninad Jadhav
Krishna Murthy Jatavallabhula
Makram Chahine
Daniel M. Vogt
Robert J. Wood
Antonio Torralba
Daniela Rus
26
24
0
10 Aug 2023
Pseudo-label Alignment for Semi-supervised Instance Segmentation
Jie Hu
Cheng Chen
Liujuan Cao
Shengchuan Zhang
Annan Shu
Guannan Jiang
Rongrong Ji
ISeg
44
13
0
10 Aug 2023
Multimodal Pretrained Models for Verifiable Sequential Decision-Making: Planning, Grounding, and Perception
Yunhao Yang
Cyrus Neary
Ufuk Topcu
LM&Ro
OffRL
30
5
0
10 Aug 2023
OmniDataComposer: A Unified Data Structure for Multimodal Data Fusion and Infinite Data Generation
Dongyang Yu
Shihao Wang
Yuan Fang
Wangpeng An
VGen
41
0
0
08 Aug 2023
ImageBrush: Learning Visual In-Context Instructions for Exemplar-Based Image Manipulation
Yasheng Sun
Yifan Yang
Houwen Peng
Yifei Shen
Yuqing Yang
Hang-Rui Hu
Lili Qiu
Hideki Koike
DiffM
LM&Ro
37
33
0
02 Aug 2023
LISA: Reasoning Segmentation via Large Language Model
Xin Lai
Zhuotao Tian
Yukang Chen
Yanwei Li
Yuhui Yuan
Shu Liu
Jiaya Jia
LM&Ro
VLM
MLLM
LRM
36
401
0
01 Aug 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG
SyDa
46
41
0
01 Aug 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
Fahad Shahbaz Khan
VLM
38
119
0
25 Jul 2023
Fashion Matrix: Editing Photos by Just Talking
Zheng Chong
Xujie Zhang
Fuwei Zhao
Zhenyu Xie
Xiaodan Liang
DiffM
23
2
0
25 Jul 2023
Described Object Detection: Liberating Object Detection with Flexible Expressions
Chi Xie
Zhao Zhang
YiXuan Wu
Feng Zhu
Rui Zhao
Shuang Liang
ObjD
39
30
0
24 Jul 2023
Industrial Segment Anything -- a Case Study in Aircraft Manufacturing, Intralogistics, Maintenance, Repair, and Overhaul
Keno Moenck
Arne Wendt
Philipp Prünte
Julian Koch
Arne Sahrhage
...
Falko Kähler
Dirk Holst
Martin Gomse
Thorsten Schuppstuhl
Daniel Schoepflin
VLM
36
6
0
24 Jul 2023
Iterative Robust Visual Grounding with Masked Reference based Centerpoint Supervision
Menghao Li
Chunlei Wang
W. Feng
Shuchang Lyu
Guangliang Cheng
Xiangtai Li
Binghao Liu
Qi Zhao
33
5
0
23 Jul 2023
Subject-Diffusion:Open Domain Personalized Text-to-Image Generation without Test-time Fine-tuning
Jiancang Ma
Junhao Liang
Chen Chen
H. Lu
31
138
0
21 Jul 2023
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang
Hui Chen
Zijia Lin
Hengjun Pu
Guiguang Ding
34
180
0
18 Jul 2023